Context
When we study a real-life network, we usually refer to an underlying mathematical object, called a graph. We use the terms network and graph interchangeably throughout the problem sets.
- Graph
- A graph is defined as a set of nodes, defining the locations, entities or elements of the graph, and a set of edges, representing the relationships between two nodes. Nodes are also called vertices, and edges are sometimes called links.
- Graph Theory
- The field of mathematics concerned with the study of graphs is called graph theory. Network Science relies heavily on graph theory, as every network is represented by a graph.
A graph with four nodes and four edges.
Both nodes and edges can have many different properties. One of the most fundamental properties of graphs is the number of edges connecting to each node. This gives a rough measure of importance, size, or connectedness. These concepts will become clearer as we delve deeper into Network Science and Graph Theory.
- Degree
- The number of edges connecting to a node
u
is called the degree ofu
. - Adjacent or Neighbors
- If two nodes
u
andv
are connected by an edge, they are called adjacent or neighbors. Ifu
andv
are adjacent, we writeu ~ v
.
Challenge
For now, we choose the simplest possible way to represent a graph, where each node is represented by a single integer, and each edge by a pair of two nodes. In this way, we can represent a graph in a file by defining the number of nodes, edges, and each edge in a separate line.
For this challenge, you need to read a file in this format and find the node with the most edges (or highest degree).
The input is a file in the aforementioned format, where the first line
contains two integers, \(n\) and \(m\), defining the number of nodes and edges,
respectively. The next \(m\) lines each contain two integers, representing
two nodes that are joined by an edge. If there are \(n\) nodes, they are
assumed to be labeled by the integers from 0
to n - 1
.
Output the node with the highest degree.
Sample Input
4 4
0 1
0 2
1 3
3 0
Sample Output
0
Solution
The solution to this challenge is hosted on Github.
Expansion Questions
-
We've defined one measurement of the "importance" of a node, its degree. What might be other ways of measuring the importance of a node in a network?
-
Consider the social network of your immediate Facebook friends. In this graph, every node stands in for one of your friends, and there is an edge between two nodes if the corresponding persons are friends with each other.
- Estimate the number of nodes and edges in this network.
- What is your degree in this network?
- Consider the extended network of all Facebook users. Estimate the number of nodes and edges. (Doesn't have to be an exact number.)
-
Discuss the possibility of measuring the "importance" of an edge in a network. Can you come up with a proposal for how to measure it?
Answers
-
Roughly speaking, the importance of a node is called centrality. There are many ways to measure centrality, such as betweenness centrality1, closeness centrality2, eigenvector centrality3, degree centrality4 (or degree), and Katz centrality5. We will see each of them in turn in later challenges.
-
According to one source6, young adult Facebook users have a median of \(300\) friends. For the sake of example, let's say I have exactly \(300\) friends on Facebook. Since I am connected to each one of them, my degree in this network is \(300\), while the total number of nodes is \(301\).
Now, estimating the number of edges can pose a challenge. Consider the nature of your Facebook friends. Most social networks can be partitioned in three big groups: close friends, family, and acquaintances. Now, most if not all of my close friends know each other, and the same is true for members of my family. Some of my friends know some of my family too. In other words, these two groups are very tightly knit, or dense, within themselves, while the connections between them are few, or sparse. And what about acquaintances? Well, some of them know some of my friends and/or family and some don't. In other words, it's a mixed bag of nuts.
Depending on the size of these groups, the total number of edges in the graph can vary considerably. Let's say I have \(35\) really close friends, all of whom know each other (from school, say), \(15\) family members, all of whom know each other, and a long list of \(250\) acquaintances, who know some of my other connections to varying degrees, say \(10\) on average. That's \(595\) edges within my group of friends, \(105\) in my family plus \(1250\) edges from acquaintances, for a total of \(1950\). In contrast, if I have a big family, say \(30\) members, all other things being equal, the total number of edges would be \(2280\). And if I was more popular in school, it might get to more than \(3000\).
To estimate the total size of all of Facebook is a different matter. The problem with estimating real-life networks is that even though it is possible to gather the data in principle, it can be too large for meaningful analysis. Furthermore, There are Facebook "users" who are not human, e.g. company or automated accounts, and there are also a number of users with more than one account. In these cases, we need careful and sophisticated analysis. Some sources7\(^,\)8, for example, estimate the number of users just shy of one billion, and the number of edges in the tens of billions.
-
Just like nodes, edges are "first-class citizens" in networks. Since they define the relationships between nodes, they are as important, if not more. Since every edge is connected to only two edges at most, there isn't a counterpart for node degree in the case of edges. However, there are many ways to measure edge centrality9\(^,\)10\(^,\)11.