Context
To perform any kind of research, it is always a good idea to keep some simple models at hand. These simple models can deviate from real-life data, but their utility comes from the fact that they are tractable and easy to understand. They are also useful as statistical null-models, against which to test our hypotheses.
In Network Science, when we uncover some property of a network and we want to test whether or not this property truly comes from the topology of the graph, and did not appear purely at random, we compare our network to a random sample of networks. If the networks generated at random also have the property, then it's likely that our network also has it by chance, and not as the product of its structure. This is why we need to generate random graphs.
- Random Graph
- A random graph is a graph that was generated by some stochastic process. In Mathematics, the term random graph is also used to refer to the ensemble of all such graphs that could possibly be generated by a random process. That is, the process itself embodies the random graph, instead of just one realization of such process.
One of the simplest ways to generate a random graph is to start with a fixed amount of nodes, \(n\). Then, we generate the edges at random, one by one, for example, by tossing a coin. Heads, we put down the link, tails, we don't. In this way, we perform a random procedure for each edge, and the end result is a graph generated at random.
Now, to have more control over the kind of graph that is generated, instead of tossing a coin, we can choose any other random process. In this case, we will focus on the process of drawing a number from \(0\) to \(1\). If the number is less than a predefined value of \(p\), we add the edge to the network. The graphs generated by such a process are the most common random graphs in Network Science, and are called the Erdos-Renyi random graph, after the mathematicians who studied them.
- Erdos-Renyi Random Graph
- The Erdos-Renyi Random Graph refers to the process of fixing the number of nodes \(n\) and the individual edge probability, \(p\), and drawing one edge at a time, with probability \(p\). Because it depends on two parameters, it is sometimes denoted \(G(n, p)\). This random graph model is so common in Network Science that it is often referred to as just random graph.
Challenge
This time, we will generate our own Erdos-Renyi random graphs.
The input is a file with one line, containing two numbers, \(n\) and \(p\), the number of nodes and the individual edge probabilities. Write a function that generates a random graph.
The output will be one realization of the Erdos-Renyi process. You can choose to show it as a picture using the visualization framework of your choice (we recommend NetworkX for python), or just printing the adjacency matrix.
Sample Input
100 0.15
The output is your choice of visualization for a network.
Solution
The solution to this challenge is hosted on Github.
Expansion Questions
The following questions require knowledge of elementary probability. Formulas should be expressed in terms of parameteres \(n\) and \(p\), where appropriate.
In a Erdos-Renyi graph,
-
what is the expected number of edges?
-
what is the expected degree of a node?
-
what is the expected clustering coefficient?
Answers
-
Since each edge follows a Bernoulli distribution with paramter \(p\), then the total number of edges, \(m\), follows a Binomial distribution \(m \sim Bin(\frac{n (n-1)}{2}, p)\). Thus, the expected value of \(m\) is \(\frac{n (n-1)}{2} p\).
-
Similarly, the degree of a node will be distributed as \(Bin(n - 1, p)\), with expected value \((n - 1) p\).
-
The clustering coefficient of node
i
is equal to the number of edges between its neighbors, which we call \(m_i\), divided by the total number of possible edges between them. Thus, if \(k_i\) isi
's degree, we have$$ k_i \sim Bin(n-1, p) \\ m_i \mid _{k_i} \sim Bin(\frac{k_i (k_i - 1)}{2}, p) \\ c_i = \frac{2 m_i}{k_i (k_i - 1)} $$Now, we can compute the conditional expectation of the clustering coefficient given the degree,
$$\begin{align*} \mathbf{E}[c_i \mid k_i] &= \sum_d \mathbf{E}[\frac{2 m_i}{d (d - 1)} \mid k_i = d] \mathbf{P}(k_i = d) \\ &= \sum_d \frac{2}{d (d - 1)} \mathbf{E}[m_i \mid k_i = d] \mathbf{P}(k_i = d) \\ &= p \sum_d \mathbf{P}(k_i = d) \\ &= p \\ \end{align*} $$Where we have used the fact that the expected value of a \(Bin(\frac{d (d - 1)}{2}, p)\) random variable is \(p \frac{d (d - 1)}{2}\).
We finish by observing that
$$ \mathbf{E}[c_i] = \mathbf{E}[\mathbf{E}[c_i \mid k_i]] = p. $$