Graph Models - Bernhard Pfann, CFA

A graph model allows us to assign probabilities to a network, i.e. how likely is it to see a certain network realization, given the underlying model of how it is constructed. Also generative models allow us to simulate how a network came to its existence. In reality we often observe the following network properties, which we would like to see in our network models as well: - Large clustering coefficient - Power law degree distribution - Low network diameter ## Erdös-Renyi Model - *Simple network model:* We fix the number of nodes and edges, and place these edges at random. - *Erdös-Renyi model:* We fix the number of nodes $n$ and the probability $p$ of an edge between any two nodes. |Property|Formula|Explanation| |---|---|---| |Expected number of edges|${n \choose 2}*p$|$n \choose 2$ expressed the possible bilateral combinations for $n$ nodes.| |Expected degree of node $i$|$(n-1)*p$|$(n-1)$ is the number of all possible neighbors, each with probability $p$ to have an edge.| |Degree distribution|$\text{Binomial}(n-1,p)$|All edges of a node are independent of each other. Therefore having edges is like multiple coin flips with probability $p$, which results in a Binomial distribution.| Dependent on the chosen value for $p$ we can expect different “phase transitions” as $n \to \infty$: - $S_{\max}:$ Number of nodes of the largest network component. - $c:$ Average node degree. | Parameter | Convergence Statement | Explanation | | ----------------------------------- | ------------------------------------------ | -------------------------------------------------------------------------------- | | $p<\frac{1}{n}$ | $\mathbf P(S_{\max} \ge c \log (n)) \to 0$ | There are many small components. The probability of a large component goes to 0. | | $\frac{1}{n} <p < \frac{\log n}{n}$ | $\mathbf P(S_{\max}=cn) \to 1$ | The probability of a giant component of size $cn$ goes to 1. | | $p>\frac{\log n}{n}$ | $\mathbf P(S_{\max}=n) \to 1$ | The probability of a fully connected graph goes to 1. | **Strength and Weaknesses:** | Argument | Description | | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Pro | Low diameter | | Contra | Edge distribution: The emergence of hub-nodes is a phenomena that is often observed in reality. Due to its setup, this pattern is very unlikely for Erdös-Renyi Models (i.e. the Binomial degree distribution does not have fat tails). | | Contra | Clustering coefficient: Due to the independence between edges, the clustering coefficient (closed triangles divided by connected triples) is low. | ## Configuration Model In the Configuration model we can actively choose our preferred degree distribution (e.g. power law). To construct the graph, we put the number of edges as stubs to each node. Then we connect a stub to any other stub at random, one after the other. ![[configuration-model.jpeg|center|400]] During this process the following could occur: - Multiple edges between the same nodes (multi-edges) - Stubs connecting to other stubs of the same node (self-loop) However one can show that the number of multi-edges and self-loops is a constant as $n \to \infty$. **Strength and Weaknesses:** | Argument | Description | | -------- | ----------------------------------------------------------- | | Pro | Degree distribution can be chosen as parameter of the model | | Contra | Low clustering coefficient (order of $\frac{1}{n}$) | | Contra | Not a generative model | ## Preferential Attachment Model We assume a model with an average node out-degree of $c$ (will be kept constant over generation process). A new node $i$ is added to the graph with (on average) $c$ outgoing edges. - The target nodes $j$, which these edges are pointing to, are sampled in proportion to their respective in-degree $k_j$. - We need to add some $\beta$ to every nodes’ in-degree, since any newly added node starts with $0$ in-degree and will never ever get a neighbor pointing to them. $ \mathbf P(\text{edge from } i \to j):=p_{ij}=\frac{k_j+ \beta}{\sum_j\big(k_j+\beta\big)} $ One can show that this model has a power law degree distribution. $ p \propto k^{-\alpha} \quad \text{where} \quad \alpha=2+ \frac{\beta}{c} $ >[!note:] >This is a generative model, as we can explore the generative build-up of the network as one node gets added after the next. **Strength and Weaknesses:** | Argument | Description | | -------- | --------------------------------------------------------------------------- | | Pro | Power law degree distribution | | Pro | Generative model | | Contra | Small clustering coefficient | | Contra | Leads to acyclic network (depends on the application if this is a drawback) | ## Small-World Model So far, all models have shown a clustering coefficient which is lower than usually observed in nature. The Small-World model overcomes this by starting with a circulant graph, where the nodes are arranged into a circle and each node is connected to $c$ of its closest neighbors. ![[small-world-model.png|center|400]] In this configuration we artificially create a high clustering coefficient. However the network diameter (longest path) is unrealistically long. We overcome this by re-wiring edges at random with a small probability $p$. While only a few edges get re-wired, the diameter decreases dramatically. **Strength and Weaknesses:** | Argument | Description | | -------- | -------------------------------- | | Pro | High clustering coefficient | | Pro | Low diameter | | Contra | No power law degree distribution |