
Chapter 5: Causal Inference and Peer Effects on Networks

Chapter Introduction

Consider the following observation, which appears in dozens of peer-reviewed public health studies: adolescents whose close friends smoke are substantially more likely to smoke themselves. The correlation is robust, large in magnitude, and replicates across countries, time periods, and measurement strategies. The instinctive interpretation is causal — friends cause each other to smoke, through social norms, direct encouragement, or simply the mechanics of spending time together around cigarettes.

But there is a problem. The same data pattern is consistent with an entirely different mechanism. Teenagers who are already inclined to smoke — because of personality, family background, neighborhood, or stress — also tend to befriend each other. They select into the same social circles not because their friends influenced them but because they were already alike before the friendship formed. If this is true, then the observed correlation between a teenager’s smoking and her friends’ smoking tells us nothing about peer influence. It reflects the pre-existing similarity of people who choose each other as friends.

And there is a third explanation. The school environment, the neighborhood, the local economy, the cultural moment — all of these shape both who becomes friends with whom and who smokes. Two students may be correlated in their smoking not because either influenced the other but because both are responding to the same external pressure.

These three explanations — (a) peer influence, (b) homophily or selection, and (c) confounding context — were articulated with unusual clarity by Cosma Shalizi and Andrew Thomas in a 2011 paper that shook the quantitative social science community. Their argument was not merely that these explanations are hard to distinguish empirically. Their argument was that, without randomization or a compelling instrument, they are in principle unidentifiable from cross-sectional or even panel data on a single network. The empirical social network analysis literature spent twenty years learning how to take this identification problem seriously, and the methods it developed form the content of this chapter.

The stakes are high well beyond public health. When Facebook’s 2012 emotional contagion experiment attempted to show that emotional states spread through the News Feed, it ran directly into the same identification problem: did users exposed to more positive posts feel more positive because of peer influence, or because positive-posting users tend to cluster with other positive users, or because some unobserved third factor drove both their friends’ positivity and their own? When economists study whether being assigned to a work team with high-ability colleagues raises your own productivity, they must rule out the possibility that firms assign similar workers together. When central banks study whether a country’s financial crisis spread to its trading partners through the banking network, they must distinguish true contagion from common exposure to the same global shock.

This chapter develops the econometric toolkit for making sense of these questions. The problems are genuine, the solutions are partial, and intellectual honesty requires keeping both facts in view simultaneously.

What you will learn

This chapter develops five interlocking ideas. Manski’s reflection problem exposes the fundamental identification failure in the most natural regression approach to peer effects. Bramoullé–Djebbari–Fortin identification shows how network heterogeneity — specifically, friends-of-friends who are not friends — resolves the reflection problem by providing a natural instrument. SUTVA violations in network experiments explain why standard randomized experiments give biased estimates whenever treatment of one unit affects outcomes of other units. Aronow–Samii exposure mapping provides the correct experimental estimator once we acknowledge that units receive different effective treatments depending on their position in the network. Randomized saturation designs extend this logic to field experiments where village-level randomization can separately identify direct treatment effects and spillovers. A mini case study at the end compares all three estimators on a simulated network experiment, making the practical differences concrete.


Table of Contents

  1. The Three Competing Explanations
  2. Manski’s Reflection Problem
  3. Bramoullé–Djebbari–Fortin Identification via Network Structure
  4. The SUTVA Violation: Spillovers in Experiments
  5. Aronow–Samii Exposure Mapping
  6. Randomized Saturation Designs
  7. Homophily versus Influence — The Fundamental Confound
  8. Modern Approaches
  9. Mini Case Study: Comparing Estimators on a Stochastic Block Model

The Three Competing Explanations

Observation and its ambiguity

Begin with a dataset. You have observed \(n\) individuals in a social network. For each individual \(i\) you know: (a) whether she smokes (\(y_i \in \{0, 1\}\)); (b) her set of friends \(N(i)\); and (c) a vector of individual characteristics \(x_i\) (age, gender, parental income, school). You compute, for each individual, the smoking rate of her friends: \(\bar{y}_{N(i)} = \frac{1}{|N(i)|} \sum_{j \in N(i)} y_j\). You run a regression and find a large, positive coefficient on \(\bar{y}_{N(i)}\).

What have you found? Three mechanisms are consistent with this coefficient being positive.

Mechanism A: Endogenous peer effects (social influence). Individual \(i\)’s behavior is causally affected by her peers’ behavior. If Alice’s friends smoke, Alice is directly caused to smoke more — through exposure, social norms, perceived acceptability, or simple imitation. This is the causal peer effect that policy-makers want to quantify, because it implies that reducing smoking in one person will reduce smoking in her social network.

Mechanism B: Correlated unobservables (homophily and selection). Friends tend to be similar because similar people select into friendships. If Alice smokes, she is more likely to befriend other smokers — not because they influenced each other after the link formed, but because the friendship itself was formed on the basis of pre-existing similarity. The correlation in outcomes is real but carries no causal content. Policy implications are completely different: reducing Alice’s smoking does nothing to reduce her friends’ smoking, because the causal channel does not exist.

Mechanism C: Contextual peer effects and common shocks. The correlation reflects shared exposure to the same environment. Alice and her friends go to the same school, live in the same neighborhood, are exposed to the same social norms and economic conditions. The school or neighborhood drives both the friendship formation and the behavior. This produces correlation in \(y_i\) and \(\bar{y}_{N(i)}\) even in the absence of any within-network causal channel.

Shalizi and Thomas (2011) — whose paper carries the disquieting title “Homophily and Contagion Are Generically Confounded in Observational Social Network Data” — proved that these three mechanisms are generically unidentifiable from observational data on a single network unless very strong parametric restrictions are imposed. The word “generically” is important: it means the confounding is not a special case or a technical detail, but the default condition.

The Christakis–Fowler controversy

In a series of high-profile papers published between 2007 and 2013, Nicholas Christakis and James Fowler used longitudinal data from the Framingham Heart Study to argue that obesity, smoking cessation, happiness, and loneliness all spread through social networks with causal peer effects. Their methods controlled for individual fixed effects and used lagged peer outcomes as instruments. Shalizi and Thomas (2011) and subsequent critics showed that these methods do not distinguish homophily from influence: an individual’s fixed effect may be correlated with her network position, and lagged peer outcomes may proxy for shared unobserved time trends rather than causal influence. The debate remains unresolved in the literature and serves as a permanent reminder that clever regression designs, however sophisticated, do not substitute for a credible identification strategy.

The honest starting point is therefore not “can I find a significant coefficient?” but “can I credibly claim that the coefficient I find identifies the causal mechanism I claim to be studying?” The rest of this chapter is about the conditions under which the answer to that question is yes.


Manski’s Reflection Problem

The linear-in-means model

The most widely used model of peer effects in economics is the linear-in-means specification, introduced by Charles Manski in a 1993 paper that has shaped every subsequent empirical study in this literature. The model is elegant and intuitive, and its failure is equally instructive.

Consider a population partitioned into reference groups (classrooms, villages, firms). Within each reference group \(g\), the outcome of individual \(i\) is given by

\[ y_i = \alpha + \beta \, \bar{y}_{-i}^{(g)} + \gamma' x_i + \delta' \bar{x}_{-i}^{(g)} + \epsilon_i \tag{7.1} \]

where:

  • \(\bar{y}_{-i}^{(g)} = \frac{1}{n_g - 1} \sum_{j \neq i, j \in g} y_j\) is the mean outcome of \(i\)’s peers (excluding \(i\) herself) — the endogenous peer effect,
  • \(x_i\) is a vector of individual characteristics,
  • \(\bar{x}_{-i}^{(g)} = \frac{1}{n_g - 1} \sum_{j \neq i, j \in g} x_j\) is the mean characteristics of \(i\)’s peers — the contextual (exogenous) peer effect,
  • \(\beta\) is the coefficient of interest: the endogenous social multiplier,
  • \(\delta\) captures contextual effects of the peer group’s characteristics on the individual’s outcome,
  • \(\epsilon_i\) is an idiosyncratic error, uncorrelated with \(x_i\) and \(\bar{x}^{(g)}\).

This is a sensible-looking regression. The coefficient \(\beta\) is exactly what we want to identify. A finding \(\hat{\beta} > 0\) would say: a one-unit increase in the mean outcome of my peers causes my outcome to rise by \(\beta\) units.

The reflection problem

The trouble arises from the endogeneity of \(\bar{y}_{-i}^{(g)}\). In a reference-group model, \(i\) and her peers are all simultaneously determined by the same system of equations. Because every member of the group influences every other member, the mean outcome \(\bar{y}^{(g)}\) satisfies its own fixed-point equation derived by averaging (7.1) across members of group \(g\).

Averaging equation (7.1) across all \(n_g\) members of group \(g\):

\[ \bar{y}^{(g)} = \alpha + \beta \, \bar{y}^{(g)} + \gamma' \bar{x}^{(g)} + \delta' \bar{x}^{(g)} + \bar{\epsilon}^{(g)} \tag{7.2} \]

(using the approximation \(\bar{y}^{(g)}_{-i} \approx \bar{y}^{(g)}\) for large groups). Solving for the group mean:

\[ \bar{y}^{(g)} = \frac{\alpha + (\gamma + \delta)' \bar{x}^{(g)} + \bar{\epsilon}^{(g)}}{1 - \beta} \tag{7.3} \]

Substituting (7.3) back into (7.1), and simplifying:

\[ y_i = \frac{\alpha}{1 - \beta} + \gamma' x_i + \frac{(\beta \gamma + \delta)'}{1 - \beta} \, \bar{x}^{(g)} + \text{error terms} \tag{7.4} \]

Equation (7.4) is the reduced form: the outcome \(y_i\) depends on individual characteristics \(x_i\) and the group mean characteristics \(\bar{x}^{(g)}\), but the parameters \(\beta\) and \(\delta\) appear only through the composite coefficient \((\beta \gamma + \delta)/(1 - \beta)\) on \(\bar{x}^{(g)}\). The endogenous peer effect \(\beta\) and the contextual peer effect \(\delta\) are not separately identified: there are infinitely many values of \((\beta, \delta)\) consistent with any observed regression of \(y_i\) on \(x_i\) and \(\bar{x}^{(g)}\).

Manski called this the reflection problem: the individual outcome reflects the group mean, and the group mean reflects the individual outcomes, creating a circularity that makes the structural parameters unidentifiable from within-group variation alone.

Tip

An intuitive way to see the reflection problem: in the reference-group model, every member of the group faces the same peer mean \(\bar{y}^{(g)}\). There is no cross-sectional variation in the peer mean within a group, only across groups. Across groups, \(\bar{y}^{(g)}\) varies, but it varies because \(\bar{x}^{(g)}\) varies (different groups have different characteristics), and the parameter \(\delta\) already captures the direct effect of group characteristics. The regressors that carry the endogenous effect \(\beta\) and the contextual effect \(\delta\) are therefore perfectly collinear in a world where all members face the same group mean.

Live simulation: OLS recovers only the composite

The following cell simulates data from the linear-in-means model with known structural parameters \(\beta\) and \(\delta\), then runs OLS and shows that the recovered reduced-form coefficient is a composite that cannot be disentangled without additional structure.
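The original cell is not reproduced here; the following is a minimal sketch of the same exercise under assumed parameter values (200 groups of 20, \(\beta = 0.4\), \(\gamma = 1.0\), \(\delta = 0.5\), with a group-level common shock added to the error to make the endogeneity of the peer mean visible).

```python
# Minimal sketch of the reflection-problem simulation (assumed parameters, not the
# chapter's original cell). Outcomes come from the exact within-group equilibrium
# of the linear-in-means model; two regressions are then compared.
import numpy as np

rng = np.random.default_rng(0)
n_groups, group_size = 200, 20
alpha, beta, gamma, delta = 1.0, 0.4, 1.0, 0.5      # true structural parameters

# Leave-one-out averaging matrix within a group: W[i, j] = 1/(n-1) for j != i
W = (np.ones((group_size, group_size)) - np.eye(group_size)) / (group_size - 1)

y_list, x_list, xbar_list, ybar_list = [], [], [], []
for g in range(n_groups):
    x = rng.normal(size=group_size)
    # idiosyncratic noise plus a group-level common shock (mechanism C)
    eps = rng.normal(scale=1.0, size=group_size) + rng.normal(scale=0.3)
    # equilibrium of y = alpha + beta*W y + gamma*x + delta*W x + eps
    y = np.linalg.solve(np.eye(group_size) - beta * W,
                        alpha + gamma * x + delta * (W @ x) + eps)
    y_list.append(y); x_list.append(x)
    xbar_list.append(W @ x); ybar_list.append(W @ y)

y, x = np.concatenate(y_list), np.concatenate(x_list)
xbar, ybar = np.concatenate(xbar_list), np.concatenate(ybar_list)

# Reduced form: y on [1, x, peer-mean x]. Only the composite coefficient is recovered.
b_rf = np.linalg.lstsq(np.column_stack([np.ones_like(x), x, xbar]), y, rcond=None)[0]
print("reduced-form coefficient on peer-mean x:", round(b_rf[2], 3))
print("composite (beta*gamma + delta)/(1 - beta):",
      round((beta * gamma + delta) / (1 - beta), 3))

# 'Structural' OLS that puts the endogenous peer-mean outcome on the RHS is biased.
b_s = np.linalg.lstsq(np.column_stack([np.ones_like(x), ybar, x, xbar]), y, rcond=None)[0]
print("naive OLS estimate of beta:", round(b_s[1], 3), "   true beta:", beta)
```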

The output demonstrates the core lesson precisely. The reduced-form OLS correctly estimates the composite coefficient, but this composite conflates the endogenous effect \(\beta\) and the contextual effect \(\delta\) in a ratio that cannot be decomposed without further restrictions. The naive structural regression, which mechanically includes \(\bar{y}_{\text{peers}}\) on the right-hand side, recovers a biased estimate of \(\beta\) because \(\bar{y}_{\text{peers}}\) is endogenous — it is jointly determined with \(y_i\) through the system of equations that defines the equilibrium.

In practice

This is not merely a technical econometric complaint. The reflection problem has direct policy consequences. Suppose you are a school district evaluating a program to reduce alcohol use among students. You observe that students whose peers drink more tend to drink more themselves. If \(\hat{\beta} = 0.4\) is a credible causal estimate, then reducing one student’s drinking by one unit creates a social multiplier: the total reduction in average drinking is \(1/(1 - 0.4) = 1.67\) times the direct effect. But if \(\hat{\beta}\) is actually a composite of homophily and contextual effects, the multiplier does not exist. Spending based on a multiplier that is not real is pure waste. The difference between \(\beta = 0\) (no peer effects) and \(\beta = 0.4\) (moderate peer effects) can be the difference between a program that works and one that does not.


Bramoullé–Djebbari–Fortin Identification via Network Structure

Beyond the reference group

Manski’s analysis assumed that every member of a group interacts with every other member — the reference-group model is a complete graph within each group. Bramoullé, Djebbari, and Fortin (2009) — hereafter BDF — made a breakthrough observation: when the network is incomplete and heterogeneous, the reflection problem disappears. The key geometric ingredient is the existence of intransitive triads: pairs of nodes that share a common neighbor but are not themselves connected.

Consider three individuals: Alice, Bob, and Carol. Alice and Bob are friends. Bob and Carol are friends. But Alice and Carol are not friends. This triad is intransitive. From Carol’s perspective, Bob is a direct peer. From Alice’s perspective, Carol is a peer-of-a-peer — someone who influences Carol and therefore (through Carol) has an indirect effect on Alice, but who is not in Alice’s direct reference group.

This asymmetry is the source of identification. Carol’s characteristics \(x_C\) affect Alice’s outcome only indirectly, through the channel \(x_C \to y_C \to y_{\text{Alice}}\). They do not directly affect Alice’s outcome because Alice and Carol are not connected. This exclusion restriction — shared contextual variables of non-friends who are connected through a common friend — provides the instrumental variable needed to identify \(\beta\).

Network notation and the BDF estimator

Rewrite the linear-in-means model in matrix form. Let \(G\) be the \(n \times n\) adjacency matrix of the network (with \(G_{ij} = 1\) if \(i\) and \(j\) are friends and 0 otherwise), normalized to row-stochastic form: \(\tilde{G}_{ij} = G_{ij} / d_i\) where \(d_i\) is the degree of node \(i\). The model (7.1) becomes, in the general network setting:

\[ \mathbf{y} = \alpha \mathbf{1} + \beta \tilde{G} \mathbf{y} + X \gamma + \tilde{G} X \delta + \boldsymbol{\epsilon} \tag{7.5} \]

where \(\mathbf{y}\) is the \(n\)-vector of outcomes, \(X\) is the \(n \times k\) matrix of individual characteristics, and \(\tilde{G}\mathbf{y}\) is the vector of peer mean outcomes. The contextual effect \(\tilde{G} X \delta\) is the vector of peer mean characteristics.

Now consider the matrix \(\tilde{G}^2 = \tilde{G} \cdot \tilde{G}\). The \((i,j)\) entry of \(\tilde{G}^2\) is positive if and only if there is a path of length 2 from \(i\) to \(j\) — that is, \(i\) and \(j\) share at least one common friend. Critically, when the network has intransitive triads, \(\tilde{G}^2\) has positive entries in positions \((i,j)\) where \(\tilde{G}\) has zero entries: \(i\) and \(j\) are connected at distance 2 but not at distance 1.

The BDF identification result is that the matrices \(I\), \(\tilde{G}\), and \(\tilde{G}^2\) are linearly independent (as operators on the column space of \(X\)) if and only if the network is neither complete nor the union of complete components. This linear independence means that \(\tilde{G}^2 X\) — the characteristics of friends-of-friends — is not a linear combination of \(X\) and \(\tilde{G} X\). Therefore \(\tilde{G}^2 X\) can serve as an instrument for \(\tilde{G} \mathbf{y}\).
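A quick numerical check of this rank condition on the smallest possible examples (an illustration added here, not a cell from the chapter): a three-node path contains an intransitive triad and satisfies the condition, while a triangle does not.

```python
# Rank of [vec(I), vec(G~), vec(G~^2)]: 3 means linear independence (identification),
# 2 means G~^2 is a combination of I and G~ (the reflection problem).
import numpy as np

def row_normalize(G):
    return G / G.sum(axis=1, keepdims=True)

def rank_of_powers(G):
    Gt = row_normalize(G)
    M = np.column_stack([np.eye(len(G)).ravel(), Gt.ravel(), (Gt @ Gt).ravel()])
    return np.linalg.matrix_rank(M)

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])       # A-B and B-C, but not A-C
triangle = np.ones((3, 3)) - np.eye(3)                    # complete graph
print("path (intransitive triad):", rank_of_powers(path))       # 3 -> identified
print("triangle (complete graph):", rank_of_powers(triangle))   # 2 -> not identified
```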

The IV estimator works as follows:

  1. First stage: regress \(\tilde{G}\mathbf{y}\) on \(X\), \(\tilde{G} X\), and \(\tilde{G}^2 X\). The instrument \(\tilde{G}^2 X\) (peer-of-peer characteristics) provides the exogenous variation needed to predict peer outcomes.
  2. Second stage: use the fitted values \(\widehat{\tilde{G}\mathbf{y}}\) from the first stage in place of \(\tilde{G}\mathbf{y}\) in the structural equation (7.5), and estimate \(\beta\) and \(\delta\) by OLS.

The intuition: a friend-of-a-friend’s education level affects your peer’s outcome (through their direct social interaction) but does not directly affect your outcome (since you are not connected to the friend-of-a-friend). This is the exclusion restriction, and it is provided for free by the network structure — no external experiment or policy lever is required.

Tip

The BDF insight is one of the most elegant in the empirical networks literature. The instrument is not some external variable that the researcher must find and justify. It is embedded in the network topology itself: the characteristics of nodes that are two steps away but not one step away from ego. The requirement is that such nodes exist — that the network has intransitive triads — which is true of almost every real social network. Complete graphs (reference-group models) are the exception, not the rule.

Live simulation: IV identification on a network with intransitive triads
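The original cell is not reproduced here. The sketch below sets up a comparable exercise under assumed parameters: a 500-node network with block structure, a block-level shock in the error term to stand in for the homophily-style confounding, and true values \(\beta = 0.4\), \(\gamma = 1.0\), \(\delta = 0.5\).

```python
# Hedged sketch of the BDF exercise: naive OLS vs 2SLS that instruments the
# peer-mean outcome with the characteristics of friends-of-friends (G~^2 X).
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
beta, gamma, delta = 0.4, 1.0, 0.5
blocks, block_size = 10, 50
n = blocks * block_size

# Network in which friends cluster within blocks (plenty of intransitive triads)
P = [[0.15 if i == j else 0.01 for j in range(blocks)] for i in range(blocks)]
g = nx.stochastic_block_model([block_size] * blocks, P, seed=1)
A = nx.to_numpy_array(g)
deg = A.sum(axis=1); deg[deg == 0] = 1
Gt = A / deg[:, None]                                   # row-stochastic adjacency

x = rng.normal(size=n)
block_id = np.repeat(np.arange(blocks), block_size)
block_shock = rng.normal(scale=0.5, size=blocks)[block_id]   # unobserved shock shared by friends
eps = 0.5 * rng.normal(size=n) + block_shock
y = np.linalg.solve(np.eye(n) - beta * Gt, 1.0 + gamma * x + delta * (Gt @ x) + eps)

ones, Gy, Gx, GGx = np.ones(n), Gt @ y, Gt @ x, Gt @ (Gt @ x)

# Naive OLS of y on [1, peer-mean y, x, peer-mean x]
b_ols = np.linalg.lstsq(np.column_stack([ones, Gy, x, Gx]), y, rcond=None)[0]

# 2SLS: first stage projects Gy on the instrument set [1, x, Gx, GGx]
Z = np.column_stack([ones, x, Gx, GGx])
Gy_hat = Z @ np.linalg.lstsq(Z, Gy, rcond=None)[0]
b_2sls = np.linalg.lstsq(np.column_stack([ones, Gy_hat, x, Gx]), y, rcond=None)[0]

# First-stage F statistic for the excluded instrument GGx
Zr = np.column_stack([ones, x, Gx])
rss_r = np.sum((Gy - Zr @ np.linalg.lstsq(Zr, Gy, rcond=None)[0]) ** 2)
rss_f = np.sum((Gy - Gy_hat) ** 2)
F = (rss_r - rss_f) / (rss_f / (n - Z.shape[1]))
print(f"true beta = {beta},  naive OLS = {b_ols[1]:.3f},  "
      f"2SLS = {b_2sls[1]:.3f},  first-stage F = {F:.0f}")
```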

The simulation confirms the theoretical prediction. The naive OLS estimator is biased upward — it conflates the endogenous effect \(\beta\) with the contextual effect \(\delta\) and the selection-driven correlation between peer outcomes and own outcomes. The 2SLS estimator, instrumented by the characteristics of friends-of-friends, recovers the true \(\beta\) with much smaller bias. The high first-stage F-statistic confirms that the instruments are strong: knowing the characteristics of your peer’s friends genuinely predicts your peer’s outcome.

In practice

Bramoullé, Djebbari, and Fortin (2009) applied their estimator to a dataset of 1,041 students in 69 rural Ecuadorian villages, studying peer effects in education. Calvo-Armengol, Patacchini, and Zenou (2009) used the same strategy to identify peer effects in school performance using Add Health data, instrumenting on the characteristics of friends-of-friends. In both studies, the OLS estimates were substantially larger than the IV estimates, consistent with upward bias from homophily — confirming Manski’s warning and validating the BDF instrument. The message for applied researchers: whenever you have a detailed network, the BDF instrument is available essentially for free. Use it.


The SUTVA Violation: Spillovers in Experiments

What randomization assumes

Randomized controlled trials are the gold standard for causal inference precisely because they break the correlation between treatment assignment and the unobserved characteristics that determine outcomes. If units are assigned to treatment or control at random, the expected potential outcomes are equalized across arms, and the difference in mean outcomes is an unbiased estimate of the average treatment effect. This logic has driven an enormous expansion of field experiments in development economics, health policy, and technology platforms over the past two decades.

But the logic rests on an assumption that is rarely stated explicitly and routinely violated in network settings. The Stable Unit Treatment Value Assumption (SUTVA), formalized by Rubin (1980), has two components:

  1. No interference: the potential outcome of unit \(i\) depends only on \(i\)’s own treatment assignment, not on the assignment of any other unit.
  2. No hidden versions of treatment: the treatment received by each unit is well-defined and consistent across units.

In a network, SUTVA’s first component — no interference — is typically false. Consider a vaccine trial. Alice is assigned to the treatment arm and vaccinated. Bob is assigned to the control arm and not vaccinated. But Alice and Bob are friends: they spend time together every day. If Alice’s vaccination reduces her probability of transmitting a disease to Bob, then Bob’s health outcome depends not just on his own treatment assignment but on Alice’s. Bob is in the control arm, but he is receiving indirect protection. His outcome under “no treatment” is better than the outcome of a control-arm subject who has no vaccinated friends.

When SUTVA is violated, the naive estimator of the average treatment effect:

\[ \widehat{\text{ATE}}_{\text{naive}} = \bar{y}_{\text{treated}} - \bar{y}_{\text{control}} \tag{7.6} \]

is biased. Specifically, it is biased toward zero when spillovers from treated to control units are positive: the control arm benefits from indirect exposure, raising \(\bar{y}_{\text{control}}\) and compressing the estimated treatment effect. This is called dilution bias or attenuation bias due to spillovers.

This is not an exotic scenario. It is the default in any experiment conducted on a connected population. Bond et al. (2012) ran a voter mobilization experiment on 61 million Facebook users and found precisely this: users whose friends were randomly shown an “I Voted” social message were more likely to vote even if they themselves were in the control arm. The treatment effect operated through social networks, and the naive ATE understated the total mobilization effect. Eckles, Karrer, and Ugander (2017) showed that standard A/B tests on social platforms routinely suffer from SUTVA violations because treatment and control users interact with each other through the platform’s social graph.

Before running the next cell: consider a network experiment where 50% of nodes are randomly assigned to treatment. Treatment raises a node’s outcome by \(\tau = 1.0\), and each treated neighbor of a control node raises that control node’s outcome by \(\kappa = 0.5\). The naive ATE compares mean outcomes in treated vs. control arms.

Predict: will the naive ATE over-estimate or under-estimate the true direct treatment effect \(\tau = 1.0\)? By how much, roughly?

Run the cell to check your answer.
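The cell itself is not included here; the sketch below implements the setup just described (a degree-6 network, 50% Bernoulli treatment, \(\tau = 1.0\), \(\kappa = 0.5\) per treated neighbor for control nodes) and prints the grouped means that the original figure's right panel plots.

```python
# Minimal sketch of the SUTVA-violation experiment (assumed network and noise scale).
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)
n, tau, kappa = 2000, 1.0, 0.5

g = nx.random_regular_graph(6, n, seed=2)             # average degree 6, as in the text
A = nx.to_numpy_array(g)
z = rng.binomial(1, 0.5, size=n)                       # Bernoulli(0.5) assignment
treated_nbrs = A @ z                                   # number of treated neighbors

# Outcome: direct effect for treated units; per-neighbor spillover for control units
y = 5.0 + tau * z + kappa * treated_nbrs * (1 - z) + rng.normal(size=n)

naive_ate = y[z == 1].mean() - y[z == 0].mean()
print(f"true direct effect tau = {tau}")
print(f"naive ATE              = {naive_ate:.3f}")

# Control-arm outcomes by number of treated neighbors (what the right panel plots)
for k in range(7):
    mask = (z == 0) & (treated_nbrs == k)
    if mask.sum() > 5:
        print(f"control units with {k} treated neighbors: "
              f"mean y = {y[mask].mean():.2f} (n = {mask.sum()})")
```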

The right panel tells the story clearly: control-arm units with more treated neighbors have higher mean outcomes, purely because of spillovers. This inflates \(\bar{y}_{\text{control}}\), compresses the gap between treated and control arms, and drives the naive ATE below the true direct effect \(\tau\). The bias is not small: on a network with average degree around 6, the typical control unit has 3 treated neighbors (given a 50% treatment rate), receiving an average spillover of \(\kappa \times 3 = 1.5\) — comparable in magnitude to the direct treatment effect itself.


Aronow–Samii Exposure Mapping

The problem with the naive ATE

The SUTVA violation we just documented is not merely a nuisance to be corrected. It is a signal that the naive ATE is answering the wrong question. When treatment effects spill over network edges, the relevant quantity is not “what is the effect of my treatment status?” but “what is the effect of my treatment status and my neighbors’ treatment statuses jointly?” The estimand needs to change before the estimator can.

Peter Aronow and Cyrus Samii (2017) provided the framework for doing this correctly. Their key concept is the exposure mapping: a function that takes the full treatment vector \(\mathbf{z} = (z_1, \ldots, z_n)\) and the network \(G\) and maps each unit \(i\) to a finite exposure condition \(T_i(\mathbf{z}, G) \in \mathcal{T}\).

The exposure mapping \(T_i\) captures everything about the treatment assignment that is relevant for unit \(i\)’s outcome. Under the SUTVA-violating model of the previous section, the relevant exposure for unit \(i\) is not just \(z_i\) but also the number or fraction of \(i\)’s neighbors who are treated. A natural four-way exposure mapping is:

\[ T_i(\mathbf{z}, G) = \begin{cases} \text{``direct+''} & \text{if } z_i = 1 \text{ and at least one neighbor treated} \\ \text{``direct only''} & \text{if } z_i = 1 \text{ and no neighbor treated} \\ \text{``indirect only''} & \text{if } z_i = 0 \text{ and at least one neighbor treated} \\ \text{``pure control''} & \text{if } z_i = 0 \text{ and no neighbor treated} \end{cases} \tag{7.7} \]

The average potential outcomes within each exposure class can be estimated separately. Define, for each exposure condition \(t \in \mathcal{T}\), the average potential outcome:

\[ \mu_t = \mathbb{E}[Y_i(\mathbf{z}) \mid T_i(\mathbf{z}, G) = t] \tag{7.8} \]

The direct effect of treatment (net of spillovers) is \(\mu_{\text{direct only}} - \mu_{\text{pure control}}\), and the spillover effect is \(\mu_{\text{indirect only}} - \mu_{\text{pure control}}\).

Horvitz–Thompson estimation

The challenge is that units are not assigned randomly to exposure conditions. A unit in the “pure control” condition is one who happens to have no treated neighbors — which depends on the network structure and the treatment assignments of all her neighbors simultaneously. The probability that unit \(i\) lands in exposure condition \(t\) under Bernoulli randomization with parameter \(p\) is:

\[ \pi_{i,t} = P(T_i(\mathbf{z}, G) = t) \tag{7.9} \]

which can be computed analytically given the network structure and the randomization distribution.
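For instance, under independent Bernoulli(\(p\)) assignment, a unit \(i\) with degree \(d_i\) falls into the four exposure conditions of (7.7) with probabilities

\[ \pi_{i,\text{direct+}} = p\left[1 - (1-p)^{d_i}\right], \qquad \pi_{i,\text{direct only}} = p\,(1-p)^{d_i}, \]

\[ \pi_{i,\text{indirect only}} = (1-p)\left[1 - (1-p)^{d_i}\right], \qquad \pi_{i,\text{pure control}} = (1-p)^{d_i + 1}, \]

which sum to one for every node and depend on the data only through \(d_i\).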

The Horvitz–Thompson estimator for the average potential outcome in exposure class \(t\) is:

\[ \hat{\mu}_t^{\text{HT}} = \frac{1}{n} \sum_{i=1}^n \frac{y_i \cdot \mathbf{1}[T_i = t]}{\pi_{i,t}} \tag{7.10} \]

This is a standard inverse-probability-weighted estimator. Units that are unlikely to end up in exposure class \(t\) receive high weight when they do, correcting for their under-representation. Aronow and Samii (2017) prove that this estimator is unbiased for \(\mu_t\) under mild regularity conditions.

Tip

The Horvitz–Thompson estimator has an intuitive interpretation. If unit \(i\) was assigned to exposure condition \(t\) with probability \(\pi_{i,t}\), then on average across many experiments, only a \(\pi_{i,t}\) fraction of the time will unit \(i\) contribute its outcome to the estimate of \(\mu_t\). To get an unbiased average, we weight each observation by the inverse of this probability — essentially “upscaling” rare events to represent the broader population they would have represented if seen more often.

Live implementation: HT estimation vs. naive ATE
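The implementation cell is not shown here. The sketch below re-creates the experiment from the previous sketch (degree-6 network, Bernoulli(0.5), \(\tau = 1.0\), \(\kappa = 0.5\)), applies the Horvitz–Thompson estimator of (7.10) with the analytic Bernoulli exposure probabilities given above, and averages over repeated randomizations so that approximate unbiasedness shows through the Monte Carlo noise.

```python
# Horvitz-Thompson direct effect vs naive ATE, averaged over re-randomizations.
# Note: at a 50% treatment rate the "no treated neighbor" cells are rare, so the
# per-draw HT estimate is noisy; this is the variance cost discussed later.
import numpy as np
import networkx as nx

rng = np.random.default_rng(3)
n, p, tau, kappa, n_sims = 2000, 0.5, 1.0, 0.5, 500

g = nx.random_regular_graph(6, n, seed=3)
A = nx.to_numpy_array(g)
deg = A.sum(axis=1)

# Analytic exposure probabilities under Bernoulli(p) for the two classes compared
pi_direct_only = p * (1 - p) ** deg
pi_pure_control = (1 - p) ** (deg + 1)

naive, ht_direct = [], []
for _ in range(n_sims):
    z = rng.binomial(1, p, size=n)
    t_nbrs = A @ z
    y = 5.0 + tau * z + kappa * t_nbrs * (1 - z) + rng.normal(size=n)

    naive.append(y[z == 1].mean() - y[z == 0].mean())

    direct_only = (z == 1) & (t_nbrs == 0)
    pure_control = (z == 0) & (t_nbrs == 0)
    mu_d = np.sum(y * direct_only / pi_direct_only) / n      # eq. (7.10)
    mu_c = np.sum(y * pure_control / pi_pure_control) / n
    ht_direct.append(mu_d - mu_c)

print(f"true direct effect:          {tau}")
print(f"naive ATE (mean over sims):  {np.mean(naive):.3f}")
print(f"HT direct (mean over sims):  {np.mean(ht_direct):.3f}  "
      f"(sd across sims: {np.std(ht_direct):.2f})")
```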

The HT estimator separates what the naive ATE conflates. The direct effect — comparing treated units with no treated neighbors to pure control units with no treated neighbors — closely recovers the true \(\tau = 1.0\). The naive ATE is biased downward because it compares the full treated arm (which benefits from self-treatment) against the full control arm (which partly benefits from neighbor treatment). The HT estimator, by reweighting observations according to their probability of landing in each exposure class, removes this contamination.

In practice: Facebook voter mobilization and Bond et al. (2012)

The Bond et al. (2012) voter mobilization experiment on 61 million Facebook users was, at the time, the largest social science experiment ever conducted. Users randomly shown a social “I Voted” banner were more likely to vote, but so were their friends — even friends who were not shown the banner. The authors estimated that the direct effect of the banner was modest, but the total effect including network spillovers was substantial: the experiment may have increased actual voter turnout in the 2010 US congressional elections by roughly 340,000 votes. A naive ATE would have substantially underestimated the total policy-relevant effect, because it would not have counted the friend-transmission component.


Randomized Saturation Designs

The two-stage experiment

Even the Aronow–Samii framework requires knowing the network structure to compute exposure probabilities. In many field settings — particularly in developing-country economics — the researcher has a network of villages or communities, knows the community membership of each individual, but does not observe the full within-community social graph. A practical solution is the randomized saturation design, which achieves identification through the experimental design itself rather than through network measurement.

The design, developed independently in the context of deworming experiments (Miguel and Kremer 2004) and microfinance experiments (Banerjee, Chandrasekhar, Duflo, and Jackson 2013), works as follows:

  1. First stage: randomly assign each cluster (village, school, workplace) a treatment saturation \(\pi_c \in \{0.25, 0.50, 0.75\}\) (or a continuous distribution).
  2. Second stage: within each cluster, randomly treat each individual independently with probability \(\pi_c\), the cluster’s assigned saturation.

This two-stage design creates variation in both individual treatment status and in the density of treatment in the individual’s environment, independently of individual characteristics. The key identifying variation is:

  • Within any given saturation level \(\pi_c\), treated and control individuals have the same expected number of treated neighbors (because the saturation is the same). Within-saturation variation identifies the direct treatment effect.
  • Across saturation levels, control individuals have different densities of treated neighbors. Across-saturation variation (among control individuals) identifies the spillover effect.

Define:

  • \(\tau_{\text{direct}}\): the effect of own treatment, holding the share of treated peers constant,
  • \(\tau_{\text{spillover}}\): the effect of a one-unit increase in the share of peers who are treated, for a control individual.

The estimating equations are:

\[ y_{ic} = \alpha + \tau_{\text{direct}} \cdot z_{ic} + \tau_{\text{spillover}} \cdot \pi_c \cdot (1 - z_{ic}) + \gamma' x_{ic} + u_{ic} \tag{7.11} \]

where \(z_{ic}\) is individual \(i\)’s own treatment in cluster \(c\), and \(\pi_c\) is the cluster saturation. Equation (7.11) can be estimated by OLS with standard errors clustered at the cluster level; cluster fixed effects would absorb the across-cluster variation in \(\pi_c\) that identifies the spillover term.

The Banerjee–Chandrasekhar–Duflo–Jackson microfinance experiment

Banerjee, Chandrasekhar, Duflo, and Jackson (2013) studied the diffusion of microfinance adoption in 75 villages in Karnataka, India. The program was introduced by sending “injection seeds” — initial contacts — to a subset of households in each village. Villages varied in how many seeds were planted (the analogue of saturation). The authors combined the experimental variation with detailed network data on within-village social ties to estimate both the direct effect of being an injection seed and the indirect spillover effect of being socially proximate to a seed. The study found that network structure was the dominant predictor of whether the program spread through a village: villages with more “gossipy” network structures (high eigenvector centrality of seeds) saw much faster microfinance adoption than villages where seeds were peripheral.

Live simulation: two-stage randomized saturation design
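The simulation cell is not shown here. A minimal sketch under assumed parameters (60 clusters of 50 individuals, saturations drawn from {0.25, 0.50, 0.75}, direct effect 1.0, spillover 1.0 per unit of treated share, received by control units): it estimates equation (7.11) and prints the treated and control means by saturation level that the original figure's panels plot.

```python
# Two-stage randomized saturation design: cluster-level saturation, then
# individual Bernoulli treatment within each cluster.
import numpy as np

rng = np.random.default_rng(4)
n_clusters, cluster_size = 60, 50
tau_direct, kappa = 1.0, 1.0
saturations = np.array([0.25, 0.50, 0.75])

pi_c = rng.choice(saturations, size=n_clusters)             # stage 1: cluster saturation
rows = []
for c in range(n_clusters):
    z = rng.binomial(1, pi_c[c], size=cluster_size)         # stage 2: individual treatment
    share_treated = z.mean()
    y = 5.0 + tau_direct * z + kappa * share_treated * (1 - z) + rng.normal(size=cluster_size)
    rows.extend((y[i], z[i], pi_c[c]) for i in range(cluster_size))

y, z, pi = map(np.array, zip(*rows))

# Estimating equation (7.11): y on [1, z, pi * (1 - z)]
X = np.column_stack([np.ones_like(y), z, pi * (1 - z)])
b = np.linalg.lstsq(X, y, rcond=None)[0]
naive_ate = y[z == 1].mean() - y[z == 0].mean()
print(f"true direct = {tau_direct}, true spillover slope = {kappa}")
print(f"estimated direct = {b[1]:.3f}, estimated spillover = {b[2]:.3f}, "
      f"naive ATE = {naive_ate:.3f}")

# The identifying variation: control means rise with saturation, treated means do not
# (spillovers in this sketch hit only untreated units).
for s in saturations:
    print(f"saturation {s:.2f}: treated mean = {y[(z == 1) & (pi == s)].mean():.2f}, "
          f"control mean = {y[(z == 0) & (pi == s)].mean():.2f}")
```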

The left panel shows the design’s identifying variation clearly. Among treated units, mean outcomes are roughly constant across saturation levels — the direct effect does not depend on how many others in the village are also treated (consistent with the linear model). Among control units, mean outcomes rise with the saturation level — this monotone relationship is the spillover effect, and it is fully identified by the across-saturation variation among control units. The right panel confirms that the two-stage estimator recovers both the direct and spillover effects accurately, while the naive ATE is substantially biased.

Crépon–Devoto–Duflo–Parienté and the employment spillovers

Crépon, Devoto, Duflo, and Parienté (2015) used a randomized saturation design to study the employment effects of microcredit in Morocco. They found that access to microcredit increased self-employment and asset accumulation for treated households, but reduced employment among control households in the same village — a negative spillover consistent with competition for local business opportunities. A naive ATE would have found near-zero effects because the positive direct effect and the negative spillover nearly cancelled in the aggregate. Only the two-stage design revealed that both effects exist, with opposite signs, and both are economically significant. This is the case where the naive ATE is not just biased but actively misleading about the direction of welfare effects.


Homophily versus Influence — The Fundamental Confound

Why the problem does not go away

The BDF instrument and the randomized saturation design are powerful tools, but each addresses a specific identification challenge. BDF requires the network to have intransitive triads and requires the researcher to observe the network accurately. Randomized saturation requires the ability to randomize treatment at the cluster level. In many observational settings — where the researcher observes an existing network without any randomization — neither strategy is available.

In these settings, the Shalizi–Thomas (2011) result is the relevant theoretical benchmark. Their argument is essentially an impossibility theorem for observational identification. Consider a model in which individual \(i\)’s outcome at time \(t\) depends on both her own lagged outcome and her friends’ lagged outcomes. Without randomization:

  • Selection/homophily produces a positive correlation between \(i\)’s outcome and her friends’ outcomes because similar people form ties.
  • Contextual effects produce correlation because friends share environments.
  • True causal influence produces correlation because friends’ outcomes causally affect each other.

The key insight of Shalizi and Thomas (2011) is that all three mechanisms generate identical cross-sectional and panel covariance patterns under broad model classes. Adding time variation (panel data) and controlling for individual fixed effects eliminates some sources of homophily but not all: if the unobserved characteristic that drives both tie formation and outcomes is time-varying (e.g., mood, stress, local economic conditions), fixed-effect methods fail to control for it. The dynamic-network panel methods of the literature (controlling for lagged outcomes, using first-differences, including network fixed effects) partially reduce the confounding but cannot eliminate it.

The honest conclusion is that observational studies of peer effects in social networks are fundamentally limited. They can establish that correlation exists and can rule out some obvious confounders. They cannot, in general, establish that the correlation is causal. This is not a counsel of despair — it is a reason to invest in experimental designs, natural experiments, and the careful application of the identification strategies described in earlier sections.

What the panel literature can and cannot do

A popular approach in the empirical peer effects literature is to use a dynamic panel model of the form:

\[ y_{it} = \alpha_i + \lambda y_{it-1} + \beta \sum_{j \in N(i)} w_{ij} y_{jt-1} + \gamma' x_{it} + \epsilon_{it} \tag{7.12} \]

where \(\alpha_i\) is an individual fixed effect, \(\lambda\) captures own persistence, and \(\beta\) captures lagged peer influence. The individual fixed effect removes time-invariant sources of homophily. Lagged values of peer outcomes are used as instruments for contemporaneous peer outcomes (Arellano–Bond style GMM).

This approach has genuine power against static homophily: if Alice and Bob are friends because they are both extroverts, the fixed effect removes extroversion, and the remaining variation in \(y_{jt}\) should be driven by time-varying factors uncorrelated with Alice’s own outcome. But it fails against dynamic homophily: if Alice and Bob both have a bad week because of a shared stressor (a local event, a shared teacher, a mutual friend’s illness), then the correlation in \(y_{it}\) and \(y_{jt}\) is not causal, and the lagged peer outcome \(y_{jt-1}\) will correlate with \(\epsilon_{it}\) through the shared shock.
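A small illustration of this failure mode, constructed for this sketch rather than taken from the chapter: friend pairs share an autocorrelated shock, the true peer effect is zero, and a fixed-effects regression on the friend's lagged outcome nevertheless returns a clearly positive coefficient.

```python
# Dynamic homophily / common-shock confounding: no causal peer effect exists,
# yet the lagged-friend coefficient is spuriously positive.
import numpy as np

rng = np.random.default_rng(5)
n_pairs, T, rho = 500, 20, 0.8          # friend pairs, periods, persistence of shared shock
beta_true = 0.0                          # no causal peer influence in the DGP

y = np.zeros((2 * n_pairs, T))
shared = np.zeros((n_pairs, T))
for t in range(1, T):
    shared[:, t] = rho * shared[:, t - 1] + rng.normal(size=n_pairs)
    for member in (0, 1):
        idx = np.arange(n_pairs) * 2 + member
        y[idx, t] = 0.3 * y[idx, t - 1] + shared[:, t] + rng.normal(size=n_pairs)

# Friend's series: swap the two members within each pair
friend = y.reshape(n_pairs, 2, T)[:, ::-1, :].reshape(2 * n_pairs, T)

dep, own_lag, friend_lag = y[:, 1:], y[:, :-1], friend[:, :-1]

def demean(a):                           # within transformation (individual fixed effects)
    return a - a.mean(axis=1, keepdims=True)

d, ol, fl = demean(dep).ravel(), demean(own_lag).ravel(), demean(friend_lag).ravel()
b = np.linalg.lstsq(np.column_stack([ol, fl]), d, rcond=None)[0]
print(f"true peer effect = {beta_true}, "
      f"FE estimate of the lagged-friend coefficient = {b[1]:.3f}")
```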

The Christakis–Fowler (2007) paper on obesity used a design of this kind, with longitudinal Framingham Heart Study data. Shalizi and Thomas (2011) and subsequent critics pointed out that the geographic clustering of obesity in the Framingham data, driven by shared neighborhood and local food environments, could generate all of the observed peer correlations without any causal transmission. The original analysis could not rule out a model in which the entire apparent peer effect was common-shock confounding. The debate has not been resolved conclusively in the literature.

Tip

A useful diagnostic for distinguishing influence from homophily in panel data is the temporal precedence test: estimate whether person \(j\)’s outcome at time \(t-1\) predicts person \(i\)’s outcome at time \(t\), conditional on \(i\)’s own outcome at \(t-1\). If the effect is asymmetric — if \(j\) tends to precede \(i\) in the change (e.g., \(j\) gained weight before \(i\) did, and \(i\)’s change follows \(j\)’s change with a lag) — then influence is more plausible than simultaneous selection. Christakis and Fowler used this logic to argue for influence. Shalizi and Thomas showed that common shocks with heterogeneous timing can also generate such patterns. The test helps but does not resolve the debate.


Modern Approaches

Double/debiased machine learning for network spillovers

The recent double/debiased machine learning (DML) framework of Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018) provides a powerful approach to estimating causal effects in high-dimensional settings, and has been adapted to network contexts by Chernozhukov et al. (2019) and Leung (2022). The core idea is to partial out the confounding effect of high-dimensional controls using machine learning, then estimate the treatment effect on the residualized outcome.

In the network setting, define the network exposure variable \(E_i\) as a summary of how many of \(i\)’s neighbors are treated, the characteristics of those neighbors, and higher-order network statistics. The DML estimator proceeds in three steps:

  1. Regress \(y_i\) on the high-dimensional set of controls \(W_i\) (own characteristics, peer characteristics, network position measures) using cross-fitting with a flexible ML estimator (lasso, random forest). Obtain residuals \(\tilde{y}_i = y_i - \hat{E}[y_i \mid W_i]\).
  2. Regress \(z_i\) on the same \(W_i\) using ML. Obtain residuals \(\tilde{z}_i = z_i - \hat{P}(z_i = 1 \mid W_i)\).
  3. Estimate \(\hat{\tau} = \text{cov}(\tilde{y}_i, \tilde{z}_i) / \text{var}(\tilde{z}_i)\) — the coefficient from regressing the outcome residual on the treatment residual.

The DML estimator is \(\sqrt{n}\)-consistent and asymptotically normal even when the ML estimators converge at slower rates, because the cross-fitting procedure removes the regularization bias that would otherwise enter the final estimate. When applied to network data, the control vector \(W_i\) includes neighbor characteristics, peer network statistics, and community membership indicators, allowing the estimator to flexibly absorb homophily and contextual effects without imposing parametric functional forms.
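A minimal cross-fitting sketch of the three steps, assuming scikit-learn with random-forest nuisance models; the data are synthetic, and the control matrix \(W_i\) stands in for the stacked own, peer, and network-position features described above.

```python
# Partialling-out DML with cross-fitting (a sketch, not the chapter's own code).
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import KFold

def dml_effect(y, z, W, n_splits=5, seed=0):
    """Cross-fitted DML estimate of the effect of a binary z on y given controls W."""
    y_res = np.zeros_like(y, dtype=float)
    z_res = np.zeros_like(y, dtype=float)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(W):
        # Step 1: E[y | W] fitted on the training folds, residualized on the held-out fold
        my = RandomForestRegressor(n_estimators=200, random_state=seed).fit(W[train], y[train])
        y_res[test] = y[test] - my.predict(W[test])
        # Step 2: P(z = 1 | W), same cross-fitting
        mz = RandomForestClassifier(n_estimators=200, random_state=seed).fit(W[train], z[train])
        z_res[test] = z[test] - mz.predict_proba(W[test])[:, 1]
    # Step 3: residual-on-residual coefficient
    return np.dot(z_res, y_res) / np.dot(z_res, z_res)

# Tiny synthetic check: treatment assigned more often to high-W units (confounding),
# true effect = 1.0.
rng = np.random.default_rng(6)
n = 2000
W = rng.normal(size=(n, 5))
z = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))
y = 1.0 * z + W[:, 0] + 0.5 * W[:, 1] ** 2 + rng.normal(size=n)
print(f"naive difference in means: {y[z == 1].mean() - y[z == 0].mean():.3f}")
print(f"DML estimate:              {dml_effect(y, z, W, seed=1):.3f}   (true effect = 1.0)")
```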

Instrumental variables from network structure

Beyond the BDF instrument (peer-of-peer characteristics), several other instrumental variables strategies exploit network structure for identification:

Exogenous network variation. In studies of education peer effects, Sacerdote (2001) exploited the random assignment of college roommates at Dartmouth to identify peer effects in GPA. Quasi-random assignment creates a natural experiment: because the roommate is not chosen, the roommate’s pre-college characteristics (high school GPA, family background) are exogenous to the student’s own unobservables and can serve as instruments for the roommate’s college performance.

Regression discontinuity at network boundaries. When network membership is determined by a threshold rule — children assigned to different schools based on address, or workers assigned to different teams based on hire date — sharp discontinuities in network composition at the threshold provide regression discontinuity instruments for peer effects (Dahl, Løken, and Mogstad 2014, on peer effects in program take-up among coworkers and brothers).

Shock propagation across network links. If one node in a network receives an idiosyncratic shock (a firm’s CEO dies suddenly, a country experiences a natural disaster, a student’s parent loses a job), the effect of that shock on connected nodes can be used as an instrument for peer effects, provided the shock is plausibly unrelated to the connected node’s own unobserved characteristics. This strategy is used extensively in the production network literature (Acemoglu, Carvalho, Ozdaglar, and Tahbaz-Salehi 2012) and in the social influence literature (Cai, de Janvry, and Sadoulet 2015).

Cai–de Janvry–Sadoulet insurance experiment

Cai, de Janvry, and Sadoulet (2015) studied the diffusion of weather insurance among Chinese rice farmers. In a two-stage experiment, some farmers were given intensive information about weather insurance before others. The first-stage recipients then acted as social transmitters to their village networks. The researchers exploited variation in whether the farmer’s first-stage peers had received the intensive information as an instrument for the farmer’s own information set. They found large peer effects in insurance adoption — farmers were much more likely to buy insurance if their friends had received the intensive information, even controlling for the farmer’s own information. The instrumental variable strategy (using the random assignment of friends to information treatment) makes this a credible causal estimate.


Mini Case Study: Comparing Estimators on a Stochastic Block Model

Setup

This case study makes the chapter’s main argument concrete. We simulate a 50-node network experiment on a stochastic block model (SBM) with two communities of 25 nodes each. Within-community edges form with probability \(p_{\text{in}} = 0.4\); between-community edges form with probability \(p_{\text{out}} = 0.05\). This structure ensures that spillovers are primarily within-community — treating one node has a larger effect on nodes in the same community than on nodes in the other community.

The true data-generating process is:

\[ y_i = 5 + \tau \cdot z_i + \kappa \cdot \frac{\sum_{j \in N(i)} z_j}{d_i} \cdot (1 - z_i) + \epsilon_i \tag{7.13} \]

where \(\tau = 2.0\) is the direct treatment effect, \(\kappa\) is the spillover effect received by untreated nodes (which we vary), and \(d_i\) is node \(i\)’s degree. We compare three estimators:

  1. Naive ATE: \(\bar{y}_{\text{treated}} - \bar{y}_{\text{control}}\) — ignores spillovers entirely.
  2. Aronow–Samii HT: uses exposure classes (treated/control) × (has treated neighbor / no treated neighbor), weighted by inverse exposure probabilities.
  3. Randomized saturation: the experiment is designed as a two-community saturation design where one community is assigned saturation \(0.7\) and the other \(0.3\).

We evaluate estimator bias as a function of the spillover magnitude \(\kappa\).

Before running: as the spillover magnitude \(\kappa\) increases from 0 to 2, predict the shape of the bias curve for each estimator. Which estimator’s bias grows most steeply with \(\kappa\)? Which is most robust?

Run the cell to check your prediction.
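The case-study cell is not reproduced here. The sketch below follows one reading of the setup: the saturation estimator and the naive ATE are computed on the 0.7/0.3 two-community design, while the Aronow–Samii estimator is computed under an assumed Bernoulli rate of \(p = 0.2\), because at higher treatment rates the "no treated neighbor" exposure cells of (7.7) are almost never populated on a graph this dense. All parameter choices beyond those stated in the text are illustrative.

```python
# Bias of the three estimators as the spillover magnitude kappa grows,
# averaged over repeated randomizations on a fixed 50-node SBM.
import numpy as np
import networkx as nx

rng = np.random.default_rng(7)
tau, n_sims = 2.0, 4000
kappas = [0.0, 0.5, 1.0, 1.5, 2.0]

g = nx.stochastic_block_model([25, 25], [[0.4, 0.05], [0.05, 0.4]], seed=7)
A = nx.to_numpy_array(g)
n = A.shape[0]
deg = A.sum(axis=1); deg[deg == 0] = 1
community = np.repeat([0, 1], 25)

def outcomes(z, kappa):
    share = (A @ z) / deg                     # fraction of treated neighbors
    return 5.0 + tau * z + kappa * share * (1 - z) + rng.normal(size=n)

p = 0.2                                        # Bernoulli rate for the HT arm (assumption)
pi_direct_only = p * (1 - p) ** deg
pi_pure_control = (1 - p) ** (deg + 1)

print(f"{'kappa':>6} {'naive bias':>11} {'HT bias':>9} {'saturation bias':>16}")
for kappa in kappas:
    naive, ht, sat = [], [], []
    for _ in range(n_sims):
        # --- two-community saturation design: 0.7 vs 0.3 ---
        pi_c = np.where(community == 0, 0.7, 0.3)
        z = rng.binomial(1, pi_c)
        y = outcomes(z, kappa)
        naive.append(y[z == 1].mean() - y[z == 0].mean())
        X = np.column_stack([np.ones(n), z, pi_c * (1 - z)])          # eq. (7.11)
        sat.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
        # --- Bernoulli(p) assignment for the Aronow-Samii HT estimator ---
        zb = rng.binomial(1, p, size=n)
        yb = outcomes(zb, kappa)
        t_nbrs = A @ zb
        mu_d = np.sum(yb * ((zb == 1) & (t_nbrs == 0)) / pi_direct_only) / n
        mu_c = np.sum(yb * ((zb == 0) & (t_nbrs == 0)) / pi_pure_control) / n
        ht.append(mu_d - mu_c)
    print(f"{kappa:>6.1f} {np.mean(naive) - tau:>11.3f} {np.mean(ht) - tau:>9.3f} "
          f"{np.mean(sat) - tau:>16.3f}")
```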

Reading the results

The left panel — bias against spillover magnitude \(\kappa\) — is the chapter’s payoff. At \(\kappa = 0\) (no spillovers, SUTVA satisfied), all three estimators are approximately unbiased. This is the world where standard A/B testing is valid. As \(\kappa\) increases, the estimators diverge sharply.

The naive ATE bias grows linearly in \(\kappa\) with a steep slope. The direction is negative (downward bias), confirming the SUTVA analysis: treated neighbors raise control-arm outcomes, compressing the treatment-control gap. By \(\kappa = 2\) — where the spillover effect per fraction of treated neighbors equals the direct effect — the naive ATE is nearly zero even though the true direct effect is \(\tau = 2.0\).

The Aronow–Samii HT estimator is nearly unbiased across the entire range of \(\kappa\). By correctly classifying units into exposure classes and reweighting by inverse probability, it removes the contamination of the control arm by spillovers. It pays a price in variance (right panel): the HT estimator is more variable than the naive ATE because extreme reweighting for rare exposure conditions amplifies noise.

The randomized saturation estimator is also approximately unbiased, and with somewhat lower variance than the HT estimator in this design. This reflects the design efficiency of the two-stage randomization: by deliberately concentrating treatment in one community and diluting it in the other, the design creates maximal variation in saturation — which is the source of identification for the spillover effect — while controlling for community membership through demeaning.

The trade-off between bias and variance

The right panel reveals the standard bias-variance trade-off. The naive ATE has low variance (all units contribute equally, no reweighting) but large bias at high \(\kappa\). The HT estimator has near-zero bias but higher variance because of the inverse-probability weighting. The saturation estimator balances the two: by design, it ensures that every saturation level is populated with a substantial number of units, preventing the extreme reweighting that drives HT variance.

For practical experiments with limited samples, this trade-off matters. On small networks (fewer than 100 units), the HT estimator’s variance can be prohibitive, and the saturation design is preferable. On large platforms with millions of users, the HT estimator’s variance is negligible and its unbiasedness without requiring a pre-specified saturation structure makes it more flexible.

In practice: Eckles–Karrer–Ugander and cluster randomization

Eckles, Karrer, and Ugander (2017) studied SUTVA violations in A/B testing on the Facebook social graph. They found that standard Bernoulli randomization produces biased estimates whenever users interact through the platform’s social features, and they advocated graph cluster randomization (assigning entire friend clusters to treatment or control together) as a practical remedy. Cluster randomization is the network analogue of the randomized saturation design: because a unit and most of its neighbors share the same assignment, within-cluster spillovers no longer contaminate the comparison between treated and control clusters. The cost is reduced statistical power (fewer independent observations) and the difficulty of defining clusters in a network without natural community structure.

Practical guidance

The chapter’s analysis yields several actionable conclusions for empiricists working with network data.

In observational settings, the Bramoullé–Djebbari–Fortin instrument provides identification if the network has intransitive triads (virtually always true) and the network is accurately observed. Always use it in preference to naive OLS. Report both the naive and IV estimates and interpret the difference as a lower bound on the homophily bias.

In experimental settings on platforms or connected communities, never use naive Bernoulli A/B testing when units interact through the platform’s social features. At minimum, check whether the exposure class distribution is balanced across treatment and control arms. If it is not — if treated users have systematically more or fewer treated neighbors than control users — the naive ATE is biased.

When the network is partially observed or when experimental control is limited to clusters, the randomized saturation design is the most robust strategy. The key requirement is that saturation varies across clusters and that individual treatment is randomized within each cluster independently.

In all settings, the Shalizi–Thomas impossibility result is the appropriate benchmark for intellectual honesty: observational analysis can document correlation and can partially control for confounders, but it cannot achieve identification from correlation alone. Every observational peer effect estimate should be accompanied by a transparent discussion of which confounders have and have not been controlled for.


What’s Next

This chapter has moved the discussion from describing networks to identifying causal effects in and through networks. The methods developed here — Manski’s reflection problem and the BDF resolution, SUTVA violation and exposure mapping, randomized saturation — are the current frontier of applied empirical work in social and economic networks.

The natural next question is how to measure the network itself. All of the identification strategies in this chapter assume that either the network is accurately observed (BDF) or that the experimental design makes network measurement unnecessary (saturation designs). In practice, network measurement is expensive, subject to reporting error, and often incomplete. Survey-based network data systematically misses weak ties and informal influence channels. Platform-based network data captures structural connections but not the weight or salience of those connections for individual behavior.

The survey measurement of networks, the boundary specification problem (where does the relevant network end?), and the implications of measurement error for causal inference are active methodological frontiers. They connect back to the foundational questions of Chapter 1 — what is a network, and how do we observe it — and forward to the policy questions that motivate the entire course: when can network data help us design better interventions, and when is the network structure itself the object of policy?


Prof. Xuhu Wan  ·  HKUST  ·  Modern AI Stack for Social Data  ·  2026 Edition

 
