Design and Analysis of Networked Experiments 2024

17-19 July, King’s College London

Programme

All scientific sessions are in Room K2.40. Coffee, lunch, tea and the reception are in the River Room.

Wednesday 17 July

Registration and coffee
10:30 - 10:55
Welcome
10:55 - 11:00 Vasiliki Koutra KCL
Keynote 1 (Chair: Vasiliki Koutra)
11:00 - 12:00 Dean Eckles MIT Learning from randomized interventions in social media

How should we reason about the effects of interventions in social media? What effect sizes should be expected from such changes to algorithms and content? And, given the fundamentally social nature of these services, what conclusions can we draw from individual-level experiments? I will comment on the published results from prominent, recent experiments on Facebook and Instagram conducted during the 2020 US elections. I will also draw on two new field experiments on Facebook \((N>3.3\times 10^7)\) and Twitter \((N>7.5 \times 10^4)\), each randomizing exposure to advertising featuring content-general messages reminding people to think about accuracy.

Lunch
12:00 - 13:30
Keynote 2 (Chair: Ben Parker)
13:30 - 14:30 Nathaniel Stevens University of Waterloo Design and Analysis of Network A/B Tests with General Additive Network Effect Models

As a means of continual improvement and innovation, online controlled experiments are widely used by internet and technology companies to test and evaluate product changes and new features, and to ensure that user feedback drives decisions. This is true of companies such as LinkedIn, Facebook, and X, which operate large online social networks. However, experiments on networks are complicated by the fact that the stable unit treatment value assumption (SUTVA) no longer holds: due to the interconnectivity of users in these networks, a user's outcome may be influenced by their own treatment assignment as well as by the treatment assignments of those they are socially connected with. The design and analysis of the experiment must account for this. The general additive network effect (GANE) family of models is proposed to jointly and flexibly model treatment and network effects. Experimental design and analysis considerations are discussed in the context of the proposed family of models.

Collaborators:

  • Trang Bui (University of Waterloo)
  • Stefan Steiner (University of Waterloo)
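
A rough sketch of the modelling idea (not the talk's actual GANE parameterization): the hypothetical specification below lets a unit's response depend additively on its own treatment and on the fraction of its neighbours treated, and recovers both effects by least squares.

```python
# Minimal additive network effect model (illustrative, not the GANE family itself):
#   y_i = mu + tau * z_i + gamma * (fraction of i's neighbours treated) + eps_i.
# Network, effect sizes and noise are all hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 200
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric adjacency, no self-loops
z = rng.integers(0, 2, size=n)                  # 50/50 treatment assignment
rho = (A @ z) / A.sum(axis=1).clip(min=1)       # fraction of treated neighbours

mu, tau, gamma = 1.0, 0.5, 0.3                  # hypothetical true effects
y = mu + tau * z + gamma * rho + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), z, rho])       # joint model of both effect types
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["mu", "tau", "gamma"], beta_hat.round(2))))
```
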
Tea
14:30 - 15:00
Invited session 1 (Chair: Werner Müller)
15:00 - 15:40 Alessio Zanga University of Milano - Bicocca Federated Causal Discovery with Missing Data in a Multicentric Study on Endometrial Cancer

Causal discovery is the task of learning a causal graph representing cause-effect relationships. Being able to establish causal dependencies is crucial in several applied domains, especially when the decision-making policy needs to be explainable, as in the case of medicine and healthcare. In these settings, small sample sizes and missing data call for federated approaches that take advantage of the information distributed across multiple sources. In this work, we introduce a novel federated causal discovery algorithm capable of pooling together information from multiple sources with heterogeneous missing-data distributions. In particular, we propose a score-based approach that learns a global causal graph while taking into account local missingness mechanisms and prior knowledge. We applied the proposed algorithm to a real-world multicentric study on endometrial cancer, validating the obtained causal graph through quantitative analyses and the clinical literature. The federated methodology showed promising results in recovering the underlying causal graph, especially in the presence of data missing not at random.
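
As a heavily simplified illustration of the score-based flavour of such an approach: each centre computes a local BIC score for a candidate graph on its own (here complete and simulated) data, and a coordinator sums the local scores to compare graphs. The missingness mechanisms and prior knowledge that the talk addresses are omitted.

```python
# Federated scoring sketch: global score of a candidate DAG = sum of local
# linear-Gaussian BIC scores computed at each centre. Illustrative only.
import numpy as np

def local_bic(data, graph):
    """BIC of a linear-Gaussian DAG (child -> parent tuple) on one site's data."""
    score = 0.0
    n = len(data)
    for child, parents in graph.items():
        y = data[:, child]
        X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = max(resid @ resid / n, 1e-12)
        score += -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        score += -0.5 * X.shape[1] * np.log(n)
    return score

def federated_score(sites, graph):
    return sum(local_bic(d, graph) for d in sites)

rng = np.random.default_rng(0)
sites = []
for n in (80, 120):                         # two centres of different sizes
    x0 = rng.normal(size=n)
    x1 = 0.8 * x0 + rng.normal(size=n)      # true structure: 0 -> 1
    sites.append(np.column_stack([x0, x1]))

g_true, g_empty = {0: (), 1: (0,)}, {0: (), 1: ()}
print(federated_score(sites, g_true) > federated_score(sites, g_empty))  # True
```
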

15:40 - 16:20 Kirsten Schorning TU Dortmund Optimal Designs For State Estimation In Networks

We consider the problem of estimating the expected states in networks, where observations are given by repeated measurements of the random states at the nodes. The choice of sensors directly influences the quality of the measurements at the different nodes. In this talk, we address the problem of optimally allocating different sensors to the nodes using optimal design theory, assuming two models of different complexity. In the first model, the states of the different nodes are assumed to be independent of time; the design question is then which nodes require more precise measurements than others. Here, we derive A-optimal designs explicitly for different networks, e.g., a star network.

In the second model, the first model is extended to time-dependent states; in particular, we model the states using time-dependent functions. The design problem then concerns both the optimal allocation of different measurement devices and the choice of time points at which measurements should be taken. Both analytical and numerical results are provided for the second model using A-optimality.
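
As a toy instance of the first model (not a result from the talk): if node \(i\)'s measurements have standard deviation \(s_i\) and a fraction \(w_i\) of the measurement budget is allocated to node \(i\), minimising the A-criterion \(\sum_i s_i^2 / w_i\) subject to \(\sum_i w_i = 1\) gives \(w_i \propto s_i\). The hypothetical numbers below check this against a uniform allocation.

```python
# A-optimal allocation sketch for independent node states (illustrative numbers).
import numpy as np

s = np.array([1.0, 2.0, 4.0])            # hypothetical per-node noise levels
w_opt = s / s.sum()                       # A-optimal weights: w_i proportional to s_i

def a_criterion(w):                       # total variance of the state estimates
    return np.sum(s**2 / w)

w_uniform = np.full(3, 1 / 3)
print(f"A-optimal: {a_criterion(w_opt):.1f}, uniform: {a_criterion(w_uniform):.1f}")
# A-optimal: 49.0, uniform: 63.0
```
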

16:20 - 17:00 Ben Parker Brunel University Design of Experiments for Autoregressive Networks

Much recent research exists on the statistical analysis of network data; typically, we wish to infer the influence of some network on a response we measure. In this talk, however, we consider designing experiments on experimental units linked by some intrinsic network structure, in order to obtain maximum information about our interventions (treatments).

Networks in experiments may be obvious, or less so, and might correspond to:

  • social networks: for example, we may show an advert to users connected by a social network and try to infer its effectiveness (for example, in A/B testing);
  • spatial networks: for example, in agricultural experiments we may apply a treatment to one plot of wheat and the effect of the treatment spreads to adjacent experimental units;
  • temporal networks: we can even incorporate temporal elements into the definition; for example, in a crossover experiment, a drug administered at a particular time might still have an effect afterwards.

In this talk, we briefly review some of the work that we, and others, have done on experiments on networks, and discuss the vital importance of including the network structure in our model where it exists. We show that designs which ignore the network structure can have very low efficiency and/or produce biased results, and we provide some guidelines for performing robust experiments on networked data.

In particular, we wish to distinguish between experiments where treatments propagate across a network and those where the responses of experimental units affect the responses of other units in the network: an autoregressive network.
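
A minimal sketch of the efficiency point under one assumed specification (a linear model with a neighbour-exposure term): the variance of the estimated treatment effect depends on the assignment through the design matrix, so a search that includes the network term can beat a single random assignment. Network, model and search are hypothetical simplifications.

```python
# Compare Var(tau_hat) across treatment assignments under
#   y = mu + tau * z + gamma * (A @ z) + eps  (assumed model, unit error variance).
import numpy as np

rng = np.random.default_rng(7)
n = 60
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T                 # hypothetical symmetric network

def var_tau(z):
    X = np.column_stack([np.ones(n), z, A @ z])
    return np.linalg.inv(X.T @ X)[1, 1]        # Var(tau_hat) up to sigma^2

z_random = rng.integers(0, 2, n)               # one random assignment
z_best = min((rng.integers(0, 2, n) for _ in range(500)), key=var_tau)
print(f"random: {var_tau(z_random):.4f}, searched: {var_tau(z_best):.4f}")
```
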

Close of the day
17:00
Thursday 18 July

Keynote 3 (Chair: Nathaniel Stevens)
10:15 - 11:15 Alexander Volfovsky Duke University Neighborhood Adaptive Estimators for Causal Inference under Network Interference

How should we design and analyze experiments on networks when interference may differently impact different agents? In this work we study a setting where the network is known but, unlike previous work in this area, the radius (and intensity) of the interference experienced by a unit is unknown and can depend on different sub-networks of those treated and untreated that are connected to this unit. We study estimators for the average direct treatment effect on the treated in such a setting. The proposed estimator builds upon a Lepski-like procedure that searches over the possible relevant radii and treatment assignment patterns. In contrast to previous work, the proposed procedure aims to approximate the relevant network interference patterns. We establish oracle inequalities and corresponding adaptive rates for the estimation of the interference function. We leverage such estimates to analyze an estimator for the average direct treatment effect on the treated. We address several challenges stemming from the data-driven creation of the patterns (i.e. feature engineering) and the network dependence. In addition to rates of convergence, under mild regularity conditions, we show that the proposed estimator is asymptotically normal and unbiased and may be leveraged for experimental design.
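
The sketch below illustrates one ingredient, radius-based exposure patterns, under strong simplifications: for each candidate radius \(r\), treated and control units with no treated units within distance \(r\) are compared. The Lepski-type selection of the radius and the accompanying theory are not implemented, and the graph and effects are hypothetical.

```python
# Direct-effect estimates from radius-r exposure classes (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
n = 1000
A = (rng.random((n, n)) < 0.002).astype(float)
A = np.triu(A, 1); A = A + A.T                 # hypothetical sparse network
z = rng.integers(0, 2, n)
y = 1.0 + 0.5 * z + 0.2 * np.minimum(A @ z, 1) + rng.normal(0, 1, n)

reach = np.eye(n)
for r in (1, 2):
    reach = np.minimum(reach + reach @ A, 1)   # indicator of distance <= r
    treated_within = (reach - np.eye(n)) @ z   # treated units within distance r
    clean = treated_within == 0
    est = y[clean & (z == 1)].mean() - y[clean & (z == 0)].mean()
    print(f"radius {r}: {int(clean.sum())} usable units, estimate {est:.2f}")
```
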

Coffee
11:15 - 11:45
Invited industry session (Chair: Dean Eckles)
11:45 - 12:30 Eleni Kalfountzou Procter & Gamble Experimentation in the FMCG industry - when A/B testing is not enough
Lunch
12:30 - 14:00
Invited session 2 (Chair: Rosemary Bailey)
14:00 - 14:40 Tim Waite University of Manchester Sequential Bayesian design with Laplace-parameterized policies

In the past few years, policy-based approaches have been developed for sequential Bayesian design, building on ideas from dynamic programming and deep reinforcement learning. The idea is to construct in advance a policy, i.e. a mapping from the current knowledge state to the next set of experimental conditions at which to observe a response. The policy is constructed so as to maximize the ultimate expected utility, e.g. the expected Shannon information gain, after all experiments have been completed. This is usually done by taking the policy to be a neural network and optimizing its weights. However, a key question in policy-based approaches is how to represent the knowledge state, i.e. what to use as the input to the policy network. The knowledge state is completely specified by the posterior distribution, so an attractive choice is an approximation to the posterior, especially one with only a few parameters, which can then serve as inputs to the policy network. In this talk we discuss the use of the Laplace approximation to the posterior as the knowledge representation, and develop the methodology needed to train policies based on Laplace representations.
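
A minimal sketch of this knowledge representation, using a one-parameter Gaussian model (where the Laplace approximation is exact) and an untrained placeholder policy: the posterior is summarised by its mode and standard deviation, and those two numbers are the only inputs the policy uses to pick the next design point.

```python
# Sequential design loop driven by a Laplace-parameterized policy (sketch).
import numpy as np

def laplace_summary(xs, ys):
    """Posterior mode and sd for theta in y ~ N(theta * x, 1), prior N(0, 1)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    precision = 1.0 + np.sum(xs**2)
    return np.sum(xs * ys) / precision, precision**-0.5

def policy(mode, sd, w=(0.2, 1.5), b=0.1):
    """Placeholder (untrained) policy mapping the knowledge state to x in [0, 2]."""
    return float(np.clip(w[0] * mode + w[1] * sd + b, 0.0, 2.0))

rng = np.random.default_rng(5)
theta = 0.7                                     # hypothetical true parameter
xs, ys = [], []
for step in range(4):
    mode, sd = laplace_summary(xs, ys)          # current knowledge state
    x_next = policy(mode, sd)                   # next experimental condition
    xs.append(x_next)
    ys.append(theta * x_next + rng.normal())
    print(f"step {step}: state=({mode:.2f}, {sd:.2f}) -> x={x_next:.2f}")
```
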

14:40 - 15:20 Werner Müller Johannes Kepler University Linz A Practical Guide to Optimum Design of Experiments under Correlation

Given its significance in various fields such as spatio-temporal monitoring, computer simulation, network analysis, and genetic breeding, experimental design for regression models with correlated errors has garnered increasing interest. The theory for classical (non-)linear regression with uncorrelated errors is well-established, originating from Kiefer’s concept of design measures and now widely covered in textbooks. However, the literature on regression models with correlated errors remains fragmented, lacking a unified approach.

In this presentation, I aim to bridge this gap by introducing a best practice procedure that combines both traditional and recent research findings, addressing both estimation and prediction objectives. This will be illustrated with a range of applications, from simple textbook examples to comprehensive case studies.
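
As a toy illustration of why correlation changes the design problem (not an example from the talk): under an assumed exponential correlation structure, the sketch below compares the variance of the GLS slope estimator in a straight-line model for clustered versus spread-out observation times.

```python
# Var(slope) under correlated errors: corr(eps_s, eps_t) = exp(-|s - t|).
import numpy as np

def slope_variance(t):
    t = np.asarray(t, dtype=float)
    X = np.column_stack([np.ones_like(t), t])        # intercept + slope model
    C = np.exp(-np.abs(t[:, None] - t[None, :]))     # error correlation matrix
    info = X.T @ np.linalg.solve(C, X)               # GLS information matrix
    return np.linalg.inv(info)[1, 1]

print(f"clustered times: {slope_variance([0.0, 0.1, 0.2, 0.3]):.2f}")
print(f"spread-out times: {slope_variance([0.0, 1.0, 2.0, 3.0]):.2f}")
```
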

Tea
15:20 - 16:00
Panel discussion (Chair: Steve Gilmour)
16:00 - 17:00 The future of networked experiments

Panellists: Dean Eckles, Nathaniel Stevens, Alexander Volfovsky, Eleni Kalfountzou, Rosemary Bailey

Reception
17:00 - 18:00
Close of the day
18:00
Friday 19 July

Keynote 4 (Chair: Tim Waite)
09:30 - 10:30 Rosemary Bailey University of St Andrews Designs on strongly-regular graphs

Most networks can be regarded as graphs. Some particularly nice graphs are the strongly-regular graphs. The edges and non-edges of such graphs form the associate classes of an association scheme. The corresponding Bose-Mesner algebra (linear combinations of the adjacency matrices) has three common eigenspaces, one of which consists of the constant vectors.

In classical work on design of experiments, the experimental units are grouped into \(b\) blocks of size \(k\). There are three common eigenspaces. One consists of the constant vectors (it has dimension \(1\)); one consists of vectors which are constant on each block and whose entries sum to zero (it has dimension \(b-1\)); the third is the orthogonal complement of these two (it has dimension \(b(k-1)\)).

In some experiments, the experimental units are all pairs of individuals who have to undertake a given task together. If all such pairs are used exactly once each, then the set of pairs forms a triangular association scheme. If there are \(n\) individuals then there are \(N = n(n-1)/2\) such pairs. The corresponding Bose-Mesner algebra has three common eigenspaces. One consists of the constant vectors (it has dimension \(1\)); one consists of linear combinations of the indicator vectors of individuals, constrained so that the entries sum to zero (it has dimension \(n-1\)); the third is the orthogonal complement of these two (it has dimension \(N - n\)).
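
A quick check that the stated dimensions exhaust the whole space in each case:
\[
1 + (b-1) + b(k-1) = bk, \qquad 1 + (n-1) + (N-n) = N;
\]
for example, \(n = 5\) gives \(N = 10\) pairs, with eigenspace dimensions \(1\), \(4\) and \(5\).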

In both cases, we assume that the variance–covariance matrix \(C\) of the responses to the experiment is an unknown linear combination of the matrices of projection onto these eigenspaces.

Two types of block design are particularly important. In balanced block designs, the variance of the estimated difference between any two treatments is the same, no matter what the eigenvalues of \(C\) are. In orthogonal block designs, the linear combination of responses which gives the best unbiased estimator of any difference between treatments does not depend on what the eigenvalues of \(C\) are. Such designs are often said to have commutative orthogonal block structure.

In this talk I will give some constructions for balanced designs and some for designs which have commutative orthogonal block structure, in each scenario.

This is joint work with P. Cameron (University of St Andrews) and D. Ferreira, S. S. Ferreira and C. Nunes (Universidade de Beira Interior).

Coffee
10:30 - 11:00
Invited session 3 (Chair: Vasiliki Koutra)
11:00 - 11:40 Francesca Panero Sapienza University Modelling sparse networks with Bayesian nonparametrics

The graphex is a statistical framework for modelling random graphs, originally introduced in Caron and Fox (2017). It is particularly flexible: by using carefully chosen Bayesian nonparametric priors, it allows us to describe dense and sparse networks, different degree distributions (power-law included) and positive clustering. After introducing the general graphex framework and its asymptotic properties, I will explain how we model sparse networks embedded in a latent space and use this framework to uncover structures underlying mobility patterns. I will conclude by introducing a different variety of the graphex that allows us to describe dynamic networks with overlapping communities.
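
The sketch below conveys only the connection mechanism of the Caron-Fox (2017) model in a crude finite-node form: each node receives a sociability weight \(w_i\) and nodes connect with probability \(1 - \exp(-2 w_i w_j)\). The full construction uses a Poisson process over an infinite weight measure, which is what produces sparsity; the gamma weights and parameters here are hypothetical.

```python
# Finite-node caricature of the Caron-Fox connection mechanism (illustrative).
import numpy as np

rng = np.random.default_rng(11)
n = 1000
w = rng.gamma(shape=0.3, scale=1.0, size=n)     # heavy-ish tailed sociabilities
P = 1.0 - np.exp(-2.0 * np.outer(w, w))         # pairwise edge probabilities
E = np.triu(rng.random((n, n)) < P, 1)          # sample the upper triangle
deg = E.sum(axis=0) + E.sum(axis=1)
print(f"edges: {E.sum()}, max degree: {deg.max()}, isolated nodes: {(deg == 0).sum()}")
```
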

11:40 - 12:20 Ruben Sanchez-Garcia University of Southampton Exploiting symmetry in network analysis

Network models of real-world complex systems typically include a large amount of structural redundancy, which manifests itself as symmetries of the network. Network symmetries are inherited by any structural measure on the network, and can thus be exploited in any network analysis. I will explain the effect of network symmetry on arbitrary network measures and show how this can be exploited in practice in a number of ways, from redundancy compression to computational reduction. Computing network symmetries is very efficient in practice, and we have tested real-world examples with up to several million nodes. Since network models are ubiquitous in the applied sciences, our results are widely applicable.
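
A small sketch of the redundancy idea using networkx (an assumed dependency, not software from the talk): enumerate the automorphisms of a star graph, group its nodes into orbits, and note that a structural measure such as betweenness centrality takes a single value per orbit, so it need only be computed once per orbit.

```python
# Symmetry orbits of a small graph and an orbit-constant structural measure.
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

G = nx.star_graph(4)                            # hub 0 with leaves 1..4
autos = list(GraphMatcher(G, G).isomorphisms_iter())
print(f"{len(autos)} automorphisms")            # 4! = 24: leaves are interchangeable

orbits = {}                                     # orbit (set of images) -> members
for v in G:
    orbits.setdefault(frozenset(a[v] for a in autos), []).append(v)

bc = nx.betweenness_centrality(G)
for members in orbits.values():
    print(sorted(members), "betweenness values:", {round(bc[v], 4) for v in members})
```
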

Concluding remarks
12:20 - 12:30
Lunch
12:30 - 14:00
Close of the workshop
14:00