Design and Analysis of Networked Experiments 2024
17-19 July, King’s College London
Programme
All scientific sessions are in Room K2.40. Coffee, lunch, tea and the reception are in the River Room.
Registration and coffee | ||||
---|---|---|---|---|
10:30 - 10:55 | ||||
Welcome | ||||
10:55 - 11:00 | Vasiliki Koutra | KCL | ||
Keynote 1 (Chair: Vasiliki Koutra) | ||||
11:00 - 12:00 | Dean Eckles | MIT | Learning from randomized interventions in social media | How should we reason about the effects of interventions in social media? What effect sizes should be expected from such changes to algorithms and content? And, given the fundamentally social nature of these services, what conclusions can we draw from individual-level experiments? I will comment on the published results from prominent, recent experiments on Facebook and Instagram conducted during the 2020 US Elections. I will also draw on two new field experiments on Facebook \((N>3.3\times 10^7)\) and Twitter \((N>7.5 \times 10^4)\), each randomizing exposure to advertising featuring content-general messages reminding people to think about accuracy. |
Lunch | ||||
12:00 - 13:30 | ||||
Keynote 2 (Chair: Ben Parker) | ||||
13:30 - 14:30 | Nathaniel Stevens | University of Waterloo | Design and Analysis of Network A/B Tests with General Additive Network Effect Models | As a means of continual improvement and innovation, online controlled experiments are widely used by internet and technology companies to test and evaluate product changes and new features, and to ensure that user feedback drives decisions. This is true of companies like LinkedIn, Facebook, and X, which operate large online social networks. However, experiments on networks are complicated by the fact that the stable unit treatment value assumption (SUTVA) no longer holds. Due to the interconnectivity of users in these networks, a user’s outcome may be influenced by their own treatment assignment as well as the treatment assignment of those they are socially connected with. The design and analysis of the experiment must account for this. The general additive network effect (GANE) family of models is proposed to jointly and flexibly model treatment and network effects. Experimental design and analysis considerations are discussed in the context of the proposed family of models. Collaborators:
|
Tea | ||||
14:30 - 15:00 | ||||
Invited session 1 (Chair: Werner Müller) | ||||
15:00 - 15:40 | Alessio Zanga | University of Milano - Bicocca | Federated Causal Discovery with Missing Data in a Multicentric Study on Endometrial Cancer | Causal discovery is the task of learning a causal graph representing the cause-effect relationships. Being able to establish causal dependencies is crucial in several applied domains, especially when the decision-making policy needs to be explainable, as in the case of medicine and healthcare. In these settings, small sample size and missing data call for federated approaches to take advantage of the information distributed across multiple sources. In this work, we introduce a novel federated causal discovery algorithm capable of pooling together information from multiple sources with heterogeneous missing data distributions. In particular, we propose a score-based approach that learns a global causal graph while taking into account local missingness mechanisms and prior knowledge. We applied the proposed algorithm to a real-world multicentric study on endometrial cancer, validating the obtained causal graph through quantitative analyses and clinical literature. The federated methodology showed promising results in recovering the underlying causal graph, especially in the presence of data missing not at random. |
15:40 - 16:20 | Kirsten Schorning | TU Dortmund | Optimal Designs For State Estimation In Networks | We consider the problem of estimating the expected states in networks, where observations are given by repeated measurements of the random states at the nodes. The choice of the sensors directly influences the quality of the measurements at the different nodes. During the talk, we address the problem of optimally allocating different sensors to the nodes using optimal design theory. Hereby, we assume two models of different complexity.
In the first model, the states of the different nodes are assumed to be independent of time. The design question is then to determine which nodes need greater precision of the measurements than others. Here, we explicitly derive A-optimal designs for different networks, e.g., a star network. |
16:20 - 17:00 | Ben Parker | Brunel University | Design of Experiments for Autoregressive Networks | Much recent research exists on statistical analysis of network data; typically we wish to infer the influence of some network on some response we measure. However, in this talk we consider designing experiments on experimental units linked by some intrinsic network structure, in order to get maximum information about our interventions (treatments). Networks in experiments may be obvious, or less so, and might correspond to:
In this talk, we briefly review some of the work we, and others, have done on experiments on networks, and talk about the vital importance of including the network structure in our model where it exists. We show that by not taking into account network structure, we can design experiments which have very low efficiency and/or produce biased results, and provide some guidelines for performing robust experiments on networked data. In particular, we wish to distinguish between experiments where treatments propagate across a network, and those where the responses of experimental units in a network affect the other experimental units: an autoregressive network. |
Close of the day | ||||
17:00 |
Keynote 3 (Chair: Nathaniel Stevens) | ||||
---|---|---|---|---|
10:15-11:15 | Alexander Volfovsky | Duke University | Neighborhood Adaptive Estimators for Causal Inference under Network Interference | How should we design and analyze experiments on networks when interference may differently impact different agents? In this work we study a setting where the network is known, but, unlike previous work in this area, the radius (and intensity) of the interference experienced by a unit is unknown and can depend on different sub-networks of those treated and untreated that are connected to this unit. We study estimators for the average direct treatment effect on the treated in such a setting. The proposed estimator builds upon a Lepski-like procedure that searches over the possible relevant radii and treatment assignment patterns. In contrast to previous work, the proposed procedure aims to approximate the relevant network interference patterns. We establish oracle inequalities and corresponding adaptive rates for the estimation of the interference function. We leverage such estimates to analyze an estimator for the average direct treatment effect on the treated. We address several challenges stemming from the data-driven creation of the patterns (i.e. feature engineering) and the network dependence. In addition to rates of convergence, under mild regularity conditions, we show that the proposed estimator is asymptotically normal and unbiased and may be leveraged for experimental design. |
Coffee | ||||
11:15-11:45 | ||||
Invited industry session (Chair: Dean Eckles) | ||||
11:45 - 12:30 | Eleni Kalfountzou | Procter & Gamble | Experimentation in the FMCG industry - when A/B testing is not enough | |
Lunch | ||||
12:30-14:00 | ||||
Invited session 2 (Chair: Rosemary Bailey) | ||||
14:00 - 14:40 | Tim Waite | University of Manchester | Sequential Bayesian design with Laplace-parameterized policies | In the past few years policy-based approaches have been developed as a method for sequential Bayesian design, building on ideas from dynamic programming and deep reinforcement learning. The idea is to construct in advance a policy, i.e. a mapping from the current knowledge state to the next set of experimental conditions at which to observe a response. This policy is to be constructed in such a way as to maximize the ultimate expected utility, e.g. the expected Shannon information gain, after all experiments have been completed. This is usually done by taking the policy to be a neural network, and optimizing its weights. However, a key question in policy-based approaches is how to represent the knowledge state, i.e. what to use as the input to the policy network. It is clear the knowledge state is completely specified by the posterior distribution, and so an attractive choice would be an approximation to the posterior, especially one with only a few parameters which could then serve as inputs to the policy network. In this talk we discuss the use of the Laplace approximation to the posterior as the knowledge representation, and develop the methodology needed to train policies based on Laplace representations. |
14:40 - 15:20 | Werner Müller | Johannes Kepler University Linz | A Practical Guide to Optimum Design of Experiments under Correlation | Given its significance in various fields such as spatio-temporal monitoring, computer simulation, network analysis, and genetic breeding, experimental design for regression models with correlated errors has garnered increasing interest. The theory for classical (non-)linear regression with uncorrelated errors is well-established, originating from Kiefer’s concept of design measures and now widely covered in textbooks. However, the literature on regression models with correlated errors remains fragmented, lacking a unified approach. In this presentation, I aim to bridge this gap by introducing a best practice procedure that combines both traditional and recent research findings, addressing both estimation and prediction objectives. This will be illustrated with a range of applications, from simple textbook examples to comprehensive case studies. |
Tea | ||||
15:20-16:00 | ||||
Panel discussion (Chair: Steve Gilmour) | ||||
16:00-17:00 | The future of networked experiments | Panellists: Dean Eckles, Nathaniel Stevens, Alexander Volfovsky, Eleni Kalfountzou, Rosemary Bailey | ||
Reception | ||||
17:00-18:00 | ||||
Close of the day | ||||
18:00 |
Keynote 4 (Chair: Tim Waite) | ||||
---|---|---|---|---|
09:30-10:30 | Rosemary Bailey | University of St Andrews | Designs on strongly-regular graphs | Most networks can be regarded as graphs. Some particularly nice graphs are the strongly-regular graphs. The edges and non-edges of such graphs form the associate classes of an association scheme. The corresponding Bose–Mesner algebra (linear combinations of the adjacency matrices) has three common eigenspaces, one of which consists of the constant vectors. In classical work on design of experiments, the experimental units are grouped into \(b\) blocks of size \(k\). There are three common eigenspaces. One consists of the constant vectors (it has dimension \(1\)); one consists of vectors which are constant on each block and whose entries sum to zero (it has dimension \(b-1\)); the third is the orthogonal complement of these two (it has dimension \(b(k-1)\)). In some experiments, the experimental units are all pairs of individuals who have to undertake a given task together. If all such pairs are used exactly once each, then the set of pairs forms a triangular association scheme. If there are \(n\) individuals then there are \(N = n(n-1)/2\) such pairs. The corresponding Bose–Mesner algebra has three common eigenspaces. One consists of the constant vectors (it has dimension \(1\)); one consists of linear combinations of the indicator vectors of individuals, constrained so that the entries sum to zero (it has dimension \(n-1\)); the third is the orthogonal complement of these two (it has dimension \(N-n\)). In both cases, we assume that the variance–covariance matrix \(C\) of the responses to the experiment is an unknown linear combination of the matrices of projection onto these eigenspaces. Two types of block design are particularly important. In balanced block designs, the variance of the estimated difference between any two treatments is the same, no matter what the eigenvalues of \(C\) are. In orthogonal block designs, the linear combination of responses which gives the best unbiased estimator of any difference between treatments does not depend on what the eigenvalues of \(C\) are. Such designs are often said to have commutative orthogonal block structure.
In this talk I will give some constructions for balanced designs and some for designs which have commutative orthogonal block structure, in each scenario. This is joint work with P. Cameron (University of St Andrews) and D. Ferreira, S. S. Ferreira and C. Nunes (Universidade de Beira Interior). |
Coffee | ||||
10:30-11:00 | ||||
Invited session 3 (Chair: Vasiliki Koutra) | ||||
11:00 - 11:40 | Francesca Panero | Sapienza University | Modelling sparse networks with Bayesian nonparametrics | The graphex is a statistical framework to model random graphs, originally introduced in Caron and Fox (2017). It is particularly flexible, in that by using carefully chosen Bayesian nonparametric priors it allows us to describe dense and sparse networks, different degree distributions (power-law included) and positive clustering. After introducing the general graphex framework and its asymptotic properties, I will explain how we model sparse networks embedded in a latent space and use this framework to uncover structures underlying mobility patterns. I will conclude by introducing a different variety of the graphex that allows us to describe dynamic networks with overlapping communities. |
11:40 - 12:20 | Ruben Sanchez-Garcia | University of Southampton | Exploiting symmetry in network analysis | Network models of real-world complex systems typically include a large amount of structural redundancy, which manifests itself as symmetries of the network. Network symmetries are inherited by any structural measure on the network, and thus exploited in any network analysis. I will explain the effect of network symmetry on arbitrary network measures and show how this can be exploited in practice in a number of ways, from redundancy compression to computational reduction. Computing network symmetries is very efficient in practice, and we test real-world examples up to several million nodes. Since network models are ubiquitous in the Applied Sciences, our results are widely applicable. |
Concluding remarks | ||||
12:20-12:30 | ||||
Lunch | ||||
12:30-14:00 | ||||
Close of the workshop | ||||
14:00 |