# What are Scientific Publications?

We can distinguish different categories of scientific publications. *Preprints* are manuscripts that are currently under review at either a journal or conference. We usually make our latest work immediately available to the public in this way. *Journal Articles* are articles that have passed the peer-review process for a journal. *Proceedings and Conference Contributions* are commonly shorter articles, that were peer-reviewed and accepted for publication with the associated conference. *Technical Reports* usually contain mathematical details for which there was too little space in a submission. *Books* are often longer collections of academic work, or sometimes written with the goal to educate a specific topic.

## List of my Publications

Here is a list containing my publications. You can click on a title to read an article you are interested in.

### Preprints

- Alexander Van Werde, Albert Senen-Cerda, Gianluca Kosmella, Jaron Sanders (2022).
. Preprint. ArXiv 2210.01679. [Abstract]**Detection and Evaluation of Clusters within Sequential Data**Motivated by theoretical advancements in dimensionality reduction techniques we use a recent model, called Block Markov Chains, to conduct a practical study of clustering in real-world sequential data. Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees and can be deployed in sparse data regimes. Despite these favorable theoretical properties, a thorough evaluation of these algorithms in realistic settings has been lacking.

We address this issue and investigate the suitability of these clustering algorithms in exploratory data analysis of real-world sequential data. In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets. In order to evaluate the determined clusters, and the associated Block Markov Chain model, we further develop a set of evaluation tools. These tools include benchmarking, spectral noise analysis and statistical model selection tools. An efficient implementation of the clustering algorithm and the new evaluation tools is made available together with this paper.

Practical challenges associated to real-world data are encountered and discussed. It is ultimately found that the Block Markov Chain model assumption, together with the tools developed here, can indeed produce meaningful insights in exploratory data analyses despite the complexity and sparsity of real-world data.

- Jaron Sanders, Alexander Van Werde (2022).
. Preprint. ArXiv 2204.13534. [Abstract]**Singular value distribution of dense random matrices with block Markovian dependence**A block Markov chain is a Markov chain whose state space can be partitioned into a finite number of clusters such that the transition probabilities only depend on the clusters. Block Markov chains thus serve as a model for Markov chains with communities. This paper establishes limiting laws for the singular value distributions of the empirical transition matrix and empirical frequency matrix associated to a sample path of the block Markov chain whenever the length of the sample path is Θ(n2) with n the size of the state space.The proof approach is split into two parts. First, we introduce a class of symmetric random matrices with dependence called approximately uncorrelated random matrices with variance profile. We establish their limiting eigenvalue distributions by means of the moment method. Second, we develop a coupling argument to show that this general-purpose result applies to block Markov chains.

- Jaron Sanders, Albert Senen-Cerda (2021).
. Preprint. ArXiv 2111.06201. [Abstract]**Spectral norm bounds for block Markov chain random matrices**This paper quantifies the asymptotic order of the largest singular value of a centered random matrix built from the path of a Block Markov Chain (BMC). In a BMC there are n labeled states, each state is associated to one of K clusters, and the probability of a jump depends only on the clusters of the origin and destination. Given a path X0, X1, …, XTn started from equilibrium, we construct a random matrix N that records the number of transitions between each pair of states. We prove that if ω(n)=Tn=o(n*n), then ∥N−E[N]∥=ΩP(sqrt(Tn/n)). We also prove that if Tn=Ω(nlnn), then ∥N−E[N]∥=OP(sqrt(Tn/n)) as n→∞; and if Tn=ω(n), a sparser regime, then ∥NΓ−E[N]∥=OP(sqrt(Tn/n)). Here, NΓ is a regularization that zeroes out entries corresponding to jumps to and from most-often visited states. Together this establishes that the order is ΘP(sqrt(Tn/n)) for BMCs.

- Albert Senen-Cerda, Jaron Sanders (2020).
. Preprint. ArXiv 2002.02247. [Abstract]**Almost sure convergence of dropout algorithms for neural networks**We investigate the convergence and convergence rate of stochastic training algorithms for Neural Networks (NNs) that, over the years, have spawned from Dropout (Hinton et al., 2012). Modeling that neurons in the brain may not fire, dropout algorithms consist in practice of multiplying the weight matrices of a NN component-wise by independently drawn random matrices with {0,1}-valued entries during each iteration of the Feedforward-Backpropagation algorithm. This paper presents a probability theoretical proof that for any NN topology and differentiable polynomially bounded activation functions, if we project the NN’s weights into a compact set and use a dropout algorithm, then the weights converge to a unique stationary set of a projected system of Ordinary Differential Equations (ODEs). We also establish an upper bound on the rate of convergence of Gradient Descent (GD) on the limiting ODEs of dropout algorithms for arborescences (a class of trees) of arbitrary depth and with linear activation functions.

### Journal Articles

- Oxana A. Manita, Mark A. Peletier, Jacobus W. Portegies, Jaron Sanders, Albert Senen-Cerda (2022).
. Journal article. Journal of Machine Learning Research, 23(19):1-46. [Preprint] [Abstract]**Universal Approximation in Dropout Neural Networks**We prove two universal approximation theorems for a range of dropout neural networks. These are feed-forward neural networks in which each edge is given a random {0,1}-valued filter, that have two modes of operation: in the first each edge output is multiplied by its random filter, resulting in a random output, while in the second each edge output is multiplied by the expectation of its filter, leading to a deterministic output. It is common to use the random mode during training and the deterministic mode during testing and prediction.Both theorems are of the following form: Given a function to approximate and a threshold ε>0, there exists a dropout network that is ε-close in probability and in Lq. The first theorem applies to dropout networks in the random mode. It assumes little on the activation function, applies to a wide class of networks, and can even be applied to approximation schemes other than neural networks. The core is an algebraic property that shows that deterministic networks can be exactly matched in expectation by random networks. The second theorem makes stronger assumptions and gives a stronger result. Given a function to approximate, it provides existence of a network that approximates in both modes simultaneously. Proof components are a recursive replacement of edges by independent copies, and a special first-layer replacement that couples the resulting larger network to the input.

The functions to be approximated are assumed to be elements of general normed spaces, and the approximations are measured in the corresponding norms. The networks are constructed explicitly. Because of the different methods of proof, the two results give independent insight into the approximation properties of random dropout networks. With this, we establish that dropout neural networks broadly satisfy a universal-approximation property.

- Daan Rutten, Jaron Sanders (2020).
. Journal article. Accepted for publication in Physical Review A. ArXiv 2010.13728. [Abstract]**Modeling Rydberg gases using random sequential adsorption on random graphs**The statistics of strongly interacting, ultracold Rydberg gases are governed by the interplay of two factors: geometrical restrictions induced by blockade effects, and quantum mechanical effcts. To shed light on their relative roles in the statistics of Rydberg gases, we compare three models in this paper: a quantum mechanical model describing the excitation dynamics within a Rydberg gas, a Random Sequential Adsorption (RSA) process on a Random Geometric Graph (RGG), and a RSA process on a Decomposed Random Intersection Graph (DRIG). The latter model is new, and refers to choosing a particular subgraph of a mixture of two other random graphs. Contrary to the former two models, it lends itself for a rigorous mathematical analysis; and it is built speciffcally to have particular structural properties of a RGG. We establish for it a fluid limit describing the time-evolution of number of Rydberg atoms, and show numerically that the expression remains accurate across a wider range of particle densities than an earlier approach based on an RSA process on an Erdos-Renyi Random Graph (ERRG). Finally, we also come up with a new heuristic using random graphs that gives a recursion to describe a normalized pair-correlation function of a Rydberg gas. Our results suggest that even without dissipation, on long time scales the statistics are affected most by the geometrical restrictions induced by blockade effects, while on short time scales the statistics are affected most by quantum mechanical effects.

- Jaron Sanders, Alexandre Proutiere, Se-Young Yun (2019).
. Journal article. Accepted for publication in Annals of Statistics. ArXiv 1712.09232v3. [Download] [Abstract]**Clustering in Block Markov Chains**This paper considers cluster detection in Block Markov Chains (BMCs). These Markov chains are characterized by a block structure in their transition matrix. More precisely, the n possible states are divided into a finite number of K groups or clusters, such that states in the same cluster exhibit the same transition rates to other states. One observes a trajectory of the Markov chain, and the objective is to recover, from this observation only, the (initially unknown) clusters. In this paper we devise a clustering procedure that accurately, efficiently, and provably detects the clusters. We first derive a fundamental information-theoretical lower bound on the detection error rate satisfied under any clustering algorithm. This bound identifies the parameters of the BMC, and trajectory lengths, for which it is possible to accurately detect the clusters. We next develop two clustering algorithms that can together accurately recover the cluster structure from the shortest possible trajectories, whenever the parameters allow detection. These algorithms thus reach the fundamental detectability limit, and are optimal in that sense.

- P. Bermolen, M. Jonckheere, Jaron Sanders (2017).
. Journal Article. Journal of Statistical Physics 169(5), p. 989-1018. [Abstract]**Scaling limits and generic bounds for exploration processes**We consider exploration algorithms of the random sequential adsorption type both for homogeneous random graphs and random geometric graphs based on spatial Poisson processes. At each step, a vertex of the graph becomes active and its neighboring nodes become explored. Given an initial number of vertices N growing to infinity, we study statistical properties of the proportion of explored nodes in time using scaling limits. We obtain exact limits for homogeneous graphs and prove an explicit central limit theorem for the final proportion of active nodes, known as the jamming constant, through a diffusion approximation for the exploration process. We then focus on bounding the trajectories of such exploration processes on random geometric graphs, i.e. random sequential adsorption. As opposed to homogeneous random graphs, these do not allow for a reduction in dimensionality. Instead we build on a fundamental relationship between the number of explored nodes and the discovered volume in the spatial process, and obtain generic bounds: bounds that are independent of the dimension of space and the detailed shape of the volume associated to the discovered node. Lastly, we give two trajectorial interpretations of our bounds by constructing two coupled processes that have the same fluid limits.

- Jaron Sanders, S.C. Borst, J.S.H. van Leeuwaarden, A.J.E.M. Janssen (2016).
. Journal Article. Operations Research Letters 44(3), p. 359-365. [Abstract]**Optimality gaps in asymptotic dimensioning of many-server systems**The Quality-and-Efficiency-Driven (QED) regime provides a basis for solving asymptotic dimensioning problems that trade off revenue, costs and service quality. We derive bounds for the optimality gaps that capture the differences between the true optimum and the asymptotic optimum based on the QED approximations. Our bounds generalize earlier results for classical many-server systems. We also apply our bounds to a many-server system with threshold control.

- Jaron Sanders, S.C. Borst, J.S.H. van Leeuwaarden (2015).
. Journal Article. IEEE Transactions on Automatic Control 61(7), p. 1838-1853. [Abstract]**Online network optimization using product-form Markov processes**We develop a gradient algorithm for optimizing the performance of product-form networks through online adjustment of control parameters. The use of standard algorithms for finding optimal parameter settings is hampered by the prohibitive computational burden of calculating the gradient in terms of the stationary probabilities. The proposed approach instead relies on measuring empirical frequencies of the various states through simulation or online operation so as to obtain estimates for the gradient. Besides the reduction in computational effort, a further benefit of the online operation lies in the natural adaptation to slow variations in ambient parameters as commonly occurring in dynamic environments. On the downside, the measurements result in inherently noisy and biased estimates. We exploit mixing time results in order to overcome the impact of the bias and establish sufficient conditions for convergence to a globally optimal solution.

We discuss our algorithm in the context of different systems, including queueing networks, loss networks, and wireless networks. We also illustrate how the algorithm can be used in such systems to optimize a service/cost trade-off, to map parameter regions that lead to systems meeting specified constraints, and to achieve target performance measures. For the latter application, we first identify which performance measures can be controlled depending on the set of configurable parameters. We then characterize the achievable region of performance measures in product-form networks, and finally we describe how our algorithm can be used to achieve the target performance in an online, distributed fashion, depending on the application context.

- Jaron Sanders, Matthieu Jonckheere, Servaas Kokkelmans (2015).
. Journal Article. Physical Review Letters 115(4):043002. [Abstract]**Sub-Poissonian statistics of jamming limits in ultracold Rydberg gases**Several recent experiments have established by measuring the Mandel Q parameter that the number of Rydberg excitations in ultracold gases exhibits sub-Poissonian statistics. This effect is attributed to the Rydberg blockade that occurs due to the strong interatomic interactions between highly-excited atoms. Because of this blockade effect, the system can end up in a state in which all particles are either excited or blocked: a jamming limit. We analyze appropriately constructed random-graph models that capture the blockade effect, and derive formulae for the mean and variance of the number of Rydberg excitations in jamming limits. This yields an explicit relationship between the Mandel Q parameter and the blockade effect, and comparison to measurement data shows strong agreement between theory and experiment.

- Jaron Sanders, Rick van Bijnen, Edgar Vredenbregt, Servaas Kokkelmans (2014).
. Journal Article. Physical Review Letters, 112(16):163001. [Abstract]**Wireless network control of interacting Rydberg atoms**We identify a relation between the dynamics of ultra-cold Rydberg gases in which atoms experience a strong dipole blockade and spontaneous emission, and a stochastic process that models certain wireless random-access networks. We then transfer insights and techniques initially developed for these wireless networks to the realm of Rydberg gases, and explain how the Rydberg gas can be driven into crystal formations using our understanding of wireless networks. Notably, we also propose a method to determine Rabi frequencies (laser intensities) such that particles in the Rydberg gas are excited with specified target excitation probabilities.

- Jaron Sanders, S.C. Borst, J.S.H. van Leeuwaarden, A.J.E.M. Janssen (2014).
. Journal Article. Stochastic Systems 7(2). [Abstract]**Optimal admission control for many-server systems with QED-driven revenues**We consider Markovian many-server systems with admission control operating in a QED regime, where the relative utilization approaches unity while the number of servers grows large, providing natural Economies-of-Scale. In order to determine the optimal admission control policy, we adopt a revenue maximization framework, and suppose that the revenue rate attains a maximum when no customers are waiting and no servers are idling. When the revenue function scales properly with the system size, we show that a nondegenerate optimization problem arises in the limit. Detailed analysis demonstrates that the revenue is maximized by nontrivial policies that bar customers from entering when the queue length exceeds a certain threshold of the order of the typical square-root level variation in the system occupancy. We identify a fundamental equation characterizing the optimal threshold, which we extensively leverage to provide broadly applicable upper/lower bounds for the optimal threshold, establish its monotonicity, and examine its asymptotic behavior, all for general revenue structures. For linear and exponential revenue structures, we present explicit expressions for the optimal threshold.

- A.J.E.M. Janssen, J.S.H. van Leeuwaarden, Jaron Sanders (2013).
. Journal Article. Performance Evaluation, 70(10), p. 750-769. [Abstract]**Scaled control in the QED regime**We develop many-server asymptotics in the QED regime for models with admission control. The admission control, designed to reduce the incoming traffic in periods of congestion, scales with the size of the system. For a class of Markovian models with this scaled control, we identify the QED limits for two stationary performance measures. We also derive corrected QED approximations, generalizing earlier results for the Erlang B, C and A models. These results are useful for the dimensioning of large systems equipped with an active control policy. In particular, the corrected approximations can be leveraged to establish the optimality gaps related to square-root staffing and asymptotic dimensioning with admission control.

## Proceedings and Conference Contributions

- Albert Senen-Cerda, Jaron Sanders (2020).
. Accepted for publication at ACM SIGMETRICS. ArXiv 2012.01978. Code repository. [Abstract]**Asymptotic convergence rate of Dropout on shallow linear neural networks**We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect, when applying them to shallow linear Neural Networks (NNs) – which can also be viewed as doing matrix factorization using a particular regularizer. Dropout algorithms such as these are thus regularization techniques that use 0,1-valued random variables to filter weights during training in order to avoid coadaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we are able to obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the NN. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with the convergence bound and match it when starting sufficiently close to a minimizer.

- Long Ma, Jaron Sanders (2019).
. Accepted for the 14th EAI International Conference on Performance Evaluation Methodologies and Tools (Valuetools). ArXiv 1909.04432. Code repository. [Abstract]**Markov chains for error accumulation in quantum circuits**We study a model for the accumulation of errors in multi-qubit quantum computations, as well as a model describing continuous errors accumulating in a single qubit. By modeling the error process in a quantum computation using two coupled Markov chains, we are able to capture a weak form of time-dependency between errors in the past and future. By subsequently using techniques from the field of discrete probability theory, we calculate the probability that error measures such as the fidelity and trace distance exceed a threshold analytically. The formulae cover fairly generic error distributions, cover multi-qubit scenarios, and are applicable to e.g. the randomized benchmarking protocol. To combat the numerical challenge that may occur when evaluating our expressions, we additionally provide an analytical bound on the error probabilities that is of lower numerical complexity, and we also discuss a state space reduction that occurs for stabilizer circuits. Finally, taking inspiration from the field of operations research, we illustrate how our expressions can be used to e.g. decide how many gates one can apply before too many errors accumulate with high probability, and how one can lower the rate of error accumulation in existing circuits through simulated annealing.

- Jaron Sanders, S.C. Borst, J.S.H. van Leeuwaarden (2012).
. Conference Proceeding. Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton House, Illinois, USA, October 1-5), p. 928-935. [Abstract]**Achievable performance in product-form networks**We characterize the achievable range of performance measures in product-form networks where one or more system parameters can be freely set by a network operator. Given a product-form network and a set of configurable parameters, we identify which performance measures can be controlled and which target values can be attained. We also discuss an online optimization algorithm, which allows a network operator to set the system parameters so as to achieve target performance metrics. In some cases, the algorithm can be implemented in a distributed fashion, of which we give several examples. Finally, we give conditions that guarantee convergence of the algorithm, under the assumption that the target performance metrics are within the achievable range.

- Jaron Sanders, S.C. Borst, J.S.H. van Leeuwaarden (2012).
. Conference Proceeding. Proceedings of the 6th International Conference on Performance Evaluation Methodologies and Tools (Valuetools, Cargèse, France, October 9-12), p. 21-30. [Abstract]**Online optimization of product-form networks**We develop an online gradient algorithm for optimizing the performance of product-form networks through online adjustment of control parameters. The use of standard algorithms for finding optimal parameter settings is hampered by the prohibitive computational burden of calculating the gradient in terms of the stationary probabilities. The proposed approach instead relies on measuring empirical frequencies of the various states through simulation or online operation so as to obtain estimates for the gradient. Besides the reduction in computational effort, a further benefit of the online operation lies in the natural adaptation to slow variations in ambient parameters as commonly occurring in dynamic environments. On the downside, the measurements result in inherently noisy and biased estimates. We exploit mixing time results in order to overcome the impact of the bias and establish sufficient conditions for convergence to a globally optimal solution.

### Technical Reports

- Paola Bermolen, Matthieu Jonckheere, Jaron Sanders (2015).
. Technical report. ArXiv 1504.02438. [Abstract]**Scaling limits for exploration algorithms on random graphs**We consider exploration algorithms of the random sequential adsorption type both for homogeneous random graphs and random geometric graphs based on spatial Poisson processes. At each step, a vertex of the graph becomes active and its neighboring nodes become explored. Given an initial number of vertices N growing to infinity, we study statistical properties of the proportion of explored nodes in time using scaling limits. We obtain exact limits for homogeneous graphs and prove an explicit central limit theorem for the final proportion of active nodes, known as the jamming constant, through a diffusion approximation for the exploration process. We then focus on bounding the trajectories of such exploration processes on random geometric graphs, i.e. random sequential adsorption. As opposed to homogeneous random graphs, these do not allow for a reduction in dimensionality. Instead we build on a fundamental relationship between the number of explored nodes and the discovered volume in the spatial process, and obtain generic bounds: bounds that are independent of the dimension of space and the detailed shape of the volume associated to the discovered node. Lastly, we give two trajectorial interpretations of our bounds by constructing two coupled processes that have the same fluid limits.

### Books

- Jaron Sanders (2016).
. PhD thesis. Eindhoven University of Technology. [Abstract]**Stochastic Optimization of Large-Scale Complex Systems**This thesis develops analysis techniques and optimization procedures that are broadly applicable to large-scale complex systems. The focus is on probabilistic models of interacting particle systems, stochastic networks, and service systems, wwhich are all large systems and display fascinatingly complex behavior. In practice, these systems obey simple local rules leading to Markov processes that are amenable to analyses that shed light on the interplay between the local rules and the global system behavior. Chapter 1 provides an overview of this thesis’s content, and illustrates our approaches of analysis and optimization.

Chapter 2 deals with the topic Control and optimization of large-scale stochastic networks. There, we develop optimization algorithms that are applicable to the whole class of product-form Markov processes. These algorithms can be implemented in such a way that individual components of stochastic networks make autonomous decisions that ultimately lead to globally optimal network behavior, and can for instance be used to balance a network of queues, i.e. to achieve equal average queue lengths. The algorithm does so by solving an inversion problem in an online fashion: every queue individually adapts its service rate based on online observations of its own average queue length. The individual queues do not need global network information (like the network structure), and even though all queues influence each other, the algorithm guarantees that the whole network achieves their common goal of a balanced operation.

Throughout Chapters 3, 4, and 5, we discuss the topic Ultracold Rydberg gases and quantum engineering. Rydberg gases consist of atoms that exhibit strong mutual blockade effects, and this gives rise to an interacting particle system with intriguing complex interactions. These particle systems are investigated in laboratory environments because of their application in quantum computing and condensed matter physics. Our research finds that in certain regimes, the complex interactions of these particles can be described using the stochastic processes that also model the behavior of transmitters in wireless networks. This allows us to identify interesting connections between the fields of physics and mathematics, and to transfer techniques and insights from applied probability to the realm of Rydberg gases. For example, the optimization algorithms for large scale stochastic networks (described above) can be used to actively engineer the atomic system, and by constructing special random graphs we can give theoretical descriptions of statistical properties of the Rydberg gas.

Chapters 6, 7, and 8 cover the final topic Performance analysis and revenue maximization of critically loaded service systems. There, we consider large scale Markovian many-server systems that operate in the so-called Quality-andEfficiency-Driven (QED) regime and dwarf the usual trade-off between high system utilization and short waiting times. In order to achieve these dual goals, these systems are scaled so as to approach full utilization, while the number of servers grows simultaneously large, rendering crucial Economies-of-Scale. Our research extends the applicability of the QED regime by incorporating scalable admission control schemes and general revenue functions. This also allows us to identify for a broad class of revenue functions exactly which nontrivial threshold control policies are optimal in the QED regime, yielding insight into the relation between the optimal control and revenue structure. By studying the system’s precise asymptotic behavior when nearing the QED regime, we are able to also analyze the effectiveness of the QED regime as a framework for system dimensioning.

### Other

- Jaron Sanders (2021).
. Article review. Mathematical Reviews. MR4193188.**Eigenvalue rigidity for truncations of random unitary matrices, by E. Meckes and K. Stewart**

- Jaron Sanders (2020).
. Book review. Nieuw Archief voor Wiskunde.**Waarschijnlijkheidsrekening, door K. van Harn en P.J. Holewijn**

- Jaron Sanders (2014).
. Journalistic article. STAtOR.**Atomaire gassen en draadloze netwerken**

- Jaron Sanders (2011).
. MSc thesis. Eindhoven University of Technology.**Interacting particles in wireless networks and Rydberg gasses**