Example of a random neural network when using DropConnect
New publications

Almost Sure Convergence of Dropout Algorithms for Neural Networks

We have submitted Almost Sure Convergence of Dropout Algorithms for Neural Networks, and it is currently under review. This is joint work between Albert Senen-Cerda and me. A preprint is available on arXiv.


Our manuscript mathematically investigates the convergence properties of a class of well-known and widely used training algorithms for neural networks. The techniques we use lie in the domains of stochastic approximation and nonconvex optimization. The mathematical analysis of neural networks is highly interesting, topical, and challenging.

Abstract

We investigate the convergence and convergence rate of stochastic training algorithms for Neural Networks (NNs) that, over the years, have spawned from Dropout (Hinton et al., 2012). Modeling that neurons in the brain may not fire, dropout algorithms consist in practice of multiplying the weight matrices of an NN component-wise by independently drawn random matrices with {0,1}-valued entries during each iteration of the Feedforward-Backpropagation algorithm. This paper presents a probability-theoretic proof that for any NN topology and differentiable, polynomially bounded activation functions, if we project the NN’s weights onto a compact set and use a dropout algorithm, then the weights converge to a unique stationary set of a projected system of Ordinary Differential Equations (ODEs). We also establish an upper bound on the rate of convergence of Gradient Descent (GD) on the limiting ODEs of dropout algorithms for arborescences (a class of trees) of arbitrary depth and with linear activation functions.
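The masking step described in the abstract can be made concrete. Below is a minimal Python sketch of one DropConnect-style training iteration for a single-hidden-layer network with ReLU activations; this is an illustration I wrote, not code from the paper, and all names and parameters (keep_prob, the learning rate, the network shape, the squared-error loss) are assumptions chosen for the example.

# Illustrative sketch (not from the paper): during each Feedforward-Backpropagation
# iteration, every weight matrix is multiplied component-wise by an independently
# drawn random matrix with {0,1}-valued entries.
import numpy as np

rng = np.random.default_rng(0)

def dropconnect_step(weights, x, y, keep_prob=0.8, lr=1e-2):
    """One training step of a single-hidden-layer network with DropConnect-style masking."""
    W1, W2 = weights["W1"], weights["W2"]

    # Independently drawn {0,1}-valued masks, one per weight matrix.
    M1 = rng.binomial(1, keep_prob, size=W1.shape)
    M2 = rng.binomial(1, keep_prob, size=W2.shape)

    # Forward pass with the masked (component-wise multiplied) weights.
    W1m, W2m = M1 * W1, M2 * W2
    z = x @ W1m                  # pre-activation, shape (n, h)
    a = np.maximum(z, 0.0)       # ReLU activation
    pred = a @ W2m               # network output, shape (n, 1)

    # Backward pass (squared-error loss); gradients flow through the masked weights.
    err = pred - y               # (n, 1)
    grad_W2 = a.T @ err          # (h, 1)
    grad_a = err @ W2m.T         # (n, h)
    grad_z = grad_a * (z > 0)    # ReLU derivative
    grad_W1 = x.T @ grad_z       # (d, h)

    # Gradient step; the masks also gate which entries are updated in this iteration.
    weights["W1"] -= lr * (M1 * grad_W1)
    weights["W2"] -= lr * (M2 * grad_W2)
    return float(np.mean(err ** 2))

# Usage: a few steps on synthetic data.
n, d, h = 32, 5, 16
x = rng.normal(size=(n, d))
y = rng.normal(size=(n, 1))
weights = {"W1": rng.normal(scale=0.1, size=(d, h)),
           "W2": rng.normal(scale=0.1, size=(h, 1))}
for _ in range(5):
    loss = dropconnect_step(weights, x, y)
print("final minibatch loss:", loss)

Setting both masks to all-ones recovers the plain Feedforward-Backpropagation step, which is the sense in which these algorithms generalize standard gradient training.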

Preprint

[Embedded PDF of the preprint.]

Curious for more?

Head on over to My Articles for more of my work, and check out My Research for a peek into upcoming themes. You can also find out who is on our team right here: Academic Supervision.

Jaron
Jaron Sanders received M.Sc. degrees in Mathematics and Physics from the Eindhoven University of Technology, The Netherlands, in 2012, and a PhD degree in Mathematics in 2016. After obtaining his PhD, he worked as a post-doctoral researcher at the KTH Royal Institute of Technology in Stockholm, Sweden. He then worked as an assistant professor at the Delft University of Technology, and now works as an assistant professor at the Eindhoven University of Technology. His research interests are applied probability, queueing theory, stochastic optimization, stochastic networks, wireless networks, and interacting (particle) systems.
https://www.jaronsanders.nl