Two-layer neural networks as Wasserstein gradient flows
Artificial neural networks (ANNs) consist of layers of artificial "neurons," each of which takes in information from the previous layer and outputs information to the next layer. Gradient descent is a common method for updating the weights of the neurons based on training data. While in practice every layer of a neural network has only finitely many neurons, it is useful to consider an idealized network in which a layer contains infinitely many neurons, since this continuous viewpoint yields a theory that helps explain how ANNs behave during training. In particular, from this viewpoint the process of updating the neuron weights of a shallow (two-layer) neural network can be described by a Wasserstein gradient flow.
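As a minimal sketch of this correspondence (using a generic mean-field parameterization, with notation not taken from this article), a shallow network with $N$ neurons can be written as an average over its neuron parameters $\theta_i$, and letting $N \to \infty$ replaces this average with an integral against a probability measure $\mu$ over parameter space:

\[
f_N(x) \;=\; \frac{1}{N}\sum_{i=1}^{N} \sigma(x;\theta_i),
\qquad
f_\mu(x) \;=\; \int \sigma(x;\theta)\, d\mu(\theta),
\]

where $\sigma(x;\theta)$ denotes the output of a single neuron with parameters $\theta$ applied to the input $x$. Gradient descent on the finite parameter list $\theta_1,\dots,\theta_N$ then corresponds, in this infinite-width limit, to a gradient flow of the loss functional over the measure $\mu$ with respect to the Wasserstein metric; the sections below make this precise.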