2 layer neural networks as Wasserstein gradient flows
Artificial neural networks (ANNs) consist of layers of artificial "neurons", each of which takes in information from the previous layer and outputs information to the next layer. Gradient descent is a common method for updating the weights of the neurons based on training data. While in practice every layer of a neural network has only finitely many neurons, it is useful to consider an idealized network with infinitely many neurons in a layer, since this continuous viewpoint supports a theory of why ANNs work. In particular, from this viewpoint the process of updating the neuron weights of a shallow (two-layer) neural network can be described by a Wasserstein gradient flow.<ref name="Figalli" />
==Motivation==

==Shallow Neural Networks==
===Continuous Formulation===
===Minimization Problem===

==Wasserstein Gradient Flow==