2 layer neural networks as Wasserstein gradient flows: Difference between revisions

From Optimal Transport Wiki

Revision as of 03:15, 10 February 2022

[1]

[1] Artificial neural networks (ANNs) consist of layers of artificial "neurons", each of which takes in information from the previous layer and passes information on to the next layer. Gradient descent is a common method for updating the weights of each neuron based on training data. In practice every layer of a neural network has only finitely many neurons, but for the purpose of developing a theory that explains how ANNs work it is useful to consider a layer with infinitely many neurons. From this viewpoint, the process of updating the neuron weights of a shallow neural network can be described by a Wasserstein gradient flow.
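
To make the finite-neuron picture concrete, the following is a minimal sketch rather than a construction taken from this article. It assumes a two-layer network with scalar input whose output is the average of m neurons, each neuron being a "particle" (a, w, b), together with a tanh activation, a squared loss, and a sine target; all of these choices and all function names are illustrative assumptions. Gradient descent then moves every particle along minus its own loss gradient, which is the discrete process that the infinitely-many-neurons viewpoint describes as a flow of the distribution of particles.

```python
import math
import random


def predict(particles, x):
    """Network output as an average over neurons (an empirical measure of particles)."""
    return sum(a * math.tanh(w * x + b) for a, w, b in particles) / len(particles)


def loss(particles, data):
    """Half mean squared error over the training set."""
    return sum((predict(particles, x) - y) ** 2 for x, y in data) / (2 * len(data))


def gradient_step(particles, data, step):
    """Move every particle (a, w, b) along minus its own loss gradient."""
    n = len(data)
    residuals = [predict(particles, x) - y for x, y in data]
    updated = []
    for a, w, b in particles:
        ga = gw = gb = 0.0
        for (x, _), r in zip(data, residuals):
            t = math.tanh(w * x + b)
            dt = 1.0 - t * t  # derivative of tanh
            ga += r * t
            gw += r * a * dt * x
            gb += r * a * dt
        # The exact gradient carries a factor 1/m. Scaling the step by m
        # (an assumed mean-field time scaling) cancels it, so only 1/n remains.
        updated.append((a - step * ga / n, w - step * gw / n, b - step * gb / n))
    return updated


random.seed(0)
# Hypothetical training data: samples of sin(x) on [-2, 2].
data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-20, 21)]
# m = 50 neurons with parameters drawn from a standard Gaussian (an assumption).
particles = [
    (random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)
]

initial_loss = loss(particles, data)
for _ in range(200):
    particles = gradient_step(particles, data, step=0.1)
final_loss = loss(particles, data)

print("loss before:", initial_loss)
print("loss after: ", final_loss)
```

Because the output is an average, it depends on the neurons only through their empirical distribution, so reordering the particles leaves the prediction unchanged. This symmetry is what allows the weights to be replaced by a probability measure when the number of neurons is taken to infinity.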

Motivation

Shallow Neural Networks

Continuous Formulation

Minimization Problem

Wasserstein Gradient Flow

Main Results

References