2 layer neural networks as Wasserstein gradient flows

Artificial neural networks (ANNs) consist of layers of artificial "neurons", each of which takes in information from the previous layer and passes information to the neurons in the next layer. Gradient descent is a common method for updating the weights of each neuron based on training data. While in practice every layer of a neural network has only finitely many neurons, it is useful, for the purpose of developing a theory of how ANNs work, to consider a layer with infinitely many neurons. In particular, from this viewpoint the process of updating the neuron weights of a shallow neural network can be described by a Wasserstein gradient flow.
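
To make the finite-neuron picture concrete, the following sketch trains a single-hidden-layer network by plain gradient descent on a toy one-dimensional regression problem. The parametrization <math> f(x) = \tfrac{1}{N}\sum_{i=1}^N a_i \tanh(w_i x + b_i)</math>, the data, and all hyperparameters are illustrative choices rather than anything fixed by this article; the point is only that gradient descent updates the collection of neuron parameters <math>(a_i, w_i, b_i)</math>, which can be viewed as an empirical distribution over neurons.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative shallow network: f(x) = (1/N) * sum_i a_i * tanh(w_i * x + b_i).
# Each hidden "neuron" i carries the parameters (a_i, w_i, b_i); gradient descent
# moves this finite collection of parameters, i.e. an empirical measure over neurons.

rng = np.random.default_rng(0)
N = 100                                  # number of hidden neurons
a = rng.normal(size=N)                   # output weights
w = rng.normal(size=N)                   # input weights
b = rng.normal(size=N)                   # biases

# Toy training data: regress y = sin(x) on [-2, 2].
x_train = np.linspace(-2.0, 2.0, 200)
y_train = np.sin(x_train)

def predict(x):
    """Network output, with the 1/N mean-field scaling over neurons."""
    return np.tanh(np.outer(x, w) + b) @ a / N

lr = 1.0
for step in range(5000):
    h = np.tanh(np.outer(x_train, w) + b)   # hidden activations, shape (n, N)
    resid = h @ a / N - y_train             # prediction error per sample
    # Gradients of the loss (1/(2n)) * sum_s (f(x_s) - y_s)^2 for each neuron's parameters.
    grad_a = resid @ h / (N * x_train.size)
    grad_w = ((resid[:, None] * a * (1.0 - h**2)) * x_train[:, None]).mean(axis=0) / N
    grad_b = (resid[:, None] * a * (1.0 - h**2)).mean(axis=0) / N
    a -= lr * grad_a
    w -= lr * grad_w
    b -= lr * grad_b

print("final mean squared error:", np.mean((predict(x_train) - y_train) ** 2))
</syntaxhighlight>

In the infinite-neuron limit, the empirical distribution of the parameter triples <math>(a_i, w_i, b_i)</math> is replaced by a probability measure, and the gradient descent dynamics above become a flow of that measure.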

==Motivation==

==Shallow Neural Networks==

Let us introduce the mathematical framework and notation for a neural network with a single hidden layer. Let <math> D \subset \mathbb{R}^d </math> be open. The set <math>D</math> represents the space of inputs into the NN. <math> \mu: X \to \mathbb{R}</math> and <math> \nu: Y \to \mathbb{R}</math>
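
For orientation, a common way to pass from finitely many neurons to the continuous setting is to replace the average over <math>N</math> neurons by an integral against a probability measure on the space of neuron parameters; the notation below is illustrative and is not fixed by this section.

:<math> f_N(x) = \frac{1}{N}\sum_{i=1}^N \Phi(\theta_i, x) \quad\longrightarrow\quad f_\rho(x) = \int_{\Omega} \Phi(\theta, x)\, d\rho(\theta), \qquad x \in D, </math>

where <math>\theta \in \Omega</math> collects the parameters of a single neuron, <math>\Phi(\theta, x)</math> is that neuron's response to the input <math>x</math>, and <math>\rho</math> is a probability measure on <math>\Omega</math>. Minimizing a training loss over <math>\rho</math> is then a minimization problem over probability measures, which is where the Wasserstein geometry enters.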


==Continuous Formulation==

==Minimization Problem==

==Wasserstein Gradient Flow==

==Main Results==

==References==