2 layer neural networks as Wasserstein gradient flows

From Optimal Transport Wiki
Revision as of 03:41, 10 February 2022

Artificial neural networks (ANNs) consist of layers of artificial "neurons", each of which takes in information from the previous layer and outputs information to neurons in the next layer. Gradient descent is a common method for updating the weights of each neuron based on training data. While in practice every layer of a neural network has only finitely many neurons, it is beneficial to consider a neural network layer with infinitely many neurons in order to develop a theory that explains how ANNs work. In particular, from this viewpoint the process of updating the neuron weights of a shallow neural network can be described by a Wasserstein gradient flow.[1]

==Motivation==

==Shallow Neural Networks==

Let us introduce the mathematical framework and notation for a neural network with a single hidden layer. Let <math> D \subset \mathbb{R}^d </math> be open. The set <math> D </math> represents the space of inputs into the network. Let <math> N \in \mathbb{N} </math> be the number of neurons in the hidden layer. For each <math> i \in \{1,\dots,N\} </math> let

: <math> h_i : D \times \mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R}^k </math>

be given by <math> h_i(x, \omega,\theta) = </math>
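The revision cuts off before defining <math>h_i</math>. For illustration only, the following is a minimal sketch of one common mean-field parameterization, <math>h_i(x, \omega_i, \theta_i) = \omega_i \, \sigma(\langle \theta_i, x\rangle)</math> with a ReLU activation <math>\sigma</math> and the network output taken as the average of the <math>N</math> neuron contributions. This choice is an assumption, not taken from this article; in particular it uses a vector-valued <math>\theta_i \in \mathbb{R}^d</math>, whereas the signature above takes a scalar.

```python
import random

def sigma(t):
    """ReLU activation -- one standard choice, assumed here for illustration."""
    return max(t, 0.0)

def h(x, omega, theta):
    """Contribution of one hidden neuron with weights (omega, theta).

    Hypothetical form h(x, omega, theta) = omega * sigma(<theta, x>);
    the article's own definition of h_i is cut off in this revision.
    """
    return omega * sigma(sum(t * xi for t, xi in zip(theta, x)))

def shallow_net(x, params):
    """Mean-field shallow network: average of the N neuron contributions."""
    N = len(params)
    return sum(h(x, omega, theta) for omega, theta in params) / N

# Small demo with d = 3 inputs and N = 5 hidden neurons.
random.seed(0)
d, N = 3, 5
params = [(random.gauss(0, 1), [random.gauss(0, 1) for _ in range(d)])
          for _ in range(N)]
print(shallow_net([1.0, -0.5, 2.0], params))
```

The <math>1/N</math> averaging is what makes the infinite-neuron limit natural: as <math>N \to \infty</math>, the network output depends on the weights only through their empirical measure, which is the starting point for the Wasserstein gradient flow description.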


==Continuous Formulation==

==Minimization Problem==

==Wasserstein Gradient Flow==

==Main Results==

==References==