Sinkhorn's Algorithm

From Optimal Transport Wiki
Sinkhorn's Algorithm is an iterative numerical method used to obtain an optimal transport plan <math> \pi\in\Gamma(\alpha,\beta) </math> for the [[Kantorovich Problem]] with [[Entropic Regularization|entropic regularization]] in the case of finitely supported positive measures <math>\alpha, \beta \in \mathcal M_+(X)</math>. For further reading see Peyré & Cuturi (pg. 62-73)<ref name=Peyre />


==Continuous Problem Formulation==
Entropic regularization modifies the Kantorovich problem by adding a [https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence Kullback-Leibler] divergence term to the optimization goal. Specifically, the general form of the problem is now to determine


where <math>\alpha\otimes\beta</math> is the product measure of <math>\alpha</math> and <math>\beta</math>, and where


:<math> \operatorname{KL}(\mu\mid\nu) = \int_X \log\left(\frac{\mathrm{d}\mu}{\mathrm{d}\nu}(x) \right) \mathop{}\!\mathrm{d}\mu(x) + \int_X (\mathop{}\!\mathrm{d}\nu(x) - \mathop{}\!\mathrm{d}\mu(x)) </math>


whenever the Radon-Nikodym derivative <math> \tfrac{\mathrm{d}\mu}{\mathrm{d}\nu} </math> exists (i.e. when <math> \mu </math> is absolutely continuous w.r.t. <math> \nu </math>) and <math>+\infty</math> otherwise. This form of the KL divergence is applicable even when <math>\mu,\nu\in\mathcal M_+(X)</math> differ in total mass and it reduces to the standard definition whenever <math>\mu</math> and <math>\nu</math> have equal total mass. From this definition it immediately follows that for <math> \epsilon >0 </math> an optimal coupling <math> \pi^* </math> must be absolutely continuous w.r.t <math> \alpha\otimes\beta </math>. As a result, the optimal plan is in some sense less singular and hence "smoothed out."  
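For finitely supported measures the integrals reduce to sums, and this generalized KL divergence can be evaluated directly. The following is a minimal NumPy sketch (the function name <code>gen_kl</code> is illustrative, not from the text); note that it returns <code>inf</code> exactly when absolute continuity fails:

```python
import numpy as np

def gen_kl(mu, nu):
    """Generalized KL divergence of positive measures given as weight vectors:
    KL(mu | nu) = sum_x [ mu(x) log(mu(x)/nu(x)) + nu(x) - mu(x) ]."""
    mu, nu = np.asarray(mu, dtype=float), np.asarray(nu, dtype=float)
    if np.any((nu == 0) & (mu > 0)):   # mu not absolutely continuous w.r.t. nu
        return np.inf
    mask = mu > 0                      # convention: 0 log 0 = 0
    log_term = np.sum(mu[mask] * np.log(mu[mask] / nu[mask]))
    return log_term + nu.sum() - mu.sum()

mu = np.array([0.5, 0.5])
nu = np.array([0.25, 0.75])
print(gen_kl(mu, nu))   # equal total mass: agrees with the standard KL
print(gen_kl(mu, mu))   # 0.0
```

When the two measures have equal total mass, the mass-difference term vanishes and the value coincides with the usual KL divergence.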


==Discrete Problem Formulation==


To apply Sinkhorn's algorithm to approximate <math> L^\epsilon_c(\alpha,\beta)</math>, it will be necessary to assume finite support, so let <math> \alpha = \textstyle\sum_{i=1}^n a_i \delta_{x_i} </math> and <math> \beta = \textstyle\sum_{j=1}^m b_j \delta_{y_j} </math> and denote the corresponding vectors of weights by <math> \mathbf{a}\in\mathbb R_+^n </math> and <math> \mathbf{b}\in\mathbb R_+^m </math>. Additionally let <math>C_{ij} = c(x_i, y_j) </math> and denote the discrete version of <math> \Gamma(\alpha,\beta) </math> by <math> U(\mathbf a,\mathbf b)=\{ P\in\mathbb R_+^{n\times m} \mid \textstyle\sum_j P_{ij}=a_i, \textstyle\sum_i P_{ij}=b_j \} </math>. This lets us write the entropic Kantorovich problem as


:<math> L^\epsilon_c(\mathbf{a},\mathbf{b}) = \inf_{P\in U(\mathbf{a},\mathbf{b})} \sum_{i,j} C_{ij} P_{ij} + \epsilon \operatorname{KL}(P\mid \mathbf{a}\mathbf{b}^T) </math>


where
:<math> \operatorname{KL}(P\mid \mathbf{a}\mathbf{b}^T) = \sum_{i,j} \left[ P_{ij} \log\left(\frac{P_{ij}}{a_i b_j}\right) + a_i b_j - P_{ij} \right] </math>
See here for more on [[Discrete Optimal Transport|discrete optimal transport]].
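As a small numeric sanity check, the discrete objective can be evaluated directly for any feasible coupling. The function name <code>entropic_objective</code> and the two-point example below are illustrative, not from the text; the product coupling <math>\mathbf a \mathbf b^T</math> is used because it is always feasible and has zero KL term:

```python
import numpy as np

def entropic_objective(P, C, a, b, eps):
    """Evaluate sum_ij C_ij P_ij + eps * KL(P | a b^T) for a coupling P."""
    ab = np.outer(a, b)
    mask = P > 0                       # convention: 0 log 0 = 0
    kl = np.sum(P[mask] * np.log(P[mask] / ab[mask])) + ab.sum() - P.sum()
    return np.sum(C * P) + eps * kl

# Two points on the line with cost c(x, y) = |x - y|^2 (illustrative data).
x = np.array([0.0, 1.0]); y = np.array([0.0, 1.0])
C = (x[:, None] - y[None, :]) ** 2
a = b = np.array([0.5, 0.5])

# The product coupling a b^T lies in U(a, b) and has KL = 0,
# so the objective is just its transport cost.
print(entropic_objective(np.outer(a, b), C, a, b, eps=0.1))   # prints 0.5
```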
==Characterizing the Solution==
The solution to the discrete problem formulation is unique and has a special form.
:'''Theorem (Peyré & Cuturi, Proposition 4.3 on pg. 63)<ref name=Peyre />'''
:''The solution <math>P\in\mathbb R^{n\times m}</math> to the discrete regularized Kantorovich problem is unique and has the form <math> P_{ij} = u_iK_{ij}v_j </math> for some <math>\mathbf u\in\mathbb R_+^n, \mathbf v\in\mathbb R_+^m</math> where <math>K_{ij}=e^{-C_{ij}/\epsilon}</math>. Moreover, <math>\mathbf u</math> and <math>\mathbf v</math> are unique up to multiplying <math>\mathbf u</math> and dividing <math>\mathbf v</math> by the same positive scalar.''


==Sinkhorn's Algorithm==
Sinkhorn's algorithm takes advantage of the aforementioned characterization result to iteratively approximate the scaling factors <math>\mathbf u</math> and <math>\mathbf v</math>. The procedure is simple and only involves matrix-vector multiplication and entrywise division, as follows:
:<math> u^{(\ell +1)} = \frac{\textbf a}{K\textbf v^{(\ell)}} \quad\text{and}\quad v^{(\ell +1)} = \frac{\textbf b}{K^T\textbf u^{(\ell+1)}} </math>
Once a sufficient number of iterations <math>L</math> have been taken, we let <math>\hat P_{ij}=u^{(L)}_i K_{ij} v^{(L)}_j</math> be our approximation of the optimal plan.
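The updates above can be sketched in a few lines of NumPy. This is a minimal illustration under the notation of this page, not an optimized implementation; in particular, for very small <math>\epsilon</math> one would work in the log domain to avoid underflow in <math>K</math>, which this sketch omits:

```python
import numpy as np

def sinkhorn(C, a, b, eps, n_iter=1000):
    """Sinkhorn iterations for the entropy-regularized Kantorovich problem.
    Returns the approximate optimal plan P with P_ij = u_i K_ij v_j."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                  # enforce the row marginals
        v = b / (K.T @ u)                # enforce the column marginals
    return u[:, None] * K * v[None, :]   # diag(u) K diag(v)

# Illustrative data: 5 points on [0, 1], squared-distance cost, uniform weights.
x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
a = b = np.full(5, 0.2)

P = sinkhorn(C, a, b, eps=0.05)
print(np.abs(P.sum(axis=1) - a).max())   # row-marginal error, near 0
```

Since each iteration ends with the <math>v</math> update, the returned plan matches the column marginals exactly, while the row marginals are matched up to a small error that shrinks as the iterations proceed.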
==References==
<references>
<ref name="Peyre">[https://arxiv.org/abs/1803.00567 Peyré, Gabriel and Cuturi, Marco. "Computational Optimal Transport" (2018)]</ref>
</references>

Latest revision as of 04:38, 28 February 2022
