Sinkhorn's Algorithm: Difference between revisions

Revision as of 11:16, 9 May 2020

Sinkhorn's Algorithm is an iterative numerical method used to obtain an optimal transport plan $\pi \in \Gamma (\alpha ,\beta )$ for the Kantorovich problem with entropic regularization in the case of finitely supported positive measures $\alpha \in {\mathcal {M}}_{+}(X)$ and $\beta \in {\mathcal {M}}_{+}(Y)$ .

Continuous Problem Formulation

Entropic regularization modifies the Kantorovich problem by adding a Kullback-Leibler divergence term to the optimization goal. Specifically, the general form of the problem is now to determine

L_{c}^{\epsilon }(\alpha ,\beta )=\inf _{\pi \in \Gamma (\alpha ,\beta )}\underbrace {\int _{X\times Y}c(x,y)\mathop {} \!\mathrm {d} \pi (x,y)} _{{\text{Kantorovich functional }}\mathbb {K} (\pi )}+\epsilon \underbrace {\operatorname {KL} (\pi \mid \alpha \otimes \beta )} _{\text{entropic term}}

where $\alpha \otimes \beta$ is the product measure of $\alpha$ and $\beta$ , and where

\operatorname {KL} (\mu \mid \nu )=\int _{X}\log \left({\frac {\mathrm {d} \mu }{\mathrm {d} \nu }}(x)\right)\mathop {} \!\mathrm {d} \mu (x)+\int _{X}(\mathop {} \!\mathrm {d} \nu (x)-\mathop {} \!\mathrm {d} \mu (x))

whenever the Radon-Nikodym derivative ${\tfrac {\mathrm {d} \mu }{\mathrm {d} \nu }}$ exists (i.e. when $\mu$ is absolutely continuous w.r.t. $\nu$ ) and $+\infty$ otherwise. This form of the KL divergence applies even when $\mu ,\nu \in {\mathcal {M}}_{+}(X)$ differ in total mass, and it reduces to the standard definition whenever $\mu$ and $\nu$ have equal total mass, because the second integral then vanishes. It follows immediately from this definition that for $\epsilon >0$ an optimal coupling $\pi ^{*}$ must be absolutely continuous w.r.t. $\alpha \otimes \beta$ , since otherwise the entropic term would be $+\infty$ . As a result, the optimal plan is less singular, or "smoothed out": it has a density with respect to $\alpha \otimes \beta$ and cannot concentrate its mass on an $(\alpha \otimes \beta )$-null set.
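For finitely supported measures the generalized KL divergence above reduces to a finite sum. The following minimal Python sketch assumes $\mu$ and $\nu$ are given as lists of nonnegative weights on the same finite set of points; the function name `generalized_kl` is hypothetical. It uses the standard convention $0\log(0/x)=0$ and returns $+\infty$ when $\mu$ puts mass where $\nu$ does not.

```python
import math


def generalized_kl(mu, nu):
    """Generalized KL(mu | nu) for nonnegative weight vectors on a common finite set.

    Computes sum_k mu_k * log(mu_k / nu_k) + sum_k (nu_k - mu_k), using the
    convention 0 * log(0 / x) = 0, and returns +inf when mu is not absolutely
    continuous w.r.t. nu (some mu_k > 0 where nu_k == 0).
    """
    if len(mu) != len(nu):
        raise ValueError("mu and nu must have the same length")
    total = 0.0
    for m, n in zip(mu, nu):
        if m < 0 or n < 0:
            raise ValueError("weights must be nonnegative")
        if m > 0:
            if n == 0:
                return math.inf  # mu charges a point that nu does not
            total += m * math.log(m / n)
        total += n - m  # mass-correction term; sums to sum(nu) - sum(mu)
    return total
```

When the two weight vectors have the same total mass, the correction term $\sum_k(\nu_k-\mu_k)$ is zero and the result is the ordinary KL divergence. When the masses differ it still returns a nonnegative value; for example, `generalized_kl([1.0], [2.0])` equals $1-\log 2$.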

Discrete Problem Formulation

To apply Sinkhorn's algorithm to approximate $L_{c}^{\epsilon }(\alpha ,\beta )$ , it is necessary to assume finite support, so let $\alpha =\textstyle \sum _{i=1}^{n}a_{i}\delta _{x_{i}}$ and $\beta =\textstyle \sum _{j=1}^{m}b_{j}\delta _{y_{j}}$ and denote the corresponding weight vectors by $\mathbf {a} \in \mathbb {R} _{+}^{n}$ and $\mathbf {b} \in \mathbb {R} _{+}^{m}$ . Additionally, let $C_{ij}=c(x_{i},y_{j})$ and denote the discrete version of $\Gamma (\alpha ,\beta )$ by $U(\mathbf {a} ,\mathbf {b} )=\{P\in \mathbb {R} _{+}^{n\times m}\mid \textstyle \sum _{j}P_{ij}=a_{i},\textstyle \sum _{i}P_{ij}=b_{j}\}$ . This lets us write the entropic Kantorovich problem as

L_{c}^{\epsilon }(\mathbf {a} ,\mathbf {b} )=\inf _{P\in U(\mathbf {a} ,\mathbf {b} )}\sum _{i,j}C_{ij}P_{ij}+\epsilon \operatorname {KL} (P\mid \mathbf {a} \mathbf {b} ^{T})

where

\operatorname {KL} (P\mid \mathbf {a} \mathbf {b} ^{T})=\sum _{i,j}\left[P_{ij}\log \left({\frac {P_{ij}}{a_{i}b_{j}}}\right)+a_{i}b_{j}-P_{ij}\right]
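To make the discrete objective concrete, here is a minimal plain-Python sketch that evaluates it for a candidate plan and checks membership in $U(\mathbf a,\mathbf b)$. Matrices are assumed to be lists of lists; the names `entropic_kantorovich_objective` and `in_transport_polytope`, and the tolerance `tol`, are hypothetical choices for illustration.

```python
import math


def entropic_kantorovich_objective(C, P, a, b, eps):
    """Evaluate sum_ij C_ij P_ij + eps * KL(P | a b^T) for a candidate plan P.

    C and P are n x m lists of lists, a has length n and b has length m.
    Uses 0 * log(0 / x) = 0 and returns +inf if P_ij > 0 where a_i * b_j == 0.
    """
    cost = 0.0
    kl = 0.0
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            p, ref = P[i][j], a_i * b_j
            cost += C[i][j] * p
            if p > 0:
                if ref == 0:
                    return math.inf  # P not absolutely continuous w.r.t. a b^T
                kl += p * math.log(p / ref)
            kl += ref - p  # mass-correction term of the generalized KL
    return cost + eps * kl


def in_transport_polytope(P, a, b, tol=1e-9):
    """Check that P >= 0 and that its row/column sums match a and b, i.e. P in U(a, b)."""
    if any(p < 0 for row in P for p in row):
        return False
    rows_ok = all(abs(sum(row) - a_i) <= tol for row, a_i in zip(P, a))
    cols_ok = all(abs(sum(P[i][j] for i in range(len(a))) - b_j) <= tol
                  for j, b_j in enumerate(b))
    return rows_ok and cols_ok
```

With $\mathbf a=\mathbf b=(1/2,1/2)$ and $C=\begin{pmatrix}0&1\\1&0\end{pmatrix}$, the diagonal plan has zero transport cost but pays $\epsilon\log 2$ in the entropic term, whereas the independent coupling $\mathbf a\mathbf b^T$ pays transport cost $1/2$ and no entropic penalty. The diagonal plan therefore has the smaller objective for $\epsilon=0.1$ and the larger one for $\epsilon=1$, which illustrates how $\epsilon$ trades transport cost against smoothness.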

Characterizing the Solution

The solution to the discrete problem formulation is unique and has a special form.

Theorem

The solution $P\in \mathbb {R} ^{n\times m}$ to the discrete regularized Kantorovich problem with $\epsilon >0$ is unique and has the form $P_{ij}=u_{i}K_{ij}v_{j}$ for some $\mathbf {u} \in \mathbb {R} ^{n},\mathbf {v} \in \mathbb {R} ^{m}$ , where $K_{ij}=e^{-C_{ij}/\epsilon }$ .
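Imposing the two marginal constraints of $U(\mathbf a,\mathbf b)$ on $P_{ij}=u_iK_{ij}v_j$ gives $u_i(K\mathbf v)_i=a_i$ and $v_j(K^T\mathbf u)_j=b_j$. Sinkhorn's iteration alternately solves each of these equations for one scaling vector while holding the other fixed. The plain-Python sketch below (no NumPy, to stay dependency-free) follows that scheme. The function name `sinkhorn`, its `n_iter` and `tol` parameters, and the stopping rule based on the row-sum error are illustrative choices, not taken from this article.

```python
import math


def sinkhorn(a, b, C, eps, n_iter=1000, tol=1e-9):
    """Approximate the entropic OT plan by Sinkhorn's matrix-scaling iterations.

    Looks for P = diag(u) K diag(v) with K_ij = exp(-C_ij / eps) by alternately
    enforcing the row constraint (u = a / (K v)) and the column constraint
    (v = b / (K^T u)). Returns the plan P as an n x m list of lists.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Row update: afterwards diag(u) K diag(v) has row sums exactly a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column update: afterwards the column sums are exactly b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Columns now match b; stop once the row sums also match a.
        row_err = max(abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i])
                      for i in range(n))
        if row_err < tol:
            break
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

On $\mathbf a=\mathbf b=(1/2,1/2)$ with $C=\begin{pmatrix}0&1\\1&0\end{pmatrix}$ and $\epsilon=0.1$, every entry of the returned plan is strictly positive, whereas the unregularized optimal plan puts all of its mass on the diagonal. For very small $\epsilon$ the entries of $K$ can underflow to zero and the divisions above fail. Practical implementations often switch to log-domain updates for this reason; this sketch omits that.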
