Sinkhorn's Algorithm: Difference between revisions

Revision as of 10:26, 9 May 2020

Sinkhorn's Algorithm is an iterative numerical method used to obtain an optimal transport plan $\pi \in \Gamma (\alpha ,\beta )$ for the Kantorovich problem with entropic regularization in the case of finitely supported positive measures $\alpha ,\beta \in {\mathcal {M}}_{+}(X)$ .

Problem Formulation

Entropic regularization modifies the Kantorovich problem by adding a Kullback-Leibler divergence term to the optimization goal. Specifically, the general form of the problem is now to determine

L_{c}^{\epsilon }(\alpha ,\beta )=\inf _{\pi \in \Gamma (\alpha ,\beta )}\underbrace {\int _{X\times Y}c(x,y)\mathop {} \!\mathrm {d} \pi (x,y)} _{{\text{Kantorovich functional }}\mathbb {K} (\pi )}+\epsilon \underbrace {\operatorname {KL} (\pi \mid \alpha \otimes \beta )} _{\text{entropic term}}

where $\alpha \otimes \beta$ is the product measure of $\alpha$ and $\beta$ , and where

\operatorname {KL} (\mu \mid \nu )=\int _{X}\log \left({\frac {\mathrm {d} \mu }{\mathrm {d} \nu }}(x)\right)\mathop {} \!\mathrm {d} \mu (x)+\int _{X}(\mathop {} \!\mathrm {d} \mu (x)-\mathop {} \!\mathrm {d} \nu (x))

whenever the Radon-Nikodym derivative ${\tfrac {\mathrm {d} \mu }{\mathrm {d} \nu }}$ exists (i.e. when $\mu$ is absolutely continuous w.r.t. $\nu$ ) and $+\infty$ otherwise. This form of the KL divergence is applicable even when $\mu ,\nu \in {\mathcal {M}}_{+}(X)$ differ in total mass and it reduces to the standard definition whenever $\mu$ and $\nu$ have equal total mass. From this definition it immediately follows that for $\epsilon >0$ an optimal coupling $\pi ^{*}$ must be absolutely continuous w.r.t $\alpha \otimes \beta$ . As a result, the optimal plan is in some sense less singular and hence "smoothed out."

To apply Sinkhorn's algorithm to approximate $L_{c}^{\epsilon }(\alpha ,\beta )$ , it will be necessary to assume finite support so let $\alpha =\textstyle \sum _{i=1}^{n}a_{i}\delta _{x_{i}}$ and $\beta =\textstyle \sum _{j=1}^{m}b_{i}\delta _{y_{j}}$ and denote the corresponding vector of weights by $\mathbf {a} \in \mathbb {R} _{+}^{n}$ and $\mathbf {b} \in \mathbb {R} _{+}^{m}$ . Additionally let $C_{ij}=c(x_{i},y_{j})$ and denote the discrete version $\Gamma (\alpha ,\beta )$ by $U(a,b)=\{P\in \mathbb {R} ^{n\times m}\mid \textstyle \sum _{j}P_{ij}=a_{i},\textstyle \sum _{i}P_{ij}=b_{j}\}$ . This let's us write the entropic Kantorovich problem as

L_{c}^{\epsilon }(\mathbf {a} ,\mathbf {b} )=\inf _{P\in U(\mathbf {a} ,\mathbf {b} )}\sum _{i,j}C_{ij}P_{ij}+\epsilon \operatorname {KL} (P\mid \mathbf {a} \mathbf {b} ^{T})

@@ Line 1: / Line 1: @@
-Sinkhorn's Algorithm is an iterative numerical method used to obtain an optimal transport plan <math> \pi\in\gamma(\mu,\nu) </math> for the [[Kantorovich problem]] with [[entropic regularization]] in the case of finitely supported measures <math>\mu</math> and <math>\nu</math>.
+Sinkhorn's Algorithm is an iterative numerical method used to obtain an optimal transport plan <math> \pi\in\Gamma(\alpha,\beta) </math> for the [[Kantorovich problem]] with [[entropic regularization]] in the case of finitely supported positive measures <math>\alpha, \beta \in \mathcal M_+(X)</math>.
 ==Problem Formulation==
+Entropic regularization modifies the Kantorovich problem by adding a [https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence Kullback-Leibler] divergence term to the optimization goal. Specifically, the general form of the problem is now to determine
+:<math> L^\epsilon_c(\alpha,\beta)=\inf_{\pi\in\Gamma(\alpha,\beta)} \underbrace{\int_{X\times Y} c(x,y) \mathop{}\!\mathrm{d}\pi(x,y)}_{\text{Kantorovich functional } \mathbb K(\pi)} + \epsilon \underbrace{\operatorname{KL}(\pi\mid \alpha\otimes\beta)}_{\text{entropic term}}  </math>
+where <math>\alpha\otimes\beta</math> is the product measure of <math>\alpha</math> and <math>\beta</math>, and where
+:<math> \operatorname{KL}(\mu\mid\nu) = \int_X \log\left(\frac{\mathrm{d}\mu}{\mathrm{d}\nu}(x) \right) \mathop{}\!\mathrm{d}\mu(x) + \int_X (\mathop{}\!\mathrm{d}\mu(x) - \mathop{}\!\mathrm{d}\nu(x)) </math>
+whenever the Radon-Nikodym derivative <math> \tfrac{\mathrm{d}\mu}{\mathrm{d}\nu} </math> exists (i.e. when <math> \mu </math> is absolutely continuous w.r.t. <math> \nu </math>) and <math>+\infty</math> otherwise. This form of the KL divergence is applicable even when <math>\mu,\nu\in\mathcal M_+(X)</math> differ in total mass and it reduces to the standard definition whenever <math>\mu</math> and <math>\nu</math> have equal total mass. From this definition it immediately follows that for <math> \epsilon >0 </math> an optimal coupling <math> \pi^* </math> must be absolutely continuous w.r.t <math> \alpha\otimes\beta </math>. As a result, the optimal plan is in some sense less singular and hence "smoothed out."
+To apply Sinkhorn's algorithm to approximate <math> L^\epsilon_c(\alpha,\beta)</math>, it will be necessary to assume finite support so let <math> \alpha = \textstyle\sum_{i=1}^n a_i \delta_{x_i} </math> and <math> \beta = \textstyle\sum_{j=1}^m b_i \delta_{y_j} </math> and denote the corresponding vector of weights by <math> \mathbf{a}\in\mathbb R_+^n </math> and <math> \mathbf{b}\in\mathbb R_+^m </math>. Additionally let <math>C_{ij} = c(x_i, y_j) </math> and denote the discrete version <math> \Gamma(\alpha,\beta) </math> by <math> U(a,b)=\{ P\in\mathbb R^{n\times m} \mid \textstyle\sum_j P_{ij}=a_i, \textstyle\sum_i P_{ij}=b_j \} </math>. This let's us write the entropic Kantorovich problem as
+:<math> L^\epsilon_c(\mathbf{a},\mathbf{b}) = \inf_{P\in U(\mathbf{a},\mathbf{b})} \sum_{i,j} C_{ij} P_{ij} + \epsilon \operatorname{KL}(P \mid \mathbf{a}\mathbf{b}^T)</math>
 ==Intuition==

Sinkhorn's Algorithm: Difference between revisions

Revision as of 10:26, 9 May 2020

Problem Formulation

Intuition

Sinkhorn's Algorithm

Navigation menu

Search