Sinkhorn's Algorithm

Sinkhorn's Algorithm is an iterative numerical method used to obtain an optimal transport plan $\pi \in \Gamma (\alpha ,\beta )$ for the Kantorovich Problem with entropic regularization in the case of finitely supported positive measures $\alpha ,\beta \in {\mathcal {M}}_{+}(X)$ . For further reading see Peyré & Cuturi (pg. 62-73)^[1]

Continuous Problem Formulation

Entropic regularization modifies the Kantorovich problem by adding a Kullback-Leibler divergence term to the optimization goal. Specifically, the general form of the problem is now to determine

L_{c}^{\epsilon }(\alpha ,\beta )=\inf _{\pi \in \Gamma (\alpha ,\beta )}\underbrace {\int _{X\times Y}c(x,y)\mathop {} \!\mathrm {d} \pi (x,y)} _{{\text{Kantorovich functional }}\mathbb {K} (\pi )}+\epsilon \underbrace {\operatorname {KL} (\pi \mid \alpha \otimes \beta )} _{\text{entropic term}}

where $\alpha \otimes \beta$ is the product measure of $\alpha$ and $\beta$ , and where

\operatorname {KL} (\mu \mid \nu )=\int _{X}\log \left({\frac {\mathrm {d} \mu }{\mathrm {d} \nu }}(x)\right)\mathop {} \!\mathrm {d} \mu (x)+\int _{X}(\mathop {} \!\mathrm {d} \nu (x)-\mathop {} \!\mathrm {d} \mu (x))

whenever the Radon-Nikodym derivative ${\tfrac {\mathrm {d} \mu }{\mathrm {d} \nu }}$ exists (i.e. when $\mu$ is absolutely continuous w.r.t. $\nu$ ) and $+\infty$ otherwise. This form of the KL divergence is applicable even when $\mu ,\nu \in {\mathcal {M}}_{+}(X)$ differ in total mass and it reduces to the standard definition whenever $\mu$ and $\nu$ have equal total mass. From this definition it immediately follows that for $\epsilon >0$ an optimal coupling $\pi ^{*}$ must be absolutely continuous w.r.t $\alpha \otimes \beta$ . As a result, the optimal plan is in some sense less singular and hence "smoothed out."

Discrete Problem Formulation

To apply Sinkhorn's algorithm to approximate $L_{c}^{\epsilon }(\alpha ,\beta )$ , it will be necessary to assume finite support so let $\alpha =\textstyle \sum _{i=1}^{n}a_{i}\delta _{x_{i}}$ and $\beta =\textstyle \sum _{j=1}^{m}b_{i}\delta _{y_{j}}$ and denote the corresponding vector of weights by $\mathbf {a} \in \mathbb {R} _{+}^{n}$ and $\mathbf {b} \in \mathbb {R} _{+}^{m}$ . Additionally let $C_{ij}=c(x_{i},y_{j})$ and denote the discrete version of $\Gamma (\alpha ,\beta )$ by $U(a,b)=\{P\in \mathbb {R} ^{n\times m}\mid \textstyle \sum _{j}P_{ij}=a_{i},\textstyle \sum _{i}P_{ij}=b_{j}\}$ . This lets us write the entropic Kantorovich problem as

L_{c}^{\epsilon }(\mathbf {a} ,\mathbf {b} )=\inf _{P\in U(\mathbf {a} ,\mathbf {b} )}\sum _{i,j}C_{ij}P_{ij}+\epsilon \operatorname {KL} (P\mid \mathbf {a} \mathbf {b} ^{T})

where

\operatorname {KL} (P\mid \mathbf {a} \mathbf {b} ^{T})=\sum _{i,j}P_{ij}\log \left({\frac {P_{ij}}{a_{i}b_{j}}}\right)+a_{i}b_{j}-P_{i,j}

See here for more on discrete optimal transport.

Characterizing the Solution

The solution to the discrete problem formulation is unique and has a special form.

Theorem (Peyré & Cuturi, Proposition 4.3 on pg. 63)^[1]

The solution $P\in \mathbb {R} ^{n\times m}$ to discrete regularized Kantorovich problem is unique and has the form $P_{ij}=u_{i}K_{ij}v_{j}$ for some $\mathbf {u} \in \mathbb {R} ^{n},\mathbf {v} \in R^{m}$ where $K_{ij}=e^{-C_{ij}/\epsilon }$ . Moreover, $\mathbf {u}$ and $\mathbf {v}$ are unique up to multiplication and division by some scaling factor.