Sinkhorn's Algorithm is an iterative numerical method used to obtain an optimal transport plan
for the Kantorovich Problem with entropic regularization in the case of finitely supported positive measures
. For further reading see Peyré & Cuturi (pg. 62-73)[1]
Continuous Problem Formulation
Entropic regularization modifies the Kantorovich problem by adding a Kullback-Leibler divergence term to the optimization goal. Specifically, the general form of the problem is now to determine
data:image/s3,"s3://crabby-images/b67ce/b67cebc4a695dd7dbf6232bc49b72371db07741d" alt="{\displaystyle L_{c}^{\epsilon }(\alpha ,\beta )=\inf _{\pi \in \Gamma (\alpha ,\beta )}\underbrace {\int _{X\times Y}c(x,y)\mathop {} \!\mathrm {d} \pi (x,y)} _{{\text{Kantorovich functional }}\mathbb {K} (\pi )}+\epsilon \underbrace {\operatorname {KL} (\pi \mid \alpha \otimes \beta )} _{\text{entropic term}}}"
where
is the product measure of
and
, and where
data:image/s3,"s3://crabby-images/37944/379449788f32fd4d7f1062b77e4e35291533c8d0" alt="{\displaystyle \operatorname {KL} (\mu \mid \nu )=\int _{X}\log \left({\frac {\mathrm {d} \mu }{\mathrm {d} \nu }}(x)\right)\mathop {} \!\mathrm {d} \mu (x)+\int _{X}(\mathop {} \!\mathrm {d} \nu (x)-\mathop {} \!\mathrm {d} \mu (x))}"
whenever the Radon-Nikodym derivative
exists (i.e. when
is absolutely continuous w.r.t.
) and
otherwise. This form of the KL divergence is applicable even when
differ in total mass and it reduces to the standard definition whenever
and
have equal total mass. From this definition it immediately follows that for
an optimal coupling
must be absolutely continuous w.r.t
. As a result, the optimal plan is in some sense less singular and hence "smoothed out."
Discrete Problem Formulation
To apply Sinkhorn's algorithm to approximate
, it will be necessary to assume finite support so let
and
and denote the corresponding vector of weights by
and
. Additionally let
and denote the discrete version of
by
. This lets us write the entropic Kantorovich problem as
data:image/s3,"s3://crabby-images/a7177/a717715403f4956fae652c704858bc1578780d9d" alt="{\displaystyle L_{c}^{\epsilon }(\mathbf {a} ,\mathbf {b} )=\inf _{P\in U(\mathbf {a} ,\mathbf {b} )}\sum _{i,j}C_{ij}P_{ij}+\epsilon \operatorname {KL} (P\mid \mathbf {a} \mathbf {b} ^{T})}"
where
data:image/s3,"s3://crabby-images/769a3/769a3a3224d6b49fa91e901071aa4baa5b06ae88" alt="{\displaystyle \operatorname {KL} (P\mid \mathbf {a} \mathbf {b} ^{T})=\sum _{i,j}P_{ij}\log \left({\frac {P_{ij}}{a_{i}b_{j}}}\right)+a_{i}b_{j}-P_{i,j}}"
See here for more on discrete optimal transport.
Characterizing the Solution
The solution to the discrete problem formulation is unique and has a special form.
- Theorem (Peyré & Cuturi, Proposition 4.3 on pg. 63)[1]
- The solution
to discrete regularized Kantorovich problem is unique and has the form
for some
where
. Moreover,
and
are unique up to multiplication and division by some scaling factor.
Sinkhorn's Algorithm
Sinkhorn's algorithm takes advantage of the aforementioned characterization result to iteratively approximate the scaling factors
and
. The procedure is simple and only involves matrix-vector multiplication and entrywise division as follows
data:image/s3,"s3://crabby-images/89896/89896997f29a010549929d6536c9ed89019f4a63" alt="{\displaystyle u^{(\ell +1)}={\frac {\textbf {a}}{K{\textbf {v}}^{(\ell )}}}\quad {\text{and}}\quad v^{(\ell +1)}={\frac {\textbf {a}}{K^{T}{\textbf {u}}^{(\ell )}}}}"
Once a sufficient number of iterations
have been taken, we let
be our approximation of the optimal plan.
References