Entropic Regularization
Due to the convex structure of the Kantorovich problem, it is natural to turn to methods that, upon discretizing, exploit the fact that we aim to solve a linear programming problem. Upon replacing our measures with <math>\mu = \sum_{i}a_{i}\delta_{x_{i}}</math> and <math>\nu = \sum_{j}b_{j}\delta_{y_{j}}</math> for some appropriately determined and fixed weights <math>a_{i}, b_{j}</math>, and writing <math>c_{ij} = c(x_{i},y_{j})</math> for the cost, the resulting minimization problem becomes:

<math> \min \left( \sum_{i,j}c_{ij}\alpha_{ij} : \alpha_{ij} \geq 0, \sum_{i}\alpha_{ij} = b_{j}, \sum_{j}\alpha_{ij} = a_{i} \right) </math>
which we may approach using classical methods that exploit the structure of the resulting linear system.
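For concreteness, here is a minimal sketch of this linear program in Python, assuming NumPy and SciPy's <code>linprog</code> are available; the data and the variable names (<code>a</code>, <code>b</code>, <code>c</code>) are purely illustrative.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import linprog

# Small illustrative data: n source points x_i with weights a_i,
# m target points y_j with weights b_j, and quadratic cost c_ij.
n, m = 4, 5
rng = np.random.default_rng(0)
x = rng.normal(size=(n, 2))
y = rng.normal(size=(m, 2))
a = np.full(n, 1.0 / n)
b = np.full(m, 1.0 / m)
c = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2) ** 2

# Flatten alpha row-major and encode the marginal constraints
# sum_j alpha_ij = a_i and sum_i alpha_ij = b_j as equality rows.
A_rows = np.kron(np.eye(n), np.ones((1, m)))
A_cols = np.kron(np.ones((1, n)), np.eye(m))
A_eq = np.vstack([A_rows, A_cols])
b_eq = np.concatenate([a, b])

res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
alpha = res.x.reshape(n, m)   # discrete optimal transport plan
print(res.fun)                # optimal transport cost
</syntaxhighlight>

The two constraint blocks are redundant by one row (both marginals carry the same total mass), which the solver tolerates; for large <math>n, m</math> this generic approach becomes expensive, which is one motivation for the regularization below.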
The Entropic Regularization
In studying a slight variant of this problem, for fixed <math>\varepsilon > 0</math> we may now instead focus on tackling

<math> \min \left( \sum_{i,j}(c_{ij}\alpha_{ij} + \varepsilon \alpha_{ij}\log(\alpha_{ij})) : \alpha_{ij} \geq 0, \sum_{i}\alpha_{ij} = b_{j}, \sum_{j}\alpha_{ij} = a_{i} \right) </math>

which converges to the original problem as <math> \varepsilon \rightarrow 0.</math> Rewriting the objective function at hand by using

<math> c_{ij}\alpha_{ij} + \varepsilon \alpha_{ij}\log(\alpha_{ij}) = \varepsilon \alpha_{ij}\log\left(\frac{\alpha_{ij}}{\eta_{ij}}\right),\ \ \ \sum_{i,j}(c_{ij}\alpha_{ij} + \varepsilon \alpha_{ij}\log(\alpha_{ij})) = \varepsilon \textbf{KL}(\alpha| \eta), </math>

where <math>\eta_{ij} = e^{-c_{ij}/\varepsilon}</math> (so that <math>\varepsilon\log(\eta_{ij}) = -c_{ij}</math>, which makes the first identity immediate) and <math>\textbf{KL}</math> denotes the Kullback-Leibler divergence (based on the relative entropy).
Thus, from here, this minimization problem reads as a projection, with respect to this divergence, of the point <math>\eta</math> onto the set of constraints <math>\{\alpha : \alpha_{ij} \geq 0,\ \sum_{i}\alpha_{ij} = b_{j},\ \sum_{j}\alpha_{ij} = a_{i}\}.</math>

In particular, this Kullback-Leibler divergence has the property that the projection onto the intersection of two affine subspaces such as <math>C_{1} = \{\alpha : \sum_{j}\alpha_{ij} = a_{i}\}</math> and <math>C_{2} = \{\alpha : \sum_{i}\alpha_{ij} = b_{j}\}</math> can be achieved by alternate projections. This is done by defining <math>\alpha^{(2k+1)} = \mathrm{Proj}^{\textbf{KL}}_{C_{1}}(\alpha^{(2k)})</math> and <math>\alpha^{(2k+2)} = \mathrm{Proj}^{\textbf{KL}}_{C_{2}}(\alpha^{(2k+1)})</math> with <math>\alpha^{(0)} = \eta</math>, and looking at the limit as we take <math>k \rightarrow \infty</math>.
As such, the problem now becomes that of projecting onto one of these affine subspaces, which can be done explicitly. If we are given a point <math>\bar{\alpha}</math>, we wish to solve

<math> \min \left( \textbf{KL}(\alpha|\bar{\alpha}) : \sum_{j}\alpha_{ij} = a_{i} \right). </math>
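Concretely, with the convention <math>\textbf{KL}(\alpha|\bar{\alpha}) = \sum_{i,j}\alpha_{ij}\log\frac{\alpha_{ij}}{\bar{\alpha}_{ij}}</math> used above, a short sketch of this projection (the multipliers <math>\lambda_{i}</math> are merely illustrative names for the dual variables): setting the derivative of the Lagrangian to zero gives

<math> \frac{\partial}{\partial \alpha_{ij}}\left( \sum_{i,j}\alpha_{ij}\log\frac{\alpha_{ij}}{\bar{\alpha}_{ij}} - \sum_{i}\lambda_{i}\left(\sum_{j}\alpha_{ij} - a_{i}\right)\right) = \log\frac{\alpha_{ij}}{\bar{\alpha}_{ij}} + 1 - \lambda_{i} = 0, </math>

so <math>\alpha_{ij} = \bar{\alpha}_{ij}e^{\lambda_{i}-1}</math>, and the constraint <math>\sum_{j}\alpha_{ij} = a_{i}</math> forces <math>e^{\lambda_{i}-1} = a_{i}/\sum_{j'}\bar{\alpha}_{ij'}</math>. In other words, this projection simply rescales each row of <math>\bar{\alpha}</math> to have the prescribed row sum <math>a_{i}</math>; the projection onto the other constraint rescales the columns to have sums <math>b_{j}</math>.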
By alternating these explicit single-marginal projections, we can approximately solve the fully constrained problem with the following algorithm.
Algorithm
- Start from <math>\alpha^{(0)} = \eta</math>,
- define <math>\alpha^{(2k+1)}_{ij} = \alpha^{(2k)}_{ij}\,\frac{a_{i}}{\sum_{j'}\alpha^{(2k)}_{ij'}}</math>, so that the row sums match the <math>a_{i}</math>,
- then define <math>\alpha^{(2k+2)}_{ij} = \alpha^{(2k+1)}_{ij}\,\frac{b_{j}}{\sum_{i'}\alpha^{(2k+1)}_{i'j}}</math>, so that the column sums match the <math>b_{j}</math> (a sketch of these two rescaling steps appears below).
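A minimal sketch of these alternating rescalings in Python, assuming NumPy and the illustrative names <code>c</code>, <code>a</code>, <code>b</code> from the linear programming sketch above; the function name <code>ipfp</code> and the parameter values are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def ipfp(c, a, b, eps=0.5, n_iter=500):
    """Alternate KL projections on the full matrix alpha (illustrative sketch)."""
    alpha = np.exp(-c / eps)                      # alpha^(0) = eta_ij = exp(-c_ij / eps)
    for _ in range(n_iter):
        # project onto C_1: rescale rows so that sum_j alpha_ij = a_i
        alpha *= (a / alpha.sum(axis=1))[:, None]
        # project onto C_2: rescale columns so that sum_i alpha_ij = b_j
        alpha *= (b / alpha.sum(axis=0))[None, :]
    return alpha
</syntaxhighlight>

For small <math>\varepsilon</math> the entries of <math>e^{-c_{ij}/\varepsilon}</math> can underflow in floating point, so practical implementations often carry out these updates in the log domain; the plain form above is kept only to mirror the steps listed.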
The limit we observe as <math>k \rightarrow \infty</math> provides us with the solution to the regularized problem. This idea is called the iterative proportional fitting procedure (IPFP), also known as Sinkhorn's algorithm. In utilizing this entropic regularization, this algorithm lends itself to the following advantages:
- we may compute the projections explicitly at every step, since we only ever need to project onto <math>C_{1}</math> or <math>C_{2}</math> individually,
- it is quite memory-efficient, since the iterates are determined by cumulating products of the rescaling coefficients, thus allowing us to store only two vectors of scaling factors as opposed to the full matrix <math>\alpha^{(k)}</math> (see the sketch after this list),
- it can be easily parallelized given the separability of the structure at any given step.
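To make the memory point concrete, the same iteration can be rewritten, again as a hedged sketch under the assumptions above, so that only two scaling vectors <math>u</math> and <math>v</math> are updated and the plan is recovered as <math>u_{i}\eta_{ij}v_{j}</math> at the end; the function name is hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def ipfp_vectors(c, a, b, eps=0.5, n_iter=500):
    """Same alternating projections, storing only the scaling vectors u and v."""
    eta = np.exp(-c / eps)        # fixed kernel eta_ij = exp(-c_ij / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (eta @ v)         # enforce sum_j u_i * eta_ij * v_j = a_i
        v = b / (eta.T @ u)       # enforce sum_i u_i * eta_ij * v_j = b_j
    return u[:, None] * eta * v[None, :]
</syntaxhighlight>

Each iteration only updates the two vectors; the kernel <math>\eta</math> is never modified, and the two matrix-vector products are exactly the row and column operations that parallelize well.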