Fenchel-Rockafellar and Linear Programming

From Optimal Transport Wiki

The Fenchel-Rockafellar Theorem is a well-known result from convex analysis that establishes a minimax principle between convex functions and their convex conjugates under some regularity conditions. One fundamental application of this theorem is the characterization of the dual problem of a finite dimensional linear program.


== The Fenchel-Rockafellar Theorem ==
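Let <math>X</math> be a normed vector space with dual <math>X^*</math>, and let <math>\phi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> and <math>\psi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> be convex, lower semicontinuous and proper functions. Suppose that there exists <math>x_0 \in X</math> such that <math>\phi(x_0) < \infty</math> and <math>\psi(x_0) < \infty</math>, and that <math>\phi</math> is continuous at <math>x_0</math>. Then the following identity holds, and the supremum on the right-hand side is attained (hence it is written as a maximum):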


<math display="block"> \underset{x \in X}{\inf}\left\{ \phi(x) + \psi(x) \right\} = \underset{u \in X^*} {\max}\left\{-\phi^*(-u) - \psi^*(u) \right\}.</math>
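As an informal sanity check of the statement (not part of the proof), take the hypothetical one-dimensional example <math>X = X^* = \mathbb{R}</math> with <math>\langle x, u \rangle = xu</math>, <math>\phi(x) = x^2/2</math> and <math>\psi(x) = (x-a)^2/2</math> with <math>a = 2</math>. Here <math>\phi^*(u) = u^2/2</math> and <math>\psi^*(u) = u^2/2 + au</math>, so the dual objective is <math>-u^2 - au</math> and both sides equal <math>a^2/4</math>, with the maximum attained at <math>u = -a/2</math>. The Python sketch below approximates the conjugates by brute force on a finite grid; the example functions, grid bounds and step size are all assumptions chosen for illustration.

```python
# Hypothetical example: X = X* = R with pairing <x, u> = x * u.
a = 2.0  # shift parameter (an assumption chosen for illustration)


def phi(x: float) -> float:
    return 0.5 * x * x


def psi(x: float) -> float:
    return 0.5 * (x - a) ** 2


# Finite grids standing in for X and X*; bounds and step are arbitrary.
XS = [i / 20 for i in range(-200, 201)]  # [-10, 10], step 0.05
US = [j / 20 for j in range(-100, 101)]  # [-5, 5], step 0.05


def conjugate(f, u: float) -> float:
    """Grid approximation of f*(u) = sup_x { <x, u> - f(x) }."""
    return max(x * u - f(x) for x in XS)


def dual_objective(u: float) -> float:
    """-phi*(-u) - psi*(u), the function maximized on the right-hand side."""
    return -conjugate(phi, -u) - conjugate(psi, u)


primal = min(phi(x) + psi(x) for x in XS)
u_star = max(US, key=dual_objective)
dual = dual_objective(u_star)

# Expected: both values close to a**2 / 4 = 1.0, with u close to -a/2 = -1.0.
print(f"inf = {primal:.6f}, max = {dual:.6f}, attained at u = {u_star:.2f}")
```

Because the exact maximizers happen to lie on the grid in this example, the grid values agree with the closed form; for general functions a grid conjugate is only a lower bound on the true conjugate.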


=== Proof ===
We give a sketch of the proof and refer the reader to Brezis<ref name="Brezis" /> for the details. By Young's Inequality, for any <math>x \in X</math> and <math>u \in X^*</math> we have
<math display="block" >\phi(x) + \phi^*(-u) + \psi(x) + \psi^*(u) \geq \langle x, -u \rangle + \langle x, u \rangle = 0.</math>  
Hence the infimum on the left-hand side is greater than or equal to the supremum on the right-hand side. If the infimum is <math>-\infty</math>, then <math>-\phi^*(-u) - \psi^*(u) = -\infty</math> for every <math>u \in X^*</math>, so equality holds and every <math>u</math> realizes the supremum. Otherwise the infimum is finite, since it is at most <math>\phi(x_0) + \psi(x_0) < +\infty</math>; let <math>m</math> denote its value.
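For a concrete picture of this inequality, consider the hypothetical example <math>X = \mathbb{R}</math>, <math>\phi(x) = x^2/2</math> and <math>\psi(x) = (x-a)^2/2</math> with <math>a = 2</math>, whose conjugates are <math>\phi^*(u) = u^2/2</math> and <math>\psi^*(u) = u^2/2 + au</math>. Then <math>\phi(x) + \phi^*(-u) + \psi(x) + \psi^*(u) = (x - a/2)^2 + (u + a/2)^2</math>, which is nonnegative and vanishes exactly at the primal minimizer <math>x = a/2</math> and the dual maximizer <math>u = -a/2</math>. A minimal Python sketch (the example, the sampling range and the seed are assumptions) that samples random pairs and checks this:

```python
import random

a = 2.0  # hypothetical shift parameter


def phi(x: float) -> float:
    return 0.5 * x * x


def phi_star(u: float) -> float:
    return 0.5 * u * u  # conjugate of x^2 / 2


def psi(x: float) -> float:
    return 0.5 * (x - a) ** 2


def psi_star(u: float) -> float:
    return 0.5 * u * u + a * u  # conjugate of (x - a)^2 / 2


def young_gap(x: float, u: float) -> float:
    """phi(x) + phi*(-u) + psi(x) + psi*(u); Young's inequality says this is >= 0."""
    return phi(x) + phi_star(-u) + psi(x) + psi_star(u)


rng = random.Random(0)  # fixed seed so the run is reproducible
pairs = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(10_000)]
min_gap = min(young_gap(x, u) for x, u in pairs)
gap_at_optimum = young_gap(a / 2, -a / 2)  # x = a/2, u = -a/2: the gap closes

print(f"smallest sampled gap: {min_gap:.4g}, gap at the optimum: {gap_at_optimum}")
```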


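Let <math>A := \left\{(x, t) \in X \times \mathbb{R} : \phi(x) \leq t\right\}</math> be the epigraph of <math>\phi</math>, and observe that since <math>\phi</math> is continuous at <math>x_0</math>, the interior of <math>A</math> is nonempty. Likewise, let <math>B := \left\{(x, t) \in X \times \mathbb{R} : t \leq m - \psi(x)\right\}</math>. Both sets are convex, and <math>B \cap \text{int}A = \emptyset</math>: every point of <math>\text{int}A</math> satisfies <math>\phi(x) < t</math>, so a point in both sets would give <math>\phi(x) + \psi(x) < m</math>, contradicting the definition of <math>m</math>. So a [https://en.wikipedia.org/wiki/Hahn%E2%80%93Banach_theorem#Separation_of_sets geometric Hahn-Banach Theorem], applied to the open convex set <math>\text{int}A</math> and the convex set <math>B</math>, implies that there is some nonzero <math>(f, k) \in X^* \times \mathbb{R}</math> and <math>\alpha \in \mathbb{R}</math> such that <math display="block">\begin{cases} \langle x, f \rangle + kt \geq \alpha, & (x, t) \in A, \\ \langle x, f \rangle + kt \leq \alpha, & (x, t) \in B,\end{cases}</math> where the first inequality passes from <math>\text{int}A</math> to all of <math>A</math> because a convex set with nonempty interior lies in the closure of its interior.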
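Since <math>\phi</math> is finite at <math>x_0</math>, the points <math>(x_0, t)</math> with <math>t \geq \phi(x_0)</math> lie in <math>A</math>, and letting <math>t \to +\infty</math> shows that <math>k \geq 0</math>. If <math>k = 0</math>, continuity gives some <math>\varepsilon > 0</math> such that <math>\phi</math> is finite on the ball of radius <math>\varepsilon</math> around <math>x_0</math>, so <math>\langle x_0 + \varepsilon z, f \rangle \geq \alpha</math> whenever <math>\lVert z \rVert \leq 1</math>; since also <math>(x_0, m - \psi(x_0)) \in B</math>, we have <math>\langle x_0, f \rangle \leq \alpha</math>. Hence <math>\langle z, f \rangle \geq 0</math> for all <math>\lVert z \rVert \leq 1</math>, which forces <math>\lVert f \rVert = 0</math> and contradicts <math>(f, k) \neq 0</math>. Thus <math>k > 0</math>, and evaluating the two inequalities at the points <math>(x, \phi(x)) \in A</math> and <math>(x, m - \psi(x)) \in B</math>, together with the definitions of <math>\phi^*</math> and <math>\psi^*</math>, we conclude that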
<math display="block"> \phi^*\left(-\frac f k\right) \leq -\frac \alpha k, \quad \text{and} \quad \psi^*\left(\frac f k\right) \leq \frac \alpha k - m.</math>
Adding these two bounds gives <math display="block">-\phi^*\left(-\frac f k\right) - \psi^*\left(\frac f k\right) \geq m.</math>
Hence the supremum on the right-hand side is at least <math>m</math>. Together with the reverse inequality from Young's Inequality, this yields equality, and the supremum is attained at <math>u = f/k</math>.
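The separation step can be made concrete in the hypothetical one-dimensional example <math>\phi(x) = x^2/2</math>, <math>\psi(x) = (x-a)^2/2</math> with <math>a = 2</math>, where <math>m = a^2/4 = 1</math>. Working by hand for this example only, the line <math>t = x - 1/2</math>, i.e. <math>(f, k) = (-1, 1)</math> and <math>\alpha = -1/2</math>, separates <math>A</math> from <math>B</math>, since <math>\phi(x) - (x - 1/2) = (x-1)^2/2 \geq 0</math> and <math>(x - 1/2) - (m - \psi(x)) = (x-1)^2/2 \geq 0</math>. The Python sketch below checks the two separation inequalities on a grid and then the conjugate bounds at <math>u = f/k = -1</math>; the separating data are hand-computed assumptions, not the output of a Hahn-Banach construction.

```python
a = 2.0
m = a * a / 4  # value of the infimum in this example
f, k, alpha = -1.0, 1.0, -0.5  # hand-computed separating data (assumption)


def phi(x: float) -> float:
    return 0.5 * x * x


def psi(x: float) -> float:
    return 0.5 * (x - a) ** 2


def phi_star(u: float) -> float:
    return 0.5 * u * u  # conjugate of x^2 / 2


def psi_star(u: float) -> float:
    return 0.5 * u * u + a * u  # conjugate of (x - a)^2 / 2


xs = [i / 100 for i in range(-1000, 1001)]  # sample points in [-10, 10]

# With k > 0, <x, f> + k t is smallest over A at t = phi(x) and largest over B
# at t = m - psi(x), so for each sampled x only these two points need checking.
lowest_on_A = min(x * f + k * phi(x) for x in xs)
highest_on_B = max(x * f + k * (m - psi(x)) for x in xs)

u = f / k
bound_phi = phi_star(-u) <= -alpha / k   # phi*(-f/k) <= -alpha/k
bound_psi = psi_star(u) <= alpha / k - m  # psi*(f/k) <= alpha/k - m
dual_value = -phi_star(-u) - psi_star(u)  # should be >= m

print(lowest_on_A >= alpha, highest_on_B <= alpha, bound_phi, bound_psi, dual_value)
```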




== Application to Linear Programs ==


Let <math>A_1 \in \mathbb{R}^{p \times m}</math>, <math>A_2 \in \mathbb{R}^{q\times n}</math>, <math>b_1 \in \mathbb{R}^{p}</math>, <math>b_2 \in \mathbb{R}^{q}</math>, <math>c_1 \in \mathbb{R}^{m}</math> and <math>c_2 \in \mathbb{R}^{n}</math>, and consider the following finite-dimensional linear program<ref name="Rockafellar" />


<math display="block">
\mathcal{P} = \quad \begin{aligned}
& {\inf}
& & \langle c_1, x \rangle + \langle c_2, y \rangle \\
& \text{subject to} & & A_1 x \geq b_1 \\
&  & & A_2 y = b_2 \\
&  & & x \geq 0
\end{aligned}
</math>




where <math>x \in \mathbb{R}^{m}</math> and <math>y \in \mathbb{R}^{n}</math>. Moreover, suppose that there exists at least one feasible point <math>(\tilde{x}, \tilde{y}) \in \mathbb{R}^{m} \times \mathbb{R}^{n}</math> such that <math>A_1 \tilde{x} > b_1</math>, <math>A_2 \tilde{y} = b_2</math> and <math>\tilde{x} > 0</math> (inequalities between vectors are understood componentwise). Then the dual problem of <math>\mathcal{P}</math> is given by


<math display="block">
\mathcal{D}= \quad  \begin{aligned}
& {\text{        }\max}
& & \langle  b_1 , \alpha  \rangle + \langle b_2 , \beta \rangle \\
& \text{subject to} & &  A_1^T \alpha \leq c_1 \\
&  & &  A_2^T \beta = c_2 \\
&  & &  \alpha \geq 0 \\
\end{aligned}
</math>
where <math>\alpha \in \mathbb{R}^{p}</math> and <math>\beta \in \mathbb{R}^{q}</math>. Furthermore, <math>\mathcal{D}</math> attains its optimal value, and the optimal values of the two programs coincide, which we write as <math>\mathcal{P} = \mathcal{D}</math>.
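The statement can be illustrated on a small made-up instance (<math>p = m = 2</math>, <math>q = 1</math>, <math>n = 2</math>; all data below are assumptions for illustration). The Python sketch does not solve either program; it checks that a hand-computed primal point and dual point are feasible and have the same objective value. For any feasible pair, <math>\langle c_1, x \rangle + \langle c_2, y \rangle \geq \langle A_1^T \alpha, x \rangle + \langle A_2^T \beta, y \rangle \geq \langle b_1, \alpha \rangle + \langle b_2, \beta \rangle</math>, so equal values certify that both points are optimal. It also checks that <math>\tilde{x} = (10, 10)</math>, <math>\tilde{y} = (1, 1)</math> satisfies the strict feasibility assumption.

```python
# Hypothetical data: p = 2, m = 2, q = 1, n = 2.
A1 = [[1.0, 2.0], [3.0, 1.0]]
b1 = [4.0, 6.0]
c1 = [1.0, 1.0]
A2 = [[1.0, 1.0]]
b2 = [2.0]
c2 = [1.0, 1.0]
TOL = 1e-9


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def primal_feasible(x, y, strict=False):
    """A1 x >= b1, A2 y = b2, x >= 0 (strict in the two inequalities if strict=True)."""
    slack = TOL if strict else -TOL
    return (all(xi > slack for xi in x)
            and all(r - bi > slack for r, bi in zip(matvec(A1, x), b1))
            and all(abs(r - bi) <= TOL for r, bi in zip(matvec(A2, y), b2)))


def dual_feasible(alpha, beta):
    """A1^T alpha <= c1, A2^T beta = c2, alpha >= 0."""
    return (all(ai >= -TOL for ai in alpha)
            and all(r <= ci + TOL for r, ci in zip(matvec(transpose(A1), alpha), c1))
            and all(abs(r - ci) <= TOL for r, ci in zip(matvec(transpose(A2), beta), c2)))


# Strictly feasible point required by the theorem.
x_tilde, y_tilde = [10.0, 10.0], [1.0, 1.0]
# Candidate optima, computed by hand for this instance.
x, y = [1.6, 1.2], [1.0, 1.0]
alpha, beta = [0.4, 0.2], [1.0]

primal_value = dot(c1, x) + dot(c2, y)
dual_value = dot(b1, alpha) + dot(b2, beta)
print(primal_feasible(x_tilde, y_tilde, strict=True),
      primal_feasible(x, y), dual_feasible(alpha, beta),
      primal_value, dual_value)
```

A production check would of course call an LP solver; the certificate approach is used here only to keep the sketch dependency-free.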
=== Proof ===
Note that the linear program can be written equivalently as
<math display="block">
  \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\inf} \left\{ \langle c_1, x \rangle + \langle c_2, y \rangle + \Chi_{ \mathbb{R}^m_+ }(x) + \Chi_{ \mathbb{R}^p_+ }(A_1 x - b_1) +  \Chi_{\{0\}} (b_2 - A_2 y) \right\}
</math>
where, for a set <math>S</math>, <math>\Chi_S</math> denotes the indicator (characteristic) function of <math>S</math> in the sense of convex analysis, i.e. <math>\Chi_S(z) = 0</math> if <math>z \in S</math> and <math>\Chi_S(z) = +\infty</math> otherwise. Further, define <math>\phi(x,y) = \langle c_1, x \rangle + \langle c_2, y \rangle + \Chi_{ \mathbb{R}^m_+ }(x)</math> and <math>\varphi(x,y) = \Chi_{ \mathbb{R}^p_+ }(A_1 x - b_1) + \Chi_{\{0\}} (b_2 - A_2 y)</math>. Both <math>\phi</math> and <math>\varphi</math> are convex, lower semicontinuous and proper, and <math>\phi(\tilde{x},\tilde{y}), \varphi(\tilde{x},\tilde{y}) < \infty</math>. Moreover, since <math>\tilde{x} > 0</math>, the point <math>(\tilde{x},\tilde{y})</math> lies in the interior of <math>\mathbb{R}^m_+ \times \mathbb{R}^n</math>, where <math>\phi</math> is affine, so <math>\phi</math> is continuous at <math>(\tilde{x},\tilde{y})</math>. Therefore, by the Fenchel-Rockafellar Theorem (with <math>\varphi</math> in the role of <math>\psi</math>) we obtain
<math display="block">
    \underset{w \in \mathbb{R}^{m},t \in \mathbb{R}^{n}}{\max} \left\{ -\phi^\star(-w,-t) - \varphi^\star(w,t)  \right\} = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\inf} \left\{ \langle c_1, x \rangle + \langle c_2, y \rangle + \Chi_{ \mathbb{R}^m_+ }(x) + \Chi_{ \mathbb{R}^p_+ }(A_1 x - b_1) +  \Chi_{\{0\}} (b_2 - A_2 y) \right\}.
</math>
For the convex conjugate of <math>\phi</math> we have
<math display="block">
    \begin{align} \phi^\star(-w,-t) & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ -\langle x, w+c_1 \rangle - \langle y, t+c_2 \rangle - \Chi_{ \mathbb{R}^m_+ }(x)  \right\} \\ & = \underset{ x \in \mathbb{R}^{m}_{+},y \in \mathbb{R}^{n} }{\sup} \left\{ -\langle x, w+c_1 \rangle - \langle y, t+c_2 \rangle  \right\} \\ & = \Chi_{ \mathbb{R}^m_+ }(w+c_1) + \Chi_{\{0\}}(t+c_2).
\end{align}
</math>
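The last equality says that the supremum defining <math>\phi^\star(-w,-t)</math> is <math>0</math> when <math>w + c_1 \geq 0</math> and <math>t + c_2 = 0</math>, and <math>+\infty</math> otherwise. One way to see this numerically is to restrict <math>x</math> to the box <math>[0,R]^m</math> and <math>y</math> to <math>[-R,R]^n</math>: the supremum then splits coordinatewise into <math>R\left(\textstyle\sum_i \max\{0, -(w+c_1)_i\} + \sum_j |t_j + (c_2)_j|\right)</math>, which stays <math>0</math> in the first case and grows linearly in <math>R</math> otherwise. The Python sketch below compares this closed form with brute-force enumeration of the box vertices; the box truncation and the sample vectors are assumptions for illustration.

```python
from itertools import product


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sup_on_box(w, t, c1, c2, R):
    """Brute-force sup of <x, -(w + c1)> + <y, -(t + c2)> over x in [0, R]^m, y in [-R, R]^n.

    A linear function attains its maximum over a box at a vertex, so enumerating
    the 2^(m+n) vertices is exact (fine for the tiny dimensions used here).
    """
    g = [-(wi + ci) for wi, ci in zip(w, c1)]
    h = [-(ti + ci) for ti, ci in zip(t, c2)]
    return max(dot(xv, g) + dot(yv, h)
               for xv in product((0.0, R), repeat=len(g))
               for yv in product((-R, R), repeat=len(h)))


def closed_form(w, t, c1, c2, R):
    """R * (sum_i max(0, -(w + c1)_i) + sum_j |t_j + c2_j|)."""
    return R * (sum(max(0.0, -(wi + ci)) for wi, ci in zip(w, c1))
                + sum(abs(ti + ci) for ti, ci in zip(t, c2)))


c1, c2 = [1.0, 1.0], [1.0, 1.0]  # hypothetical cost vectors (m = n = 2)
cases = {
    "w + c1 >= 0 and t + c2 = 0": ([0.5, -1.0], [-1.0, -1.0]),
    "w + c1 has a negative entry": ([-2.0, 0.0], [-1.0, -1.0]),
    "t + c2 is nonzero": ([0.0, 0.0], [0.0, -1.0]),
}
RADII = (1.0, 10.0, 100.0)
results = {name: [sup_on_box(w, t, c1, c2, R) for R in RADII]
           for name, (w, t) in cases.items()}
for name, values in results.items():
    # The first case stays at 0; the other two grow linearly in R.
    print(f"{name}: {values}")
```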
For the convex conjugate of <math>\varphi</math>, writing each indicator function as a supremum of linear functions, we obtain
<math display="block">
    \begin{align} \varphi^\star(w,t) & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \langle x, w \rangle + \langle y, t \rangle - \Chi_{ \mathbb{R}^p_+ }(A_1 x - b_1) -  \Chi_{\{0\}} (b_2 - A_2 y)  \right\}
\\ & =  \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \langle x, w \rangle + \langle y, t \rangle - \underset{ \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} }{\sup} \left\{ \langle  b_1 - A_1 x ,\alpha \rangle + \langle  b_2 - A_2 y , \beta \rangle \right\}  \right\}
\\ & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \underset{ \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} }{\inf} \left\{ \langle x, w \rangle + \langle y, t \rangle + \langle  A_1 x - b_1 , \alpha  \rangle + \langle A_2 y - b_2 , \beta \rangle \right\}  \right\}
\\ & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \underset{ \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} }{\inf} \left\{ \langle x, w+ A_1^T \alpha \rangle + \langle y, t+A_2^T\beta \rangle - \langle  b_1 , \alpha  \rangle - \langle b_2 , \beta \rangle \right\}  \right\}
\end{align}
</math>
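Notice that <math>\varphi^\star(w,t)</math> is finite if and only if <math>w = -A_1^T \alpha</math> and <math>t = -A_2^T\beta</math> for some <math>\alpha \in \mathbb{R}^{p}_+</math> and <math>\beta \in \mathbb{R}^{q}</math>, in which case its value is the infimum of <math>-\langle b_1, \alpha \rangle - \langle b_2, \beta \rangle</math> over all such <math>(\alpha, \beta)</math>. Since <math>\phi^\star(-w,-t) = \Chi_{ \mathbb{R}^m_+ }(w+c_1) + \Chi_{\{0\}}(t+c_2)</math>, substituting <math>w = -A_1^T \alpha</math> and <math>t = -A_2^T\beta</math> into the Fenchel-Rockafellar identity above gives the following problem; its two indicator terms encode the constraints <math>A_1^T \alpha \leq c_1</math> and <math>A_2^T \beta = c_2</math>, so it is exactly the dual linear program <math>\mathcal{D}</math>.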
<math display="block">
  \underset{\alpha \in \mathbb{R}^{p}_+,\beta \in \mathbb{R}^{q}}{\max} \left\{ \langle  b_1 , \alpha  \rangle + \langle b_2 , \beta \rangle - \Chi_{\mathbb{R}^m_+} (c_1-A_1^T \alpha) - \Chi_{\{0\}} (A_2^T \beta -c_2) \right\}
</math>
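The sign in <math>w = -A_1^T \alpha</math>, <math>t = -A_2^T\beta</math> matters. For such <math>(w,t)</math> with <math>\alpha \geq 0</math>, every <math>(x,y)</math> with <math>A_1 x \geq b_1</math> and <math>A_2 y = b_2</math> satisfies <math>\langle x, w \rangle + \langle y, t \rangle = -\langle A_1 x, \alpha \rangle - \langle A_2 y, \beta \rangle \leq -\langle b_1, \alpha \rangle - \langle b_2, \beta \rangle</math>, so <math>\varphi^\star(w,t)</math> is finite, whereas with the opposite sign the <math>x</math>-part can grow without bound. A minimal Python sketch on a small made-up instance (<math>p = m = 2</math>, <math>q = 1</math>, <math>n = 2</math>; the data, the chosen <math>\alpha, \beta</math>, the sampling box and the seed are all assumptions) checks the bound on random feasible points and shows the growth along the feasible ray <math>(10, 10) + r(1, 1)</math>.

```python
import random

# Hypothetical instance: p = m = 2, q = 1, n = 2.
A1 = [[1.0, 2.0], [3.0, 1.0]]
b1 = [4.0, 6.0]
A2 = [[1.0, 1.0]]
b2 = [2.0]
alpha, beta = [0.4, 0.2], [1.0]  # any alpha >= 0 and any beta would do


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


w = [-v for v in matvec(transpose(A1), alpha)]  # w = -A1^T alpha
t = [-v for v in matvec(transpose(A2), beta)]   # t = -A2^T beta
bound = -dot(b1, alpha) - dot(b2, beta)

rng = random.Random(1)  # fixed seed so the run is reproducible
worst = float("-inf")
for _ in range(20_000):
    x = [rng.uniform(-50, 50), rng.uniform(-50, 50)]
    if all(r >= bi for r, bi in zip(matvec(A1, x), b1)):  # keep x with A1 x >= b1
        s = rng.uniform(-50, 50)
        y = [1.0 + s, 1.0 - s]  # for this A2, b2 every solution of A2 y = b2 has this form
        worst = max(worst, dot(x, w) + dot(y, t))
print(worst <= bound + 1e-9)

# Opposite sign, w = +A1^T alpha: the x-part grows along the feasible ray x0 + r * (1, 1).
w_wrong = matvec(transpose(A1), alpha)
x0 = [10.0, 10.0]
growth = [dot([x0[0] + r, x0[1] + r], w_wrong) for r in (0.0, 10.0, 100.0)]
print(growth)
```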


== References ==

Latest revision as of 12:22, 5 March 2022
