|
|
(One intermediate revision by one other user not shown) |
Line 1: |
Line 1: |
| The Fenchel-Rockafellar Theorem is a minimax principle especially suited to converting between [[Fenchel-Moreau and Primal/Dual Optimization Problems#Applications to Primal/Dual Optimization Problems|primal and dual]] [//en.wikipedia.org/wiki/Linear_programming linear programs]. | | The Fenchel-Rockafellar Theorem is a well-known result from convex analysis that establishes a minimax principle between convex functions and their convex conjugates under some regularity conditions. One fundamental application of this theorem is the characterization of the dual problem of a finite dimensional linear program. |
|
| |
|
| = The Fenchel-Rockafellar Theorem =
| |
|
| |
|
| Suppose <math>\phi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> and <math>\psi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> are convex, lower semicontinuous, proper functions which are both finite at some <math>x_0 \in X</math>, and <math>\phi</math> is continuous there.
| | == The Fenchel-Rockafellar Theorem == |
| Then <math display="block">\inf\left\{\phi(x) + \psi(x) : x \in X\right\} = \max\left\{-\phi^*(-u) - \psi^*(u) : u \in X^*\right\},</math> where the existence of the minimizer on the right hand side is part of the theorem.
| |
|
| |
|
| == Proof Sketch ==
| | Let <math>\phi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> and <math>\psi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> be convex, lower semicontinuous and proper functions. Suppose that there exists <math>x_0 \in X</math> such that <math> \phi(x_0),\varphi(x_0) < \infty</math>, where <math> \phi </math> is continuous at <math> x_0 </math>. Then, it holds that |
|
| |
|
| Details may be found in Brezis' text<ref name="Brezis" />, but a sketch of the proof follows.
| | <math display="block"> \underset{x \in X}{\inf}\left\{ \phi(x) + \psi(x) \right\} = \underset{u \in X^*} {\max}\left\{-\phi^*(-u) - \psi^*(u) \right\}.</math> |
|
| |
|
| By Young's Inequality, for any <math>x \in X</math> and <math>u \in X^*</math>, we have <math>\phi(x) + \phi^*(-u) + \psi(x) + \psi^*(u) \leq \langle x, -u \rangle + \langle x, u \rangle = 0</math>, so the infimum on the left is less than or equal to the supremum on the right.
| | === Proof === |
| If the supremum is <math>-\infty</math>, equality must be obtained and every <math>u \in X^*</math> must realize it; otherwise assume <math>-\phi(-u) - \psi(u) > -\infty</math> everywhere.
| |
| (INCOMPLETE)
| |
|
| |
|
| | We provide a sketch of the proof, the reader is referred to Brezis<ref name="Brezis" /> for further reading. Note that, by Young's Inequality, for any <math >x \in X</math> and <math>u \in X^*</math> we have |
| | <math display="block" >\phi(x) + \phi^*(-u) + \psi(x) + \psi^*(u) \geq \langle x, -u \rangle + \langle x, u \rangle = 0.</math> |
| | Hence, the infimum of the left hand side is greater than or equal to the supremum of the right hand side. If the infimum is <math>-\infty</math>, equality must be obtained, and for every <math>u \in X^*</math> the supremum is realized; otherwise assume <math>\phi(x) + \psi(x) > -\infty</math> for all <math> x \in X </math>, and let <math>m</math> be the value of the infimum. |
|
| |
|
| | Let <math> A := \left\{(x, t) : \phi(x) \leq t\right\},</math> and observe that since <math>\phi</math> is continuous at <math>x_0</math>, the interior of <math>C</math> is nonempty. Likewise, let <math> B := \left\{(x, t) : t \leq m - \psi(x)\right\}</math>. Both sets are convex, and by construction we have <math>B \cap \text{int}A = \emptyset</math>. So a [https://en.wikipedia.org/wiki/Hahn%E2%80%93Banach_theorem#Separation_of_sets geometric Hahn-Banach Theorem] implies that there is some nonzero <math>(f, k) \in X^* \times \mathbb{R}</math> and <math>\alpha \in \mathbb{R}</math> such that <math display="block">\begin{cases} \langle x, f \rangle + kt \geq \alpha, & (x, t) \in A, \\ \langle x, f \rangle + kt \leq \alpha, & (x, t) \in B.\end{cases}</math> |
| | Since <math>\phi</math> is finite at <math>x_0</math>, letting <math>t \to +\infty</math> shows that <math>k</math> is nonnegative, and if <math>k = 0</math>, continuity and joint finiteness imply <math>\lVert f \rVert = 0</math>, a contradiction, so <math>k > 0</math>. Thus, we can use the inequalities above with the definitions of <math>\phi^*</math> and <math>\psi^*</math> to conclude that |
| | <math display="block"> \phi^*\left(-\frac f k\right) \leq -\frac \alpha k, \quad \text{and} \quad \psi^*\left(\frac f k\right) \leq \frac \alpha k - m.</math> |
| | Which implies that <math display="block">-\phi^*\left(-\frac f k\right) - \psi^*\left(\frac f k\right) \geq m.</math> |
| | Finally, since the supremum includes this term, it must also be greater than or equal to the infimum, which yields their equality. |
|
| |
|
| = Application to Linear Programs =
| |
|
| |
|
| Consider the linear program<ref name="Rockafellar" /> <math display="block">\inf\left\{\langle c, x \rangle + \theta_Y(b-Ax) : x \in X\right\},</math> where <math>X = \mathbb{R}^r_+ \times \mathbb{R}^{n-r}</math>, <math>Y = \mathbb{R}^s_+ \times \mathbb{R}^{m-s}</math>, and <math>\theta_Z(u) := \sup\left\{\langle z, u \rangle : z \in Z\right\} = \chi_Z^*(u)</math>.
| | == Application to Linear Programs == |
| Then if we put <math display="block">\phi(x) + \psi(x) := \left(\langle c, x \rangle + \chi_X(x)\right) + \theta_Y(b - Ax),</math> (noting that <math>\langle c, x \rangle + \chi_X(x)</math> and <math>\theta_Y(b-Ax)</math> are both convex lower semicontinuous proper functions), we are interested in computing <math>\phi^*(u) = \sup\left\{\langle x, u - c \rangle : x \in X \right\} = \theta_X(u-c)</math> and <math>\psi^*(u) = \sup\left\{\langle x, u \rangle - \sup\left\{ \langle y, b-Ax \rangle : y \in Y \right\} : z \in \mathbb{R}^n \right\}</math>.
| |
| By writing the latter as <math>\psi^*(u) = \sup\left\{\inf\left\{\langle x, u + A^*y \rangle - \langle y, b \rangle : y \in Y\right \} : z \in \mathbb{R}^n \right\}</math>, we can see that we have a finite value of <math>\psi^*(u)</math> if and only if <math>u = -A^*y_0</math> for some <math>y_0 \in Y</math>, and the value is then the infimum of <math>-\langle y, b \rangle</math> where <math>A^*y = A^*y_0</math>.
| |
| Putting this together, the Fenchel-Rockafellar Theorem gives that <math display="block">\inf\left\{\langle c, x \rangle + \theta_Y(b-Ax) : x \in X\right\} = \max\left\{\langle y, b \rangle - \theta_X(A^*y - c) : y \in Y\right\}, </math> provided we can find a point of continuity and mutual finiteness for <math>\phi</math> and <math>\psi</math>.
| |
|
| |
|
| To do this, we use our knowledge of <math>X</math> and <math>Y</math> to be more precise about the <math>\theta</math> functions.
| | Let <math> A_1 \in \mathbb{R}^{p \times m} </math>,<math> A_2 \in \mathbb{R}^{q\times n} </math>,<math> b_1 \in \mathbb{R}^{p} </math>, <math> b_2 \in \mathbb{R}^{q} </math>,<math> c_1 \in \mathbb{R}^{m} </math> and <math> c_2 \in \mathbb{R}^{m} </math> and consider the following finite dimensional linear program<ref name="Rockafellar" /> |
| In particular, if we write <math>(\pi_1^Y, \pi_2^Y) : Y \to \mathbb{R}^s \times \mathbb{R}^{m-s}</math> and <math>(\pi_1^X, \pi_2^X) : X \to \mathbb{R}^r \times \mathbb{R}^{n-r}</math> for the projections, and put <math>(b_1, b_2) = (\pi_1^Y, \pi_2^Y)b</math>, <math>(A_1, A_2) = (\pi_1^Y, \pi_2^Y) \circ A</math>, <math>(c_1, c_2) = (\pi_1^X, \pi_2^X)c</math>, and <math>(A_1^*, A_2^*) = (\pi_1^X, \pi_2^X) \circ A^*</math> (which do satisfy <math>A_i^* = (A_i)^*</math>, justifying the notation) we get (writing inequalities termwise) <math display="block">\theta_Y(b-Ax) = \begin{cases} 0, & b_1 \leq A_1x \land b_2 = A_2x, \\ +\infty, & \text{else}, \end{cases} \qquad \theta_X(A^*y - c) = \begin{cases} 0, & c_1 \geq A_1^*y \land c_2 = A_2^*y \\ +\infty, & \text{else}.\end{cases}</math>
| |
| So, if some point in the interior of <math>X</math> is mapped strictly greater than <math>b_1</math> by <math>A_1</math> and onto <math>b_2</math> by <math>A_2</math>, we may write the duality result in the more customary way, <math display="block"> \inf_{x \in X, A_1x \geq b_1, A_2x = b_2} \langle c, x \rangle = \max_{y \in Y, A_1^*y \leq c_1, A_2^*y = c_2} \langle y, b \rangle.</math>
| |
|
| |
|
|
| |
|
| | <math display="block"> |
| | \mathcal{P} = \quad \begin{aligned} |
| | & {\text{ }\inf} |
| | & & \langle c_1, x \rangle + \langle c_2, y \rangle \\ |
| | & \text{subject to} & & A_1 x \geq b_1 \\ |
| | & & & A_2 y = b_2 \\ |
| | & & & x \geq 0 \\ |
| | \end{aligned} |
| | </math> |
| | |
| | |
| | where <math> x \in \mathbb{R}^{m} </math> and <math> y \in \mathbb{R}^{n} </math>. Moreover, let us suppose that there exist at least one feasible solution <math> \tilde{x} \in \mathbb{R}^{m},\tilde{y} \in \mathbb{R}^{n} </math> such that <math> A_1 \tilde{x} > b_1 </math> <math> A_2 \tilde{y} = b_2 </math> and <math> \tilde{x} > 0 </math>. Then, dual problem of <math> \mathcal{P} </math> is given by |
| | |
| | <math display="block"> |
| | \mathcal{D}= \quad \begin{aligned} |
| | & {\text{ }\max} |
| | & & \langle b_1 , \alpha \rangle + \langle b_2 , \beta \rangle \\ |
| | & \text{subject to} & & A_1^T \alpha \leq c_1 \\ |
| | & & & A_2^T \beta = c_2 \\ |
| | & & & \alpha \geq 0 \\ |
| | \end{aligned} |
| | </math> |
| | |
| | where <math> \alpha \in \mathbb{R}^{p} </math> and <math> \beta \in \mathbb{R}^{q} </math>. Furthermore, <math> \mathcal{D} </math> attains its optimal value and we have <math> \mathcal{P} = \mathcal{D} </math>. |
| | |
| | ===Proof=== |
| | |
| | |
| | Note that, the linear program may be equivalently written as |
| | |
| | <math display="block"> |
| | \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\inf} \left\{ \langle c_1, x \rangle + \langle c_2, y \rangle + \Chi_{ \mathbb{R}_+ }(x) + \Chi_{ \mathbb{R}_+ }(b_1 - A_1 x) + \Chi_{\{0\}} (b_2 - A_2 y) \right\} |
| | </math> |
| | |
| | where <math> \Chi_A </math> denotes the characteristic function of <math> A \subseteq \mathbb{R}^n </math>. Further, let us define <math> \phi(x,y) = \langle c_1, x \rangle + \langle c_2, y \rangle + \Chi_{ \mathbb{R}_+ }(x) </math>, and <math> \varphi(x,y) = \Chi_{ \mathbb{R}_+ }(b_1 - A_1 x) + \Chi_{\{0\}} (b_2 - A_2 y) </math>. Note that, both <math> \phi </math> and <math> \varphi </math> are convex, lower semicontinuous and proper functions, where we have <math> \phi ( \tilde{x},\tilde{y}),\varphi(\tilde{x},\tilde{y}) < \infty </math>. Moreover, since we have <math> \tilde{x} > 0 </math>, we observe that <math> \phi </math> is continuous at <math> (\tilde{x},\tilde{y}) </math>. Therefore, by the Fenchel-Rockafellar Theorem we obtain |
| | |
| | <math display="block"> |
| | \underset{w \in \mathbb{R}^{m},t \in \mathbb{R}^{n}}{\max} \left\{ -\phi^\star(-w,-t) - \varphi^\star(w,t) \right\} = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\inf} \left\{ \langle c_1, x \rangle + \langle c_2, y \rangle + \Chi_{ \mathbb{R}_+ }(x) + \Chi_{ \mathbb{R}_+ }(b_1 - A_1 x) + \Chi_{\{0\}} (b_2 - A_2 y) \right\}. |
| | </math> |
| | |
| | In terms of convex conjugate of <math> \phi </math> we have |
| | |
| | <math display="block"> |
| | \begin{align} \phi^\star(-w,-t) & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ -\langle x, w+c_1 \rangle - \langle y, t+c_2 \rangle - \Chi_{ \mathbb{R}_+ }(x) \right\} \\ & = \underset{ x \in \mathbb{R}^{m}_{-},y \in \mathbb{R}^{n} }{\sup} \left\{ -\langle x, w+c_1 \rangle - \langle y, t+c_2 \rangle \right\} \\ & = \Chi_{ \mathbb{R}_+ }(w+c_1) + \Chi_{\{0\}}(t+c_2). |
| | |
| | \end{align} |
| | </math> |
| | |
| | Furthermore, for the convex conjugate of <math> \varphi </math> we observe |
| | |
| | <math display="block"> |
| | \begin{align} \varphi^\star(w,t) & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \langle x, w \rangle + \langle y, t \rangle - \Chi_{ \mathbb{R}_+ }(b_1 - A_1 x) - \Chi_{\{0\}} (b_2 - A_2 y) \right\} |
| | \\ & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \langle x, w \rangle + \langle y, t \rangle - \underset{ \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} }{\sup} \left\{ \langle b_1 - A_1 x ,\alpha \rangle + \langle b_2 - A_2 y , \beta \rangle \right\} \right\} |
| | \\ & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \underset{ \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} }{\inf} \left\{ \langle x, w \rangle + \langle y, t \rangle + \langle A_1 x - b_1 , \alpha \rangle + \langle A_2 y - b2 , \beta \rangle \right\} \right\} |
| | \\ & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \underset{ \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} }{\inf} \left\{ \langle x, w+ A_1^T \alpha \rangle + \langle y, t+A_2^T\beta \rangle - \langle b_1 , \alpha \rangle - \langle b_2 , \beta \rangle \right\} \right\} |
| | \end{align} |
| | </math> |
| | |
| | Notice that, <math> \varphi^\star </math> is finite if and only if we have <math> w = A_1^T \alpha </math> and <math> t = A_2^T\beta </math> for some <math> \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} </math>. Therefore, as <math> \phi^\star = \Chi_{ \mathbb{R}_+ }(w+c_1) + \Chi_{\{0\}}(t+c_2)</math>, by combining these two results with the Fenchel-Rockafellar Theorem we obtain the dual linear program as follows. |
| | |
| | <math display="block"> |
| | \underset{\alpha \in \mathbb{R}^{p}_+,\beta \in \mathbb{R}^{q}}{\max} \left\{ \langle b_1 , \alpha \rangle + \langle b_2 , \beta \rangle - \Chi_{\mathbb{R}_+} (c_1-A_1^T \alpha) - \Chi_{\{0\}} (A_2^T \beta -c_2) \right\} |
| | </math> |
|
| |
|
| = References = | | = References = |
The Fenchel-Rockafellar Theorem is a well-known result from convex analysis that establishes a minimax principle between convex functions and their convex conjugates under some regularity conditions. One fundamental application of this theorem is the characterization of the dual problem of a finite dimensional linear program.
The Fenchel-Rockafellar Theorem
Let and be convex, lower semicontinuous and proper functions. Suppose that there exists such that , where is continuous at . Then, it holds that
Proof
We provide a sketch of the proof, the reader is referred to Brezis[1] for further reading. Note that, by Young's Inequality, for any and we have
Hence, the infimum of the left hand side is greater than or equal to the supremum of the right hand side. If the infimum is
, equality must be obtained, and for every
the supremum is realized; otherwise assume
for all
, and let
be the value of the infimum.
Let and observe that since is continuous at , the interior of is nonempty. Likewise, let . Both sets are convex, and by construction we have . So a geometric Hahn-Banach Theorem implies that there is some nonzero and such that
Since
is finite at
, letting
shows that
is nonnegative, and if
, continuity and joint finiteness imply
, a contradiction, so
. Thus, we can use the inequalities above with the definitions of
and
to conclude that
Which implies that
Finally, since the supremum includes this term, it must also be greater than or equal to the infimum, which yields their equality.
Application to Linear Programs
Let ,,, , and and consider the following finite dimensional linear program[2]
where and . Moreover, let us suppose that there exist at least one feasible solution such that and . Then, dual problem of is given by
where and . Furthermore, attains its optimal value and we have .
Proof
Note that, the linear program may be equivalently written as
where denotes the characteristic function of . Further, let us define , and . Note that, both and are convex, lower semicontinuous and proper functions, where we have . Moreover, since we have , we observe that is continuous at . Therefore, by the Fenchel-Rockafellar Theorem we obtain
In terms of convex conjugate of we have
Furthermore, for the convex conjugate of we observe
Notice that, is finite if and only if we have and for some . Therefore, as , by combining these two results with the Fenchel-Rockafellar Theorem we obtain the dual linear program as follows.
References