|
|
(One intermediate revision by one other user not shown) |
Line 1: |
Line 1: |
| The Fenchel-Rockafellar Theorem is a minimax principle especially suited to converting between [[Fenchel-Moreau and Primal/Dual Optimization Problems#Applications to Primal/Dual Optimization Problems|primal and dual]] [//en.wikipedia.org/wiki/Linear_programming linear programs]. | | The Fenchel-Rockafellar Theorem is a well-known result from convex analysis that establishes a minimax principle between convex functions and their convex conjugates under some regularity conditions. One fundamental application of this theorem is the characterization of the dual problem of a finite dimensional linear program. |
|
| |
|
| = The Fenchel-Rockafellar Theorem =
| |
|
| |
|
| Suppose <math>\phi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> and <math>\psi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> are convex, lower semicontinuous, proper functions which are both finite at some <math>x_0 \in X</math>, and <math>\phi</math> is continuous there.
| | == The Fenchel-Rockafellar Theorem == |
| Then <math display="block">\inf\left\{\phi(x) + \psi(x) : x \in X\right\} = \max\left\{-\phi^*(-u) - \psi^*(u) : u \in X^*\right\},</math> where the existence of the minimizer on the right hand side is part of the theorem.
| |
|
| |
|
| == Proof Sketch ==
| | Let <math>\phi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> and <math>\psi : X \to \mathbb{R} \cup \left\{+\infty\right\}</math> be convex, lower semicontinuous and proper functions. Suppose that there exists <math>x_0 \in X</math> such that <math> \phi(x_0),\varphi(x_0) < \infty</math>, where <math> \phi </math> is continuous at <math> x_0 </math>. Then, it holds that |
|
| |
|
| Details may be found in Brezis' text<ref name="Brezis" />, but a sketch of the proof follows.
| | <math display="block"> \underset{x \in X}{\inf}\left\{ \phi(x) + \psi(x) \right\} = \underset{u \in X^*} {\max}\left\{-\phi^*(-u) - \psi^*(u) \right\}.</math> |
|
| |
|
| By Young's Inequality, for any <math>x \in X</math> and <math>u \in X^*</math>, we have <math>\phi(x) + \phi^*(-u) + \psi(x) + \psi^*(u) \leq \langle x, -u \rangle + \langle x, u \rangle = 0</math>, so the infimum on the left is less than or equal to the supremum on the right.
| | === Proof === |
| If the supremum is <math>-\infty</math>, equality must be obtained and every <math>u \in X^*</math> must realize it; otherwise assume <math>-\phi(-u) - \psi(u) > -\infty</math> everywhere.
| |
| (INCOMPLETE)
| |
|
| |
|
| | We provide a sketch of the proof, the reader is referred to Brezis<ref name="Brezis" /> for further reading. Note that, by Young's Inequality, for any <math >x \in X</math> and <math>u \in X^*</math> we have |
| | <math display="block" >\phi(x) + \phi^*(-u) + \psi(x) + \psi^*(u) \geq \langle x, -u \rangle + \langle x, u \rangle = 0.</math> |
| | Hence, the infimum of the left hand side is greater than or equal to the supremum of the right hand side. If the infimum is <math>-\infty</math>, equality must be obtained, and for every <math>u \in X^*</math> the supremum is realized; otherwise assume <math>\phi(x) + \psi(x) > -\infty</math> for all <math> x \in X </math>, and let <math>m</math> be the value of the infimum. |
|
| |
|
| | Let <math> A := \left\{(x, t) : \phi(x) \leq t\right\},</math> and observe that since <math>\phi</math> is continuous at <math>x_0</math>, the interior of <math>C</math> is nonempty. Likewise, let <math> B := \left\{(x, t) : t \leq m - \psi(x)\right\}</math>. Both sets are convex, and by construction we have <math>B \cap \text{int}A = \emptyset</math>. So a [https://en.wikipedia.org/wiki/Hahn%E2%80%93Banach_theorem#Separation_of_sets geometric Hahn-Banach Theorem] implies that there is some nonzero <math>(f, k) \in X^* \times \mathbb{R}</math> and <math>\alpha \in \mathbb{R}</math> such that <math display="block">\begin{cases} \langle x, f \rangle + kt \geq \alpha, & (x, t) \in A, \\ \langle x, f \rangle + kt \leq \alpha, & (x, t) \in B.\end{cases}</math> |
| | Since <math>\phi</math> is finite at <math>x_0</math>, letting <math>t \to +\infty</math> shows that <math>k</math> is nonnegative, and if <math>k = 0</math>, continuity and joint finiteness imply <math>\lVert f \rVert = 0</math>, a contradiction, so <math>k > 0</math>. Thus, we can use the inequalities above with the definitions of <math>\phi^*</math> and <math>\psi^*</math> to conclude that |
| | <math display="block"> \phi^*\left(-\frac f k\right) \leq -\frac \alpha k, \quad \text{and} \quad \psi^*\left(\frac f k\right) \leq \frac \alpha k - m.</math> |
| | Which implies that <math display="block">-\phi^*\left(-\frac f k\right) - \psi^*\left(\frac f k\right) \geq m.</math> |
| | Finally, since the supremum includes this term, it must also be greater than or equal to the infimum, which yields their equality. |
|
| |
|
| = Application to Linear Programs =
| |
|
| |
|
| Consider the linear program<ref name="Rockafellar" /> <math display="block">\inf\left\{\langle c, x \rangle + \theta_Y(b-Ax) : x \in X\right\},</math> where <math>X = \mathbb{R}^r_+ \times \mathbb{R}^{n-r}</math>, <math>Y = \mathbb{R}^s_+ \times \mathbb{R}^{m-s}</math>, and <math>\theta_Z(u) := \sup\left\{\langle z, u \rangle : z \in Z\right\} = \chi_Z^*(u)</math>.
| | == Application to Linear Programs == |
| Then if we put <math display="block">\phi(x) + \psi(x) := \left(\langle c, x \rangle + \chi_X(x)\right) + \theta_Y(b - Ax),</math> (noting that <math>\langle c, x \rangle + \chi_X(x)</math> and <math>\theta_Y(b-Ax)</math> are both convex lower semicontinuous proper functions), we are interested in computing <math>\phi^*(u) = \sup\left\{\langle x, u - c \rangle : x \in X \right\} = \theta_X(u-c)</math> and <math>\psi^*(u) = \sup\left\{\langle x, u \rangle - \sup\left\{ \langle y, b-Ax \rangle : y \in Y \right\} : z \in \mathbb{R}^n \right\}</math>.
| |
| By writing the latter as <math>\psi^*(u) = \sup\left\{\inf\left\{\langle x, u + A^*y \rangle - \langle y, b \rangle : y \in Y\right \} : z \in \mathbb{R}^n \right\}</math>, we can see that we have a finite value of <math>\psi^*(u)</math> if and only if <math>u = -A^*y_0</math> for some <math>y_0 \in Y</math>, and the value is then the infimum of <math>-\langle y, b \rangle</math> where <math>A^*y = A^*y_0</math>.
| |
| Putting this together, the Fenchel-Rockafellar Theorem gives that <math display="block">\inf\left\{\langle c, x \rangle + \theta_Y(b-Ax) : x \in X\right\} = \max\left\{\langle y, b \rangle - \theta_X(A^*y - c) : y \in Y\right\}, </math> provided we can find a point of continuity and mutual finiteness for <math>\phi</math> and <math>\psi</math>.
| |
|
| |
|
| To do this, we use our knowledge of <math>X</math> and <math>Y</math> to be more precise about the <math>\theta</math> functions.
| | Let <math> A_1 \in \mathbb{R}^{p \times m} </math>,<math> A_2 \in \mathbb{R}^{q\times n} </math>,<math> b_1 \in \mathbb{R}^{p} </math>, <math> b_2 \in \mathbb{R}^{q} </math>,<math> c_1 \in \mathbb{R}^{m} </math> and <math> c_2 \in \mathbb{R}^{m} </math> and consider the following finite dimensional linear program<ref name="Rockafellar" /> |
| In particular, if we write <math>(\pi_1^Y, \pi_2^Y) : Y \to \mathbb{R}^s \times \mathbb{R}^{m-s}</math> and <math>(\pi_1^X, \pi_2^X) : X \to \mathbb{R}^r \times \mathbb{R}^{n-r}</math> for the projections, and put <math>(b_1, b_2) = (\pi_1^Y, \pi_2^Y)b</math>, <math>(A_1, A_2) = (\pi_1^Y, \pi_2^Y) \circ A</math>, <math>(c_1, c_2) = (\pi_1^X, \pi_2^X)c</math>, and <math>(A_1^*, A_2^*) = (\pi_1^X, \pi_2^X) \circ A^*</math> (which do satisfy <math>A_i^* = (A_i)^*</math>, justifying the notation) we get (writing inequalities termwise) <math display="block">\theta_Y(b-Ax) = \begin{cases} 0, & b_1 \leq A_1x \land b_2 = A_2x, \\ +\infty, & \text{else}, \end{cases} \qquad \theta_X(A^*y - c) = \begin{cases} 0, & c_1 \geq A_1^*y \land c_2 = A_2^*y \\ +\infty, & \text{else}.\end{cases}</math>
| |
| So, if some point in the interior of <math>X</math> is mapped strictly greater than <math>b_1</math> by <math>A_1</math> and onto <math>b_2</math> by <math>A_2</math>, we may write the duality result in the more customary way, <math display="block"> \inf_{x \in X, A_1x \geq b_1, A_2x = b_2} \langle c, x \rangle = \max_{y \in Y, A_1^*y \leq c_1, A_2^*y = c_2} \langle y, b \rangle.</math>
| |
|
| |
|
|
| |
|
| | <math display="block"> |
| | \mathcal{P} = \quad \begin{aligned} |
| | & {\text{ }\inf} |
| | & & \langle c_1, x \rangle + \langle c_2, y \rangle \\ |
| | & \text{subject to} & & A_1 x \geq b_1 \\ |
| | & & & A_2 y = b_2 \\ |
| | & & & x \geq 0 \\ |
| | \end{aligned} |
| | </math> |
| | |
| | |
| | where <math> x \in \mathbb{R}^{m} </math> and <math> y \in \mathbb{R}^{n} </math>. Moreover, let us suppose that there exist at least one feasible solution <math> \tilde{x} \in \mathbb{R}^{m},\tilde{y} \in \mathbb{R}^{n} </math> such that <math> A_1 \tilde{x} > b_1 </math> <math> A_2 \tilde{y} = b_2 </math> and <math> \tilde{x} > 0 </math>. Then, dual problem of <math> \mathcal{P} </math> is given by |
| | |
| | <math display="block"> |
| | \mathcal{D}= \quad \begin{aligned} |
| | & {\text{ }\max} |
| | & & \langle b_1 , \alpha \rangle + \langle b_2 , \beta \rangle \\ |
| | & \text{subject to} & & A_1^T \alpha \leq c_1 \\ |
| | & & & A_2^T \beta = c_2 \\ |
| | & & & \alpha \geq 0 \\ |
| | \end{aligned} |
| | </math> |
| | |
| | where <math> \alpha \in \mathbb{R}^{p} </math> and <math> \beta \in \mathbb{R}^{q} </math>. Furthermore, <math> \mathcal{D} </math> attains its optimal value and we have <math> \mathcal{P} = \mathcal{D} </math>. |
| | |
| | ===Proof=== |
| | |
| | |
| | Note that, the linear program may be equivalently written as |
| | |
| | <math display="block"> |
| | \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\inf} \left\{ \langle c_1, x \rangle + \langle c_2, y \rangle + \Chi_{ \mathbb{R}_+ }(x) + \Chi_{ \mathbb{R}_+ }(b_1 - A_1 x) + \Chi_{\{0\}} (b_2 - A_2 y) \right\} |
| | </math> |
| | |
| | where <math> \Chi_A </math> denotes the characteristic function of <math> A \subseteq \mathbb{R}^n </math>. Further, let us define <math> \phi(x,y) = \langle c_1, x \rangle + \langle c_2, y \rangle + \Chi_{ \mathbb{R}_+ }(x) </math>, and <math> \varphi(x,y) = \Chi_{ \mathbb{R}_+ }(b_1 - A_1 x) + \Chi_{\{0\}} (b_2 - A_2 y) </math>. Note that, both <math> \phi </math> and <math> \varphi </math> are convex, lower semicontinuous and proper functions, where we have <math> \phi ( \tilde{x},\tilde{y}),\varphi(\tilde{x},\tilde{y}) < \infty </math>. Moreover, since we have <math> \tilde{x} > 0 </math>, we observe that <math> \phi </math> is continuous at <math> (\tilde{x},\tilde{y}) </math>. Therefore, by the Fenchel-Rockafellar Theorem we obtain |
| | |
| | <math display="block"> |
| | \underset{w \in \mathbb{R}^{m},t \in \mathbb{R}^{n}}{\max} \left\{ -\phi^\star(-w,-t) - \varphi^\star(w,t) \right\} = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\inf} \left\{ \langle c_1, x \rangle + \langle c_2, y \rangle + \Chi_{ \mathbb{R}_+ }(x) + \Chi_{ \mathbb{R}_+ }(b_1 - A_1 x) + \Chi_{\{0\}} (b_2 - A_2 y) \right\}. |
| | </math> |
| | |
| | In terms of convex conjugate of <math> \phi </math> we have |
| | |
| | <math display="block"> |
| | \begin{align} \phi^\star(-w,-t) & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ -\langle x, w+c_1 \rangle - \langle y, t+c_2 \rangle - \Chi_{ \mathbb{R}_+ }(x) \right\} \\ & = \underset{ x \in \mathbb{R}^{m}_{-},y \in \mathbb{R}^{n} }{\sup} \left\{ -\langle x, w+c_1 \rangle - \langle y, t+c_2 \rangle \right\} \\ & = \Chi_{ \mathbb{R}_+ }(w+c_1) + \Chi_{\{0\}}(t+c_2). |
| | |
| | \end{align} |
| | </math> |
| | |
| | Furthermore, for the convex conjugate of <math> \varphi </math> we observe |
| | |
| | <math display="block"> |
| | \begin{align} \varphi^\star(w,t) & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \langle x, w \rangle + \langle y, t \rangle - \Chi_{ \mathbb{R}_+ }(b_1 - A_1 x) - \Chi_{\{0\}} (b_2 - A_2 y) \right\} |
| | \\ & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \langle x, w \rangle + \langle y, t \rangle - \underset{ \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} }{\sup} \left\{ \langle b_1 - A_1 x ,\alpha \rangle + \langle b_2 - A_2 y , \beta \rangle \right\} \right\} |
| | \\ & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \underset{ \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} }{\inf} \left\{ \langle x, w \rangle + \langle y, t \rangle + \langle A_1 x - b_1 , \alpha \rangle + \langle A_2 y - b2 , \beta \rangle \right\} \right\} |
| | \\ & = \underset{ x \in \mathbb{R}^{m},y \in \mathbb{R}^{n} }{\sup} \left\{ \underset{ \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} }{\inf} \left\{ \langle x, w+ A_1^T \alpha \rangle + \langle y, t+A_2^T\beta \rangle - \langle b_1 , \alpha \rangle - \langle b_2 , \beta \rangle \right\} \right\} |
| | \end{align} |
| | </math> |
| | |
| | Notice that, <math> \varphi^\star </math> is finite if and only if we have <math> w = A_1^T \alpha </math> and <math> t = A_2^T\beta </math> for some <math> \alpha \in \mathbb{R}^{p}_+ , \beta \in \mathbb{R}^{q} </math>. Therefore, as <math> \phi^\star = \Chi_{ \mathbb{R}_+ }(w+c_1) + \Chi_{\{0\}}(t+c_2)</math>, by combining these two results with the Fenchel-Rockafellar Theorem we obtain the dual linear program as follows. |
| | |
| | <math display="block"> |
| | \underset{\alpha \in \mathbb{R}^{p}_+,\beta \in \mathbb{R}^{q}}{\max} \left\{ \langle b_1 , \alpha \rangle + \langle b_2 , \beta \rangle - \Chi_{\mathbb{R}_+} (c_1-A_1^T \alpha) - \Chi_{\{0\}} (A_2^T \beta -c_2) \right\} |
| | </math> |
|
| |
|
| = References = | | = References = |
The Fenchel-Rockafellar Theorem is a well-known result from convex analysis that establishes a minimax principle between convex functions and their convex conjugates under some regularity conditions. One fundamental application of this theorem is the characterization of the dual problem of a finite dimensional linear program.
The Fenchel-Rockafellar Theorem
Let
and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \psi : X \to \mathbb{R} \cup \left\{+\infty\right\}}
be convex, lower semicontinuous and proper functions. Suppose that there exists
such that
, where
is continuous at
. Then, it holds that

Proof
We provide a sketch of the proof, the reader is referred to Brezis[1] for further reading. Note that, by Young's Inequality, for any Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x \in X}
and
we have

Hence, the infimum of the left hand side is greater than or equal to the supremum of the right hand side. If the infimum is

, equality must be obtained, and for every

the supremum is realized; otherwise assume

for all

, and let

be the value of the infimum.
Let
and observe that since
is continuous at
, the interior of
is nonempty. Likewise, let
. Both sets are convex, and by construction we have
. So a geometric Hahn-Banach Theorem implies that there is some nonzero
and
such that

Since

is finite at
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_0}
, letting
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle t \to +\infty}
shows that

is nonnegative, and if

, continuity and joint finiteness imply

, a contradiction, so

. Thus, we can use the inequalities above with the definitions of

and
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \psi^*}
to conclude that

Which implies that

Finally, since the supremum includes this term, it must also be greater than or equal to the infimum, which yields their equality.
Application to Linear Programs
Let
,
,
,
,
and
and consider the following finite dimensional linear program[2]

where
and
. Moreover, let us suppose that there exist at least one feasible solution
such that
and
. Then, dual problem of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathcal{P} }
is given by

where
and
. Furthermore,
attains its optimal value and we have
.
Proof
Note that, the linear program may be equivalently written as

where
denotes the characteristic function of
. Further, let us define
, and
. Note that, both
and
are convex, lower semicontinuous and proper functions, where we have
. Moreover, since we have
, we observe that
is continuous at
. Therefore, by the Fenchel-Rockafellar Theorem we obtain

In terms of convex conjugate of
we have

Furthermore, for the convex conjugate of
we observe

Notice that,
is finite if and only if we have
and
for some
. Therefore, as
, by combining these two results with the Fenchel-Rockafellar Theorem we obtain the dual linear program as follows.

References