# Appendix B Convex functions

Free energies $\mathcal{F}$ of models of interacting lattice clusters are convex functions (as shown in theorem 3.3). This has certain consequences; namely, that $\mathcal{F}$ is differentiable almost everywhere (if it is finite), and finite size approximations to the free energy of an infinite system may be shown to converge to the free energy almost everywhere. The convexity of $\mathcal{F}$ is related to the stability of the thermodynamic equilibrium of a system.

If *f* is a convex function, then $-f$ is a concave function, and with appropriate reinterpretation, all results for convex functions apply to concave functions. There are numerous classical and more recent references on convex functions; see, for example, the book by GH Hardy, JE Littlewood and G Polya [274], or references [43, 497], for more details.

# B.1 Convex functions and the midpoint condition

An extended real-valued function $f:\mathbb{R}\to \mathbb{R}\cup \{\infty\}$ is *convex* on $I=[a,b]$ if, for each $0\le \lambda\le 1$, and $a\le x<y\le b$,

(B.1)$$f\left(\lambda x+(1-\lambda)y\right)\le \lambda f(x)+(1-\lambda)f(y).$$

This reduces to the *midpoint condition*

(B.2)$$f\left(\tfrac{1}{2}(x+y)\right)\le \tfrac{1}{2}\left(f(x)+f(y)\right)$$

if $\lambda=\frac{1}{2}$.
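The convexity inequality $f(\lambda x+(1-\lambda)y)\le \lambda f(x)+(1-\lambda)f(y)$ can be probed numerically on a grid. The following is a minimal sketch only: the function name `violates_convexity`, the grid sizes and the tolerance are assumptions made for illustration, and a grid test can refute convexity but never prove it.

```python
import math

def violates_convexity(f, a, b, n_points=41, n_lambdas=11, tol=1e-12):
    """Return the first grid triple (x, y, lam) where the inequality fails, else None."""
    xs = [a + (b - a) * i / (n_points - 1) for i in range(n_points)]
    lams = [j / (n_lambdas - 1) for j in range(n_lambdas)]
    for i, x in enumerate(xs):
        for y in xs[i + 1:]:
            for lam in lams:
                lhs = f(lam * x + (1 - lam) * y)
                rhs = lam * f(x) + (1 - lam) * f(y)
                if lhs > rhs + tol:          # convexity inequality violated
                    return (x, y, lam)
    return None

# x^2 is convex on [-1, 1]; sin is concave (hence not convex) on [0, pi].
print(violates_convexity(lambda x: x * x, -1.0, 1.0))
print(violates_convexity(math.sin, 0.0, math.pi) is not None)
```

Setting `n_lambdas=3` restricts the test to $\lambda\in\{0,\frac{1}{2},1\}$, which is the midpoint condition on the grid.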

Convexity may also be defined in terms of the epigraph of *f*. This is the subset in $\mathbb{R}\times \mathbb{R}$ defined by

(B.3)$$\text{epi}f=\left\{(x,y)\in \mathbb{R}\times \mathbb{R}\mid y\ge f(x)\right\}.$$

The function *f* is *closed* if $\text{epi}f$ is a closed set and is convex if and only if $\text{epi}f$ is a convex set in $\mathbb{R}^{2}$, as may be seen immediately from the definition of convexity in equation (B.1). Observe that, if *f* is a lower semicontinuous extended real-valued function, then *f* is necessarily closed. (A function *f* is lower semicontinuous at a point $x_0$ if $\liminf_{x\to x_0}f(x)\ge f(x_0)$.)

Denote the Lebesgue (outer)-measure by *μ*. In the next theorem convex functions are shown to be continuous and thus Lebesgue measurable.

Theorem B.1

Suppose that *f* is convex on a closed interval *I*. If *f* is bounded in *I*, then *f* is continuous on *I*, except perhaps at the endpoints of *I*.

Proof

Put $I=[a,b]$. Then *f* is bounded in $(a,b)$, say $\left|f\right|<C$ in $(a,b)$.

Let $x\in (a,b)$. Choose $n>m$ in $\mathbb{N}$, and $\delta>0$ a small real number, such that $x\pm n\delta\in (a,b)$. Then

$$f(x+m\delta)=f\left(\frac{1}{n}\left(m(x+n\delta)+(n-m)x\right)\right)\le \frac{m}{n}f(x+n\delta)+\frac{n-m}{n}f(x).$$

This can be rearranged into

$$\frac{1}{n}\left(f(x+n\delta)-f(x)\right)\ge \frac{1}{m}\left(f(x+m\delta)-f(x)\right),$$

and, if $\delta\to -\delta$, then

$$\frac{1}{m}\left(f(x)-f(x-m\delta)\right)\ge \frac{1}{n}\left(f(x)-f(x-n\delta)\right).$$

Put $m=1$ and note that $f(x+\delta)+f(x-\delta)\ge 2f(x)$, and $\left|f\right|<C$; then

$$\frac{1}{n}\left(C-f(x)\right)\ge f(x+\delta)-f(x)\ge f(x)-f(x-\delta)\ge \frac{1}{n}\left(f(x)-C\right).$$

Next, take $\delta\to 0$ and $n\to \infty$ such that $x\pm n\delta\in (a,b)$, and $n\delta\to 0$. Then, by the squeeze theorem for limits, *f* is left- and right-continuous at *x*.
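The squeeze used in this proof, $\frac{1}{n}(C-f(x))\ge f(x+\delta)-f(x)\ge f(x)-f(x-\delta)\ge \frac{1}{n}(f(x)-C)$, can be checked numerically. The sketch below is illustrative only: the helper name `continuity_chain`, the test function $f(t)=t^2$ on $(-1,1)$ with $C=1$, and the choice $\delta=0.75/n$ are all assumptions of the example.

```python
def continuity_chain(f, x, delta, n, C):
    """Return the four terms of the squeeze inequality, from left to right."""
    upper = (C - f(x)) / n
    right_step = f(x + delta) - f(x)
    left_step = f(x) - f(x - delta)
    lower = (f(x) - C) / n
    return upper, right_step, left_step, lower

f = lambda t: t * t          # convex, with |f| < C = 1 on (-1, 1)
x, C = 0.2, 1.0

for n in (4, 40, 400):
    delta = 0.75 / n         # keeps x +/- n*delta inside (-1, 1)
    upper, right_step, left_step, lower = continuity_chain(f, x, delta, n, C)
    assert upper >= right_step >= left_step >= lower
    print(n, upper, right_step, left_step, lower)
```

As *n* grows the outer bounds shrink like $1/n$, which forces both one-sided increments of *f* at *x* to zero.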

Theorem B.2

Suppose that *f* satisfies the midpoint condition on a closed interval *I*. If *f* is bounded above in some open interval $J\subset I$, then *f* is convex and bounded in *J*, and $f(x)=+\infty$ in $I\setminus J$.

Proof

The proof proceeds by generalising the midpoint condition through proving the following two claims.

CLAIM: If $\lambda_{r}\in \mathbb{Q}\cap [0,1]$ and $x,y\in I$, then

$$\lambda_{r}f(x)+(1-\lambda_{r})f(y)\ge f\left(\lambda_{r}x+(1-\lambda_{r})y\right).$$

PROOF OF CLAIM: Put $m=2^{n}$ and consider *m* points $\{x_{i}\}_{i=1}^{m}$ in *I*. Then repeated application of the midpoint condition shows that

(B.4)$$f(x_{1})+f(x_{2})+\cdots +f(x_{m})\ge mf\left(\frac{1}{m}(x_{1}+x_{2}+\cdots +x_{m})\right).$$

If this relation for *m* points implies the same relation for $m-1$ points, then it is true for all $m\in \mathbb{N}$.

Consider $m-1$ points $\{x_{i}\}_{i=1}^{m-1}$ and put $x_{m}=\frac{1}{m-1}(x_{1}+x_{2}+\cdots +x_{m-1})$. Then $mx_{m}=(m-1)x_{m}+x_{m}=(x_{1}+x_{2}+\cdots +x_{m})$ and, by equation (B.4), it follows that

$$mf(x_{m})=mf\left(\frac{1}{m}(x_{1}+x_{2}+\cdots +x_{m})\right)\le f(x_{1})+f(x_{2})+\cdots +f(x_{m}).$$

In other words, subtracting $f(x_{m})$ from both sides gives

$$f(x_{1})+f(x_{2})+\cdots +f(x_{m-1})\ge (m-1)f\left(\frac{1}{m-1}(x_{1}+x_{2}+\cdots +x_{m-1})\right).$$

This shows that equation (B.4) is true for any $m\in \mathbb{N}$.

If $p+q=m$, and $x=x_{1}=x_{2}=\cdots =x_{p}$, while $y=x_{p+1}=x_{p+2}=\cdots =x_{m}$, then a consequence of equation (B.4) is that

(B.5)$$\lambda_{p}f(x)+(1-\lambda_{p})f(y)\ge f\left(\lambda_{p}x+(1-\lambda_{p})y\right),$$

for any rational number $\lambda_{p}=\frac{p}{m}$. This completes the proof of the claim.
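The two steps of this claim can be traced with exact rational arithmetic. The sketch below is an illustration under stated assumptions: the helpers `mean` and `jensen_gap` and the test function $f(x)=x^2$ are chosen for the example. It shows that appending the mean of $m-1$ points leaves the gap in (B.4) unchanged (the backward step), and that repeating *x* exactly *p* times and *y* exactly *q* times turns (B.4) into (B.5) with $\lambda_p=p/m$.

```python
from fractions import Fraction

def mean(points):
    return sum(points, Fraction(0)) / len(points)

def jensen_gap(f, points):
    """sum_i f(x_i) - m * f(mean); non-negative exactly when (B.4) holds."""
    return sum(f(x) for x in points) - len(points) * f(mean(points))

f = lambda x: x * x                       # convex test function, exact on Fractions

# Backward step: append the mean of m - 1 = 3 points to get m = 4 = 2**2 points.
pts = [Fraction(1), Fraction(2), Fraction(7)]
extended = pts + [mean(pts)]
assert mean(extended) == mean(pts)
assert jensen_gap(f, extended) == jensen_gap(f, pts)

# Rational weights: p copies of x and q copies of y give lambda_p = p / m.
p, q = 2, 3
x, y = Fraction(-1), Fraction(4)
m, lam = p + q, Fraction(p, p + q)
gap = jensen_gap(f, [x] * p + [y] * q)
lhs = lam * f(x) + (1 - lam) * f(y)
rhs = f(lam * x + (1 - lam) * y)
assert gap == m * (lhs - rhs) and gap >= 0
print(gap, lhs, rhs)
```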

If *f* is continuous, then a limit $\lambda_{p}\to \lambda$ can be taken through rational numbers, and this will complete the proof. If *f* is not necessarily continuous, then prove the following claim.

CLAIM: It may be assumed that *f* is bounded on $J\subseteq I$ and infinite on $I\setminus J$.

PROOF OF CLAIM: Put $I=[a,b]$, put $J=(c,d)$, and assume there is a $y\in I\setminus J$ such that $\left|f(y)\right|<\infty$. Without loss of generality, suppose that $a<y<c$.

Let $x\in (y,c)$ and choose integers $p>q$ such that $\chi=y+\frac{p}{q}(x-y)\in (c,d)$.

By equation (B.5),

$$f(x)=f\left(\frac{1}{p}\left(q\chi+(p-q)y\right)\right)\le \frac{q}{p}f(\chi)+\frac{p-q}{p}f(y)<\infty.$$

Thus, $f(x)<\infty$, and put $c=y$ to find $f(x)$ bounded above in an enlarged interval $(c,d)$.

A similar argument shows that $(c,d)$ can be grown by increasing *d* if there are points *y* with $d<y<b$, with $\left|f(y)\right|<\infty$. This completes the proof of the claim.

By theorem B.1 and the last claim, *f* is continuous in *J*, since it is bounded on *J*. Taking the limit $\lambda_{p}\to \lambda$ through rational numbers in equation (B.5) then shows that *f* is convex in *J*.

# B.2 Derivatives of convex functions

Finite convex functions on open intervals are continuous and hence measurable.

Theorem B.3

Suppose that *f* is a finite and convex function on an open interval *I*. Then *f* has finite and non-decreasing left- and right-derivatives in *I*. Moreover, if $D^{-}f=\frac{d^{-}}{dx}f$ is the left-derivative of *f*, and $D^{+}f=\frac{d^{+}}{dx}f$ is the right-derivative of *f*, then $D^{-}f\le D^{+}f$.

Proof

Put $I=(a,b)$ and choose $x\in I$. Let $n\in \mathbb{N}$ and let $h>0$ be small, such that $x+\left(\frac{n+1}{n}\right)h\in I$. Then define $q(x,h)=f(x+h)-f(x)$. It follows that

$$q\left(x+\frac{h}{n},h\right)-q(x,h)=\sum_{i=0}^{n-1}\left(f\left(x+\frac{h}{n}(i+2)\right)-2f\left(x+\frac{h}{n}(i+1)\right)+f\left(x+\frac{h}{n}i\right)\right)\ge 0,$$

by convexity of *f*. This shows that $q\left(x+\frac{h}{n},h\right)\ge q(x,h)$. Repeat the above to obtain a sequence of inequalities:

$$q(x,h)\le q\left(x+\frac{h}{n},h\right)\le \cdots \le q\left(x+\frac{mh}{n},h\right)\quad \text{for some } m\in \mathbb{N}.$$

Choose $\frac{m}{n}\in \mathbb{Q}$ such that $\frac{m}{n}\to \frac{\delta}{h}$ for some $\delta>0$. By continuity of *f*, $q(x,h)\le q(x+\delta,h)$. Thus $q(x,h)$ is a non-decreasing function of *x*.

It remains to construct the derivatives of *f*.

CLAIM: The right-derivative of *f* exists and is finite in *I*.

PROOF OF CLAIM: It follows from the above that

$$q\left(x,\frac{h}{n}\right)\le q\left(x+\frac{h}{n},\frac{h}{n}\right)\le \cdots \le q\left(x+(n-1)\frac{h}{n},\frac{h}{n}\right).$$

Thus, for $0<m<n$,

$$\frac{1}{m}\sum_{i=0}^{m-1}q\left(x+\frac{ih}{n},\frac{h}{n}\right)\le \frac{1}{n}\sum_{i=0}^{n-1}q\left(x+\frac{ih}{n},\frac{h}{n}\right).$$

These sums telescope, after multiplying by $\frac{n}{h}$, to

$$\frac{n}{mh}\left(f\left(x+\frac{mh}{n}\right)-f(x)\right)\le \frac{1}{h}\left(f(x+h)-f(x)\right)$$

if $q(x,h)=f(x+h)-f(x)$ is substituted. Let $0<h^{\prime}<h$ and suppose that $\frac{m}{n}$ is a sequence of rational numbers which converges to $\frac{h^{\prime}}{h}$. Taking $n\to \infty$ and assuming that $x<y$ give

(B.6)$$\frac{1}{h^{\prime}}\left(f(x+h^{\prime})-f(x)\right)\le \frac{1}{h}\left(f(x+h)-f(x)\right)\le \frac{1}{h}\left(f(y+h)-f(y)\right)$$

by continuity of *f* and since $q(x,h)$ is non-decreasing with *x*. This shows that $\frac{1}{h}(f(x+h)-f(x))$ is a non-decreasing function of *h*. Thus, $\lim_{h\to 0^{+}}\frac{1}{h}(f(x+h)-f(x))$ exists in the extended real numbers, for every $x\in (a,b)$.

Taking $h^{\prime}\to 0^{+}$ in equation (B.6) with *y* and *h* fixed shows that $D^{+}f<\infty$. Since $x<y$, taking $h\to 0^{+}$ in equation (B.6) also shows that $D^{+}f(x)$ is non-decreasing with *x*.

An argument similar to the above gives, for $0>h^{\prime}>h$, and $x>y$,

$$\frac{1}{h^{\prime}}\left(f(x+h^{\prime})-f(x)\right)\ge \frac{1}{h}\left(f(x+h)-f(x)\right)\ge \frac{1}{h}\left(f(y+h)-f(y)\right).$$

This proves the existence of the left-derivative $D^{-}f$, and shows that it is finite and non-decreasing.

Finally, since $q(x,h)$ is non-decreasing with *x*, and for $h>0$,

$$-\frac{1}{h}\left(f(x-h)-f(x)\right)=\frac{1}{h}\left(f(x)-f(x-h)\right)\le \frac{1}{h}\left(f(x+h)-f(x)\right).$$

Taking $h\to 0^{+}$ shows that $D^{-}f\le D^{+}f$, as required. This proves the claim.

This completes the proof.

Thus, a finite convex function on an open interval has left- and right-derivatives everywhere. These derivatives are non-decreasing and, hence, measurable.
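The monotonicity of the difference quotient in *h* can be seen numerically at a kink. This is a minimal sketch, assuming the test function $f(x)=|x|+x^{2}$ (convex, with $D^{-}f(0)=-1$ and $D^{+}f(0)=1$); the helper names are hypothetical.

```python
def right_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

def left_quotient(f, x, h):
    return (f(x) - f(x - h)) / h

f = lambda x: abs(x) + x * x              # convex, with a kink at x = 0

hs = [2.0 ** -k for k in range(1, 11)]    # h decreasing to 0 through powers of 2
right = [right_quotient(f, 0.0, h) for h in hs]   # equals 1 + h, decreasing as h -> 0+
left = [left_quotient(f, 0.0, h) for h in hs]     # equals -1 - h, increasing as h -> 0+

assert right == sorted(right, reverse=True)
assert left == sorted(left)
assert max(left) <= min(right)            # left quotients never exceed right quotients
print(left[-1], right[-1])
```

The right quotients decrease towards $D^{+}f(0)$ and the left quotients increase towards $D^{-}f(0)$, so both one-sided limits exist and $D^{-}f(0)\le D^{+}f(0)$.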

Finite convex functions are differentiable almost everywhere. This follows from a standard theorem in measure theory, namely, that a non-decreasing real-valued function on a closed interval is differentiable almost everywhere in the interval; see for example reference [497].

The proof that a convex function is differentiable almost everywhere uses *Vitali coverings*. Let $I=[a,b]$ be an interval. The *length* of *I* is $l(I)=b-a$. A set *E* is *covered (in the sense of Vitali)* if there is an (infinite) collection of intervals $\mathcal{I}=\{I_{i}\}$ such that $E\subset \bigcup_{i}I_{i}$ and, for every $x\in E$, and $\epsilon>0$, there is an $I\in \mathcal{I}$ such that $x\in I$ and $l(I)<\epsilon$; see for example reference [497].

The critical property of Vitali coverings is given by Vitali’s theorem.

Theorem B.4 (Vitali’s theorem)

Let *E* be a finite measure subset of the real line and let $\mathcal{I}$ be a collection of intervals which cover *E* in the sense of Vitali. Given an $\epsilon>0$, there is a finite and disjoint collection of intervals in $\mathcal{I}$, say $\langle I_{1},I_{2},\dots ,I_{N}\rangle$, such that

$$\mu^{\ast}\left[E\setminus \bigcup_{i=1}^{N}I_{i}\right]<\epsilon,$$

where $\mu^{\ast}$ is Lebesgue outer measure.

To show that finite convex functions are differentiable almost everywhere, it is only necessary to show that non-decreasing functions are differentiable almost everywhere. To see this, suppose that $x<y<z$ and that *f* is a convex function. Put $\lambda=\frac{y-z}{x-z}$. Then $0<\lambda<1$, $y=\lambda x+(1-\lambda)z$, and

$$f(y)\le \lambda f(x)+(1-\lambda)f(z).$$

If $f(x)\le f(y)$, then replacing $f(x)$ by $f(y)$ in the last inequality gives $f(y)\le f(z)$.

In other words, if *f* is non-decreasing in an interval containing a point *x*, then it remains non-decreasing for larger values of *x*. A similar argument shows that, if *f* is non-increasing on an interval containing *x*, then it is non-increasing for all smaller values of *x*.

Hence, a convex function is either non-increasing or non-decreasing, or first non-increasing and then non-decreasing.
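This shape property is easy to test on sampled values. The sketch below is only illustrative: `monotone_pieces` is a hypothetical helper that locates a minimiser on the grid and checks that the samples are non-increasing before it and non-decreasing after it.

```python
import math

def monotone_pieces(values, tol=1e-12):
    """Index k such that values are non-increasing up to k and non-decreasing after, else None."""
    k = min(range(len(values)), key=values.__getitem__)
    left_ok = all(values[i] >= values[i + 1] - tol for i in range(k))
    right_ok = all(values[i] <= values[i + 1] + tol for i in range(k, len(values) - 1))
    return k if left_ok and right_ok else None

xs = [i / 10.0 for i in range(-20, 21)]                   # grid on [-2, 2]

print(monotone_pieces([(x - 0.5) ** 2 for x in xs]))      # convex: split at x = 0.5
print(monotone_pieces([math.exp(x) for x in xs]))         # convex and increasing: split at index 0
print(monotone_pieces([math.sin(x) for x in xs]))         # not convex on [-2, 2]: no such split
```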

Theorem B.5

Let *f* be a non-decreasing real-valued function on $[a,b]$. Then *f* is differentiable almost everywhere.

Proof

Define the following (Dini) derivatives of *f*:

$$\begin{array}{rl}D^{+}f(x)& =\limsup_{h\to 0^{+}}\frac{1}{h}\left(f(x+h)-f(x)\right);\\ D^{-}f(x)& =\limsup_{h\to 0^{+}}\frac{1}{h}\left(f(x)-f(x-h)\right);\\ D_{+}f(x)& =\liminf_{h\to 0^{+}}\frac{1}{h}\left(f(x+h)-f(x)\right);\quad \text{and}\\ D_{-}f(x)& =\liminf_{h\to 0^{+}}\frac{1}{h}\left(f(x)-f(x-h)\right).\end{array}$$

Since *f* is non-decreasing, $D^{+}f(x)\ge D_{-}f(x)$, and similar relations are valid between the other derivatives.

By theorem B.3, $D^{+}f(x)=D_{+}f(x)$, and $D^{-}f(x)=D_{-}f(x)$.

For each pair of rational numbers $(u,v)$, define the set

$$E_{u,v}=\left\{x\mid D^{+}f(x)>u>v>D_{-}f(x)\right\}.$$

Then $F=\bigcup_{u,v\in \mathbb{Q}}E_{u,v}$ is the set of points where $D^{+}f(x)>D_{-}f(x)$.

CLAIM: The set *F* is a null set: $\mu F=0$.

PROOF OF CLAIM: It is shown that $\mu E_{u,v}=0$ for all pairs $(u,v)\in \mathbb{Q}^{2}$.

Put $\mu E_{u,v}=s$ and let $\epsilon>0$. There is an open set $U\supseteq E_{u,v}$ such that $\mu U<s+\epsilon$. By the definition of $D_{-}f(x)$, there is an arbitrarily small $h_{x}$ such that $I_{x}=[x-h_{x},x]\subset U$, and $f(x)-f(x-h_{x})<vh_{x}$, for each $x\in E_{u,v}$.

The set of intervals $[x-h_{x},x]$ is a Vitali covering, and there is a finite disjoint set $\left\{I_{1},I_{2},\dots ,I_{N}\right\}$ which almost covers $E_{u,v}$: if $I_{j}^{o}$ is the interior of interval $I_{j}$, then $\mu \left[\left(\bigcup I_{j}^{o}\right)\cap U\right]>s-\epsilon$. Put $A=\left(\bigcup I_{j}^{o}\right)\subseteq U$. Then $\mu A\le \mu U<s+\epsilon$.

Let $I_{j}=[x_{j}-h_{j},x_{j}]$ and put $\ell (I_{j})=h_{j}$, the length of the *j*-th interval. Sum over the intervals:

(B.7)$$\sum_{i=1}^{N}\left(f(x_{i})-f(x_{i}-h_{i})\right)<v\sum_{i=1}^{N}h_{i}=v\mu A<v(s+\epsilon).$$

Each $y\in A\cap E_{u,v}$ is the left endpoint of an interval $J_{y}=(y,y+k)\subset I_{n}$ for some value of *n* and for small enough *k* (*k* can be found such that $f(y+k)-f(y)>uk$, by the definition of $D^{+}f(y)$).

The collection of intervals $\{J_{y}\}$ is a Vitali covering of $A\cap E_{u,v}$, and there is a finite disjoint collection $\{J_{1},J_{2},\dots ,J_{m}\}$ almost covering it: $\mu \left[\left(\bigcup J_{j}\right)\cap A\right]>s-2\epsilon$.

Define $J_{j}=(y_{j},y_{j}+k_{j})$ so that $\ell (J_{j})=k_{j}$. By the definition of $D^{+}f(y)$,

(B.8)$$\sum_{i=1}^{m}\left[f(y_{i}+k_{i})-f(y_{i})\right]>u\sum_{i=1}^{m}k_{i}>u(s-2\epsilon).$$

Each of the $J_{j}$ is contained in some interval $I_{i}$. Sum over those $J_{j}\subset I_{i}$ to obtain

$$\sum_{j:\,J_{j}\subset I_{i}}\left(f(y_{j}+k_{j})-f(y_{j})\right)\le f(x_{i})-f(x_{i}-h_{i}),$$

since *f* is non-decreasing. Thus, by summing this over all *i*, one obtains

$$\sum_{j=1}^{m}\left(f(y_{j}+k_{j})-f(y_{j})\right)\le \sum_{i=1}^{N}\left(f(x_{i})-f(x_{i}-h_{i})\right).$$

This shows by equations (B.7) and (B.8) that $v(s+\epsilon)\ge u(s-2\epsilon)$.

Take $\epsilon\to 0^{+}$ to find that $vs\ge us$. Since $u>v$, this implies that $s=0$, with the result that $\mu E_{u,v}=0$. This completes the proof of the claim.

By this claim, $\mu F=0$, and $D^{+}f(x)=D_{-}f(x)$, for almost every *x*. This shows that *f* is differentiable almost everywhere (since it is real valued and finite). This completes the proof.

Corollary B.6

If *f* is a real-valued non-decreasing function on $[a,b]$, then $f^{\prime}$ is measurable.

Proof

By theorem B.5, $g=f^{\prime}$ is defined almost everywhere in $[a,b]$, except on a null set *A* (a zero-measure set).

Define $g_{n}(x)=n\left(f(x+\frac{1}{n})-f(x)\right)$ for $x\in [a,b]\setminus A$ and put $f(x)=f(a)$ if $x\le a$, and put $f(x)=f(b)$ if $x\ge b$. Then $g_{n}(x)\to g(x)$ pointwise for almost all *x*, so that *g* is measurable. Since $g=f^{\prime}$ almost everywhere in $[a,b]$, this shows that $f^{\prime}$ is measurable in $[a,b]$.

Since a convex function is either non-increasing, non-decreasing or first non-increasing and then non-decreasing, it follows from theorem B.5 and corollary B.6 that, if *f* is convex on $[a,b]$, then *f* is differentiable almost everywhere in $(a,b)$, and ${f}^{\mathrm{\prime}}$ is measurable in $[a,b]$. Since the Lebesgue measure is *σ*-finite, these properties extend naturally to all of $\mathbb{R}$.

By theorem B.3, if *f* is a convex function, then ${f}^{\mathrm{\prime}}$ is non-decreasing, and so ${f}^{\mathrm{\prime}}$ is continuous and differentiable almost everywhere.

Theorem B.7

If *f* is a real-valued convex function, then *f* is differentiable everywhere, except on a countable set of points.

Proof

Since *f* is convex, $f^{\prime}$ is non-decreasing.

Thus, $f^{\prime}$ is continuous almost everywhere.

To see that the number of points where $f^{\prime}$ is not continuous is countable, consider first the case that *f* is convex and real valued (and so finite) on $[a,b]$. Put $A=f^{\prime}(b^{-})-f^{\prime}(a^{+})$, the total increase of $f^{\prime}$ over $[a,b]$ (replacing $[a,b]$ by a slightly smaller closed interval if necessary, so that *A* is finite), and define the jump function

$$J(x)=\lim_{h\to 0^{+}}f^{\prime}(x+h)-\lim_{h\to 0^{+}}f^{\prime}(x-h).$$

If $J(x)=0$, then $f^{\prime}$ is continuous at *x*. If $J(x)>0$, then $f^{\prime}$ is said to be *discontinuous* at *x*.

The *gap* at *x* is the interval $I_{x}=\left(f^{\prime}(x^{-}),f^{\prime}(x^{+})\right)$, where the left- and right-limits are $f^{\prime}(x^{\pm})=\lim_{h\to 0^{+}}f^{\prime}(x\pm h)$. It follows that $J(x)=\text{length}\left(I_{x}\right)$, the length of this interval.

Since $f^{\prime}$ is non-decreasing, the intervals $I_{x}$ compose a disjoint family.

Put $N_{k}$ equal to the number of intervals with $J(x)\ge 2^{-k}$. Then $2^{-k}N_{k}\le A$, or $N_{k}\le 2^{k}A<\infty$. Thus, $N_{k}$ is finite for each *k*. Since the number of points where $f^{\prime}$ is discontinuous is at most $\sum_{k}N_{k}$, a countable sum of finite numbers, the number of $I_{x}$ is countable. Hence, the number of points where $f^{\prime}$ is discontinuous is countable. This completes the proof.

Thus, finite real-valued convex functions are differentiable except on a countable set of points.
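The counting argument can be illustrated on a piecewise-linear convex function with finitely many kinks. Everything in this sketch is an assumption made for the example: the function $f(x)=\sum_{j=1}^{K}2^{-j}|x-1/j|$, the helper names, and the choice of *A* as the sum of all jumps of the slope (the total increase of $f^{\prime}$). Exact rational arithmetic is used so that the slopes are computed without rounding.

```python
from fractions import Fraction

K = 12
# Kink positions 1/j and the size 2 * 2**-j of the slope jump at each kink.
kinks = {Fraction(1, j): Fraction(2, 2 ** j) for j in range(1, K + 1)}

def f(x):
    return sum(Fraction(1, 2 ** j) * abs(x - Fraction(1, j)) for j in range(1, K + 1))

def jump(x, h=Fraction(1, 10 ** 6)):
    """Right slope minus left slope at x (exact here, since h is smaller than the kink spacing)."""
    right = (f(x + h) - f(x)) / h
    left = (f(x) - f(x - h)) / h
    return right - left

def count_large_jumps(k):
    """N_k: the number of kinks whose jump is at least 2**-k."""
    return sum(1 for size in kinks.values() if size >= Fraction(1, 2 ** k))

A = sum(kinks.values())                       # total increase of the slope
assert all(jump(x) == size for x, size in kinks.items())
assert jump(Fraction(3, 7)) == 0              # no jump away from the kinks
for k in range(15):
    assert Fraction(1, 2 ** k) * count_large_jumps(k) <= A
print(A, [count_large_jumps(k) for k in range(5)])
```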

# B.3 Convergence

If $\langle f_{n}\rangle$ is a sequence of convex functions, then, by Fatou’s lemma, $\liminf f_{n}$ is measurable. If, in addition, $f_{n}\to f$ pointwise almost everywhere (or in measure), then *f* is measurable. By Lusin’s theorem, there is a continuous function *g* such that $f=g$, except in a set of small Lebesgue measure.

More generally, if $\langle f_{n}\rangle$ is a sequence of functions on a finite measure set which is Cauchy in measure, then there exists a measurable *f* such that $f_{n}\to f$ in measure on the finite measure set. By the *σ*-finiteness of the Lebesgue measure, these results extend to all of $\mathbb{R}$.

Lemma B.8

Suppose that $\langle f_{n}\rangle$ is a sequence of convex functions converging pointwise to a limit *f* almost everywhere. Then *f* is convex.

Proof

Without loss of generality the almost-everywhere condition may be ignored.

Suppose first that $f_{n}$ and *f* are defined on the closed interval $[a,b]$.

Suppose that *f* is not convex on $[a,b]$. That is, there exist points $c,d\in [a,b]$ such that $f(c)+f(d)<2f(\frac{1}{2}(c+d))$.

Put $A=2f(\frac{1}{2}(c+d))-f(c)-f(d)>0$. Since $f_{n}(x)$ is convex,

$$f_{n}(c)+f_{n}(d)\ge 2f_{n}(\tfrac{1}{2}(c+d)).$$

For every $\epsilon>0$, there exist $N_{c}$, $N_{d}$ and *N* in $\mathbb{N}$ such that

(1) for all $n>N_{c}$, $\left|f_{n}(c)-f(c)\right|<\epsilon$;

(2) for all $n>N_{d}$, $\left|f_{n}(d)-f(d)\right|<\epsilon$; and

(3) for all $n>N$, $\left|f_{n}(\frac{1}{2}(c+d))-f(\frac{1}{2}(c+d))\right|<\epsilon$.

However,

$$\begin{array}{rl}\left|f_{n}(c)-f(c)\right|+\left|f_{n}(d)-f(d)\right|& \ge \left|f_{n}(c)+f_{n}(d)-f(c)-f(d)\right|\\ & \ge 2\left(f_{n}(\frac{1}{2}(c+d))-f(\frac{1}{2}(c+d))\right)+A.\end{array}$$

Take $n>\max \{N_{c},N_{d},N\}$. Then, by (1), (2) and (3) above, $2\epsilon\ge A-2\epsilon$, or $A\le 4\epsilon$. This shows that $A\le 0$, since $\epsilon>0$ is arbitrary. This is a contradiction, and hence *f* is convex on $[a,b]$. Since the Lebesgue measure is *σ*-finite, this extends to $\mathbb{R}$.
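A pointwise limit of convex functions can be watched numerically. The sketch below assumes, purely for illustration, the family $f_{n}(x)=\frac{1}{n}\log (1+e^{nx})$, which has the form of a finite-size free energy and converges pointwise to $\max (0,x)$; the family, the helper names and the grid are not taken from the text.

```python
import math

def f_n(n, x):
    """(1/n) * log(1 + exp(n*x)), rewritten to avoid overflow for large n*|x|."""
    return max(0.0, x) + math.log1p(math.exp(-n * abs(x))) / n

def f_limit(x):
    return max(0.0, x)

def midpoint_gap(f, c, d):
    """f(c) + f(d) - 2 f((c + d)/2); non-negative under the midpoint condition."""
    return f(c) + f(d) - 2.0 * f(0.5 * (c + d))

xs = [-2.0 + 0.25 * i for i in range(17)]          # grid on [-2, 2]

# Pointwise convergence: the largest error sits at x = 0 and equals log(2)/n.
err = max(abs(f_n(1000, x) - f_limit(x)) for x in xs)
print(err)

# The midpoint condition holds for each f_n and survives in the limit.
for g in (lambda x: f_n(1, x), lambda x: f_n(1000, x), f_limit):
    assert min(midpoint_gap(g, c, d) for c in xs for d in xs) >= -1e-12
```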

By lemma B.8, the limit of a sequence of convex functions is itself convex. The limit is differentiable almost everywhere. Its derivative is also the limit of the sequence of derivatives almost everywhere:

Theorem B.9

Suppose that $\langle f_{n}\rangle$ is a sequence of convex functions converging pointwise to a limit *f* almost everywhere. Then *f* is convex. Moreover, the sequence of derivatives $\langle f_{n}^{\prime}\rangle$ converges to $f^{\prime}$ almost everywhere.

Proof

Without loss of generality the almost-everywhere condition may be ignored. Put $D^{+}\equiv \frac{d^{+}}{dx}$ and $D^{-}\equiv \frac{d^{-}}{dx}$.

By lemma B.8, *f* is convex and differentiable almost everywhere. Without loss of generality, assume that *f* and $f_{n}$ are differentiable everywhere.

By theorem B.3, $f_{n}$ and *f* have non-decreasing left- and right-derivatives everywhere.

Next, note by convexity of $f_{n}$ that, for fixed $x<y$, and $\lambda\in (0,1)$,

$$\begin{array}{rl}f_{n}\left(x+\lambda(y-x)\right)& =f_{n}\left(\lambda y+(1-\lambda)x\right)\\ & \le \lambda f_{n}(y)+(1-\lambda)f_{n}(x)=f_{n}(x)+\lambda\left(f_{n}(y)-f_{n}(x)\right).\end{array}$$

Rearrange this to see that

$$\frac{f_{n}\left(x+\lambda(y-x)\right)-f_{n}(x)}{\lambda(y-x)}\le \frac{f_{n}(y)-f_{n}(x)}{y-x}.$$

Take $\lambda\to 0^{+}$ on the left-hand side and put $y=x+h$. This gives

(B.9)$$D^{+}f_{n}(x)\le \frac{1}{h}\left(f_{n}(x+h)-f_{n}(x)\right).$$

Choose an $\epsilon>0$. Fix $x\in \mathbb{R}$. Then $f_{n}(x)\to f(x)$, and there is an $N_{0}\in \mathbb{N}$ such that for all $n\ge N_{0}$,

(B.10)$$\left|f_{n}(x)-f(x)\right|\le \epsilon.$$

Suppose that $n\ge N_{0}$. By equations (B.9) and (B.10),

$$D^{+}f_{n}(x)\le \frac{1}{h}\left(f_{n}(x+h)-f_{n}(x)\right)\le \frac{1}{h}\left(f_{n}(x+h)-f(x)+\epsilon\right).$$

Take the limit superior as $n\to \infty$, using $f_{n}(x+h)\to f(x+h)$ on the right-hand side. This gives

$$\limsup_{n\to \infty}D^{+}f_{n}(x)\le \frac{1}{h}\left(f(x+h)-f(x)+\epsilon\right).$$

Take $\epsilon\to 0^{+}$ and then $h\to 0^{+}$ to find that $\limsup_{n\to \infty}D^{+}f_{n}(x)\le D^{+}f(x)$.

A similar argument shows that $\liminf_{n\to \infty}D^{-}f_{n}(x)\ge D^{-}f(x)$.

Since *f* is differentiable almost everywhere by theorem B.5, $\lim_{n\to \infty}\frac{d}{dx}f_{n}(x)=f^{\prime}(x)$ for almost all *x*, and thus $f_{n}^{\prime}\to f^{\prime}$ pointwise almost everywhere. This completes the proof.
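Convergence of the derivatives can be illustrated with a smooth convex family whose limit has a kink. The sketch assumes the example $f_{n}(x)=\sqrt{x^{2}+1/n}\to |x|$, which is not from the text; the helper names are hypothetical. Away from $x=0$ the derivatives converge to $\pm 1$, while at the kink $f_{n}^{\prime}(0)=0$ stays between $D^{-}f(0)=-1$ and $D^{+}f(0)=1$.

```python
import math

def df_n(n, x):
    """Derivative of f_n(x) = sqrt(x**2 + 1/n)."""
    return x / math.sqrt(x * x + 1.0 / n)

def df_limit(x):
    """Derivative of the limit |x|, defined for x != 0."""
    return math.copysign(1.0, x)

def derivative_errors(x, ns=(10, 1000, 100000)):
    return [abs(df_n(n, x) - df_limit(x)) for n in ns]

for x in (-1.5, -0.1, 0.3, 2.0):
    errs = derivative_errors(x)
    assert errs[0] > errs[1] > errs[2]        # the error shrinks as n grows
    print(x, errs)

# At the kink the derivatives remain inside the gap [D^- f(0), D^+ f(0)] = [-1, 1].
assert all(-1.0 <= df_n(n, 0.0) <= 1.0 for n in (10, 1000, 100000))
```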

# B.4 The Legendre transform

Suppose that *f* is an extended real-valued function and that $f>-\infty$. The essential domain of *f* is defined by the set

(B.11)$$\text{dom}f=\left\{x\in \mathbb{R}\mid f(x)<\infty\right\}.$$

If $f<\infty$, then $\text{dom}f=\mathbb{R}$.

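The Legendre transform of an extended real-valued function *f* of one variable is usually defined by the supremum

(B.12)$$f^{\ast}(p)=\sup_{x\in \mathbb{R}}\left\{px-f(x)\right\}.$$

The transform $f^{\ast}$ is necessarily convex. To see this, consider

(B.13)$$f^{\ast}(p)+f^{\ast}(q)\ge \left(px-f(x)\right)+\left(qx-f(x)\right)=2\left(\tfrac{1}{2}(p+q)x-f(x)\right)$$

for any $x\in \text{dom}f$. Take the supremum on the right-hand side to see that $f^{\ast}(p)+f^{\ast}(q)\ge 2f^{\ast}(\frac{1}{2}(p+q))$.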

If *f* is itself convex, finite and closed, then it can be recovered from its Legendre transform, since ${f}^{\ast \ast}=f$, as will be seen below.

The above definition generalised to real-valued functions on ${\mathbb{R}}^{n}$ is the Legendre-Fenchel transformation [550].
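On a grid, the supremum $f^{\ast}(p)=\sup_{x}\{px-f(x)\}$ becomes a maximum over sample points. The following is a minimal sketch under stated assumptions: the function name `legendre`, the grids, and the test case $f(x)=\frac{1}{2}x^{2}$ (for which $f^{\ast}(p)=\frac{1}{2}p^{2}$) are choices made for illustration, and the brute-force maximum costs one pass over the *x* grid per value of *p*.

```python
def legendre(f_vals, xs, ps):
    """Discrete approximation of f*(p) = sup_x (p*x - f(x)) over the grid xs."""
    return [max(p * x - fx for x, fx in zip(xs, f_vals)) for p in ps]

xs = [i / 100.0 for i in range(-300, 301)]     # x grid on [-3, 3]
ps = [j / 10.0 for j in range(-20, 21)]        # p grid on [-2, 2]
f_vals = [0.5 * x * x for x in xs]

f_star = legendre(f_vals, xs, ps)

# Compare with the exact transform p^2 / 2; each maximiser x = p lies on the x grid.
err = max(abs(fs - 0.5 * p * p) for fs, p in zip(f_star, ps))
print(err)

# Midpoint condition for the transform on the p grid.
assert all(f_star[j - 1] + f_star[j + 1] >= 2.0 * f_star[j] - 1e-12
           for j in range(1, len(ps) - 1))
```

The *x* grid has to be wide enough to contain the maximiser for every *p* of interest; otherwise the discrete transform underestimates $f^{\ast}$.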

Instead of using equation (B.12) as a definition, consider the alternative definition, where $f^{\ast}$ is defined such that

(B.14)$$\frac{d}{dx}f(x)=p\quad \text{and}\quad \frac{d}{dp}f^{\ast}(p)=x.$$

Integrating the first relation with respect to *x* for fixed *p* shows that $px=f(x)+C_{p}$, and the second relation with respect to *p* for fixed *x* shows $px=f^{\ast}(p)+C_{x}^{\prime}$. This shows that

(B.15)$$f(x)+f^{\ast}(p)=px,$$

a result which follows essentially by inverting the derivatives of *f* and $f^{\ast}$ above.

This result is valid if *f* and ${f}^{\ast}$ are differentiable almost everywhere, in which case the last equality is true almost everywhere. More generally, definition (B.12) is used and is defined for a more general class of (measurable) functions *f*.

In the case that *f* is convex, then it is differentiable almost everywhere, and its derivative is monotone and continuous almost everywhere. This shows that $p=\frac{d}{dx}f(x)$ has a solution ${x}^{\ast}={f}^{\ast}(p)$ almost everywhere, which is a convex function in *p*.

The above can be implemented by considering the *subdifferential* of *f* instead of the derivative. The subdifferential of *f* at *x* is the set $\partial f(x)$ defined by

(B.16)$$\partial f(x)=\left\{p\in \mathbb{R}\mid f(y)\ge f(x)+p(y-x)\ \text{for all } y\in \mathbb{R}\right\}.$$

This defines a map $x\to \partial f(x)$, which is not a bijection but which may be inverted at *p* by finding an $x_{p}$ such that $p\in \partial f(x_{p})$. The point $x_{p}$ is not necessarily unique.

This in particular shows that $f(y)-py\ge f(x_{p})-px_{p}$ for all *y*, and so $x_{p}\in \text{dom}f$ maximises $px-f(x)$. This gives the definition for $f^{\ast}$ in equation (B.12) by inverting the subdifferential of *f* and generalising the arguments following equation (B.14).
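Membership of a slope *p* in the subdifferential $\partial f(x)$ means that the line through $(x,f(x))$ with slope *p* stays below *f*, that is, $f(y)\ge f(x)+p(y-x)$ for all *y*. The sketch below tests this on sample points only; the helper name `in_subdifferential` and the example $f(x)=|x|$ are assumptions of the illustration.

```python
def in_subdifferential(f, x, p, ys, tol=1e-12):
    """True if f(y) >= f(x) + p*(y - x) at every sample point y."""
    return all(f(y) >= f(x) + p * (y - x) - tol for y in ys)

ys = [i / 10.0 for i in range(-30, 31)]        # sample points on [-3, 3]
slopes = (-1.5, -1.0, 0.0, 0.5, 1.0, 1.5)

# At the kink of |x| every slope in [-1, 1] is a subgradient.
print([p for p in slopes if in_subdifferential(abs, 0.0, p, ys)])   # [-1.0, 0.0, 0.5, 1.0]

# Away from the kink only the ordinary derivative survives.
print([p for p in slopes if in_subdifferential(abs, 2.0, p, ys)])   # [1.0]
```

The first list shows why the map $x\to \partial f(x)$ is set valued, and why inverting it at a given *p* may return more than one point for other choices of *f*.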

Lemma B.10

The Legendre transform ${f}^{\ast}$ is necessarily closed and convex.

Proof

Define $g_{x}(p)=px-f(x)$. Then $f^{\ast}(p)$ is the supremum of $g_{x}(p)$ over $x\in \mathbb{R}$. This shows that $\text{epi}f^{\ast}=\bigcap_{x}\text{epi}g_{x}$. The epigraph of $g_{x}$ is a closed and convex set for each *x*. Thus, $\text{epi}f^{\ast}$ is closed and convex, since it is the intersection of closed and convex sets. Hence, $f^{\ast}$ is closed and convex.

Since $f^{\ast}$ is convex, it is continuous and also differentiable almost everywhere. By theorem B.7, it is in fact differentiable everywhere except on a countable set of points in its essential domain.

Since $f>-\infty$ and $f^{\ast}$ is convex, it follows that $f^{\ast}>-\infty$. Thus, the Legendre transform of $f^{\ast}$ (or the *biconjugate* of *f*) is defined by

(B.17)$$f^{\ast \ast}(x)=\sup_{p\in \mathbb{R}}\left\{px-f^{\ast}(p)\right\}.$$

In addition, $f^{\ast \ast}$ is convex and therefore continuous and also differentiable almost everywhere, by lemma B.10.

Theorem B.11

For a measurable function *f*,

(a) $f^{\ast \ast}$ is convex, and $f^{\ast \ast}\le f$;

(b) if $f^{\ast \ast}=f$, then *f* is closed and convex; and

(c) if *f* is closed and convex, then $f^{\ast \ast}=f$ on the essential domain of *f*.

Proof

The function $f^{\ast \ast}$ is convex, and $\text{epi}f^{\ast}$ is closed, by lemma B.10. For any $x\in \mathbb{R}$, and $p\in \text{dom}f^{\ast}$, $f^{\ast}(p)\ge px-f(x)$; therefore,

$$px-f^{\ast}(p)\le px-\left(px-f(x)\right)=f(x).$$

Take the supremum on the left-hand side for $p\in \text{dom}f^{\ast}$ to see that $f^{\ast \ast}\le f$.

If $f^{\ast \ast}=f$, then, by lemma B.10, *f* is closed and convex.

Fix $x\in \text{dom}f$ and assume *f* is closed and convex. Then $f(x)<\infty$, and $\partial f(x)\ne \varnothing$. Let $p\in \partial f(x)$; then it follows from equation (B.16) that

$$f(y)\ge f(x)+p(y-x)\quad \text{for all } y\in \mathbb{R}.$$

Choose *y* such that $f^{\ast}(p)=py-f(y)$ (since *f* is closed, there exists such a *y*). Then $f^{\ast}(p)=py-f(y)\le px-f(x)$, and this shows that $f(x)\le px-f^{\ast}(p)\le f^{\ast \ast}(x)$.

Hence, if *f* is finite and convex, then ${f}^{\ast \ast}=f$ if *f* is closed. If *f* is a convex function on $\mathbb{R}$, then it is lower semicontinuous and so closed; thus ${f}^{\ast \ast}=f$.
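The role of convexity in $f^{\ast \ast}=f$ can be seen by applying a discrete transform twice to a function that is not convex. The sketch below assumes the double well $f(x)=(x^{2}-1)^{2}$ and hypothetical grids and helper names: the biconjugate never exceeds *f*, it agrees with *f* where *f* coincides with its convex envelope (for example at $x=1.5$), and it flattens the region between the two wells.

```python
def conjugate(values, grid, dual_grid):
    """Discrete sup over `grid` of (s*t - value(t)) for each s in `dual_grid`."""
    return [max(s * t - v for t, v in zip(grid, values)) for s in dual_grid]

xs = [i / 100.0 for i in range(-200, 201)]     # x grid on [-2, 2]
ps = [j / 10.0 for j in range(-300, 301)]      # p grid on [-30, 30], covering f'(x) on the x grid
f_vals = [(x * x - 1.0) ** 2 for x in xs]      # double well, not convex on (-1, 1)

f_star = conjugate(f_vals, xs, ps)             # discrete f*
f_bi = conjugate(f_star, ps, xs)               # discrete f**

assert all(b <= v + 1e-9 for b, v in zip(f_bi, f_vals))   # f** <= f on the grid

i0, i1 = xs.index(0.0), xs.index(1.5)
print(f_vals[i0], f_bi[i0])    # f(0) = 1, while f**(0) is 0 up to rounding
print(f_vals[i1], f_bi[i1])    # both close to 1.5625, where f equals its convex envelope
```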