Jump to ContentJump to Main Navigation
The Statistical Mechanics of Interacting Walks, Polygons, Animals and Vesicles$

E.J. Janse van Rensburg

Print publication date: 2015

Print ISBN-13: 9780199666577

Published to Oxford Scholarship Online: August 2015

DOI: 10.1093/acprof:oso/9780199666577.001.0001

Show Summary Details
Page of

PRINTED FROM OXFORD SCHOLARSHIP ONLINE (www.oxfordscholarship.com). (c) Copyright Oxford University Press, 2017. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a monograph in OSO for personal use (for details see http://www.oxfordscholarship.com/page/privacy-policy). Subscriber: null; date: 26 February 2017

(p.536) Appendix B Convex functions

(p.536) Appendix B Convex functions

Source:
The Statistical Mechanics of Interacting Walks, Polygons, Animals and Vesicles
Publisher:
Oxford University Press

Free energies F of models of interacting lattice clusters are convex functions (as shown in theorem 3.3). This has certain consequences; namely, that F is differentiable almost everywhere (if it is finite), and finite size approximations to the free energy of an infinite system may be shown to converge to the free energy almost everywhere. The convexity of F is related to the stability of the thermodynamic equilibrium of a system.

If f is a convex function, then f is a concave function, and with appropriate reinterpretation, all results for convex functions apply to concave functions. There are numerous classical and more recent references on convex functions; see, for example, the book by GH Hardy, JE Littlewood and G Polya [274], or references [43, 497], for more details.

B.1 Convex functions and the midpoint condition

An extended real-valued function f:RR{} is convex on I=[a,b] if, for each 0λ1, and ax<yb,

(B.1)
λf(x)+(1λ)f(y)fλx+(1λ)y.

This reduces to the midpoint condition

(B.2)
f(x)+f(y)2f12(x+y)

if λ=12. (p.537)

Appendix B Convex functions

Fig. B.1 The epigraph of a function f is the shaded region with boundary f and vertical lines as indicated. The function f is convex if and only if epif is a convex set in the plane.

Convexity may also be defined in terms of the epigraph of f. This is the subset in R×R defined by

(B.3)
epif={(x,y)|yf(x)}.

The function f is closed if epif is a closed set and is convex if and only if epif is a convex set in R2, as may be seen immediately from the definition of convexity in equation (B.1). Observe that, if f is a lower semicontinuous extended real-valued function, then f is necessarily closed. (A function f is lower semicontinuous at a point x0 if lim infxx0f(x)f(x0)).

Denote the Lebesgue (outer)-measure by μ‎. In the next theorem convex functions are shown to be continuous and thus Lebesgue measurable.

Theorem B.1

Suppose that f is convex on a closed interval I. If f is bounded in I, then f is continuous on I, except perhaps at the endpoints of I.

Proof

Put I=[a,b]. Then f is bounded in (a,b), say f<C in (a,b).

Let x(a,b). Choose n>m in N, and δ>0 a small real number, such that x+nδ(a,b). Then

f(x+mδ)=f1nm(x+nδ)+(nm)xmnf(x+nδ)+nmnf(x).

This can be rearranged into

1nf(x+nδ)f(x)1mf(x+mδ)f(x),

and, if δδ, then

1mf(x)f(xmδ)1nf(x)f(xnδ).

Put m=1 and note that f(x+δ)+f(xδ)2f(x), and f<C;

1nCf(x)f(x+δ)f(x)f(x)f(xδ)1nf(x)C.

Next, take δ0 and n such that x±nδ(c,d), and nδ0. Then, by the squeeze theorem for limits, f is left- and right-continuous at x.

Theorem B.2

Suppose that f satisfies the midpoint condition on a closed interval I. If f is bounded above in some open interval JI, then f is convex and bounded in J, and f(x)=+ in IJ.

Proof

The proof proceeds by generalising the midpoint condition through proving the following two claims.

CLAIM: If λrQ, and for x,yI, then

λrf(x)+(1λr)f(y)fλrx+(1λr)y.

PROOF OF CLAIM: Put m=2n and consider m points {xi}i=1m in I.

(p.538) Then repeated application of the midpoint condition shows that

(B.4)
f(x1)+f(x2)++f(xm)mf1m(x1+x2++xm)

If this relation is true for m1, then it will be true for all mN.

Consider m1 points {xi}i=1m1 and put xm=1m1(x1+x2++xm1). Then mxm=(m1)xm+xm=(x1+x2++xm) and, by equation (B.4), it follows that

mf(xm)=mf1m(x1+x2++xm)f(x1)+f(x2)++f(xm)

In other words, subtracting f(xm) from both sides gives

f(x1)+f(x2)++f(xm1)(m1)f1m1(x1+x2++xm1).

This shows that equation (B.4) is true for any mN.

If p+q=m, and x=x1=x2==xp, while y=xp+1=xp+2==xm, then a consequence of equation (B.4) is that

(B.5)
λpf(x)+(1λp)f(y)f(λpx+(1λp)y),

for any rational number λp=pm. This completes the proof of the claim.

If f is continuous, then a limit λpλ can be taken through rational numbers, and this will complete the proof. If f is not necessarily continuous, then prove the following claim.

CLAIM: It may be assumed that f is bounded on JI and infinite on IJ.

PROOF OF CLAIM: Put I=[a,b], put J=(c,d), and assume there is a yIJ such that f(y)<. Without loss of generality, suppose that a<y<c.

Let x(y,c) and choose integers p>q such that χ=y+pq(xy)(c,d).

By equation (B.5),

f(x)=f(1p(qχ+(pq)y))qpf(x)+pqpf(y)<.

Thus, f(x)<, and put c=y to find f(x) bounded in an enlarged interval (c,d).

A similar argument shows that (c,d) can be grown by increasing d if there are points y with d<y<b, with f(y)<.

By theorem B.1, f is continuous in J since it is bounded on J.

By theorem B.1 and the last claim, f is continuous in J.

B.2 Derivatives of convex functions

Finite convex functions on open intervals are continuous and hence measurable.

Theorem B.3

Suppose that f is a finite and convex function on an open interval I. Then f has finite and non-decreasing left- and right-derivatives in I. Moreover, if Df=ddxf is the left-derivative of f, and D+f=d+dxf is the right-derivative of f, then DfD+f.

(p.539)

Proof

Put I=(a,b) and choose xI. Let nN and let h>0 small, such that x+(n+1n)hI. Then define q(x,h)=f(x+h)f(x). It follows that

qx+hn,hq(x,h)=i=0n1f(x+hn(i+2))2f(x+hn(i+1))+f(x+hni)0,

by convexity of f. This shows that qx+hn,hq(x,h). Repeat the above to obtain a sequence of inequalities:

q(x,h)qx+hn,hqx+mhn,hfor some mN.

Choose mnQ such that mnδn for some δ>0. By continuity of f, q(x,h)q(x+δ,h). Thus q(x,h) is a non-decreasing function of x.

It remains to construct the derivatives of f.

CLAIM: The right-derivative of f exists and is finite in I.

PROOF OF CLAIM: It follows from the above that

q(x,hn)q(x+hn,hn)q(x+(n1)hn,hn).

Thus, for 0<m<n,

1mi=0m1qx+ihn,hn1ni=0n1qx+ihn,hn.

These sums telescope to

nmhf(x+1nmh)f(x)1hf(x+h)f(x)

if q(x,h)=f(x+h)f(x) is substituted. Let 0<h<h and suppose that mn is a sequence of rational numbers which converges to 1hh. Taking n and assuming that x<y give

(B.6)
1hf(x+h)f(x)1hf(x+h)f(x)1hf(y+h)f(y)

by continuity of f and since q(x,h) is non-decreasing with x. This shows that 1h(f(x+h)f(x)) is a non-decreasing function of h. Thus, limh0+1h(f(x+h)f(x)) exists in the extended real numbers, for every x(a,b).

Taking h0 in equation (B.6) with y and h fixed shows that D+f<. Since x<y, taking h0+ in equation (B.6) also shows that D+f(x) is non-decreasing with x.

An argument similar to the above gives, for 0>h>h, and x>y,

1hf(x+h)f(x)1hf(x+h)f(x)1hf(y+h)f(y).

A similar argument to the above proves the existence of the left-derivative Df, and shows that it is finite and non-decreasing.

(p.540) Finally, since q(x,h) is non-decreasing with x, and for h>0,

1hf(xh)f(x)=1hf(x)f(xh)1hf(x+h)f(x).

Taking h0+ shows that DfD+f, as required. This proves the claim.

This completes the proof.

Thus, a finite convex function on an open interval has left- and right-derivatives everywhere. These derivatives are non-decreasing and, hence, measurable.

Finite convex functions are differentiable almost everywhere. This follows from a standard theorem in measure theory, namely, that a non-decreasing real-valued function on a closed interval is differentiable almost everywhere in the interval; see for example reference [497].

The proof that a convex function is differentiable almost everywhere uses Vitali converings. Let I=[a,b] be an interval. The length of I is l(I)=ba. A set E is covered (in the sense of Vitali) if there is an (infinite) collection of intervals I={Ii} such that EiIi and, for every xE, and ϵ>0, there is an II such that xI and l(I)<ϵ; see for example reference [497].

The critical property about Vitali’s coverings is given by Vitali’s theorem.

Theorem B.4 (Vitali’s theorem)

Let E be a finite measure subset of the real line and let I be a collection of intervals which cover E in the sense of Vitali. Given an ϵ>0, there is a finite and disjoint collection of intervals in I, say I1,I2,,IN, such that

μEi=1NIn<ϵ,

where μ is Lebesgue outer measure.

To show that finite convex functions are differentiable almost everywhere, it is only necessary to show that non-decreasing functions are differentiable almost everywhere. To see this, suppose that x<y<z and that f is a convex function. Put λ=yzxz. Then 0<λ<1, and

λf(x)+(1λ)f(z)f(y).

If f(x)f(y), then replacing f(x) by f(y) in the last inequality gives f(y)f(z).

In other words, if f is non-decreasing in an interval containing a point x, then it remains non-decreasing for larger values of x. A similar argument shows that, if f is non-increasing on an interval containing x, then it is non-increasing for all smaller values of x.

Hence, a convex function is either non-increasing or non-decreasing, or first non-increasing and then non-decreasing.

Theorem B.5

Let f be a non-decreasing real-valued function on [a,b]. Then f is differentiable almost everywhere.

(p.541)

Proof

Define the following (Dini)-derivatives of f:

D+f(x)=lim suph0+1hf(x+h)f(x);Df(x)=lim suph0+1hf(x)f(xh);D+f(x)=lim infh0+1hf(x+h)f(x);andDf(x)=lim infh0+1hf(x)f(xh).

Since f is non-decreasing, D+f(x)Df(x), and similar relations are valid between the other derivatives.

By theorem B.3, D+f(x)=D+f(x), and Df(x)=Df(x).

For each pair of rational numbers (u,v), define the set

Eu,v=x|D+f(x)>u>v>Df(x).

Then F=u,vQEu,v is the set of points where D+f(x)>Df(x).

CLAIM: The set F is a null set: μF=0 PROOF OF CLAIM: It is shown that μEu,v=0 for all pairs (u,v)Q2.

Put μEu,v=s and let ϵ>0. There is an open set UEu,v such that μU<s+ϵ. By the definition of Df(x), there is an hx such that Ix=[xhx,x]U, and f(x)f(xhx)<vhx, for each xU.

The set of intervals [xhx,x] is a Vitali covering, and there is a finite disjoint set I1,I2,,IN which almost covers Eu,v: if Ijo is the interior of interval Ij, then IjoU>sϵ. Put A=IjoU. Then μAμUs+ϵ.

Let Ij=[xjhj,xj] and put (Ij)=hj, the length of the j-th interval. Sum over the intervals:

(B.7)
i=1Nf(xi)f(xih)<vi=1Nhi<vμA<v(s+ϵ).

Each yA is the left endpoint of Jy=(y,y+k)In for some value of n and for small enough k (k can be found such that f(y+k)f(y)>uk, by the definition of D+f(y)).

The collection of intervals {Jy} is a Vitali covering of A, and there is a finite disjoint collection {J1,J2,,Jm} almost covering A: JjA>s2ϵ.

Define Jj=(yj+kj,yj) so that (Jj)=kj. By the definition of D+f(y),

(B.8)
i=1m[f(yi+ki)f(yi)]>ui=1mki>u(s2ϵ).

Each of the Jj is contained in some interval Ii. Sum over those JjIi and index the Jj by ji to obtain

(p.542)

ji=1f(yji+kji)f(yji)f(xi)f(xihi),

since f is increasing. Thus, by summing this over all i, one obtains

j=1mf(yj+kj)f(yj)i=1nf(xi)f(xihi).

This shows by equations (B.7) and (B.8) that v(s+ϵ)u(s2ϵ).

Take ϵ0+ to find that vsus. Since u>v, this implies that s=0, with the result that μEu,v=0. This completes the proof of the claim.

By this claim, μF=0, and D+f(x)=Df(x), for almost every x. This shows that f is differentiable almost everywhere (since it is real valued and finite). This completes the proof.

Corollary B.6

If f is a real-valued non-decreasing function on [a,b], then f is measurable.

Proof

By theorem B.5, g=f is defined almost everywhere in [a,b], except on a null set A (a zero-measure set).

Define gn(x)=nf(x+1n)f(x) for x[a,b]A and put f(x)=f(a) if xa, and put f(x)=f(b) if xb. Then gn(x)g(x) pointwise for almost all x, so that g is measurable. Since g=f almost everywhere in [a,b], this shows that f is measurable in [a,b].

Since a convex function is either non-increasing, non-decreasing or first non-increasing and then non-decreasing, it follows from theorem B.5 and corollary B.6 that, if f is convex on [a,b], then f is differentiable almost everywhere in (a,b), and f is measurable in [a,b]. Since the Lebesgue measure is σ‎-finite, these properties extend naturally to all of R.

By theorem B.3, if f is a convex function, then f is non-decreasing, and so f is continuous and differentiable almost everywhere.

Theorem B.7

If f is a real-valued convex function, then f is differentiable everywhere, except on a countable set of points.

Proof

Since f is convex, f is non-decreasing.

Thus, f is continuous almost everywhere.

To see that the number of points where f is not continuous is countable, consider first the case that f is convex and real valued (and so finite) on [a,b]. Put A=f(b)f(a) and define the jump function

J(x)=limh0+f(x+h)limh0+f(xh).

If J(x)=0, then f is continuous at x. If J(x)>0, then f is said to be discontinuous at x.

(p.543) The gap at x is the interval Ix=f(x),f(x+), where the left- and right-limits are f(x±)=limh0+f(x±h). It follows that J(x)=lengthIx, the length of this interval.

Since f is non-decreasing, the intervals Ix compose a disjoint family.

Put Nk equal to the number of intervals with J(x)2k. Then 2kNkA, or Nk2kA<. Thus, Nk is finite for each k. Since the number of points where f is discontinuous is less than kNk the number of Ix is countable. Hence, the number of points where f is discontinuous is countable. This completes the proof.

Thus, finite real-valued convex functions are differentiable except on a countable sets of points.

B.3 Convergence

If fn is a sequence of convex functions, then, by Fatou’s lemma, lim inffn is measurable. If, in addition, fnf pointwise almost everywhere (or in measure), then f is measurable. By Lusin’s theorem, there is a continuous function g such that f=g, except in a set of small Lebesgue measure.

More generally, if fn is a Cauchy in measure sequence of functions on a finite measure set, then there exists a measurable f such that fnf in measure on the finite measure set. By the σ‎-finiteness of the Lebesgue measure, these results extend to all of R.

Lemma B.8

Suppose that fn is a sequence of convex functions converging pointwise to a limit f almost everywhere. Then f is convex.

Proof

Without loss of generality the ae condition may be ignored.

Suppose first that fn and f are defined on the closed interval [a,b].

Suppose that f is not convex on [a,b]. That is, there exist points c,d[a,b] such that f(c)+f(d)<2f(12(c+d)).

Put A=2f(12(c+d))f(c)f(d)>0. Since fn(x) is convex,

fn(c)+fn(d)2fn(12(c+d)).

For every ϵ>0, there exist Nc, Nd and N in N such that

  1. (1) for all n>Nc, fn(c)f(c)<ϵ;

  2. (2) for all n>Nd, fn(d)f(d)<ϵ; and

  3. (3) for all n>N, fn(12(c+d))f(12(c+d))<ϵ.

However,

fn(c)f(c)+fn(d)f(d)fn(c)+fn(d)f(c)f(d)2fn(12(c+d))f(12(c+d))+A.

Take n>max{Nc,Nd,N}. Then, by (1), (2) and (3) above, 2ϵA2ϵ, or A4ϵ. This shows that A0, since ϵ>0 is arbitrary. This is a contradiction, and hence f is convex on [a,b]. Since the Lebesgue measure is σ‎-finite, this extends to R.

(p.544) By lemma B.8, the limit of a sequence of convex functions is itself convex. The limit is differentiable almost everywhere. It is also the limit of the sequence of derivatives almost everywhere:

Theorem B.9

Suppose that fn is a sequence of convex functions converging pointwise to a limit f almost everywhere. Then f is convex. Moreover, the sequence of derivatives fn converges to f almost everywhere.

Proof

Without loss of generality the ae condition may be ignored. Put D+d+dx and Dddx.

By lemma B.8, f is convex and differentiable almost everywhere. Without loss of generality, assume that f and fn are differentiable everywhere.

By theorem B.3, fn and f have non-decreasing left- and right-derivatives everywhere.

Next, note by convexity of fn that, for fixed x<y, and λ(0,1),

fnx+λ(yx)=fnλy+(1λ)xλfn(y)+(1λ)fn(x)=fn(x)+λfn(y)fn(x)

Rearrange this to see that

fnx+λ(yx)fn(x)λ(yx)fn(y)fn(x)yx.

Take λ0+ on the left-hand side and put y=x+h. This gives

(B.9)
D+fn(x)1h(fn(x+h)fn(x)).

Choose an ϵ>0. Fix xR. Then fn(x)f(x), and there is an N0N such that for all nN0,

(B.10)
fn(x)f(x)ϵ.

Suppose that nN0. By equations (B.9) and (B.10),

D+fn(x)1h(fn(x+h)fn(x))1h(fn(x+h)f(x)+ϵ).

Take the limit superior of the left-hand side. This gives

lim supnD+fn(x)1h(f(x+h)f(x)+ϵ).

Take ϵ0+ and then h0+ to find that lim supnD+fn(x)D+f(x).

A similar argument shows that lim infnDfn(x)Df(x).

Since f is differentiable almost everywhere by theorem B.5, limnddxfn(x)=f(x) for almost all x, and thus fnf pointwise almost everywhere. This completes the proof.

(p.545) B.4 The Legendre transform

Suppose that f is an extended real-valued function and that f>. The essential domain of f is defined by the set

(B.11)
domf={x|f(x)<}.

If f<, then domf=R.

The Legendre transform of an extended real-valued function f of one variable is usually defined by the supremum

(B.12)
f(p)=sup{pxf(x)|for xdomf}.

The transform f is necessarily convex. To see this, consider

(B.13)
f(p)+f(q)px+f(x)+qx+f(x)=(p+q)x+2f(x)

for any xdomf. Take the supremum on the right-hand side to see that f(p)+f(q)2f(12(p+q)).

If f is itself convex, finite and closed, then it can be recovered from its Legendre transform, since f=f, as will be seen below.

The above definition generalised to real-valued functions on Rn is the Legendre-Fenchel transformation [550].

Instead of using equation (B.12) as a definition, consider the alternative definition, where f is defined such that

(B.14)
p=ddxf(x)x=ddpf(p).

Integrating the left-hand side with respect to x for fixed p shows that px=f(x)+Cp, and the right-hand side with respect to p for fixed x shows px=f(p)+Cx. This shows that

(B.15)
px=f(x)+f(p),

a result which follows essentially by inverting the derivatives of f and f above.

This result is valid if f and f are differentiable almost everywhere, in which case the last equality is true almost everywhere. More generally, definition (B.12) is used and is defined for a more general class of (measurable) functions f.

In the case that f is convex, then it is differentiable almost everywhere, and its derivative is monotone and continuous almost everywhere. This shows that p=ddxf(x) has a solution x=f(p) almost everywhere, which is a convex function in p.

The above can be implemented by considering the subdifferential of f instead of the derivative. The subdifferential of f at x is the set f(x) defined by

(B.16)
f(x)={p|f(y)f(x)+p(yx), for all yR}.

This defines a map xf(x), which is not a bijection but which may be inverted at p by finding an xp such that pf(xp). The function xp is not necessarily unique.

(p.546) This in particular shows that f(y)pyf(xp)pxp and so one may choose xpdomf to maximise the right-hand side. This gives the definition for f in equation (B.12) by inverting the subdifferential of f and generalising the arguments following equation (B.14).

Lemma B.10

The Legendre transform f is necessarily closed and convex.

Proof

Define gx(p)=pxf(x). Then f(p) is the supremum of gx(p) over xR. This shows that epif=xepigx. The epigraph of gx is a closed and convex set for each x. Thus, epif is closed and convex, since it is the intersection of closed and convex sets. Hence, f is convex.

Since f is convex, it is continuous and also differentiable æ. By theorem B.7, it is in fact differentiable everywhere except on a countable set of points in its essential domain.

Since f> and f is convex, it follows that f>. Thus, the Legendre transform of f (or the biconjugate of f) is defined by

(B.17)
f(x)=supp{pxf(p)|for pdomf}.

In addition, f is convex and therefore continuous and also differentiable almost everywhere, by lemma B.10.

Theorem B.11

For a measurable function f,

  1. (a) f is convex, and ff;

  2. (b) if f=f, then f is closed and convex; and

  3. (c) if f is closed and convex, then f=f on the essential domain of f.

Proof

The function f is convex, and epif is closed, by lemma B.10. For any xR, and pdomf, f(p)pxf(x); therefore,

pxf(p)pxpxf(x)=f(x).

Take the supremum on the left-hand side for pdomf to see that ff.

If f=f, then, by lemma B.10, f is closed and convex.

Fix xdomf and assume f is closed and convex. Then f(x)<, and f(x). Let pf(x); then it follows from equation (B.16) that

f(y)f(x)+p(yx)for all yR.

Choose y such that f(p)=pyf(y) (since f is closed, there exists such a y). Then f(p)=pyf(y)pxf(x), and this shows that f(x)pxf(p)f(x).

Hence, if f is finite and convex, then f=f if f is closed. If f is a convex function on R, then it is lower semicontinuous and so closed; thus f=f.