# (p.280) Appendix 6.1 Likelihood of a distribution

In this appendix we first wish to examine in more detail the starting formula for the discrete model, in which the distribution function is replaced by a collection of integers *n* _{0}, *n* _{1}, *n* _{2}, …, *n* _{p}. If, following Boltzmann, we give the name “complexion” to a repartition with assigned values of these integers, we wish to show that the number *P* of the complexions compatible with a given distribution is given by eqn (6.2) of the main text:

*P* = *n*!/(*n* _{0}!*n* _{1}!*n* _{2}!…*n* _{p}!)   (6.2)

where *n* = *n* _{0} + *n* _{1} + … + *n* _{p}.

The starting point deals with the number of permutations of *n* objects. Suppose we have *n* different letters and we wish to put them in all possible orders. What is the number of possible alignments in which none of the *n* letters is repeated? This number is called the number of permutations of *n* distinguishable objects.

This number increases incredibly fast with *n*. Thus, if we have three letters, say *e, a, t*, the possible alignments are six:

*aet, ate, eat, eta, tae, tea.*

If we have four letters, *a, e, l, t*, we have:

*aelt, aetl, alet, alte, atel, atle, ealt, eatl, elat, elta, etal, etla, laet, late, leat, leta, ltae, ltea, tael, tale, teal, tela, tlae, tlea.*

It is easy to see how these numbers are computed: if *p* _{n} gives the number of permutations of *n* distinguishable objects, it will be *n* times the number of permutations of *n* − 1 distinguishable objects, because we can order the alignments by putting first all those which have a given letter (which can be chosen in *n* ways) and then permuting the remaining *n* − 1. Thus, in the second list above there are six “words” starting with “l” and the letters following “l” are those which appear in the previous list.
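The counting rule *p* _{n} = *n* · *p* _{n−1} can be checked directly by enumeration; here is a minimal sketch in Python (the letter sets are those of the text, the variable names are ours):

```python
from itertools import permutations

# Permutations of three distinct letters: 3! = 6 "words".
three = sorted("".join(p) for p in permutations("aet"))
# Permutations of four distinct letters: 4! = 24 "words".
four = sorted("".join(p) for p in permutations("aelt"))

# The recursion p_n = n * p_(n-1): the six "words" starting with "l"
# are "l" followed by exactly the permutations of the remaining letters.
l_words = [w for w in four if w.startswith("l")]

print(len(three), len(four), len(l_words))  # 6 24 6
```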

Then we compute easily: if we have one letter, there will be just one “word”; if we have two letters, the “words” will be 2 · 1 = 2; if we have three letters, the “words” will be 3 · 2 · 1 = 6; if we have four letters, the “words” will be 4 · 3 · 2 · 1 = 24; if we have five letters, the “words” will be 5 · 4 · 3 · 2 · 1 = 120. Thus if we have *n* letters, the “words” will be *n* · … 5 · 4 · 3 · 2 · 1, the product of the first *n* natural numbers, which is denoted by *n*!.

Now if the objects are not necessarily distinguishable, either because some of them are the same or because their difference is not of interest, the situation changes. If we have five *a*’s, three *e*’s, four *t*’s, we have twelve objects, but we are a long way from being able to form the rather large number of 12! (close to 500 million) “words” that we can produce with twelve different letters. How many of these 12! “words” will be different? It is easy to answer: if we think for a moment of distinguishing the equal letters (e.g. by painting them in different colours), then we shall have 12! different “words”. However, they are artificially different: we can permute the five *a*’s, the three *e*’s, the four *t*’s in a “word” and obtain again the same “word” (apart from colours). So we must divide 12! by 5!3!4! in order to eliminate the spurious replicas. We obtain

12!/(5!3!4!) = 27720.
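The arithmetic of this last step can be reproduced directly; a minimal Python check (variable names ours):

```python
from math import factorial

# 12! is indeed close to 500 million.
assert factorial(12) == 479_001_600

# Number of distinct "words" made of five a's, three e's, four t's:
# 12!/(5! 3! 4!), dividing out the spurious replicas.
n_words = factorial(12) // (factorial(5) * factorial(3) * factorial(4))
print(n_words)  # 27720
```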

Thus the problem of computing the permutations without repetition of *n* objects when there are *n* _{0}, *n* _{1}, *n* _{2}, …, *n* _{p} replicas of *p* + 1 distinguishable objects is solved by eqn (6.2) of the main text.

Passing now to the case of continuous variables, we wish to show that if *P* is a probability density, then

*V*(*P*) = −∫ _{M} *P* log *P* dμ   (A6.1)

(where the integral extends over the state space *M*, the probability density of which is *P*) is a measure of the likelihood of *P*. In other words, if we take many *P*’s “at random”, if positive and normalized, most of them will be close to the *P*’s for which *V*(*P*) has a maximum. In order to give a meaning to the expression “at random”, let us start by subdividing the state space *M* into *n* little cells Ω _{i} of volume μ _{i}, while replacing *P* by *n* numbers *P* _{i}, the averages of *P* over the cells:

*P* _{i} = (1/μ _{i}) ∫ _{Ω_{i}} *P* dμ   (A6.2)

Let us now take *N* objects and distribute them at random in the cells. If *N* _{i} of them are in Ω _{i} (0 ≤ *N* _{i} ≤ *N*), let us take *P* _{i} = *N* _{i}/(*N*μ _{i}) as the probability for the cell Ω _{i}. Analogously, given a probability density *P*, we can represent it in terms of a distribution of *N* objects in *n* cells, with arbitrary accuracy, provided that we take *n* and *N* sufficiently large. For a *given order of approximation*, however, we have just a finite, though huge, number of possible distributions. If we distribute the objects at random, there are

*W*(*P*) = *N*!/(*N* _{1}!*N* _{2}!…*N* _{n}!)

ways of obtaining the distribution *P* = (*P* _{1}, *P* _{2}, …, *P* _{n}). The sum Σ*W*(*P*) that gives the number of ways of obtaining any possible distribution equates to *n* ^{N}, thanks to the formula for the *N*th power of a polynomial, as applied to (1 + 1 + … + 1) ^{N} (for details, see e.g. [27]).
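The identity Σ*W*(*P*) = *n* ^{N} can be verified by brute force for small *n* and *N*; a short Python sketch (our own illustration, not from the text):

```python
from itertools import product
from math import factorial

def multinomial(N, occupancies):
    """N!/(N_1! ... N_n!): ways to realize the given cell occupancies."""
    w = factorial(N)
    for Ni in occupancies:
        w //= factorial(Ni)
    return w

# Distribute N = 5 objects over n = 3 cells and sum W over all
# occupancy vectors (N_1, ..., N_n) with N_1 + ... + N_n = N.
N, n = 5, 3
total = 0
for occ in product(range(N + 1), repeat=n):
    if sum(occ) == N:
        total += multinomial(N, occ)
print(total, n ** N)  # both 243
```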

It is then reasonable to take as a measure of the likelihood of the discretized distribution *P* = (*P* _{1}, *P* _{2}, …, *P* _{n}) the quantity *W*(*P*)/*n* ^{N} or its logarithm divided by *N* (in order to obtain a finite limit when *N* → ∞):

(1/*N*) log[*W*(*P*)/*n* ^{N}]   (A6.3)

We then study the limit of this quantity when *N* → ∞, which should give us the appropriate expression for *V*(*P*) when the probabilities averaged over each cell, *P* _{i}, take all the admissible real values. When *N* → ∞, the following estimate holds:

log *N*! = *N* log *N* − *N* + *o*(*N*)   (A6.4)

where *o*(*N*) indicates a quantity such that *o*(*N*)/*N* tends to zero when *N* → ∞. Equation (A6.4) follows from Stirling’s formula [28] or the inequality

2(*N*/e) ^{N} < *N*! < 3*N*(*N*/e) ^{N}   (A6.5)
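The estimate (A6.4), together with the elementary factorial bounds 2(*N*/e) ^{N} < *N*! < 3*N*(*N*/e) ^{N}, can be checked numerically; a minimal sketch (ours, working with logarithms to avoid overflow):

```python
from math import factorial, log

# Bounds in log form:
# log 2 + N log N - N < log N! < log 3 + log N + N log N - N.
for N in (5, 10, 50, 100):
    logfact = log(factorial(N))
    lower = log(2) + N * log(N) - N
    upper = log(3) + log(N) + N * log(N) - N
    assert lower < logfact < upper
    # Relative error of log N! ~ N log N - N shrinks as N grows (o(N)/N):
    print(N, (logfact - (N * log(N) - N)) / N)
```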

In fact, if *a* _{N} = *N*!e ^{N}/*N* ^{N}, then *a* _{N+1} > *a* _{N} and *a* _{N+1}/(*N* + 1) < *a* _{N}/*N*, thanks to the well-known elementary inequality:

(1 + 1/*N*) ^{N} < e < (1 + 1/*N*) ^{N+1}   (A6.6)

Indeed *a* _{N+1}/*a* _{N} = e(1 + 1/*N*) ^{−N} > 1 and [*a* _{N+1}/(*N* + 1)]/[*a* _{N}/*N*] = e(1 + 1/*N*) ^{−(N+1)} < 1. The inequality (A6.5) can in fact be proved to follow from (A6.6) by induction, since *a* _{1} = e > 2 and *a* _{1}/1 = e < 3. If we now use (A6.4) in (A6.3) we obtain:

(1/*N*) log[*W*(*P*)/*n* ^{N}] = −Σ _{i} *P* _{i}μ _{i} log(*P* _{i}μ _{i}) − log *n* + *o*(*N*)/*N*   (A6.7)

Here we recall that *P* _{i} = *N* _{i}/(*N*μ _{i}). When *N* → ∞, the last term in (A6.7) disappears.
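The monotonicity of *a* _{N} and of *a* _{N}/*N* used above is easy to confirm numerically; a small sketch (variable names ours):

```python
from math import e, factorial

def a(N):
    # a_N = N! e^N / N^N
    return factorial(N) * e**N / N**N

# a_N increases while a_N / N decreases; together with a_1 = e these
# monotonicities pin N! between 2 (N/e)^N and 3 N (N/e)^N.
for N in range(1, 60):
    assert a(N + 1) > a(N)
    assert a(N + 1) / (N + 1) < a(N) / N

print(round(a(1), 3), a(1) > 2, a(1) < 3)  # 2.718 True True
```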

It remains now to let (at the same time) *n* go to infinity and μ _{i} to zero; before doing this, let us remark that it is always possible to arrange things in such a way that *n*μ _{i} = *V* (total volume of the space of states, taken to be finite for simplicity); it is in fact enough to take μ _{i} = *V*/*n*. Then the discrete distribution (*P* _{1}, *P* _{2}, …, *P* _{n}) tends to a continuous distribution *P* and eqn (A6.7) becomes:

(1/*N*) log[*W*(*P*)/*n* ^{N}] → −∫ _{M} *P* log *P* dμ − log *V* = *V*(*P*) − log *V*   (A6.8)

and, apart from the constant −log *V* (which does not depend on *P* and hence does not affect the search for a maximum), this is the likelihood measure *V*(*P*).
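The convergence of the discretized likelihood (A6.3) towards its *N* → ∞ limit can be illustrated numerically; a sketch with a hypothetical four-cell example of our own (equal cells, total volume 1):

```python
from math import factorial, log

# State space of total volume V = 1 split into n equal cells (mu_i = 1/n),
# with a fixed discretized density P_i; occupation numbers N_i = N P_i mu_i.
n = 4
mu = 1.0 / n
P = [0.4, 0.8, 1.6, 1.2]  # normalized: sum(P_i * mu_i) = 1

def log_likelihood(N):
    # (1/N) log( W(P) / n^N ) with N_i = N P_i mu_i (integers here).
    Ni = [round(N * p * mu) for p in P]
    logW = log(factorial(N)) - sum(log(factorial(k)) for k in Ni)
    return logW / N - log(n)

# Limiting value: -sum_i P_i mu_i log(P_i mu_i) - log n.
limit = -sum(p * mu * log(p * mu) for p in P) - log(n)
for N in (100, 1000, 10000):
    print(N, log_likelihood(N), limit)
```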

Let us underline the difficulty which we remarked upon in the main text and which did not escape Boltzmann’s attention, that if we change the variable through a transformation with non-constant Jacobian, *V(P)* in the new variables is not equal to *V(P)* in the old ones. Hence there is a class of privileged (canonical) variables, singled out by the fact that the volume element in the state space is invariant during the time evolution of the system (thanks to Liouville’s theorem); to choose these variables means, from a physical viewpoint, that the sets of equal volume (according to the choice that we have made) are equiprobable in the state space.
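The non-invariance of *V*(*P*) under a change of variables with non-constant Jacobian can be seen in a toy computation (our own example, not from the text): the uniform density on [0, 1] has *V* = 0, but after the change of variable *y* = *x*², whose Jacobian 2*x* is not constant, the same probability measure has density 1/(2√*y*) and *V* = log 2 − 1 ≠ 0.

```python
from math import log, sqrt

def V(density, a, b, m=200000):
    """Midpoint-rule approximation of -integral of p log p over [a, b]."""
    h = (b - a) / m
    s = 0.0
    for k in range(m):
        y = a + (k + 0.5) * h  # midpoints avoid the endpoint y = 0
        p = density(y)
        s -= p * log(p) * h
    return s

# Uniform density on [0, 1]: V = 0 exactly.
v_old = V(lambda x: 1.0, 0.0, 1.0)
# Same measure after y = x^2: density 1/(2 sqrt(y)), V = log 2 - 1.
v_new = V(lambda y: 1.0 / (2.0 * sqrt(y)), 0.0, 1.0)
print(v_old, v_new, log(2) - 1)
```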

Some applications of this expression for the likelihood are given in Appendix 7.1.