Ludwig Boltzmann: The Man Who Trusted Atoms

Carlo Cercignani

Print publication date: 2006

Print ISBN-13: 9780198570646

Published to Oxford Scholarship Online: January 2010

DOI: 10.1093/acprof:oso/9780198570646.001.0001


Appendix 6.1 Likelihood of a distribution

In this appendix we first wish to examine in more detail the starting formula for the discrete model, in which the distribution function is replaced by a collection of integers $n_0, n_1, n_2, \dots, n_p$. If, following Boltzmann, we give the name "complexion" to a repartition with assigned values of these integers, we wish to show that the number $P$ of the complexions compatible with a given distribution is given by eqn (6.2) of the main text:

$$P = \frac{n!}{n_0!\, n_1!\, n_2! \cdots n_p!}$$
Readers familiar with combinatorics can skip this first part, up to the sentence containing eqn (A6.1).

The starting point deals with the number of permutations of n objects. Suppose we have n different letters and we wish to put them in all possible orders. What is the number of possible alignments in which none of the n letters is repeated? This number is called the number of permutations of n distinguishable objects.

This number increases incredibly fast with n. Thus, if we have three letters, say e, a, t, the possible alignments are six:

aet, ate, eat, eta, tae, tea.
If there are four letters, the possible alignments are 24. Thus, if the letters are a, e, l, t, we have:
aelt, aetl, alet, alte, atel, atle,
ealt, eatl, elat, elta, etal, etla,
laet, late, leat, leta, ltae, ltea,
tael, tale, teal, tela, tlae, tlea.
If we start thinking how to produce the alignments, we also quickly discover the rule for making them. In fact, suppose we fix the first letter; then all the permutations of the remaining letters will give all the possible alignments starting with that letter. Thus if $p_n$ denotes the number of permutations of $n$ distinguishable objects, it will be $n$ times the number of permutations of $n-1$ distinguishable objects, i.e. $p_n = n\,p_{n-1}$, because we can order the alignments by putting first all those which begin with a given letter (which can be chosen in $n$ ways) and then permuting the remaining $n-1$. Thus, in the second list above there are six "words" starting with "l", and the letters following "l" are exactly the permutations of a, e, t which appear in the first list.
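The recursion $p_n = n\,p_{n-1}$ is easy to confirm by direct enumeration. The following minimal Python sketch (our addition, not part of the original text) checks it on the four-letter example above:

```python
from itertools import permutations

def num_permutations(n: int) -> int:
    """p_n = n * p_(n-1): the number of orderings of n distinguishable objects."""
    return 1 if n <= 1 else n * num_permutations(n - 1)

# All "words" formed from the four letters a, e, l, t:
words = ["".join(p) for p in permutations("aelt")]
assert len(words) == num_permutations(4) == 24
# Exactly six of them start with "l", one for each permutation of a, e, t:
assert sum(w.startswith("l") for w in words) == 6
```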

Then we compute easily: if we have one letter, there will be just one "word"; if we have two letters, the "words" will be 2 · 1 = 2; if we have three letters, 3 · 2 · 1 = 6; if we have four letters, 4 · 3 · 2 · 1 = 24; if we have five letters, 5 · 4 · 3 · 2 · 1 = 120. Thus if we have $n$ letters, the "words" will be $n \cdot (n-1) \cdots 3 \cdot 2 \cdot 1$, the product of the first $n$ natural numbers, which is denoted by $n!$.

Now if the objects are not all distinguishable, either because some of them are the same or because their difference is not of interest, the situation changes. If we have five a's, three e's, and four t's, we have twelve objects, but we are a long way from being able to form the rather large number of 12! (close to 500 million) "words" that we can produce with twelve different letters. How many of these 12! "words" will be different? It is easy to answer: if we think for a moment of distinguishing the equal letters (e.g. by painting them in different colours), then we shall have 12! different "words". However, they are artificially different: we can permute the five a's, the three e's, and the four t's in a "word" and obtain the same "word" again (apart from colours). So we must divide 12! by 5!3!4! in order to eliminate the spurious replicas. We obtain

$$\frac{12!}{5!\, 3!\, 4!} = 27{,}720,$$
quite a large number, but very far short of 500 million!
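As a quick check of the arithmetic, a few lines of Python (again our addition) reproduce the count:

```python
from math import factorial

total = factorial(12)                      # 479_001_600, close to 500 million
distinct = total // (factorial(5) * factorial(3) * factorial(4))
assert distinct == 27_720                  # far short of 500 million
```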

Thus the problem of computing the permutations without repetition of $n$ objects when there are $n_0, n_1, n_2, \dots, n_p$ replicas of $p+1$ distinguishable objects is solved by eqn (6.2) of the main text.
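A generic version of eqn (6.2) can be sketched in a few lines; the function name `complexions` is our own choice, not the author's:

```python
from math import factorial

def complexions(counts):
    """Eqn (6.2): n! / (n_0! n_1! ... n_p!), with n = n_0 + n_1 + ... + n_p."""
    result = factorial(sum(counts))
    for n_i in counts:
        result //= factorial(n_i)          # exact integer division
    return result

assert complexions([5, 3, 4]) == 27_720    # the twelve-letter example above
```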

Passing now to the case of continuous variables, we wish to show that if P is a probability density, then

(A6.1) $$V(P) = -\int_M P \log P \, d\mu$$
(where $d\mu$ is the volume element in the state space $M$, on which $P$ is a probability density) is a measure of the likelihood of $P$. In other words, if we take many $P$'s "at random" (positive and normalized), most of them will be close to the $P$'s for which $V(P)$ has a maximum. In order to give a meaning to the expression "at random", let us start by subdividing the state space $M$ into $n$ little cells $\Omega_i$ of volume $\mu_i$, while replacing $P$ by $n$ numbers $P_i$, the averages of $P$ over the cells:
(A6.2) $$P_i = \frac{1}{\mu_i} \int_{\Omega_i} P \, d\mu .$$
Let us then take $N$ objects and distribute them at random in the cells. If $N_i$ of them fall in $\Omega_i$ ($0 \le N_i \le N$), let us take $P_i = N_i/(N\mu_i)$ as the probability density for the cell $\Omega_i$. Conversely, given a probability density $P$, we can represent it in terms of a distribution of $N$ objects in $n$ cells, with arbitrary accuracy, provided that we take $n$ and $N$ sufficiently large. For a given order of approximation, however, we have just a finite, though huge, number of possible distributions. If we distribute the objects at random, there are $W(P) = N!/(N_1! N_2! \cdots N_n!)$ ways of obtaining the distribution $P = (P_1, P_2, \dots, P_n)$. The sum $\sum W(P)$ that gives the number of ways of obtaining any possible distribution equals $n^N$, thanks to the formula for the $N$th power of a polynomial, applied to $(1 + 1 + \dots + 1)^N$ (for details, see e.g. [27]).
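Both facts — that $W(P)$ counts the ways of obtaining a given occupancy, and that the $W$'s sum to $n^N$ — can be verified by brute force for small $n$ and $N$. The sketch below is ours and simply enumerates all $n^N$ ways of placing $N$ labelled objects into $n$ cells:

```python
from math import factorial
from itertools import product
from collections import Counter

n, N = 3, 4  # a small illustration: n cells, N objects

def W(occupancy):
    """W(P) = N! / (N_1! N_2! ... N_n!) for a given occupancy tuple."""
    w = factorial(N)
    for N_i in occupancy:
        w //= factorial(N_i)
    return w

# Place each of the N labelled objects into one of the n cells, in every way:
tally = Counter()
for assignment in product(range(n), repeat=N):
    occupancy = tuple(assignment.count(cell) for cell in range(n))
    tally[occupancy] += 1

assert all(W(occ) == count for occ, count in tally.items())
assert sum(tally.values()) == n ** N       # (1 + 1 + ... + 1)**N
```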

It is then reasonable to take as a measure of the likelihood of the discretized distribution $P = (P_1, P_2, \dots, P_n)$ the quantity $W(P)/n^N$, or its logarithm divided by $N$ (in order to obtain a finite limit when $N \to \infty$):

(A6.3) $$V_N(P) = \frac{1}{N} \log \frac{W(P)}{n^N} .$$
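A direct numerical reading of (A6.3) — our sketch, using `lgamma(N + 1)` for $\log N!$ — already shows that near-uniform occupancies score higher than concentrated ones:

```python
from math import lgamma, log

def V_N(occupancy, n):
    """(A6.3): V_N(P) = (1/N) log(W(P) / n**N), with log N! = lgamma(N + 1)."""
    N = sum(occupancy)
    log_W = lgamma(N + 1) - sum(lgamma(N_i + 1) for N_i in occupancy)
    return log_W / N - log(n)

print(V_N([4, 3, 3], n=3))    # near-uniform occupancy: about -0.26
print(V_N([10, 0, 0], n=3))   # concentrated occupancy: -log 3, about -1.10
```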
Let us now compute the limit when $N \to \infty$, which should give us the appropriate expression for $V(P)$ when the probabilities averaged over each cell, $P_i$, take all the admissible real values. When $N \to \infty$, the following estimate holds:
(A6.4) $$\log N! = N \log N - N + o(N),$$
where $o(N)$ indicates a quantity such that $o(N)/N$ tends to zero when $N \to \infty$. Equation (A6.4) follows from Stirling's formula [28], or from the inequality
(A6.5) $$2\left(\frac{N}{e}\right)^N < N! < 3N\left(\frac{N}{e}\right)^N .$$
In turn, (A6.5) follows from the fact that, if we let $a_N = N!\, e^N / N^N$, then $a_{N+1} > a_N$ and $a_{N+1}/(N+1) < a_N/N$, thanks to the well-known elementary inequality:
(A6.6) $$\left(1 + \frac{1}{N}\right)^N < e < \left(1 + \frac{1}{N}\right)^{N+1} .$$
The fact that $a_{N+1} > a_N$ and $a_{N+1}/(N+1) < a_N/N$ can in fact be seen to follow from (A6.6), since $a_{N+1}/a_N = e/(1 + 1/N)^N > 1$ and $[a_{N+1}/(N+1)]\,/\,[a_N/N] = e/(1 + 1/N)^{N+1} < 1$; (A6.5) then follows by induction, because $a_1 = e > 2$ and $a_1/1 = e < 3$. If we now use (A6.4) in (A6.3) we obtain:
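Numerically, the estimate (A6.4) and the bounds (A6.5) are easy to confirm. The following check (our addition) uses `lgamma(N + 1)` for $\log N!$ and compares everything in logarithmic form:

```python
from math import lgamma, log

for N in (10, 100, 1_000, 10_000):
    log_fact = lgamma(N + 1)               # log N!
    stirling = N * log(N) - N              # leading terms of (A6.4)
    print(N, (log_fact - stirling) / N)    # the o(N)/N remainder shrinks
    # (A6.5) in log form: log 2 + N(log N - 1) < log N! < log(3N) + N(log N - 1)
    assert log(2) + stirling < log_fact < log(3 * N) + stirling
```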
(A6.7) $$V_N(P) = -\sum_{i=1}^{n} P_i \log(P_i \mu_i)\, \mu_i - \log n + \frac{o(N)}{N},$$
where due account has been taken of the fact that $\sum_{i=1}^{n} N_i = N$ and $P_i = N_i/(N\mu_i)$. When $N \to \infty$, the last term in (A6.7) disappears.
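The convergence in (A6.7) can be illustrated numerically. In the sketch below (our construction; the cell volumes and density values are arbitrary choices), we set $N_i = N P_i \mu_i$ and watch $(1/N)\log W - \log n$ approach its limit:

```python
from math import lgamma, log

mu = [0.2, 0.3, 0.5]                 # cell volumes mu_i (total volume 1)
P  = [1.5, 1.0, 0.8]                 # discretized density: sum(P_i * mu_i) == 1
n = len(mu)
limit = -sum(p * log(p * m) * m for p, m in zip(P, mu)) - log(n)

for N in (100, 10_000, 1_000_000):
    N_i = [round(N * p * m) for p, m in zip(P, mu)]   # N_i = N * P_i * mu_i
    log_W = lgamma(N + 1) - sum(lgamma(k + 1) for k in N_i)
    print(N, log_W / N - log(n), "->", limit)
```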

It remains now to let (at the same time) $n$ go to infinity and $\mu_i$ to zero; before doing this, let us remark that it is always possible to arrange things in such a way that $n\mu_i = \mu(M)$ (the total volume of the space of states, taken to be finite for simplicity); it is in fact enough to take $\mu_i = \mu(M)/n$. Then the discrete distribution $(P_1, P_2, \dots, P_n)$ tends to a continuous distribution $P$ and eqn (A6.7) becomes:

(A6.8) $$V(P) = -\int_M P \log P \, d\mu - \log \mu(M).$$
This formula coincides with (A6.1) apart from an inessential constant (which is exactly zero if the measure is normalized in such a way that the total measure $\mu(M)$ equals 1).
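To make the statement about maxima concrete, one can evaluate (A6.1) on $M = [0, 1]$ (so that $\mu(M) = 1$) for a couple of normalized densities. A rough Riemann-sum sketch (our addition) shows the uniform density attaining the maximum $V = 0$:

```python
from math import log

def V(P, a=0.0, b=1.0, m=100_000):
    """Riemann-sum approximation of V(P) = -integral of P log P over [a, b]."""
    dx = (b - a) / m
    total = 0.0
    for k in range(m):
        x = a + (k + 0.5) * dx           # midpoint rule
        total += P(x) * log(P(x)) * dx
    return -total

print(V(lambda x: 1.0))                  # uniform density: V = 0, the maximum
print(V(lambda x: 0.5 + x))              # another normalized density: V < 0
```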

Let us underline the difficulty which we remarked upon in the main text and which did not escape Boltzmann's attention: if we change variables through a transformation with non-constant Jacobian, $V(P)$ in the new variables is not equal to $V(P)$ in the old ones. Hence there is a class of privileged (canonical) variables, singled out by the fact that the volume element in the state space is invariant during the time evolution of the system (thanks to Liouville's theorem); to choose these variables means, from a physical viewpoint, that sets of equal volume (according to the choice that we have made) are equiprobable in the state space.

Some applications of this expression for the likelihood are given in Appendix 7.1.