# (p.334) Appendix: Probability distributions

Random variables are variables whose possible values are the outcomes of random processes. The probability distribution for a particular random variable describes the chance of occurrence of each possible value of that variable. We denote the random variable of interest as *Y*, and *y* is used to denote a specific value taken by that variable. Discrete random variables (e.g., count data) take values from a countable set (that is, values that can be represented by integers). The probability that a discrete random variable takes on each of its possible values (e.g., a butterfly might lay 20, 21, or 22 eggs on a leaf) is described by a probability mass function (PMF). Continuous random variables (e.g., lengths, rates, probabilities) take real values, and can take *any* value in some range (e.g., the mass of each butterfly egg can vary by quantities much smaller than we can ever measure). The probability density function (PDF) describes how the probability changes as the measurement changes. Consult basic statistics books for more detail.

Each PMF and PDF is described by one or more parameters that determine the shape of the distribution. The most common shape measurements are the mean, variance, and skewness, which describe the central tendency, the spread, and the asymmetry of the distribution, respectively. Distributions can be written in a number of mathematically equivalent ways, all with the same number of parameters. Why choose one of these as opposed to another? Some ways of writing these distributions are easier to interpret biologically, or to work with mathematically. Because the mean is such a useful descriptor, for all of the probability distributions presented in this appendix, we show how to calculate the mean, along with other useful parameters. These parameters are not always provided by software packages.

Because many statistical analyses, especially those involving likelihoods, require calculating the natural logarithm of a probability distribution (see chapter 3), in this appendix we present, for each distribution, both the density/mass function and its natural logarithm (henceforth referred to as the lnPDF or lnPMF).

This appendix provides only a summary of some of the more frequently used probability distributions mentioned throughout the book. As with the rest of the book, R code for the figures is available at our online site; you may find it useful to perform numerical experiments aimed at deepening your understanding of how the parameters affect the distributions. We have not included the statistical PDFs commonly encountered in null-hypothesis tests, such as Student’s *t*-distribution, the *F*-distribution, and the chi-squared distribution, because they are mainly used for hypothesis testing, while the distributions discussed below are often used as models of statistical populations.

# (p.336) A.1 Continuous random variables

## A.1.1 Normal distribution (chapters 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, and 13)

The Normal distribution is the most commonly used continuous probability distribution. It assumes that variation is symmetric about the mean *μ*. The typical form of this distribution is:
$$f(y \mid \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,\exp\!\left(-\frac{(y-\mu)^{2}}{2\sigma^{2}}\right)$$

where ${\mathrm{\sigma}}^{2}$ is the variance of $Y;$ see figure A.1 for some examples of the PDF. As discussed above, it is frequently more useful to work with the lnPDF:
$$\ln f(y \mid \mu, \sigma^{2}) = -\frac{1}{2}\ln\!\left(2\pi\sigma^{2}\right) - \frac{(y-\mu)^{2}}{2\sigma^{2}}$$

Although this distribution takes on values from minus infinity to plus infinity, it is often used for variables that can only take on positive values. This misspecification is usually not problematic if the mean is sufficiently large and the standard deviation is small, so that negative outcomes are extremely unlikely.
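A quick numerical sketch of this point, in Python with SciPy (as an alternative to the book’s R code; the parameter values here are invented for illustration):

```python
from scipy.stats import norm

# Hypothetical example: body masses with mean 50 g and standard deviation 5 g.
mu, sigma = 50.0, 5.0

# Probability that the Normal model assigns to (impossible) negative masses.
p_negative = norm(loc=mu, scale=sigma).cdf(0.0)
print(p_negative)  # a 10-sigma tail: vanishingly small
```

With a mean this many standard deviations above zero, the mass below zero is negligible and the misspecification is harmless.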

## A.1.2 Lognormal distribution (chapters 3, 5, 6, and 13)

If a continuous random variable is positive $(y>0)$, expected to have positive skew, and intermediate values are most likely, then the lognormal distribution may be appropriate. Some care is needed in using this distribution, because it describes the case in which the logarithm of *y* is normally distributed; this makes it essential to distinguish between *y* and its log, and between quantities like the mean or variance of *y* and the mean or variance of its log. If we set *μ* as the mean of $ln(y)$ and *σ* as the standard deviation of $ln(y)$, the typical form of this distribution is:
$$f(y \mid \mu, \sigma) = \frac{1}{y\,\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(\ln y-\mu)^{2}}{2\sigma^{2}}\right)$$

(p.337)
See figure A.2 for some examples of the PDF. While $\mathrm{\mu}$ and ${\mathrm{\sigma}}^{2}$ are the mean and variance of $ln(y)$, they are not the mean and variance of the lognormal distribution; the mean is $\text{E}[Y]={e}^{\mathrm{\mu}+{\mathrm{\sigma}}^{2}/2}$, the median is ${e}^{\mathrm{\mu}}$, and the variance is $\text{Var}[Y]=\left({e}^{{\mathrm{\sigma}}^{2}}-1\right){e}^{2\mathrm{\mu}+{\mathrm{\sigma}}^{2}}$. It is sometimes convenient to rewrite the PDF with new parameters *m* and *ϕ*, so that the variance positively scales with *ϕ* (so *ϕ* is called a variance parameter); then the lnPDF for the lognormal distribution is:
$$\ln f(y \mid m, \varphi) = -\ln y - \ln\varphi - \frac{1}{2}\ln(2\pi) - \frac{\left(\ln y - \ln m + \varphi^{2}/2\right)^{2}}{2\varphi^{2}}$$

In this case $\text{E}\left[Y\right]=m$ and $\text{Var}[Y]=\left({e}^{{\varphi}^{2}}-1\right){m}^{2}$. Ecologists frequently use the lognormal distribution in studies of individual growth, time series analyses of populations, and species abundance distributions.
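These moment formulas are easy to verify numerically; a sketch using SciPy’s `lognorm` (which takes `s` = *σ* and `scale` = e^*μ*; the parameter values below are arbitrary):

```python
import math
from scipy.stats import lognorm

mu, sigma = 1.0, 0.5                # mean and sd of ln(y), chosen arbitrarily
dist = lognorm(s=sigma, scale=math.exp(mu))

mean_formula = math.exp(mu + sigma**2 / 2)                            # E[Y]
var_formula = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)  # Var[Y]
median_formula = math.exp(mu)                                         # median

# The closed-form expressions match SciPy's numerical moments.
print(dist.mean(), mean_formula)
print(dist.var(), var_formula)
print(dist.median(), median_formula)
```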

## A.1.3 Gamma distribution (chapters 3, 6, 12, and 13)

If a continuous random variable is positive $(y>0)$ and expected to have positive skew, but the smallest values may always be the most likely, then the gamma distribution may be a good model. The typical form of this distribution is:
$$f(y \mid a, b) = \frac{1}{\Gamma(a)\,b^{a}}\,y^{a-1}e^{-y/b}$$

See figure A.3 for some examples of the PDF. The mean of this distribution is $\text{E}[Y]=ab,$ and $\text{Var}[Y]={ab}^{2}$. The lnPDF is given by:
$$\ln f(y \mid a, b) = -\ln\Gamma(a) - a\ln b + (a-1)\ln y - \frac{y}{b}$$

where $\mathrm{\Gamma}$ is the complete gamma function.

(p.338)
The complete gamma function is essentially an extension of the factorial function, to include positive numbers that are not integers. (Recall that, for a positive integer *n*, the factorial *n*! is the product of *n* and all smaller positive integers—e.g., $4!=4\times 3\times 2\times 1=24$.) The relationship between factorials and the complete gamma is $\mathrm{\Gamma}(n)=\left(n-1\right)!$ for any positive integer $n.$ The general formula (for any positive number $q)$ is $\mathrm{\Gamma}(q)={\int}_{0}^{\mathrm{\infty}}{x}^{q-1}exp(-x)dx$. Factorials may seem more intuitive to you, but they are more problematic computationally, especially for large *n*; we recommend getting used to the complete gamma function!
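The $\Gamma(n)=(n-1)!$ identity, and the computational advantage of working with the logarithm of the gamma function, can be seen in a few lines of Python (the values are arbitrary):

```python
import math

# Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 10):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# 1000! overflows a floating-point number, but its logarithm is manageable:
log_fact_1000 = math.lgamma(1001)   # ln(1000!) = ln Gamma(1001)
print(log_fact_1000)                # roughly 5912.1
```

This is why likelihood computations use `lgamma` rather than factorials.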

There are several widely used alternative parameterizations of the gamma distribution, all of which are mathematically equivalent. For example, the gamma distribution can be formulated in terms of its mean, $\mathrm{\mu}$, and a positive variance parameter *ϕ*, by setting $a=\mathrm{\mu}/\mathrm{\varphi}$ and $b=\mathrm{\varphi}$. In this case the variance is $\text{Var}[Y]=\mathrm{\mu}\mathrm{\varphi}$. The gamma distribution arises naturally in event–time data, as the sum of time intervals that are each exponentially distributed, or as the waiting time until a given number of events has occurred. The gamma distribution is also an important component of mixture models; see discussion of the negative binomial distribution.
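A sketch of this mean–variance reparameterization with SciPy’s `gamma` (shape–scale form; the values of μ and φ below are invented for illustration):

```python
from scipy.stats import gamma as gamma_dist

mu, phi = 4.0, 0.5          # assumed mean and variance parameter
a, b = mu / phi, phi        # shape a = mu/phi, scale b = phi

dist = gamma_dist(a, scale=b)
print(dist.mean())          # equals mu
print(dist.var())           # equals mu * phi
```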

## A.1.4 Exponential distribution (chapters 5 and 6)

If a stochastic event occurs at some fixed rate *λ* (the event rate), then the time between consecutive events is described by an exponential distribution. The typical form of this distribution is:
$$f(y \mid \lambda) = \lambda e^{-\lambda y}$$

See figure A.4 for some examples of the PDF. The lnPDF is:
$$\ln f(y \mid \lambda) = \ln\lambda - \lambda y$$

(p.339)
This distribution can only take non-negative values, and the event rate must also be positive. The mean of *Y* is $\text{E}[Y]=\mathrm{\mu}={\mathrm{\lambda}}^{-1}$, and the variance of *Y* is $\text{Var}[Y]={\mathrm{\lambda}}^{-2}$. This distribution is most often used to describe event times, such as time to death or flowering; exponentially distributed survival times correspond to a Type II survival curve. In ecology the exponential distribution is frequently taken as a null model for event times.
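A minimal check with SciPy’s `expon`, which is parameterized by the scale $1/\lambda$ rather than the rate (the rate value below is arbitrary):

```python
from scipy.stats import expon

lam = 0.25                     # assumed event rate
dist = expon(scale=1 / lam)    # SciPy uses scale = 1/lambda

print(dist.mean())             # 1/lam = 4.0
print(dist.var())              # 1/lam**2 = 16.0
```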

## A.1.5 Weibull distribution (chapter 5)

If the rate at which events occur increases or decreases over time, then the time to the first event can be described by the Weibull distribution. Suppose the event occurs at rate $(a/b){(t/b)}^{a-1}$ at time *t*. Then for $a>1$ the rate increases over time, for $a=1$ the rate is constant, and for $a<1$ the rate decreases over time. The typical form of this distribution is:
$$f(y \mid a, b) = \frac{a}{b}\left(\frac{y}{b}\right)^{a-1}\exp\!\left[-\left(\frac{y}{b}\right)^{a}\right]$$

See figure A.5 for some examples of the PDF. The lnPDF is:
$$\ln f(y \mid a, b) = \ln a - a\ln b + (a-1)\ln y - \left(\frac{y}{b}\right)^{a}$$

The Weibull is a generalization of the exponential distribution; if $a=1$, the expressions above are the same as for the exponential. For the time to the first event, the mean is $\text{E}[Y]=b\mathrm{\Gamma}(1+1/a)$, and the variance is $\text{Var}[Y]={b}^{2}[\mathrm{\Gamma}(1+2/a)-(\mathrm{\Gamma}(1+1/a){)}^{2}]$. This distribution can only model non-negative values. Ecologists frequently use this distribution when performing survival analyses, in which case the event rate corresponds to the age-dependent mortality rate. For each individual studied, the first event being modeled is death.
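The moment formulas can be checked against SciPy’s `weibull_min` (shape `c` = *a*, `scale` = *b*; the parameter values are invented):

```python
import math
from scipy.stats import weibull_min

a, b = 2.0, 3.0                      # assumed shape and scale
dist = weibull_min(c=a, scale=b)

mean_formula = b * math.gamma(1 + 1 / a)
var_formula = b**2 * (math.gamma(1 + 2 / a) - math.gamma(1 + 1 / a) ** 2)

print(dist.mean(), mean_formula)
print(dist.var(), var_formula)
```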

## (p.340) A.1.6 Beta distribution (chapters 3, 6, 12, and 13)

If a continuous random variable is bounded by [0,1] (e.g., it describes a probability or a proportion), then the beta distribution can be a good choice. The typical form of this distribution is:
$$f(y \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,y^{a-1}(1-y)^{b-1}$$

(p.341) See figure A.6 for some examples of the PDF. The lnPDF is:
$$\ln f(y \mid a, b) = \ln\Gamma(a+b) - \ln\Gamma(a) - \ln\Gamma(b) + (a-1)\ln y + (b-1)\ln(1-y)$$

The mean is $\text{E}\left[Y\right]=\left(\frac{a}{a+b}\right)$ and the variance is $\text{Var}[Y]=\frac{ab}{{\left(a+b\right)}^{2}\left(a+b+1\right)}$. This distribution is sometimes parameterized in terms of the mean, $\mathrm{\mu}$, and a dispersion parameter $\mathrm{\varphi}$. In this case $a=\mathrm{\mu}/\mathrm{\varphi}$ and $b=(1-\mathrm{\mu})/\mathrm{\varphi}$, which gives $\text{E}[Y]=\mathrm{\mu}$ and $\text{Var}[Y]=\mathrm{\mu}(1-\mathrm{\mu})\mathrm{\varphi}/(1+\mathrm{\varphi})$. Ecologists have used the beta distribution to model quantities like the fraction of plant cover in a habitat, but recent use of this distribution is mainly to model random variation in the binomial parameter (overdispersion; see beta-binomial distribution, below).
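The mean–dispersion mapping can be sketched with SciPy’s `beta` (the values of μ and φ below are arbitrary illustrative choices):

```python
from scipy.stats import beta as beta_dist

mu, phi = 0.3, 0.1                  # assumed mean and dispersion parameter
a, b = mu / phi, (1 - mu) / phi     # a = 3, b = 7 for these values
dist = beta_dist(a, b)

var_formula = mu * (1 - mu) * phi / (1 + phi)
print(dist.mean(), mu)              # both 0.3
print(dist.var(), var_formula)
```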

# A.2 Discrete random variables

## A.2.1 Poisson distribution (chapters 3, 4, 6, 8, 12, and 13)

Count data in ecology are usually (in theory) unbounded and positively skewed $(y\ge 0)$. If events occur at some constant rate, then the number of events counted during a fixed interval is given by the Poisson distribution. The typical form of the Poisson PMF is:
$$P(Y=y \mid \mu) = \frac{\mu^{y}e^{-\mu}}{y!}$$

See figure A.7 for some examples of the PMF. The lnPMF is:
$$\ln P(Y=y \mid \mu) = y\ln\mu - \mu - \ln\Gamma(y+1)$$

We use the complete gamma function here, rather than a factorial, because it is much easier to compute. The expected number of counts is equal to the variance among counts, so $\text{E}[Y]=\text{Var}[Y]=\mathrm{\mu}$. Alternatively, this distribution can be used to describe the number of (p.342) subjects counted in an area of given size when subjects are randomly distributed in space with given density. Ecologists frequently use this distribution as a simple model for the numbers of individuals counted in a unit area, transect, or similar measure.
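The gamma-function form of the lnPMF agrees with a library implementation; a sketch in Python (the values of μ and *y* are arbitrary):

```python
import math
from scipy.stats import poisson

mu, y = 4.2, 7                                       # assumed mean and count

# lnPMF written with the complete gamma function instead of a factorial:
lnpmf_manual = y * math.log(mu) - mu - math.lgamma(y + 1)
lnpmf_scipy = poisson(mu).logpmf(y)

print(lnpmf_manual, lnpmf_scipy)   # the two agree
```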

## A.2.2 Binomial distribution (chapters 3, 4, 6, 7, 8, 12, and 13)

Binomial distributions are used for data with two possible outcomes that occur with probability *p* and $1-p$. Thus the binomial is quite different from the Poisson: the binomial models the number of successes or failures in a known number of trials (say, the number of seeds that germinate, of a known number planted), while the Poisson models the total number of units (say, the number of seeds found in a quadrat), with no fixed maximum. For *n* binomial trials, the PMF for the number of successes $(0\le y\le n)$ is:
$$P(Y=y \mid n, p) = \binom{n}{y}p^{y}(1-p)^{n-y}$$

Here, $\left(\begin{array}{c}n\\ y\end{array}\right)=\frac{n!}{y!\left(n-y\right)!}$ is the binomial coefficient, often read as “*n* choose *y*.”

See figure A.8 for some examples of the PMF. The lnPMF is:
$$\ln P(Y=y \mid n, p) = \ln\Gamma(n+1) - \ln\Gamma(y+1) - \ln\Gamma(n-y+1) + y\ln p + (n-y)\ln(1-p)$$

This distribution has mean $\text{E}[Y]=np$ and variance $\text{Var}[Y]=np(1-p)$. Ecologists frequently use this distribution when modeling outcomes like the number of seeds that germinate, or the number of individuals surviving a season.
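A quick numerical illustration with SciPy’s `binom` (the germination numbers are invented):

```python
from scipy.stats import binom

n, p = 20, 0.35                 # assumed: 20 seeds planted, 35% germination
dist = binom(n, p)

print(dist.mean())              # n*p = 7.0
print(dist.var())               # n*p*(1-p) = 4.55
```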

## (p.343) A.2.3 Negative binomial distribution (chapters 3, 6, 8, 12, and 13)

Often ecological count data are collected for which the variance is much greater than the mean. One discrete distribution for this case is the negative binomial; mathematically this distribution comes from allowing a Poisson distribution to have a parameter (the mean) that varies according to a gamma distribution. There are many different parameterizations of the negative binomial distribution. One that is especially useful for ecologists models the observed number of counts, *y*, given a mean, $\mathrm{\mu}$, and a positive clustering coefficient *k*:
$$P(Y=y \mid \mu, k) = \frac{\Gamma(y+k)}{\Gamma(k)\,\Gamma(y+1)}\left(\frac{k}{k+\mu}\right)^{k}\left(\frac{\mu}{k+\mu}\right)^{y}$$

See figure A.9 for some examples of the PMF. The lnPMF is:
$$\ln P(Y=y \mid \mu, k) = \ln\Gamma(y+k) - \ln\Gamma(k) - \ln\Gamma(y+1) + k\ln\!\left(\frac{k}{k+\mu}\right) + y\ln\!\left(\frac{\mu}{k+\mu}\right)$$

This PMF has mean $\text{E}[Y]=\mathrm{\mu}$, and variance $\text{Var}[Y]=\mathrm{\mu}(1+\mathrm{\mu}/k)$. As *k* becomes large, the variance approaches the mean and the negative binomial approaches the Poisson distribution. In many ecological settings, $k<1.$
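SciPy’s `nbinom` uses a (size, probability) parameterization; the $(\mathrm{\mu}, k)$ form maps onto it as sketched below (values invented; note that *k* need not be an integer):

```python
from scipy.stats import nbinom

mu, k = 5.0, 0.7                   # assumed mean and clustering coefficient
dist = nbinom(k, k / (k + mu))     # size = k, prob = k/(k + mu)

print(dist.mean())                 # equals mu
print(dist.var())                  # equals mu * (1 + mu/k)
```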

Lindén and Mäntyniemi (2011) provide a derivation (and examples) of a useful generalization of the negative binomial that allows researchers more flexibility in modeling overdispersion. Their approach is to use a second parameter in the mean–variance relationship, so that $\text{Var}[Y]=\mathrm{\mu}(1+{\mathrm{\varphi}}_{1}+{\mathrm{\varphi}}_{2}\mathrm{\mu})$; both ${\mathrm{\varphi}}_{1}$ and ${\mathrm{\varphi}}_{2}$ are non-negative. This approach allows models in which the mean–variance relationship is Poisson $({\mathrm{\varphi}}_{1}={\mathrm{\varphi}}_{2}=0)$, linear $({\mathrm{\varphi}}_{2}=0)$, or quadratic. If we write $a=\mathrm{\mu}/({\mathrm{\varphi}}_{1}+{\mathrm{\varphi}}_{2}\mathrm{\mu})$ and $b=1/({\mathrm{\varphi}}_{1}+{\mathrm{\varphi}}_{2}\mathrm{\mu})$, the lnPMF is:
$$\ln P(Y=y \mid a, b) = \ln\Gamma(y+a) - \ln\Gamma(a) - \ln\Gamma(y+1) + a\ln b - (y+a)\ln(1+b)$$

## (p.344) A.2.4 Beta-binomial distribution (chapters 3 and 12)

Suppose *n* binomial trials occur and the probability that each of the *n* trials is a success is drawn from a beta distribution with mean *p* and dispersion parameter $\mathrm{\theta}$; large values of $\mathrm{\theta}$ mean that there is small overdispersion. If we are interested in the number of successes, then we have the beta-binomial distribution. The PMF describing the number of successes $(0\le y\le n)$ is:
$$P_{\text{bb}}(Y=y \mid n, p, \mathrm{\theta}) = \binom{n}{y}\,\frac{\Gamma(\mathrm{\theta})}{\Gamma(\mathrm{\theta}p)\,\Gamma(\mathrm{\theta}(1-p))}\,\frac{\Gamma(y+\mathrm{\theta}p)\,\Gamma(n-y+\mathrm{\theta}(1-p))}{\Gamma(n+\mathrm{\theta})}$$

While this looks complicated, it reduces to the binomial for very large $\mathrm{\theta}$ (that is, small overdispersion). For large overdispersion (as $\mathrm{\theta}$ approaches 0), the mass is concentrated at zero (all failures) and *n* (all successes), with ${P}_{\text{bb}}(Y=0|n,p,0)=1-p$ and ${P}_{\text{bb}}(Y=n|n,p,0)=p.$ See figure A.10 for some examples of the PMF. The lnPMF is:
$$\ln P_{\text{bb}}(Y=y \mid n, p, \mathrm{\theta}) = \ln\Gamma(n+1) - \ln\Gamma(y+1) - \ln\Gamma(n-y+1) + \ln\Gamma(\mathrm{\theta}) - \ln\Gamma(\mathrm{\theta}p) - \ln\Gamma(\mathrm{\theta}(1-p)) + \ln\Gamma(y+\mathrm{\theta}p) + \ln\Gamma(n-y+\mathrm{\theta}(1-p)) - \ln\Gamma(n+\mathrm{\theta})$$

This distribution has a mean of $\text{E}[Y]=np$ and a variance of
$$\text{Var}[Y] = np(1-p)\left[1 + \frac{n-1}{\mathrm{\theta}+1}\right]$$

The last term in the variance is the variance inflation factor, relative to the binomial distribution; if $\mathrm{\theta}$ is very large, the variance inflation factor approaches 0 because the denominator in the fraction becomes very large. There are many alternative parameterizations of the beta-binomial; the example presented in chapter 3 replaces $\mathrm{\theta}$ with $1/\mathrm{\varphi}$. The beta-binomial is often used in ecology to model binary data (e.g., presence/absence, survived/died, success/failure of fertilization) when the probability of success varies among sample units.
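A sketch of the $(n, p, \mathrm{\theta})$ parameterization using SciPy’s `betabinom` (available in recent SciPy versions), with the beta parameters taken as $a=\mathrm{\theta}p$ and $b=\mathrm{\theta}(1-p)$; the values below are invented:

```python
from scipy.stats import betabinom

n, p, theta = 10, 0.4, 5.0                      # assumed trials, mean, dispersion
dist = betabinom(n, theta * p, theta * (1 - p))

# Binomial variance inflated by the overdispersion factor:
var_formula = n * p * (1 - p) * (1 + (n - 1) / (theta + 1))
print(dist.mean())                # n*p = 4.0
print(dist.var(), var_formula)    # both 6.0
```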