Appendix C THE ODDS RATIO AND ITS PROPERTIES
The odds ratio, central to many epidemiologic analyses, is not the simplest statistical measure of association. This section reviews some of its properties. Complete descriptions exist elsewhere (for example, [1], [2] or [3]).
The odds are a ratio of two probabilities: the probability an event occurs, divided by the probability the event does not occur. (The word odds refers to a single entity, but tradition and formal English dictate that the word be treated as a plural noun.) When an event has probability p, the odds of the event are p/(1 − p). When a series of n binary outcomes are observed where a events occur and b events do not (n = a + b), then the odds are estimated by a/b. For example, if a specific race horse finishes among the top three finishers 30 times out of 50 races, then the odds are estimated as 3 to 2 (30/20 = 1.5) the horse will “show.”
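As a small illustration of the two definitions above, the following Python sketch computes the odds from a probability and from observed counts. The function names are chosen here for illustration and do not come from the text.

```python
def odds_from_probability(p):
    """Odds of an event with probability p, that is, p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_from_counts(a, b):
    """Estimated odds when the event occurred a times and did not occur b times."""
    if b == 0:
        raise ValueError("the odds are undefined when b = 0")
    return a / b


# Race-horse example from the text: the horse "shows" in 30 of 50 races.
shows, races = 30, 50
print(odds_from_counts(shows, races - shows))   # 30/20
print(odds_from_probability(shows / races))     # same odds, computed from p = 0.6
```

Both calls describe the same quantity; the second differs from the first only by floating-point rounding.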
The odds ratio compares the relative magnitude of two sets of odds occurring under different conditions. For example, suppose
Data summarized by an odds ratio are typically displayed in a 2 × 2 table such as Table C–1. An estimate of the odds ratio is then

$$\widehat{or} = \frac{a/b}{c/d} = \frac{ad}{bc}.$$
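A minimal sketch of this estimate in Python, using the cell labels a, b, c, d of Table C–1. The counts below are hypothetical and were chosen only so that the arithmetic is easy to follow.

```python
def odds_ratio(a, b, c, d):
    """Estimated odds ratio (a/b)/(c/d) = ad/(bc) for a 2 x 2 table.

    a, b: diseased / not diseased with the risk factor present
    c, d: diseased / not diseased with the risk factor absent
    """
    if b == 0 or c == 0:
        raise ValueError("the estimate is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts (not from the text): 50 individuals per condition.
a, b, c, d = 40, 10, 30, 20
print(odds_ratio(a, b, c, d))   # (40/10)/(30/20) = 4/1.5
```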
The possible values of an odds ratio run from zero to infinity. The odds ratio statistic is symmetric about the value 1.0 in the sense that or represents the same degree of association as 1/or.
Aside: An advertisement and brochure for the March of Dimes (1998) states, “Debbie and Rick Hedding of Pittsford VT were devastated when they lost two babies to neural tube defects (NTDs). Then Debbie read about folic acid in a March of Dimes brochure. She was astonished when she learned about the role of folic acid in preventing NTDs. ‘I was in tears by the time I finished reading the material. I haven’t been taking folic acid nor had I been told about it. I couldn’t believe that I could have reduced the risk of recurrence by 70 percent. I immediately began taking folic acid and telling every women I could about it.’” [odds ratio = 0.28; 95% CI 0.12 to 0.71—Lancet (1996)].
The odds ratio is a measure of association on a ratio scale, so percentage differences, such as (1 − 0.28) × 100 = 72%, are not meaningful. When odds ratios are used to indicate risk, changes expressed as percentages must be based on ratios. The odds ratio associated with folic acid compared to 1.0 is the ratio 1/0.28 = 3.6 (a 360% reduction and not a 28% reduction).

Table C–1. A 2 × 2 table

| | Disease | No disease | Total |
| Condition 1 = risk factor present (F) | a | b | a + b |
| Condition 2 = risk factor absent | c | d | c + d |
| Total | a + c | b + d | n |
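When the odds ratio is estimated from a sample of data, it varies, like all estimates, from sample to sample. An estimate of this sampling variation is

$$\text{variance}(\widehat{or}) = \widehat{or}^{\,2}\left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right).$$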
Figure C–1 (top left) shows a simulated distribution of 500 values of the odds ratio calculated from samples of size n = 50 collected under two conditions ($p_1 = 0.8$ and $p_2 = 0.6$). The distribution of these estimated odds ratios is asymmetric (skewed to the right). The mean of the distribution is 3.170, showing that the directly estimated odds ratio is biased (the expected odds ratio is [(0.8)/(0.2)]/[(0.6)/(0.4)] = 2.667). To reduce this bias by producing a more symmetric distribution, the logarithm of the odds ratio is used. A simulated distribution of $\log(\widehat{or})$ is also shown in Figure C–1 (top right) for, again, $p_1 = 0.8$ and $p_2 = 0.6$. The distribution is visually more symmetric, with the mean of the distribution equal to 1.037 where log(or) = log(2.667) = 0.981 is expected. The estimated variance associated with the distribution of the estimate $\log(\widehat{or})$ is

$$\text{variance}[\log(\widehat{or})] = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}.$$
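The simulation described above can be sketched roughly as follows. This is not the author's program: it assumes that n = 50 observations are drawn independently under each of the two conditions, it discards the rare tables that contain an empty cell, and the seed is arbitrary, so the summary values will resemble but not reproduce those quoted in the text.

```python
import math
import random
import statistics


def simulate_odds_ratios(p1=0.8, p2=0.6, n=50, replicates=500, seed=1):
    """Return `replicates` estimated odds ratios from simulated 2 x 2 tables."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < replicates:
        a = sum(rng.random() < p1 for _ in range(n))   # events, condition 1
        c = sum(rng.random() < p2 for _ in range(n))   # events, condition 2
        b, d = n - a, n - c
        if 0 in (a, b, c, d):
            continue   # assumption: skip tables with an empty cell
        estimates.append((a * d) / (b * c))
    return estimates


ors = simulate_odds_ratios()
log_ors = [math.log(x) for x in ors]
true_or = (0.8 / 0.2) / (0.6 / 0.4)

print("expected or       :", round(true_or, 3))
print("mean, median of or:", statistics.mean(ors), statistics.median(ors))
print("expected log(or)  :", round(math.log(true_or), 3))
print("mean of log(or)   :", statistics.mean(log_ors))
```

A mean that exceeds the median on the original scale reflects the right skew; on the log scale the mean should sit much closer to the expected value.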
Other bias-reducing transformations have been suggested for the odds ratio. Two examples are
Aside: An alternative perspective on the interpretation of an odds ratio arises from comparing observed and expected values along the same lines as a standardized mortality ratio (SMR, defined in Chapter 1). Using the notation for a 2 × 2 table (Table C–1), the observed number of diseased individuals with a risk factor (F) is a. From the same table, when the risk factor is absent, the number of cases of disease is c and the number of individuals who are disease-free is d. The quantity c/d then estimates the ratio of diseased to nondiseased individuals among those who do not have the risk factor. If the risk factor has no effect on the disease frequency, the ratio of diseased to nondiseased individuals among those with the risk factor should differ from c/d only because of random variation. In symbols, a/b ≈ c/d. Equating these two ratios gives an expression for the number of diseased individuals expected when the risk factor is unrelated to the disease: the expected number with the disease is estimated by b(c/d). The ratio of observed to expected cases of disease is then

$$\frac{\text{observed}}{\text{expected}} = \frac{a}{b(c/d)} = \frac{ad}{bc} = \widehat{or}.$$
Therefore, an estimated odds ratio can be viewed as an SMR-like quantity, comparing an observed count to the count expected when the risk factor has no influence on the disease.
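The observed-to-expected reading of the odds ratio can be checked numerically. The short Python sketch below uses hypothetical counts (not taken from the text) and the cell labels of Table C–1.

```python
def observed_to_expected(a, b, c, d):
    """Ratio of the observed count a to the count b(c/d) expected under no effect."""
    expected = b * (c / d)   # diseased expected among those with the risk factor
    return a / expected


# Hypothetical counts for illustration only.
a, b, c, d = 40, 10, 30, 20
print(observed_to_expected(a, b, c, d))   # observed 40 versus expected 15
print((a * d) / (b * c))                  # the usual estimate ad/(bc)
```

The two printed values agree up to floating-point rounding, which is the algebraic identity a/[b(c/d)] = ad/(bc).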
PROPERTIES
Some of the properties of or and its estimate derived from a 2 × 2 table are as follows:

1. The odds ratio is invariant to interchanging both the rows and the columns, although interchanging the rows only or the columns only changes or to 1/or.

2. Multiplying the rows and/or the columns by positive constants does not change the value of or.

3. The odds ratio has a probabilistic interpretation (as described).

4. The odds ratio approximates the relative risk:

$$\widehat{or} = \frac{a/b}{c/d} \approx \frac{a/(a+b)}{c/(c+d)} = \widehat{rr}$$

when the disease is rare among those with and without the risk factor.

5. The logarithm of the estimated odds ratio has an essentially symmetric distribution that is accurately approximated by a normal distribution.
The last property is perhaps the most useful because it allows approximate statistical tests and confidence intervals to be constructed using relatively simple statistical methods. The exact properties of the odds ratio have been derived, but they are complex and difficult to compute. Using the normal distribution as an approximation is simple and sufficiently accurate for most situations.
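Properties 1, 2, and 4 are easy to verify numerically. The Python sketch below does so for hypothetical tables; the counts, the multipliers, and the function names are illustrative assumptions rather than values from the text.

```python
import math


def odds_ratio(a, b, c, d):
    """Estimated odds ratio ad/(bc) for a 2 x 2 table with cells a, b, c, d."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Estimated relative risk [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


a, b, c, d = 40, 10, 30, 20          # hypothetical counts
base = odds_ratio(a, b, c, d)

# Property 1: swapping both rows and columns leaves or unchanged;
# swapping the rows only (or the columns only) gives 1/or.
both_swapped = odds_ratio(d, c, b, a)
rows_swapped = odds_ratio(c, d, a, b)
cols_swapped = odds_ratio(b, a, d, c)

# Property 2: multiply row 1 by 3 and column 2 by 5.
rescaled = odds_ratio(3 * a, 3 * 5 * b, c, 5 * d)

# Property 4: with a rare disease the odds ratio is close to the relative risk.
rare = (4, 996, 2, 998)              # hypothetical rare-disease table
print(base, both_swapped, rows_swapped, cols_swapped, rescaled)
print(odds_ratio(*rare), relative_risk(*rare))

print(math.isclose(both_swapped, base), math.isclose(rows_swapped, 1 / base))
```

In the common-disease table the odds ratio and relative risk differ noticeably, which is why property 4 is restricted to rare diseases.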
A null hypothesis of the form $H_0$: or = $or_0$ can be assessed by reference to a standard normal distribution with the test statistic

$$z = \frac{\log(\widehat{or}) - \log(or_0)}{\sqrt{\text{variance}[\log(\widehat{or})]}}.$$
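A minimal Python sketch of such a test follows. It assumes the usual large-sample variance estimate 1/a + 1/b + 1/c + 1/d for the logarithm of the estimated odds ratio and reports a two-sided p-value; the two-sided choice, the function name, and the counts are assumptions made for this example.

```python
import math
from statistics import NormalDist


def log_odds_ratio_z_test(a, b, c, d, or_0=1.0):
    """Approximate z-statistic and two-sided p-value for H0: or = or_0."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # assumed variance estimate
    z = (log_or - math.log(or_0)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided, by assumption
    return z, p_value


# Hypothetical counts for illustration only.
z, p_value = log_odds_ratio_z_test(40, 10, 30, 20)
print(z, p_value)
```

All four cells must be positive for the logarithm and the variance estimate to be defined.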
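A (1 − α)-level confidence interval can also be constructed using a normal distribution as an approximation for the distribution of the logarithm of the estimated odds ratio, or

$$\log(\widehat{or}) \pm z_{1-\alpha/2}\sqrt{\text{variance}[\log(\widehat{or})]},$$

where $z_{1-\alpha/2}$ is the (1 − α/2) percentile of the standard normal distribution. Exponentiating the two bounds gives the corresponding interval for the odds ratio itself.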
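The interval can be sketched in Python as follows, again assuming the large-sample variance estimate 1/a + 1/b + 1/c + 1/d for the logarithm of the estimated odds ratio. The bounds are computed on the log scale and then exponentiated; the function name and counts are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_confidence_interval(a, b, c, d, alpha=0.05):
    """Approximate (1 - alpha)-level confidence interval for the odds ratio."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # assumed variance estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)         # standard normal percentile
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for illustration only.
lower, upper = odds_ratio_confidence_interval(40, 10, 30, 20)
print(lower, upper)
```

Because the interval is symmetric on the log scale, the estimate is the geometric mean of the two bounds rather than their midpoint.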