## Steve Selvin

Print publication date: 2004

Print ISBN-13: 9780195172805

Published to Oxford Scholarship Online: September 2009

DOI: 10.1093/acprof:oso/9780195172805.001.0001


PRINTED FROM OXFORD SCHOLARSHIP ONLINE (www.oxfordscholarship.com). (c) Copyright Oxford University Press, 2017. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a monograph in OSO for personal use (for details see http://www.oxfordscholarship.com/page/privacy-policy). Subscriber: null; date: 25 February 2017

# Appendix C THE ODDS RATIO AND ITS PROPERTIES

Source: Statistical Analysis of Epidemiologic Data

Publisher: Oxford University Press

The odds ratio, central to many epidemiologic analyses, is not the simplest statistical measure of association. This section reviews some of its properties. Complete descriptions exist elsewhere (for example, [1], [2] or [3]).

The odds are a ratio of two probabilities: the probability an event occurs, divided by the probability the event does not occur. (The word odds refers to a single entity, but tradition and formal English dictate that the word be treated as a plural noun.) When an event has probability p, the odds of the event are p/(1 − p). When a series of n binary outcomes is observed where a events occur and b events do not (n = a + b), then the odds are estimated by a/b. For example, if a specific race horse finishes among the top three finishers 30 times out of 50 races, then the odds that the horse will “show” are estimated as 3 to 2 (30/20 = 1.5).
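The two definitions of the odds can be sketched in a few lines of Python. The function names are hypothetical and chosen only for this illustration; the counts are those of the race-horse example.

```python
def odds_from_probability(p):
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def odds_from_counts(a, b):
    """Estimated odds from a occurrences and b non-occurrences: a / b."""
    if b == 0:
        raise ValueError("the estimated odds are undefined when b = 0")
    return a / b


# Race-horse example from the text: 30 top-three finishes in 50 races.
shows, races = 30, 50
print(odds_from_counts(shows, races - shows))  # 1.5
# The same value (up to floating-point rounding) from the estimated probability.
print(odds_from_probability(shows / races))
```

Both routes give the same estimate because a/b = (a/n)/(b/n).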

The odds ratio compares the relative magnitude of two sets of odds occurring under different conditions. For example, suppose

$$odds_1 = \frac{p_1}{1 - p_1}$$

and are estimated by a/b, where a events occur and b events do not occur. Also, suppose

$$odds_2 = \frac{p_2}{1 - p_2}$$

and are estimated by c/d, where c events occur and d events do not occur. The symbols $p_1$ and $p_2$ represent the probability of occurrence of the binary events under conditions 1 and 2, respectively. The odds ratio ($or$) is then

$$or = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)}$$

and is estimated by

$$\widehat{or} = \frac{a/b}{c/d}.$$

Data summarized by an odds ratio are typically displayed in a 2 × 2 table such as Table C–1. An estimate of the odds ratio is then

$$\widehat{or} = \frac{a/b}{c/d} = \frac{ad}{bc}.$$

The possible values of an odds ratio run from zero to infinity. The odds ratio statistic is symmetric about the value 1.0 in the sense that or represents the same degree of association as 1/or.
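A minimal sketch of the estimate and of its symmetry about 1.0 follows. The function name `odds_ratio` and the counts are hypothetical, chosen only for illustration; exchanging the two conditions turns the estimate into its reciprocal.

```python
def odds_ratio(a, b, c, d):
    """Estimated odds ratio (a/b) / (c/d) = ad / (bc) from a 2 x 2 table."""
    if b == 0 or c == 0:
        raise ValueError("the estimate is undefined when b = 0 or c = 0")
    return (a * d) / (b * c)


# Hypothetical counts (not from the text).
a, b, c, d = 40, 10, 30, 20
or_hat = odds_ratio(a, b, c, d)      # condition 1 compared to condition 2
swapped = odds_ratio(c, d, a, b)     # condition 2 compared to condition 1

# The product of the two estimates is 1 (up to rounding): or and 1/or
# describe the same degree of association in opposite directions.
print(or_hat, swapped, or_hat * swapped)
```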

Aside: An advertisement and brochure for the March of Dimes (1998) states, “Debbie and Rick Hedding of Pittsford VT were devastated when they lost two babies to neural tube defects (NTDs). Then Debbie read about folic acid in a March of Dimes brochure. She was astonished when she learned about the role of folic acid in preventing NTDs. ‘I was in tears by the time I finished reading the material. I haven’t been taking folic acid nor had I been told about it. I couldn’t believe that I could have reduced the risk of recurrence by 70 percent. I immediately began taking folic acid and telling every women I could about it.’” [odds ratio = 0.28; 95% CI 0.12 to 0.71—Lancet (1996)].

The odds ratio is a measure of association on a ratio scale, so percentage differences, such as (1 − 0.28) × 100 = 72%, are not meaningful. When odds ratios are used to indicate risk, it is necessary to consider ratios when differences are expressed in terms of percentage of change. The odds ratio associated with folic acid compared to 1.0 is the ratio 1/0.28 = 3.6 (a 360% reduction and not a 28% reduction).

Table C–1. A 2 × 2 table

| | Disease | No disease | Total |
|---|---|---|---|
| Condition 1 = risk factor present (F) | a | b | a + b |
| Condition 2 = risk factor absent | c | d | c + d |
| Total | a + c | b + d | n |

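When the odds ratio is estimated from a sample of data, it varies, like all estimates, from sample to sample. An estimate of this sampling variation is

$$\text{variance}(\widehat{or}) = \widehat{or}^{\,2}\left[\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right].$$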

Figure C–1 (top left) shows a simulated distribution of 500 values of the odds ratio calculated from samples of size n = 50 collected under two conditions ($p_1 = 0.8$ and $p_2 = 0.6$). The distribution of these estimated odds ratios is asymmetric, skewed to the right. The mean of the distribution is 3.170, showing that the directly estimated odds ratio is biased (the expected odds ratio is [(0.8)/(0.2)]/[(0.6)/(0.4)] = 2.667). To reduce this bias by producing a more symmetric distribution, the logarithm of the odds ratio is used. A simulated distribution of $\log(\widehat{or})$ is also shown in Figure C–1 (top right) for, again, $p_1 = 0.8$ and $p_2 = 0.6$. The distribution is visually more symmetric, with a mean of 1.037 where log(or) = log(2.667) = 0.981 is expected. The estimated variance associated with the distribution of the estimate $\log(\widehat{or})$ is

$$\text{variance}[\log(\widehat{or})] = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}.$$

The values of $\log(\widehat{or})$ range from negative infinity to positive infinity. When the mean value is 0.0, $\log(\widehat{or})$ is equivalent to $-\log(\widehat{or})$ as a measure of association. Simulated distributions of $\widehat{or}$ and $\log(\widehat{or})$ are also shown (Fig. C–1, bottom) for the case where the two conditions generating the odds are not different (i.e., $p_1 = p_2 = 0.4$).
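A simulation along these lines can be sketched with the Python standard library. This is not the code behind Figure C–1: the function name `simulate`, the seed, the choice to draw n outcomes under each condition, and the decision to skip samples containing a zero cell are all assumptions made for this illustration, so the means will not reproduce the quoted values exactly.

```python
import math
import random
import statistics


def simulate(p1, p2, n=50, replicates=500, seed=1):
    """Simulate estimated odds ratios and their logarithms.

    Assumption: n binary outcomes are drawn under each condition.
    Samples with a zero cell are skipped because ad/bc is then undefined
    or has no logarithm.
    """
    rng = random.Random(seed)
    ors, log_ors = [], []
    for _ in range(replicates):
        a = sum(rng.random() < p1 for _ in range(n))  # events, condition 1
        c = sum(rng.random() < p2 for _ in range(n))  # events, condition 2
        b, d = n - a, n - c                           # non-events
        if 0 in (a, b, c, d):
            continue
        estimate = (a * d) / (b * c)
        ors.append(estimate)
        log_ors.append(math.log(estimate))
    return ors, log_ors


true_or = (0.8 / 0.2) / (0.6 / 0.4)
ors, log_ors = simulate(0.8, 0.6)
print("mean of or-hat     :", statistics.mean(ors), "expected", true_or)
print("mean of log(or-hat):", statistics.mean(log_ors), "expected", math.log(true_or))

# Null case: the two conditions do not differ (p1 = p2 = 0.4).
null_ors, null_log_ors = simulate(0.4, 0.4)
print("null mean of log(or-hat):", statistics.mean(null_log_ors))
```

The mean of the untransformed estimates typically lands well above the expected odds ratio, while the mean of the logarithms stays close to the logarithm of the expected value.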

Figure C–1. Histograms displaying the odds ratio and log-odds for two sets of conditions.

Other bias-reducing transformations have been suggested for the odds ratio. Two examples are

$$\widehat{or}_H = \frac{(a + \tfrac{1}{2})(d + \tfrac{1}{2})}{(b + \tfrac{1}{2})(c + \tfrac{1}{2})}$$

(H for J. B. S. Haldane’s suggestion [4]) with estimated variance

[equation missing from this copy]

and

$$\widehat{or}_{SS} = \frac{ad}{(b + 1)(c + 1)}$$

(SS for small sample odds ratio) [5]. A useful property of these two estimates is that the odds ratio remains defined when b = 0 or c = 0, which is not the case for $\widehat{or} = ad/bc$.
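The practical value of such estimates shows up when a table has an empty cell. The sketch below assumes the commonly cited forms of the two estimates, (a + ½)(d + ½)/[(b + ½)(c + ½)] for the Haldane version and ad/[(b + 1)(c + 1)] for the small-sample version; these forms, the function names, and the counts are assumptions for illustration and are not confirmed by the text above.

```python
import math


def or_haldane(a, b, c, d):
    """Assumed Haldane-type estimate: add 1/2 to every cell."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def or_small_sample(a, b, c, d):
    """Assumed small-sample estimate: ad / ((b + 1)(c + 1))."""
    return (a * d) / ((b + 1) * (c + 1))


# Hypothetical table with an empty cell (b = 0); ad/bc would divide by zero.
a, b, c, d = 12, 0, 5, 9
print(or_haldane(a, b, c, d))
print(or_small_sample(a, b, c, d))  # 18.0
```

Both functions return finite values here, whereas the direct estimate ad/bc is undefined for this table.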

Aside: An alternative perspective on the interpretation of an odds ratio arises from comparing observed and expected values along the same lines as a standard mortality ratio (SMR, defined in Chapter 1). Using the notation for a 2 × 2 table (Table C–1), the observed number of diseased individuals with a risk factor (F) is a. From the same table, when the risk factor is absent ($\bar{F}$), the number of cases of disease is c and the number of individuals who are disease-free is d. The quantity c/d then estimates the ratio of diseased to nondiseased individuals among those who do not have the risk factor. If the risk factor has no effect on the disease frequency, the ratio of diseased to nondiseased individuals among those with the risk factor should differ from c/d only because of random variation. In symbols, a/b ≈ c/d. Equating these two ratios gives an expression for the number of diseased individuals expected when the risk factor is unrelated to the disease: the expected number with the disease is estimated by b(c/d). The ratio of observed to expected cases of disease is then

$$\frac{\text{observed}}{\text{expected}} = \frac{a}{b(c/d)} = \frac{ad}{bc} = \widehat{or}.$$

Therefore, an estimated odds ratio can be viewed as an SMR-like quantity, comparing an observed count to the count expected when the risk factor has no influence on the disease.
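The observed-to-expected reading can be checked numerically. The counts below are hypothetical and serve only to show that a divided by b(c/d) coincides with ad/bc.

```python
# Hypothetical 2 x 2 counts in the notation of Table C-1.
a, b, c, d = 40, 10, 30, 20

observed = a              # diseased individuals with the risk factor
expected = b * (c / d)    # diseased count expected if the factor has no effect
ratio = observed / expected

# The observed/expected ratio equals the estimated odds ratio ad/bc.
print(expected, ratio, (a * d) / (b * c))
```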

# PROPERTIES

Some of the properties of $or$ and $\widehat{or}$ derived from a 2 × 2 table are as follows:

1. The odds ratio is invariant to interchanging rows and columns, although interchanging a row only or a column only changes or to 1/or.

2. Multiplying the rows and/or the columns by positive constants does not change the value of or.

3. The odds ratio has a probabilistic interpretation (as described).

4. The odds ratio approximates the relative risk:

   $$\widehat{or} = \frac{a/b}{c/d} \approx \frac{a/(a + b)}{c/(c + d)} = \widehat{rr}$$

   when the frequency of the disease is rare among those with and without the risk factor.

5. The logarithm of the estimated odds ratio has an essentially symmetric distribution that is accurately approximated by a normal distribution.

The last property is perhaps the most useful because it allows approximate statistical tests and confidence intervals to be constructed using relatively simple statistical methods. The exact properties of the odds ratio have been derived, but they are complex and difficult to compute. Using the normal distribution as an approximation is simple and sufficiently accurate for most situations.
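The algebraic properties in the list (1, 2, and 4) can be checked numerically. The sketch below uses hypothetical counts and hypothetical helper names; cells are passed in the order a, b, c, d of Table C–1.

```python
import math


def odds_ratio(a, b, c, d):
    """Estimated odds ratio ad / (bc)."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Estimated relative risk [a/(a + b)] / [c/(c + d)]."""
    return (a / (a + b)) / (c / (c + d))


a, b, c, d = 40, 10, 30, 20          # hypothetical counts
base = odds_ratio(a, b, c, d)

# Property 1: swapping both rows and columns leaves or unchanged;
# swapping rows only, or columns only, gives 1/or.
both_swapped = odds_ratio(d, c, b, a)
rows_swapped = odds_ratio(c, d, a, b)
cols_swapped = odds_ratio(b, a, d, c)

# Property 2: multiply row 1 by 3 and column 2 by 5.
rescaled = odds_ratio(3 * a, 3 * 5 * b, c, 5 * d)

# Property 4: with a rare disease, or is close to the relative risk.
rare = (10, 9990, 5, 9995)           # hypothetical rare-disease table
print(base, both_swapped, rows_swapped, cols_swapped, rescaled)
print(odds_ratio(*rare), relative_risk(*rare))
```

In the first table the disease is common, so the odds ratio and the relative risk differ noticeably; in the rare-disease table they nearly coincide.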

A null hypothesis of the form $H_0$: $or = or_0$ can be assessed by reference to a standard normal distribution with the test statistic

$$Z = \frac{\log(\widehat{or}) - \log(or_0)}{\sqrt{\text{variance}[\log(\widehat{or})]}}$$

where the value for $or_0$ is usually chosen to be 1.0 (log(1) = 0). An odds ratio of 1 implies that the probability of occurrence of the event under study is the same for both conditions 1 and 2 ($p_1 = p_2$). The value Z has an approximate normal distribution with mean = 0 and variance = 1 when $or = or_0$.
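A minimal sketch of this test follows, assuming the usual large-sample variance estimate 1/a + 1/b + 1/c + 1/d for the logarithm of the estimated odds ratio. The function name, the two-sided p-value, and the counts are choices made for this illustration.

```python
import math
from statistics import NormalDist


def log_or_z_test(a, b, c, d, or_0=1.0):
    """Approximate normal test of H0: or = or_0 on the log scale."""
    log_or = math.log((a * d) / (b * c))
    # Assumed variance estimate for log(or-hat): 1/a + 1/b + 1/c + 1/d.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = (log_or - math.log(or_0)) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical counts (not from the text).
z, p_value = log_or_z_test(40, 10, 30, 20)
print(z, p_value)
```

All four cells must be positive for this calculation, since both the logarithm and the variance estimate break down when a cell is zero.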

A (1 − α)-level confidence interval can also be constructed using a normal distribution as an approximation for the distribution of the logarithm of the estimated odds ratio, or

$$\text{lower} = \log(\widehat{or}) - z_{1-\alpha/2}\sqrt{\text{variance}[\log(\widehat{or})]} \quad \text{and} \quad \text{upper} = \log(\widehat{or}) + z_{1-\alpha/2}\sqrt{\text{variance}[\log(\widehat{or})]}$$

where $z_{1-\alpha/2}$ is the (1 − α/2)th percentile of a standard normal distribution. The probability that the parameter log(or) is found between these upper and lower bounds is approximately 1 − α. These limits can be directly transformed to provide a (1 − α)-level confidence interval of the odds ratio itself. The probability is approximately 1 − α that the “true” odds ratio $or$ is found in the interval $(e^{\text{lower}}, e^{\text{upper}})$ based on the estimated odds ratio $\widehat{or}$. That is, the probability that the odds ratio underlying the observed data (estimated by $\widehat{or}$) will be less than $e^{\text{lower}}$ or greater than $e^{\text{upper}}$ is small, namely α.
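The interval can be sketched as follows, under two stated assumptions: the variance of the logarithm of the estimated odds ratio is estimated by 1/a + 1/b + 1/c + 1/d, and the two-sided interval uses the 1 − α/2 quantile of the standard normal distribution so that the total probability outside the interval is α. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def or_confidence_interval(a, b, c, d, alpha=0.05):
    """Approximate (1 - alpha)-level confidence interval for the odds ratio."""
    log_or = math.log((a * d) / (b * c))
    # Assumed variance estimate for log(or-hat): 1/a + 1/b + 1/c + 1/d.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided normal quantile
    lower, upper = log_or - z * se, log_or + z * se
    # Exponentiate the limits to return to the odds ratio scale.
    return math.exp(lower), math.exp(upper)


# Hypothetical counts (not from the text).
low, high = or_confidence_interval(40, 10, 30, 20)
print(low, high)
```

The interval is symmetric on the log scale, so on the odds ratio scale the estimate is the geometric mean of the two limits rather than their midpoint.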