Nutritional Epidemiology

Walter C. Willett

Print publication date: 1998

Print ISBN-13: 9780195122978

Published to Oxford Scholarship Online: September 2009

DOI: 10.1093/acprof:oso/9780195122978.001.0001

1 Overview of Nutritional Epidemiology

Oxford University Press

Abstract and Keywords

This chapter provides an overview of nutritional epidemiology for those unfamiliar with the field. It discusses the epidemiologic approaches to diet and disease, and the interpretation of epidemiologic data.

Keywords:   nutritional epidemiology, diet, disease, epidemiologic data interpretation

The field of nutritional epidemiology has developed from interest in the concept that aspects of diet may influence the occurrence of human disease. Although it is relatively new as a formal area of research, investigators have used basic epidemiologic methods for more than 200 years to identify numerous essential nutrients. In the mid-eighteenth century, observations that fresh fruits and vegetables could cure scurvy led Lind (1753) to conduct one of the earliest controlled clinical trials; lemons and oranges had the “most sudden and good effects” on the course of this disease, which was ultimately found to be the result of vitamin C deficiency. In an example from the late nineteenth century, the unusual occurrence of beri-beri among sailors subsisting largely on polished rice led Takaki to hypothesize that some factor was lacking in their diet; the addition of milk and vegetables to their rations effectively eliminated this disease (Williams, 1961). Decades later a deficiency of thiamine was found to be primarily responsible for this syndrome (Davidson et al., 1973). Similarly, Goldberger (1964) used epidemiologic methods to determine that pellagra was a disease of nutritional deficiency, primarily associated with a corn meal subsistence diet in the southern United States. More recently, Chinese investigators determined by epidemiologic means that selenium deficiency is responsible for the high incidence of Keshan disease in central China (Guangi-Qi, 1987).

Typically, deficiency syndromes occur with high frequency among those with very low intake and rarely or never occur among those not so exposed. In addition, these deficiency diseases often have short latent periods; symptoms are usually manifested within months of starting a deficient diet and can typically be reversed within days or weeks. Hence, research has moved rapidly from observations to experiments in both animals and humans.

Deficiency states for essential nutrients, such as scurvy and rickets, differ from most issues confronting nutritional epidemiologists today. The primary focus of contemporary nutritional epidemiology has been the major diseases of Western civilization, particularly heart disease and cancer. More recently, osteoporosis, cataracts, stroke, diabetes, and congenital malformations also have been the objects of such research. Unlike nutritional deficiencies, these diseases almost always have multiple causes, potentially including not only diet but also genetic, occupational, psychosocial, and infectious factors; level of physical activity; behavioral characteristics such as cigarette use; and other influences. These multiple potential determinants may act alone or in combination. Also, many of these diseases have long latent periods; they may sometimes result from cumulative exposure over many years and in other instances from a relatively short exposure occurring many years before diagnosis. For most of these diseases, the relevant period of exposure is unknown. It is possible that exposures with short latent periods are also important for some of these diseases. For example, it is conceivable that smoking a cigarette or consuming large amounts of a certain food could, within hours, precipitate an acute myocardial infarction or thrombotic stroke by altering blood coagulability even though the underlying atherosclerosis has accumulated over decades. A third characteristic of these diseases is that they occur with relatively low frequency despite a substantial cumulative lifetime risk. In addition, these conditions are not readily reversible and may result from excessive, as well as insufficient, intake of dietary factors. All these features have important implications for the design of studies to elucidate their etiologies.

The traditional methods of nutritionists, such as basic biochemistry, animal experimentation, and metabolic studies in humans, contribute substantially but do not address directly the relation between diet and occurrence of major diseases of our civilization. These issues should fall naturally within the realm of epidemiology, a discipline whose focus is the occurrence of human disease. Although epidemiologic efforts originally concentrated primarily on infectious diseases, during the last 40 years attention has largely shifted to the etiology of chronic diseases. Thus contemporary epidemiologists are accustomed to the study of diseases with low frequencies, long latency periods, and multiple causes. For example, hypertension, hypercholesterolemia, and cigarette smoking have been identified as major determinants of coronary heart disease; this knowledge has contributed to a major decline in this cause of death during recent years.

Although epidemiology is logically equipped to address the dietary causes of disease, the complex nature of diet has posed an unusually difficult challenge to this discipline (Willett, 1987). Cigarette smoking is more typical of exposures studied by epidemiologists: With a high degree of accuracy, subjects or their spouses can report whether they smoke cigarettes. Furthermore, individuals can readily provide quantitative information on the number of cigarettes they smoke per day, their usual brand of cigarettes, the age at which they started smoking, and changes in their pattern of use. The ease with which relatively accurate information on cigarette smoking can be obtained has contributed to the rapid accumulation of an enormous and remarkably consistent literature on the health effects of this habit. Diet, in contrast, represents an unusually complex set of exposures that are strongly intercorrelated. With few exceptions, all individuals are exposed to hypothesized causal factors; everyone eats fat, fiber, and vitamin A, for instance. Thus, exposures cannot be characterized as present or absent; rather, they are continuous variables, often with a rather limited range of variation. Furthermore, individuals rarely make clear changes in their diet at identifiable points in time; more typically, eating patterns evolve over periods of years. Finally, individuals are generally not aware of the content of the foods that they eat; therefore, the consumption of nutrients is usually determined indirectly based on the reported use of foods or on the level of biochemical measurements.

Thus, the most serious limitation to research in nutritional epidemiology has been the lack of practical methods to measure diet. Because such epidemiologic studies usually involve at least several hundred and sometimes tens of thousands of subjects, dietary assessment methods must be not only reasonably accurate but also relatively inexpensive.

The difficulties in assessing diet have led some epidemiologists to believe that it is unlikely that useful measurements of the diets of individual subjects within free-living populations can be collected at all (Wynder, 1976). In addition, some have believed that the diets of persons within one country are too homogeneous to detect relationships with disease (Hebert and Wynder, 1987). Much of this skepticism arose from the intense interest in dietary lipids, serum cholesterol, and coronary heart disease beginning in the 1950s. Although it has been demonstrated in controlled metabolic studies that increases in saturated fat and cholesterol or decreases in polyunsaturated fat raise serum cholesterol, no correlation between intake of these lipids and serum cholesterol was found in many cross-sectional studies within the United States (Jacobs et al., 1979). Many concluded that, as an association clearly did exist, the measurement of diet was so inaccurate that the relationship was obscured. In retrospect, it is apparent that any expectation of a strong correlation is unrealistic and that a lack of correlation has several explanations. Most importantly, serum cholesterol is relatively insensitive to dietary lipid intake; metabolic studies clearly show that substantial changes in dietary intake produce rather modest changes in serum cholesterol (see Chapter 17). The expected correlation between cholesterol intake and serum cholesterol in the populations studied would be only on the order of 0.10, even with a perfect measure of dietary intake, because many factors, including genetic determinants, influence serum cholesterol.

Example: In the data of Shekelle et al. (1981), the standard deviation (SD) for dietary cholesterol is 68 mg/1,000 kcal, and the standard deviation for serum cholesterol is 54 mg/dl. From metabolic ward studies of Mattson et al. (1972), a 10 mg/1,000-kcal change in dietary cholesterol causes 1.2 mg/dl change in serum cholesterol; thus the expected SD of serum cholesterol variation due to dietary cholesterol variation is 8.2 mg/dl. The theoretically expected correlation between cholesterol intake and serum cholesterol is therefore

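$$ r = \frac{\text{SD of serum cholesterol due to dietary cholesterol}}{\text{SD of serum cholesterol}} = \frac{8.2\ \text{mg/dl}}{54\ \text{mg/dl}} \approx 0.15 $$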
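The arithmetic behind this expected correlation can be retraced in a few lines. The sketch below is only an illustration: the function name `expected_correlation` is hypothetical, the inputs are the values quoted in the example, and it assumes a linear dose-response relation, so that the correlation equals the slope multiplied by the ratio of the two standard deviations.

```python
def expected_correlation(slope, sd_exposure, sd_outcome):
    """Correlation implied by a linear slope and the two standard deviations.

    Assumes a linear relation, so r = slope * SD(exposure) / SD(outcome).
    """
    sd_outcome_due_to_exposure = slope * sd_exposure
    return sd_outcome_due_to_exposure / sd_outcome


# Values quoted in the example (Shekelle et al., 1981; Mattson et al., 1972).
sd_dietary_chol = 68.0   # SD of dietary cholesterol, mg/1,000 kcal
sd_serum_chol = 54.0     # SD of serum cholesterol, mg/dl
slope = 1.2 / 10.0       # mg/dl serum cholesterol per mg/1,000 kcal dietary cholesterol

# SD of serum cholesterol attributable to variation in dietary cholesterol.
sd_due_to_diet = slope * sd_dietary_chol
r = expected_correlation(slope, sd_dietary_chol, sd_serum_chol)

print(f"SD of serum cholesterol due to diet: {sd_due_to_diet:.1f} mg/dl")
print(f"Expected correlation: {r:.2f}")
```

Because the dietary SD of 68 mg/1,000 kcal includes measurement error, substituting a smaller value for `sd_dietary_chol` shows how quickly the expected correlation falls further.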

The standard deviation in this example for dietary cholesterol is, in reality, overstated because it included measurement error as well as true variation. Indeed, Shekelle et al. (1981) found that the relationship (using regression analysis) between dietary lipid intake based on carefully conducted interviews and serum cholesterol was similar to that obtained from metabolic studies. Because of other determinants of serum cholesterol, however, the correlation between dietary lipid intake and serum cholesterol was only 0.08; as the study population was large, this small correlation was highly statistically significant. Furthermore, some factors are associated with both reduced cholesterol intake and higher serum cholesterol, such as low levels of physical activity and knowledge of hypercholesterolemia. These tend to distort the true relationship between diet and serum levels toward an inverse association in cross-sectional studies (Shekelle et al., 1981). In studies that examined the relationships between change in diet and change in serum cholesterol using simple questionnaires, it has been possible to demonstrate rather strong correlations (see Chapter 6). The methods of dietary intake measurement in many of the early studies were inadequate; most commonly these investigations used 24-hour recall methods, which provide a poor assessment of usual intake (see Chapter 3). Nevertheless, it is clear that the use of serum cholesterol as a criterion for the validity of a dietary measurement method is inappropriate. From the standpoint of the development of the field of nutritional epidemiology, the early focus on serum cholesterol was unfortunate. If beta-carotene had been of major interest at that time, the attitude toward the possibility of measuring diet in epidemiologic studies might have been different, as it is easy to demonstrate an association between intake measured by simple questionnaires and blood levels (see Chapter 6).

The resurgent interest in dietary etiologies of disease, particularly cancer, has stimulated the development and evaluation of methods for dietary assessment in epidemiologic applications. Many, although not all, aspects of diet can now be measured readily and inexpensively with sufficient accuracy to provide useful information. These methods, consisting of food intake and biochemical and anthropometric measurements, are discussed in detail in later chapters. Equally important, for most nutrients that have been studied, biologically meaningful between-person variation has been documented to exist within populations; without such variation, observational studies of individuals would not be feasible.


The concepts, hypotheses, and techniques of nutritional epidemiology are derived from many sources. Biochemistry, for example, has provided findings that certain nutrients function as antioxidants that may protect critical cell components from damage, potentially reducing the incidence of cancer (Ames, 1983; Ames et al., 1995). Cell culture methods have been used to identify compounds, such as preformed vitamin A, that regulate growth and differentiation of cells and that may, therefore, influence the risk of cancer in humans. Experiments in laboratory animals have provided much information regarding the effects of diet on the occurrence of disease and mechanisms of action. Metabolic and biochemical studies among human subjects have yielded essential information on the physiologic effects of dietary factors. Findings from in vitro studies and animal experiments, however, cannot be extrapolated directly to humans (Ames et al., 1987), and physiologic and metabolic changes are several steps removed from the actual occurrence of disease in humans; thus, epidemiologic approaches are needed to address diet and disease relationships directly. Nevertheless, these basic science areas provide critical direction for epidemiologists, information that can aid in the interpretation of their epidemiologic findings, and new methods for measuring genetic and environmental exposures that can be applied in epidemiologic studies.

Correlation Studies

Until recently, epidemiologic investigations of diet and disease consisted largely of ecologic or correlation studies, that is, comparisons of disease rates in populations with the population per capita consumption of specific dietary factors. Usually the dietary information in such studies is based on disappearance data, meaning the national figures for food produced and imported minus the food that is exported, fed to animals, or otherwise not available for humans. Many of the correlations based on such information are remarkably strong; for example, the correlation between meat intake and incidence of colon cancer is 0.85 for men and 0.89 for women (Armstrong and Doll, 1975) (Fig. 1–1).

Figure 1–1. Correlation between per capita meat intake and incidence of colon cancer in women in 23 countries. (From Armstrong and Doll, 1975; reproduced with permission.)

The use of international correlational studies to evaluate the relationships between diet and disease has several strengths. Most importantly, the contrasts in dietary intake are typically very large. For example, within the United States, most individuals consume between 25% and 45% of their calories from fat, whereas the mean fat intake for populations in different countries varies from 11% to 42% of calories (Hebert and Wynder, 1987). Second, the average diets of persons residing in a country are likely to be more stable over time than are the diets of individual persons within the country; for most countries the changes in per capita dietary intakes over a decade or two are relatively small. Finally, the cancer rates on which international studies are based are usually derived from relatively large populations and are, therefore, subject to only small random errors.

The primary problem of such correlational studies is that many potential determinants of disease other than the dietary factor under consideration may vary between areas with a high and low incidence of disease. Such confounding factors can include genetic predisposition; other dietary factors, including the availability of total energy intake; and other environmental or lifestyle practices. For example, with few exceptions, countries with a low incidence of colon cancer tend to be economically underdeveloped. Therefore, any variable related to industrialization will be similarly correlated with the incidence of colon cancer. Indeed, the correlation between gross national product and colon cancer mortality rate is 0.77 for men and 0.69 for women (Armstrong and Doll, 1975). More complex analyses can be conducted of such ecologic data that control for some of the potentially confounding factors. For example, McKeown-Eyssen and Bright-See (1985) have found that an inverse association of per capita dietary fiber intake and national colon cancer mortality rates persists after adjustment for fat intake. Rose (1982) and Kromhout (1989) have also emphasized the importance of the temporal relation in correlational studies; for at least some diseases, rates may be most appropriately related to dietary data many years earlier.
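How a variable related to industrialization can produce a strong ecologic correlation without any causal link can be shown with a small simulation. The sketch below is purely illustrative: all data are synthetic, the variable names (`development`, `meat_intake`, `cancer_rate`) and the number of areas are hypothetical, and the partial correlation is used only as one simple way of adjusting for a single measured confounder.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_correlation(x, y, z):
    """Correlation of x and y after linear adjustment for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_areas = 5000          # hypothetical; far more units than any real international comparison

# Synthetic data: economic development drives BOTH variables,
# and meat intake has no effect of its own on the cancer rate.
development = [rng.gauss(0, 1) for _ in range(n_areas)]
meat_intake = [d + 0.5 * rng.gauss(0, 1) for d in development]
cancer_rate = [d + 0.5 * rng.gauss(0, 1) for d in development]

crude_r = pearson(meat_intake, cancer_rate)
adjusted_r = partial_correlation(meat_intake, cancer_rate, development)

print(f"Crude correlation, meat intake vs. cancer rate: {crude_r:.2f}")
print(f"Correlation after adjusting for development:   {adjusted_r:.2f}")
```

In this constructed example the crude correlation is strong while the adjusted correlation is close to zero. In real ecologic data the confounders are rarely all known or well measured, so such adjustment is only partial.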

Most correlational studies are also limited by the use of food “disappearance” data that are only indirectly related to intake and are likely to be of variable quality. For example, the higher “disappearance” of calories per capita for the United States compared with most countries is probably related in part to wasted food in addition to higher actual intake. Furthermore, aggregate data for a geographic unit as a whole may be only weakly related to the diets of those individuals at risk of disease. As an extreme example, the interpretation of correlational data regarding alcohol intake and breast cancer is complicated because, in some cultures, most of the alcohol is consumed by men, but it is the women who develop breast cancer. These issues of data quality can potentially be addressed by collecting information on actual dietary intake in a uniform manner from the population subgroups of interest. This is currently being done in a study conducted in 65 geographic areas within China that are characterized by an unusually large variation in the rates of many cancers (Chen et al., 1990).

Another serious limitation of the international correlational studies is that they cannot be independently reproduced, which is an important part of the scientific process. Although the dietary information can be improved and the analyses can be refined, the resulting data will really not be independent; the populations, their diets, and the confounding variables are the same. Thus, it is not likely that many new insights will be obtained from further ecologic studies among countries. For this reason, the methodologic aspects of correlational studies are not discussed further in this book.

The role of correlational studies in nutritional epidemiology is controversial. Clearly these analyses have stimulated much of the current research on diet and cancer, and in particular they have emphasized the major differences in cancer rates among countries. Traditionally, such studies have been considered the weakest form of evidence, primarily due to the potential for confounding by factors that are difficult to measure and control (Kinlen, 1983). More recently, some have thought that such studies provide the strongest form of evidence for evaluating hypotheses relating diet to cancer (Hebert and Wynder, 1987; Prentice et al., 1988). On balance, ecologic studies have unquestionably been useful, but are not sufficient to provide conclusions regarding the relationships between dietary factors and disease and may sometimes be completely misleading.

Special Exposure Groups

Groups within a population that consume unusual diets provide an additional opportunity to learn about the relation of dietary factors and disease (Zaridze et al., 1985). These groups are often defined by religious or ethnic characteristics and provide many of the same strengths as ecologic studies. In addition, the special populations often live in the same general environment as the comparison group, which may somewhat reduce the number of alternative explanations for any differences that might be observed. For example, the observation that colon cancer mortality in the largely vegetarian Seventh-day Adventists is only about half that expected (Phillips et al., 1980) has been used to support the hypothesis that meat consumption is a cause of colon cancer.

Findings based on special exposure groups are subject to many of the same limitations as ecologic studies. Many factors, both dietary and nondietary, are likely to distinguish these special groups from the comparison population. Thus, other possible explanations for the lower colon cancer incidence and mortality among the Seventh-day Adventist population are that differences in rates are attributable to a lower intake of alcohol or a higher vegetable consumption. Given the many possible alternative explanations, such studies may be particularly useful when a hypothesis is not supported. For example, the finding that the breast cancer mortality rate among the Seventh-day Adventists is not appreciably different from the rate among the general U.S. population provides fairly strong evidence that meat eating does not cause a major increase in the risk of breast cancer (see Chapter 16).

Migrant Studies and Secular Trends

Migrant studies have been particularly useful in addressing the possibility that the correlations observed in the ecologic studies are due to genetic factors. For most cancers, populations migrating from an area with its own pattern of cancer incidence rates acquire rates characteristic of their new location (Staszewski and Haenszel, 1965; Adelstein et al., 1979; McMichael and Giles, 1988; Shimizu et al., 1991; Ziegler et al., 1993), although, for a few tumor sites, this change occurs only in later generations (Haenszel et al., 1972; Buell, 1973). Therefore, genetic factors cannot be primarily responsible for the large differences in cancer rates among these countries. Migrant studies are also useful to examine the latency or relevant time of exposure (see Chapter 16).

Major changes in the rates of a disease within a population over time provide evidence that nongenetic factors play an important role in the etiology of that disease. In the United States, for example, rates of coronary heart disease rose dramatically over the first half of this century, and then subsequently declined (Working Group on Arteriosclerosis, 1981). These secular changes clearly demonstrate that environmental factors, possibly including diet, are primary causes of this disease, even though genetic factors may still influence who becomes affected given an adverse environment.

Case–Control and Cohort Studies

Many of the weaknesses of correlational studies are potentially avoidable in case–control studies (in which information about previous diet is obtained from diseased patients and compared with that from subjects without the disease) or cohort investigations (in which information on diet is obtained from disease-free subjects who are then followed to determine disease rates according to levels of dietary factors). In such studies, the confounding effects of other factors can be controlled either in the design (by matching subjects to be compared on the basis of known risk factors or by restriction) or in the analysis (by any of a variety of multivariate methods) if information has been collected on the confounding variables. Furthermore, dietary information can be obtained for the individuals actually affected by disease, rather than using the average intake of the population as a whole.
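Control of confounding in the analysis can be illustrated with a stratified analysis of a case–control study. The sketch below is an assumption-laden illustration rather than a method named in the text: the counts are invented, the confounder (smoking) is chosen arbitrarily, and the Mantel-Haenszel summary odds ratio stands in for the broader family of adjustment methods.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts, stratified by a confounder (for example, smoking).
# Within each stratum the dietary exposure is unrelated to disease (OR = 1).
smokers = (80, 20, 40, 10)
nonsmokers = (10, 40, 20, 80)

# Collapsing the strata ignores the confounder and gives the crude table.
crude_table = tuple(s + n for s, n in zip(smokers, nonsmokers))

crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or([smokers, nonsmokers])

print(f"Crude odds ratio:            {crude_or:.2f}")
print(f"Mantel-Haenszel odds ratio:  {adjusted_or:.2f}")
```

In these invented data the crude odds ratio suggests a strong association that disappears once the strata are analyzed separately. This is possible only because information on the confounding variable was collected.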

Case–control studies generally provide information more efficiently and rapidly than cohort studies because the number of subjects is typically far smaller and no follow-up is necessary. However, consistently valid results may be difficult to obtain from case–control studies of dietary factors and disease because of the inherent potential for methodologic bias. This potential for bias is not unique to diet but is likely to be unusually serious for several reasons. Due to the limited range of variation in diet within most populations and some inevitable error in measuring intake, realistic relative risks in most studies of diet and disease are likely to be modest, say on the order of 0.5–2.0. These relative risks may seem small, but would be quite important because the prevalence of exposure is high. Given typical distributions of dietary intake, these relative risks are usually based on differences in means for cases and controls (or those who become cases and those who remain noncases in prospective studies) of only about 5% (see Chapters 3 and 12). Thus, a systematic error of even 3% or 4% can seriously distort such a relationship. In case–control studies it is easy to imagine that biases (due to selection or recall) of this magnitude could often occur, and it is extremely difficult to exclude the possibility that this degree of bias has occurred in any specific study. Hence, it would not be surprising if case–control studies of dietary factors lead to inconsistent findings.
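The effect of a small systematic error on a 5% difference in means can be worked through directly. The sketch below is a hypothetical illustration: the function name, the control mean of 100 units, and the assumption that the bias affects only the reports of cases are all choices made for the example.

```python
def apparent_difference_pct(control_mean, true_case_excess_pct, case_reporting_bias_pct):
    """Observed case-control difference in mean intake, as a percent of the control mean.

    Assumes controls report without error and cases misreport their
    true intake by a fixed percentage (the systematic bias).
    """
    true_case_mean = control_mean * (1 + true_case_excess_pct / 100)
    reported_case_mean = true_case_mean * (1 + case_reporting_bias_pct / 100)
    return (reported_case_mean - control_mean) / control_mean * 100


control_mean = 100.0  # hypothetical mean intake among controls, arbitrary units
true_excess = 5.0     # cases truly consume 5% more than controls

for bias in (0.0, -3.0, -4.0, 3.0):
    observed = apparent_difference_pct(control_mean, true_excess, bias)
    print(f"Reporting bias among cases {bias:+.0f}%: observed difference {observed:+.2f}%")
```

Under these assumptions, underreporting by cases of 3% or 4% shrinks a true 5% difference to roughly 2% or 1%, and overreporting of 3% inflates it to about 8%.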

The selection of an appropriate control group for a study of diet and disease is also usually problematic. One common practice is to use patients with another disease as a control group, with the assumption that the exposure under study is unrelated to the condition of this control group. Because diet may well affect many diseases, it is often difficult to identify disease groups that are definitely unrelated to the aspect of diet under investigation. A common alternative is to use a sample of persons from the general population as the control group. In many areas, particularly large cities, participation rates are low; it is common for only 60%–70% of eligible population controls to complete an interview (Hartge et al., 1984). Because diet is particularly associated with the level of general health consciousness, the diets of those who participate may differ substantially from those who do not; unfortunately, little information is available that directly bears on this issue.

The many potential opportunities for methodologic bias in case–control studies of diet raise a concern that incorrect associations may frequently occur. Even if many studies arrive at correct conclusions, distortion of true associations in a substantial percentage produces an inconsistent body of published data, making a coherent synthesis difficult or impossible for a specific diet and disease relationship. Methodologic sources of inconsistency may be particularly troublesome in nutritional epidemiology due to the inherent biologic complexity of nutrient-nutrient interactions. As the effect of one nutrient may depend on the level of another (which can differ between studies and may not have even been measured), such interactions may result in apparently inconsistent findings in the context of epidemiologic studies. Thus, compounding biologic complexity with methodologic inconsistency may result in an uninterpretable literature. Data accumulated in the last several years suggest that case–control studies, even when consistent, can be biased. For example, highly consistent positive associations between total energy intake and risk of colon cancer have been seen in case–control studies (Jain et al., 1980; Bristol et al., 1985; Potter and McMichael, 1986; Lyon et al., 1987; Graham et al., 1988; West et al., 1989; Peters et al., 1992), which seemed biologically implausible (see Chapter 11). However, in prospective studies, either no or inverse associations have been found (Garland and Garland, 1980; Phillips and Snowdon, 1983; Stemmermann et al., 1984; Hirayama, 1986; Willett et al., 1990; Bostick et al., 1994; Giovannucci et al., 1994; Goldbohm et al., 1994). As discussed in detail in Chapter 16, case–control and cohort studies have provided somewhat different perspectives of the relation between dietary fat and breast cancer. On the other hand, in both case–control and cohort studies of green and yellow vegetable intake in relation to lung cancer, remarkably consistent inverse associations have been found (see Chapter 15).

Prospective cohort studies avoid most of the potential sources of methodologic bias associated with case–control investigations. Because the dietary information is collected before the diagnosis of disease, illness cannot affect the recall of diet. Distributions of dietary factors in the study population may be affected by selective participation in the cohort; however, low participation rates at enrollment will not distort the relationships between dietary factors and disease. Although losses to follow-up that vary by level of dietary factors can result in distorted associations in a cohort study, follow-up rates tend to be rather high as participants have already provided evidence of willingness to participate and they may also be followed passively by means of disease registries and vital record listings (Stampfer et al., 1984). In addition to being less susceptible to bias, prospective cohort studies provide the opportunity to obtain repeated assessments of diet over time and to examine the effects of diet on a wide variety of diseases, including total mortality, simultaneously.

The primary constraints on prospective studies of diet are practical. Even for common diseases, such as myocardial infarction or cancers of the lung, breast, or colon, it is necessary to enroll tens of thousands of subjects. The use of structured, self-administered questionnaires has made studies of this size possible, although still expensive. For diseases of somewhat lower frequency, however, even very large cohorts will not accumulate a sufficient number of cases within a reasonable amount of time. Case–control studies, therefore, continue to play an important role in nutritional epidemiology.

Controlled Trials

The most rigorous evaluation of a dietary hypothesis is the randomized trial, optimally conducted as a double-blind experiment. The principal strength of a randomized trial is that potentially distorting variables should be distributed at random between the treatment and control groups, thus minimizing the possibility of confounding by these extraneous factors. In addition, it is sometimes possible to create a larger contrast between the groups being compared by use of an active intervention. Such experiments among humans, however, are best justified after considerable nonexperimental data have been collected to ensure that benefit is reasonably probable and that an adverse outcome is unlikely. Experimental studies are particularly practical for evaluating hypotheses that minor components of the diet, such as trace elements or vitamins, can prevent disease because these nutrients can be formulated into pills or capsules (Stampfer et al., 1985).

Even if feasible, randomized trials of dietary factors and disease are likely to encounter several limitations. The time between change in the level of a dietary factor and any expected change in the incidence of disease is typically uncertain. Therefore, trials must be of long duration, and usually one cannot eliminate the possibility that any lack of difference between treatment groups may be due to insufficient duration. Compliance with the treatment diet is likely to decrease during an extended trial, particularly if treatment involves a real change in food intake, and the control group may well adopt the dietary behavior of the treatment group if the treatment diet is thought to be beneficial. Such trends, which were found in the Multiple Risk Factor Intervention Trial of coronary disease prevention (Multiple Risk Factor Intervention Trial Research Group, 1982), may obscure a real benefit of dietary change.

A related potential limitation of trials is that participants who enroll in such studies tend to be highly selected on the basis of health consciousness and motivation. It is possible, therefore, that the subjects at highest potential risk on the basis of their dietary intake, and thus susceptible to intervention, are seriously underrepresented. For example, if low folic acid intake is thought to be a risk factor for lung cancer and a trial of folic acid supplementation is conducted among a health-conscious population that includes few individuals with low folic acid intake, one might see no effect simply because the study population was already receiving the maximal benefit of this nutrient through its usual diet. In such an instance, it would be useful to measure dietary intake of folic acid before starting the trial. Because the effect of supplementation is likely to be greatest among those with low dietary intakes, it would be possible to exclude those with high intakes (the potentially nonsusceptibles) either before randomization or in subanalyses at the conclusion of the study. This requires, of course, a reasonable measurement of dietary intake.

It is sometimes said that trials provide a better quantitative measurement of the effect of an exposure or treatment because the difference in exposure between groups is better measured than in an observational study. Although this contrast may at times be better defined in a trial (it is usually clouded by some degree of noncompliance), trials still usually produce an imprecise measure of the effect of exposure due to marginally adequate sample sizes and ethical considerations that require stopping soon after a statistically significant effect is seen. For example, with a p value close to 0.05, as was found in the Lipid Research Clinics Coronary Primary Prevention Trial (Lipid Research Clinics Program, 1984), the 95% confidence interval extends from no effect to an implausibly strong effect. In an observational study, an ethical imperative to stop does not exist when statistical significance occurs; continued accumulation of data can provide increasing precision regarding the relation between exposure and disease. A trial can provide unique information on the latent period between change in an exposure and change in disease; because spontaneous changes in diet are typically not clearly demarcated in time, the estimation of latent periods for dietary effects is usually difficult in observational studies.
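The remark about p values near 0.05 can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: it uses a normal (Wald) approximation on the log relative risk scale, and the relative risk of 0.80 is a hypothetical value, not a figure taken from the Lipid Research Clinics trial. The function name `ci_from_p` is likewise an invention for this example.

```python
from math import exp, log
from statistics import NormalDist


def ci_from_p(rr, p_two_sided, level=0.95):
    """Approximate confidence interval for a relative risk, recovered from
    its point estimate and two-sided p value (Wald approximation on the
    log scale). Assumes rr != 1 and 0 < p_two_sided < 1."""
    std_normal = NormalDist()
    # z statistic implied by the reported p value
    z_observed = std_normal.inv_cdf(1 - p_two_sided / 2)
    # standard error of log(RR) implied by that z statistic
    se = abs(log(rr)) / z_observed
    # critical value for the requested confidence level
    z_critical = std_normal.inv_cdf(1 - (1 - level) / 2)
    return exp(log(rr) - z_critical * se), exp(log(rr) + z_critical * se)


# Hypothetical trial result: RR = 0.80 with p exactly 0.05
lower, upper = ci_from_p(0.80, 0.05)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # 95% CI: 0.64 to 1.00
```

Under these assumptions, a result that just reaches p = 0.05 has one confidence limit at no effect and the other at the square of the estimated relative risk, which is why such a trial says little about the size of the effect.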

Although all hypotheses would ideally be evaluated in randomized trials, this is sometimes impossible for practical or ethical reasons. For example, our knowledge of the effects of cigarette smoking on risk of lung cancer is based on observational studies, and it is similarly unlikely that randomized trials could be conducted to examine the effect of alcohol use on human breast cancer risk. It remains unclear whether trials of sufficient size, duration, and degree of compliance can be conducted to evaluate many hypotheses that involve major changes in eating patterns, such as a reduction in fat intake.


The interpretation of positive (or inverse) associations in epidemiologic studies has received considerable attention; however, the evaluation of null or statistically nonsignificant findings has received less. Because either finding is potentially important, both are considered here.

If an association is observed in an epidemiologic study, we are usually concerned whether it represents a true cause-and-effect relationship; that is, if we actively changed the exposure, would that influence the frequency of disease? Hill (1965) has discussed factors that have frequently been considered as criteria for causality. These have included the strength of association, the consistency of a finding in various studies and populations, the presence of a dose–response gradient, the appropriate temporal relationship, the biologic plausibility, and the coherence with existing data. As has been pointed out by Rothman (1986), these cannot be considered as criteria because exceptions are likely to be frequent; this is particularly true in nutritional epidemiology. In this field, true associations are not likely to be strong, although relative risks of 0.7 or 1.5 could potentially be important because the dietary exposures are common. The consistent finding of an association that cannot be explained by other factors in various populations markedly reduces the possibility that chance explains the findings and increases the likelihood of causality. Although the reproducibility of findings is extremely important, null findings should sometimes be expected in nutritional epidemiology as noted above, even when a causal relationship may exist; thus, absolute consistency is not a realistic expectation. Dose–response relationships are likely to be nonlinear and may be of almost any shape, depending on the starting point on a hypothetical spectrum of exposure (Fig. 1–2). Moreover, apparently clear dose–response relationships can easily be the result of bias or confounding. Although compatibility of a finding with an established mechanism of disease causation supports causality, post hoc biologic explanations should be viewed cautiously because they can usually be developed for most observations, including those that are later refuted. Moreover, the pathophysiology of most cancers and many other chronic diseases is poorly understood so that lack of a well-defined mechanism should not be construed as evidence against causality.

Knowledge that an association exists, even if deemed causal, is not sufficient to make public or personal decisions. Such actions require some knowledge of the shape and quantitative aspects of the dose–response relationship. For instance, knowledge that total fat intake is associated with risk of colon cancer would not provide a sufficient basis to recommend a universal reduction in fat intake. It would be much more useful to know, for example, the change in risk associated with a decrease in fat intake from 40% to 30% of total energy intake, which has been considered realistic for the U.S. population (Committee on Diet, Nutrition and Cancer, 1982), as well as the effect of a change from 30% to 20% of calories, which probably represents a limit of feasibility for the United States. It is entirely possible that a strong relation between fat intake and colon cancer risk exists below 20% of calories, but that above that level the relationship is nonlinear, flat, or too weak to be of importance (McMichael and Potter, 1985). In addition to this information, knowledge of the approximate latent period between alteration in diet and change in disease incidence would be important. If this were several decades, older individuals might rationally ignore the association in making decisions regarding their diet.

Figure 1–2. Hypothetical relationship between intake of an essential dietary factor and health. If two points on the ascending part of the curve are compared, it might be concluded that the nutrient was beneficial; if points on the horizontal portion were compared, it might be concluded that the nutrient had no effect; if points on the descending segment were contrasted, it might be reported that the nutrient was deleterious. The health effects of the nutrient can only be fully appreciated by an examination of the dose–response relationship over the full range of exposures, which may not be possible within any single study. (From Mertz, 1981; reproduced with permission.)
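The dependence of a study's conclusion on where its population sits along the dose–response curve, as in Figure 1–2, can be illustrated with a toy calculation. The piecewise-linear curve below, its breakpoints (40, 100, and 160 arbitrary units), and the function names `health` and `compare` are all hypothetical and chosen only for illustration; they are not taken from Mertz (1981).

```python
def health(intake, low=40.0, high=100.0, toxic=160.0):
    """Hypothetical health score (0 to 1) for an essential dietary factor:
    rising up to `low`, flat between `low` and `high`, and falling to zero
    at `toxic`. Units and breakpoints are arbitrary."""
    if intake <= 0 or intake >= toxic:
        return 0.0
    if intake < low:
        return intake / low
    if intake <= high:
        return 1.0
    return (toxic - intake) / (toxic - high)


def compare(lower_intake, higher_intake):
    """Conclusion a study would reach if it contrasted only these two intakes."""
    difference = health(higher_intake) - health(lower_intake)
    if difference > 0:
        return "beneficial"
    if difference < 0:
        return "deleterious"
    return "no effect"


# Three hypothetical studies of the same nutrient in different populations
for pair in [(10, 30), (50, 90), (110, 150)]:
    print(pair, compare(*pair))
# (10, 30) beneficial
# (50, 90) no effect
# (110, 150) deleterious
```

All three conclusions are correct descriptions of the same underlying curve; they differ only because each hypothetical population spans a different segment of intake.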

Interpretation of Null Associations

In a study of diet and disease, failure to observe a statistically significant association when such an association truly exists can occur in several circumstances, alluded to earlier. One possibility is that the variation in diet is insufficient; in the extreme, no associations can occur if everyone in the study population eats the same diet. Second, variation may exist for the study population, but only within a “flat” portion of the total dose–response relationship. A third possibility is that the method of measuring dietary intake is not sufficiently precise to measure differences that truly exist. Fourth, an association may be missed because of low statistical power due to an inadequate number of diseased and nondiseased subjects. Fifth, a relationship could be undetected because the temporal relationship between the measured exposure and the occurrence of disease did not encompass the true latent period; this could easily happen if the critical dietary exposure occurred during childhood and the disease was diagnosed during adulthood. Sixth, an association could be undetected because some unmeasured third variable was related to exposure and disease in opposite directions; in other words, negative confounding existed. In addition to these six largely biologic reasons for failure to observe an association, methodologic sources of bias could obscure a relationship.
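The fourth circumstance, low statistical power, can be made concrete with a rough calculation. The sketch below uses a simple normal approximation for comparing cumulative incidence in two equal-sized exposure groups; the baseline incidence of 1%, the relative risk of 1.5, and the function name `approx_power` are hypothetical choices for illustration, and a real study would use methods suited to its design.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, p0, rr, alpha=0.05):
    """Approximate power of a two-sided test comparing cumulative incidence
    between two equal-sized groups (normal approximation, unpooled variance;
    the negligible probability of rejecting in the wrong direction is ignored)."""
    p1 = p0 * rr  # incidence in the exposed group
    se = sqrt(p0 * (1 - p0) / n_per_group + p1 * (1 - p1) / n_per_group)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_alpha)


# Hypothetical disease with 1% cumulative incidence and a true RR of 1.5
for n in (1_000, 5_000, 20_000):
    print(n, round(approx_power(n, p0=0.01, rr=1.5), 2))
```

Under these assumed values, a cohort with 1,000 subjects per group would usually miss the association, whereas 20,000 per group would almost always detect it, which is consistent with the need for very large cohorts noted earlier.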

It is obviously not informative to describe a study as null or nonsignificant unless the possible explanations noted previously have been addressed. Clearly, no single study can fully encompass the total possible range of human diets, measure all aspects of diet with absolute precision, assess all potential latent periods, and control for all potentially confounding variables. What must be done, then, is to describe the conditions and limitations of the null findings. First, it is critical to demonstrate that true variation in diet exists within the study population and that the method of measuring diet provides useful discrimination among subjects (see Chapter 6). It is not adequate to demonstrate that dietary variation exists on the basis of measurements using the study instrument alone because this variation could merely represent error. On the other hand, demonstration that measurements made using the study instrument correlate with measurements made using another method with independent sources of error provides evidence both that diet does vary within the population and that the study instrument is capable of detecting this variation.

Although confidence intervals are important for reporting positive associations, they are even more critical for results that are near the null (e.g., relative risk = 1.0) or not statistically significant because they provide a sense of the range of values that are still consistent with the data. Although it has only recently been done in practice, confidence limits should ideally be adjusted for measurement error; measurement error tends to make the true confidence intervals wider than those usually calculated, assuming no such error (Rosner et al., 1989). It has become fashionable, and even required by some editors, to include a priori power calculations in reports of study results. Because confidence intervals are determined by the observed data as well as the influence of chance, a priori power estimates add little once the study is completed. (The use of power estimates to interpret nonsignificant findings can easily be misleading; it is quite possible for a study to have low a priori power to detect a positive association but have confidence limits that widely exclude that positive association if the association is in the opposite direction.) The range of latent periods encompassed by the study should also be described; in dietary studies usually this is possible only crudely. If the study is a prospective cohort, or data are available in retrospect for several points in time, analyses can be conducted to evaluate associations separately for different latent periods (Rothman, 1986). Finally, it will be important to describe the dietary and nondietary correlates of the primary exposure that have been evaluated as potential confounding variables.
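The way measurement error widens a confidence interval can be sketched with a simplified correction. The example assumes random within-person error, under which the observed log relative risk is roughly the true log relative risk multiplied by a factor λ (here `lam`, the slope from regressing true intake on measured intake in a validation study). It divides the observed estimate by λ and propagates the variance with a first-order (delta method) approximation. This is an illustration in the spirit of such corrections, not necessarily the exact procedure of Rosner et al. (1989); the function name and every number are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def correct_for_measurement_error(rr, lower, upper, lam, var_lam=0.0):
    """Return (RR, lower, upper) corrected for random measurement error.

    rr, lower, upper: observed relative risk and its 95% confidence limits
    lam: assumed attenuation factor from a validation study (0 < lam <= 1)
    var_lam: assumed variance of the estimate of lam (0 ignores its uncertainty)
    """
    beta = log(rr)
    se = (log(upper) - log(lower)) / (2 * Z95)  # SE implied by the reported CI
    beta_corrected = beta / lam
    # Delta-method variance of beta / lam
    var_corrected = se ** 2 / lam ** 2 + beta ** 2 * var_lam / lam ** 4
    se_corrected = sqrt(var_corrected)
    return (
        exp(beta_corrected),
        exp(beta_corrected - Z95 * se_corrected),
        exp(beta_corrected + Z95 * se_corrected),
    )


# Hypothetical observed result: RR = 0.90 (95% CI 0.75 to 1.08), lam = 0.5
rr_c, lo_c, hi_c = correct_for_measurement_error(0.90, 0.75, 1.08, lam=0.5)
print(f"{rr_c:.2f} ({lo_c:.2f} to {hi_c:.2f})")  # 0.81 (0.56 to 1.17)
```

In this hypothetical case the corrected estimate moves away from the null, but the interval also widens, so the data remain compatible with a broader range of true effects than the uncorrected interval suggests.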

Because it is rare that all aspects of a hypothesis can be addressed in one study, it is important to describe which aspects have or have not been evaluated. For example, it is of limited use to conclude simply that a given study of dietary vitamin C intake and colon cancer was negative. It would be much more informative to say, for example, “Vitamin C intake determined by a detailed quantitative method was 40 mg per day for the 10th percentile and 200 mg per day for the 90th percentile. During a 5-year follow-up period the observed relative risk was 1.0 with a 95% confidence interval of 0.8–1.3 after adjusting for exposure measurement error for a difference of 50 mg per day of vitamin C intake, which corresponds to a 50% increase for the average subject. Finally, adjustment for parental history of colon cancer and intakes of dietary fiber and calcium did not alter the findings.” It is thus clear from this description that the effects of very low and very high vitamin C intakes, and the influence of childhood diet, are not being evaluated and that a 10%, but not a 30%, reduction in risk by a 50% increase in vitamin C intake later in life is still quite possible.
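The closing inference in this example is a direct reading of the confidence interval, and it can be written out explicitly. The short sketch below uses the interval quoted in the text (0.8–1.3); the function name `compatible_with_data` is hypothetical, and treating the confidence limits as a strict cut-off is a simplification made only for illustration.

```python
def compatible_with_data(risk_reduction, ci_lower, ci_upper):
    """True if the relative risk implied by a hypothesized fractional
    reduction in risk lies within the reported confidence interval."""
    hypothesized_rr = 1.0 - risk_reduction
    return ci_lower <= hypothesized_rr <= ci_upper


# Interval from the vitamin C example: RR = 1.0, 95% CI 0.8 to 1.3
for reduction in (0.10, 0.30):
    verdict = compatible_with_data(reduction, 0.8, 1.3)
    print(f"{reduction:.0%} reduction compatible with the data: {verdict}")
# 10% reduction compatible with the data: True
# 30% reduction compatible with the data: False
```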

Multivariate Relationships of Diet and Disease

Relationships between dietary factors and disease are likely to be extremely complex for both biologic and behavioral reasons. Types and amounts of food eaten may be related to important nondietary determinants of disease, such as age, smoking, exercise, and occupation, which may both distort or confound and modify relationships with diet. As discussed in Chapter 2, intakes of specific nutrients tend to be intercorrelated so that associations with one nutrient may be confounded by other aspects of the diet. Furthermore, the intake of one nutrient may modify the absorption, metabolism, or requirement for another nutrient, thus creating a biologic interaction. Due to these complexities, it is generally unsatisfactory to examine the relationship between a single dietary factor and disease in isolation. In practice, it is almost always necessary to employ multivariate techniques, including both stratified analyses and statistical models, to adjust for potentially confounding variables and examine interactions. These strategies are discussed further in Chapter 13.

The use of multivariate methods in any particular analysis requires a careful consideration of the precise question that is being posed and whether potential covariates are true confounders as opposed to effects of the primary exposure. Confusion resulting from the inappropriate application of multivariate methods is illustrated by the controversy surrounding the relation of body fat and risk of coronary heart disease (Manson et al., 1987). In a number of reports, blood pressure, glucose tolerance, and serum lipid levels were included in multivariate models along with a measure of body fat. Because these other risk factors are strongly influenced by obesity and are thus in the causal pathway relating relative weight with coronary heart disease, their inclusion substantially diminishes the apparent effect of relative weight. Conclusions based on such analyses that obesity has little relationship with coronary heart disease are misleading because obesity cannot be stripped of its metabolic consequences by sophisticated statistical methods. The application of multivariate methods in nutritional epidemiology necessitates maximal use of existing knowledge regarding the effects of dietary factors to avoid similar problems in the future.
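The effect of adjusting for a variable in the causal pathway can be seen in a small simulation. The sketch below assumes a deliberately extreme structure in which a measure of body fat influences a risk score only through an intermediate such as blood pressure; the coefficients (0.8 and 0.5), the noise levels, and the function names are hypothetical and serve only to illustrate the principle.

```python
import random


def simulate(n=20000, seed=1):
    """Simulate body fat -> blood pressure -> disease risk score,
    with no direct path from body fat to the risk score."""
    rng = random.Random(seed)
    fat, pressure, risk = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)                # body fat (standardized)
        m = 0.8 * x + rng.gauss(0, 0.6)    # intermediate raised by body fat
        y = 0.5 * m + rng.gauss(0, 1)      # risk depends on the intermediate only
        fat.append(x)
        pressure.append(m)
        risk.append(y)
    return fat, pressure, risk


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def body_fat_slopes(fat, pressure, risk):
    """Least-squares slope for body fat, without and with adjustment
    for the intermediate variable (two-predictor normal equations)."""
    sxx, smm, sxm = cov(fat, fat), cov(pressure, pressure), cov(fat, pressure)
    sxy, smy = cov(fat, risk), cov(pressure, risk)
    unadjusted = sxy / sxx
    adjusted = (sxy * smm - smy * sxm) / (sxx * smm - sxm ** 2)
    return unadjusted, adjusted


unadjusted, adjusted = body_fat_slopes(*simulate())
print(f"unadjusted slope: {unadjusted:.2f}, adjusted for intermediate: {adjusted:.2f}")
```

In this assumed structure the unadjusted slope recovers the total effect of body fat (about 0.8 × 0.5 = 0.4), whereas the slope adjusted for the intermediate is close to zero, even though body fat is, by construction, a cause of the outcome.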


Although growing rapidly, our knowledge is still largely incomplete regarding the relationships between dietary factors and the major illnesses of our culture. These illnesses include not only cancer and heart disease, which have received the most attention, but also congenital malformations, degenerative conditions of the eye, fractures, and many infectious diseases that are hypothesized to be influenced by the nutritional status of the host. Randomized trials may eventually provide definitive answers to some of these questions. However, in the near future, our knowledge of many of these relationships will depend largely on observational epidemiologic data, and for many relationships this may remain so indefinitely. For this reason, it is crucial to refine maximally our methods of data collection, analytic procedures, and interpretation of findings. The ensuing chapters are intended to further our progress in this direction.


Bibliography references:

Adelstein, A. M., J. Staszewski, and C. S. Muir (1979). Cancer mortality in 1970–1972 among Polish-born migrants to England and Wales. Br J Cancer 40, 464–475.

Ames, B. N. (1983). Dietary carcinogens and anticarcinogens. Oxygen radicals and degenerative diseases. Science 221, 1256–1264.

Ames, B. N., L. S. Gold, and W. C. Willett (1995). The causes and prevention of cancer. Proc Natl Acad Sci USA 92, 5258–5265.

Ames, B. N., R. Magaw, and L. S. Gold (1987). Ranking possible carcinogenic hazards. Science 236, 271–280.

Armstrong, B., and R. Doll (1975). Environmental factors and cancer incidence and mortality in different countries, with special reference to dietary practices. Int J Cancer 15, 617–631.

Bostick, R. M., J. D. Potter, L. H. Kushi, T. A. Sellers, K. A. Steinmetz, D. R. McKenzie, S. M. Gapstur, and A. R. Folsom (1994). Sugar, meat, and fat intake, and non-dietary risk factors for colon cancer incidence in Iowa women (United States). Cancer Causes Control 5, 38–52.

Bristol, J. B., P. M. Emmett, K. W. Heaton, and R. C. Williamson (1985). Sugar, fat, and the risk of colorectal cancer. Br Med J Clin Res Ed 291, 1467–1470.

Buell, P. (1973). Changing incidence of breast cancer in Japanese-American women. JNCI 51, 1479–1483.

Chen, J., T. C. Campbell, L. Junyao, and R. Peto (1990). Diet, Life-style and Mortality in China: A Study of the Characteristics of 65 Chinese Counties. Oxford, England: Oxford University Press.

Committee on Diet, Nutrition and Cancer, Assembly of Life Sciences, National Research Council (1982). Diet, Nutrition, and Cancer. Washington, D.C.: National Academy Press.

Davidson, S., R. Passmore, and J. F. Brock (1973). Human Nutrition and Dietetics. Edinburgh: Churchill Livingstone.

Garland, C. F., and F. C. Garland (1980). Do sunlight and vitamin D reduce the likelihood of colon cancer? Int J Epidemiol 9, 227–231.

Giovannucci, E., E. B. Rimm, M. J. Stampfer, G. A. Colditz, A. Ascherio, and W. C. Willett (1994). Intake of fat, meat, and fiber in relation to risk of colon cancer in men. Cancer Res 54, 2390–2397.

Goldberger, J. E. (1964). Goldberger on Pellagra. Baton Rouge: Louisiana State University Press.

Goldbohm, R. A., P. A. van den Brandt, P. van’t Veer, H. A. M. Brants, E. Dorant, F. Sturmans, and R. J. J. Hermus (1994). A prospective cohort study on the relation between meat consumption and the risk of colon cancer. Cancer Res 54, 718–723.

Graham, S., J. Marshall, B. Haughey, A. Mittelman, M. Swanson, M. Zielezny, T. Byers, G. Wilkinson, and D. West (1988). Dietary epidemiology of cancer of the colon in western New York. Am J Epidemiol 128, 490–503.

Guangi-Qi, Y. (1987). Research on selenium-related problems in human health in China. In Combs, G. F., Spallholz, J. E., Levander, O. R., and Oldfield, J. E., (eds): Selenium in Biology and Medicine, Part A. New York: Van Nostrand Reinhold, pp 9–32.

Haenszel, W., M. Kurihara, M. Segi, and R. K. Lee (1972). Stomach cancer among Japanese in Hawaii. JNCI 49, 969–988.

Hartge, P., L. A. Brinton, J. F. A. Rosenthal, J. I. Cahill, R. N. Hoover, and J. Waksberg (1984). Random digit dialing in selecting a population-based control group. Am J Epidemiol 120, 825–833.

Hebert, J. R., and E. L. Wynder (1987). Dietary fat and the risk of breast cancer (letter). N Engl J Med 317, 165.

Hill, A. B. (1965). The environment and disease: Association or causation? Proc R Soc Med 58, 295–300.

Hirayama, T. (1986). A large-scale study on cancer risks by diet—with special reference to the risk reducing effects of green-yellow vegetable consumption. In Hayashi, Y., Nagao, M., and Sugimura, T., et al (eds): Diet, Nutrition, and Cancer. Tokyo: Japan Scientific Societies Press, pp 41–53.

Jacobs, D. R., Jr., J. T. Anderson, and H. Blackburn (1979). Diet and serum cholesterol: Do zero correlations negate the relationship? Am J Epidemiol 110, 77–87.

Jain, M., G. M. Cook, F. G. Davis, M. G. Grace, G. R. Howe, and A. B. Miller (1980). A case–control study of diet and colorectal cancer. Int J Cancer 26, 757–768.

Kinlen, L. J. (1983). Fat and cancer. Br Med J Clin Res Ed 286, 1081–1082.

Kromhout, D. (1989). Diet and Mortality: Strengthening Cross-Cultural Correlations With Time. Epidemiology, Nutrition, and Health. Proceedings of the 1st Berlin Meeting on Nutritional Epidemiology, 1988, Berlin, Germany, Smith-Gordon, London.

Lind, J. (1753). A Treatise on the Scurvy. Reprinted Edinburgh: Edinburgh University Press, 1953.

Lipid Research Clinics Program (1984). The lipid research clinics coronary primary prevention trial results. Reduction in incidence of coronary heart disease. JAMA 251, 351–364.

Lyon, J. L., A. W. Mahoney, D. W. West, J. W. Gardner, K. R. Smith, A. W. Sorenson, and W. Stanish (1987). Energy intake: Its relationship to colon cancer risk. JNCI 78, 853–861.

Manson, J. E., M. J. Stampfer, C. H. Hennekens, and W. C. Willett (1987). Body weight and longevity. A reassessment. JAMA 257, 353–358.

Mattson, F. H., B. A. Erickson, and A. M. Kligman (1972). Effect of dietary cholesterol on serum cholesterol in man. Am J Clin Nutr 25, 589–594.

McKeown-Eyssen, G. E., and E. Bright-See (1985). Dietary factors in colon cancer: International relationships. An update. Nutr Cancer 7, 251–253.

McMichael, A. J., and G. G. Giles (1988). Cancer in migrants to Australia: Extending descriptive epidemiological data. Cancer Res 48, 751–756.

McMichael, A. J., and J. D. Potter (1985). Diet and colon cancer: Integration of the descriptive, analytic, and metabolic epidemiology. Natl Cancer Inst Monogr 69, 223–228.

Mertz, W. (1981). The essential trace elements. Science 213, 1332–1338.

Multiple Risk Factor Intervention Trial Research Group (1982). Multiple Risk Factor Intervention Trial: Risk factor changes and mortality results. JAMA 248, 1465–1477.

Peters, R. K., M. C. Pike, D. Garabrant, and T. M. Mack (1992). Diet and colon cancer in Los Angeles County, California. Cancer Causes Control 3, 457–473.

Phillips, R. L., L. Garfinkel, J. W. Kuzma, W. L. Beeson, T. Lotz, and B. Brin (1980). Mortality among California Seventh-day Adventists for selected cancer sites. JNCI 65, 1097–1107.

Phillips, R. L., and D. A. Snowdon (1983). Association of meat and coffee use with cancers of the large bowel, breast, and prostate among Seventh-day Adventists: Preliminary results. Cancer Res 43(suppl), 2403S–2408S.

Potter, J. D., and A. J. McMichael (1986). Diet and cancer of the colon and rectum: A case–control study. JNCI 76, 557–569.

Prentice, R. L., F. Kakar, S. Hursting, L. Sheppard, R. Klein, and L. H. Kushi (1988). Aspects of the rationale for the Women’s Health Trial. JNCI 80, 802–814.

Rose, G. (1982). Incubation period of coronary heart disease. Br Med J Clin Res Ed 284, 1600–1601.

Rosner, B., W. C. Willett, and D. Spiegelman (1989). Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Statistics Med 8, 1051–1069.

Rothman, K. J. (1986). Modern Epidemiology. Boston, MA: Little, Brown and Company.

Shekelle, R. B., A. M. Shryock, O. Paul, M. Lepper, J. Stamler, S. Liu, and W. J. Raynor Jr. (1981). Diet, serum cholesterol, and death from coronary heart disease: The Western Electric Study. N Engl J Med 304, 65–70.

Shimizu, H., R. K. Ross, L. Bernstein, R. Yatani, B. E. Henderson, and T. M. Mack (1991). Cancers of the prostate and breast among Japanese and white immigrants in Los Angeles County. Br J Cancer 63, 963–966.

Stampfer, M. J., J. E. Buring, W. Willett, B. Rosner, K. Eberlein, and C. H. Hennekens (1985). The 2×2 factorial design: Its application to a randomized trial of aspirin and carotene in US physicians. Statistics Med 4, 111–116.

Stampfer, M. J., W. C. Willett, F. E. Speizer, D. C. Dysert, R. Lipnick, B. Rosner, and C. H. Hennekens (1984). Test of the National Death Index. Am J Epidemiol 119, 837–839.

Staszewski, J., and W. Haenszel (1965). Cancer mortality among the Polish-born in the United States. JNCI 35, 291–297.

Stemmermann, G. N., A. M. Nomura, and L. K. Heilbrun (1984). Dietary fat and the risk of colorectal cancer. Cancer Res 44, 4633–4637.

West, D. W., M. L. Slattery, L. M. Robison, K. L. Schuman, M. H. Ford, A. W. Mahoney, J. L. Lyon, and A. W. Sorensen (1989). Dietary intake and colon cancer: Sex and anatomic site-specific associations. Am J Epidemiol 130, 883–894.

Willett, W. (1987). Nutritional epidemiology: Issues and challenges. Int J Epidemiol 16, 312–317.

Willett, W. C., M. J. Stampfer, G. A. Colditz, B. A. Rosner, and F. E. Speizer (1990). Relation of meat, fat, and fiber intake to the risk of colon cancer in a prospective study among women. N Engl J Med 323, 1664–1672.

Williams, R. R. (1961). Toward the Conquest of Beriberi. Cambridge, MA: Harvard University Press.

Working Group on Arteriosclerosis of the National Heart, Lung, and Blood Institute (1981). Decline in Coronary Heart Disease Mortality, 1963–78. Vol. 2. DHHS Publication No. (NIH) 82–2035. Bethesda, MD: National Institutes of Health, pp 157–258.

Wynder, E. L. (1976). Nutrition and cancer. Fed Proc 35, 1309–1315.

Zaridze, D. G., C. S. Muir, and A. J. McMichael (1985). Diet and cancer: Value of different types of epidemiological studies. Nutr Cancer 7, 155–166.

Ziegler, R. G., R. N. Hoover, M. C. Pike, A. Hildesheim, A. M. Nomura, D. W. West, A. H. Wu-Williams, L. N. Kolonel, P. L. Horn-Ross, and J. F. Rosenthal (1993). Migration patterns and breast cancer risk in Asian-American women. JNCI 85, 1819–1827.