## Peter L. T. Pirolli

Print publication date: 2007

Print ISBN-13: 9780195173321

Published to Oxford Scholarship Online: April 2010

DOI: 10.1093/acprof:oso/9780195173321.001.0001

# Elementary Foraging Models

Chapter: 2 Elementary Foraging Models

Source: Information Foraging Theory

Publisher: Oxford University Press

DOI: 10.1093/acprof:oso/9780195173321.003.0002

# Abstract and Keywords

This chapter provides a general overview of conventional models of optimal foraging with simple illustrations of how they can apply to idealized human-information interaction tasks. Two conventional optimal foraging models are presented in detail: (a) the patch model, which addresses decisions related to searching and exploiting an environment that has a patchy distribution of resources, and (b) the diet model, which addresses what kinds of things to eat and what to ignore. These conventional models make unrealistic assumptions about the fine-grained details of cognition, but they provide useful first approximations to information-foraging situations. These classic optimal-foraging models are applied to idealized examples of using search engines and dealing with e-mail that contains spam.

The detailed analyses and models presented in later chapters draw upon various parts of optimal foraging theory as well as other general approaches to rational analysis. This chapter provides a very general overview of the conventional models of optimal foraging (Stephens & Krebs, 1986 ). These conventional models make unrealistic assumptions about the fine-grained details of cognition—for instance, they assume perfect knowledge of the environment—but at the rational band of analysis (see chapter 1 ) they provide useful first approximations to information foraging situations. When the details of these models are elaborated by assumptions about the limitations of the cognitive architecture, they have resulted in highly predictive models such as SNIF-ACT (chapter 5 ) and ACT-IF (chapter 6 ).

The approach taken in this chapter is to present several basic foraging models and then illustrate the models with idealized examples from food foraging and from information foraging. Like illustrations used in physics that require the assumption of frictionless surfaces, the examples of food foraging and information foraging are purposely simplified in order to focus the discussion on the rational models. The complexity of the real world will be met head-on in later chapters.

# Optimal Foraging Theory

As implied by its name, Information Foraging Theory has drawn heavily upon models and techniques developed in optimal foraging theory (Stephens & Krebs, 1986). Optimal foraging theory seeks to explain adaptations of organism structure and behavior to the environmental problems and constraints of foraging for food. Optimal foraging theory originated in attempts to address puzzling findings that arose in ethological studies of food seeking and prey selection among animals (Stephens & Krebs, 1986). For instance, why would a predator eat a particular kind of prey in one environment but ignore the same prey in another environment? It has had an enormous impact in anthropology (Smith & Winterhalder, 1992), where it has been used to explain dietary choice (Kaplan & Hill, 1992), variations in land tenure and food sharing (Smith, 1987), group size (Smith, 1981), habitat choice (Cashdan, 1992), time allocation (Hames, 1992), and many other aspects of hunter-gatherer culture. Independent of the development of Information Foraging Theory, Sandstrom (1994) has suggested that optimal foraging theory may successfully address the complex empirical phenomena that arise in the scientific literatures.

Optimal foraging theory (Stephens & Krebs, 1986) seeks to explain adaptations of organism structure and behavior to the environmental problems and constraints of foraging for food. A key assumption is that animals (including humans) should have well-designed food-seeking strategies because higher rates of energy consumption should generally translate into higher reproductive success.¹ Consider a hypothetical predator, such as a bird of prey. Its fitness will depend on its reproductive success, which in turn will depend on how well it finds food that provides energy. The environment surrounding this bird will have a patchy structure, with different types of habitat (e.g., meadows, woodlots, and ponds) containing different amounts and kinds of prey. For the bird of prey, different types of habitat and prey will yield different amounts of net energy if included in the diet. Furthermore, the different prey types will have different distributions over the environment. For the bird of prey, this means that the different habitats or prey will have different access or navigation costs. Different species of birds of prey might be compared on their ability to extract energy from the environment. Birds are better adapted if they have evolved strategies that better solve the problem of maximizing the amount of energy returned per amount of effort. Conceptually, the optimal forager is one that has the best solution to the problem of maximizing the rate of net energy returned per effort expended, given the constraints of the environment in which it lives.

In their comprehensive survey of the field, Stephens and Krebs ( 1986 ) begin with discussion of two conventional models: (a) the patch model, which addresses decisions related to searching and exploiting an environment that has a patchy distribution of resources, and (b) the diet model, which addresses what kinds of things to eat and what to ignore. I follow their discussion using some simple hypothetical examples. As with many elegant theoretical models, these are certainly wrong in detail, but they provide understanding and insight. It should be noted that there are many other optimal foraging models in the literature that consider many other fascinating decision problems. Stephens and Krebs ( 1986 ) provide an excellent introduction to many of these models in behavioral ecology, Winterhalder and Smith ( 1992b ) collect many summary papers in the study of human behavior, Mangel and Clark ( 1988 ) present dynamic models of foraging, and Bell ( 1991 ) provides an excellent summary of observed food search strategies in the context of optimal foraging theory.

# Patch Model

Chapter 1 presents a summary version of Charnov’s Marginal Value Theorem (Charnov, 1976 ), which was developed in optimal foraging theory to deal with predictions of the amount of time an organism would forage in a patch before leaving to search for another. This is the conventional patch model in optimal foraging theory. Here, I provide a more detailed account of Charnov’s Marginal Value Theorem, and additional mathematical details are presented in the appendix.

## Characterizing Foraging in Patches by the Rate of Gain

As discussed in chapter 1 , the conventional patch model deals with situations in which organisms face an environment in which food is distributed in a patchy manner. By analogy, information patch models may deal with situations in which the information forager deals with information that is distributed in a patchy manner. For instance, chapters, books, bookshelves, and libraries impose a hierarchical structure on the arrangement of information. Our offices tend to have a patchy structure that evolves from use. For instance, my immediate desk work area may contain a variety of information items that are involved in some current task. Within arms’ reach there may be a variety of piles of documents that may contain topically related content (e.g., my pile of papers about foraging theory) or task-related content (e.g., my itinerary and receipts related to a travel expense report). Within the office there are also file cabinets (with a hierarchically organized file system), bookshelves, and books. As discussed in chapter 3 , the World Wide Web also exhibits a patchy structure.

figure 2.1 A hypothetical bird forages in an environment consisting of patches containing berry clusters. The foraging behavior can be characterized in terms of total rewards ($G$) and time spent between ($T_B$) and within ($T_W$) patches.

Figure 2.1 presents the idealized view of a forager assumed in the conventional patch model. It is assumed that a forager, such as a bird, searches through the environment and on occasion encounters a patch of food resources, such as a berry bush containing clusters of berries. The forager must expend some amount of between-patch time getting to the next food patch. Once in a patch, the forager engages in within-patch foraging and faces the decision of continuing to forage in the patch or leaving to seek a new one. Frequently, as an animal forages within a patch, the amount of food diminishes or depletes. For instance, a bird might deplete the berries on a bush as it eats them. In such cases, there will be a point at which the expected future gains from foraging within a current patch of food diminish to the point that they are less than the expected gains that could be made by leaving the patch and searching for a new one.

To generalize across the two domains of food foraging and information foraging, let us assume that the activity of foraging results in some total gain, $G$, in some measurable thing of value. In the case of food foraging, this may be the number of calories of energy gained from eating. In the case of information foraging, this might be some other utility that results from achieving a goal. Figure 2.2a is a hypothetical graph of the cumulative gains for the foraging behavior illustrated in figure 2.1. The time expended on the forager’s search process proceeds left to right on the abscissa of the graph in figure 2.2a. As depicted in figure 2.1, the hypothetical forager encounters one patch, consumes a couple of berry clusters, leaves the patch, searches for a new patch, encounters a second patch, and consumes a couple more berry clusters. For simplicity, figure 2.2a assumes that the cumulative rewards gained from consuming the berry clusters in figure 2.1 come in discrete chunks. Each time a cluster of berries is consumed, the cumulative gains jump up in figure 2.2a. As time proceeds to the right in figure 2.2a, some gains accumulate in the first patch encountered, no further gains accumulate between patches, and some more gains are added after encountering the second patch.

figure 2.2 (a) The cumulative gain of rewards for the behavior of the hypothetical forager in figure 2.1, and (b) the average rate of gain $R$ expressed as a ratio of total rewards ($G$) to the sum of the total between-patch time ($T_B$) and total within-patch time ($T_W$).

The patch model assumes that the total foraging time of the hypothetical bird can be divided into two mutually exclusive activities: (a) the total amount of time spent between patches (searching for the next patch), $T_B$, and (b) the total amount of time spent exploiting within patches, $T_W$ (e.g., handling and consuming the berries).² Figure 2.2b is a rearrangement of the plot in figure 2.2a. In figure 2.2b, the right portion of the graph plots the cumulative gains shown in figure 2.2a purely as a function of the within-patch foraging time $T_W$. Figure 2.2b also graphically illustrates the average rate of gain or rewards, $R$, which, as will become clear, is the key factor that characterizes the efficiency of the forager. The average rate of gain of value (calories; utility), $R$, is the ratio of the net value accumulated, $G$, divided by the total time spent between and within patches:

$$R = \frac{G}{T_B + T_W}. \tag{2.1}$$

(The appendix lists the definitions of variables used in models throughout this chapter.)

## Holling’s Disk Equation: Using Averages to Characterize the Rate of Gain

Figure 2.2 (and equation 2.1 ) characterizes the average rate of gain, R, in terms of total rewards gained and total time taken. This formulation is not particularly useful, but with some additional assumptions, it can be used to develop a way of characterizing the average rate of gain in terms of averages (rather than totals). The assumptions are as follows:

1. The number of patches foraged is linearly related to the amount of time spent in between-patch foraging activities.

2. The average time between patches, when searching, is $t_B$.

3. The average gain per patch is $g$.

4. The average time to process each patch is $t_W$.

On average, as the forager searches for patches, the patches will be encountered at an average rate of

$$\lambda = \frac{1}{t_B}. \tag{2.2}$$

This rate can be used to define the expected total cumulative gain, $G$, as a linear function of between-patch foraging time,

$$G = \lambda T_B g. \tag{2.3}$$

In equation 2.3, $\lambda T_B$ is the product of the total time spent searching for patches multiplied by the average rate of encountering patches, which produces the expected total number of patches that will be encountered. Since each patch produces an average reward, $g$, the product $\lambda T_B g$ gives the expected total cumulative gain. Likewise, the expected total amount of within-patch time can be represented as

$$T_W = \lambda T_B t_W. \tag{2.4}$$

Equation 2.4 multiplies the expected number of patches encountered, $\lambda T_B$, by the average amount of within-patch foraging time, $t_W$.

Given the assumptions listed above, equation 2.1 can be rewritten to express the expected average rate of gain,

$$R = \frac{\lambda T_B g}{T_B + \lambda T_B t_W} = \frac{\lambda g}{1 + \lambda t_W}. \tag{2.5}$$

This is what is known as Holling’s Disk Equation (Holling, 1959).³ In contrast to equation 2.1, which requires knowledge about total times and rewards, Holling’s Disk Equation is expressed in terms of averages that could be obtained by sample measurements from an environment. Holling’s Disk Equation serves as the basis for deriving several optimal foraging models. Stephens and Charnov (1982) have shown that broadly applicable stochastic assumptions lead asymptotically to equation 2.5 as foraging time grows large.
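As a rough numerical check on the algebra above, the following Python sketch computes the rate of gain once from expected totals (equation 2.1) and once from averages (equation 2.5). The function names and all numeric values (20 s between patches, 5 units of gain and 10 s per patch) are hypothetical and chosen only for illustration.

```python
def rate_from_totals(total_gain, total_between, total_within):
    """Equation 2.1: R = G / (T_B + T_W)."""
    return total_gain / (total_between + total_within)


def holling_rate(encounter_rate, gain_per_patch, within_time):
    """Equation 2.5: R = lambda * g / (1 + lambda * t_W)."""
    return encounter_rate * gain_per_patch / (1.0 + encounter_rate * within_time)


# Hypothetical averages (assumptions, not data from the chapter).
t_b, g, t_w = 20.0, 5.0, 10.0
lam = 1.0 / t_b                 # equation 2.2

# Expected totals after 1,000 s of between-patch search.
T_B = 1000.0
G = lam * T_B * g               # equation 2.3
T_W = lam * T_B * t_w           # equation 2.4

print(rate_from_totals(G, T_B, T_W))
print(holling_rate(lam, g, t_w))
```

Both calls should print the same rate, because the total between-patch time cancels out of equation 2.5.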

## Additional Characterizations of the Environment: Prevalence and Profitability

Two useful characterizations of the foraging environment can be made using equation 2.5 as context (see figure 2.3). In comparison to some baseline environment containing a patchy distribution of resources (figure 2.3a), another environment may be “richer” because it has a higher prevalence of patches (figure 2.3b). In comparison to figure 2.3a, the average time spent between patches is expected to decrease in figure 2.3b because patches are more prevalent. Another way that the environment can become richer is because the patches themselves yield a higher rate of reward (figure 2.3c). In other words, the patches are more profitable.

figure 2.3 In comparison to some baseline patchy environment (a), another foraging environment may be richer because patches are more prevalent (b) or because the patches themselves are more profitable (c).

In Holling’s Disk Equation (equation 2.5), the prevalence of patches is captured by $\lambda$ (the rate of encountering patches). Increased prevalence would mean that the average time between patches, $t_B$, would decrease, and the rate $\lambda = 1/t_B$ would increase. The profitability, $\pi$, of patches can be defined as the ratio of the net rewards gained from a patch to the time cost of within-patch foraging,

$$\pi = \frac{g}{t_W}.$$

In the context of equation 2.5, increasing the profitability of within-patch activities increases the overall rate of gain, $R$. Decreasing the between-patch costs, $t_B$ (or equivalently, increasing prevalence $\lambda$), increases the overall rate of return, $R$, toward an asymptote equal to the profitability of patches, $R = \pi$.
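The asymptote can be seen numerically. The sketch below evaluates equation 2.5 for a shrinking between-patch time while holding the per-patch gain and within-patch time fixed. The specific values are hypothetical assumptions, and `holling_rate` is simply a name chosen here for equation 2.5.

```python
def holling_rate(encounter_rate, gain_per_patch, within_time):
    """Equation 2.5: R = lambda * g / (1 + lambda * t_W)."""
    return encounter_rate * gain_per_patch / (1.0 + encounter_rate * within_time)


# Hypothetical patch: 5 units of gain for 10 s of within-patch foraging.
g, t_w = 5.0, 10.0
profitability = g / t_w  # pi = g / t_W

# Shrinking between-patch times (seconds) correspond to rising prevalence.
between_times = (100.0, 10.0, 1.0, 0.01)
rates = [holling_rate(1.0 / t_b, g, t_w) for t_b in between_times]

for t_b, rate in zip(between_times, rates):
    print(f"t_B = {t_b:>6} s  ->  R = {rate:.4f}  (pi = {profitability})")
```

The printed rates rise as the between-patch time falls, but they never exceed the profitability.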

## Within-Patch Gain Curves

The conventional patch model of optimal foraging theory (Stephens & Krebs, 1986 ) is an elaboration of equation 2.5 . It addresses the optimal allocation of total time to between-patch activities versus within patch activities, under certain strong assumptions. Rather than having a fixed average gain per patch and a fixed average within-patch cost, the patch model assumes (a) that there may be different kinds of patches and (b) that the expected gains from a patch can depend on the within-patch foraging time, which is under the control of the forager. The optimization problem is how much time to spend in each kind of patch before leaving to search for another.

The conventional patch model (Stephens & Krebs, 1986 ) assumes that the environment can be characterized as consisting of P different patch types that can be indexed using i = 1, 2, …, P. The conventional patch model assumes that the forager must expend some amount of time going from one patch to the next. Once in a patch, the forager faces the decision of continuing to forage in the patch or leaving to seek a new one. Each type of patch is characterized by

$\lambda_i$, the prevalence (or encounter rate) of patches of type $i$,

$t_{Wi}$, the patch residence time, which is the amount of time the forager spends within patches of type $i$, and

$g_i(t_{Wi})$, the gain function for patches of type $i$ that specifies the expected net gain as a function of foraging time spent within type $i$ patches.

As discussed in the appendix, the conventional patch model can be expressed as a variant of Holling’s Disk Equation (equation 2.5):

$$R = \frac{\sum_{i=1}^{P} \lambda_i \, g_i(t_{Wi})}{1 + \sum_{i=1}^{P} \lambda_i \, t_{Wi}}. \tag{2.6}$$

The numerator of equation 2.6 sums the expected gains from encountered patches of each type, and the denominator sums the time spent between and within patches.
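A minimal sketch of equation 2.6 in Python follows. It assumes each patch type is represented as a tuple of encounter rate, gain function, and residence time. The two patch types and the saturating gain function are hypothetical illustrations, not taken from the chapter.

```python
import math


def patch_model_rate(patch_types):
    """Equation 2.6 for a list of (encounter_rate, gain_function, residence_time)."""
    expected_gain = sum(lam * gain(t_w) for lam, gain, t_w in patch_types)
    expected_within = sum(lam * t_w for lam, _gain, t_w in patch_types)
    return expected_gain / (1.0 + expected_within)


def saturating_gain(ceiling, time_constant):
    """Hypothetical diminishing-returns gain: ceiling * (1 - exp(-t / time_constant))."""
    return lambda t: ceiling * (1.0 - math.exp(-t / time_constant))


# Two hypothetical patch types: rare but rich, and common but poor.
rich = (0.02, saturating_gain(10.0, 20.0), 15.0)
poor = (0.05, saturating_gain(3.0, 10.0), 5.0)
print(patch_model_rate([rich, poor]))

# With one patch type and a fixed gain, equation 2.6 reduces to equation 2.5.
single = [(0.05, lambda t: 5.0, 10.0)]
print(patch_model_rate(single))
```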

Figure 2.4 presents a simple kind of gain function. In this example, there is a linear increase in cumulative within-patch gains up to the point at which the patch is depleted. In the information foraging domain, this might occur, for example, for an information forager who collects relevant citations from a finite list of citations returned by a search engine, where the relevant items occur randomly in the list.

figure 2.4 A gain function, g, characterizing a type of patch that yields rewards as a linear function of within-patch time, up to the point at which the patch is depleted.

As the forager processes the items, the cumulative gain function increases linearly, and when the end of the list is reached, the patch is depleted and the expected cumulative gain function plateaus.

Figure 2.5 illustrates graphically how the average rate of gain, $R$, will vary with different time allocation policies. Imagine that the forager’s environment is composed of just one kind of patch that has the simple linear within-patch gain function that eventually depletes, as illustrated in figure 2.4. Assume that the average time ($t_B$) spent between patches is $1/\lambda_i$. Imagine that the forager can decide among three possible within-patch time allocation policies, $t_1$, $t_2$, and $t^*$, as illustrated in figure 2.5. To see graphically the average rate of gain $R$ that would be achieved by the different policies, one can plot lines, such as $R^*$, from the origin and intersecting with the gain function, $g_i$, at each particular within-patch time policy, such as $t_1$, $t_2$, or $t^*$. The slope of these lines will be the average rate of gain because the slope will correspond to the expected amount of value gained from patches, $g_i(t_{Wi})$, divided by the sum of the average time spent in between-patch activities, $t_B$, and the time spent within patches, $t_{Wi}$. For cases such as figure 2.5 (linear but finite gains), a line, $R^*$, tangent to $g_i$ and passing through the origin gives a slope equal to the optimal average rate of gain, and an optimal within-patch time allocation policy of $t^*$. Policies of staying for shorter periods of time within patches ($t_1$) or longer ($t_2$) yield less than optimal average rates of gain. A forager should stay in such linear gain patches until the patches are exhausted (and no longer than that). When the patch is exhausted, the forager should move on to the next patch.

figure 2.5 For the gain function in figure 2.4, a within-patch time allocation policy of $t^*$ yields an optimal rate of gain (which is the slope of the line $R^*$). Time allocation policies that are less than $t^*$ (e.g., $t_1$) or more than $t^*$ (e.g., $t_2$) will yield suboptimal overall rates.
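The comparison of the three policies can be sketched in a few lines of Python. The gain rate (0.5 units per second), depletion time (20 s), and between-patch time (30 s) are hypothetical values chosen for illustration. Only the shape of the gain function, linear until depletion, comes from the text.

```python
def linear_depleting_gain(t, rate=0.5, depletion_time=20.0):
    """Gain grows linearly at `rate` until the patch is exhausted at `depletion_time`."""
    return rate * min(t, depletion_time)


def average_rate(t_within, t_between=30.0):
    """R = g(t_W) / (t_B + t_W) for a single patch type."""
    return linear_depleting_gain(t_within) / (t_between + t_within)


# Three hypothetical residence-time policies, in seconds.
policies = {
    "t1 (leave early)": 10.0,
    "t* (leave at depletion)": 20.0,
    "t2 (stay too long)": 40.0,
}
for name, t in policies.items():
    print(f"{name}: R = {average_rate(t):.3f}")

# A coarse grid search over whole seconds finds the same optimum.
best = max(range(1, 61), key=average_rate)
print("best residence time on the grid:", best)
```

Under these assumptions, the grid search picks the depletion time, which matches the claim that the forager should stay until the patch is exhausted and no longer.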

## Charnov’s Marginal Value Theorem

Animals often forage in a patch that will have diminishing returns. The example of the hotel Web site in chapter 1 illustrates diminishing returns in the domain of information foraging. As mentioned in chapter 1, Charnov’s (1976) Marginal Value Theorem was developed to deal with the analysis of time allocation for patches that yield diminishing returns curves, such as the ones depicted in figure 2.1. The theorem, presented in detail in the appendix, deals with situations in which foraging within a patch has a decelerating expected net gain function, such as those in figure 2.6a. The theorem implies that a forager should remain in a patch as long as the slope of $g_i$ (i.e., the marginal value of $g_i$) is greater than the average rate of gain $R$ for the environment.

Figure 2.6 shows graphical representations of Charnov’s Marginal Value Theorem that appear in many discussions of optimal foraging theory.⁴ Figure 2.6a captures the basic relations for the situation in which there is just one kind of patch-gain function. The prevalence of patches in the environment (assuming random distribution) can be captured by either the mean between-patch search time, $t_B$, or the rate at which patches are encountered, $\lambda = 1/t_B$. To determine the optimal rate of gain, $R^*$, one draws a line tangent to the gain function $g_i(t_W)$ and passing through $t_B$ to the left of the origin. The slope of the tangent will be the optimal rate of gain, $R^*$. The point of tangency also provides the optimal allocation to within-patch foraging time, $t^*$. The point of tangency is the point at which the slope (marginal value) of $g_i$ is equal to the slope of the tangent line, which is the average rate of gain $R$.

figure 2.6 (a) Charnov’s Marginal Value Theorem states that the rate-maximizing time to spend in a patch, $t^*$, occurs when the slope of the within-patch gain function $g$ is equal to the average rate of gain, which is the slope of the tangent line $R^*$; (b) the average rate of gain increases with decreases in between-patch time costs; and (c) under certain conditions, improvements in the gain function also increase the average rate of gain.

To capture figure 2.6 mathematically, for the case in which there is just one kind of patch, let $R(t_W)$ be the overall rate of gain as a function of the time allocation policy, and let $g'$ indicate the marginal value (the derivative or instantaneous slope) of the gain function $g$. For the case in which there is just one kind of patch, the patch model in equation 2.6 could be stated as

$$R(t_W) = \frac{\lambda \, g(t_W)}{1 + \lambda t_W}. \tag{2.7}$$

Then, Charnov’s Marginal Value Theorem says that the optimal time to spend within each patch is that value $t^*$ that satisfies the equation

$$g'(t^*) = R(t^*). \tag{2.8}$$

The left side of equation 2.8 is the marginal rate of the expected net within-patch gain function, and the right side is the overall rate of gain. As discussed in more detail in the appendix, for an environment in which there are $P$ types of patches, the overall rate of gain depends on the time allocation policy $\hat{t}_{Wi}$ for each type of patch $i$. Charnov’s Marginal Value Theorem says that the optimal set of $\hat{t}_{Wi}$ values satisfies the condition that the marginal rate of gain for each type of patch is equal to the overall rate of gain,

$$g_i'(\hat{t}_{Wi}) = R(\hat{t}_{W1}, \hat{t}_{W2}, \ldots, \hat{t}_{WP}) \quad \text{for } i = 1, 2, \ldots, P. \tag{2.9}$$

It should be noted that this more general form of Charnov’s Marginal Value Theorem, which deals with multiple kinds of patches, is not neatly captured by the simple one-patch model illustrated in figure 2.6 . It is also important to note that the theorem is applied to situations in which the gain function eventually becomes negatively accelerated.
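For the one-patch case, the condition in equation 2.8 can be solved numerically. The sketch below is one possible approach, not the method used in the appendix. It applies bisection to the difference between the marginal gain and the average rate, assuming a gain function with a positive, steadily falling slope. The exponential gain function and its parameters are hypothetical.

```python
import math


def marginal_value_time(gain, slope, t_between, tol=1e-9):
    """Solve slope(t) = gain(t) / (t_between + t) for t by bisection."""
    def excess(t):
        # Positive while the marginal gain still exceeds the average rate of gain.
        return slope(t) * (t_between + t) - gain(t)

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:       # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical diminishing-returns patch: g(t) = 10 * (1 - exp(-0.05 t)), t_B = 30 s.
CEILING, DECAY, T_BETWEEN = 10.0, 0.05, 30.0


def gain(t):
    return CEILING * (1.0 - math.exp(-DECAY * t))


def slope(t):
    return CEILING * DECAY * math.exp(-DECAY * t)


def rate(t):
    """Equation 2.7 with lambda = 1 / t_B."""
    return gain(t) / (T_BETWEEN + t)


t_star = marginal_value_time(gain, slope, T_BETWEEN)
print(f"t* = {t_star:.2f} s, g'(t*) = {slope(t_star):.4f}, R(t*) = {rate(t_star):.4f}")
```

At the returned residence time the marginal gain and the average rate agree, and the rate is lower at residence times one second shorter or longer.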

## Effects of Between-Patch and Within-Patch Enrichment

The conventional patch models of optimal foraging theory deal with an unmoldable environment. The forager must optimize its selection of feasible strategies to fit the constraints of the environment. The information forager, however, can often mold the environment to fit the available strategies. This process is called enrichment.

One kind of environmental enrichment is to reduce the average cost of getting from one information patch to another. That is, the forager can modify the environment so as to minimize the between-patch foraging costs. Office workspaces tend to evolve layouts that seem to minimize the between-patch search cost for needed information. Such enrichment activities create a trade-off problem: Should one invest in reducing between-patch foraging costs, or should one turn to exploiting the patches?

A second kind of environmental enrichment involves making information patches that yield better returns of valuable information. That is, the forager can modify the environment so as to improve within-patch foraging results. For example, one may invest time in constructing and refining keyword queries for a search engine so that it returns lists with higher proportions of potentially relevant document citations. One may also enrich information patches by using filtering processes. For instance, people often filter their readings on a topic by first generating and filtering bibliographic citations and abstracts. Many computer systems for electronic mail, news, and discussion lists now include filters. Such enrichment activities create a trade-off problem: Should one continue to enrich patches to improve future within-patch foraging, or should one turn to exploiting them?

We may use the conventional patch model to reason qualitatively about these enrichment activities. Figure 2.6b illustrates the effects of enrichment activities that reduce between-patch time costs. As between-patch time costs are reduced from $t_{B1}$ to $t_{B2}$, the overall rate of gain increases from the slope of $R_1$ to the slope of $R_2$, and optimal within-patch time decreases from $t_1^*$ to $t_2^*$. Not only does reducing between-patch costs improve the overall average rate of gain, but also the optimal gain is achieved by spending less time within a patch (when the conditions satisfying Charnov’s Marginal Value Theorem hold; see the appendix).

Figure 2.6c illustrates the effects of enrichment activities that improve the returns from a patch. Figure 2.6c shows that as within-patch foraging gains are improved from $g_1$ to $g_2$, the optimal average rates of gain improve from the slope of $R_1$ to $R_2$ and the optimal within-patch time decreases from $t_1^*$ to $t_2^*$. Again, within-patch enrichment not only improves the overall rate of gain but also reduces the optimal amount of time needed to spend within patches (when the conditions satisfying Charnov’s Marginal Value Theorem hold; see the appendix).
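The between-patch effect can also be checked with a short derivation. The sketch below is not taken from the appendix. It assumes a single patch type with a twice-differentiable gain function $g$, with $g' > 0$ and $g'' < 0$ at the optimum, and it uses $\lambda = 1/t_B$. Under those assumptions, equation 2.8 can be written as

$$g'(t^*)\,(t_B + t^*) = g(t^*).$$

Treating $t^*$ as a function of $t_B$ and differentiating both sides with respect to $t_B$ gives

$$g''(t^*)\,(t_B + t^*)\,\frac{dt^*}{dt_B} + g'(t^*)\left(1 + \frac{dt^*}{dt_B}\right) = g'(t^*)\,\frac{dt^*}{dt_B},$$

so that

$$\frac{dt^*}{dt_B} = -\frac{g'(t^*)}{g''(t^*)\,(t_B + t^*)} > 0.$$

Because the optimal rate is $R^* = g'(t^*)$, it also follows that

$$\frac{dR^*}{dt_B} = g''(t^*)\,\frac{dt^*}{dt_B} = -\frac{g'(t^*)}{t_B + t^*} < 0.$$

Under these assumptions, reducing $t_B$ shortens the optimal residence time and raises the optimal rate of gain, as drawn in figure 2.6b.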

## Food Foraging Illustration: Birds and Mealworms

To illustrate concretely the predictions of the conventional patch model, I use data from one of the earliest tests of the model in Cowie (1977). Great tits (Parus major) were studied in a large artificial aviary containing artificial trees. The branches of the artificial trees contained sawdust-filled cups containing hidden mealworms. These cups constituted the patches sought out by the birds. Hiding the mealworms in sawdust in the cups produced a diminishing cumulative food-intake curve as in figure 2.6. Travel time was increased experimentally to effect the between-patch enrichment in figure 2.6b. This was done by placing lids on the sawdust-filled cups containing mealworms. Without the lids, going from one cup to begin feeding in the next cup took about 5 sec on average, and with the lid on the cup the travel time increased to about 20 sec. Figure 2.7 shows that, as predicted, the birds had a policy of leaving patches earlier when the interpatch time was shorter (figure 2.7a) than when it was longer (figure 2.7b).

To effect an enrichment of the cumulative gain curves, as in figure 2.6c, Cowie (1977) manipulated the intercatch time of mealworms within the artificial food patches (the cups). As predicted by the model in figure 2.6c, improvements in feeding rates within patches produced shorter within-patch times. Although empirical studies do find deviations from the conventional patch model, it has generally been successful when studied in a variety of species and environments (Stephens & Krebs, 1986).

## Information Foraging Illustration: Search Engines

Chapter 1 presents an illustration involving the search for the lowest two-star hotel price in Paris on a hotel Web site. Imagine an even more idealized case in which there is an information worker whose job is to take information-seeking tasks from a queue that arrives by some electronic means such as e-mail, perform searches on the Web for those tasks, and return as much relevant information as possible overall. Assume that for any given query the search engines return links to documents, and if one were to actually read each and every document there would be a diminishing returns curve because there is some amount of redundancy among documents and some finite pool of ideas that one is drawing upon. This characterization has been found for medical topics (see figure 1.6) and is likely to be generally true of many domains.

figure 2.7 Increased travel time from (a) $t_B = 4.76$ sec to (b) $t_B = 21.03$ sec increased the observed average patch-leaving time in Great Tits studied in Cowie (1977). The predicted patch-leaving times are indicated by the dashed lines, and are not significantly different from the observed rates.

Imagine that the search engines used by the information worker have very little variation in performance. Assume that the worker is very good at examining the search result links and estimating the expected amount of relevant (and previously unencountered) concepts or propositions in each document. Figure 2.8 presents a hypothetical gain curve for the number of relevant concepts or propositions per document as a function of the order in which the documents are returned by the typical search engine used by this hypothetical information worker. The cumulative gain curve, g1(t), was derived by fitting a function to the data in Bhavnani et al. ( 2003 ) and making the simplifying assumptions that (a) links can be scanned and processed at an average rate of one every 10 seconds (this includes scanning and possibly cutting and pasting the links into a report) and (b) other costs such as scrolling and paging through results can be ignored. The resulting gain function is for the cumulative amount of new information encountered in search results as a function of time, (2:10)

figure 2.8 (a) A reduction in between-patch travel time from 60 s to 30 s reduces the optimal within-patch time allocation from $t_1^* = 38$ s to $t_2^* = 25$ s, and (b) improving the within-patch gain function from $g_1$ to $g_2$ reduces the optimal within-patch time allocation from $t_1^* = 38$ s to $t_3^* = 29$ s.

## Effects of Changes in Travel Time

Figure 2.8 shows the predicted effects on optimal patch residence time of two hypothetical between-patch travel time costs, which would represent the time it takes to acquire a new task, navigate to a search engine, formulate and enter a query, and wait for search engine results. In one case the travel time is assumed to be $t_B = 60$ sec, and in the second case it is assumed to be $t_B = 30$ sec. For the case in which travel time is $t_B = 60$, the overall rate of gain $R$ follows from equation 2.7 with $\lambda = 1/60$:

$$R(t_W) = \frac{g_1(t_W)}{60 + t_W}. \tag{2.11}$$

In order to determine the optimal amount of time to spend within information patches, $t_W = t^*$, we need to determine when the slope of $g_1(t_W)$ is equal to $R$. We can do this by finding the derivative (2.12)

and then solving the equality (2.13)

Solving for the case in which the travel time is t B (2.14)

A reduction in travel time of 30 seconds would cause an optimal forager to reduce time in each information patch by nearly 12 seconds. This is shown in figure 2.8 .

## Effects of Improved Processing of Links

Figure 2.8 shows the predicted effects of an improvement in the rate of processing links from 10 seconds per link to 5 seconds per link. The rate of gain in this case would be (2.15)

The derivative of the rate of gain remains the same as in equation 2.12 . Following the same steps as above, one finds that improving the time to process links would result in the forager reducing the time spent with each patch down to (2.16)

This reduction in optimal within-patch time allocation is illustrated in figure 2.8 .
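The effect of faster link processing can be sketched in the same way. As an assumption made only for illustration, halving the time per link is modeled as the same hypothetical gain curve traversed twice as fast, that is, g(t) = A(1 − e^(−ct)) with `c` doubled; the curve and its parameters are not the fitted functions behind figure 2.8. Under that assumption the optimal residence time falls while the overall rate of gain rises.

```python
import math

def make_gain(A, c):
    """Return (g, g') for a hypothetical gain curve g(t) = A * (1 - exp(-c t))."""
    def g(t):
        return A * (1.0 - math.exp(-c * t))

    def dg(t):
        return A * c * math.exp(-c * t)

    return g, dg

def optimal_time(g, dg, t_between, lo=1e-9, hi=1e4, tol=1e-9):
    """Bisection for the t at which the slope g'(t) equals g(t) / (t_between + t)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dg(mid) * (t_between + mid) - g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

T_BETWEEN = 60.0                      # travel time in seconds, as in the text
g_slow, dg_slow = make_gain(20.0, 0.03)   # assumed curve at 10 s per link
g_fast, dg_fast = make_gain(20.0, 0.06)   # same curve traversed twice as fast

t_slow = optimal_time(g_slow, dg_slow, T_BETWEEN)
t_fast = optimal_time(g_fast, dg_fast, T_BETWEEN)
rate_slow = g_slow(t_slow) / (T_BETWEEN + t_slow)
rate_fast = g_fast(t_fast) / (T_BETWEEN + t_fast)

print(f"slow processing: t* = {t_slow:.1f} s, R = {rate_slow:.3f}")
print(f"fast processing: t* = {t_fast:.1f} s, R = {rate_fast:.3f}")
```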

## Summary

The purpose of this illustration is to show the calculations involved in using the conventional patch model to make quantitative predictions, as well as to provide a more concrete understanding of the qualitative relationships that it captures. Many simplifying assumptions are made, but later chapters fill in some of the details. Chapters 3–5 present more detailed rational analyses and production system models of Web use. It should be noted, however, that there is some evidence that changes in travel time on the Web have an effect on patch residence time (Baldi, Frasconi, & Smyth, 2003), as qualitatively predicted by the conventional patch model. In chapter 9, I discuss Web usability advice that centers on this relationship.

# Diet Model

Imagine a hypothetical situation in which a bird of prey, such as a red-tailed hawk (Buteo jamaicensis), forages in a habitat that contains a variety of prey of various sizes, prevalences, and ease of capture, such as mice, ground squirrels, rabbits, and hares. Typically, such a hawk may soar for hours on end, or perch in a high tree, waiting to detect potential prey. The environment poses the following problem for the predator: What kinds of prey should the predator pursue, and what kinds should be ignored? One may think of this in terms of diet breadth: A broad (generalized) diet will include every type of prey encountered, but a narrow (specialized) diet will include only a few types. If a predator is too specialized, it will spend all of its time searching. If the predator is too generalized, then it will pursue too much unprofitable prey.

The conventional diet model (Stephens & Krebs, 1986) addresses these trade-offs. The diet model assumes (strongly) that

• prey are encountered at a constant rate as a function of search time;

• search and handling (which includes pursuit) are mutually exclusive processes;

• the forager has perfect knowledge about the prey and the environment with respect to prevalence, energetic value, and search and handling costs; and

• information about prey is assessed perfectly and used in a decision instantaneously when prey are encountered.

The details and derivation of the conventional diet model are presented in the appendix. The model assumes that prey can be classified by the forager into i = 1, 2, …, n types and that the forager knows information concerning the profitability and prevalence of each kind of prey. The average time between finding prey of type i is $t_{Bi}$. Encounters with prey of type i are assumed to follow a random (Poisson) process. So prey will be encountered at a rate $\lambda_i = 1/t_{Bi}$.

Each kind of prey, i, is characterized by the average amount of energy, $g_i$, that could be gained by pursuing, capturing, and consuming the prey. The average time cost, $t_{Wi}$, of pursuit, capture, and eating is usually referred to as the handling cost associated with the prey type. The profitabilities of each type of prey, $\pi_i$, are defined as the energetic value of the prey type divided by the handling time cost of pursuit, capture, and consumption of the prey, $\pi_i = g_i / t_{Wi}$ (2.17)

The diet of a forager can be characterized as the set of available prey types that the organism chooses to pursue when encountered. Let D be a set representing the diet of a forager; for example, D = {1, 2, 3} represents a diet consisting of prey types 1, 2, and 3. The average rate of gain, R, yielded by such a diet would be given by another variation on Holling’s Disk Equation (equation 2.5), $R = \frac{\sum_{i \in D} \lambda_i g_i}{1 + \sum_{i \in D} \lambda_i t_{Wi}}$ (2.18)

## Optimal Diet Selection Algorithm

If we assume that the time costs needed to recognize prey are effectively zero, then an optimal diet can be constructed by choosing prey types in an all-or-none manner according to their profitabilities (this is known as the zero-one rule; see the appendix). In general (Stephens & Krebs, 1986 ), the following algorithm can be used to determine the rate-maximizing subset of the n prey types that should be selected:

• Rank the prey types by their profitability, $\pi_i = g_i / t_{Wi}$. To simplify our presentation, we let the index i be ordered such that π1 > π2 > … > πn.

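• Add prey types to the diet in order of increasing rank (i.e., decreasing profitability) until the rate of gain for a diet of the top k prey types is greater than the profitability of the k + 1st prey type, $R(k) = \frac{\sum_{i=1}^{k} \lambda_i g_i}{1 + \sum_{i=1}^{k} \lambda_i t_{Wi}} > \frac{g_{k+1}}{t_{W,k+1}} = \pi_{k+1}$ (2.19)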

• The left side of the inequality in equation 2.19 concerns the rate of gain obtained by the diet of the k highest profitability prey types, computed according to equation 2.18 . The right side of the inequality concerns the profitability of the k + 1st prey type.

Conceptually, one may imagine an iterative process that considers successive diets of the prey types. Initially, the diet, D, contains just the most profitable type, D = {1}; the next diet considered contains the two most profitable types, D = {1, 2}; and so on. At each stage, the process tests the rate of gain R(k) for the current diet containing D = {1, 2, …, k} types against the profitability of the next type, $\pi_{k+1}$. As long as the gain of the diet does not exceed the profitability of the next prey type, $R(k) \le \pi_{k+1}$, the process should go on to consider the next diet D = {1, 2, …, k + 1}. Otherwise, the iterative process terminates, and one has obtained the optimal diet. Adding the next prey type would decrease the rate of gain for the diet.
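The iterative process just described can be sketched in a few lines. This is a minimal illustration, assuming the rate of gain for a diet is the sum of λ·g over included types divided by one plus the sum of λ·t_W over included types; the function name `optimal_diet` and the three prey types are hypothetical and chosen only to exercise the stopping rule.

```python
def optimal_diet(prey):
    """Select the rate-maximizing diet by the ranking procedure in the text.

    prey: list of (name, encounter_rate, gain, handling_time) tuples.
    Returns (names_in_diet, rate_of_gain).
    """
    # Rank by profitability g / t_W, most profitable first.
    ranked = sorted(prey, key=lambda p: p[2] / p[3], reverse=True)
    diet = []
    sum_gain = 0.0   # running sum of lambda_i * g_i
    sum_time = 0.0   # running sum of lambda_i * t_Wi
    rate = 0.0
    for name, lam, g, t_w in ranked:
        profitability = g / t_w
        # Stop once the current diet already earns more than the next type offers.
        if diet and rate > profitability:
            break
        diet.append(name)
        sum_gain += lam * g
        sum_time += lam * t_w
        rate = sum_gain / (1.0 + sum_time)
    return diet, rate

# Hypothetical prey types: (name, encounters per second, gain, handling seconds)
example_prey = [
    ("C", 0.10, 5.0, 10.0),    # profitability 0.5
    ("A", 0.01, 50.0, 5.0),    # profitability 10
    ("B", 0.02, 40.0, 10.0),   # profitability 4
]
diet, rate = optimal_diet(example_prey)
print(diet, round(rate, 2))   # ['A', 'B'] 1.04
```

Type C is encountered far more often than A or B, yet it is excluded because its profitability (0.5) is below the rate already achieved by the diet {A, B}.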

To illustrate this graphically, consider figure 2.9, which presents a set of hypothetical prey types having an exponential distribution of profitabilities indicated by $\pi_k$. Assume that these prey types are all encountered at an equal rate of $\lambda_k$ = 1. Figure 2.9 also presents R(k) calculated according to equation 2.18, for diets including prey types up to and including each type k. One can see that R(k) increases at first as the diet is expanded up to an optimum diet containing the top four prey types and then decreases as additional items are included in the diet. The optimum, $R^*$, occurs just prior to the point where R(k) crosses $\pi_k$. Increasing the profitability of higher ranked items tends to change the threshold, yielding fewer types of items in the diet. A similar diet-narrowing effect is obtained by increasing the prevalence (λ) of higher ranked prey.

figure 2.9 A hypothetical example of the relationship between profitability (πk) and rate of gain [R(k)] for diets including prey types 1, 2, … k. In this illustration, it is assumed that when the prey types are ranked according to profitability, the profitabilities, πk, decrease exponentially. The rate of gain, R(k), increases to an optimum R* as the diet is expanded to include the four highest profitability prey types and decreases if lower ranked types are included.
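The shape described for figure 2.9 can be reproduced with a small calculation. The parameters behind the figure are not given in the text, so everything below except the equal encounter rate of 1 is an assumption: profitabilities decay as 10·e^(−0.1(k−1)), and every type has a handling time of 1. These values were picked so that the optimum happens to fall at four types; they are not claimed to be the author's.

```python
import math

n_types = 10
encounter_rate = 1.0   # lambda_k = 1 for every type, as stated in the text
handling_time = 1.0    # assumed equal for every type (hypothetical)

# Exponentially decreasing profitabilities pi_1 > pi_2 > ... (hypothetical scale and decay)
profitability = [10.0 * math.exp(-0.1 * k) for k in range(n_types)]

rates = []             # rates[k-1] holds R(k) for the diet {1, ..., k}
sum_gain = 0.0
sum_time = 0.0
for pi_k in profitability:
    gain_k = pi_k * handling_time          # since pi_k = g_k / t_Wk
    sum_gain += encounter_rate * gain_k
    sum_time += encounter_rate * handling_time
    rates.append(sum_gain / (1.0 + sum_time))

best_k = max(range(n_types), key=lambda k: rates[k]) + 1
for k, (pi_k, r_k) in enumerate(zip(profitability, rates), start=1):
    marker = "  <- optimum" if k == best_k else ""
    print(f"k={k:2d}  pi_k={pi_k:5.2f}  R(k)={r_k:5.2f}{marker}")
```

R(k) rises while the next type's profitability exceeds the current rate and falls once it does not, which is the crossing the figure illustrates.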

## Principles of Diet Selection

The diet selection algorithm suggests the following:

Principle of Lost Opportunity. Intuitively, the information diet model states that a class of items is predicted to be ignored if the profitability, πi, for those items is less than the expected rate of gain, R, of continuing search for other types of items. This is because the gain obtained by processing items of that low-profitability prey type is less than the lost opportunity to get higher profitability types of items.

Independence of Inclusion from Encounter Rate. An implication of the diet selection algorithm (Stephens & Krebs, 1986 ) is that the decision to pursue a class of prey is independent of its prevalence. The decision to include lower ranked prey in a diet is solely dependent on their profitability and not on the rate at which they are encountered, λi. However, the inclusion of a class of prey is sensitive to changes in the prevalence of more profitable classes of prey. This can be seen by examination of equation 2.19 , where λi appears on the left side of the inequality but not the right side. Generally, increases in the prevalence of higher profitability prey (or equivalently increases in their encounter rates) make it optimal to be more selective.

Conventional models of optimal foraging theory—the patch model and the diet model—have generally proven to be productive and resilient in addressing food-foraging behaviors studied in the field and the lab (Stephens, 1990 ). However, these models do not take into account mechanisms that organisms actually use to achieve adaptive foraging strategies. The conventional models also make the strong assumption that the forager has perfect “global” information concerning the environment. Moreover, the models are static rather than dynamic (dependent on changing state or time).

## Food Foraging Illustrations

Returning to our hypothetical hawk, imagine that the hawk lives in an environment that hosts two kinds of rabbits (see table 2.1): (1) big rabbits that are scarce, rich in calories, and take a half hour to chase and consume and (2) small rabbits that are plentiful but low in calories, although quickly chased and consumed (see note 5). Should the hawk pursue just the big rabbits, or should the hawk include both kinds of rabbits in its diet?

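From table 2.1, we can calculate the rate of return for the narrow diet that includes only big rabbits: $R_{Big} = \frac{\lambda_{Big}\, g_{Big}}{1 + \lambda_{Big}\, t_{W,Big}} = \frac{(1/3600)(10{,}000)}{1 + (1/3600)(1800)} \approx 1.85$ kCal/sec (2.20)

A broad diet that includes both kinds of rabbits turns out to have a lower rate of return: $R_{Both} = \frac{(1/3600)(10{,}000) + (100/3600)(100)}{1 + (1/3600)(1800) + (100/3600)(120)} \approx 1.15$ kCal/sec (2.21)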

table 2.1 Hypothetical parameters for a hawk faced with a diet choice problem.

| Rabbit type | λ (per sec) | g (kCal) | $t_W$ (sec) | π = g/$t_W$ (kCal/sec) |
|---|---|---|---|---|
| Big | 1/3600 | 10,000 | 1800 | 5.56 |
| Small | 100/3600 | 100 | 120 | 0.83 |

The hawk should spend its time foraging just for big rabbits because they are so profitable (as indicated by g/t W). Pursuing the small rabbits would incur an opportunity cost that is greater than the gains provided by the small rabbits.
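The two rates of return can be checked directly from the values in table 2.1. This is a minimal sketch, assuming the rate of gain for a diet is the sum of λ·g divided by one plus the sum of λ·t_W over the included prey types; the helper name `rate_of_gain` is illustrative.

```python
def rate_of_gain(diet):
    """Rate of gain for a diet given as a list of (encounter_rate, gain, handling_time)."""
    total_gain = sum(lam * g for lam, g, _ in diet)
    total_time = 1.0 + sum(lam * t_w for lam, _, t_w in diet)
    return total_gain / total_time

# Values from table 2.1: (encounters per second, kCal, handling seconds)
big = (1 / 3600, 10_000, 1800)
small = (100 / 3600, 100, 120)

narrow = rate_of_gain([big])          # big rabbits only
broad = rate_of_gain([big, small])    # both kinds of rabbits

print(f"narrow diet: {narrow:.2f} kCal/sec")   # 1.85
print(f"broad diet:  {broad:.2f} kCal/sec")    # 1.15
```

The small rabbits' profitability (0.83 kCal/sec) is below the 1.85 kCal/sec earned by the narrow diet, which is why adding them lowers the overall rate.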

As an exercise, substitute various values for λSmall (the rate of encounter with small rabbits) in equation 2.21 , ranging from very low (e.g., λSmall = 1/3600 sec) to very high (e.g., λSmall = 1/sec). This will illustrate the principle that inclusion in the diet is independent of the encounter rate. Below I describe as an analogy the junk mail, received by virtually everyone, that has no value whatsoever (i.e., there is always something better to do than read junk mail). No matter how much the rate of delivery of junk increases, it would remain unprofitable to read a single piece of it.
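The exercise can be run as a short loop. The sketch below uses the table 2.1 values and the same assumed rate formula (sum of λ·g over one plus the sum of λ·t_W), sweeping the small-rabbit encounter rate from one per hour to one per second; the list of rates to try is arbitrary.

```python
def rate_of_gain(diet):
    """Rate of gain for a diet given as a list of (encounter_rate, gain, handling_time)."""
    total_gain = sum(lam * g for lam, g, _ in diet)
    total_time = 1.0 + sum(lam * t_w for lam, _, t_w in diet)
    return total_gain / total_time

big = (1 / 3600, 10_000, 1800)        # from table 2.1
narrow = rate_of_gain([big])          # big rabbits only

broad_rates = {}
for per_hour in (1, 10, 100, 1000, 3600):   # small-rabbit encounters per hour
    small = (per_hour / 3600, 100, 120)
    broad_rates[per_hour] = rate_of_gain([big, small])
    print(f"lambda_small = {per_hour:4d}/hour: "
          f"broad = {broad_rates[per_hour]:.2f}, narrow = {narrow:.2f} kCal/sec")
```

At every encounter rate tried, the broad diet stays below the narrow diet, so the decision to exclude small rabbits does not change.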

Figure 2.10 presents data that are generally consistent with the predictions of the diet model. Figure 2.10 shows the diet of shore crabs when offered the choice of mussels of different sizes when each type was equally prevalent (Elner & Hughes, 1978 ). The crabs choose the most profitably sized mussels. Human hunter-gatherers have complex diets that appear to conform to the conventional diet model. For instance, the men of the Aché in Paraguay choose food types that are above the average rate of return for the environment (Kaplan & Hill, 1992 ).

## Information Foraging Examples

The diet model developed in optimal foraging theory is the basis for aspects of information foraging models developed in chapter 6, where it will be used to predict how people select subcollections of documents based on the expected profitability of the subcollections in terms of the rate of extracting relevant documents per unit cost of interaction time. The general analogy is that one may think of an information forager as an information predator whose aim is to select information prey so as to maximize the rate of gain of information relevant to their task. These information prey might be relevant documents or document collections. Different sources will differ in their access costs or prevalences, and they will differ in profitability. The profitability of an information source may be defined as the value of information gained per unit cost of processing the source. For instance, physical and electronic mail may come from a variety of sources that have different arrival rates and profitabilities. Clearly, low-profitability junk mail should be ignored if it would cost the reader the opportunity of processing more profitable mail. We might also expect the diet of an information forager to broaden or narrow depending on the prevalences and profitabilities of information sources.

figure 2.10 Shore crabs tend to choose the mussel that has the highest profitability: (a) the profitability curve for mussels as a function of their size is mirrored by (b) the histogram of the size of mussel consumed by shore crabs when presented at equal prevalence in the environment (Elner & Hughes, 1978 ).

The general principles of opportunity cost and independence of inclusion from encounter rate can be illustrated by a hypothetical information foraging example that may resonate with many people who use e-mail. Suppose we observe a woman who runs a small business that she conducts using e-mail. Assume that each e-mail from a prospective customer is an order for the businesswoman’s product (let us assume that all other aspects of customer service are handled by others) and that she makes $g_o$ = \$10 profit on each order. The businesswoman also receives unsolicited e-mail (junk mail or spam) that occasionally offers some service or product savings of relevance to the woman. Suppose that, on average, 1 in 100 spam e-mails offers something that saves the woman \$10 ($g_s$ = \$10/100 = \$0.10), and she receives one spam e-mail a minute (her encounter rate with spam is $\lambda_s$ = 1/minute). Suppose that when she first started her business, the businesswoman received two orders during an 8-hour day (her encounter rate was $\lambda_o$ = 1/240 orders per minute), but now it has improved to one order per hour ($\lambda_o$ = 1/60 orders per minute). Assume that it takes one minute to read and process an e-mail ($h_o = h_s$ = 1 minute). The analysis in table 2.2 suggests that when the order rate is low ($\lambda_o$ = 1/240), the woman should read both orders and spam, but when the rate of the more profitable order e-mails increases (to $\lambda_o$ = 1/60), her information diet should narrow to processing just the orders. With the order rate high at $\lambda_o$ = 1/60, one should ignore spam regardless of its prevalence (the value of $\lambda_s$). In general, as the prevalence of profitable information increases, one should expect a narrowing of the information diet. An optimal forager who has decided to ignore spam should continue to do so regardless of increases in its volume.

table 2.2 Hypothetical rates of return on the e-mail diets of a hypothetical information worker at two different rates of encounter of orders in the e-mail.

| Order encounter rate | Orders only (\$/min) | Orders + spam (\$/min) |
|---|---|---|
| Low (λo = 1/240) | 0.041 | 0.071 |
| High (λo = 1/60) | 0.164 | 0.132 |

When orders are low, the worker should process both orders and spam. When the orders are high, the worker should ignore spam.
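The entries in table 2.2 follow from the parameters stated in the e-mail example. The sketch below recomputes them, assuming the rate of gain for a diet is the sum of λ·g divided by one plus the sum of λ·h over the included e-mail types; the variable and function names are illustrative.

```python
def rate_of_gain(diet):
    """Rate of gain ($/min) for a diet given as (encounter_rate, gain, handling_time) tuples."""
    total_gain = sum(lam * g for lam, g, _ in diet)
    total_time = 1.0 + sum(lam * h for lam, _, h in diet)
    return total_gain / total_time

spam = (1.0, 0.10, 1.0)               # one spam per minute, $0.10 expected value, 1 min to read

table = {}
for label, order_rate in (("low", 1 / 240), ("high", 1 / 60)):
    orders = (order_rate, 10.0, 1.0)  # $10 per order, 1 min to process
    table[label] = (rate_of_gain([orders]), rate_of_gain([orders, spam]))
    print(f"{label:4s}: orders only = {table[label][0]:.3f}, "
          f"orders + spam = {table[label][1]:.3f}")
```

Spam has a profitability of \$0.10 per minute. That exceeds the \$0.041 per minute earned from orders alone at the low order rate, so spam is worth reading; it falls short of the \$0.164 per minute earned at the high order rate, so spam should be ignored however much of it arrives.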

# Discussion

Optimal foraging theory has been applied with considerable success in the fields of behavioral ecology (Stephens, 1990; Stephens & Krebs, 1986) and cultural anthropology (Winterhalder & Smith, 1992a). Historically, the first proposal of an optimal foraging model appeared in MacArthur and Pianka’s (1966) model of how the diet of species might change in reaction to invasion by competitor species, which made explicit predictions about how diets would change depending on prey availability. By the 1980s, optimal foraging theory had been used to bring orderly predictions to the study of behavior in hunter-gatherer societies (Smith, 1981, 1987). As noted in chapter 1, optimal foraging theory has arisen from the use of methodological adaptationism. This paradigm, including optimal foraging theory, came under a flurry of attacks precipitated by a paper by Gould and Lewontin (1979), which caused the field to become more rigorous in its methodology and more careful about its philosophy (Mayr, 1983, 1988; see note 6). While the behavior of real animals and real people often departs from that of the optimal forager, the theory has been very productive in generating useful predictions. Departures from optimality often reveal hidden constraints or other important aspects of the decision problem and environment facing the forager. Once these are revealed, they can feed back into the rational analysis of the forager.

The development of the information foraging models that are presented in the chapters that follow often emerged from considering an elementary optimal foraging theory model and adding detail where necessary. There are certainly differences between food and information, the most notable being that information can be copied, and the same content viewed twice often is not informative the second time around. But it is the nature of metaphors and analogies that they are productive, but not completely equivalent.

APPENDIX: PATCH RESIDENCE TIME AND DIET MODELS

# Patch Residence Time Model

For the patch model, Holling’s Disk Equation (equation 2.5) is instantiated (Stephens & Krebs, 1986) as equation 2.6. Assume that patches of type i are encountered with a rate $\lambda_i$ as a linear function of the total between-patch foraging time, $T_B$. Now imagine that the forager can decide to set a policy for how much time, $t_{Wi}$, to spend within each type of patch. The total gain could be represented as $G = \sum_{i=1}^{P} \lambda_i T_B\, g_i(t_{Wi})$ (2.A.1)

table 2.A.1 Notation used in conventional information foraging models.

| Notation | Definition |
|---|---|
| R | Rate of gain of information value per unit time cost |
| G | Total information value gained |
| $T_B$ | Total time spent in between-patch foraging |
| $T_W$ | Total time spent in within-patch foraging |
| g | Average information value gained per item |
| $g_i$ | Average information value gained per item of type i |
| $g(t_W)$ | Cumulative value gained in information patches as a function of time $t_W$ |
| $g_i(t_{Wi})$ | Cumulative value gained in information patches of type i as a function of time $t_{Wi}$ |
| $t_B$ | Average time cost for between-patch foraging |
| $t_W$ | Average time cost for within-patch foraging |
| λ | Average rate of encountering information patches |
| $t_{Bi}$ | Time spent between patches of type i |
| $t_{Wi}$ | Time spent foraging within patches of type i |
| $\lambda_i$ | Average rate of encountering information patches of type i |
| $\pi_i$ | Profitability of item type i |
| $p_i$ | Probability of pursuing items of type i (diet decision model) |

Likewise, the total amount of time spent within patches could be represented as $T_W = \sum_{i=1}^{P} \lambda_i T_B\, t_{Wi}$ (2.A.2)

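The overall average rate of gain will be $R = \frac{G}{T_B + T_W} = \frac{\sum_{i=1}^{P} \lambda_i\, g_i(t_{Wi})}{1 + \sum_{i=1}^{P} \lambda_i\, t_{Wi}}$ (2.A.3)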

Equation 2.A.3 is presented as equation 2.6 in the text as the conventional patch model.

The task is to determine the optimal vector of collection residence times ($t_{W1}$, $t_{W2}$, …, $t_{WP}$) for a set of patches, ρ = {1, 2, …, i, … P}, that maximizes the rate of gain R. To differentiate R in equation 2.A.3 with respect to an arbitrary $t_{Wi}$, we first get $R = \frac{\lambda_i\, g_i(t_{Wi}) + k_i}{\lambda_i\, t_{Wi} + c_i}$ (2.A.4)

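where $k_i$ is the sum of all terms in the numerator of equation 2.A.3 not involving $t_{Wi}$, $k_i = \sum_{j \ne i} \lambda_j\, g_j(t_{Wj})$,

and $c_i$ is the sum of all terms in the denominator of equation 2.A.3 not involving $t_{Wi}$, $c_i = 1 + \sum_{j \ne i} \lambda_j\, t_{Wj}$.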

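So, for a given $t_{Wi}$, we get $\frac{\partial R}{\partial t_{Wi}} = \frac{\lambda_i\, g_i'(t_{Wi})\,[\lambda_i t_{Wi} + c_i] - \lambda_i\,[\lambda_i\, g_i(t_{Wi}) + k_i]}{[\lambda_i t_{Wi} + c_i]^2}$ (2.A.5)

R is maximized when ∂R/∂$t_{Wi}$ = 0 (Charnov, 1976), and so $g_i'(t_{Wi})\,[\lambda_i t_{Wi} + c_i] - [\lambda_i\, g_i(t_{Wi}) + k_i] = 0$ (2.A.6)

which becomes $g_i'(t_{Wi}) = \frac{\lambda_i\, g_i(t_{Wi}) + k_i}{\lambda_i\, t_{Wi} + c_i}$ (2.A.7)

so the right-hand side of equation 2.A.4 (average rate of gain) is the same as the right-hand side of equation 2.A.7 (instantaneous rate of gain when the average rate of gain is maximized), $g_i'(t_{Wi}) = R$ (2.A.8)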

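If we replace R with a function $R(t_{W1}, t_{W2}, …, t_{Wi}, …, t_{WP})$, the full vector of rate-maximizing $t_{Wi}$ values, ($\hat{t}_{W1}$, $\hat{t}_{W2}$, …, $\hat{t}_{WP}$), must fulfill the condition specified by $g_i'(\hat{t}_{Wi}) = R(\hat{t}_{W1}, \hat{t}_{W2}, …, \hat{t}_{WP})$ for every i = 1, 2, …, P (2.A.9)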

This is the formal condition (Charnov, 1976 ) of Charnov’s Marginal Value Theorem: Long-term rate of gain is maximized by choosing patch residence times so that the marginal value (instantaneous rate) of the gain at the time of leaving each patch equals the long-term average rate across all patches.
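The theorem can be checked numerically for a small multi-patch environment. The sketch below assumes two hypothetical patch types with gain curves $g_i(t) = A_i(1 − e^{−c_i t})$ and a rate of gain equal to the sum of λ·g(t) divided by one plus the sum of λ·t; none of the parameter values come from the text. It searches for the rate R at which the policy "leave each patch when its marginal gain drops to R" actually delivers R, then confirms that nudging the residence times in either direction lowers the rate.

```python
import math

# Hypothetical patch types: (encounter rate lambda_i, asymptote A_i, decay c_i),
# with cumulative gain g_i(t) = A_i * (1 - exp(-c_i * t)).
patches = [(0.02, 30.0, 0.05), (0.05, 10.0, 0.10)]

def residence_times(rate):
    """Time in each patch at which its marginal gain g_i'(t) has fallen to `rate`."""
    times = []
    for _, A, c in patches:
        initial_slope = A * c
        # A patch whose best marginal gain is below `rate` is not worth entering.
        times.append(math.log(initial_slope / rate) / c if initial_slope > rate else 0.0)
    return times

def achieved_rate(times):
    """Overall rate of gain for a given vector of residence times."""
    gain = sum(lam * A * (1.0 - math.exp(-c * t)) for (lam, A, c), t in zip(patches, times))
    time = 1.0 + sum(lam * t for (lam, _, _), t in zip(patches, times))
    return gain / time

# Bisection on R: the leave-at-R policy earns more than R when R is set too low
# and less than R when it is set too high.
lo, hi = 1e-9, max(A * c for _, A, c in patches)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if achieved_rate(residence_times(mid)) > mid:
        lo = mid
    else:
        hi = mid

R_star = 0.5 * (lo + hi)
t_hat = residence_times(R_star)
marginal = [A * c * math.exp(-c * t) for (_, A, c), t in zip(patches, t_hat)]

print("R* =", round(R_star, 4))
print("residence times:", [round(t, 1) for t in t_hat])
print("marginal gains at departure:", [round(m, 4) for m in marginal])
```

With this closed-form gain curve the leaving time for a given R can be written directly; for an arbitrary concave gain curve that step would itself need a numerical solve.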

# Diet Model

Following Stephens and Krebs (1986), we assume that the items encountered can be classified into n types. The average rate of gain R can be represented as $R = \frac{\sum_{i=1}^{n} p_i \lambda_i g_i}{1 + \sum_{i=1}^{n} p_i \lambda_i t_{Wi}}$ (2.A.10)

where, for each item type i, $\lambda_i$ is the encounter rate while searching, $t_{Wi}$ is the expected processing time for each item type, $g_i$ is the expected net currency gain, and $p_i$ is the probability that items of type i should be pursued (the decision variable to be set by the optimization analysis). In the case of food foraging, equation 2.A.10 might be applied under the assumption that the modeled organism partitions the space of the observed feature combinations exhibited by its potential prey into discrete categories, i = 1, 2, … n. One may also think of equation 2.A.10 as being applicable when an organism can predict (recognize) the net gain, processing time, and encounter rate for an encountered prey. To maximize with respect to any given $p_i$, we differentiate $R = \frac{p_i \lambda_i g_i + k_i}{p_i \lambda_i t_{Wi} + c_i}$ (2.A.11)

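where $k_i$ is the sum of all terms not involving $p_i$ in the numerator of equation 2.A.10 and $c_i$ is the sum of all terms in the denominator not involving $p_i$, and we assume that the gain, processing time, and encounter rate variables are not dependent on $p_i$. Differentiating equation 2.A.11 yields $\frac{\partial R}{\partial p_i} = \frac{\lambda_i g_i\,[p_i \lambda_i t_{Wi} + c_i] - \lambda_i t_{Wi}\,[p_i \lambda_i g_i + k_i]}{[p_i \lambda_i t_{Wi} + c_i]^2} = \frac{\lambda_i g_i c_i - \lambda_i t_{Wi} k_i}{[p_i \lambda_i t_{Wi} + c_i]^2}$ (2.A.12)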

## Zero-One Rule

Inspection of equation 2.A.12 shows that R is maximized by either $p_i$ = 1 or $p_i$ = 0 (Stephens & Krebs, 1986). Note that this occurs under the constraint that the time it takes to recognize an item is assumed to be zero. This is known as the Zero-One Rule, which simply states that the optimal diet will be one in which items of a given profitability level are chosen in an all-or-none fashion, where profitability, $\pi_i$, is defined as $\pi_i = g_i / t_{Wi}$ (2.A.13)

The decision to set pi = 1 or pi = 0 is reduced to the following rules, which determine the numerator of equation 2.A.12 :

Set pi = 0 if gi/t Wi < k i/c i (the profitability for i is less than that for everything else).

Set pi = 1 if gi/t Wi > k i/c i (the profitability for i is greater than that for everything else).

For the n item types, there are n such inequalities. This provides the basis for the diet optimization algorithm presented in the main text.
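The all-or-none character of the rule can be seen by evaluating the rate of gain at intermediate pursuit probabilities. The sketch below reuses the hawk parameters from table 2.1 and assumes the rate of gain has the form of a sum of p·λ·g divided by one plus the sum of p·λ·t_W; the helper name and the grid of probabilities are illustrative. Because the rate moves in one direction only as a single $p_i$ varies, its maximum must lie at an endpoint.

```python
def rate_of_gain(p, prey):
    """Diet-model rate of gain with pursuit probabilities.

    p: list of pursuit probabilities p_i, one per prey type.
    prey: list of (encounter_rate, gain, handling_time), in the same order.
    """
    num = sum(p_i * lam * g for p_i, (lam, g, _) in zip(p, prey))
    den = 1.0 + sum(p_i * lam * t_w for p_i, (lam, _, t_w) in zip(p, prey))
    return num / den

# Table 2.1 values: big rabbits first, then small rabbits.
prey = [(1 / 3600, 10_000, 1800), (100 / 3600, 100, 120)]
steps = [0.0, 0.25, 0.5, 0.75, 1.0]

# Vary the probability of pursuing big rabbits while ignoring small ones.
rates_big = [rate_of_gain([p, 0.0], prey) for p in steps]
# Vary the probability of pursuing small rabbits while always taking big ones.
rates_small = [rate_of_gain([1.0, p], prey) for p in steps]

for p, r_big, r_small in zip(steps, rates_big, rates_small):
    print(f"p = {p:.2f}: R(p_big = p) = {r_big:.3f}, R(p_small = p) = {r_small:.3f}")
```

The rate climbs steadily with the big-rabbit probability and falls steadily with the small-rabbit probability, so the best settings are 1 and 0, respectively.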

# Notes

## References

Baldi, P., Frasconi, P., & Smyth, P. (2003). Modeling the Internet and the Web. Chichester, UK: Wiley and Sons.

Bell, W. J. (1991). Searching behavior: The behavioral ecology of finding resources. London: Chapman & Hall.

Bhavnani, S. K., Jacob, R. T., Nardine, J., & Peck, F. A. (2003). Exploring the distribution of online healthcare information. Paper presented at the CHI 2003 Conference on Human Factors in Computing Systems, Fort Lauderdale, FL.

Cashdan, E. (1992). Spatial organization and habitat use. In E. A. Smith & B. Winterhalder (Eds.), Evolutionary ecology and human behavior (pp. 237–266). New York: de Gruyter.

Charnov, E. L. (1976). Optimal foraging: The marginal value theorem. Theoretical Population Biology, 9, 129–136.

Cowie, R. J. (1977). Optimal foraging in great tits (Parus major). Nature, 268, 137–139.

Elner, R. W., & Hughes, R. N. (1978). Energy maximization in the diet of the shore crab, Carcinus maenas (L.). Journal of Animal Ecology, 47, 103–116.

Gould, S. J., & Lewontin, R. C. (1979). The spandrels of San Marco and the Panglossian paradigm: A critique of the adaptationist programme. Proceedings of the Royal Society of London (B), 205, 581–598.

Hames, R. (1992). Time allocation. In E. A. Smith & B. Winterhalder (Eds.), Evolutionary ecology and human behavior (pp. 203–235). New York: de Gruyter.

Holling, C. S. (1959). Some characteristics of simple types of predation and parasitism. Canadian Entomologist, 91, 385–398.

Kaplan, H., & Hill, K. (1992). The evolutionary ecology of food acquisition. In E. A. Smith & B. Winterhalder (Eds.), Evolutionary ecology and human behavior (pp. 167–201). New York: de Gruyter.

MacArthur, R. H., & Pianka, E. R. (1966). On the optimal use of a patchy environment. American Naturalist, 100, 603–609.

Mangel, M., & Clark, C. W. (1988). Dynamic modeling in behavioral ecology. Princeton, NJ: Princeton University Press.

Mayr, E. (1983). How to carry out the adaptationist program? American Naturalist, 121, 324–334.

Mayr, E. (1988). Toward a new philosophy of biology. Cambridge, MA: Harvard University Press.

McNair, J. N. (1983). A class of patch-use strategies. American Zoologist, 23, 303–313.

Pirolli, P., & Card, S. K. (1998). Information foraging models of browsers for very large document spaces. In T. Catarci, M. F. Costabile, G. Santucci, & L. Tarantino (Eds.), Advanced Visual Interfaces Workshop, AVI ‘98 (pp. 83–93). Aquila, Italy: Association for Computing Machinery.

Sandstrom, P. E. (1994). An optimal foraging approach to information seeking and use. Library Quarterly, 64, 414–449.

Segerstrale, U. (2000). Defenders of the truth: The battle for science in the sociobiology debate and beyond. Oxford: Oxford University Press.

Smith, E. A. (1981). The application of optimal foraging theory to the analysis of hunter-gatherer group size. In B. Winterhalder & E. A. Smith (Eds.), Hunter-gatherer foraging strategies (pp. 36–65). Chicago: University of Chicago.

Smith, E. A. (1987). Optimization theory in anthropology: Applications and critiques. In J. Dupré (Ed.), The latest on the best (pp. 201–249). Cambridge, MA: MIT Press.

Smith, E. A., & Winterhalder, B. (Eds.). (1992). Evolutionary ecology and human behavior. New York: de Gruyter.

Stephens, D. W. (1990). Foraging theory: Up, down, and sideways. Studies in Avian Biology, 13, 444–454.

Stephens, D. W., & Charnov, E. L. (1982). Optimal foraging: Some simple stochastic models. Behavioral Ecology and Sociobiology, 10, 251–263.

Stephens, D. W., & Krebs, J. R. (1986). Foraging theory. Princeton, NJ: Princeton University Press.

Winterhalder, B., & Smith, E. A. (1992a). Evolutionary ecology and the social sciences. In E. A. Smith & B. Winterhalder (Eds.), Evolutionary ecology and human behavior (pp. 3–23). New York: de Gruyter.

Winterhalder, B., & Smith, E. A. (Eds.). (1992b). Evolutionary ecology and human behavior. New York: de Gruyter.

## Notes:

(1) More strongly stated, the implicit assumption in optimal foraging models is usually that fitness is an increasing linear function of energy, whereas it is more likely that there is a saturating relationship (i.e., at some point, further increases in energy intake have little or no effect on fitness).

(2) Note that this assumption does not always apply to real situations. For instance, web-weaving spiders can capture new patches of food (e.g., insects) in their webs while engaged in the activities of consuming a patch of food, and this requires some elaborations to the conventional patch model (McNair, 1983). Pirolli and Card (1998) apply such a model to an information browser that has multithreaded processing.

(3) In his seminal work, Holling (1959) developed a model by studying a blindfolded research assistant who was given the task of picking up (foraging for) randomly scattered sandpaper disks, hence the name “disk equation.” Holling validated the model later by observing three species of small mammals preying upon sawfly cocoons in controlled experiments.

(4) Figure 2.6 also uses the convention in optimal foraging theory in which the average between-patch time is plotted on the horizontal axis starting at the origin and moving to the left, and within-patch time is plotted on the horizontal axis moving to the right. This differs from preceding figures in this book.

(5) Wild rabbit has about 500 kCal per pound of meat.

(6) This debate is part of a broader one incited by the emergence of sociobiology in the 1970s, which continues to reverberate in the behavioral and social sciences. For a fascinating rendition of this “opera,” see Segerstrale (2000).