Darwinian Populations and Natural Selection

Peter Godfrey-Smith

Print publication date: 2009

Print ISBN-13: 9780199552047

Published to Oxford Scholarship Online: May 2015

DOI: 10.1093/acprof:osobl/9780199552047.001.0001



Appendix

Oxford University Press

The heading of each section indicates (in brackets) the section in earlier chapters where the issues discussed arise. The exception is the final section, which is free-standing.

A.1. Equations for Change (2.1)

The idea of a Darwinian population is treated in this book as describing a “set-up,” a way in which things can be configured. But the importance of these configurations comes from the fact that they behave in distinctive ways. The knowledge we have of these behaviors largely takes the form of a patchwork of models. This section surveys some ways of representing change by natural selection in equations, emphasizing the pictures of evolution underlying the formalisms.

I will compare three kinds of representation. The first is a family of models which describe evolution as change in the frequencies of types. These can often be applied over multiple time-steps; their output can be treated as input for another round of change without the need to add further information (a “dynamically sufficient” model or a “recursion”). These include many genetic models, the “replicator dynamics,” and some models in evolutionary game theory.

The simplest model of this kind is a model of an asexually reproducing population with discrete generations, assuming no mutation, migration, or drift. Assume there are types A and B, with frequencies $p$ and $(1 - p)$ respectively. The symbol “W” will be used for fitness-related properties of various kinds; it will be defined slightly differently several times. In the first equation $W_A$ and $W_B$ represent the average number of offspring produced by individuals of the A type and B type respectively. Then $p'$, the new frequency of the A type after one generation of change, can be calculated as:

$$p' = \frac{p W_A}{p W_A + (1 - p) W_B} \tag{A1}$$


The denominator of (A1), mean fitness, can be symbolized $\bar{W}$, and then $p' = pW_A/\bar{W}$. If the fitnesses are either constant or functions just of $p$, the output can be used as input for a new round of change, so the analysis can be extended over many time-steps. This simplest case can be extended in various directions. One variant is to give a model using continuous time, in which births and deaths occur constantly. The fitness parameters now represent the per capita rates at which individuals of a given type contribute to the population by reproducing and dying. Then change in frequency of A can be represented as $dp/dt = p(W_A - \bar{W})$. This is the “replicator dynamics” (Taylor and Jonker 1978, Nowak 2006). That term is also sometimes used for the model in (A1), in which case discrete and continuous replicator dynamics can be distinguished.
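As a minimal sketch of how the discrete recursion and its continuous-time counterpart behave, the following assumes constant fitnesses and uses hypothetical function names; the continuous case is approximated with simple Euler steps, which is a convenience of this sketch rather than part of the model.

```python
def discrete_step(p, w_a, w_b):
    """One generation of the discrete model: p' = p * W_A / mean fitness."""
    mean_w = p * w_a + (1 - p) * w_b
    return p * w_a / mean_w


def iterate(p0, w_a, w_b, generations):
    """Feed each output back in as the next input (a recursion)."""
    p = p0
    for _ in range(generations):
        p = discrete_step(p, w_a, w_b)
    return p


def continuous_euler(p0, w_a, w_b, dt=0.01, steps=1000):
    """Euler approximation of dp/dt = p * (W_A - mean fitness)."""
    p = p0
    for _ in range(steps):
        mean_w = p * w_a + (1 - p) * w_b
        p += dt * p * (w_a - mean_w)
    return p
```

With a small constant advantage for A (say `iterate(0.1, 1.1, 1.0, 200)`), the frequency of A climbs toward one, because nothing in this idealized model opposes the fitness difference.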

A second way the model in (A1) can be extended is to introduce diploid individuals and sex. Now we track both the frequencies of alleles (A and a) at one locus, and the combinations formed by sexual reproduction: genotypes AA, Aa, and aa, with fitnesses $W_{AA}$, $W_{Aa}$, and $W_{aa}$. The frequencies of the A and a alleles are $p$ and $q$ respectively. The fitnesses can be interpreted as a combined measure of the chance an individual of that type has of surviving, and of the number of gametes it then produces that go into the next generation (Roughgarden 1979: 28). Assuming random mating (union of gametes), discrete generations, a large population, and no mutation or migration, the formula for change becomes:

$$p' = \frac{p^2 W_{AA} + pq W_{Aa}}{p^2 W_{AA} + 2pq W_{Aa} + q^2 W_{aa}} \tag{A2}$$


The denominator is again a mean fitness ($\bar{W}$). The model can also be extended to two genetic loci and beyond.
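The one-locus diploid recursion can be sketched in the same way. The function name and the fitness values in the usage note are illustrative assumptions; the formula is the standard one for random mating with viability and fertility folded into a single fitness per genotype.

```python
def diploid_step(p, w_AA, w_Aa, w_aa):
    """One generation of change in the frequency p of allele A."""
    q = 1 - p
    # Mean fitness over Hardy-Weinberg genotype frequencies.
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A alleles come from all of AA and half of Aa, weighted by fitness.
    return (p * p * w_AA + p * q * w_Aa) / mean_w


def diploid_iterate(p0, w_AA, w_Aa, w_aa, generations):
    """Apply the recursion repeatedly, assuming constant fitnesses."""
    p = p0
    for _ in range(generations):
        p = diploid_step(p, w_AA, w_Aa, w_aa)
    return p
```

With a symmetric heterozygote advantage (for example fitnesses 0.5, 1.0, 0.5), repeated application from `p0 = 0.2` approaches the stable equilibrium at `p = 0.5`.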

A quite different representation of change is the “Breeder’s equation”: $r = h^2 s$. Here $r$ is the “response” to selection, defined as the difference between the mean of some quantitative character after selection and the mean before selection; $h^2$ is heritability and $s$ is the strength of selection. In the simplest case (used in a derivation by Roughgarden 1979: ch. 9), this “strength” is the difference between the mean of the individuals in the parental generation who breed, and the overall mean in that generation.
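A small sketch may make the quantities concrete. It assumes the simplest reading of “strength” given above (mean of the breeders minus the overall parental mean); the function names and the numbers in the usage note are hypothetical.

```python
from statistics import mean


def selection_differential(parent_values, breeds):
    """s: mean of the parents who breed minus the mean of all parents."""
    breeders = [x for x, b in zip(parent_values, breeds) if b]
    return mean(breeders) - mean(parent_values)


def predicted_response(h2, s):
    """Breeder's equation: r = h^2 * s."""
    return h2 * s
```

For parental values `[1, 2, 3, 4]` where only the two largest individuals breed, `s` is 1.0, and with a heritability of 0.5 the predicted response is 0.5.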

The breeder’s equation is based on an underlying genetic model, assuming many genes with small effects. It is designed to be used where the genetic basis for a trait is complex and unknown. But it can also be understood even more abstractly, as heritability itself can be understood in a way that does not assume the presence of genes (Section A.2).

The breeder’s equation itself also does not require that the population can be categorized in terms of types, only that individuals have values of a quantitative character. The equation is valid only over a single time-step; even assuming that s is constant over generations, the heritability will usually change as the population evolves, and this change is not tracked in the equation itself. Even within this constraint, the equation in most cases applies approximately rather than exactly (Heywood 2005).

The equation embodies a very intuitive picture of evolution, however, one that affects many verbal discussions: fitness differences are not sufficient to generate change if the characteristics of the fit are not “transmitted” to some extent. Heritability is a sort of “channel” between the generations, which may be clear, noisy, or altogether lost (when $h^2 = 0$).

A third approach uses the “Price equation” (Price 1970, 1972, 1995, Frank 1995). This has in common with the breeder’s equation the fact that it applies only over a single time-step or (more exactly) interval, and it does not require the presence of persisting types in the population. It applies exactly, however, unlike the breeder’s equation. My discussion of Price here and in the next section draws on Okasha (2006).

Assume an ancestral and a descendent population whose individuals can each be described with respect to a quantitative characteristic, and assume a relation (interpreted as reproduction) linking individuals across the two times. Change is again represented as a consequence of a combination of fitness differences and heredity (in a general sense). One version of the equation is:


$$\Delta\bar{X} = \mathrm{Cov}(W, X) + E(W \Delta X) \tag{A3}$$

Here $X$ is a quantitative character and $\bar{X}$ is its mean at the start of the interval. $\Delta\bar{X}$ is defined as $\bar{X}_o - \bar{X}$, where $\bar{X}_o$ is the mean at the end of the time interval. $W$ is another slightly different measure of fitness: the number of descendants an individual in the parental generation has, divided by the average of those numbers. Each individual in the parental generation is characterized by its $X_i$ and $W_i$, its phenotype and its fitness; also by $X_i'$, the average $X$ value of its offspring; and by $\Delta X_i$, its value of $X_i' - X_i$. (The subscripts are omitted from (A3).) Then $\mathrm{Cov}(W, X)$ is the covariance in the population between $X$ and fitness. $E(W \Delta X)$ is the average of the products of the $W$ and $\Delta X$ values.
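The decomposition can be checked numerically. The sketch below assumes the two-term form of the equation described in this paragraph (a covariance term plus an average of fitness-weighted offspring deviations); the function names and the example data are hypothetical.

```python
def price_terms(X, offspring_X, n_offspring):
    """Return (Cov(W, X), E(W * dX)) for one parent-offspring interval.

    X[i]           phenotype of parent i
    offspring_X[i] mean phenotype of parent i's offspring
                   (any placeholder value if it has none)
    n_offspring[i] number of offspring of parent i
    """
    n = len(X)
    mean_n = sum(n_offspring) / n
    W = [k / mean_n for k in n_offspring]      # relative fitness, mean 1
    mean_X = sum(X) / n
    cov = sum((w - 1) * (x - mean_X) for w, x in zip(W, X)) / n
    transmission = sum(w * (xo - x) for w, x, xo in zip(W, X, offspring_X)) / n
    return cov, transmission


def actual_change(X, offspring_X, n_offspring):
    """Mean phenotype of all offspring minus mean phenotype of all parents."""
    mean_before = sum(X) / len(X)
    total = sum(n_offspring)
    mean_after = sum(k * xo for k, xo in zip(n_offspring, offspring_X)) / total
    return mean_after - mean_before
```

For three parents with phenotypes `[1, 2, 3]`, offspring counts `[1, 2, 3]`, and offspring means `[1.0, 2.0, 3.5]`, the two terms are 1/3 and 1/4, and their sum equals the directly computed change in the mean.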

In the breeder’s equation, heritability ($h^2$) was used to measure the extent to which fitness differences in one generation have consequences for the next. In the Price equation (A3), heritability does not appear. Rather than heredity being treated as akin to a “channel,” the Price equation divides things up differently. The first term describes how change would occur if there were perfect transmission of character across the time interval, and the second term adds a correction for any “transmission bias.”

The Price equation can be used to describe change in frequency of a type (by suitable choice of X), but it can also be applied to a population of individuals treated as unique. Some see this focus on individuals as an important part of the mindset underlying the equation (Grafen 1985), and I think this is true in conceptual as well as technical respects. The Price equation is suited to the view I defended earlier as “evolutionary nominalism”: grouping individuals into “types” should be optional in evolutionary description. More precisely, evolutionary theory should allow its key theoretical ideas to be applicable regardless of the “grain” with which a system is described. One description of a population might group its members into a small number of types; another might use a finer “mesh” in its classification scheme, and hence recognize a larger set of classes with fewer individuals inside them. A third might be so fine-grained that no two individuals fall into the same category at all. Initially this might seem to make attempts at description collapse, but that is not so. Unique individuals may be more or less similar to each other, more or less close with respect to some metric.

The underlying model here is also, in a sense, temporal rather than generational. The equation can be applied even if none of the parents reproduce, but some survive over a time interval and some do not (Rice 2004). This will be discussed in more detail in the final section below, where a generalization of the equation is presented.

The Price equation, unlike the genetic models discussed earlier in the section, is not idealized, in the sense of containing deliberate simplifications. Its use might involve idealizations on a particular occasion, but an analysis with the equation does not work by imagining things like random mating and constant fitness values. Instead it takes a particular case of change, either assumed or predicted, and represents the change by breaking it into parts. This is related to the fact that it does not work as a recursion, something whose output can always be plugged back in as new input to the same equation.

Here I have discussed simple equations representing short-term change, emphasizing the different idealizations they make and their underlying pictures. Models bring with them ways of categorizing things and default assumptions, which are often best made clear via contrasts (Winther 2006). I have presented the three formalisms as separate, but they can be connected in various ways—Price equations can be re-expressed so that heritability appears, for example (see Section A.2 below), and some authors discuss ways in which the Price equation can, with additional assumptions, function as a recursion (Frank 1998). There are also attempts to give single equations with a more ambitious role—representations of the overall tendencies in evolution, especially with respect to adaptation and the maximization of fitness (Fisher 1930, Grafen 2007).

A.2. Heritability and Heredity (2.2)

All treatments of evolution by natural selection include a requirement for the inheritance of traits. Replicator views require the reliable transmission of structure. I argued that this is not needed. What is relevant instead is a population-wide measure of parent–offspring similarity. Some summaries use a comparative criterion: parent and offspring must be more similar than other pairs of individuals (Lewontin 1985: 76; Gould 2002: 609). I will return to these below. What often appears in formal models is a covariance. (The covariance between variables $X$ and $Y$, for $n$ paired measurements, is


$$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

where $\bar{X}$ and $\bar{Y}$ are the two means.) Covariance between parent and offspring is used in heritability measures, in particular. Covariances require a trait with a mean value in each generation. This might be an obviously quantitative trait like height, but might also be the probability of producing a given behavior, or a characteristic scored as one for the presence of a trait and zero for its absence.

I will look more closely at heritability. There is a family of heritability concepts (Jacquard 1983, Downes 2007). Some assume a causal model of inheritance that includes genes or something similar to them. But it is also possible to approach the idea in a more minimal way, aiming only to represent predictability relations between parents and offspring. Heritability can then be measured as the slope of the linear regression of offspring character on parental character (Roughgarden 1979: ch. 9). That slope is the covariance between parent and offspring values for the character divided by the variance of the parent values. When there are two parents, their average (the “midparent” value) may be used.
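The minimal, regression-based notion of heritability can be sketched directly from this definition. The function name and the data in the usage note are hypothetical; the intercept is returned alongside the slope only because a linear regression yields both.

```python
def parent_offspring_regression(parent, offspring):
    """Least-squares regression of offspring value on parental value.

    Returns (slope, intercept). The slope is
    Cov(parent, offspring) / Var(parent), the minimal heritability measure.
    """
    n = len(parent)
    mean_p = sum(parent) / n
    mean_o = sum(offspring) / n
    cov = sum((x - mean_p) * (y - mean_o) for x, y in zip(parent, offspring)) / n
    var = sum((x - mean_p) ** 2 for x in parent) / n
    slope = cov / var
    intercept = mean_o - slope * mean_p
    return slope, intercept
```

For parental values `[1, 2, 3]` (or midparent values, when there are two parents) and offspring values `[1.5, 2.0, 2.5]`, the slope is 0.5 and the intercept is 1.0.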

How good is heritability when used to express the inheritance requirement in summaries of what is needed for evolution by natural selection? A variety of problem cases do arise. They arise because heritability is so abstract a concept; it throws so much information away. As a result, we may have fitness differences and heritability, but where the details of the inheritance system and the fitness differences conspire in a way that results in no net change.

A simple example was given in Chapter 2. Another has been introduced by Brandon (unpublished, discussed in Godfrey-Smith 2007a). This is a simple case in which there are fitness differences, high heritability, but no change across generations because of a “bias” in the inheritance system that exactly counters the fitness differences. As Brandon says, although heritability is identified with the slope of a regression line, a regression analysis gives us two parameters, the slope and the intercept with the vertical axis. The “bias” shows up in the intercept. So if heritability is understood as a regression slope, then at least one extra parameter needs to be taken into account when using heritability and fitness to predict change.

A third case can be developed by making use of the fact that heritabilities are usually calculated in a way that takes the entire parental generation into account, regardless of fitness differences. Imagine an asexual population that contains variation in height. There is a positive covariance between height and fitness. There is also a positive covariance between parental height and offspring height. But there is no change across generations. This is because although taller individuals have more offspring on average, and taller individuals have taller offspring on average, the taller individuals with the high fitness are not the same tall individuals as those that have taller offspring. The high-fitness tall individuals are not the tall-offspring tall individuals. Here is a simple numerical example. There are just six individuals in an asexual population. Three are short, with a height of one meter, and have one offspring each whose height is also one meter. Two are tall (two meters) and have one offspring each, where the offspring are two meters. One individual is two meters tall and has seven offspring (very high fitness), but its offspring are variable in height. Four of them are two meters and three are one meter tall. Then we get the same population statistics back with a larger overall population size: half the individuals are short and half are tall in both generations. One response to this “heritability fails in the fit” case is to understand heritability in a fitness-weighted way, an approach defended in general terms by Heywood (2005). Others will be discussed below.
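The numbers in the six-individual example can be verified with a short calculation. The sketch uses offspring number as the fitness measure and each parent's mean offspring height for the parent–offspring covariance; the variable names are hypothetical.

```python
parent_height = [1, 1, 1, 2, 2, 2]
offspring_heights = [[1], [1], [1], [2], [2], [2, 2, 2, 2, 1, 1, 1]]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cov(x, y):
    mx, my = mean(x), mean(y)
    return mean((a - mx) * (b - my) for a, b in zip(x, y))


n_offspring = [len(kids) for kids in offspring_heights]           # fitness
mean_offspring_height = [mean(kids) for kids in offspring_heights]

cov_height_fitness = cov(parent_height, n_offspring)
cov_parent_offspring = cov(parent_height, mean_offspring_height)

all_offspring = [h for kids in offspring_heights for h in kids]
change_in_mean = mean(all_offspring) - mean(parent_height)
```

Both covariances come out positive, yet `change_in_mean` is zero: the twelve offspring are split evenly between one and two meters, just as the six parents were.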

The example used in Section 2.3 was one of stabilizing selection. That case, and the two above, have in common the fact that the inheritance system would produce change alone if there were no fitness differences. So we might respond to those cases by saying that the aim of a summary of evolution by natural selection is not to say when selection will produce change from what we had before, but to say when selection will make a difference from what would have happened without it. That makes sense also when we think about the possible intrusion of factors such as migration.

Another case of stabilizing selection, a sexual case, would not be handled by that reply. This is a case of heterozygote superiority with respect to fitness but not with respect to phenotype. Assume the phenotype in question is height. An intermediate height is favored by selection and produced by a heterozygote (Aa) at one locus, resulting in a stable equilibrium of gene frequencies. There is a tendency for short individuals to produce short individuals and tall to produce tall, even when the population is in the equilibrium state. There are fitness differences between individuals in this equilibrium state. Yet there is no change. Suppose the population at the start of a generation has genotype frequencies of 0.25 AA: 0.5 Aa: 0.25 aa. Exactly half of the homozygotes of both kinds do not survive to breed, and all the heterozygotes survive to breed. So the pool of gametes contains a 50/50 mix of A and a alleles. If mating is random then the new generation will again have genotype frequencies of 0.25 AA: 0.5 Aa: 0.25 aa. This is not a case in which the fitness differences act to counter some change that the inheritance system was tending to produce “on its own.” If there had been no selection at all (and random mating) the new generation would have had the same genotype frequencies. Yet this is a case where parent phenotype predicts offspring phenotype to some extent, and parent phenotype predicts fitness as well.
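The arithmetic of this case can be checked with a short sketch, assuming viability selection followed by random union of gametes as described; the function name and dictionary layout are hypothetical.

```python
def next_genotype_freqs(freqs, survival):
    """Genotype frequencies after selection and random union of gametes.

    freqs and survival are dicts keyed by 'AA', 'Aa', 'aa'.
    """
    survivors = {g: freqs[g] * survival[g] for g in freqs}
    total = sum(survivors.values())
    # Frequency of allele A in the gamete pool of the survivors.
    p = (survivors["AA"] + 0.5 * survivors["Aa"]) / total
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


start = {"AA": 0.25, "Aa": 0.5, "aa": 0.25}
with_selection = next_genotype_freqs(start, {"AA": 0.5, "Aa": 1.0, "aa": 0.5})
without_selection = next_genotype_freqs(start, {"AA": 1.0, "Aa": 1.0, "aa": 1.0})
```

Both calls return the starting frequencies, which is the point of the example: selection here makes no difference to what the inheritance system would have produced without it.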

This is a case where phenotype is heritable but fitness is not heritable; Lewontin noted these cases when formulating his 1970 summary, and this is not a counterexample to that formulation. But requiring fitness to be heritable, rather than phenotype, brings other problems. This move has the consequence that if there are no fitness differences in some particular generation, then the transition from the previous generation to that new one cannot count as due to natural selection, even if there was extensive change in producing the new generation, and (for example) the pale ones in the old generation all died because they were eaten by birds and color was highly heritable.

Here is another interesting case, due to David Haig (personal communication) and developed in the context of modeling birth weight. Suppose that individual i’s phenotype is determined by the function Xi = Ti + Ei, where the variable T represents genotype and can be in three states (1, 2, or 3), and E represents environment, also with three states (1, 2, 3) that are equally probable. There is perfect asexual inheritance of T. Fitness, however, is determined by the function Wi = Ei, and hence is independent of T. Then X will be heritable and correlated with fitness, but all the fitness differences among individuals are due to the environment, and are uncorrelated with genotype. As a result, there will be no evolutionary change despite heritability and fitness differences.
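A sketch of Haig's case, under two assumptions not stated in the text: the three genotype states are equally frequent among parents, and each offspring's environment is drawn afresh with the same three equally probable states. The variable names are hypothetical.

```python
from itertools import product

states = [1, 2, 3]
# Assumption: T states are equally frequent, so every (T, E) pair is equally common.
parents = list(product(states, states))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cov(x, y):
    mx, my = mean(x), mean(y)
    return mean((a - mx) * (b - my) for a, b in zip(x, y))


X = [t + e for t, e in parents]      # phenotype X = T + E
W = [e for _, e in parents]          # fitness W = E
# T is inherited perfectly; the offspring's E is assumed to be a fresh draw (mean 2).
expected_offspring_X = [t + mean(states) for t, _ in parents]

cov_X_W = cov(X, W)
cov_parent_offspring = cov(X, expected_offspring_X)
mean_T_before = mean(t for t, _ in parents)
mean_T_after = sum(t * w for (t, _), w in zip(parents, W)) / sum(W)
```

Phenotype covaries with fitness and with expected offspring phenotype, yet the fitness-weighted mean of T equals the unweighted mean, so the genotype distribution (and hence the expected phenotype distribution) does not shift.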

The problem comes from the fact that environment here is a common cause of fitness and of phenotype. There are several ways of responding to this case. First, one might simply claim that it is a requirement for evolution by natural selection that fitness causally depend on phenotype, as opposed to merely being associated with it. This is only a partial answer. Second, in Haig’s case there is a role for something like the “biased” inheritance discussed above in the Brandon example. If we consider only the fit individuals within each genotypic class, we find their offspring are biased downwards with respect to phenotype (and fitness). Third, it is a case handled by Lewontin’s 1970 formulation, as fitness is not heritable even though phenotype is.

All of those problem cases can be understood in a different way by using the Price equation. Okasha (2006: sec. 1.5) argues that by recasting the Price equation so that a heritability term appears explicitly, we can see that traditional three-part recipes using heritability are generally accurate for predicting change except in cases where either, or both, of two additional effects are present. One effect is a role for the Y-intercept in the regression line used for calculating heritability, which was mentioned above. The other is a covariance between an individual’s fitness and the “error” or deviation found when that individual’s offspring’s phenotype is predicted using that regression line. Okasha noted this second effect as an abstract possibility without giving an example; Haig’s case is a moderately realistic case where this feature is present. The “heritability fails in the fit” case above also has this feature; high-fitness individuals have their offspring’s phenotype badly predicted by the regression line defining the heritability.

The cases of stabilizing selection receive a different analysis from this point of view. The fitness term in Price requires a covariance between character and fitness, which is absent in stabilizing selection. So initially, it seems the equation does not recognize the fitness differences at all. But the stabilizing selection cases could be re-analyzed by treating deviation from the mean height as the character X being analyzed, rather than height itself. Then we have a negative covariance between X and fitness, and a transmission bias (second term) that counteracts it.

Some summaries express the inheritance requirement in a comparative way, as I noted above. Lewontin (1985) required that “individuals resemble their relations more than they resemble unrelated individuals and, in particular, offspring resemble their parents.” There are several ways of interpreting these criteria, but here is one: the average difference between parent–offspring pairs is smaller than the average difference between individuals of different generations. Without taking fitness differences into account, each parental individual is associated with its phenotype X, and also an X′ value, the average phenotypic value for its offspring if it has any (see the discussion of Price above). Deviations are squared. When both tests can be applied, this comparative criterion and a covariance criterion for heritability coincide: the difference between the average squared deviation across individuals of different generations and the average squared deviation across parent–offspring pairs is proportional to the parent–offspring covariance. The comparative criterion can be applied in some cases where the covariance criterion cannot, however. Suppose there are many qualitatively different types in both generations, with reliable transmission of type but some probability of mutation. The probability of a “match” in type is high across parent and offspring, lower for other pairs of individuals. There are no mean values defined within each generation, hence no departures from the mean and no covariance, but the comparative criterion can be applied (scoring each pair with 0 for a match and 1 for a failure to match). To use the covariance test we need to redescribe the population, so that all the individual X-values within each generation are scored on a numerical scale.
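The claimed coincidence of the two criteria can be checked numerically. The sketch below adopts one reading of the comparative criterion, taking “individuals of different generations” to mean every pairing of a parent with any parent's mean offspring value; under that reading the gap between the two averages is exactly twice the parent–offspring covariance. The function names and data are hypothetical.

```python
def comparative_gap(X, X_off):
    """Average squared deviation over all cross-generation pairs
    minus the average squared deviation over parent-offspring pairs."""
    n = len(X)
    across_all = sum((x - y) ** 2 for x in X for y in X_off) / (n * n)
    across_pairs = sum((x - y) ** 2 for x, y in zip(X, X_off)) / n
    return across_all - across_pairs


def covariance(X, Y):
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / n
```

For `X = [1, 2, 3]` and `X_off = [1.5, 2.0, 2.5]` the gap is 2/3 and the covariance is 1/3, so the comparative criterion is satisfied exactly when the covariance is positive.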

In the light of all these cases, how should we think of the heredity requirement in descriptions of evolution by natural selection? A Darwinian process requires that parents produce offspring who are similar to them. Whether a case of parent–offspring similarity is evolutionarily relevant depends on the statistical profile of the whole population. So “similarity” is vague, but the population-level models that describe the situation are not. Slight similarities are often enough for fitness differences to produce an evolutionary response. Covariance is a general-purpose measure of association, often used in equations that predict change in a given trait. But a family of statistical measures are relevant in different cases.

A.3. Endler’s Summary (2.2)

In Chapter 2 I discussed some simple verbal summaries of evolution by natural selection. Problem cases were used to show their limitations. Here I look at a more careful and detailed summary, due to John Endler (1986: 4).

Natural selection can be defined as a process in which:

If a population has:

  a. variation among individuals in some attribute or trait: variation;

  b. a consistent relationship between that trait and mating ability, fertilizing ability, fertility, fecundity, and, or, survivorship: fitness differences;

  c. a consistent relationship, for that trait, between parents and their offspring, which is at least partially independent of common environmental effects: inheritance.

Then:

  1. the trait frequency distribution will differ among age classes or life-history stages, beyond that expected from ontogeny;

  2. if the population is not at equilibrium, then the trait distribution of all offspring in the population will be predictably different from that of all parents, beyond that expected from conditions a and c alone.

Conditions a, b, and c are necessary and sufficient for the process of natural selection to occur, and these lead to deductions 1 and 2. As a result of this process, but not necessarily, the trait distribution may change in a predictable way over many generations.

Endler’s formulation takes into account many of the factors which caused problems for the simpler ones. It does not identify fitness with the number of offspring produced by an individual (or the average number produced by a type). A range of properties are associated with fitness in clause b, and he is clearly aiming to cover all the features that can affect change in age-structured populations.

However—and largely as a consequence—the formulation has problems. This is because it is expressed as a recipe for change. Some qualifications reduce its predictive content, but that is not what I have in mind. The point is that the ways in which fitness and heredity are handled do not make the formulation applicable as a description of conditions sufficient for change. In clause b Endler lists a number of properties related to fitness, but does not collapse these into a single measure. There is no “bottom line” to which survivorship, mating ability, and so on, are said to contribute. If there is no “bottom line,” Endler is leaving it open that the mating ability differences might balance out the survival differences, for example, to yield no evolutionary change.

If we leave aside its purported role as a recipe, Endler’s formulation is a valuable one. Clause 2, for example, refers back to the possibility (discussed above) of the inheritance system producing change on its own, and “factors that out” from the change attributed to natural selection. Clause 1 similarly factors out the possible influence of ontogeny. I said in Chapter 2 that there are two ways to approach the abstract description of natural selection. One way is to make idealizations. Then it is possible to keep the summary simple, while also specifying conditions sufficient for change. The other approach is to avoid idealization, and try to capture every case, but this “capturing” of the cases no longer involves giving conditions sufficient for change. Endler’s formulation, despite being set up like a recipe, does the second.

A.4. Altruism and Correlated Interaction (6.2)

In Chapter 6 two models were compared, represented in Figures 6.1 and 6.2. In the first, a population forms temporary groups at one stage in its lifecycle. Generations are discrete and reproduction is asexual. In the second, the population does not form groups but settles on a lattice.

Models of the evolution of altruism in which group structure is temporary have been extensively discussed (Matessi and Jayakar 1976, Uyenoyama and Feldman 1980, Wilson 1980). The intuitive idea behind the A type being an “altruist” is that all individuals benefit from being in a group containing more, rather than fewer, altruists, but in any given group context, the B type is fitter than A. It is as if the A type “donates” some fitness to everyone in its group. Here is a rule assigning fitnesses in such a case. Let $W_i^A$ be the (absolute) fitness of an individual of the A type in a group with $i$ members of the A type (including itself), and let $W_i^B$ be the fitness of a B individual in a group with $i$ As.


$$W_i^A = z - c + (i - 1)b, \qquad W_i^B = z + ib \tag{A4}$$

Here $z$ is a baseline fitness, $c$ is a cost paid only by A, and $b$ is a benefit received by all individuals from each of the other A-type members of their group. (It is assumed that $c$ and $b$ are both positive.) The outcome of this situation depends not just on the fitnesses but on how groups are formed. If they are formed randomly, the A type is lost, regardless of the details. (Here, and below in this section, a large population is assumed.) But A can prevail (can invade B and remain stable) if groups are formed in a way that “clumps” the two types, so like tends to interact with like. Then the benefits of having As around tend to fall mainly on other As. One index of this clumping is Q:


$$Q = \frac{\sigma^2}{\sigma_R^2} \tag{A5}$$

Here $\sigma^2$ is the variance in the local frequency of A across groups, and $\sigma_R^2$ is the variance that would result from random group formation. Then it can be shown that the A type has higher fitness if and only if condition (A6) holds (Wilson 1980, Kerr and Godfrey-Smith 2002b):


$$Q > 1 + \frac{c}{b} \tag{A6}$$

So high degrees of “clumping” help the altruist type. This has an obvious relation to Hamilton’s rule, discussed below. I now turn to the more unorthodox model, in which individuals settle into a lattice structure or a similar network without group boundaries, interacting with their neighbors. (I will use the term “network” for all structures of this kind in which discrete groups are absent.) Now $W_i^A$ refers to the fitness of an A type with $i$ neighbors of the A type, and likewise for $W_i^B$. Each individual has $n$ neighbors in total. The formula for B’s fitness is the same as in (A4); the formula for A’s fitness is slightly different given that $i$ now refers only to neighbors: it is $W_i^A = z - c + ib$. Two other parts of such a model are the neighborhood distributions and the network formation rule. The neighborhood distributions, $N_i^A(t)$ and $N_i^B(t)$, represent the frequencies with which each type encounters neighborhoods with $i$ members of the A type at a given time, $t$. If we know each of these distributions at a time, these together with the fitnesses and the frequencies of the types suffice to predict change. Here $p$ is the frequency of the A type at $t$, and $p'$ is its frequency in the next generation.

$$p' = \frac{p \sum_i N_i^A(t)\, W_i^A}{p \sum_i N_i^A(t)\, W_i^A + (1 - p) \sum_i N_i^B(t)\, W_i^B} \tag{A7}$$


Suppose first that neighbors are distributed on the network randomly. Then it can be shown that with the fitnesses above, the A type will be lost (Godfrey-Smith 2008).

So we turn to non-random network formation rules. Complexity arises from the fact that evolutionary change is a consequence of the fitness structure and neighborhood distributions, but what the causal assumptions in the model give us is the network formation rule, and the relation between the two can be complicated. Things are simplified if we can use what can be called a “two-coins model.” As in the random case, we imagine predicting each of an individual’s neighbors with coin tosses, but now the coin is different according to whether the focal individual is of type A or B. An A individual’s neighbors are each predicted with a coin whose probability of choosing A is $p_A$; for a B individual the coin’s probability of choosing A is $p_B$. This model cannot be applied exactly to the densely packed lattice in Figure 6.2, because each assignment of an individual to the lattice should be constrained by several others, not just one, but it can be used approximately (for example, by filling each row independently and hence having correlation with respect to horizontal neighbors but not vertical ones).

The two-coins principle is used, in effect, by Hamilton (1975) and Nunney (1985), who borrow the parameter F (0 ≤ F ≤ 1) from treatments of inbreeding, for use as a measure of non-random association. F is used along with p to generate the “experienced” frequency of A neighbors for each of the two types.


When applicable, this leads to a simple result when we assume the fitness rules in (A4). The A type has a higher fitness if and only if:


Though they were arrived at by different roads, the Q parameter used for groups and the F parameter used with networks are doing a very similar job. It is also possible to treat the group-structured model as a special case of the neighbor-structured one; discrete groups are one kind of structure to which the model using neighbor interactions can be applied.

A.5. Hamilton’s Rule (6.2)

“Hamilton’s rule” in its original form states that an altruistic behavior will be favored if and only if r > c/b (Hamilton 1964). Here c is the cost to the actor, b is the benefit received by someone as a consequence of the action, and r is the coefficient of relatedness between the actor and recipient. The value of r for human full siblings is ½, for example, as is r between parent and offspring. The rule was initially taken to make good sense of altruistic behavior directed on biological relatives, but to help little with other kinds of altruism and cooperation. Hamilton himself, however, came to see that the principle could be applied more broadly. “[K]inship should be considered just one way of getting positive regression of genotype in the recipient [of altruistic behaviors], and … it is this positive regression that is vitally necessary for altruism” (1975: 337). In Chapter 6 I discussed Queller’s formulation of this idea, and here I outline a simplified version of his model and derivation.

Assume an asexual population whose members interact in pairs. Each individual has a value for phenotype, P, which is equal to one if the individual acts altruistically within its pair, and zero otherwise. Each individual also has a value of P*, which is the phenotype of the individual’s partner (again, one if the partner is an altruist, zero otherwise). Each individual also has a value of G, its genotype, and of G*, the genotype of its partner. (I am re-using the symbol “G” here, which stood for the germ line parameter in earlier chapters, but I will follow Queller’s and standard symbolism here. The second use of “G” is confined to this section, and there are no germ/soma uses of “G” in this section.) The values of G can initially be thought of as one and zero, for altruistic and selfish respectively, but this assumption does not matter, as we will see below. The cost of being an altruist is c, and the benefit received from having an altruistic partner is b. W0 is an initial or baseline fitness. Individual i’s total fitness can then be written as follows:

$$W_i = W_0 - cP_i + bP_i^{*} \tag{A10}$$

Assuming that G is faithfully transmitted from parent to offspring, a Price equation for change in the mean value of G can be written as $\Delta\bar{G} = \mathrm{Cov}(W, G)/\bar{W}$. Substituting the right-hand side of (A10) for W and rearranging, the model yields two equivalent criteria for when the mean of G will increase. One of the two is as follows:

$$\frac{\mathrm{Cov}(G^{*}, P)}{\mathrm{Cov}(G, P)} > \frac{c}{b} \tag{A11}$$

The other formulation has Cov(G, P*) as the left-hand side numerator instead. Either way, “relatedness” is replaced here by an abstract measure of correlation between the phenotypes of those acting and the genotypes of those the actions affect. The recipient need not have the same phenotype as the actor, and the actor need not have the same genotype as the recipient. Further, talk of “genotype” is actually inessential here, as G in the model functions as a quantitative characteristic that is potentially correlated with P and that is passed on in reproduction—those are the only constraints on G. Transmission could be cultural, for example, and, more generally, the model does not require that the population can be sorted into discrete types, altruist versus selfish. P could take many values as well. If a population had individuals with many degrees of altruism (as with the case of height), the model would allow us either to group them coarsely into the altruist versus selfish (like the tall versus short), or to track all the fine differences. The model is thus compatible with evolutionary nominalism of the kind defended above. As discussed earlier, the model can also be extended to cover cases where cooperation is favored through reciprocity (Fletcher and Zwick 2006). If an individual’s behavior is sensitive to its circumstances rather than fixed, and cooperation is produced in a discriminate way (perhaps via a “Tit-for-Tat” rule), then Cov(G*, P) can be high even if pairs initially come together at random.
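The covariance criterion can be checked numerically on a toy population. The sketch below assumes a fitness rule of the form W = W0 − cP + bP* (baseline, minus the cost of acting, plus the benefit of an altruistic partner), assumes for simplicity that phenotype equals genotype, and uses invented pair compositions and parameter values. All function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance (divides by N, as in the Price equation)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def queller_analysis(pairs, w0, c, b):
    """pairs: list of (g1, g2) genotype values for the two members of each pair.
    ASSUMPTION: phenotype P equals genotype G (1 = altruist, 0 = selfish)."""
    G, P, P_star = [], [], []
    for g1, g2 in pairs:
        for own, partner in ((g1, g2), (g2, g1)):
            G.append(own)
            P.append(own)
            P_star.append(partner)
    # Assumed fitness rule: baseline, minus cost of acting, plus benefit received.
    W = [w0 - c * p + b * ps for p, ps in zip(P, P_star)]
    delta_price = cov(W, G) / mean(W)
    # Direct calculation: offspring inherit G, parents contribute in proportion to W.
    delta_direct = sum(w * g for w, g in zip(W, G)) / sum(W) - mean(G)
    ratio = cov(G, P_star) / cov(G, P)
    return delta_price, delta_direct, ratio

w0, c, b = 10.0, 1.0, 3.0   # illustrative values only
assorted = [(1, 1)] * 3 + [(1, 0)] * 2 + [(0, 0)] * 3
unassorted = [(1, 1)] * 2 + [(1, 0)] * 4 + [(0, 0)] * 2

for label, pairs in (("assorted", assorted), ("unassorted", unassorted)):
    d_price, d_direct, ratio = queller_analysis(pairs, w0, c, b)
    print(label, round(d_price, 4), round(d_direct, 4), round(ratio, 3), c / b)
```

In the assorted population the covariance ratio is 0.5, above c/b = 1/3, and the mean of G rises; in the unassorted one the ratio is zero and the mean of G falls. In both cases the Price-equation value matches the directly computed change.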

I will make one additional argument using the model, linking Chapters 2, 6, and 7. The genetic description of evolution is the firmest home of “types” in evolutionary thinking. But evolutionary nominalism applies here as well; genetic types, once we have a DNA sequence of appreciable length, are a coarse-graining just as phenotypic ones are. If we look at many “identical” copies of a gene, we will eventually find a shading-off. Is this genetic sequence a token of the same type as that one if they differ by a silent substitution? Perhaps those do not count, but what about a mutation that affects only an unimportant part of the protein? Genetic sequences are related by distances in a space of substitutions, as well as by type-identity.

This affects discussions of genetic cooperation. The “cooperation” of two identical alleles in closely related cells or organisms is often taken to be readily explicable; they are not really two different things, in the evolutionarily important sense, but instances of a common type. It is the type, the “strategic gene” (Haig 1997), that does well or badly, and its material tokens rightly behave indifferently between favoring their own copying and favoring their type-mate’s copying. But if one strand of DNA acquires a silent substitution, it is not suddenly outside the cooperative fold. Here the Queller formulation of Hamilton’s rule is useful. The model (partly via its Price-equation roots) explains donations of fitness between entities that are treated, in the explanation, as unique particulars that can be related by similarity and need not share their type. The case of two gene copies with identical sequence is treated as an extreme case of a more general phenomenon.

A.6. Connection, Modification, and Descent

Evolution in a Darwinian population is one kind of change in a system of objects over time, and a focus of this book has been the idea that Darwinian evolution shades into other kinds of change. A connected topic is the relation between different “levels of description.” Here I have in mind not just levels of selection, as in Chapter 6, but the fact that Darwinian populations are physical systems, and at the physical level different kinds of description apply to them. Each Darwinian individual is a collection of physical particles, moving through space and time, constantly losing and gaining matter. Darwinian processes become visible via a “zooming-out” from a mass of physical events. In this section I present a formal way of representing and investigating some of these issues. All the work in this section was done in collaboration with Ben Kerr, and the equation (A12) was proved by him.

Suppose a system consists of two collections of things existing at different times, with at least some causal connections linking the entities present at different times. The two times will be labeled $t^a$ and $t_d$, for the “ancestral” and “descendant” time points, respectively, and the collections of entities will also be referred to as ancestral and descendant. Throughout, superscripts will indicate ancestral properties and subscripts will indicate those of descendants.

Assume that at least some of the descendant entities are connected to some of the ancestral entities. This “connection” can be thought of initially as some sort of causal responsibility, but that term is understood very broadly. If an object persists intact from $t^a$ to $t_d$, that is sufficient for connection. Familiar kinds of reproduction also count. But any pattern of connectivity is allowed in the analysis. As Figure A.1 shows, the ancestors may differ in the number of connections they have to members of the descendant ensemble, and the descendants can also differ with respect to the number of connections they have to the ancestrals. There can be ancestors with no descendants, and descendants with no ancestors. The only constraint is that at least one connection exists.

Figure A.1: Ancestral and descendant entities.

Below I will describe a representation of change in systems of this kind. First, though, it is useful to take a step back from what is assumed so far. Imagine that we are at an earlier stage of analysis of the system in Figure A.1. We have not yet recognized distinct objects making up the ancestral and descendant ensembles. All we know is that the entire system at $t^a$ gives rise to the entire system at $t_d$. To reach the stage of analysis pictured in Figure A.1 we have to first recognize separate objects, at both time points, and secondly limit the connections recognized between them. These preliminary stages are represented in Figure A.2, (a) and (b).

The second move, from the representation in Figure A.2(b) to Figure A.1, involves a kind of coarse-graining. We can assume everything present at $t^a$ has some effect on everything at $t_d$; there are minute gravitational effects, if nothing else. To reach a picture with limited connectivity we ignore many of these influences, and treat only some as significant. The earlier move, from Figure A.2(a) to A.2(b), is more philosophically controversial, but may also involve a similar kind of coarse-graining. To reach A.2(b) we treat some parts of the overall system as partially independent of the others, with an identity that is portable across changes in other members of the collection. This is related to the distinction made in Chapter 8 between populations and highly integrated networks.

Figure A.2: Stages of analysis preliminary to Figure A.1.

Suppose that we have reached the kind of picture seen in Figure A.1, with collections of distinct entities and limited connections between them. The next aim is to represent change over the time interval. Let there be $n^a$ entities at $t^a$ and $n_d$ entities at $t_d$. Let $C_j^i$ be an indicator variable for connection between ancestral entity i and descendant entity j. So:

$$C_j^i = \begin{cases} 1 & \text{if ancestral entity } i \text{ connects to descendant entity } j \\ 0 & \text{if ancestral entity } i \text{ does not connect to descendant entity } j \end{cases}$$

Thus, ancestral entity i connects to a total of $C_*^i = \sum_{j=1}^{n_d} C_j^i$ descendant entities, and descendant entity j connects to $C_j^* = \sum_{i=1}^{n^a} C_j^i$ ancestral entities. These are absolute measures of connectedness for ancestors and descendants. We can also define two relative measures of connectedness, $\tilde{C}_*^i$ and $\tilde{C}_j^*$, by dividing $C_*^i$ and $C_j^*$ by $C_*^*/n^a$ and $C_*^*/n_d$ respectively. That is, we divide the two absolute measures by the average connectedness of ancestors (in one case) and of descendants (in the other). Here $C_*^*$ is the total number of connections, or $\sum_{i=1}^{n^a} \sum_{j=1}^{n_d} C_j^i$.

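X is some measurable characteristic of the entities. Let the value of X for ancestral entity i be $X^i$ and that of descendant entity j be $X_j$. The mean character values in the ancestral and descendant ensembles are $\bar{X}^a = \frac{1}{n^a} \sum_{i=1}^{n^a} X^i$ and $\bar{X}_d = \frac{1}{n_d} \sum_{j=1}^{n_d} X_j$. Change can then be represented with an equation linking these two means; let $\Delta\bar{X}$ be the difference between the means, or $\bar{X}_d - \bar{X}^a$. It can be shown that:

$$\Delta\bar{X} = \mathrm{Cov}(\tilde{C}_*^i, X^i) + E(\Delta X_j^i) - \mathrm{Cov}(\tilde{C}_j^*, X_j) \tag{A12}$$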

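Here $\Delta X_j^i = C_j^i (X_j - X^i)$, the change in character across a particular connection; $E(\Delta X_j^i)$ is the average change across a connection, that is, the sum of $\Delta X_j^i$ over all pairs divided by the total number of connections.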

Despite the complicated set-up, this equation is easy to interpret (see Kerr and Godfrey-Smith, forthcoming, for more detail). The first two terms in the right-hand side map to the terms found in a standard Price equation. The first term is a covariance between the character value of each ancestor and the number of descendants to which it is connected, relativized to the overall degree of connectedness seen in ancestors. $\tilde{C}_*^i$ is thus a kind of fitness measure, treating the presence of a downwards arrow as a unit of influence for that ancestral entity. So $\mathrm{Cov}(\tilde{C}_*^i, X^i)$ measures the covariance of ancestral character with fitness. The second term measures the overall tendency of divergence to take place over a connection; it is like a “transmission bias” term. The third term, which is not part of a standard Price equation, is like a mirror-image of the first term, the fitness term. It measures the covariance between descendant character and the number of ancestors to which the descendant is connected, relative to the overall degree of connectedness seen in descendants.
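The three-term reading can be checked on a small invented system. The sketch below assumes the decomposition takes the form “fitness covariance plus average change across a connection minus descendant-side covariance,” which is how the terms are described above; the connection matrix, character values, and function names are all hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance (divides by N)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def generalized_price(C, X_anc, X_desc):
    """C[j][i] is 1 if ancestor i connects to descendant j, else 0."""
    n_a, n_d = len(X_anc), len(X_desc)
    total = sum(sum(row) for row in C)
    if total == 0:
        raise ValueError("at least one connection is required")
    anc_links = [sum(C[j][i] for j in range(n_d)) for i in range(n_a)]
    desc_links = [sum(C[j]) for j in range(n_d)]
    # Relative connectedness: divide by the average number of connections.
    rel_anc = [k / (total / n_a) for k in anc_links]
    rel_desc = [k / (total / n_d) for k in desc_links]
    fitness_term = cov(rel_anc, X_anc)
    transmission_term = sum(
        C[j][i] * (X_desc[j] - X_anc[i]) for j in range(n_d) for i in range(n_a)
    ) / total
    convergence_term = cov(rel_desc, X_desc)
    return fitness_term, transmission_term, convergence_term

# Invented example: descendant 1 has two parents, descendant 3 has none,
# and ancestor 2 leaves no descendants.
X_anc = [1.0, 2.0, 4.0]
X_desc = [1.0, 3.0, 2.5, 5.0]
C = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]

fit, trans, conv = generalized_price(C, X_anc, X_desc)
delta = mean(X_desc) - mean(X_anc)
print(delta, fit + trans - conv)
```

In this example all three terms are non-zero, and their combination reproduces the observed difference between the two means.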

The Price equation is often seen as giving a complete decomposition of evolutionary change. But change is consistent with zero values for the two standard Pricean terms. The explanation of the “missing term” is as follows. It is usually assumed that the members of a parental generation may differ with respect to their number of offspring, but it is not usually assumed that the members of the offspring generation might differ with respect to the number of their parents. The present model, in contrast, makes no prior assumptions regarding the number of parents that an individual has; any pattern of connectivity is treated as possible, including one-to-many and many-to-one connections in each direction. The standard Price equation covers a special case that arises via an (often reasonable) simplifying assumption about the pattern of connectivity between the ancestral and descendant ensembles.

One simple example to illustrate the role of the third term is migration into the population from outside. A migrant is, in the context of this analysis, a descendant without an ancestor. When some individuals in the descendant ensemble are migrants and some are not, and the migrants differ in character from the locals, $\mathrm{Cov}(\tilde{C}_j^*, X_j)$ will be non-zero. Another example is a mixture of sexual and asexual reproduction (as seen in Figure A.1). Then, again, individuals will differ in their number of parents, and if those with more or fewer parents differ in character from the others, $\mathrm{Cov}(\tilde{C}_j^*, X_j)$ will be non-zero. It has often been noted that the structure of mainstream evolutionary theory is better designed for fruit flies and birds than it is for plants and for animals which show mixtures of sex and asexuality (see Chapter 4, along with Jackson et al. 1985, Tuomi and Vuorisalo 1989b). The generalization of the Price equation here equips it to deal with those cases, without reducing them (as may also be done) to a genetic level at which reproduction is more uniform. In contrast with the usual Price equation, the analysis here is also reversible. Equation (A12) treats change from the ancestral to descendant ensembles as a consequence of ancestral fitness differences, transmission bias, and differences in a descendant-focused mirror-image of fitness. But as any pattern of connectivity is allowed, an analysis using (A12) could describe change from a “descendant” ensemble to an “ancestral” one.
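The migration case can be made concrete with a toy calculation. In the sketch below every local ancestor leaves exactly one unchanged descendant and two migrants arrive with a different character value, so both standard Pricean quantities are zero while the mean still changes. The numbers and function names are invented for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance (divides by N)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def three_terms(C, X_anc, X_desc):
    """C[j][i] is 1 if ancestor i connects to descendant j, else 0."""
    n_a, n_d = len(X_anc), len(X_desc)
    total = sum(sum(row) for row in C)
    # Relative connectedness of each ancestor and each descendant.
    rel_anc = [sum(C[j][i] for j in range(n_d)) * n_a / total for i in range(n_a)]
    rel_desc = [sum(C[j]) * n_d / total for j in range(n_d)]
    fitness_cov = cov(rel_anc, X_anc)
    avg_change = sum(
        C[j][i] * (X_desc[j] - X_anc[i]) for j in range(n_d) for i in range(n_a)
    ) / total
    descendant_cov = cov(rel_desc, X_desc)
    return fitness_cov, avg_change, descendant_cov

X_anc = [1.0, 2.0, 3.0]
X_desc = [1.0, 2.0, 3.0, 6.0, 6.0]   # the last two entries are migrants
C = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],   # migrant: no ancestor
    [0, 0, 0],   # migrant: no ancestor
]

fitness_cov, avg_change, descendant_cov = three_terms(C, X_anc, X_desc)
delta = mean(X_desc) - mean(X_anc)
print(fitness_cov, avg_change, descendant_cov, delta)
```

Here the fitness covariance and the average change across a connection are both zero, and the whole change in the mean (1.6) is carried by the descendant-side covariance, which equals −1.6.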

The utility of the Price equation derives in part from the way it can be applied to hierarchically structured systems. Price’s second or “expectation” term, which corresponds to the second term in (A12), can be decomposed into a lower-level covariance term and a lower-level expectation. Equation (A12) also has this feature, but the expectation term breaks down into three lower-level terms, each corresponding to the terms described above (see Kerr and Godfrey-Smith, forthcoming).

When these points about hierarchy are made, it is usually assumed that the analyst knows in advance that there is a lower level of reproducing entities. The present framework can be used to represent how such conclusions may be reached. To see this, take a step back to Figure A.2. The equation above can be applied to the cases in Figure A.2, but not in an informative way. The analysis will be trivial in Figure A.2(a), as there is only one connection and one member of each ensemble. So the change over that one connection is $\Delta\bar{X}$. In the case of Figure A.2(b) the analysis will also be relatively uninformative, though not trivial. In Figure A.2(b), all the ancestors are jointly responsible for all the descendants, and only the second term can be non-zero. The breakdown given by the equation becomes informative when a significant role is being played by differing descendant number (differential fitness), differing ancestor number, or both. Change has a more Darwinian character when a significant role is played by the first term, the differential fitness term. It has a more “transformational” character (Lewontin 1983) when much of the weight is carried by some regular principle of change over a connection described in the second term. There is not an existing label that captures change due mainly to the third term, which is a matter of “differential convergence.”

Returning to questions about hierarchy: when someone wonders whether a hierarchical analysis will be informative, they begin, in effect, by treating the entities that make up our initial “focal” level (the circular shapes in Figure A.1) as if they were each like the shapes in Figure A.2(a): undifferentiated wholes linked by single connections across the time interval. They may then ask whether these entities can be treated as collectives, that is, whether they can be broken down into smaller units that enter into ancestor/descendant relations of their own. This question can be addressed using the same criteria described above for the initial or “focal” level. Is there a natural division of the focal entities into parts at all? If there is, does this division allow us to recognize a reasonably sparse, and hence informative, pattern of connection between sub-entities across the time interval? This is pictured in Figure A.3. Here I suppose that two higher-level entities are connected if and only if there is at least one sub-entity connection between them. Alternatively, working from the focal level “up,” we can assess whether the focal entities can be collected together into larger units, revealing a Darwinian pattern at a higher level.

The analysis is obviously very general with respect to the entities making up the population. They might be organisms, time-slices of organisms, cells, genes, groups, or cultural variants. It is not assumed that reproduction is synchronized; the descendant entities could be at an earlier stage of individual development than the ancestors, and the two ensembles might differ internally in the same way. There may be intervening generations not represented. In addition, no distinction is made between persisting across the time interval and asexual reproduction accompanied by death of the parent. Either way, a single ancestor gives rise to a single descendant. In Chapter 5 much was made of the distinction between reproducing and persisting. The features of reproduction captured by parameters B and G of that chapter are not automatically given a role in this analysis. That is a way in which the framework here is incomplete.

Figure A.3: Connectivity at two levels.

In Section A.1 I distinguished two kinds of analyses of evolutionary change: those that apply over many time-steps (“dynamically sufficient” representations, or “recursions”), and “single-step” analyses that only contain information bearing on one interval or generation. Using this model we can try to describe the kinds of features that will make it possible to describe a system with a recursive expression. Two kinds of simplicity should give rise to this situation. First, there may be a simple rule for change over a connection. Second, there may be a simple rule relating the character of an ancestor ($X^i$) to the number of connections it has to the descendant generation. The result may then be a compact dynamic rule, applicable over many time-steps, such as a discrete replicator dynamics with mutation (Nowak 2006). That dynamic requires that each descendant has only one ancestor, and change over the connection is described with a fixed high probability of faithful transmission and a small probability of change to a different state. It also requires that the fitness of an ancestor either has a fixed association with its character, or is a function of factors that can themselves be predicted evolutionarily, such as the frequency of a type. Then we have fitnesses systematically associated with repeatable types.
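A minimal sketch of a discrete replicator dynamics with mutation shows how the two kinds of simplicity combine into a recursion. It assumes two types, fixed (frequency-independent) fitnesses, and a symmetric mutation probability; these choices, the parameter values, and the function name are illustrative assumptions and are not taken from Nowak (2006).

```python
def replicator_mutator_step(x, f, Q):
    """One generation of change.
    x: frequencies of the types; f: fitness of each type (simple fitness rule);
    Q[i][j]: probability that an offspring of type i is of type j
    (simple rule for change over a connection)."""
    k = len(x)
    mean_f = sum(x[i] * f[i] for i in range(k))
    return [sum(x[i] * f[i] * Q[i][j] for i in range(k)) / mean_f for j in range(k)]

f = [2.0, 1.0]            # illustrative fixed fitnesses
mu = 0.01                 # illustrative mutation probability
Q = [[1 - mu, mu],
     [mu, 1 - mu]]

x = [0.5, 0.5]
history = [x]
for _ in range(200):
    # The output of one step is the input of the next: a recursion.
    x = replicator_mutator_step(x, f, Q)
    history.append(x)

print(history[1], history[-1])
```

Because each step needs only the current frequencies, the fitnesses, and the transmission probabilities, the same function can be applied indefinitely. A frequency-dependent variant would recompute f from x inside the loop.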

There will be other simple rules possible as well, which do not require that each descendant has only one ancestor. But evolutionary processes will only be orderly and tractable over long intervals when there is reasonable simplicity in the rules that determine, as a function of ancestral properties, which descent lines or connections are going to arise, and what sort of population will arise from those connections. This gives us a way of thinking about the contrast between the orderly processes of Mendelian inheritance and the less orderly processes of cultural change.