Primate Vocal Communication
Abstract and Keywords
This chapter reviews current understanding of vocal communication in primates. It argues that primate communication calls convey information about both the caller's affective state, and objects and events in the world. This mixed referential signaling mechanism appears to be fundamentally social in nature and thus crucial for the representation of goals, intentions, and knowledge.
In 1871, Charles Darwin drew attention to a dichotomy in the vocal communication of animals that had perplexed philosophers and naturalists for at least 2,000 years. In marked contrast to human language, he wrote, animal vocalizations appeared to be involuntary expressions of emotion and movement: “When the sensorium is strongly excited, the muscles of the body are generally thrown into violent action; and as a consequence, loud sounds are uttered, … although the sounds may be of no use” (1871/1981, p. 83). However, two pages later Darwin wrote: “That which distinguishes man from the lower animals is not the understanding of articulate sounds, for, as every one knows, dogs understand many words and sentences. … Nor is it the mere capacity of connecting definite sounds with definite ideas; for it is certain that some parrots, which have been taught to speak, connect unerringly words with things, and persons with events” (1871/1981, p. 85).
The vocal communication of monkeys and apes appears to be no different from that of other animals. Production is highly constrained. Nonhuman primates have a relatively small repertoire of calls, each of which is closely linked to particular social circumstances and shows little modification during development (see Hammerschmidt & Fischer, 2008, for a review). Perception, by contrast, is more flexible, open ended, and modifiable as a result of experience. The difference between production and perception is puzzling because producers are also perceivers: Why should an individual who can deduce an almost limitless number of meanings from the calls of others be able to produce only a limited number of calls of his or her own? The difference between production and perception is also puzzling because it constitutes a crucial distinction between human and nonhuman primates. Why should monkeys and apes—so similar to humans in so many other respects—be so different when it comes to vocal production?
The contrast between vocal production and perception constitutes the starting point for this chapter. We wish to make three points. First, it is important to be clear exactly what we mean when we say that primate vocal production is “highly constrained.” Many scientists have taken this to mean that primate call production is fixed, uncontrollable, and involuntary. This conclusion is too extreme. In fact, both field and laboratory studies paint a more complex picture. Monkeys and apes can call or remain silent, modify the timing and duration of calling, and make subtle acoustic modifications to the calls they give in specific social contexts. However, although they can modify call production in many ways, they rarely create entirely new calls or call combinations, or sever the link between a particular call type and the circumstances in which it is normally given.
Second, the dichotomy between production and perception has important theoretical implications, because it draws our attention to the very different mechanisms that underlie the behavior of speakers and listeners, even when these individuals are involved in the same communicative event. Nonhuman primates present us with a communicative system in which a small repertoire of relatively fixed, inflexible calls, each linked to a particular social context, nonetheless gives rise to an open-ended, highly modifiable, and cognitively rich set of meanings.
Third, the contrast between production and perception in primates and many other mammals cries out for an evolutionary explanation. What selective pressures caused our human ancestors—and they alone among the primates—to develop flexible vocal production? Unconstrained by any definitive data that might help resolve the issue, we offer some speculations.
As Darwin noted, primates—like other mammals—produce a small repertoire of acoustically fixed, species-specific calls that are closely tied to particular contexts and show little modification during development. By contrast, when it comes to perception and comprehension, primates and other animals display an almost open-ended ability to learn new sound-meaning pairs. They also appear to ascribe intentions and motives to signalers when assessing whether or not to respond to a given individual’s calls. Consider baboons, for example.
Baboons are Old World monkeys that shared a common ancestor with humans roughly 30 million years ago (Steiper et al., 2004). They live throughout the savannah woodlands of Africa in groups of 50 to 150 individuals. Although most males emigrate to other groups as young adults, females remain in their natal groups throughout their lives, maintaining close social bonds with their matrilineal kin (Silk et al., 1999, 2006a,b). Females can be ranked in a stable, linear dominance hierarchy that determines priority of access to resources, and daughters acquire ranks similar to those of their mothers. Baboon social structure can therefore be described as a hierarchy of matrilines, in which all members of one matriline (e.g., matriline B) outrank or are outranked by all members of another (e.g., matrilines C and A, respectively). Ranks within matrilines are as stable as those between matrilines (e.g., A1 > A2 > A3 > B1 > B2 > C1, etc.) (Cheney & Seyfarth, 2007).
Listeners Extract Rich “Narratives” from Simple Call Sequences
Baboon vocalizations, like those of many other primates, are individually distinctive (e.g., Owren et al., 1997; Rendall, 2003), and listeners recognize the voices of others (reviewed in Cheney & Seyfarth, 2007). Baboons' vocal repertoire contains a number of acoustically graded signals, each of which is relatively context specific. The alarm “wahoos” given by adult males to predators, for example, are acoustically similar to the wahoos that males give during aggressive contests (Fischer et al., 2002). Nonetheless, listeners respond to the two call types as if they convey qualitatively different information (Kitchen et al., 2003).
Grunts, the most common call given by baboons, are given in a variety of social interactions and also differ acoustically according to context. Move grunts are given in bouts of one or two calls while the group is on the move or when one or more individuals attempt to initiate a group move, and they often elicit answering grunts from nearby listeners. Slightly acoustically different infant grunts are given in a variety of affiliative contexts and function to facilitate social interactions (Cheney et al., 1995; Rendall et al., 1999). If a high-ranking female grunts as she approaches a lower-ranking female, the lower-ranking female is less likely to move away than if the approaching female remains silent. Grunts also function to reconcile opponents after a dispute, increasing the likelihood that former opponents will tolerate each other’s close proximity and reducing the probability of renewed aggression (Cheney & Seyfarth, 1997; Cheney et al., 1995).
Because calls are individually distinctive and each call type is predictably linked to a particular social context, baboon listeners can potentially acquire quite specific information from the calls that they hear. This applies not only to calls of a single type, like predator alarm calls (Fischer et al., 2000, 2001a,b), but also to the sequences of different call types that arise when two or more individuals are interacting with each other.
Throughout the day, baboons hear other group members giving vocalizations to each other. Some interactions involve aggressive competition, for example, when a higher-ranking animal gives a series of threat-grunts to a lower-ranking animal and the latter screams. Threat-grunts are aggressive vocalizations given by higher-ranking to lower-ranking individuals, whereas screams are submissive signals, given primarily by lower- to higher-ranking individuals. A threat-grunt-scream sequence, therefore, provides information not only about the identities of the opponents involved but also about who is threatening whom. Baboons are very sensitive to both types of information. In playback experiments, listeners respond with apparent surprise to sequences of calls that appear to violate the existing dominance hierarchy. Whereas they show little response upon hearing the sequence “B2 threat-grunts and C3 screams,” they respond strongly—by looking toward the source of the call—when they hear “C3 threat-grunts and B2 screams.” Between-family rank reversals (C3 threat-grunts and B2 screams) elicit a stronger violation of expectation response than do within-family rank reversals (C3 threat-grunts and C1 screams) (Bergman et al., 2003).
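The logic of these playback contrasts can be made concrete with a small sketch. The Python code below is an illustration only, not a method used in the studies cited. It assumes that each individual is labeled by a single matriline letter followed by a within-family rank (as in B2 or C3) and that matrilines are ordered A > B > C > D; the names parse, outranks, and classify_sequence are hypothetical. It sorts a threat-grunt-scream sequence into the three categories contrasted in the experiment.

```python
from typing import NamedTuple


class Individual(NamedTuple):
    matriline: str  # family label, e.g. "B" (assumed single letter)
    position: int   # rank within the family, 1 = highest


def parse(label: str) -> Individual:
    """Turn a label such as 'B2' into (matriline 'B', position 2)."""
    return Individual(label[0], int(label[1:]))


def outranks(a: Individual, b: Individual, matriline_order: str = "ABCD") -> bool:
    """True if a outranks b in a hierarchy of matrilines (A1 > A2 > ... > B1 > ...)."""
    key_a = (matriline_order.index(a.matriline), a.position)
    key_b = (matriline_order.index(b.matriline), b.position)
    return key_a < key_b


def classify_sequence(threatener: str, screamer: str) -> str:
    """Classify a threat-grunt/scream sequence against the assumed hierarchy."""
    t, s = parse(threatener), parse(screamer)
    if outranks(t, s):
        # The higher-ranking animal threatens and the lower-ranking one screams.
        return "consistent"
    if t.matriline == s.matriline:
        return "within-family reversal"
    return "between-family reversal"
```

Under these assumptions, classify_sequence("B2", "C3") is a consistent sequence, classify_sequence("C3", "B2") is a between-family reversal, and classify_sequence("C3", "C1") is a within-family reversal. The point of the sketch is that the classification requires knowing caller identity, rank, and family membership at once.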
A baboon who ignores the sequence “B2 threat-grunts and C3 screams” but responds strongly to “C3 threat-grunts and B2 screams” reveals, by these responses, that the listener recognizes the identities of both participants, their relative ranks, and their family membership. Such a listener also acts as if assuming that the threat-grunt and scream have occurred together not by chance, but because one vocalization caused the other to occur. Without this assumption of causality there would be no violation of expectation when B2’s scream and C3’s threat-grunt occurred together.
Baboons’ ability to deduce a social narrative from a sequence of sounds reveals a rich cognitive system in which listeners extract a large number of complex, nuanced messages from a relatively small, finite number of signals. A baboon who understands that “B2 threat-grunts and C3 screams” is different from “C3 threat-grunts and B2 screams” can make the same judgment for all possible pairs of group members as well as any new individuals who may join (Cheney & Seyfarth, 2007, Chapters 10 and 11).
Underlying the baboons’ sophisticated social cognition is an almost open-ended ability to learn new sound-meaning pairs. This open-ended learning is found in many nonhuman primates, as well as other animals. Baboons and other primates learn to recognize the voices of new individuals as they are born or join the group from elsewhere, just as they learn to distinguish their own species’ alarm calls (Fischer et al., 2000; Seyfarth & Cheney, 1986; Zuberbühler, 2000) and the different alarm calls of sympatric birds and mammals (Hauser, 1988; Hauser & Wrangham, 1990; Seyfarth & Cheney, 1990; Zuberbühler, 2001). Primates in laboratories readily learn to recognize the voices of their different caretakers and to associate different sounds, like the rattling of keys or the beep of a card swipe, with impending events that may be good (feeding) or bad (the visit of a veterinarian). In cross-fostering experiments, infant rhesus (Macaca mulatta) and Japanese macaques (M. fuscata) raised among the members of another species learned to recognize their foster mothers’ calls—and the foster mothers learned to recognize theirs—even in contexts in which the two species used acoustically different vocalizations (Seyfarth & Cheney, 1997). Taken together, these results suggest that nonhuman primates are both innately predisposed to ascribe meaning to different sounds and always ready to learn new information from novel auditory stimuli.
These generalizations apply with equal force to other mammals. Consider Rico, for example, a border collie who learned the names of more than 200 different toys (Kaminski et al., 2004). Rico was able to learn and remember the names of new toys by process of exclusion, or “fast mapping,” and, like small children, used gaze and attention to guide word learning. But of course Rico never learned to say any of the words he learned. In this respect, his vocal perception and production were similar to those of language-trained apes (Savage-Rumbaugh, 1986; Terrace, 1979), sea lions (Schusterman et al., 2002), and dolphins (Herman et al., 1993).
Listeners Ascribe Intentions to Signalers
In addition to making judgments based on social causation, baboons appear to recognize other individuals’ intentions and motives. Baboon groups are noisy, tumultuous societies, and baboons would not be able to feed, rest, or engage in social interactions if they responded to every call as if it were directed at them. In fact, baboons appear to use a variety of behavioral cues, including gaze direction, learned contingencies, and the memory of recent interactions with specific individuals when making inferences about the target of a vocalization. For example, when a female hears a recent opponent’s threat-grunts soon after fighting with her, she responds as if she assumes that the threat-grunt is directed at her, and she avoids the signaler. However, when she hears the same female’s threat-grunts soon after grooming with her, she ignores the calls and acts as if the calls are directed at someone else (Engh et al., 2006). Conversely, when a female hears her opponent’s friendly infant grunt soon after a fight, she acts as if she assumes that the call is directed at her and is intended as a reconciliatory gesture. She approaches her recent opponent and tolerates her opponent’s approaches at a rate that is even higher than baseline rates (Cheney & Seyfarth, 1997). In contrast, hearing the grunt of an uninvolved dominant female unrelated to her opponent has no effect on the female’s behavior. In this latter case, she acts as if she is not the intended target of the call and treats the call as irrelevant.
In some cases, inferences about the intended target of a call seem to involve rather complex and indirect causal reasoning about, among other things, the kinship bonds that exist among others. Playback experiments, for example, have shown that baboons will accept the “reconciliatory” grunt by a close relative of a recent opponent as a proxy for direct reconciliation by the opponent herself (Wittig et al., 2007). To do so, the listeners must be able to recognize that a grunt from a particular female is causally related to a previous fight even though she has not interacted recently with the signaler, but with the signaler’s relative.
There are intriguing parallels between these results and recent neurophysiological research. In primates, faces and vocalizations are the primary means of transmitting social signals, and monkeys recognize the correspondence between facial and vocal expressions (Ghazanfar & Logothetis, 2003). When rhesus macaques hear another monkey’s calls, they exhibit neural activity not only in areas associated with auditory processing but also in higher-order visual areas (Gil da Costa et al., 2004). Ghazanfar and colleagues explored the neural basis of sensory integration using the coos and grunts of rhesus macaques and found that cells in the auditory cortex were more responsive to bimodal (visual and auditory) presentation of these calls than to unimodal presentation (Ghazanfar et al., 2005; see Romanski & Ghazanfar, this volume). Intriguingly, the effect of cross-modal presentation was greater with grunts than with coos. The authors speculate that this may have occurred because grunts are usually directed toward a specific individual, whereas coos are more often broadcast to the group at large. The greater cross-modal integration in the processing of grunts may arise because, in contrast to a coo, listeners who hear a grunt must immediately determine whether or not the call is directed at them.
Primate Vocal Production
Monkeys and apes have a relatively small repertoire of context-specific calls that show relatively little modification in their acoustic properties during development (Janik & Slater, 1997; McComb & Semple, 2005; Seyfarth & Cheney, 1997). Cross-fostering experiments with macaques suggest that the link between particular call types and the contexts in which they are given is difficult to break. For example, normally raised rhesus and Japanese macaques differ in their use of calls in several social contexts: Rhesus macaques use a mixture of coos and grunts, whereas Japanese macaques use coos almost exclusively. In a 2-year experiment involving four individuals who were raised in groups of the other species, infant rhesus and Japanese macaques adhered to their species-typical pattern of calling even though, in every other respect, they were fully integrated into their adopted social groups (Owren et al., 1993).
There is also little evidence that nonhuman primates adapt calls to different contexts or create new calls to deal with novel situations. And although they routinely hear different call combinations—combinations, like those described previously, whose meaning is more than the sum of their constituent elements—these combinations are created when two baboons are vocalizing to each other. With a few possible exceptions (see later), signalers never combine different vocalizations to create new messages. Thus, primate vocal repertoires are far from open ended. Production is very different from perception.
Primates Are Typical of Most Mammals
In their highly constrained vocal production combined with flexible perception and cognition, nonhuman primates are typical of most mammals. Indeed, the ability to modify the acoustic features of calls depending on experience seems comparatively rare in the animal kingdom. As of 1997, when Janik and Slater published their review of the topic, vocal learning had been documented in only three orders of birds, cetaceans, harbor seals, and humans. True, we know much more about vocal communication in monkeys than in nonprimate mammals or even the great apes, and we may yet be surprised by novel evidence of vocal imitation (e.g., Poole et al., 2005) or creative call combinations (Arnold & Zuberbühler, 2006; Crockford & Boesch, 2003; Zuberbühler, 2002). For the moment, however, there is no reason to believe that mammals in general—including apes—differ from the baboons and other primates described previously. The question for primatologists then becomes: What selective forces gave rise to learned, flexible vocal production in our hominid ancestors? Below we offer a speculative answer to this question, but first we consider more closely the “fixed” nature of primate vocal production and explore the theoretical implications of communication between relatively constrained vocal producers and flexible, open-ended receivers.
Vocal Production, Though Constrained, Is Not Fixed and Involuntary
Compared to the large, learned repertoires of many songbirds and the imitative skills of cetaceans and pinnipeds, the vocal repertoires of nonhuman primates are small (McComb & Semple, 2005). Nonhuman primates also use their vocalizations in highly predictable social circumstances. These two observations, together with early neurophysiological studies showing that seemingly normal calls could be elicited by electrical stimulation of subcortical areas in the brain (e.g., Jurgens & Ploog, 1970; Ploog, 1981), have led many anthropologists (Washburn, 1982; Gardenfors, 2003), ethologists (Goodall, 1986), linguists (Bickerton, 1990), psychologists (Terrace, 1983), and neuroscientists (Arbib, 2005) to conclude that primate vocalizations are reflexive, involuntary signals or, in Bickerton’s words, “quite automatic and impossible to suppress” (1990:142). This characterization is misleading.
In both the field and the laboratory, nonhuman primates appear to be able to control whether they produce a vocalization or remain silent. Baboons, as already noted, may follow an aggressive interaction with a reconciliatory grunt or they may not, and like other primates they vocalize more to some individuals than to others (e.g., Smith et al., 1982). Even in highly emotional circumstances like encounters with predators, some individuals call at high rates, others call less often, and still others remain silent (Cheney & Seyfarth, 1990).
The “decision” to call or remain silent can have significant behavioral consequences. In experiments conducted on wild capuchin monkeys (Cebus capucinus) in Costa Rica, Gros-Louis (2004) found that individuals who discovered food were more likely to give “food” calls when other group members were nearby than when they were alone. Furthermore, they were more likely to call if a higher-ranking, as opposed to a lower-ranking, bystander was nearby. Individuals who called when approached by a high-ranking animal were less likely to receive aggression than those who remained silent. Gros-Louis (2004) concluded that capuchin food calls function to announce both ownership and the signaler’s willingness to defend his or her possession. As a result, unless they were strongly motivated to take the food, listeners refrained from harassing the signaler.
In more controlled laboratory settings, the timing, duration, and rate of calling by monkeys can be brought under operant control (Pierce, 1985). In a recent series of experiments, Egnor et al. (2007) exposed cotton-top tamarins (Saguinus oedipus) to intermittent bursts of white noise and found that subjects quickly learned to restrict their calling to the silent intervals. Clearly, then, primates can control whether they vocalize or not depending on variations in the ecological, social, and acoustic environments.
Within a given context, nonhuman primates can also make subtle modifications in the acoustic structure of their calls (reviewed by Hammerschmidt & Fischer, 2008). Wild chimpanzees (Pan troglodytes) in Uganda, for example, give long, elaborate pant-hoots either alone or in “choruses” with others. When two individuals have called together several times, the acoustic features of their pant-hoots begin to converge (Mitani & Brandt, 1994; Mitani & Gros-Louis, 1998). Apparently, they modify the acoustic structure of their calls depending on auditory experience. Crockford et al. (2004) tested this hypothesis on four communities of chimpanzees in the Tai Forest, Ivory Coast. They found that males in three contiguous communities had developed distinctive, community-specific pant-hoots, whereas males in a fourth community 70 km away showed only minor acoustic differences from males in the other three communities. By comparing the genetic relatedness between pairs of males with the acoustic similarity of their calls, Crockford and colleagues could rule out an explanation of call convergence based on shared genes. Instead, they propose that “chimpanzees may actively modify pant hoots to be different from their neighbors” (2004, p. 221). Such differences have functional consequences: Playback experiments conducted by Herbinger (2003) on individuals in the same West African community found that chimpanzees recognize the pant-hoots of other individuals, associate them with particular areas, and distinguish the calls of neighbors from strangers.
Like rhesus macaques (Gouzoules et al., 1984), wild chimpanzees who are receiving aggression produce acoustically different screams depending on the severity of the attack (Slocombe & Zuberbühler, 2005). Intriguingly, Slocombe and Zuberbühler (2007) also found that chimpanzee victims produced screams that appeared to exaggerate the severity of the attack, but they did so only when there was at least one individual nearby whose dominance rank was equal to or higher than that of the aggressor. These results suggest that chimpanzees both have a limited ability to modify the acoustic structure of their vocalizations and can recognize the dominance relations that exist among others.
Laboratory experiments confirm that primates can make subtle modifications to the acoustic features of their calls depending on experience. Elowson and Snowdon (1994) documented acoustic differences between the calls of pygmy marmosets (Cebuella pygmaea) housed in Washington, DC, and Madison, Wisconsin. When a group of marmosets was moved from Washington to Madison, the calls of the transplanted individuals changed to become more like their hosts’. In another experiment, Egnor et al. (2007) exposed cotton-top tamarins to bursts of white noise just as they produced their “contact loud call.” The tamarins responded by producing calls that were shorter, with fewer pulses. Calls given immediately before or after white noise were also louder and had longer interpulse intervals. Egnor and Hauser (2004) review several other cases in which nonhuman primates make subtle modifications in the acoustic structure of their calls.
In sum, it is misleading and overly simplistic to describe primate vocal production as “fixed” and “involuntary.” A more accurate conclusion is that the basic structure of nonhuman primate vocal signals appears to be innately determined, whereas the fine spectrotemporal features can be modified based on auditory experience and social context (Egnor & Hauser, 2004; Hammerschmidt & Fischer, 2008). The distinction between relatively innate and more modifiable components of phonation is important, because it has significant implications for future research on the neurobiology of primate communication (see Egnor & Hauser, 2004; Hammerschmidt & Fischer, 2008). First, what brain areas are responsible for the innate and the modifiable components of primate call production, and how are these two aspects integrated at the neural level? Second, what contextual factors are most important in modifying primate vocal production: Age? Caller identity? The history of interaction between participants? And what neural pathways are responsible for this modulation? One recent study found differences between the neural mechanisms involved in spontaneous vocalizations and those involved in the production of calls that were elicited by calls from another individual (Gemba et al., 1999). Given the flexibility of human phonation, those interested in the evolution of language will be curious to know which social situations and areas of the brain are responsible for the limited flexibility that occurs in the phonation of monkeys’ and apes’ calls.
“Affective” and “Symbolic” Signals: A False Dichotomy
Nonhuman primates present us with a communicative system in which a small repertoire of relatively fixed and inflexible calls, each linked to a particular social context, nonetheless gives rise to an open-ended, highly modifiable, and cognitively rich set of meanings. Listeners extract rich, semantic, and even propositional information from signalers who did not, in the human sense, intend to provide it (Cheney & Seyfarth, 1998).
The sharp distinction between signaler and recipient helps to clarify a theoretical issue that has bedeviled studies of the vocalizations of primates and other animals since Darwin first discussed them in The Expression of the Emotions in Man and Animals. Following Darwin, modern ethologists have typically assumed that vocal communication in animals differs from human language largely because the former is an “affective” system based on emotion, whereas the latter is a “referential” system based on the relation between words and the objects or events they represent (see, for example, Hauser, 1996; Marler et al., 1992; Owings & Morton, 1998; Owren & Rendall, 1997; Seyfarth et al., 1980). But this dichotomy is logically false.
A call’s potential to serve as a referential signal depends on how tightly linked the call is to a particular social or ecological context. The mechanisms that underlie this specificity are irrelevant. A tone that informs a rat about the imminence of a shock, an alarm call that informs a vervet monkey about the presence of a leopard, or a sequence of threat-grunts and screams that informs a baboon that B3 and D2 are involved in a fight all have the potential to provide a listener with precise information because of their predictable association with a narrow range of events. The widely different mechanisms that lead to this association have no effect on the signal’s potential to inform (Seyfarth & Cheney, 2003). Put slightly differently, there is no obligatory relation between “referential” and “affective” signaling. Knowing that a call is referential (i.e., has the potential to convey highly specific information) tells us nothing about whether its underlying cause is affective or not. Conversely, knowing that a call’s production is due entirely to the caller’s affect tells us nothing about the call’s potential to serve as a referential signal.
It is therefore wrong, on theoretical grounds, to treat animal signals as either referential or affective, because the two properties of a communicative event are distinct and independent dimensions. Highly referential signals could, in principle, be caused entirely by a signaler’s emotions, or their production could be relatively independent of measures of arousal. Highly affective signals could be elicited by very specific stimuli and thus function as referential calls, or they could be elicited by so many different stimuli that they provide listeners with only general information. In principle, any combination of results is possible. The affective and referential properties of signals are also logically distinct, at least in animal communication, because the former depends on mechanisms of call production in the signaler, whereas the latter depends on the listener’s ability to extract information from events in its environment. Signalers and recipients, though linked in a communicative event, are nonetheless separate and distinct, because the mechanisms that cause a signaler to vocalize do not in any way constrain a listener’s ability to extract information from the call.
Baboon grunts offer a good example. Rendall (2003) used behavioral data to code a social interaction involving move or infant grunts as having high or low arousal. He then examined calls given in these two circumstances and found that in each context certain acoustic features or modes of delivery were correlated with apparent arousal. Bouts of grunting given when arousal was apparently high had more calls, a higher rate of calling, and calls with a higher fundamental frequency than bouts given when arousal was apparently low. Further analysis revealed significant variation between contexts in the same three acoustic features that varied within context. By all three measures (call number, call rate, and fundamental frequency), infant grunts were correlated with higher arousal than were move grunts. Infant grunts also exhibited greater pitch modulation and more vocal “jitter,” a measure of vocal instability (Rendall, 2003). In human speech, variations in pitch, tempo, vocal modulation, and jitter are known to provide listeners with cues about the speaker’s affect or arousal (e.g., Bachorowski & Owren, 1995; Scherer, 1989).
It is, of course, difficult to obtain independent measures of a caller’s arousal in the field. However, similarities between human and nonhuman primates in the mechanisms of phonation (Fitch & Hauser, 1995; Fitch et al., 2002; Schön Ybarra, 1995) support Rendall’s (2003) conclusion that different levels of arousal play an important role in causing baboons to give acoustically different grunts in the infant and move contexts. This conclusion, however, tells us nothing about the grunts’ potential to act as referential signals that inform nearby listeners about social or ecological events taking place at the time. As already noted, move grunts are given in a restricted set of circumstances, when the group is about to initiate, or has already initiated, a move. As a result, they have the potential to convey quite specific information to listeners. When one baboon hears another give a move grunt, he or she learns with some accuracy what is happening at that moment. By comparison, infant grunts are not as tightly linked to a particular type of social interaction. They may be given as the caller approaches a mother with infant, in answer to another animal’s grunt, or as a reconciliatory signal following aggression. As a result, their meaning is less precise. When one baboon hears another’s infant grunt, he or she learns only that the caller is involved in some sort of friendly interaction, but the precise nature of the interaction is unknown.
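One hedged way to make the contrast between move grunts and infant grunts concrete is to ask how evenly each call type is spread across contexts. The Python sketch below uses Shannon entropy for this purpose. This is an illustrative measure introduced for the sketch, not one used by Rendall (2003) or the other studies cited, and the counts are hypothetical numbers chosen only to mirror the qualitative description above; the function name context_entropy is likewise an assumption. Lower entropy corresponds to a call that is more tightly linked to one context and therefore potentially more informative to a listener.

```python
import math
from typing import Mapping


def context_entropy(counts: Mapping[str, int]) -> float:
    """Shannon entropy (in bits) of the contexts in which a call type is given.

    0 bits means the call occurs in a single context; higher values mean
    the call is spread more evenly over several contexts.
    """
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log2(n / total)
        for n in counts.values()
        if n > 0  # skip empty contexts; they contribute nothing
    )


# Hypothetical counts, for illustration only; not data from any study.
move_grunt = {"group move": 95, "other": 5}
infant_grunt = {
    "approach mother with infant": 40,
    "answer to another grunt": 30,
    "reconciliation after aggression": 30,
}

# With these made-up counts, the move grunt has the lower entropy,
# i.e. it narrows down what is happening more than the infant grunt does.
move_is_more_specific = context_entropy(move_grunt) < context_entropy(infant_grunt)
```

Note that the calculation says nothing about the caller's arousal. It depends only on the association between call and context, which is the sense in which referential specificity and affect are separate dimensions.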
In sum, far from being a communicative system that is either affective or symbolic, vocal communication in nonhuman primates (and many other animals) contains elements of both. In their production, monkeys and apes use a small repertoire of relatively stereotyped calls, each closely linked to a particular context. This predictable association between call and context creates, for listeners, a world in which there are statistical regularities—regularities that allow them to ascribe meaning to vocalizations and to organize their knowledge into a rich conceptual structure (Cheney & Seyfarth, 2007, Chapters 10 and 11).
The Evolution of Flexible Vocal Production
At some point in our evolutionary history, probably after the divergence of the evolutionary lines leading to chimpanzees and bonobos on the one hand and humans on the other (Enard et al., 2002), our ancestors developed much greater control over the physiology of vocal production. As a result, vocal output became both more flexible and considerably more dependent on auditory experience and imitation (Fitch, 2007; Lieberman, 1991). What selective pressures might have given rise to these physiological changes?
Vocal communication in nonhuman primates lacks three features that are abundantly present in human language: the ability to generate new words, lexical syntax, and a theory of mind. By the last of these we mean the ability of both speakers and listeners to make attributions about each other’s beliefs, knowledge, and other mental states (Grice, 1957). These are the simplest, most basic features that distinguish human and nonhuman primate vocal production, and it is with these traits that speculations about the evolution of language must start. At the earliest stages of language evolution we need not worry about the more complex properties of language that probably came later, such as case, tense, subject-verb agreement, open- and closed-class items, recursion, long-distance dependency, and subordinate clauses.
How might the ability to generate new words, lexical syntax, and a theory of mind have evolved: simultaneously, in response to the same selective pressures, or more serially, in some particular order? We propose that the evolution of a theory of mind preceded language, creating the selective pressures that gave rise to the ability to generate new words and lexical syntax, and to the flexibility in vocal production that these two traits would have required (Cheney & Seyfarth, 2005, 2007). We make this argument on both empirical and theoretical grounds.
Empirically, there is no evidence in nonhuman primates for anything close to the large vocal repertoire we find even in very young children. Similarly, nonhuman primates provide few examples of lexical syntax. Recent work by Zuberbühler and colleagues on the alarm calls of forest monkeys provides intriguing evidence that the presence of one call type can “modify” the meaning of another (Arnold & Zuberbühler, 2006; Zuberbühler, 2002), and a study by Crockford and Boesch (2003) suggests that a call combination in chimpanzees may carry new meaning that goes beyond the meaning of the individual calls themselves, but these rare exceptions meet few of the definitions of human syntax. By contrast, there is growing evidence that both Old World monkeys (Cheney & Seyfarth, 2007; Engh et al., 2006; Flombaum & Santos, 2005) and apes (Buttelmann et al., 2007; Hare et al., 2001; Tomasello et al., 2005) may possess rudimentary abilities to attribute motives or knowledge to others, and engage in simple forms of shared attention and social referencing.
More theoretically, we suggest that the evolution of a theory of mind acted as a prime mover in the evolution of language because, while it is easy to imagine a scenario in which a rudimentary theory of mind came first and provided the impetus for the evolution of large vocabularies and syntax, any alternative sequence of events seems less likely.
Consider, for example, the course of word learning in children. Beginning as early as 9 to 12 months, children exhibit a nascent understanding of other individuals’ motives, beliefs, and desires, and this skill forms the basis of a shared attention system that is integral to early word learning (Bloom & Markson, 1998; Tomasello, 2003). One-year-old children seem to understand that words can be mapped onto objects and actions. Crucially, this understanding is accompanied by a kind of “social referencing” in which the child uses other people’s direction of gaze, gestures, and emotions to assign labels to objects (Baldwin, 1991; reviewed in Fisher & Gleitman, 2002; Pinker, 1994). Gaze and attention also facilitate word learning in dogs and other animals. Children, however, rapidly surpass the simpler forms of shared attention and word learning demonstrated by animals. Long before they begin to speak in sentences, young children develop implicit notions of objects and events, actors, actions, and those that are acted upon. As Fisher and Gleitman (2002, p. 462) argue, these “conceptual primitives” provide children with a kind of “conceptual vocabulary onto which the basic linguistic elements (words and structures) are mapped.” Moreover, in contrast to monkeys, apes, and other animals, 1-year-old children attempt to share what they know with others (Tomasello & Carpenter, 2007). While animals are concerned with their own goals and knowledge, young children are motivated to make their thoughts and knowledge publicly available.
The acquisition of a theory of mind thus creates a cognitive environment that drives the acquisition of new words and new grammatical skills. Indeed, the data on children’s acquisition of language suggest that they could not increase their vocabularies or learn grammar as rapidly as they do if they did not have some prior notion of other individuals’ mental states (Fisher & Gleitman, 2002; Pinker, 1994; Tomasello, 2003).
By contrast, it is much more difficult to imagine how our ancestors could have learned new words or grammatical rules if they were unable to attribute mental states to others. The lack of syntax in nonhuman primate vocalizations cannot be traced to an inability to recognize argument structure—to understand that an event can be described as a sequence in which an agent performs some action on an object. Baboons, for example, clearly distinguish between a sequence of calls indicating that Sylvia is threatening Hannah, as opposed to Hannah is threatening Sylvia. Nor does the lack of syntax arise because of an inability to mentally represent descriptive verbs, modifiers, or prepositions. In captivity, a variety of animals, including dolphins (Herman et al., 1993), sea lions (Schusterman & Krieger, 1986), and African gray parrots (Pepperberg, 1992), can be taught to understand and in some cases even to produce verbs, modifiers, and prepositions. Even in their natural behavior, nonhuman primates and other animals certainly seem capable of thinking in simple sentences, but the ability to think in sentences does not motivate them to speak in sentences. Their knowledge remains largely private.
This may occur in large part because primates and other animals cannot distinguish between what they know and what others know and cannot recognize, for example, that an ignorant individual might need to have an event explained to them. As a result, although they may mentally tag events as argument structures, they fail to map these tags onto a communicative system in any stable or predictable way. Because they cannot attribute mental states like ignorance to others, and are unaware of the causal relation between behavior and beliefs, monkeys and apes do not actively seek to explain or elaborate upon their thoughts. As a result, they are largely incapable of inventing new words or of recognizing when thoughts should be made explicit.
We suggest, then, that long before our ancestors spoke in sentences, they had a language of thought in which they represented the world, and the meaning of call sequences, in terms of actors, actions, and those who are acted upon. The linguistic revolution occurred when our ancestors began to express this tacit knowledge and to use their cognitive skills in speaking as well as listening. The prime mover behind this revolution was a theory of mind that had evolved to the point where its possessors did not just recognize other individuals’ goals, intentions, and even knowledge, as monkeys and apes already do, but were also motivated to share their own goals, intentions, and knowledge with others. Whatever the selective pressures that prompted this change, it led to a mind that was motivated to make public thoughts and knowledge that had previously remained private. The evolution of a theory of mind spurred the evolution of words and grammar. It also provided the selective pressure for the evolution of the physiological adaptations that enabled vocal modifiability.
Arbib, M. (2005). From monkey-like action-recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28, 105–167.
Arnold, K., & Zuberbühler, K. (2006). Language evolution: Compositional semantics in primate calls. Nature, 441, 303.
Bachorowski, J. A., & Owren, M. J. (1995). Vocal expression of emotion: Acoustic properties of speech are associated with emotional intensity and context. Psychological Science, 6, 219–224.
Baldwin, D. (1991). Infants’ contribution to the achievement of joint reference. Child Development, 62, 875–890.
Bickerton, D. (1990). Language and species. Chicago: University of Chicago Press.
Bloom, P., & Markson, L. (1998). Capacities underlying word learning. Trends in Cognitive Sciences, 2, 67–73.
Buttelmann, D., Carpenter, M., Call, J., & Tomasello, M. (2007). Enculturated apes imitate rationally. Developmental Science, 10, 31–38.
Cheney, D. L., & Seyfarth, R. M. (1990). How monkeys see the world. Chicago: University of Chicago Press.
Cheney, D. L., & Seyfarth, R. M. (1997). Reconciliatory grunts by dominant female baboons influence victims’ behaviour. Animal Behaviour, 54, 409–418.
Cheney, D. L., & Seyfarth, R. M. (1998). Why monkeys don’t have language. In: G. Petersen (Ed.), The Tanner lectures on human values (vol 19, pp. 175–219). Salt Lake City: University of Utah Press.
Cheney, D. L., & Seyfarth, R. M. (2005). Constraints and preadaptations in the earliest stages of language evolution. Linguistic Review, 22, 135–159.
Cheney, D. L., & Seyfarth, R. M. (2007). Baboon metaphysics: The evolution of a social mind. Chicago: University of Chicago Press.
Cheney, D. L., Seyfarth, R. M., & Silk, J. B. (1995). The responses of female baboons to anomalous social interactions: Evidence for causal reasoning? Journal of Comparative Psychology, 109, 134–141.
Crockford, C., & Boesch, C. (2003). Context-specific calls in wild chimpanzees (Pan troglodytes verus): Analysis of barks. Animal Behaviour, 66, 115–125.
Crockford, C., Herbinger, I., Vigilant, L., & Boesch, C. (2004). Wild chimpanzees have group-specific calls: A case for vocal learning? Ethology, 110, 221–243.
Darwin, C. (1871/1981). The descent of man, and selection in relation to sex. Princeton: Princeton University Press.
Egnor, S. E. R., & Hauser, M. D. (2004). A paradox in the evolution of primate vocal learning. Trends in Neurosciences, 27, 649–654.
Egnor, S. E. R., Wickelgren, J., & Hauser, M. D. (2007). Tracking silence: Adjusting vocal production to avoid acoustic interference. Journal of Comparative Physiology Series A, 193, 477–483.
Elowson, M., & Snowdon, C. T. (1994). Pygmy marmosets, Cebuella pygmaea, modify vocal structure in response to changed social environment. Animal Behaviour, 47, 1267–1277.
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S. L., Wiebe, V., Kitano, T., et al. (2002). Molecular evolution of FOXP2: A gene involved in speech and language. Nature, 418, 869–872.
Engh, A. E., Hoffmeier, R. R., Cheney, D. L., & Seyfarth, R. M. (2006). Who, me? Can baboons infer the target of vocalisations? Animal Behaviour, 71, 381–387.
Fischer, J., Cheney, D. L., & Seyfarth, R. M. (2000). Development of infant baboon responses to female graded variants of barks. Proceedings of the Royal Society of London Series B, 267, 2317–2321.
Fisher, C., & Gleitman, L. R. (2002). Language acquisition. In: H. F. Pashler & C. R. Gallistel (Eds.), Handbook of experimental psychology, vol 3: Learning and motivation (pp. 445–496). New York: Stevens Wiley.
Fischer, J., Hammerschmidt, K., Cheney, D. L., & Seyfarth, R. M. (2002). Acoustic features of male baboon loud calls: Influences of context, age, and individuality. Journal of the Acoustical Society of America, 111, 1465–1474.
Fischer, J., Hammerschmidt, K., Seyfarth, R. M., & Cheney, D. L. (2001a). Acoustic features of female chacma baboon barks. Ethology, 107, 33–54.
Fischer, J., Metz, M., Cheney, D. L., & Seyfarth, R. M. (2001b). Baboon responses to graded bark variants. Animal Behaviour, 61, 925–931.
Fitch, W. T. (2007). The evolution of language: A comparative perspective. In: G. Gaskell (Ed.), Oxford handbook of psycholinguistics. Oxford: Oxford University Press.
Fitch, W. T., & Hauser, M. D. (1995). Vocal production in nonhuman primates: Acoustics, physiology, and functional constraints on “honest” advertisement. American Journal of Primatology, 37, 191–220.
Fitch, W. T., Neubauer, J., & Herzel, H. (2002). Calls out of chaos: The adaptive significance of nonlinear phenomena in mammalian vocal production. Animal Behaviour, 63, 407–418.
Flombaum, J. L., & Santos, L. R. (2005). Rhesus monkeys attribute perceptions to others. Current Biology, 15, 447–452.
Gardenfors, P. (2003). How homo became sapiens: On the evolution of thinking. Oxford: Oxford University Press.
Gemba, H., Kyuhou, S., Matsuzaki, R., & Amino, Y. (1999). Cortical field potentials associated with audio-initiated vocalization in monkeys. Neuroscience Letters, 272, 49–52.
Ghazanfar, A. A., & Logothetis, N. K. (2003). Facial expressions linked to monkey calls. Nature, 423, 937–938.
Ghazanfar, A. A., Maier, J. X., Hoffman, K. L., & Logothetis, N. K. (2005). Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience, 25, 5004–5012.
Gil da Costa, R., Braun, A., Lopes, M., Hauser, M. D., Carson, R. E., Herscovitch, P., et al. (2004). Toward an evolutionary perspective on conceptual representation: Species-specific calls activate visual and affective processing systems in the macaque. Proceedings of the National Academy of Sciences, 101, 17516–17521.
Goodall, J. (1986). The chimpanzees of Gombe: Patterns of behavior. Cambridge, MA: Harvard University Press.
Gouzoules, S., Gouzoules, H., & Marler, P. (1984). Rhesus monkey (Macaca mulatta) screams: Representational signaling in the recruitment of agonistic aid. Animal Behaviour, 32, 182–193.
Grice, H. P. (1957). Meaning. Philosophical Review, 66, 377–388.
Gros-Louis, J. (2004). The function of food-associated calls in white-faced capuchin monkeys, Cebus capucinus, from the perspective of the signaler. Animal Behaviour, 67, 431–440.
Hammerschmidt, K., & Fischer, J. (2008). Constraints in primate vocal production. In: U. Griebel & K. Oller (Eds.), The evolution of communicative creativity: From fixed signals to contextual flexibility (pp. 93–119). Cambridge, MA: MIT Press.
Hare, B., Call, J., & Tomasello, M. (2001). Do chimpanzees know what conspecifics know? Animal Behaviour, 61, 139–151.
Hauser, M. D. (1988). How infant vervet monkeys learn to recognize starling alarm calls: The role of experience. Behaviour, 105, 187–201.
Hauser, M. D. (1996). The evolution of communication. Cambridge, MA: MIT Press.
Hauser, M. D., & Wrangham, R. W. (1990). Recognition of predator and competitor calls in nonhuman primates and birds: A preliminary report. Ethology, 86, 116–130.
Herbinger, I. (2003). Inter-group aggression in wild West African chimpanzees (Pan troglodytes verus): Mechanisms and function. Ph.D. dissertation, University of Leipzig.
Herman, L. M., Pack, A. A., & Morrel-Samuels, P. (1993). Representational and conceptual skills of dolphins. In: H. L. Roitblat, L. M. Herman, & P. E. Nachtigall (Eds.), Comparative cognition and neuroscience (pp. 403–442). Hillsdale, NJ: Lawrence Erlbaum Associates.
Janik, V. W., & Slater, P. J. B. (1997). Vocal learning in mammals. Advances in the Study of Behavior, 26, 59–99.
Jurgens, U., & Ploog, D. (1970). Cerebral representation of vocalization in the squirrel monkey. Experimental Brain Research, 10, 532–554.
Kaminski, J., Call, J., & Fischer, J. (2004). Word learning in a domestic dog: Evidence for “fast mapping.” Science, 304, 1682–1683.
Kitchen, D. M., Cheney, D. L., & Seyfarth, R. M. (2003). Female baboons’ responses to male loud calls. Ethology, 109, 401–412.
Lieberman, P. (1991). Uniquely human. Cambridge, MA: Harvard University Press.
Marler, P., Evans, C. S., & Hauser, M. D. (1992). Animal signals: Motivational, referential, or both? In: H. Papousek, U. Jurgens, & M. Papousek (Eds.), Nonverbal vocal communication: Comparative and developmental approaches (pp. 66–86). Cambridge: Cambridge University Press.
McComb, K., & Semple, S. (2005). Coevolution of vocal communication and sociality in primates. Biology Letters, 1, 381–385.
Mitani, J. C., & Brandt, K. L. (1994). Social factors influence the acoustic variability in the long-distance calls of male chimpanzees. Ethology, 96, 233–252.
Mitani, J., & Gros-Louis, J. (1998). Chorusing and call convergence in chimpanzees: Tests of three hypotheses. Behaviour, 135, 1041–1064.
Owings, D. H., & Morton, E. S. (1998). Animal vocal communication: A new approach. Cambridge: Cambridge University Press.
Owren, M. J., Dieter, J. A., Seyfarth, R. M., & Cheney, D. L. (1993). Vocalizations of rhesus and Japanese macaques cross-fostered between species show evidence of only limited modification. Developmental Psychobiology, 26, 389–406.
Owren, M. J., & Rendall, D. (1997). An affect-conditioning model of nonhuman primate vocal signaling. In: M. D. Beecher, D. H. Owings, & N. S. Thompson (Eds.), Perspectives in ethology (vol 12, pp. 299–346). New York: Plenum Press.
Owren, M. J., Seyfarth, R. M., & Cheney, D. L. (1997). The acoustic features of vowel-like grunt calls in chacma baboons (Papio cynocephalus ursinus): Implications for production processes and functions. Journal of the Acoustical Society of America, 101, 2951–2963.
Pepperberg, I. M. (1992). Proficient performance of a conjunctive, recursive task by an African gray parrot (Psittacus erithacus). Journal of Comparative Psychology, 106, 295–305.
Pierce, J. (1985). A review of attempts to condition operantly alloprimate vocalizations. Primates, 26, 202–213.
Pinker, S. (1994). The language instinct. New York: William Morrow and Sons.
Ploog, D. (1981). Neurobiology of primate audio-visual behavior. Brain Research Reviews, 3, 35–61.
Poole, J., Tyack, P., Stoeger-Horwath, A., & Watwood, S. (2005). Elephants are capable of vocal learning. Nature, 434, 455–456.
Rendall, D. (2003). Acoustic correlates of caller identity and affect intensity in the vowel-like grunt vocalizations of baboons. Journal of the Acoustical Society of America, 113, 3390–3402.
Rendall, D., Seyfarth, R. M., Cheney, D. L., & Owren, M. J. (1999). The meaning and function of grunt variants in baboons. Animal Behaviour, 57, 583–592.
Savage-Rumbaugh, E. S. (1986). Ape language: From conditioned response to symbol. New York: Columbia University Press.
Scherer, K. R. (1989). Vocal correlates of emotion. In: H. Wagner & A. Manstead (Eds.), Handbook of psychophysiology: Emotion and social behavior (pp. 167–195). New York: John Wiley and Sons.
Schön Ybarra, M. (1995). A comparative approach to the nonhuman primate vocal tract: Implications for sound production. In: E. Zimmermann, J. D. Newman, & U. Jurgens (Eds.), Current topics in primate vocal communication (pp. 185–198). New York: Plenum Press.
Schusterman, R. J., & Krieger, K. (1986). Artificial language comprehension and size transposition by a California sea lion (Zalophus californianus). Journal of Comparative Psychology, 100, 348–355.
Schusterman, R. J., Reichmuth Kastak, C., & Kastak, D. (2002). The cognitive sea lions: Meaning and memory in the lab and in nature. In: M. Bekoff, C. Allen, & G. Burghardt (Eds.), The cognitive animal: Empirical and theoretical perspectives on animal cognition (pp. 217–228). Cambridge, MA: MIT Press.
Seyfarth, R. M., & Cheney, D. L. (1986). Vocal development in vervet monkeys. Animal Behaviour, 34, 1640–1658.
Seyfarth, R. M., & Cheney, D. L. (1990). The assessment by vervet monkeys of their own and another species’ alarm calls. Animal Behaviour, 40, 754–764.
Seyfarth, R. M., & Cheney, D. L. (1997). Some general features of vocal development in non-human primates. In: M. Hausberger & C. T. Snowdon (Eds.), Social influences on vocal development (pp. 249–273). Cambridge: Cambridge University Press.
Seyfarth, R. M., & Cheney, D. L. (2003). Signalers and receivers in animal communication. Annual Review of Psychology, 54, 145–173.
Seyfarth, R. M., Cheney, D. L., & Marler, P. (1980). Vervet monkey alarm calls: Semantic communication in a free-ranging primate. Animal Behaviour, 28, 1070–1094.
Silk, J. B., Altmann, J., & Alberts, S. C. (2006a). Social relationships among adult female baboons (Papio cynocephalus). I. Variation in the strength of social bonds. Behavioral Ecology and Sociobiology, 61, 183–195.
Silk, J. B., Altmann, J., & Alberts, S. C. (2006b). Social relationships among adult female baboons (Papio cynocephalus). II: Variation in the quality and stability of social bonds. Behavioral Ecology and Sociobiology, 61, 197–204.
Silk, J. B., Seyfarth, R. M., & Cheney, D. L. (1999). The structure of social relationships among female baboons in the Moremi Reserve, Botswana. Behaviour, 136, 679–703.
Slocombe, K., & Zuberbühler, K. (2005). Agonistic screams in wild chimpanzees (Pan troglodytes schweinfurthii) vary as a function of social role. Journal of Comparative Psychology, 119, 67–77.
Slocombe, K., & Zuberbühler, K. (2007). Chimpanzees modify recruitment screams as a function of audience composition. Proceedings of the National Academy of Sciences, 104, 228–233.
Smith, H. J., Newman, J. D., & Symmes, D. (1982). Vocal concomitants of affiliative behavior in squirrel monkeys. In: C. T. Snowdon, C. H. Brown, & M. Petersen (Eds.), Primate communication (pp. 30–49). Cambridge: Cambridge University Press.
Steiper, M. E., Young, N. M., & Sukarna, T. Y. (2004). Genomic data support the hominoid slowdown and an Early Oligocene estimate for the hominoid–cercopithecoid divergence. Proceedings of the National Academy of Sciences, 101, 17021–17026.
Terrace, H. S. (1979). Nim. New York: Knopf.
Terrace, H. S. (1983). Nonhuman intentional systems. Behavioral and Brain Sciences, 6, 378–379.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Tomasello, M., & Carpenter, M. (2007). Shared intentionality. Developmental Science, 10, 121–125.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28, 675–691.
Washburn, S. L. (1982). Language and the fossil record. Anthropology UCLA, 7, 231–238.
Wittig, R. M., Crockford, C., Seyfarth, R. M., & Cheney, D. L. (2007). Vocal alliances in chacma baboons, Papio hamadryas ursinus. Behavioral Ecology and Sociobiology, 61, 899–909.
Zuberbühler, K. (2000). Referential labeling in Diana monkeys. Animal Behaviour, 59, 917–927.
Zuberbühler, K. (2001). Predator-specific alarm calls in Campbell’s guenons. Behavioral Ecology and Sociobiology, 50, 414–422.
Zuberbühler, K. (2002). A syntactic rule in forest monkey communication. Animal Behaviour, 63, 293–299.
(1) One should, however, treat estimates of the size of a species’ vocal repertoire with caution. Often the best predictors of repertoire size are the length, creativity, and ingenuity with which a species has been studied.