Sociobiology of Communication

Patrizia d'Ettorre and David P. Hughes

Print publication date: 2008

Print ISBN-13: 9780199216840

Published to Oxford Scholarship Online: September 2008

DOI: 10.1093/acprof:oso/9780199216840.001.0001

The evolution of human communication and language

CHAPTER 14

James R. Hurford

Oxford University Press

Abstract and Keywords

Human languages are far more complex than any animal communication system. Furthermore, they are learned, rather than innate, a fact which partially accounts for their great diversity. Human languages are semantically compositional, generating new meaningful combinations as functions of the meanings of their elementary parts (words). This is unlike any known animal communication system (except the limited waggle dance of honeybees). Humans can use language to describe and refer to objects and events in the far distant past and the far distant future, another feature which distinguishes language from animal communication systems. The complexity of languages arises partly from self-organization through cultural transmission over many generations of users. The human willingness altruistically to impart information is also unique.

Keywords: language complexity, language diversity, compositionality of meaning, double articulation, self-organization, stimulus-freedom, cultural transmission

14.1 Introduction

Human language stands out in a number of ways from the topics of almost all of the other chapters in this book. Although every communication system can claim in some way to be unique, human language is spectacularly unique in its complexity and expressive power.

Complexity is hard to measure, but a clue is given by the fact that The Cambridge Grammar of the English Language (Huddleston & Pullum, 2002), which is just a description of Modern Standard English, weighs in at over 1700 pages. The headings of the first half-dozen descriptive chapters, out of eighteen, are: The verb, The clause: complements, Nouns and noun phrases, Adjectives and adverbs, Prepositions and preposition phrases, and The clause: adjuncts. No non-human communication system demands anything like this degree of detail to describe it. And English is just one of over 6000 human languages, all of comparable complexity.

As for expressive power, this is also hard to measure. We can't see far into the minds of non-human animals to know what exactly they can communicate with each other, but it seems a fair bet that any factual information, and any affective content, that can be conveyed by an animal communication system can also be conveyed, or at least satisfactorily paraphrased, in any human language. We can, we believe, concisely summarize the information given by honeybee waggle dances, by chaffinch territorial songs, and by vervet monkey alarm calls. We are not yet sure exactly what is conveyed by whale songs, but a reasonable default hypothesis would seem to be that they convey messages of the same expressive power as complex birdsongs. We may be wrong about this, but the current belief is that any human language is capable of communicating the sum total of all that any animal species can communicate, and more. More, because we alone, as far as we know, can tell each other about fictional or abstract objects, and about events far distant in time and space.

In the bulk of this chapter, I will list and discuss some of the most important differences and similarities between human languages and non-human communication systems, with an evolutionary perspective, in particular drawing on results from comparative psychology pertaining to our closest relatives, the non-human primates (see related discussion on language in Chapter 13).

14.2 Diversity

We must first make the vital distinction between Language, the biologically given universal human capacity, and languages, such as English, Swahili, Cantonese, Dyirbal, and Navajo, which are culturally developed systems enabled by the biological capacity. No one speaks Language; Language (with a capital L) is not a language. This contrasts with animal communication systems. True, chaffinches and other songbirds have distinct dialects, but their range of variation is far narrower than that among human languages.

By the usual count, there are over 6000 different human languages. Defining the difference between a language and a dialect is ultimately not possible, but a rough criterion is that different languages are mutually unintelligible, whereas there is some degree of mutual intelligibility among different dialects of the same language. By this criterion, there are in fact several different Chinese languages, of which Mandarin and Cantonese are two, but Norwegian and Swedish actually count as the same language. In the past, many other languages existed, but are now extinct. It seems likely that the peak number of languages spoken by humans occurred some time in the last few millennia, when the earth was as yet sparsely populated by small groups of humans living in relative isolation. Now, languages are being lost, and we are in an age of mass linguistic extinction, with predictions that about half the world's languages will die out in the next century. The great diversity of human languages is made possible by the fact that they are learned, rather than biologically transmitted from generation to generation via the DNA. The fact that languages are learned is not, however, sufficient to account for their great diversity.

The diversity of biological species arises through accumulated genetic copying errors, geographical isolation, and selective adaptation to new niches. Copying errors in learning and geographical isolation are also responsible for the great diversity of human languages. As early humans spread out over the planet, their group languages accumulated changes which were not constrained by any need to communicate with the groups they had left behind, and these languages struck out on their own. But adaptation to new niches is not a factor affecting the diversity of languages, aside from the relatively simple matter of vocabulary: the languages of African pygmies have no unborrowed word for snow. In their grammatical structure and in the structure of their sound systems, there is no correlation between languages and the physical environments of their speakers.

A factor permitting the diversification of languages is the ‘arbitrariness of the sign’. A rose by any other name would smell as sweet. So long as people tacitly agree to use the same words for the same things, a language works. In grammar, as long as the speakers of a language put the words in the same order to convey who did what to whom, and use the same conventional inflections to convey such details as the timing of the event reported and the speaker's attitude to it, the language works. So languages are fairly free to evolve different grammars and sound systems within the limits imposed by the communicative needs of the group.

A contentious issue within linguistics is the degree to which language learning, and therefore linguistic diversity, is further constrained by biology. One can certainly imagine crazy languages that would be impossible to learn. A frequently given example is a language which expressed questions by completely inverting the word order of the corresponding statements. In such a language, you would ask the way to the station by saying ‘Station the to way the me tell can you?’. The strain on short-term memory in computing how to express questions in such a language rules it out as a possible natural language. The contention within linguistics arises because there may be some such constraints on language learnability which are not attributable to non-linguistic factors such as short-term memory, but operate only in the specific domain of language. Here is a candidate for such a Language-specific constraint. Beware: like all such examples, it involves ‘thinking the impossible’, something that linguists are skilled at. Consider the following pair of sentences:

The man built the house. The house fell down.

We can make a single sentence, expressing the same information, thus:

The house [that the man built] fell down.

So far, so good. The compressed sentence was formed by making a relative clause (shown here in square brackets) out of the first sentence, and attaching it to the shared noun phrase, the house. The original second sentence, The house fell down, is wrapped around the outside of this bracketed relative clause. Now let's try it again, with this last complex sentence as one of the inputs to the process:

The house that the man built fell down. The man escaped.

Here again, there is a shared noun phrase, the man. So in principle, it ought to be possible to use the same relative clause-forming process. If we do, we get:

*The man that the house that built fell down escaped.

This is an impossible sentence, as indicated by the linguist's conventional asterisk. And generally, across languages, we find that sentences like this, and their analogues, adjusting for the differences between languages, such as word order, are also not well-formed. The interesting question is ‘Why?’ Is it due to a constraint specific to Language, a putative ‘Law of Language’, that sentences such as this do not occur in languages? Or is this fact due to a more general constraint on processing any kind of serial information, linguistic or otherwise? Both opinions are held in the field, probably with a swing under way to the general non-domain-specific explanation. The original discoverer of this family of constraints, known as ‘Syntactic Island Constraints’, was J. R. Ross (1967, 1986), in the early heyday of the generative linguistics movement, whose goal was, in part, to discover facts peculiar to the human Language faculty. The alternative view, that such constraints arise from general constraints on learning any sequential behaviour, has been argued by Morten Christiansen, among others (Christiansen et al. 2002; Christiansen & Ellefson 2002). Note that there are dozens of similar examples, a fact which underlines the great complexity of human languages, as compared with animal communication systems, where considerations of such abstractness and complexity do not arise.
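The relative-clause procedure walked through above can be written out as a purely mechanical string operation. The Python sketch below is a hypothetical illustration, not a grammar formalism from the literature: the function name `relativize` and its delete-then-attach steps are assumptions chosen to mirror the prose. Applied twice, it produces the impossible sentence just as readily as the well-formed one, which is the point: whatever rules the second output out, it is not the clause-forming procedure itself.

```python
import re


def relativize(shared_np: str, source: str, host: str) -> str:
    """Naively embed `source` as a relative clause on `shared_np` inside `host`.

    Hypothetical, deliberately mechanical procedure: delete the shared noun
    phrase from the source sentence, prefix 'that', and attach the result
    directly after the shared noun phrase in the host sentence.
    Sentences are given without final full stops.
    """
    pattern = re.compile(re.escape(shared_np), re.IGNORECASE)
    # Remove the first occurrence of the shared noun phrase, leaving a gap,
    # then tidy up the doubled spaces the deletion leaves behind.
    remainder = " ".join(pattern.sub("", source, count=1).split())
    if not remainder:
        raise ValueError("nothing left to form a relative clause from")
    remainder = remainder[0].lower() + remainder[1:]
    match = pattern.search(host)
    if match is None:
        raise ValueError("host sentence does not contain the shared noun phrase")
    end = match.end()
    return f"{host[:end]} that {remainder}{host[end:]}"


first = relativize("the house", "The man built the house", "The house fell down")
second = relativize("the man", first, "The man escaped")
print(first)   # well-formed
print(second)  # the starred, impossible sentence
```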

Linguistic inheritance is both vertical, as when children more or less faithfully acquire the languages of their parents, and horizontal, as when languages mix and borrow each other's words and constructions. Branching family tree diagrams are still popular in historical linguistics. But such genealogies are misleading. According to a common classification, English is a Germanic language (along with German, Dutch, Icelandic, Danish, and others), while French is a Romance language (along with Romanian, Portuguese, Italian, Spanish, and others). But such always-diverging, never-converging tree diagrams distort the extent to which languages have influenced each other across language family boundaries. English and French share a lot of similar vocabulary and grammar, the result of the Norman Conquest of England in 1066 and centuries of contact ever since. Many languages are of such hybrid types, owing their structure to multiple sources. The possibility of such hybridization adds to the overall diversity of languages.

Summarizing the factors contributing to linguistic diversity: (1) the fact that languages are learned, rather than coded into the genes; (2) the arbitrariness of the sign; and (3) the prevalence of horizontal transmission allow for great diversity, but this is significantly constrained by (4) biological factors such as memory and processing limitations, which may or may not be specific to the Language domain.

14.3 Learning

Human languages are learned. Many animals are also adept at learning, but they can't learn human languages. Kanzi, a highly human-enculturated bonobo, has learned about 500 words, but this seems to be close to his limit. If animals can learn, why can't they learn languages? What is different about languages? A major factor is the arbitrariness of the sign. A chimpanzee can learn to use a tool to reach a banana. In this case the function of the tool is transparently mechanical: physical laws govern the interaction of the tool and the banana. A word, it can be argued, is also a kind of tool. If I want a banana, just saying ‘banana’ may be enough, if my hearer is cooperative, to get me the banana. But the word banana has no physically causal relation to the outcome. In this sense, words are magic; just using words, if the hearers are cooperative, makes physical things happen. Apes have a good understanding of physical cause and effect with everyday objects, and can learn practical tasks. But the arbitrary nature of human symbols is a far greater challenge to learning, because it's not obvious how they work.

To begin to understand, as human infants do, that the noises made by conspecifics carry some informative message, there needs to be, in the child, a presumption of their relevance to its life (Sperber & Wilson, 1986). This is the idea that uttering a sentence such as It's late not only conveys the information that something is late, but also that the speaker intends the hearer to know that something is late. So an infant on hearing an utterance in a human language knows that the speaker intends the hearer to know something. Count the instances of intentional verbs in this last sentence (knows, intends, know), and we see three embedded levels of intentionality. Halliday's (1975) provocative title for his book was Learning How to Mean. Taken literally, this might suggest a tabula rasa in the child, in which even an understanding that utterances mean something has to be learnt. What we see in humans, as opposed to non-humans, is a developmental process whereby this understanding emerges well within the first year of life. It is developmentally programmed into normal early human maturation, in normal circumstances, as opposed to being strictly learned. The human child is predisposed to understand that utterances mean something. Play, babbling, and imitation are aids to achieving this understanding.

A useful distinction has been made between learning by emulation² and learning by imitation. Emulation involves achieving the same goal as was observed, but not necessarily by the same means. For example, if a chimpanzee sees me push a door open with my foot, it may learn to push the door open with its hand; this is emulation. But if the chimp slavishly follows my actual method of opening the door, using its foot, that would be learning by imitation. Whiten et al. (2004), in a survey of ape learning, conclude that there is more emulation than pure imitation in apes, and both kinds of copying occur much more readily when the demonstrator is a human trainer than spontaneously among apes themselves. None of the work surveyed, however, involves the copying of clearly communicative behaviours. In some sense, emulation is more intelligent; it gets the job done. Human children are natural imitators. They imitate for no apparent reason, as shown by Meltzoff's (1988) well-known experiment, in which very young babies imitated the facial gestures of adults. Non-human apes are very poor at vocal imitation, but human children are expert at it. The babbling stage in babies is like play among animals, in that it seems to have no immediate purpose. Play-fighting in young animals is plausibly accounted for in evolutionary terms in that it refines motor skills which will be useful later in life. Likewise, both babbling and vocal imitation are practice for use of language later in life. Human babies, unlike other apes, have a natural disposition to engage in these activities, whose payoff only materializes long afterwards.

Children's natural dispositions to imitate and to participate in group activities are signs of a kind of sociability special to humans, a topic to which we will return later in this essay.

14.4 Complexity

We have already touched on the great complexity of languages, compared to any non-human communication system. How does this complexity relate to biology? In linguistic theorizing, two opposing tendencies are felt. On the one hand, people are struck by the universality of such complex facts as the Island Constraints mentioned above, and a host of other universal tendencies involving intricate facts about the reference of pronouns, the scope of quantificational words such as all and each, the varying semantic effects of verbs such as promise and persuade and adjectives such as easy and eager³, and so forth. Children learn such abstract facts with no overt tuition and from examples which by no means logically determine the conclusions the children come to. The children's more or less faithful learning is evidenced by their later usage in general conformity with other members of the community. This led linguists in the 1970s and 1980s to hypothesize a rich innate structure guiding the acquisition process, consisting of a set of distinct innate modules such as the X-Bar module, the Binding module, the Theta module, and the Case module (Chomsky, 1986). It is not necessary here to explain the purported content of these modules; note that each is a set of propositions determining some independent, but interlocking, aspect of the linguistic knowledge that the child will ultimately acquire. Thus syntactic theory, at this stage, responded to the complexity of languages by postulating a complex biological endowment specifically devoted to Language.

On the other hand, of course, there was always Occam's Razor, the normal scientific pressure to adopt theories which are as simple as possible. The postulated richness of the innate language acquisition mechanism was a biological embarrassment, as each of these modules presumably had to be coded somehow into the genome. Further, their interacting⁴ nature in modern languages made it necessary, but difficult, to imagine stages in the evolution of the modern Language faculty when some of these modules were present and others had not yet emerged. Certainly it is possible to imagine such undeveloped versions of the modern Language faculty, but it adds to the strain on credibility of the whole story, in an already speculative field. This kind of gradual evolution of the Language faculty was proposed by Pinker & Bloom (1990), in a landmark article arguing the proposition that the most obvious explanation for the complexity of natural language is that it evolved by Darwinian natural selection. To many, this had seemed obvious, but it is a sign of the intellectual climate within the dominant paradigm in linguistics in the late 20th century that it needed arguing at all.

The simpler a theory of the innate Language faculty could be made to look, the more it appealed to biologists. Quite recently, a theoretical move has been made toward an extremely simple account of the human language faculty; this is known as the Minimalist Program (Chomsky, 1995), proposing that the language faculty is nothing more than a facility to recursively merge lexical structures (precisely specified dictionary entries) to form larger structures such as phrases and sentences. It is stressed that this is a ‘program’ rather than a ‘theory’, and its empirical and predictive delivery also remains minimal; one is reminded of String Theory in physics.

Linguists have become persuaded in the last decade or so that, while human language is clearly spectacularly unique among animal communication systems, its biological foundations rest in a combination of many factors, many of which are not specific to Language. Enhanced memory for large numbers of arbitrary associations is one such factor. Humans typically have vocabularies of about 50,000 items. Kanzi, the best-performing non-human primate in this regard, has mastered about one-hundredth of that number⁵, and is not expected to learn significantly more. It was always recognized that the lexical component of a language necessitated rote-learning, and this made it the least theoretically interesting component of a language. A recent movement in the theory of grammar, known as Construction Grammar⁶, suggests that there is in fact no principled distinction between lexical items and grammatical constructions. Grammatical constructions are like lexical items with variable slots in them, into which a more or less wide range of other constituents can be inserted. An example of such a construction in English is The + COMPARATIVE + CLAUSE + the + COMPARATIVE + CLAUSE, as in ‘The more you eat, the fatter you get’ or ‘The bigger they are, the harder they fall’. This theory places less emphasis on economy of statement in the grammar, and recognizes that the representation of language in the brain may be somewhat redundant and uneconomical, taking advantage of humans' undoubtedly great powers of memorization.
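As a rough illustration of a construction as a stored item with variable slots, the comparative correlative above can be written as a template. This is a toy sketch under stated assumptions: the class name `ComparativeCorrelative` and the plain-string slots are hypothetical simplifications, and actual Construction Grammar analyses also constrain what may fill each slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparativeCorrelative:
    """Toy model of 'The + COMPARATIVE + CLAUSE + the + COMPARATIVE + CLAUSE'.

    The fixed material ('The', the comma, 'the') is stored with the
    construction, like a lexical item; the four fields are its variable
    slots. Nothing checks what fills the slots (a simplifying assumption).
    """

    comparative1: str
    clause1: str
    comparative2: str
    clause2: str

    def render(self) -> str:
        # Fixed parts of the construction interleaved with the slot fillers.
        return (f"The {self.comparative1} {self.clause1}, "
                f"the {self.comparative2} {self.clause2}")


eating = ComparativeCorrelative("more", "you eat", "fatter", "you get")
falling = ComparativeCorrelative("bigger", "they are", "harder", "they fall")
print(eating.render())
print(falling.render())
```

Storing the fixed words together with the slots, rather than deriving them from general rules, reflects the redundancy the paragraph describes: the construction is memorized as a unit.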

The human Language faculty, rather than being a richly structured independent module of the mind, is a mosaic of many factors which have come together in a unique combination in humans. Many of these factors can be observed, often in a less powerful form, in other animals. The recognition that this is so is seen in a distinction made by Hauser et al. (2002) between the ‘faculty of language in the broad sense’ (FLB) and the ‘faculty of language in the narrow sense’ (FLN). FLN is whatever is unique to human language; Hauser et al. (2002) suggest that this may, at most, be limited to the human capacity for recursion, the execution of a computation of a certain type during the execution of a similar computation at a higher level. For example, to grasp what John's father's brother's neighbour's cat refers to, you have to identify the referent of the subpart John's father's brother's neighbour, and to understand that, you have to identify the referent of John's father's brother, and so on. Hauser et al. leave it open whether such a capacity for recursion can be found in any non-human animals. If it can be, then the human faculty for language in the narrow sense is, in their view, actually empty, leaving us with a picture of FLB as a mosaic of factors all of which can be found in some form or other outside of the domain of human Language. One candidate for recursion in animals is navigation; figuring out how to get from A to B might involve recursively embedded processes. The technical definition of recursion, and how to recognize whether it is in play in a specific animal activity, is not, however, satisfactorily pinned down, and there is room for argument about the use of recursion in animal activities. The radical view that human Language may have no unique individual properties is controversial, and I will review below other candidates for a categorical difference between human Language and animal communication systems.
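The recursive resolution of John's father's brother's neighbour's cat can be sketched as a function that calls itself on the embedded possessor phrase before applying the outermost relation. The toy "world" of relations and the names Peter, Paul, Susan, and Tibbles are invented purely for illustration.

```python
# Hypothetical toy world: maps (person, relation) to the related individual.
WORLD = {
    ("John", "father"): "Peter",
    ("Peter", "brother"): "Paul",
    ("Paul", "neighbour"): "Susan",
    ("Susan", "cat"): "Tibbles",
}


def resolve(phrase: str) -> str:
    """Find the referent of a possessive chain such as "John's father's cat".

    The referent of "X's R" is found by first resolving the embedded phrase X,
    which is the same kind of computation one level down, and then looking
    up relation R for that referent.
    """
    if "'s " not in phrase:
        return phrase  # base case: a bare name refers directly
    possessor, relation = phrase.rsplit("'s ", 1)
    return WORLD[(resolve(possessor), relation)]


print(resolve("John's father's brother's neighbour's cat"))
```

Each call handles one level of embedding, so the depth of the call stack equals the number of possessives, which is the sense in which the computation is recursive.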

14.5 Compositionality

The example of how we parse a recursive structure such as John's father's brother's neighbour's cat highlights another feature of human language that is not found in any animal communication system that we know of (with one odd exception). The Principle of Compositionality states that ‘The meaning of the whole is a function of the meanings of the parts, and the way they are structured together’. You understand the meaning of a whole sentence because you know the meanings of the individual words, and you know the contribution the grammar makes to this understanding. This is how you know that Mary kissed Bill means something different from Bill kissed Hillary. While many complex animal calls are combinatorial, that is, they are made up of several reusable subunits strung together, none is compositional in this sense. The songs of gibbons are sequences of units which occur in other contexts, and can therefore be identified as independent subunits, but there is no sense in which the meaning of the whole gibbon song is understood as any combination of the meanings of these subunits. In this sense, the term ‘song’ is appropriate, as such complex animal calls are more like music than human language, which expresses semantic content through the application of compositionality.
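The Principle of Compositionality can be illustrated with a minimal interpreter in which each word has its own meaning and a structural rule decides how those meanings combine. The lexicon, the `Event` type, and the single subject-verb-object rule are all simplifying assumptions for illustration, not a serious semantic theory of English.

```python
from typing import NamedTuple


class Event(NamedTuple):
    action: str
    agent: str
    patient: str


# Hypothetical toy lexicon: each word contributes a meaning of its own.
LEXICON = {"Mary": "MARY", "Bill": "BILL", "Hillary": "HILLARY", "kissed": "KISS"}


def interpret(sentence: str) -> Event:
    """Compute a sentence meaning from word meanings plus structure.

    Assumed rule for a simple English transitive clause 'NP1 V NP2':
    NP1 supplies the agent, V the action, and NP2 the patient.
    """
    subject, verb, obj = sentence.split()
    return Event(action=LEXICON[verb], agent=LEXICON[subject], patient=LEXICON[obj])


a = interpret("Mary kissed Bill")
b = interpret("Bill kissed Mary")
# Same word meanings, different arrangement, different whole meaning.
print(a)
print(b)
print(a != b)
```

Swapping the two names leaves the set of word meanings unchanged but changes who did what to whom, which only a rule sensitive to structure can capture.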

The ‘odd exception’ mentioned above is the waggle dance of honeybees, which has two meaningful components, the speed of the dance and its orientation on the honeycomb. The speed conveys the approximate distance of food from the hive, and the orientation conveys the angle, relative to the sun's position, at which the food is to be found. Both distance and angle are necessary to specify the food's location, and the meanings of the two aspects of the dance combine to compose this information. The dance behaviour of honeybees is, however, completely specified in their genes. The example shows how utterly different mechanisms may achieve a communicative effect. The human mechanism of learning languages is clearly more versatile in allowing the possibility of conveying a wide range of different messages, in adaptation to a complex and changing world.
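The way the two components of the dance compose into a single location can be sketched as a small calculation: one component yields a distance, the other a direction relative to the sun, and together they fix a point. This is a hedged sketch: the constant `METRES_PER_UNIT` is a hypothetical calibration (real calibrations vary and are not given in the text), and the distance component is reduced to a single number.

```python
import math

# Hypothetical calibration: metres of flight per unit of the dance's
# distance component. Not a measured value.
METRES_PER_UNIT = 1000.0


def food_location(distance_component: float, dance_angle_deg: float,
                  sun_azimuth_deg: float) -> tuple[float, float]:
    """Combine the two meaningful parts of a waggle dance into a location.

    distance_component: converted to metres via the assumed calibration.
    dance_angle_deg: angle of the dance clockwise from vertical on the comb,
        read as the food's direction clockwise from the sun's azimuth.
    Returns (east, north) offsets in metres from the hive.
    """
    distance = distance_component * METRES_PER_UNIT
    # Compass bearing of the food: 0 = north, 90 = east.
    bearing = math.radians((sun_azimuth_deg + dance_angle_deg) % 360)
    return distance * math.sin(bearing), distance * math.cos(bearing)


# Dancing straight up the comb with the sun due east: food lies due east.
east, north = food_location(1.0, 0.0, 90.0)
print(round(east), round(north))
```

Neither component alone locates the food; the result is a function of both, which is why the dance counts as a limited case of compositionality.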

The nearest thing to an example of compositionality in the communication of an animal closely related to humans is described by Klaus Zuberbühler (2002, 2005). As any kind of precursor to the compositionality in human language, the example is problematic, as it involves the responses of one species, Diana monkeys, to the alarm calls of another species, Campbell's monkeys. Campbell's monkeys have different unitary alarm calls for leopards and eagles. Diana monkeys interpret these appropriately. Occasionally a Campbell's monkey utters a ‘boom’ about thirty seconds before such an alarm call, and the Diana monkeys then react with less panic than to ‘boom-less’ alarm calls: ‘…adding “booms” before the alarm call series of a Campbell's monkey created a structurally more complex utterance with a different meaning than that of alarm calls alone’ (Zuberbühler 2005:279). Zuberbühler himself is frank about the limitations of non-human primates: ‘there is no evidence that they are able to invent and incorporate new call types into their repertoires or to combine calls creatively to produce novel meanings’ (Zuberbühler 2005:281).

14.6 Double articulation

The contrast between songs consisting of identifiable subunits and truly compositional signals such as human sentences brings out another unique characteristic of human languages, their so-called ‘double articulation’⁷. At the phonological level, the expressions of human languages are combinatorial but not compositional. That is to say that the signals consist of systematically reusable subunits which themselves carry no meaning. The separate phonemes making up a word have no meaning. The word cat consists of three phonemes, /k/ + /a/ + /t/, but the meaning of the word is not derived from the meanings of the phonemes, because they have no meanings. At the morphosyntactic level, the word cat does have a meaning, which contributes, for example, to the overall meaning of a sentence such as The cat sat on the mat. So languages are structured in two layers, a semantically compositional morphosyntactic layer, and a phonological layer which is merely combinatorial. All human languages have this property.

Double articulation clearly contributes to the massive expressive power of human languages. New meaningful words can be invented by simply combining phonemes from a given set. The phoneme inventories of languages vary in size from a mere dozen to over a hundred. Obviously, languages with fewer phonemes at their disposal tend to have longer words. The combinatorial power afforded by a phonological layer of structure provides languages with their vocabularies of tens of thousands of meaningful words.
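The trade-off between inventory size and word length can be made concrete with a little arithmetic: with k phonemes there are k^n possible strings of length n. The sketch below asks how long words must be allowed to get for an inventory to supply a vocabulary of about 50,000 distinct forms. It ignores phonotactic restrictions on which phonemes may follow which (a strong simplifying assumption), so the figures are lower bounds, not predictions about real languages.

```python
def forms_up_to(inventory_size: int, max_length: int) -> int:
    """Count distinct phoneme strings of length 1..max_length.

    Assumes any phoneme may follow any other (no phonotactics).
    """
    return sum(inventory_size ** n for n in range(1, max_length + 1))


def min_word_length(inventory_size: int, vocabulary: int) -> int:
    """Smallest maximum word length yielding at least `vocabulary` forms."""
    if inventory_size < 1:
        raise ValueError("need at least one phoneme")
    length = 1
    while forms_up_to(inventory_size, length) < vocabulary:
        length += 1
    return length


# A dozen phonemes versus a hundred, for a 50,000-word vocabulary.
for k in (12, 100):
    print(k, min_word_length(k, 50_000))
```

With 12 phonemes, strings of up to 4 phonemes give only 22,620 forms, so words of 5 phonemes are needed; with 100 phonemes, 3 suffice.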

The evolution of communication systems with this feature of double articulation is thought-provoking. It clearly has a biological aspect. Humans must have the facility for combining elementary sounds from a small fixed set in highly flexible and productive ways. But the fixed sets of phonemes vary widely from language to language, so these are not biologically fixed, although the articulatory apparatus within which they are defined is a matter of biological endowment. Recent work by Zuidema & de Boer (in press), using computer modelling, shows a process of self-organization at work. They start with utterances which are not recognizably made up of discrete elements, but are merely random walks (‘trajectories’) through articulatory space. To imagine such a trajectory, try to make some ‘inarticulate’ vocal sound, moving your jaw, lips, and tongue around while sporadically vibrating your vocal cords, and altering the airflow through nose and mouth; avoid visiting known phonemes of your language. Through imitative transmission across a population of agents, with copying errors, and a requirement that the separate trajectories should not collapse into one another, the set of such trajectories through articulatory space gradually settles into a more systematically organized set. In this evolved system, the same starting and end points are used by many different trajectories.

This can be visualized as follows. Imagine a square with random scribbles on it. The only constraint is that each scribble is a single continuous line. Agents are required to copy these lines and pass their copies on to a successive generation, keeping the same overall number of lines. What happens, over time, is that a more systematically organized set of lines emerges, which start and end at the various corners of the square. At the beginning of the simulation, the corners of the square had no special status in the formation of the lines/scribbles. At the end of the simulation, there is convergence on a system of lines which reuse a small set of starting and end points, and move economically between them (see Figure 14.1). This suggests that even though humans are biologically capable of making ‘inarticulate’ vocal gestures, and of attaching some meaning to them, what happens over time in the continuous trade of such gestures is a self-organizing process by which ‘articulate’, jointed speech, reusing a small set of focal points, emerges.


Figure 14.1 Self-organization in articulatory space. The left-hand box contains five randomly scribbled lines, schematically representing random gestural trajectories in articulatory space. The right-hand box shows five trajectories approximately optimized for simplicity and distinctiveness from each other. After Zuidema and de Boer (in press).

14.7 Self-organization

The example of the emergence of combinatorial phonology introduces what may be a potent and pervasive force in the evolution of languages, in their grammars as well as in their sound systems. The investigation of such self-organizing processes in the context of language evolution is relatively new. Self-organization is a process distinct from natural selection, but entirely compatible with it. In an early pioneering work on self-organization, Thompson (1961) tended to depict self-organization (‘laws of growth and form’) and natural selection as mutually exclusive alternatives. More recently, Oudeyer (2006) gives a clear discussion of the relationship between natural selection and self-organization. Self-organization can affect both organic phenomena (e.g. snail shells) and non-organic phenomena (e.g. snow crystals). Self-organization narrows the search space of possibilities from which natural selection selects.

In the evolution of language, the most promising cases of self-organization arise through the interaction of users of a language over historical time. The self-organized object which emerges is not a physical object like a snail shell or a snow crystal, but the language itself, an abstraction over the common behaviours of the speakers of the language. However, a physical, non-linguistic example may help. Consider an informal well-worn footpath diagonally crossing a field. The path was not deliberately designed by any one person, but is simply the end product of hundreds of people taking the shortest route across the field. In the case of language, repeated usage over generations, with idealized copying of the observed patterns by new learners, results in features of language which were not the invention of any one person, and further, were not closely dictated by the genes. In other words, each individual involved in the process could have behaved in a variety of ways, as far as any direct pressure from the genes is concerned. But the accumulation of hundreds of tiny unconscious facultative acts led to the language concerned being the way it is.

Here is an example. A salient feature of very many languages is a correlation between frequency and irregularity. In English, for example, the most common verbs (be, have, do, make, go, etc.) are all irregular. This correlation comes about through the repeated action of several processes. One is the tendency to slur, or phonetically erode, high-frequency words; this erosion creates irregularities. Another is that, as is well known, children are somewhat resistant to irregularities, tending naturally to regularize even irregular verbs. Thus children learning English go through a stage where they use *goed instead of went and *comed instead of came. In the case of the most frequent irregular verbs, however, the irregular usage in the environment overwhelms the child's natural disposition to regularize, so irregular forms persist in the language, but only among the most frequent verbs. For less frequent verbs, the child is not presented with enough evidence to overrule its natural tendency to regularize, so less frequent verbs are mostly regular. This process has been modelled computationally by Kirby (2001).
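The interplay of erosion and regularization described above can be illustrated with a toy simulation. This is a minimal sketch under assumed parameters, not Kirby's (2001) iterated learning model: the Zipf-like verb frequencies, the `threshold` number of exposures a learner needs in order to acquire an irregular form, the `erosion_rate`, and the function name `simulate` are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def simulate(n_verbs=50, tokens_per_generation=2000, threshold=20,
             erosion_rate=0.05, generations=30, seed=1):
    """Toy model of frequency-driven survival of irregular verb forms.

    Verb v (0-based) has a frequency proportional to 1 / (v + 1).
    All verbs start out irregular. In each generation:
      1. a learner hears `tokens_per_generation` tokens sampled by frequency;
         an irregular form is acquired only if the verb was heard at least
         `threshold` times, otherwise the learner regularizes it;
      2. each regular verb may erode into an irregular one, with a
         probability proportional to its relative frequency.
    Returns a list of booleans: True where a verb ends up irregular.
    """
    rng = random.Random(seed)
    weights = [1.0 / (v + 1) for v in range(n_verbs)]
    max_w = weights[0]
    irregular = [True] * n_verbs

    for _ in range(generations):
        heard = Counter(rng.choices(range(n_verbs), weights=weights,
                                    k=tokens_per_generation))
        # Learning: irregular forms heard too rarely are regularized.
        irregular = [irr and heard[v] >= threshold
                     for v, irr in enumerate(irregular)]
        # Erosion: frequent regular forms occasionally become irregular.
        for v in range(n_verbs):
            if not irregular[v] and rng.random() < erosion_rate * weights[v] / max_w:
                irregular[v] = True
    return irregular


if __name__ == "__main__":
    result = simulate()
    print("irregular among the 5 most frequent verbs:", sum(result[:5]))
    print("irregular among the 25 least frequent verbs:", sum(result[25:]))
```

With the default settings, the five most frequent verbs should remain irregular while nearly all of the rare ones end up regular; changing `threshold` or `tokens_per_generation` shifts the frequency at which the boundary between irregular and regular verbs falls.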

Such self-organizational processes have been dubbed ‘phenomena of the third kind’ by Keller (1994). In his taxonomy, phenomena of the first kind are natural phenomena, like oceans and volcanoes; phenomena of the second kind are human artefacts, deliberately made, like telephones and the written constitutions of nations; phenomena of the third kind are the outcome of human action, but not deliberately made by any single, or even collective, conscious decision. Keller also invokes Adam Smith's ‘Invisible Hand’ (Smith, 1786), paraphrased in modern terminology as ‘market forces’. Keller argues that much of the evolution of language should be seen as an Invisible Hand (i.e. self-organizing) process. The self-organization of a communication system along the lines illustrated here can only happen in a relatively complex learned system, such as humans have. With such limited systems as the mostly innate three-way alarm calls of vervet monkeys, there is far less scope, if any, for the accumulation of tiny facultative actions to determine the historical course of the system.

14.7 Stimulus-freedom

There is a considerable difference of degree between humans and non-humans in the extent to which their mental processes are immediate reactions to their environment. Humans can recall, and muse about, specific events from long in the past, and can plan complex series of actions far into the future. One can find the tiny seeds of stimulus-freedom in animals closely related to humans. In object-displacement experiments, for example, a desirable object is hidden from an animal's view, but the animal still seems to know it is there, and searches for it. Thus the animal has a mental representation of an object not currently perceived. Dogs are good at this. Panzee, a chimpanzee, could remember after a night's sleep where food had been hidden the day before (Menzel 2005). It is often claimed (Tulving 1972, 2005; Suddendorf & Corballis 1997, 2007) that only humans have episodic memory, a recall of specific events, as opposed to non-time-indexed knowledge of some state of affairs (which may result from having observed some event in the past). Experimentally sorting out the difference between recall of events and knowledge of resulting facts is problematic. In recognition of this difficulty, experimenters have attributed ‘episodic-like' memory to animals such as scrub jays, which show evidence of remembering what food they hid, and where, and how long ago (Clayton & Dickinson 1998; Clayton et al. 2001).

The evidence from animals that show some slight signs of episodic memory means that such memory is not absolutely dependent on the prior evolution of language. Certainly in humans, episodic memories are aided by public language. There was presumably some co-evolution of the faculty of Language and a capacity for episodic memory. In humans, the earliest memories of specific lifetime events typically date from around 2 years of age, when syntactic language begins to develop. This suggests some interdependence between episodic memory and language.

Animals can plan future actions, to some degree. Mulcahy & Call (2006) report on experiments in which bonobos and orang-utans collected and hoarded appropriate tools for tasks as far ahead as 14 hours before the task was carried out. They comment that ‘These findings suggest that the precursor skills for planning for the future evolved in great apes before 14 million years ago, when all extant great ape species shared a common ancestor’ (p.1038).

There is a symmetrical relationship between planning and memory, to the extent that planning has sometimes been classified as ‘prospective memory’ (Meacham & Singer, 1977). In one experiment (Cook et al. 1983), rats searching a 12-arm radial maze for food were taken out while still searching and replaced in the maze later. They showed similar accuracy of recall in relation to (1) the number of arms already searched and (2) the number of arms not yet searched and therefore remaining to be searched. This indicates an overlap between the mechanisms of retrospective and prospective memory. Such memories are stored in the animal's brain and are not dependent on its current perceptions. Humans, however, have much longer retrospective and prospective memories than non-humans. Suddendorf & Corballis (1997) write of the ‘unconstrained mental time travel’ of humans.

Humans can think about absent things. This stimulus-freedom of human mental processes is reflected, naturally, in our communication systems. We can talk about absent things, and in fact this is the norm for human communication. We constantly bring to mind distant events or possible future events, and talk about them. The structure of modern languages makes this possible, but this is probably a case where language structure has evolved to meet the need to express such ‘time-travelling’ thought, rather than the structure of language actually enabling such time-travelling thoughts in the first place. A simple story, probably partly correct, is: first the private thought capacity, then a communication system adapted to make the private thoughts public.

The relation between language and thought is a hot philosophical issue. Most comparative psychologists, and a growing number of philosophers, are willing to concede some thoughts and concepts to non-humans. But clearly there are thoughts that can only be attained with the help of language. Examples are the concepts expressed by words and phrases such as Tuesday, unicorn, ninety-three, zero, generosity and legal. Examples such as these rely on verbal definitions made possible by the productive generative capacity of languages. Given compositionality (as discussed above), it is possible to arrive deductively at meanings not previously entertained by the mind. For example, given the concepts expressed by white, horse, single, horn and forehead, compositionality allows one to deduce what the expression white horse with a single horn in its forehead should mean, even though we are never likely to experience such a beast. Presumably such thoughts are permanently denied to non-humans.
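The deduction described above can be sketched with intersective composition, a deliberate simplification from formal semantics in which each word denotes a predicate and a modified phrase is true of exactly those things of which all its parts are true. The `Creature` type, the predicate functions and the small list of observed animals are hypothetical, and the words single, horn and forehead are collapsed into one helper for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Creature:
    # Hypothetical, highly simplified features of an animal.
    colour: str
    kind: str
    horns_on_forehead: int


Predicate = Callable[[Creature], bool]


# Word meanings modelled as predicates (a simplification).
def white(c: Creature) -> bool:
    return c.colour == "white"


def horse(c: Creature) -> bool:
    return c.kind == "horse"


def with_horns_in_forehead(n: int) -> Predicate:
    # 'a single horn in its forehead' corresponds to n = 1.
    return lambda c: c.horns_on_forehead == n


def compose(*parts: Predicate) -> Predicate:
    """Intersective composition: the phrase holds of c iff every part holds of c."""
    return lambda c: all(p(c) for p in parts)


# Meaning of 'white horse with a single horn in its forehead', built from its parts.
unicorn = compose(white, horse, with_horns_in_forehead(1))

# Everything 'experienced' so far; none of these satisfies the phrase.
observed: List[Creature] = [
    Creature("white", "horse", 0),
    Creature("grey", "horse", 0),
    Creature("brown", "goat", 2),
]

if __name__ == "__main__":
    print(any(unicorn(c) for c in observed))       # False
    print(unicorn(Creature("white", "horse", 1)))  # True
```

The composed predicate is well defined, and can be applied to a description of a creature never encountered, even though nothing in the observed list satisfies it: the meaning is computed from the parts rather than learned from experience of a referent.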

Once fictions can be expressed, and thus shared between people, they can become potent cultural forces, defining group identity. A commitment to the proposition that Jesus Christ is the son of God is what centrally divides devout Christians from devout Muslims. Thus, besides its obvious practical value for transmitting real-world information, which has enabled us to build spaceships that reach the Moon, generative language provides for the construction and sharing of rich structures not corresponding to any perceptible reality, defining complex cultures.

14.8 Interpersonal function

The vast potential of languages for describing the real world, and fictitious worlds, in detail should not lead us to ignore the fact that making descriptive statements about a world must have a social purpose. Austin (1962) wrote of the ‘descriptive fallacy’, the idea that the point of language is to describe a world. He famously stressed that when we use any language at all, we are thereby doing something, carrying out some social act. Much animal communication carries this purely social force, and it is inappropriate to paraphrase such signals in declarative terms. For example, a threat signal is just doing the threat, or it just is the threat. Translating an animal threat signal into a human declarative sentence, such as If you don't back off, I will attack you, may be useful for our purposes, but there is no reason to suppose that any such complex thought goes through the mind of the threatener or the threatened animal. A tiny number of human utterances have only this bare ‘illocutionary’ force. For example, Hello just does greeting; it has no declarative content and doesn't describe any state of affairs. The vast majority of human utterances have some social purpose, an intended impact on a hearer (see note 8), in addition to whatever descriptive content they may have. For example, It's raining, in addition to describing the current weather, will in normal circumstances always be said with some intended effect on a hearer, such as to warn them to put on a raincoat, to prove that your prediction was right, or to complain jokingly about the local climate. As this dyadic ‘doing things to each other’ function is basic to both animal and human communication, it seems likely that it is a remote evolutionary foundation of human language, and that the vast referential, descriptive, triadic power of language came later.
The set of social acts that can be carried out using language exceeds the range of things that non-human animals can do to each other with symbolic signals, such as threat or submission gestures. All of the acts that can be carried out by non-humans can also be carried out using words (see related discussion in Chapter 1). Thus a threat can be made by purely non-verbal means, e.g. by shaking one's fist in a person's face; and it can also be carried out in the calmest of ways with words, with little emotion, by saying If you move, I'll shoot. Social acts between animals are mostly dyadic, involving only the sender and the receiver of the signal (see detailed discussion in Chapter 3). Humans can overlay their social acts with descriptive content, as in the previous example, which refers to actions such as moving and shooting. This vastly increases the subtlety and fine-grained detail of what humans can do to each other using language.

Some things that humans can do to each other using language can only be done with language, or at least in a language-defined context. Promising, for example, requires some understanding of what is promised, which can only be expressed in words. True, I can effectively promise something merely by nodding, but in such a case what I am thereby committed to has previously been spelled out in language. Another class of uniquely human communicative acts consists of those in which a social fact or convention is made to exist solely by uttering an appropriate verbal formula, as in I name this ship the Mary Rose, I hereby declare you man and wife, or Ego te absolvo (‘I absolve you’). Such acts are, of course, impossible in the non-human world.

14.9 Mind-reading, manipulation, and cooperation

Encounters between animals can be either adversarial or mutually beneficial. In both cases, it is advantageous to an animal to be able to predict and influence the actions of the other. Predicting events can involve various degrees of intentionality. Predicting that a falling rock will land near you involves no understanding of the mental processes of another organism. Predicting that a lion skulking nearby will chase you may, or may not, involve attributing some attitude to the lion. A zebra may simply have a built-in avoidance response to nearby skulking lions, just as some people may possibly have built-in, or epigenetically easily triggered, arachnophobia. But some ability to ‘mind-read’ accurately the intentions of competitors, predators and prey would clearly be advantageous to any animal (Krebs & Dawkins 1984).

Experiments with chimpanzees show that they can tell whether a human experimenter is teasing them or merely being clumsy (Call et al. 2004), thus demonstrating a degree of mind-reading. There are also many reports of tactical deception among primates, and Byrne & Corp (2004) found a correlation between neocortex size and rate of tactical deception. Thus one thing bigger brains are good for is deception, which involves both prediction of the likely actions of another organism and deliberate manipulation to influence those actions. Hare et al. (2000) showed experimentally that, in the words of their title, ‘Chimpanzees know what conspecifics do and do not see’. In this experiment, a subordinate chimpanzee could see two food items, and was also in a position to see that a dominant chimpanzee could only see one of these food items. When both animals were released from their positions, the subordinate chimpanzee reliably went for the food item that had been invisible to the dominant.

All work of this kind centres on the complex question of the extent to which non-humans have a ‘Theory of Mind’, the ability to know that another organism is just like, and therefore thinks like, oneself. Note that there are two main components here: (1) the obvious one, just stated, and (2) what ‘oneself’ is like. Informal character attributions among people show a tendency to project one's own vices and virtues onto other people. Thus a generous person will tend to assume that other people are generous, and a miserly person will tend to assume that other people are also miserly. Crucially, a naturally uncooperative animal will not be able to read cooperative intentions in another animal, although it may well be able to read competitive intentions in another.

There is experimental evidence that chimpanzees can read competitive intentions in human experimenters but not generous cooperative intentions, when the stimuli presented to the animal are very similar. In one condition, a human made a reaching gesture, with hand spread for grasping, toward a container; in this condition, the observing chimpanzee anticipated the human's reach and got to the container first. In another condition, the gesture was very similar, but with fingers together in a whole-hand pointing gesture, indicating the container. The chimpanzee subject did not ‘get the point’ of this cooperative pointing gesture. A natural interpretation is that chimpanzees can read the intentions of others, but they do not expect those intentions to be cooperative. Thus, a certain category of others' intentions (the cooperative intentions) remains obscure to them.

14.10 Reference

Communicative acts in the animal world are mostly dyadic, not involving any third entity besides the sender and receiver of the signal. A widespread exception is alarm calls. The alarm calls of vervet monkeys (Cheney & Seyfarth 1990) are especially well known, but many other species of birds and mammals also have ritualized alarm calls for specific classes of predators, typically with separate signals for aerial and terrestrial predators. Alarm calls are triadic because they involve the sender, the receiver, and the referent of the call. Triadic communication is about something, whereas dyadic communication is not. Animal alarm calls are largely genetically determined, in both production and reception, with very little room for voluntary control. In young vervets there is some learning of the specific class of aerial objects for which it is appropriate to make the eagle alarm call. There is also an audience effect, with mothers being more likely to make an alarm call when their own offspring are nearby. Since both the stimulus-to-call behaviour and the call-to-response evasive behaviour (e.g. climbing a tree when hearing the leopard call) are strongly specified in the animal's genes, the question arises whether the animals are ‘referring’, in anything like a human sense, to the predator. It could be, for instance, that natural selection has acted in parallel to favour two independent but mutually adaptive behaviours: (1) Bark when seeing a leopard and (2) Climb a tree when hearing a bark. If this were the case, there would be no human-like sense in which the animal's alarm call means, or brings to mind, the appropriate class of predators.

Klaus Zuberbühler has described experiments which can naturally be interpreted as suggesting that animal alarm calls do in fact bring the concept of the appropriate predator to mind, at least for a short period. Zuberbühler et al. (1999) worked with Diana monkeys of the African forest, which have distinct calls for leopards and eagles. Female monkeys both give spontaneous alarm calls on sensing a predator and respond to alarm calls from males by repeating the call. Besides recording the alarm calls, the researchers also recorded characteristic noises associated with the two predators, such as the growl of a leopard and an eagle's shriek. Next, they played back three different kinds of pairs of stimuli, with the stimuli in each pair separated by five minutes of silence. On hearing first an eagle alarm call, then (after five minutes) the shriek of an eagle, female monkeys showed less sign of alarm (giving fewer repeat calls) than after hearing, for example, an eagle alarm call followed by the growl of a leopard. The logic is this. If, on hearing an eagle alarm call, you are prepared to be wary of an eagle in the area, you are less disturbed by hearing the actual eagle shriek. The eagle shriek merely confirms what the earlier alarm call told you. But if you hear an eagle alarm and then a leopard growl, the growl is new information, telling you about a kind of predator that you had not been made aware of by the previous call. The researchers did, of course, run all the necessary control conditions to consolidate this conclusion. The conclusion is that the alarm calls do not merely trigger the relevant evasive action, with no representation of the specific source of danger being kept in the head; the Diana monkeys, on hearing a leopard alarm call, keep the idea of a leopard in their minds for at least five minutes, and likewise with the eagle alarm call. This behaviour meets the criteria set by Marler et al. (1992) for calls being ‘functionally referential’. It seems likely that similar results would be obtained with all species with small inventories of predator-specific alarm calls.

The term reference is used by animal researchers with less care than by most linguists. Two senses need to be distinguished. In the discussion above of alarm calls, the question is whether some class of calls, such as a vervet's bark, has a referential meaning in roughly the same way that the English word leopard has. Of course, translation even from one human language to another is seldom, if ever, perfect, so we should not expect to have a perfect English translation of what the vervet's bark means. But the idea is that the vervets have a (very limited) code, shared by the whole community, according to which ‘bark’ means what we humans roughly translate as leopard. When a vervet hears the bark, it brings a certain concept to mind. Certainly, the vervets' concepts are only protoversions of ours, because they cannot expound on the nature of leopards, and presumably never reflect mutely and dispassionately on it either. Nevertheless, we may see in such alarm calls a skeletal version of our own shared codes (vocabularies), by which reference to classes of objects and actions is conventionally linked to arbitrary signals. Putting it anthropomorphically, a vervet's bark denotes something like the class of leopards. Such denotation is one sense of the term reference, and in this sense it is the signals themselves, the proto-words, that refer.

The other sense in which reference is used does not involve a shared code conventionally mapping a set of signals onto corresponding classes of objects and actions. In this sense, it is the individual users themselves who do the referring. Here is an example. Third-person pronouns and demonstratives, such as he, she, it, this and that, can be used with variable reference; what they point out on any occasion depends on the circumstances at the time. If I write here That's good, you, as a reader remote in time and space, don't know what I am referring to, and nothing in the word that tells you what class of objects its referent might belong to. Words such as these are known to linguists as deictic, or pointing, words. Such a word is used in one context to draw attention to some particular thing, and on other occasions to draw attention to totally different things. All that is conventionalized about such words is that they are used for pointing to things in the context of the current discourse. What exactly they point to is left to the pragmatic inference of the observer. Pointing with the index finger (or in some cultures the lower lip) is a non-linguistic analogue of the use of deictic words. What precursors can be found in non-human behaviour for this kind of pointing, whereby an animal draws the attention of another animal to some specific object in their shared context? In the wild, none; in captivity, some, but only with their human keepers. In thirty years of observing chimpanzees in the wild, Jane Goodall (1986) never observed a chimpanzee point to an object in order to draw the attention of another chimpanzee to it. Primates in the wild just don't point to things. In this sense, they don't refer to specific things. This kind of reference is totally absent from non-human communication in the wild.
In captivity, chimpanzees and other primates learn to communicate their needs to human keepers by pointing, but the circumstances are limited to the fulfilment of current desires, as when a chimpanzee points to some food item that he wants to be given. Even in captivity, apes do not point to things just to share information about some interesting property of those things.

The absence of pointing in primate communication in the wild highlights the absence of a human level of cooperation.

14.11 Conclusion

There is indeed a wide gap between human language and non-human communication, in the various ways I have surveyed here. The difference cannot be attributed to any single factor. Apes are different from us in many qualitative ways. It seems most likely that at some time a critical combination of factors arose in our ancestors which gave rise to the rapid expansion of the Language faculty, in its many facets, and a concomitant diversification and enrichment of individual languages and cultures. Exactly what the components of that critical combination were is still to be discerned, and it is not clear what further evidence we may be able to call upon.



Bibliography references:

Austin, J.L. (1962). How to Do Things with Words. Harvard University Press, Cambridge, MA.

Byrne, R.W. and Corp, N. (2004). Neocortex size predicts deception rate in primates. Proceedings of the Royal Society of London B, published online.

Call, J., Hare, B., Carpenter, M., and Tomasello, M. (2004). ‘Unwilling’ versus ‘unable’: chimpanzees' understanding of human intentional action. Developmental Science, 7, 488–498.

Cheney, D. and Seyfarth, R. (1990). How Monkeys See the World: Inside the Mind of Another Species. University of Chicago Press, Chicago.

Chomsky, N. (1980). Rules and Representations. Basil Blackwell, London.

Chomsky, N. (1986). Knowledge of Language. Praeger, New York.

Chomsky, N. (1995). The Minimalist Program, Current Studies in Linguistics 28. MIT Press, Cambridge, MA.

Christiansen, M.H. and Ellefson, M.R. (2002). Linguistic adaptation without linguistic constraints: the role of sequential learning in language evolution. In: A. Wray (ed.), The Transition to Language, pp. 335–358. Oxford University Press, Oxford.

Christiansen, M.H., Dale, R., Ellefson, M.R., and Conway, C. (2002). The role of sequential learning in language evolution: computational and experimental studies. In: A. Cangelosi and D. Parisi (eds), Simulating the Evolution of Language, pp. 165–187. Springer-Verlag, London.

Clayton, N.S. and Dickinson, A. (1998). Episodic-like memory during cache recovery by scrub jays. Nature, 395, 272–274.

Clayton, N.S., Griffiths, D., Emery, N., and Dickinson, A. (2001). Elements of episodic-like memory in animals. Philosophical Transactions of the Royal Society of London B, 356, 1483–1491.

Cook, R., Brown, M., and Riley, D. (1983). Flexible memory processing by rats: use of prospective and retrospective information. Journal of Experimental Psychology: Animal Behavior Processes, 11, 453–469.

Croft, W. (2001). Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford University Press, Oxford.

Culicover, P. and Jackendoff, R. (2005). Simpler Syntax. Oxford University Press, Oxford.

Fillmore, C., Kay, P., Michaelis, L.A., and Sag, I. (2003). Construction Grammar. University of Chicago Press, Chicago.

Goldberg, A. (1995). Constructions: A Construction Grammar Approach to Argument Structure. University of Chicago Press, Chicago.

Goodall, J. (1986). The Chimpanzees of Gombe: Patterns of Behavior. Harvard University Press, Cambridge, MA.

Halliday, M.A.K. (1975). Learning How to Mean: Explorations in the Development of Language. Edward Arnold, London.

Hare, B., Call, J., Agnetta, B., and Tomasello, M. (2000). Chimpanzees know what conspecifics do and do not see. Animal Behaviour, 59, 771–785.

Hauser, M.D., Chomsky, N., and Fitch, W.T. (2002). The faculty of language: what is it, who has it, and how did it evolve? Science, 298, 1569–1579.

Huddleston, R. and Pullum, G.K. (2002). The Cambridge Grammar of the English Language. Cambridge University Press, Cambridge.

Keller, R. (1994). On Language Change: The Invisible Hand in Language. Routledge, London. (Translation and expansion of Sprachwandel: Von der unsichtbaren Hand in der Sprache. Francke, Tübingen.)

Kirby, S. (2001). Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5, 102–110.

Krebs, J.R. and Dawkins, R. (1984). Animal signals: mind-reading and manipulation. In: J.R. Krebs and N.B. Davies (eds), Behavioural Ecology: An Evolutionary Approach, 2nd edn, pp. 380–402. Blackwell Scientific Publications, Oxford.

Marler, P., Evans, C.S., and Hauser, M.D. (1992). Animal signals: reference, motivation or both? In: H. Papoušek, U. Jürgens, and M. Papoušek (eds), Nonverbal Vocal Communication: Comparative and Developmental Approaches, pp. 66–86. Cambridge University Press, Cambridge.

Meacham, J.A. and Singer, J. (1977). Incentive effects in prospective remembering. Journal of Psychology, 97, 191–197.

Meltzoff, A. (1988). Infant imitation after a one-week delay: long-term memory for novel acts and multiple stimuli. Developmental Psychology, 24, 470–476.

Menzel, C. (2005). Progress in the study of chimpanzee recall and episodic memory. In: H.S. Terrace and J. Metcalfe (eds), The Missing Link in Cognition: Origins of Self-Reflective Consciousness, pp. 188–224. Oxford University Press, Oxford.

Mulcahy, N.J. and Call, J. (2006). Apes save tools for future use. Science, 312, 1038–1040.

Oudeyer, P.-Y. (2006). Self-Organization in the Evolution of Speech. Oxford University Press, Oxford. (Translated from French L'auto-organisation de la Parole by James R. Hurford).

Pinker, S. and Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707–784.

Ross, J.R. (1967). Constraints on Variables in Syntax. PhD thesis. MIT. (Published as Ross 1986.)

Ross, J.R. (1986). Infinite Syntax! Ablex Publishing Co., Norwood, New Jersey.

Smith, A. (1786). An Inquiry into the Nature and Causes of the Wealth of Nations: in three volumes. Fifth edition. A. Strahan and T. Cadell, London.

Sperber, D. and Wilson, D. (1986). Relevance: Communication and Cognition. Blackwell, Oxford.

Suddendorf, T. and Corballis, M.C. (1997). Mental time travel and the evolution of the human mind. Genetic, Social, and General Psychology Monographs, 123, 133–167.

Suddendorf, T. and Corballis, M.C. (2007). The evolution of foresight: What is mental time travel, and is it unique to humans? Behavioral and Brain Sciences, 30, 299–313.

Thompson, D'A.W. (1961). On Growth and Form, abridged edition, ed. J.T. Bonner. Cambridge University Press, Cambridge.

Tulving, E. (1972). Episodic and semantic memory. In: E. Tulving and W. Donaldson (eds), Organization of Memory, pp. 381–403. Academic Press, New York.

Tulving, E. (2005). Episodic memory and autonoesis: uniquely human? In: H.S. Terrace and J. Metcalfe (eds), The Missing Link in Cognition: Origins of Self-Reflective Consciousness, pp. 3–56. Oxford University Press, Oxford.

Whiten, A., Horner, V., Litchfield, C., and Marshall-Pescini, S. (2004). How do apes ape? Learning & Behavior, 32, 36–52.

Wood, D. (1988). How Children Think and Learn. Basil Blackwell, London.

Zuberbühler, K. (2002). A syntactic rule in forest monkey communication. Animal Behaviour, 63, 293–299.

Zuberbühler, K. (2005). Linguistic prerequisites in the primate lineage. In: M. Tallerman (ed.), Language Origins: Perspectives on Evolution, pp. 263–282. Oxford University Press, Oxford.

Zuberbühler, K., Cheney, D.L., and Seyfarth, R.M. (1999). Conceptual semantics in a non-human primate. Journal of Comparative Psychology, 113, 33–42.

Zuidema, W. and de Boer, B. The evolution of combinatorial phonology. Journal of Phonetics, in press.


(1) Items in this chapter superscripted by ‘I’ are terms routinely used by linguists about language, and are explained in a glossary at the end of this chapter.

(2) The term ‘emulation’ is due to Wood (1988).

(3) Compare I promised John to go with I persuaded John to go. Who, in each case, is to do the going? Also compare John is easy to please with John is eager to please. Who, in each case, is understood as doing the pleasing?

(4) These hypothesized modules of the Language faculty are ‘interacting’ in roughly the same sense as subsystems of physical organisms, such as the respiratory system and the circulatory system, work together.

(5) Kanzi uses a lexigram board, with abstract symbols that he points to, as a substitute for uttering spoken words.

(6) Goldberg (1995), Croft (2001), Fillmore et al. (2003), Culicover & Jackendoff (2005).

(7) Also sometimes called ‘duality of patterning’; the terms are equivalent.

(8) An exception may be private soliloquizing, praying, or talking to oneself. It seems likely that these are uniquely human activities which evolved on top of a prior purely social communicative form of language. Chomsky (1980) is in a minority in holding that such talking to oneself may be the main function of human language.