Long-term memory and the episodic buffer
Working Memory, Thought, and Action

Alan Baddeley

Print publication date: 2007

Print ISBN-13: 9780198528012

Published to Oxford Scholarship Online: March 2012

DOI: 10.1093/acprof:oso/9780198528012.001.0001

(c) Copyright Oxford University Press, 2015. All Rights Reserved.

Chapter:
(p.139) Chapter 8 Long-term memory and the episodic buffer
Source:
Working Memory, Thought, and Action
Author(s):

Alan Baddeley

Publisher:
Oxford University Press
DOI:10.1093/acprof:oso/9780198528012.003.0008

Abstract and Keywords

This chapter attempts to determine how working memory and long-term memory interact. It begins by considering a number of alternative views that would largely dispense with the question. Working memory as language processing is first discussed followed by the view of working memory as activated long-term memory. The author's decision to postulate a central executive devoid of memory capacity led to a number of problems. These problems are categorized into three and are assessed in this chapter. The latter part of this chapter introduces the concept of the episodic buffer. This buffer is assumed to be a temporary storage system that is able to combine information from the loop, the sketchpad, long-term memory, or indeed from perceptual input, into a coherent episode.

Keywords: working memory, long-term memory, language processing, episodic buffer, loop, sketchpad, perceptual input

Of the four functions that I suggested might be desirable in a central executive, three were characteristics of attentional control, namely the capacity to focus, to divide and to switch attention, while the fourth capacity was qualitatively different, namely that of interfacing working memory with long-term memory (Baddeley, 1996). Implicit in the first three is the idea of the central executive as an attentional control system, something that was made explicit by Baddeley and Logie (1999). Such a view differs from the initial concept, which regarded the central executive as comprising a limited capacity pool of general processing capacity that could be used for a range of functions including both attentional control and temporary storage. The modification to our original view stemmed from the fear that a general processing concept was simply too powerful, with too few constraints to generate tractable and useful questions. By treating the executive as a purely attentional system, it became easier to frame potentially fruitful questions, although as we have just seen, not necessarily to answer them at this stage with any degree of completeness. However, having banished storage from the executive, it became increasingly clear that we were left with a number of problems in tackling the fourth question raised, namely that of how working memory and long-term memory interact.

8.1 Some reductionist views

Before beginning the search for a link between working memory and long-term memory, we should consider a number of alternative views that would largely dispense with the question. One of these is the suggestion that WM is simply part of the system for processing language. This view tends to be taken by investigators whose primary interest is in language, and who regard temporary storage simply as a secondary feature of the systems involved. This is therefore partly a question of focus rather than content. However, it may lead to a neglect of those features of WM that are not language-based, as in the case of Allport's (1984) proposal that STM deficits in patients are caused by a subtle deficit in speech perception.

(p.140) 8.1.1 Working memory as language processing

It seems very probable that the phonological loop has evolved from systems that were specialized for speech perception (the phonological store) and production (the articulatory rehearsal system). The evidence suggests however that the loop provides a separate offline system for storing and manipulating language-related material that goes beyond a basic capacity to perceive and produce speech (Vallar and Papagno 2002). This conclusion is based on the observation that although language processing and phonological STM are often both impaired in patients with left hemisphere damage (Vallar et al. 1992), speech processing and phonological STM can show a clear dissociation. For example, patients such as P.V. have grossly impaired phonological STM but normal perception and production of language (Basso et al. 1982; Martin and Breedin 1992). Conversely, patients may show relatively well-preserved verbal STM, despite a major auditory perceptual deficit (Baddeley and Wilson 1993a). Furthermore, even if such peripheralist views of working memory might in principle provide an account of phonological STM, they say nothing concerning the crucial capacity of the system to serve as a working memory, to manipulate information in order to solve problems, or indeed about the capacity to interface with LTM. This is not, of course, to argue against a range of more complex accounts of working memory that emphasize language processing, which I would regard as variants on a multicomponent model (e.g. N. Martin). They differ from my own approach in focusing on patients who may have relatively complex language deficits, rather than searching for cases who show isolated deficits in a specific component: in this case, the phonological loop. Both styles of theorising tend to accept the need to assume some form of temporary storage within a multicomponent working memory.

8.1.2 Working memory as activated long-term memory

A much more widely held view of WM is that it simply represents the currently active components of LTM. This was of course the dominant view in North America up to the late 1960s (e.g. Melton 1963), and was proposed more recently by Nairne (2002) and by Ruchkin et al. (2003). My objection to this view is not that it is incorrect, but rather that it appears to give a clear answer without actually doing so (Baddeley 2003a). WM is certainly dependent on LTM, but in so many different ways as to make a simple identification of WM with activated LTM quite unhelpful. Such a view might reasonably, for example, be taken to imply that if one understands LTM, then an understanding of WM will naturally follow. There is little evidence for this. Although we certainly know a great deal more about WM than we did 30 years ago, I would suggest that almost all of this has resulted from treating the system as separate from LTM.

(p.141) Advocates of a single memory system have often relied on demonstrating analogies between WM and LTM, an approach that began with Melton's (1963) classic attack on the concept of STM, was continued by Postman (1975) and more recently by Nairne (2002) and Ruchkin et al. (2003). Such analogies are almost always based on experimental paradigms that involve both long- and short-term components. It is therefore not surprising that such hybrid paradigms show similarities to explicitly long-term tasks. Examples of such tasks include the Peterson short-term forgetting test, free recall and running memory span. However, similarities do not demonstrate identity; the fact that both lizards and elephants have four legs, an absence of fur, two eyes and a mouth does not make them the same species. This issue has already been covered in relation to Nairne's (2002) advocacy of a unitary memory system, and hence will not be discussed further.

However, while resisting the view that WM is simply part of LTM, I would certainly agree that there are several quite different ways in which WM and LTM interact. If we consider the phonological loop, for example, it is clear that pseudo words are easier to recall if they are word-like, phonotactically similar in structure to the subjects' native language (Baddeley 1971; Gathercole et al. 2001). Hence a nonword such as monage is likely to be better recalled than luzok, despite the fact that both are unfamiliar and meaningless. More explicit influences of long-term knowledge can, of course, also influence performance, which is presumably why subjects in digit span experiments virtually never respond with anything other than digits. Semantic knowledge at the level of individual words, concepts and general world knowledge also influences immediate verbal recall, again probably relying on both implicit and explicit processes.

I agree, therefore, that working memory does involve activated LTM in a range of different ways, but then, so do most aspects of human cognition. For example, much perception also involves activated LTM; we tend to see the world in terms of tables, chairs and sunsets, not as purely sensory features. Such links with prior experience are, of course, important, but would not lead us to suggest that perception, or even something more heavily dependent on learning, such as language, is simply activated LTM. So having agreed that the undoubted link between WM and LTM presents a problem, or probably a series of problems, rather than a solution, what else can one say about it?

8.2 Some skeletons in the working memory cupboard

Our decision to postulate a central executive devoid of memory capacity led to a number of problems. Initially I chose to set such difficulties aside, to be reconsidered in due course. Although I regarded such issues as simply ‘on the (p.142) back burner’, given their potential threat to the adequacy of our model of WM, a more accurate metaphor might be skeletons in the cupboard. Eventually, we attempted to put one skeleton too many into the cupboard: they all fell out, emphasising the need for a rather fundamental rethink of the structure of WM. The problems broadly fell into three categories: (1) Evidence for the short-term storage of information that could not readily be explained by the phonological loop or the sketchpad; (2) The problem of how the visuospatial and phonological systems might interact; and (3) The unresolved problem of the interface between WM and LTM. They will be discussed in turn.

8.2.1 A back-up store for STM?

We have tended to treat the phonological loop as though it were the sole source of digit span performance. If that were the case, then with visual presentation and articulatory suppression, span should drop virtually to zero. In fact, span typically drops from six or seven to four or five digits (e.g. Larsen and Baddeley 2003b), suggesting the need to assume some kind of additional ‘back-up’ store (Page and Norris 1998). Could this simply reflect the contribution of LTM to span? If so, patients with a pure phonological STM deficit showing normal LTM should have spans of four or five digits, rather than the spans of one or two items typically reported (Vallar and Shallice 1990b).

8.2.2 Preserved recall in STM patients

When STM patients recall auditorily presented digits, their span is about one item. With visual presentation it rises to around four digits (Shallice and Warrington 1970; Basso et al. 1982). An obvious way out of this dilemma is to attribute the increase to the visuospatial sketchpad. However, the sketchpad had typically been assumed to hold information in parallel, and to be inappropriate for serial recall. A series of studies by Phillips using matrix patterns suggests that only the final item is held in STM (Phillips and Christie 1977). A possible solution is to postulate a further system, possibly lexically based, that is specialized for serial recall. However, while there is good evidence for the retention of visual features of verbally coded sequences (see Logie et al. [2000] for a review), it is less clear if, and how the sequential order of items that are not verbally codable is maintained (though see Chapter 4 for a further discussion of this issue).

8.2.3 Semantic coding in STM

Although my initial study showed that immediate serial recall of five-word sequences principally reflected phonological coding, a small but significant effect of semantic coding was also found (Baddeley 1966a). Other studies have (p.143) shown much more powerful semantic effects (e.g. Brener 1940; Hulme, Roodenrys et al. 1997). Semantic factors tend to be more influential when the difficulty of retaining item rather than order information increases, for example as a result of using longer sequences or larger sets of potential items. The fact that semantic factors contribute to STM paradigms was not a problem for the initial Baddeley and Hitch working memory model. It did however become a problem once the executive was stripped of its storage capacity, leading to the question of how these phonological and semantic codes were stored and how they were combined to enhance recall.

8.2.4 Sentence span

Immediate memory for sentential material is typically substantially greater than span for unrelated words (Brener 1940). Baddeley et al. (1987) found spans of around five for unrelated words and 15 for sentences. Should one suggest, therefore, that LTM contributes 10 words to span, and WM five? Suppose then that we test a patient with a phonological STM deficit resulting in an unrelated word span of one item, but with normal LTM; what might we expect? Sentence span should presumably be around 11 words, comprising ten from LTM and one from STM. The span observed was five (Baddeley et al. 1987). This suggests an interactive process whereby a basic phonological core is amplified by contributions from LTM. If the basic span is grossly restricted, then overall performance will also be severely limited.
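The arithmetic behind this argument can be laid out in a short sketch. The additive rule below follows the reasoning in the text (a fixed LTM contribution added to the STM span). The multiplicative "amplification" rule is purely an illustrative assumption, not a model proposed in the chapter, and the function names are hypothetical. Neither rule reproduces the observed span exactly; the point is only that the observed value lies far below the additive prediction.

```python
def additive_prediction(control_word_span, control_sentence_span, patient_word_span):
    """Predicted sentence span if LTM adds a fixed number of words to STM span."""
    ltm_contribution = control_sentence_span - control_word_span
    return patient_word_span + ltm_contribution


def amplification_prediction(control_word_span, control_sentence_span, patient_word_span):
    """Illustrative assumption only: LTM scales up a phonological core by a constant gain."""
    gain = control_sentence_span / control_word_span
    return patient_word_span * gain


# Approximate spans quoted in the text (Baddeley et al. 1987).
CONTROL_WORD_SPAN = 5
CONTROL_SENTENCE_SPAN = 15
PATIENT_WORD_SPAN = 1
OBSERVED_PATIENT_SENTENCE_SPAN = 5

additive = additive_prediction(CONTROL_WORD_SPAN, CONTROL_SENTENCE_SPAN, PATIENT_WORD_SPAN)
amplified = amplification_prediction(CONTROL_WORD_SPAN, CONTROL_SENTENCE_SPAN, PATIENT_WORD_SPAN)

print("additive prediction:", additive)
print("amplification prediction (hypothetical):", amplified)
print("observed:", OBSERVED_PATIENT_SENTENCE_SPAN)
```

With these figures the additive rule predicts 11 words, six more than the span of five that was observed, which is the discrepancy that motivates an interactive account.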

Such a view is broadly consistent with the recent work by Gathercole (unpublished) who suggests that memory span in general relies upon the phonological storage of information at the syllabic or sub-syllabic level, which is then interpreted using redintegrative processes. As observed earlier, however, the current WM model remains silent on exactly how these processes might operate, or indeed where any intermediate products might be stored during redintegration.

8.2.5 Prose recall

The issues raised by sentence span are even more acute when considering recall of prose passages comprising a paragraph or more. The issue was presented particularly clearly by K.J., a highly intelligent patient with a dense, but very pure amnesia (Wilson and Baddeley 1988). When tested on retention of the paragraph comprising the prose recall subtest of the Wechsler Memory Scale, he performed normally on immediate test, while totally failing to recall anything 20 minutes later. His lack of delayed recall was expected, but how did he manage to do so well on immediate test? The passage in question, a brief newspaper-like story about a lady losing her purse and being helped by the (p.144) police, comprised around 20 ‘idea units’, each several words long. This is substantially beyond the capacity of the phonological loop; nor was it plausible to assume that this amount of detail could be stored in the sketchpad. So how was such good performance achieved?

Fortunately, the prose recall test was included as part of a routine clinical battery given by my colleague, Barbara Wilson, who had, over the years, tested many patients with memory deficits. We therefore decided to see how typical K.J.'s performance might be, and if, as we were reasonably certain, it proved to be atypical, to try to assess what allowed him to do so well (Baddeley and Wilson 2002). All our amnesic patients performed badly on delayed recall, as, of course, we expected. We found that most amnesic patients also performed relatively badly on immediate recall. There was, however, a small number of patients like K.J., whose immediate level of performance ranged from moderate to excellent. When we examined their clinical profile, two features stood out; they tended to have a high level of intelligence as measured by the Wechsler Adult Intelligence Scale, and in most cases to have well-preserved executive capacities, characteristics that, unsurprisingly, tended to correlate.

We interpreted our results as follows. We assumed that comprehending a coherent prose passage involves activating and combining representations within LTM. These range from the meanings of individual words, through concepts and up to higher order structures such as story grammars and scripts, reflecting shared social knowledge (Kintsch and van Dyck 1977; Schank and Abelson 1977). In subjects with normal episodic memory, we assume that such structures will be consolidated within LTM and hence can form a basis for retrieval after a delay. Patients with defective episodic LTM, however, regardless of how well the structure representing the passage is built and maintained in WM, will fail to consolidate the trace, and hence will not remember it after a brief filled delay.

Our second assumption is that while the task of constructing an overall representation of the paragraph in WM may be relatively straightforward given normal episodic memory, in the absence of this it becomes a highly attentionally demanding task, requiring the constant active maintenance and updating of the memory structure if it is not to collapse. Amnesic patients who are able and willing to do this can demonstrate good immediate recall, but nevertheless remain incapable of longer term storage and retrieval of the story. A particularly striking example of the capacity to maintain information in WM over time despite dense amnesia was described to me by Endel Tulving (personal communication, 1999). The patient in question claimed to still be able to play bridge, despite extremely dense amnesia. Tulving decided to test this and set up a game. Not only was the patient able to keep track of the bidding (p.145) and the resultant trump suit, but was able to remember the fall of cards well enough to allow him and his partner to win the rubber.

Our interpretation of preserved prose recall despite amnesia has considerable similarity to the concept of long-term working memory put forward by Ericsson and Kintsch (1995). It differs, however, in stressing the temporary nature of the retrieval structures built by our amnesic patients, in contrast to the emphasis by Ericsson and Kintsch on the prior development within LTM of the necessary retrieval structures, as for example, in the case of mnemonists who are capable of demonstrating remarkable immediate recall only because of many hours of practice building up the necessary structures in LTM. As we shall see later, I also assume a more active and flexible role of working memory, in addition to the utilization of the activated structures and representations within LTM assumed by Ericsson and Kintsch.

8.2.6 Chunking

Perhaps the most powerful single observation about the functioning of STM was the demonstration by Miller (1956) of the importance of chunking, enhancing span by combining several items into a single integrated chunk. The recall advantage of prose over unrelated words presumably stems from the capacity of subjects to bind together individual words within the prose material into meaningful chunks. Miller's suggestion that our capacity for processing information might be determined by number of chunks rather than items has continued to be influential, although current opinion tends to favour a capacity nearer four than Miller's magical number seven (Cowan 2001; 2005).
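A toy sketch may help make the chunk-versus-item distinction concrete. It assumes a capacity of four chunks, in line with the Cowan estimate cited above, and uses a letter string and fixed-size grouping that are hypothetical illustrations rather than material from the chapter. Real chunking depends on meaning and prior knowledge, not on a fixed group size.

```python
CHUNK_CAPACITY = 4  # assumed capacity in chunks, following the estimate cited in the text


def chunk(sequence, size):
    """Split a sequence into consecutive groups of `size` items."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def fits_in_span(units, capacity=CHUNK_CAPACITY):
    """True if the number of units to be held does not exceed the capacity."""
    return len(units) <= capacity


letters = "FBICIAIBMUSA"              # hypothetical example: 12 letters
as_items = list(letters)              # treated as 12 unrelated items
as_chunks = chunk(letters, 3)         # treated as 4 familiar three-letter groups

print(len(as_items), fits_in_span(as_items))
print(as_chunks, fits_in_span(as_chunks))
```

The same twelve letters exceed the assumed capacity when counted as separate items but fall within it when grouped into four chunks.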

In the initial WM model, chunking was implicitly assigned to the processing and storage capacities of the central executive. However, given its new truncated form, with its lack of storage capacity, the executive seems ill-equipped to perform this important task. Could chunking perhaps be assigned to LTM? Since our densely amnesic patient K.J. appears to be able to chunk perfectly adequately, as shown by his capacity for immediate prose recall, it would seem to suggest that chunking does not depend on intact episodic LTM. Furthermore, he appears to be able to utilize some system or process that maintains the chunked representations over short delays. Finally, the very flexibility of our capacity to combine information into chunks seems to suggest that the process goes beyond the simple coactivation of existing structures in LTM.

8.2.7 Working memory span

As we shall see in Chapter 10, one of the most extensively investigated features of working memory concerns individual differences in its overall capacity. In a (p.146) classic study, Daneman and Carpenter (1980) required their subjects to process a series of sentences, and afterwards recall the last word in each. They measured the maximum number of sentences that could accurately be processed and recalled in this way, terming it working memory span. They demonstrated a substantial correlation between span and reading comprehension in college students, while subsequent studies have shown it to predict performance on a wide range of cognitive tasks, including measures of general intelligence, and practical cognitive skills such as learning to programme or understand electronics (Kyllonen and Christal 1990; Daneman and Merikle 1996; Engle et al. 1999a). Once again, this task clearly demands storage processes that exceed the capacity of the verbal and visuospatial subsystems. Working memory span also predicts cognitive functioning much more effectively than measures of either simple word span or episodic LTM (see Chapter 11). The question arises once more as to how the Baddeley and Logie version of WM can account for these important results.

8.2.8 Conscious awareness

As discussed in the section on imagery, Baddeley and Andrade (2000) attempted to give an account of the processes underpinning the conscious awareness of visuospatial and auditory images, purely in terms of the visuospatial and phonological subsystems. While our results indicated that these subsystems formed a plausible part of the story, particularly for recently encoded novel stimuli, we had difficulty in using them to give an account of images based on LTM. Although WM was clearly involved, the apparent contribution from the loop and sketchpad to images deriving from LTM was rather modest. Our revised interpretation of these results involved combining information from LTM and from the relevant WM subsystems, but left completely unspecified how such information was integrated (Baddeley and Andrade 2000). Attentional capacity appeared to be necessary, since a demanding secondary task tended to reduce ratings of vividness for all image types, but we had no suggestions as to what this system might be.

The problem of integrating information from more than one source also arises in other more standard procedures. For example, Logie et al. (2000) were able to demonstrate visual similarity effects in immediate serial consonant recall. The visual effects were small compared to those based on phonology, but were reliable, operating across all serial positions and occurring both with and without articulatory suppression. These results imply some common point at which visual and verbal information can interact.

To summarize, although the simple Baddeley and Hitch (1974) model is capable of accounting for a good deal of data, the attempt to limit its storage (p.147) capacity to the visuospatial and verbal subsystems has created a number of significant problems. These appear to suggest that the capacity of WM to store information exceeds that of the existing subsystems. We also need a mechanism for allowing verbal and visuospatial subsystems to interact with each other, and with LTM. Finally, the system appears to be related to conscious awareness and to be attentionally limited.

In response to these problems, I proposed the concept of an episodic buffer, identifying it as a fourth component of working memory (Baddeley 2000). In retrospect, it could equally well be regarded as a fractionation of the initial central executive into an attentional control component as proposed by Baddeley and Logie (1999), and an additional storage component. Such a conceptualization has the attraction of neatness, since the loop and sketchpad can also be separated into processing and storage components. This may, however, have the drawback of suggesting closer similarities than proves to be the case. For present purposes, therefore, we will treat the episodic buffer as a separate subsystem, as shown in Fig. 8.1.

Fig. 8.1 The revised model of working memory proposed by Baddeley (2000). It includes a representation of links to long-term memory, and includes a fourth component, the episodic buffer. In this initial version, links between the subsystems and the buffer operated via the central executive. It now seems likely that there are also direct links (shown here as dotted lines). From Baddeley, A.D. (2000). The episodic buffer: a new component of working memory? In: Trends in Cognitive Sciences 4(11), 417–423. Reproduced with permission from Elsevier.

(p.148) 8.3 The episodic buffer

I will start by attempting to justify the name, since it was the name that crystallized my somewhat inchoate thoughts on the need for a further component of WM. The system is episodic, in the sense that it integrates information into coherent episodes; it is a buffer in that it comprises a limited capacity storage system that enables information coded using different dimensions to interact. Its capacity is set in terms of chunks, a chunk being a package of information bound by strong associative links within a chunk, and relatively weak links between chunks. A central feature of the buffer therefore is its role in binding information from diverse sources into unified chunks.

Hummel (1999) distinguishes between static and dynamic binding. Static binding occurs when two features co-occur; a perceptual example would be yellowness and bananas. Repeated observation of yellowness and bananas may result in binding of these in semantic memory. This binding can be based either on the basic structure of the underlying perceptual system, or on learning. Examples of structural factors that facilitate perceptual binding are offered by the gestalt principles such as continuity and closure, which help to parse the visual array into objects and into scenes. Binding based on learning occurs when long-term knowledge helps us chunk familiar objects in complex scenes such as a car parked in front of a house. In both cases binding would appear to occur at comparatively little attentional cost. Dynamic binding involves the novel combination of items which may be combined in many different ways, potentially involving the integration of a number of apparently arbitrary objects and features, for example, a red banana floating in a lake of blue porridge. Hummel suggests that such dynamic binding places considerably higher computational demands on the system. This issue will be addressed in the next chapter.

To summarize, the episodic buffer is assumed to be a temporary storage system that is able to combine information from the loop, the sketchpad, long-term memory, or indeed from perceptual input, into a coherent episode. This process may be attentionally demanding, whereas direct retrieval from LTM is assumed to be relatively undemanding (Baddeley et al. 1984b; Craik et al. 1996). As Fig. 8.1 (see p. 147) shows, the buffer provides a link between the executive and LTM. In the initial (Baddeley 2000) version shown, the flow of information from the phonological loop and the visuospatial sketchpad occurs indirectly through the executive. I did not assume a direct link with these subsystems, preferring to leave this question open to empirical investigation. As the next chapter indicates, the evidence tends to favour such a link.

The episodic buffer is assumed to be the basis of conscious awareness. A number of theorists have suggested that a principal function of consciousness (p.149) is to bind together information gleaned from separate perceptual channels such as colour, shape and location, into coherent objects (Baars 1997; Dehaene and Naccache 2001). The reflexive character of conscious awareness, the fact that we can be aware of our experience and reflect upon it, implies a system that has both temporary storage and manipulative processes. Both are, of course, at the centre of the concept of WM. These issues are discussed further in Chapter 16.

A possible criticism of the concept of an episodic buffer is that I am simply replacing a cupboard full of skeletons with a single insubstantial ghost. I propose three methods of fleshing out my potential apparition. The first is to go through the list of problems outlined, and say how I think the concept of an episodic buffer offers a potential solution. Secondly, I shall apply the concept to a series of questions proposed by Miyake and Shah (1999b) as central to any model of working memory. The third is to attempt to use the concept of an episodic buffer as a framework for empirical studies. If it proves capable of generating experiments that are fruitful in helping understand more about the processes and systems of working memory, then it will have succeeded, even if the findings result in substantial modification, or even the abandonment of the initial concept. In short, the episodic buffer is a conceptual tool, not a specified model. None of these stratagems should, of course, be regarded as intrinsically capable of demonstrating the validity of the episodic buffer. What they should do, I hope, is to illustrate its potential heuristic value as a means of tackling the comparatively neglected question of how the subsystems of working memory are integrated and interfaced with LTM.

8.3.1 Burying the skeletons

How might the concept of an episodic buffer help in tackling our nine problems? Possible approaches to each of these may be summarized as follows:

  1. The back-up store. The episodic buffer offers one way of supplementing the limited capacity of the phonological loop by utilising a multimodal code to provide additional storage capacity.

  2. STM patients show grossly impaired performance only when auditory presentation results in information entering the episodic buffer through the impaired phonological store. With visual presentation, a more adequate route into the buffer is available.

  3. Semantic coding in STM. Because of its capacity for linking with LTM and providing a multidimensional code, the buffer allows the phonological loop to capitalize on semantic information. This may be unnecessary for short sequences but becomes increasingly valuable for long sequences and for those for which item information is critical.

  4. (p.150) The multidimensional episodic buffer store might plausibly be assumed to be capable of storing serial order in a way that is not open to the sketchpad.

  5. Recall of sentential and prose material. Enhanced span results from the capacity of the buffer to use and integrate information from both the WM subsystems and LTM, allowing span to be increased by chunking.

  6. Chunking. It is assumed that the active chunking of previously unrelated items may occur within the episodic buffer, utilising the attentional capacity of the executive to capitalize on prior learning, and to combine information from separate sources in novel ways.

  7. Redintegration is assumed to be a process whereby the executive takes advantage of information available in LTM in order to optimize interpretation of the contents of the episodic buffer. There may however be other more automatic processes whereby long-term knowledge facilitates both chunking and retrieval from STM.

  8. Working memory span is assumed to reflect the storage capacity of the buffer, together with the efficiency with which the executive can use this capacity.

  9. Conscious awareness provides one mode of retrieval of information from the buffer. It is particularly effective in allowing multiple sources of information to be processed in parallel (see Chapter 16).

8.3.2 How might the buffer work?

At this stage, it is clearly inappropriate to expect a precise and detailed model. What follows, therefore, should be regarded as a basis for generating testable hypotheses about the way in which information from LTM may be held and manipulated in WM. These hypotheses are essentially guesses that aim to facilitate investigation of an important but neglected area, not firm predictions whose failure would necessarily imply that the whole model be discarded. I shall again use the Miyake and Shah (1999b) questions as a convenient framework. The questions and my tentative answers are as follows:

8.3.3 Basic mechanisms and representations

This breaks down into a series of subquestions.

  1. How is information encoded and maintained? I initially assumed as a working hypothesis that encoding would depend heavily on the operation of the central executive, that it might operate through the phonological loop, the sketchpad, or LTM, and that information would be maintained by rehearsal. While the phonological component may be maintained through subvocalization, I suspect that this is atypical of most types of rehearsal. Whereas the phonological (p.151) code can literally be regenerated, this is not the case for a visual or, I suspect, a semantic code. It seems likely, therefore, that rehearsal within the buffer is more analogous to continued attention to a particular representation.

  2. What is the retrieval mechanism? I would regard conscious awareness as the principal retrieval mode. Whether it is the only mode I am less certain. Consider an instruction such as ‘press the right-hand button when the red light appears’. It clearly seems to be the case that we can set up a temporary ‘program’ that allows us to perform such an operation. Is this maintained in the episodic buffer, or in some parallel system? If in the buffer, I suspect that it is not necessary to become aware of the instruction in order to obey it. So far, however, we know remarkably little about this important capacity (Monsell 1996), so suggesting that such temporary programs are stored in the buffer at least offers a potentially useful starting hypothesis.

  3. How is information represented: is the format different for different types of information? I assume a single multidimensional code which provides an interface for information from many different sources. These are likely to include LTM and the subsystems of WM, together with information from sensory systems, including those such as smell and taste, that do not themselves have an active means of control and manipulation. Again, this is a highly speculative assumption that at least has the advantage of encouraging consideration of an intriguing but relatively unresearched area.

8.3.4 Control and regulation

  1. How is the information controlled and regulated? I assume that control depends both on the systems feeding into the buffer, and on the central executive. Hence, phonological information would be controlled in part by the process of subvocalization, while information from LTM would be substantially more influenced by habits and experience. The flow of information would, however, be determined by the supervisory component of the central executive, which in turn would be influenced by higher-level goals.

  2. What determines which information is stored and which ignored? Both existing habits and higher-level goals determine the flow of information, as discussed in the chapters that follow.

  3. Is control handled by a central structure? Yes, the central executive, although the extent to which this represents a single hierarchical structure with one basic controller or a more heterarchical alliance of multiple executive processes remains to be decided. The way in which this system might operate is discussed in subsequent chapters.

(p.152) 8.3.5 Is the episodic buffer unitary or non-unitary?

  1. Does it consist of multiple separable subsystems? At this stage of theorising, the buffer is regarded as a unitary subsystem within multicomponent working memory. It seems probable that as a result of empirical exploration, the buffer may be fractionated, as has proved to be the case with the loop and sketchpad.

8.3.6 The nature of the buffer's limitations

  1. What mechanisms constrain capacity? Capacity is limited in a number of different ways. First of all, the buffer will be constrained by the fact that its sources of information, namely LTM and the various subsystems, themselves have limits. The buffer itself is limited in the number of chunks that can be maintained (Cowan 2005), and by the efficiency with which the central executive can operate the system. This in turn will depend upon the overall attentional capacity of the executive.

At a more neurobiological level, such limits will themselves tend to reflect a number of parameters including measures of excitation and inhibition, rates of decay within the relevant stores, and interference effects depending upon the precise character of the material being maintained. While there will be similarities in the mechanisms operating within the various subsystems that feed into the episodic buffer, these mechanisms are unlikely to be identical, given the differential constraints imposed by the need to process sound, vision and meaning.

8.3.7 The role of the buffer in complex cognitive activities

  1. A role in language comprehension? I assume that the buffer plays a role in dealing with the comprehension of complex episodes, although the extent to which the buffer is heavily involved in routine comprehension is uncertain (see below). It is possible that most access to long-term memory representations may be relatively automatic, given the evidence that retrieval from LTM appears not to be heavily dependent on working memory capacity (Baddeley et al. 1984; Craik et al. 1996).

  2. Spatial thinking? I see the episodic buffer as offering a cognitive workspace, and hence as playing a central role in spatial thinking, backed up by the visuospatial sketchpad. I assume the same would apply to a number of other problem-solving activities, except that emphasis would be placed on subsystems other than the sketchpad. In particular, flexible access to LTM, and the capacity to pick up and utilize analogies across different contexts (p.153) and modalities, are likely to be important. Performance will be influenced by the number of chunks that can be maintained, and by the attentional capacity of the central executive. I return to this issue in the next chapter.

8.3.8 Relationship of the buffer to LTM and knowledge

  1. Relation to episodic LTM. The episodic buffer is the principal link between working memory and LTM. It resembles episodic LTM in that it is concerned with integrating and maintaining specific individual episodes. It differs, however, in that such maintenance is temporary, and attentionally limited. Entry of new material into episodic long-term memory is assumed to be dependent on the buffer, which is also assumed to play an important role in episodic retrieval. The buffer may, however, be well-preserved in densely amnesic patients, and conversely, possibly disrupted in patients whose LTM is otherwise unimpaired. Such patients would, of course, experience secondary problems with new learning of a type that is often encountered in dysexecutive patients, who typically show poor attentional and strategic control of both learning and retrieval (Stuss and Knight 2002).

  2. Relationship to semantic memory. The buffer will frequently utilize semantic information in representing episodes, and such episodes will in due course contribute to semantic memory, as part of the normal process of accrual of knowledge.

  3. Relation to procedural skills. Such skills would not be assumed to play a particularly important role in the episodic buffer, although as discussed earlier, it is conceivable that action plans such as instructions to perform a specified task in a particular way might be stored temporarily within the buffer.

8.3.9 What is the relationship to attention and consciousness?

The episodic buffer is assumed to be controlled by the central executive, a system whose capacity is attentionally limited. Retrieval from the buffer is assumed to operate principally through the process of conscious awareness, allowing information from multiple sources to be combined into a coherent overall representation.

8.3.10 How is the episodic buffer biologically implemented?

It seems unlikely that the episodic buffer reflects the operation of a single area of the brain. Given that its function is to pull together information from many different subsystems, it will potentially be influenced by all of these, to a (p.154) greater or lesser extent. Nevertheless, it seems highly probable that the frontal lobes will be heavily involved, given their extensive links throughout the brain, and their involvement in ‘higher-level’ functions that coordinate processes such as perception and memory (Stuss and Knight 2002). It is also possible that the hippocampus may play a role in binding new information within the buffer with existing information in LTM.

The processes involved in such integration are essentially those that are required for any theory of binding. Hypotheses currently fall into two broad categories. At the individual cell level, there is evidence to suggest that particular neurons may be specialized for detecting specific features, while others detect conjunctions of features (Fuster 2002). It is possible in principle that a hierarchy of such neurons could provide a binding mechanism, broadly along the lines advocated by Goldman-Rakic (1998).

A second approach is to suggest that components of a single scene are integrated through the synchronous firing of the relevant units (Gray and Singer 1989; Hummel 1999; Singer 1999). Vogel et al. (2001) propose such an interpretation for their observation that the capacity of visual working memory is limited to about four objects, regardless of how many features each object comprises. They suggest that the limit is set by the interference due to overlap of firing as the number of objects increases.

Neuroimaging research is beginning to tackle the question of feature integration, with a study by Prabhakaran et al. (2000) being particularly relevant. This study was concerned with the short-term retention of consonants, and of locations. Four conditions were used. In one, subjects were shown four consonants, with recognition tested after a brief delay by presentation of a probe consonant. A second condition involved four spatial locations which were likewise probed. The third condition combined these: as in the single task, the consonants were presented in the centre of the screen, and the locations arrayed across the screen. The final condition integrated the two tasks by presenting each consonant at one of the relevant locations. The degree and location of functional magnetic resonance imaging (fMRI) activation were studied, with the consonants activating the areas in the left hemisphere typically associated with the phonological loop, while the locations typically activated the visual equivalents, principally in the right hemisphere (see Chapter 12). Presenting both the line of consonants and the array of locations activated both sets of areas. However, when the consonants were placed in the specific locations, the overall level of activation was reduced, and its principal focus moved to the right frontal cortex, leading them to conclude that ‘the present fMRI results provide evidence for another buffer, namely one that allows for temporary retention of integrated information’ (Prabhakaran et al. 2000, p. 89).

(p.155) A similar conclusion was reached by Bor et al. (2003) in a study concerned with spatial working memory. They selected matrix patterns which varied in the ease with which they could be chunked and found that the greater the degree of chunking, the greater the extent of frontal fMRI activation.

8.3.11 Relationship to other models

One of the advantages of the episodic buffer concept is that it provides a bridge between the Baddeley and Hitch working memory model, with its emphasis on separating and identifying the phonological and visuospatial subsystems, and other approaches focusing on either executive processes or links to LTM. It may be helpful to compare the revised multicomponent working memory model with some of these.

My approach has much in common with that of Cowan (2001; 2005), although at first sight our models may seem to be very different. He tends to focus on attentional limitations, and to be less specific about the visuospatial and verbal subsystems. This appears to me to be a matter of emphasis. Indeed, the one major difference that I would draw between our approaches is his emphasis on working memory as activated LTM, which seems to me to risk under-emphasising the capacity of WM to combine and manipulate information in novel and creative ways.

The concept of working memory as activated LTM features even more prominently in the long-term working memory model of Ericsson and Kintsch (1995). The episodic buffer differs from their approach in assuming a temporary and flexible workspace that draws upon working memory; it thus proposes a much more dynamic system than that of Ericsson and Kintsch, which appears to depend on the simple activation of representations in LTM.

The assumption of an active, creative system is supported by the fact that one can readily set up a totally novel combination of concepts, such as an ice hockey-playing female elephant, and then go on to solve problems such as how this new player might best be used: as a defender, perhaps, capable of delivering a mean body check? Or perhaps even better as a goalkeeper? Such manipulation of newly created complex representations would appear to go beyond simple activation, and to require something like a temporary workspace. Cowan (2001) comes close to this in proposing that the ‘addresses’ of the items that have been activated in long-term memory are maintained in a temporary store. Such an assumption does not appear to offer any advantage over the buffer concept, since it still requires storage, and specification as to how these addresses might be combined.

More generally, by allowing multidimensional coding, the episodic buffer concept becomes more readily comparable to a range of models of short-term (p.156) and working memory that assume the system to be based on a wider range of memory coding than did the original model. While differences between my own model and the above remain, the greater breadth resulting from the episodic buffer concept makes it easier to address common problems, notably that of how WM relates to LTM.

The concept of an episodic buffer does therefore appear to offer a potential interpretation of the wide range of data that proved problematic for a working memory model that assumed a purely attentional central executive. A critic might complain that I have simply reverted to the old concept of an all-powerful executive. I would suggest that separating attentional and storage capacities is an important development for the model. Whether I am justified in this claim will depend on how successful the concept of an episodic buffer is in going beyond an account of existing data to generate new studies that succeed in expanding our knowledge in new and fruitful ways. The next chapter describes our first steps in trying to meet this challenge.