Computational Neuroscience of Vision

Edmund Rolls and Gustavo Deco

Print publication date: 2001

Print ISBN-13: 9780198524885

Published to Oxford Scholarship Online: March 2012

DOI: 10.1093/acprof:oso/9780198524885.001.0001


PRINTED FROM OXFORD SCHOLARSHIP ONLINE (www.oxfordscholarship.com). (c) Copyright Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a monograph in OSO for personal use (for details see www.oxfordscholarship.com/page/privacy-policy). Subscriber: null; date: 18 October 2018

Principles and Conclusions


Chapter:
13 Principles and Conclusions
Source:
Computational Neuroscience of Vision
Author(s):

Edmund T. Rolls

Publisher:
Oxford University Press
DOI:10.1093/acprof:oso/9780198524885.003.0013

Abstract and Keywords

This concluding chapter sums up the key findings of this study on the computational neuroscience of vision. The results show that many inferior temporal visual cortex neurons have transform invariant responses to objects and faces, but that not all neurons have view invariance. The findings also indicate that much of the information available from the responses of the neurons about shapes and objects is available in short time periods, and that invariant representations can be self-organized using a trace learning rule incorporated in a feature hierarchy network such as VisNet.

Keywords:   vision, computational neuroscience, visual cortex, neurons, view invariance, invariant representations, VisNet

13.1 The inferior temporal visual cortex provides a transform invariant representation of objects and faces

The neurophysiological evidence described in Chapter 5 shows that many inferior temporal visual cortex neurons have transform invariant responses to objects and faces. The types of invariance include translation, size, spatial frequency, and, for some neurons, even view. Not all neurons have view invariance, and this is in accordance with the theory that view invariant representations are formed in the inferior temporal cortex by associating together different views of the same object. The transform invariance property reflects the fact that information about objects and faces is made explicit in the neuronal output of the ventral visual processing stream.

13.2 The representation in the inferior temporal visual cortex (IT) is in a distributed form that provides high capacity and that can be read by neurons performing dot-product decoding

The neurophysiological evidence for this is described in Chapter 5. Part of the evidence is that IT neurons typically have an exponential distribution of firing rates to a set of stimuli.

Another part of the evidence is that the information provided by different IT neurons is almost independent, so that the number of stimuli that can be encoded rises exponentially with the number of neurons in the sample. This means that a receiving neuron needs only a set of inputs from randomly selected IT neurons in order to obtain a great deal of information about which object is being viewed, and this is a major factor in simplifying brain connectivity.

Another part of the evidence is that each neuron typically provides up to 0.3–0.5 bits of information about the stimulus set, so that to determine which object in the high dimensional space of objects is being viewed, the responses of many neurons must be considered. A single neuron provides insufficient information to diagnose which of many objects is being seen.
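As a rough illustration of why near-independence matters, the arithmetic can be sketched as follows. This is an idealized sketch, assuming that information adds linearly across neurons (no redundancy) and ignoring the ceiling set by the size of any real stimulus set; the function names are hypothetical, and 0.3 bits per neuron is simply the lower end of the range quoted above.

```python
import math


def encodable_stimuli(n_neurons: int, bits_per_neuron: float) -> float:
    """Idealized number of discriminable stimuli if each neuron contributes an
    independent, additive amount of information (assumption: no redundancy)."""
    total_bits = n_neurons * bits_per_neuron
    return 2.0 ** total_bits


def neurons_needed(n_stimuli: int, bits_per_neuron: float) -> int:
    """Smallest ensemble whose summed information reaches log2(n_stimuli) bits."""
    return math.ceil(math.log2(n_stimuli) / bits_per_neuron)


if __name__ == "__main__":
    for n in (10, 20, 40):
        # At 0.3 bits each, every extra 10 neurons multiplies the count by 2**3.
        print(n, round(encodable_stimuli(n, 0.3)))
    print(neurons_needed(1000, 0.3), neurons_needed(1000, 0.5))
```

Under these assumptions, doubling the ensemble from 20 to 40 neurons squares the idealized count (64 to 4096) rather than doubling it, which is the sense in which capacity rises exponentially with the number of neurons.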

Another part of the evidence is that much of the information available in the firing rates can be read by dot product decoding, which is very biologically plausible in that it is the simplest type of operation that neurons could perform.
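A minimal sketch of dot-product decoding, assuming that the decoder compares a single-trial vector of firing rates with the mean response vector previously measured for each stimulus, and chooses the stimulus with the largest normalized dot product. The stimulus names and firing rates are invented for illustration, and normalizing by vector length is an assumption of this sketch rather than something stated above.

```python
import math
from typing import Dict, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products: what a linear neuron computes from its
    input firing rates and synaptic weights."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product normalized by vector lengths (0 if either vector is silent)."""
    denom = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom > 0 else 0.0


def decode(response: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the stimulus whose mean response vector best matches `response`."""
    return max(templates, key=lambda name: cosine(response, templates[name]))


# Hypothetical mean firing rates (spikes/s) of four IT neurons to three stimuli.
TEMPLATES = {
    "face_A": [20.0, 5.0, 1.0, 8.0],
    "face_B": [4.0, 18.0, 9.0, 2.0],
    "object_C": [1.0, 3.0, 15.0, 12.0],
}

if __name__ == "__main__":
    print(decode([18.0, 6.0, 2.0, 7.0], TEMPLATES))  # a single noisy trial
```

Nothing beyond a weighted sum and a comparison is needed, which is why this kind of read-out is considered biologically plausible.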

Another part of the evidence is that the information can be read out from the firing rates of the neurons (in practice, the number of spikes they emit in a short time) without taking into account their relative time of firing.

This poses a major challenge to theories that synchronization between the spikes of different neurons is part of the code in the ventral visual system, because quantitatively a very great deal of information is available in the rates, and it remains to be shown that synchronization adds much, or anything, quantitatively to the code, at least in the inferior temporal visual cortex, the end of the object processing stream.

13.3 Much of the information available from the responses of the neurons about shapes and objects is available in short time periods, for example 20 ms

This property enables processing to travel rapidly from stage to stage of the ventral visual system, as the next stage can read the information from the previous stage in short time periods. This property also shows that each cortical area can perform the computation necessary to support object recognition in a short period, on the order of 20 ms. The evidence for this, from information theoretic analyses of the responses of neurons and from backward masking experiments measuring neuronal responses and human psychophysical responses, is described in Chapter 5. The computation can be performed this rapidly because neurons have continuous dynamics, as shown in Chapter 7 (Section 7.6).

13.4 Implementing computations with neurons with continuous dynamics allows very rapid feedback processing within a cortical area, so that constraint satisfaction using recurrent collateral connections between neurons in a cortical area could be performed within approximately 15 ms per cortical area

The evidence for this is described in Section 7.6. The same type of rapid processing enables not only information to travel rapidly through a multiple layer hierarchy such as the visual cortical areas in the ventral stream, with attractor-based feedback processing in each area, but also top-down processing, including attentional biasing effects, to operate quite rapidly, as described in Chapters 9 and 10.

13.5 Of the approaches to the perception of objects described in Section 8.2, what is found in the ventral visual system appears to be closest to a hierarchical feature analysis system

Different abstract computational approaches to object recognition are described in Section 8.2. Of these, feature lists, spaces or histograms (Section 8.2.1) are inadequate to account for primate object perception and discrimination because they include no shape information, and so cannot discriminate between objects that have the same features in a different spatial arrangement. Nevertheless, the types of features, such as texture and colour, that are present in objects do provide valuable evidence for identification (and are incorporated very naturally into hierarchical feature analysis systems).
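The limitation of a feature list can be shown with a toy example. In this sketch, which assumes a hypothetical representation of letters as strokes with grid positions, a ‘T’ and an ‘L’ produce identical feature histograms (one horizontal and one vertical stroke each) even though they are different shapes.

```python
from collections import Counter
from typing import List, Tuple

Stroke = Tuple[str, int, int]  # (orientation, x, y); y increases downwards

# Hypothetical two-stroke letters on a small grid (stroke centres).
T_SHAPE: List[Stroke] = [("horizontal", 1, 0), ("vertical", 1, 1)]
L_SHAPE: List[Stroke] = [("vertical", 0, 1), ("horizontal", 1, 2)]


def feature_histogram(shape: List[Stroke]) -> Counter:
    """Bag-of-features description: counts each feature type, discards position."""
    return Counter(orientation for orientation, _x, _y in shape)


if __name__ == "__main__":
    print(feature_histogram(T_SHAPE) == feature_histogram(L_SHAPE))  # True
    print(set(T_SHAPE) == set(L_SHAPE))                              # False
```

Whatever information distinguishes the two letters lives entirely in the positions that the histogram throws away.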

Template matching and the alignment approach (Section 8.2.3) face a major problem as potential accounts of primate vision: it is very difficult to conceive how the arbitrary shifts, rotations, and scale changes would be performed by a biological system that cannot perform very accurate matrix multiplication. It is also difficult to conceive how a biological system, even if it could perform such matrix multiplications, would find the correct match in its database of stored possible templates; some form of canonical representation would be needed. Another difficulty is that reasonable segmentation of an object would be needed at an early stage of processing, so that the transforms could be performed on the image of an object for later matching into the database, and early segmentation in complex natural scenes that include occlusions is very difficult. A further difficulty is that this approach, if it could be implemented in the brain, is powerful enough to solve the problem of invariant object recognition in as little as one processing stage, and this does not match the gradual computation of invariant representations that seems to be performed over many stages of the cortical visual hierarchy leading to IT, as described in Chapters 5 and 8.

Syntactic 3D structural descriptions of objects (Section 8.2.2), which were chosen by Marr (1982) and by Biederman (1987), suffer from the problem that it is very difficult to conceive how the arbitrary ‘on-the-fly’ (dynamic, real-time) syntactic or relational linking between the limited set of parts could be performed by a biological system, as described in Section 8.2. The binding would need not only a link between part A and part B; the link would also have to provide some evidence about the spatial relation, e.g. ‘is to the top left of’, so that at least three items would be involved in every relational description (at least two parts and one relational descriptor). Synchronization does not seem to be a possible solution to this binding problem, because describing an object would require knowing not just which features need to be bound, but also the type of relational binding between every pair of features (see further below). Another problem is that such a system works well only if the parts or features can be identified correctly (which involves early segmentation) before the structural description can be parsed, and such early segmentation is difficult in natural scenes.

Hierarchical feature analysis systems do have the capability of building the spatial relations between features into the analysis they perform, by incorporating fixed (non-dynamic) feature combination neurons, as described in Chapter 8. Such systems perform the computation gradually over several processing stages (matching what is found in the primate visual system) in order to limit the number of possible combinations that would potentially need to be tested if the analysis were performed globally over the whole image space at any one stage.

One potential problem with such systems is the difficulty of performing the analysis over the whole image, because many different objects may be present in the scene, and the spatial relations between all the objects would need to be encoded. It is proposed that the brain solves this problem by limiting the analysis so that it is not performed simultaneously over the whole visual field. This is achieved by using foveate vision (i.e. giving preference to a high resolution central region), by reducing receptive field size in complex natural scenes using shunting / competitive interactions (as described in Sections 5.6.3 and 8.4.9), and by using spatial or object attentional bias to limit what needs to be solved at any one time, as described in Chapters 9 and 10.

Another potential problem is the potential combinatorial explosion in the number of neurons required, but it is suggested in Chapter 8 that this is solved partly by taking low order combinations of the inputs from the preceding stage, partly by taking the inputs from only a small region of the previous stage, and partly by using the redundancy present in natural images, so that self-organization matches the feature analyzers to the features and feature combinations that are actually present in scenes.
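The effect of the first two measures can be sketched by counting how many distinct conjunction detectors would be needed if every possible combination had its own neuron. The input counts (1000 inputs for a whole-field analysis, 50 for a small local region) are hypothetical round numbers chosen only for illustration, and the count is an upper bound: it ignores the third measure, self-organization to the combinations that actually occur in natural images, which would prune it much further.

```python
from math import comb


def n_combination_detectors(n_inputs: int, order: int) -> int:
    """Number of distinct order-k conjunctions of n inputs, i.e. an upper bound
    on the detectors needed if every conjunction had its own neuron."""
    return comb(n_inputs, order)


if __name__ == "__main__":
    # Whole-field vs. local receptive field (illustrative sizes only).
    for n_inputs in (1000, 50):
        for order in (2, 3):
            print(n_inputs, order, n_combination_detectors(n_inputs, order))
```

With these illustrative sizes, going from pairs to triples multiplies the whole-field count by more than 300, while restricting the inputs to a local region cuts the pairwise count from 499,500 to 1,225.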

Another potential problem is how the system might be trained to put together the different types of retinal feature that are transforms of each other, in order to build transform invariant representations. The solution proposed in Chapter 8 is a local associative learning rule that incorporates a short term memory trace.
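A minimal sketch of how a trace rule can bind transforms together, assuming a commonly cited form of the rule, ȳ(τ) = (1 − η)·y(τ) + η·ȳ(τ−1) with Δw_j = α·ȳ(τ)·x_j(τ), followed by explicit weight normalization. This is not the VisNet implementation: there is a single linear output neuron, no competition, one-hot inputs, and hypothetical parameter values. Object A is seen at three positions in succession; the neuron starts out driven almost only by A at position 0. Setting η = 0 turns the rule into ordinary Hebbian learning, which serves as a control.

```python
import math
from typing import List

N_POSITIONS = 3             # hypothetical: each object seen at 3 retinal positions
N_INPUTS = 2 * N_POSITIONS  # inputs 0-2 code object A, inputs 3-5 code object B


def one_hot(index: int) -> List[float]:
    """Input vector with a single active feature (one object at one position)."""
    x = [0.0] * N_INPUTS
    x[index] = 1.0
    return x


def normalise(w: List[float]) -> List[float]:
    """Scale the weight vector to unit length (stands in for weight normalization)."""
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def train(eta: float, alpha: float = 0.05, epochs: int = 200) -> List[float]:
    """Train one linear output neuron while object A sweeps across positions 0-2.

    eta: trace persistence (eta = 0 gives plain Hebbian learning)
    alpha: learning rate
    """
    # Seed: the neuron starts out driven almost only by object A at position 0.
    w = normalise([1.0] + [0.05] * (N_INPUTS - 1))
    trace = 0.0  # not reset between sweeps, since only object A is ever shown
    for _ in range(epochs):
        for pos in range(N_POSITIONS):                # successive transforms of A
            x = one_hot(pos)
            y = sum(wi * xi for wi, xi in zip(w, x))  # postsynaptic firing rate
            trace = (1.0 - eta) * y + eta * trace     # short term memory trace
            w = normalise([wi + alpha * trace * xi for wi, xi in zip(w, x)])
    return w


def responses(w: List[float], first_input: int) -> List[float]:
    """Responses to one object at each position (first_input 0 = A, 3 = B)."""
    return [w[first_input + pos] for pos in range(N_POSITIONS)]


if __name__ == "__main__":
    for eta in (0.8, 0.0):
        w = train(eta)
        print("eta", eta,
              "A:", [round(r, 3) for r in responses(w, 0)],
              "B:", [round(r, 3) for r in responses(w, N_POSITIONS)])
```

With the trace (η = 0.8) the responses to A at all three positions end up similar, because activity from the preceding view is still present when the next view arrives. With η = 0 the weights from positions 1 and 2 keep their initial 5% ratio to position 0. The never-presented object B is left with negligible weights in both cases.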

Another possible problem of such systems is that they most easily associate together different 2D views of objects, and may not include a full 3D description of the object. It is suggested that the work-around to this problem adopted by the visual system is to use a separate visual system, the dorsal visual system, to perform the 3D operations needed in space to perform actions on objects identified by the ventral visual system.

In summary, although there are problems and limitations of such systems, the processing being performed by the ventral visual system does appear to match a feature hierarchy analysis system, and solutions that the brain uses to many of the problems that arise are proposed and described in Chapter 8.

13.6 Invariant representations can be self-organized using a trace learning rule incorporated in a feature hierarchy network such as VisNet

Although the demonstration discussed in Section 9.6.2 shows that attention can help to provide useful translation invariant representations by limiting the area of the visual field being processed, and thus the number of objects that might otherwise be simultaneously represented in the inferior temporal visual cortex, attention-based models are not sufficient to solve the problem of forming invariant representations, as described in Section 13.7.

A sufficient and also biologically plausible way of achieving translation invariance using feedforward processing and a trace learning rule is described in Chapter 8. A feedforward system is more plausible because, in the masking experiments described in Section 5.5.6 and Appendix B.3.2, inferior temporal cortex neurons can still provide object-selective information even when each cortical area can fire for only 30 ms. This time is too short for activity, once it has reached the inferior temporal cortex, to be back-projected to V1 and then forward-projected again with dynamic settling. One property that may help the real visual cortical system with the feedforward aspect of its processing, especially when operating in cluttered natural scenes, may be the weighting given to whatever is at the fovea, given the larger magnification factor at the fovea (see Section 8.4.6).

Of course, in natural cluttered scenes when there is sufficient time for attentional processes to operate, good performance may be supported both by the weighting given to whatever is at the fovea due to its greater magnification (which describes overt attention, where eye position is used to define which object in a scene to process), and by the types of covert attentional process described in this book, where attention can be paid to an object that is not at the fovea.

Another reason why the trace learning rule-based self-organizing processes described in Chapter 8 are more appropriate for forming invariant representations in general is that they provide an account of view-invariant object recognition, which a spatial attention-based approach to invariant object recognition cannot. Further, a spatial attention-based approach to invariant object recognition cannot, without special added mechanisms, account for the binding of features in which the relative spatial position of the features must be encoded, as discussed in Section 13.7.

13.7 Spatially selective shape feature binding, and invariant form-based object recognition, require neurons that respond to combinations of features in the correct spatial configuration, but not attention or synchronization

To obtain globally invariant representations of objects (i.e. representations that are invariant when the whole object is moved in the visual field), yet in which the local spatial arrangement of the features is critical, it is shown in Section 8.4.5 that a solution is to use neurons that become connected, through a self-organizing process, so that they respond to a low-order combination of features from the previous layer in the correct spatial arrangement. Such neurons might respond differently to an ‘L’ and a ‘T’, even though both are formed from a vertical and a horizontal line. Low-order combinations (combinations of a few inputs) are required in order to limit the combinatorial explosion, the solution to which is also helped by the layered architecture of the visual system (V1–V2–V4–Posterior IT–Anterior IT), by the redundancy in the statistics of the visual world, and by the large numbers of neurons in the visual system. It is also noted in Section 8.4.5 that the spatial binding of features must be implemented early on in the processing, so that the relative positions of the features become part of the object description, even though the object itself can be at any location in space. Evidence consistent with this requirement and prediction of the theory of invariant object recognition described in Chapter 8 is now starting to appear. In particular, many neurons in V1 may respond better to combinations of spatial features in the correct spatial configuration than to single oriented lines or edges (see Section 2.5).
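A toy version of a low-order combination neuron is sketched below, assuming a hypothetical representation of letters as oriented strokes with grid positions. Each detector responds to a particular pair of strokes at a particular relative offset, so it separates ‘T’ from ‘L’ and keeps responding when the whole letter is shifted. The explicit search over all locations is a shortcut standing in for the translation invariance that, in the theory, is built up gradually over layers by trace learning.

```python
from typing import List, Tuple

Stroke = Tuple[str, int, int]  # (orientation, x, y); y increases downwards

# Hypothetical two-stroke letters on a small grid (stroke centres).
T_SHAPE: List[Stroke] = [("horizontal", 1, 0), ("vertical", 1, 1)]
L_SHAPE: List[Stroke] = [("vertical", 0, 1), ("horizontal", 1, 2)]


def shift(shape: List[Stroke], dx: int, dy: int) -> List[Stroke]:
    """Translate a whole shape; relative positions of its strokes are unchanged."""
    return [(o, x + dx, y + dy) for o, x, y in shape]


def conjunction_response(shape: List[Stroke], first: str, second: str,
                         offset: Tuple[int, int]) -> int:
    """Hypothetical combination neuron: 1 if a `first` stroke and a `second`
    stroke occur with the given relative offset anywhere in the field, else 0."""
    for o1, x1, y1 in shape:
        for o2, x2, y2 in shape:
            if o1 == first and o2 == second and (x2 - x1, y2 - y1) == offset:
                return 1
    return 0


def t_junction_neuron(shape: List[Stroke]) -> int:
    # Vertical stroke centred one unit below a horizontal stroke.
    return conjunction_response(shape, "horizontal", "vertical", (0, 1))


def l_junction_neuron(shape: List[Stroke]) -> int:
    # Horizontal stroke one unit right of and one unit below a vertical stroke.
    return conjunction_response(shape, "vertical", "horizontal", (1, 1))


if __name__ == "__main__":
    for name, shape in (("T", T_SHAPE), ("L", L_SHAPE),
                        ("T shifted", shift(T_SHAPE, 5, 3))):
        print(name, t_junction_neuron(shape), l_junction_neuron(shape))
```

The detectors encode the relative position of the two features, not their absolute position, which is what allows the spatial binding to be fixed early while the object as a whole can still appear anywhere.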

Attention per se (see Section 9.6.2) will not suffice to solve the computational problem, as the following argument shows. If the mechanism of attention described in Chapters 9 and 10 enables features in a small region of space to be highlighted (i.e. to have increased activity, produced by either spatial or object-based bias), this still does not specify the spatial relations between the features in the highlighted zone. If we paid attention to an area where a ‘T’ or an ‘L’ was located, and the description was in terms of only first order features such as a vertical line and a horizontal line, then we would still not know the spatial relations between the features, and just highlighting them would not solve the shape/object discrimination problem. Of course attention can be a major help in such a system, by helping with the selection of objects for action. Attention in this sense may enable us to select one object rather than another, which would otherwise be a major problem in a cluttered scene, as described in Sections 8.4.5, 8.4.6, 5.6.3 and 8.4.9. But attention would not itself enable us to tell one shape or object from another, where the shape is defined by the relative spatial locations of features within objects, and not just by the list of features contained within objects, as described in Section 8.2.

In a similar way, synchronization does not itself solve the problem of binding features in the correct spatial relation as required for shape and object recognition. Synchronization might enable the system to know that two features are for example part of the same object (see Sections 5.5.7 and 8.5), but would not by itself define the spatial relations between the features. Of course, there could be a special set of syntactic units, defining for example ‘the first feature is to the left of and a bit above the second feature’, which would be synchronously active with the first and second features, but this would raise all sorts of further problems, such as the system knowing the difference between the first and the second feature (itself needing a binding descriptor?), the enormous number of ‘features + syntactic relation’ pools that would need to be kept apart by the on-line synchronous binding process (cf. Malsburg (1973) and Malsburg (1999)) to describe a typical object, and the difficulty of specifying metric properties of the spatial relations (e.g. ‘a little above and a lot to the left of’) which would require an enormous number of spatial relation descriptors. All these are generic problems of syntactic structural description schemes, as described in Sections 8.2 and 13.5, which might not be a problem for computer implementations, but seem entirely implausible for the brain to implement for general-purpose object recognition. Of course, as noted elsewhere, the fact that the human brain can provide a structural description of objects is not the point at issue here, for this may be performed by a very different brain system present in humans which relies on the syntactic capabilities of our natural language processing.

We note that attention and/or synchronization could of course help when features such as shape, colour and motion are represented in different maps in intermediate-level visual analysis in the cortex. For example, as shown in Chapter 10, spatial attention could increase the activity of neurons in corresponding locations in shape and colour maps, so that the particular colour and form represented at that location will survive the competition in the object representation.

13.8 The representation at the end of the ventral, object processing, stream is in a form that is suitable for object-reward associations, recognition memory, short term memory, and episodic memory

The evidence for this is described in Chapter 5, and the implementations of these processes in areas that receive from the inferior temporal visual cortex are described in Chapter 12. The outputs of the ventral visual stream are appropriate for these functions because they provide evidence about the properties of objects in the world independently of how they appear on the retina, and it is objects that are associated with rewards and punishers, are recognized, or are found in particular places, not particular retinal images. This property enables the memory and other systems that receive from the inferior temporal visual cortex to generalize correctly, e.g. from one view to another view of the same object.

13.9 There are temporal cortical visual areas, found especially in the cortex in the anterior part of the superior temporal sulcus, with neurons specialized for face expression, for view-dependent representations, for face and body gestures, and for combining object and motion information

As described in Section 5.7, there are specialized populations of neurons that code for face expression and not face identity. These neurons are found primarily in the cortex in the superior temporal sulcus, while the neurons responsive to identity are found in the inferior temporal gyrus, and areas adjacent to this, such as TEa and TEm. Information about facial expression is of potential use in social interactions. A further way in which some of these neurons in the cortex in the superior temporal sulcus may be involved in social interactions is that some of them respond to gestures, e.g. to a face undergoing ventral flexion, or more generally turning towards or away from the observer. Many neurons in this region respond to objects or faces only when they are moving in particular ways, and this is a brain region in which evidence from the ventral and dorsal visual streams appears to be brought together for special functions. It is also important when decoding facial expression to retain some information about the direction of the head relative to the observer, for this is very important in determining whether a threat is being made in your direction. The presence of view-dependent, head and body gesture (Hasselmo, Rolls, Baylis and Nalwa 1989b), and eye gaze (Perrett, Smith, Potter, Mistlin, Head, Milner and Jeeves 1985b), representations in some of these cortical regions where face expression is represented is consistent with this requirement. These systems are likely to project to regions of the orbitofrontal cortex and amygdala, in which face expression and object–movement neurons are found.

13.10 Interactions between an object information processing stream and a spatial processing stream implemented by back projections to an early topologically organized visual area can account for many properties of visual attention

The model of attention described in Chapters 9–11 represents an advance beyond the biased competition hypothesis, in that it shows how object and spatial attention can be produced by dynamic interactions between the ‘what’ and ‘where’ streams, and in that, as a computational model that has been simulated, its details have been made fully explicit and defined quantitatively. An interesting and important feature of the model is that it does not use explicit multiplication as a computational method, yet the modulation by attention (for example the effect of posterior parietal module (PP) activity on V1) appears to be like multiplication. The model thus shows that multiplicative-like attentional gains can be implemented without any explicit multiplicative operation (see Section 7.8.3).

In Chapters 9–11 we analyzed the neuronal (‘microscopic-level’) neurodynamical mechanisms that underlie visual attention. We formulated a computational model of cortical systems based on the ‘biased competition’ hypothesis. The model consists of interconnected populations of cortical neurons distributed in different brain modules, which are related to the different areas of the dorsal or ‘where’ and ventral or ‘what’ processing pathways of the primate visual cortex. The ‘where’ pathway incorporates mutual connections between a feature extracting module (V1–V4) and a parietal module (PP) that consists of pools coding the locations of the stimuli. The ‘what’ path incorporates mutual connections between the feature extracting module (V1–V4) and an inferotemporal module (IT) with pools of neurons coding for specific objects. External attentional top-down bias is defined as inputs coming from higher prefrontal modules, which are not explicitly modelled in Chapters 9–11 but are modelled in Section 12.1. Intermodular attentional biasing is modelled through the coupling between pools of different modules, which are explicitly modelled. Attention now appears as an emergent effect that supports the dynamical evolution to a state where the constraints given by the stimulus and the external bias are satisfied. Visual search and attention can be explained in this theoretical framework of a biased competitive neurodynamics. The top-down bias guides attention to concentrate at a given spatial location or on given features. The neural population dynamics are handled analytically in the framework of the mean-field approximation. Consequently, the whole process can be expressed as a system of coupled differential equations. The model was extended in order to include the resolution hypothesis, and a ‘microscopic’ physical (i.e. neuron-level) implementation of the global precedence effect.
We analyzed the attentional neurodynamics involved in visual search of hierarchical patterns, and also modelled a mechanism for feature binding that can account for conjunction visual search tasks.
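The core of biased competition as a system of coupled differential equations can be sketched with two excitatory pools that share one inhibitory pool, integrated with the Euler method. This is a toy rate model under stated assumptions, not the model of Chapters 9–11: the threshold-linear transfer function, the weights, the time constants (in ms) and the size of the top-down bias are all hypothetical choices made so that the dynamics are stable and easy to check analytically.

```python
from typing import Tuple


def relu(x: float) -> float:
    """Threshold-linear activation: firing rates cannot be negative."""
    return x if x > 0.0 else 0.0


def simulate(inputs: Tuple[float, float], bias: Tuple[float, float],
             w_self: float = 0.5, w_inh: float = 1.0, w_ei: float = 0.5,
             tau: float = 10.0, tau_inh: float = 5.0,
             dt: float = 0.1, t_max: float = 300.0) -> Tuple[float, float]:
    """Euler-integrate two excitatory pools that share one inhibitory pool.

    Times are in ms; returns the final rates (r1, r2) of the excitatory pools."""
    r1 = r2 = r_inh = 0.0
    for _ in range(round(t_max / dt)):
        h1 = inputs[0] + bias[0] + w_self * r1 - w_inh * r_inh  # net input, pool 1
        h2 = inputs[1] + bias[1] + w_self * r2 - w_inh * r_inh  # net input, pool 2
        dr1 = (-r1 + relu(h1)) / tau
        dr2 = (-r2 + relu(h2)) / tau
        dr_inh = (-r_inh + w_ei * (r1 + r2)) / tau_inh          # shared inhibition
        r1, r2, r_inh = r1 + dt * dr1, r2 + dt * dr2, r_inh + dt * dr_inh
    return r1, r2


if __name__ == "__main__":
    print(simulate((1.0, 1.0), (0.0, 0.0)))  # equal stimuli, no bias: a tie
    print(simulate((1.0, 1.0), (0.2, 0.0)))  # top-down bias to pool 1 only
```

With these parameters the steady state can be solved by hand: without bias both pools settle at 2/3, while a bias of 0.2 to pool 1 gives 14/15 versus 8/15. The bias difference is amplified through the shared inhibition, and the unbiased pool is pushed below its unbiased level, so the ‘attended’ pool wins without any separate selection mechanism.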

The essential contributions of this model of attention are:

  1. Different functions involved in active visual perception have been integrated by a model based on the biased competition hypothesis. Attentional top-down bias guides the dynamics to concentrate at a given spatial location or on given (object) features. The model integrates, in a unifying form, the explanation of several existing types of experimental data obtained at different levels of investigation. At the microscopic neuronal level we simulated single cell recordings; at the mesoscopic level of cortical areas we reproduced the results of fMRI (functional magnetic resonance imaging) studies; and at the macroscopic perceptual level we accounted for psychophysical performance. Specific predictions at different levels of investigation have also been made. These predictions inspired single cell, fMRI, and psychophysical experiments, some of which have already been performed, with results that are consistent with our theory.

  2. Attention is a dynamical emergent property in our system, rather than a separate mechanism operating independently of other perceptual and cognitive processes.

  3. The computational perspective provides not only a concrete mathematical description of mechanisms involved in brain function, but also a model that allows complete simulation and prediction of neuropsychological experiments. Interference with the operation of some of the modules was used to predict impairment in visual information selection in patients suffering from brain injury. The resulting experiments support our understanding of the functional impairments resulting from localized brain damage in patients.

13.11 Visual search

As discussed in Section 12.5, in complex natural scenes visual search may take place largely overtly, by eye movements, which are serial in nature. However, mechanisms for covert visual search are described in Chapters 9–11. Although these perhaps contribute more to performance in simple visual displays with two or a few objects present, they may also contribute to performance in complex natural scenes by influencing the next eye movement that will be made, as suggested in Chapter 10.

We demonstrated that it is possible to build a neural system for visual search, which works across the visual field in parallel but, due to the different latencies of its dynamics, can show the two experimentally observed modes of visual attention, namely: serial focal attention, and the parallel spread of attention over space. Neither explicit serial focal search nor saliency maps need to be assumed.

The visual system always works in parallel, but the different latencies associated with different spatial resolutions allow the emergence of an early attentional focus based on coarse features. Spatial resolution then gradually increases at this focus as the high spatial frequency channels contribute to the processing being performed. The spatial form of the early focus is initially attached to the coarse (low spatial resolution) structure of an object. When finer details start to modulate the initially activated pools, a focus more attached to the details of the object emerges. Consequently, our neurodynamical cortical model explains the underlying massively parallel mechanisms that generate not only a kind of attentional ‘spotlight’, but also its object-based character and the associated spatially localized enhancement of spatial resolution suggested by the resolution hypothesis.

13.12 The parietal cortex contains egocentric spatial representations, and the medial temporal lobe system allocentric spatial representations

The representations in the parietal cortical areas are primarily egocentric (see Chapters 6 and 12). This is appropriate for a system which must guide actions towards locations already selected on the basis of the output of the ventral visual system. On the other hand, allocentric spatial representations, of places in the world independently of egocentric disposition, are represented in ventral brain regions such as the hippocampus (see Section 12.2), and the overlying parahippocampal and related cortical areas, damage to which produces topographical agnosia (Habib and Sirigu 1987). Allocentric representations are formed to provide appropriate spatial links between items of object-based knowledge, such as the objects that might be represented in a city. As such, the close links from the object identification system in the inferior temporal visual cortex to the (temporal lobe) allocentric spatial representation systems are very appropriate. Of course, when the allocentric spatial representation needs to be linked to egocentric, idiothetic, information, special linking mechanisms are needed, and the possible linking mechanisms in the hippocampus are described in Sections 7.5.5 and 12.2.

13.13 The controller of visual attention can now be understood in terms of the information represented in short term memory systems in the prefrontal cortex that biases earlier visual cortical spatial and object processing areas by back projections

The model described in Chapter 9 shows that no mysterious controller of attention needs to be found, but that instead the control is performed by the information loaded into prefrontal cortex short term memories biasing earlier visual cortical spatial and object processing areas by back projections. The short term memories are themselves loaded by presentation of the sample cue, in object or spatial working memory tasks such as delayed match to sample with intervening stimuli, as described in Section 12.1. The short term memories are loaded in visual search tasks with the object or location that is the subject of the search, as described in Section 12.1 and Chapter 9.
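The idea that the ‘controller’ is simply an active short term memory can be illustrated with a single self-exciting rate unit. This is a minimal sketch, assuming a sigmoidal transfer function and a recurrent weight chosen (hypothetically) to make the unit bistable: a brief cue switches it into a high-firing state that persists after the cue has gone, and that persistent rate is what could be fed back as a top-down bias. The models referred to in the text use populations of neurons and are considerably richer; all parameter values here are illustrative.

```python
import math


def sigmoid(x: float, theta: float = 3.0, k: float = 0.5) -> float:
    """Sigmoidal rate function with threshold theta and slope parameter k."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / k))


def run_memory(cue_amplitude: float, cue_off_ms: float = 50.0,
               t_max: float = 500.0, dt: float = 0.1,
               tau: float = 10.0, w_rec: float = 6.0) -> float:
    """Euler-integrate tau dr/dt = -r + F(w_rec * r + cue(t)); return r at t_max.

    The cue is present only for t < cue_off_ms; afterwards the unit is driven
    solely by its own recurrent (self-excitatory) input."""
    r = 0.0
    for step in range(round(t_max / dt)):
        t = step * dt
        cue = cue_amplitude if t < cue_off_ms else 0.0
        r += dt * (-r + sigmoid(w_rec * r + cue)) / tau
    return r


if __name__ == "__main__":
    print(run_memory(4.0))  # cued: activity persists 450 ms after the cue ends
    print(run_memory(0.0))  # never cued: stays near its low resting rate
```

With w_rec = 6 the unit has two stable states, near 0 and near 1, separated by an unstable point at r = 0.5, so the memory is held by the recurrent connectivity alone rather than by any separate controlling process.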

Other parts of the brain in addition to the prefrontal cortex might provide the top-down bias to the parietal spatial or the IT object modules, and we do not wish to exclude these. One example is the auditory-verbal short term memory system in humans, which uses rehearsal to hold on-line approximately 7 chunks of information and which may be located in the cortex at the junction of the left parieto-occipito-temporal areas. The principle, though, is the same: there is no mysterious controller of attention, and what is needed is a short term memory system that holds the spatial location or the object of attention active and provides top-down bias to the high-level spatial or object representation areas such as PP or IT.

This, we believe, is an important conceptual point, in that it removes the concern that some aspect of the control of attention is not understood, with a type of ‘deus ex machina’, or at least an unlocated (serial or parallel) ‘spotlight controller’, being needed. The overall schematic architecture of the system described in this book is illustrated in Fig. 12.1. The architecture allows the target of attention to be analyzed in the spatial or object processing stream and then loaded into a prefrontal cortex (or other) short term memory system. From there it can exert its top-down biasing effect on the spatial or object stream, which in turn, by interactive feedforward and feedback effects, causes the whole system to settle so as to optimally satisfy the constraints. The constraint satisfaction we describe is not itself a mysterious process either, but can be understood as an energy minimization process now well understood in neural networks (Hopfield 1982, Amit 1989, Hertz, Krogh and Palmer 1991). This constraint satisfaction generally operates well in practice even when the conditions required for the formal analysis are not present. An example is when the system does not have complete and reciprocal connectivity because of random asymmetric dilution of the connectivity, i.e. when synapses are missing at random (Treves 1991, Rolls and Treves 1998). For the memory retrieval properties of such systems to operate well, the number of synapses per neuron must be kept relatively high, above 1,000–2,000 (Rolls, Treves, Foster and Perez-Vicente 1997c). This important condition is, significantly, well met by the actual numbers of synapses per neuron in the cortex. We note also that the large number of forward and backward connections between adjacent cortical areas in the architecture shown in Fig. 12.1 provides a suitable basis, given also some type of associative synaptic connection rule between the connected areas, for a system that can operate in the interactive constraint satisfaction way described in Sections 1.11 and 7.9, and Chapters 9–11.
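The energy-minimization view of constraint satisfaction, and its robustness to random asymmetric dilution, can be illustrated with a standard Hopfield (1982) network. This is a generic textbook network, not the cortical models of this book. The sketch assumes ±1 units, Hebbian weights and asynchronous updates. The network size, number of stored patterns, 10% cue noise and 50% dilution level are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 5                                   # neurons, stored patterns
patterns = rng.choice([-1, 1], size=(P, N))

W = patterns.T @ patterns / N                   # Hebbian weights: symmetric
np.fill_diagonal(W, 0.0)                        # no self-connections


def energy(W, s):
    return -0.5 * s @ W @ s


def run_async(W, s, sweeps=5):
    """Asynchronous sign updates; records the energy after every single update."""
    s = s.copy()
    energies = [energy(W, s)]
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
            energies.append(energy(W, s))
    return s, np.array(energies)


cue = patterns[0].copy()
cue[rng.choice(N, size=20, replace=False)] *= -1      # corrupt 10% of the cue

s_full, E = run_async(W, cue)
print("overlap, full symmetric connectivity:", s_full @ patterns[0] / N)
print("energy never increased:", bool(np.all(np.diff(E) <= 1e-9)))

mask = rng.random((N, N)) < 0.5                 # each synapse kept independently, so the
W_diluted = W * mask                            # diluted connectivity is asymmetric
s_dil, _ = run_async(W_diluted, cue)            # (energy is no longer a strict Lyapunov function)
print("overlap, 50% random asymmetric dilution:", s_dil @ patterns[0] / N)
```

With symmetric weights, every asynchronous update can only lower (or keep) the energy. After dilution that guarantee is formally lost, yet retrieval of the stored pattern still works, because each neuron retains enough connections.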

We believe that the conceptual framework for understanding attention described in this book may be useful in helping to understand the otherwise rather complicated pictures that are often produced in neuroimaging studies of attention in humans, in which large swathes of parietal and frontal cortical territory often show activation. Given the architecture shown in Fig. 12.1 and the model described in Chapters 9–11, we now have clear reasons for expecting frontal, parietal, temporal and even occipital lobe contributions to attention. The model, and the specialization of function within the parietal cortex described in Chapter 4, lead us to expect that different parietal and even connected frontal areas may be activated during different types of spatial attention and memory. Examples are when attention must be paid to where a response is to be made with the arm or with the eyes, or when there are spatial cues in both visual fields, one of which is a target and the others distractors. We would also expect different temporal cortical areas to become activated during attention depending on whether the attention is to face identity, face expression, objects, objects undergoing motion, colour, etc. (see Chapter 5). Given the tendency of neurons to cluster into small regions where similar neurons are found (due to the self-organizing map principles described in Section 7.4.6), we would even expect the exact loci of activation found in the temporal areas to be somewhat different for different classes of object, and not necessarily to be in the same relative positions in different humans. We would also expect some activation during attentional tasks to be found quite early on in cortical visual processing, perhaps as far back as V1, and we have given reasons in Chapter 9 why this effect, though weak, might still be a useful feature of the attentional architecture.

Thus our conceptualization of how attentional processes operate in the brain may, we hope, provide a fundamental basis for understanding the phenomena that arise not only in imaging studies, but also, of course, in neurophysiological, psychophysical, and neuropsychological studies.

13.14 Output to object selection and action systems

How are the coordinates of a selected target object passed to the motor system for action, if there is little topology, and there is spatial invariance, in IT? It is suggested that the position in visual space being fixated provides part of the interface between sensory representations of objects and their coordinates as targets for actions in the world (Section 12.4). The small receptive fields of IT neurons in natural scenes make this possible. After this, local egocentric processing implemented in the dorsal visual processing stream, using for example stereodisparity, may be used to guide action.

13.15 ‘What’ versus ‘where’ processing streams

It is argued in Section 12.4 that the ventral stream is needed so that the memory and related operations described in Chapter 12 can be computed about objects that are made explicit in the representation provided in the inferior temporal visual cortex. These operations include determining the reward values of objects, which are properties of objects in the world and not of their detailed position and size on the retina. Having produced this (‘what’) representation of objects, emotional and motivational behaviour towards the objects can be determined. It is important, though, that the representation of objects in the inferior temporal cortex is left untrammelled by reward associations, because the IT representations are needed for many different functions, as described in Chapter 12. These operations (such as remembering where fruit is located in the environment) must proceed independently of whether we are currently hungry for the fruit or not. Thus the ‘what’ representations need to be made explicit in a way that is independent of the location of the objects in egocentric space, and this is the major function performed by the ventral visual stream.

Correspondingly, the dorsal visual stream provides a representation of locations and motion in visual egocentric space, and this is needed in order to guide actions such as arm reaching or eye movements to the correct position in egocentric space.

Not only is it economical not to have a 3D visual scene representation with all objects in their correct positions, but it is also computationally extremely difficult to produce such a representation from natural visual scenes. It may be partly because of the enormous computational difficulty of this problem that our brains make do with a much simpler system, as described in this book. This system comprises a ventral visual system that can represent the identity of objects and can be influenced by 3D cues but does not build a 3D scene-based representation, and a dorsal visual system that provides a 3D representation of space useful for guiding actions but does not have object descriptions built in.

13.16 Short Term Memory systems (in the frontal lobe) must be separate from perceptual mechanisms (in the temporal and parietal lobes)

A common method that the brain uses to implement a short term memory is to maintain the firing of neurons during a short memory period after the end of a stimulus (see Section 12.1). For the short term memory to be maintained during periods in which new stimuli are to be perceived, there must be separate networks for the perceptual and short term memory functions. Indeed, two coupled networks provide a precise model of the interaction of perceptual and short term memory systems (Renart, Parga and Rolls 2000, Renart, Moreno, de la Rocha, Parga and Rolls 2001). One is in the inferior temporal visual cortex for perceptual functions, and the other is in the prefrontal cortex for maintaining the short term memory during intervening stimuli. In particular, this model shows how a prefrontal cortex attractor (autoassociation) network could be triggered in a delayed match to sample task by a sample visual stimulus represented in the inferior temporal visual cortex. It also shows how this prefrontal attractor could be kept active during a memory interval in which intervening stimuli are shown. Then, when the sample stimulus reappears in the task as a match stimulus, the inferior temporal cortex module shows a large response to the match stimulus. This is because it is activated both by the incoming visual match stimulus and by the consistent back projected memory of the sample stimulus still being represented in the prefrontal cortex memory module (see Figs. 12.2 and 12.3). This computational model makes it clear that, for ongoing perception implemented by posterior cortex (parietal and temporal lobe) networks to occur unhindered, there must be a separate set of modules that is capable of maintaining a representation over intervening stimuli.
This is the fundamental understanding offered for the evolution and functions of the dorsolateral prefrontal cortex. It is this ability to provide multiple separate short term attractor memories that, we suggest, provides the basis for its functions in planning. One of the underlying computational constraints that drives these points is that a short term memory network implemented by continuing firing in an attractor state can usefully hold only one memory active at a time (see Section 7.3).
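The coupled IT and PF arrangement can be caricatured in a few lines. This is a minimal sketch under strong simplifying assumptions: binary threshold units, one pool per stimulus in each area, and hypothetical weights `w_ff` (IT to PF), `w_back` (PF to IT, kept weak), `w_self` and `w_inh`. It is not the integrate-and-fire or mean-field model of Renart et al. It steps through a delayed match to sample trial with an intervening stimulus.

```python
import numpy as np

K = 3  # number of stimuli; one IT pool and one PF pool per stimulus


def run_trial(sequence, w_ff=0.6, w_back=0.3, w_self=1.0, w_inh=1.0, theta=0.5):
    """Run one trial; `sequence` lists the stimulus shown at each step (None = blank)."""
    pf = np.zeros(K)                                    # PF attractor state (binary pools)
    it_log, pf_log = [], []
    for stim in sequence:
        visual = np.zeros(K)
        if stim is not None:
            visual[stim] = 1.0
        it = visual + w_back * pf                       # IT: visual input + weak back projection
        inh = w_inh * (pf.sum() - pf)                   # active PF pools inhibit the others
        pf = (w_self * pf - inh + w_ff * it > theta).astype(float)
        it_log.append(it)
        pf_log.append(pf)
    return np.array(it_log), np.array(pf_log)


A, B = 0, 1
it, pf = run_trial([A, None, B, None, A])   # sample, delay, intervening, delay, match
print("PF state after intervening B:", pf[2])
print("IT response to A as sample:", it[0, A], " as match:", it[4, A])
print("IT response to intervening B:", it[2, B])
```

PF keeps the sample A through the intervening stimulus B, because the active attractor inhibits B's pool more than B's weak forward input can overcome. Only one PF pool is ever active at a time. IT responds more to A as a match (1.3) than as a sample or to B (1.0). With `w_back=0` the match response falls back to 1.0, so in this toy the match enhancement comes entirely from the back projected memory.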

This approach emphasizes that, to provide a good brain lesion test of prefrontal cortex short term memory functions, the task set should require a short term memory for stimuli over an interval in which other stimuli are being processed. Otherwise, the posterior cortex perceptual modules could implement the short term memory function by their own recurrent collateral connections. This approach also emphasizes that there are many at least partially independent modules for short term memory functions in the prefrontal cortex. These include several modules for delayed saccades in the frontal eye fields; one or more for delayed spatial (body) responses in the dorsolateral prefrontal cortex; one or more for remembering visual stimuli in the more ventral prefrontal cortex; and at least one in the left prefrontal cortex used for remembering the words produced in a verbal fluency task (see Rolls and Treves (1998), Chapter 10).

This computational approach thus provides a clear understanding of why a separate (prefrontal) mechanism is needed for working memory functions. It may also be noted that if a prefrontal cortex module is to control behaviour in a working memory task, then it must be capable of assuming some type of executive control. There may thus be no need for a single ‘central executive’ in addition to the control that every short term memory module must be capable of exerting. This is in contrast to what has traditionally been assumed for the prefrontal cortex (Shallice and Burgess 1996).

The same model shown in Fig. 12.2 can also help us to understand how visual search tasks are implemented in the brain (Renart, Parga and Rolls 2000). In such a visual search task, the target stimulus is made known beforehand. Inferior temporal cortex neurons then respond more when the search target (as compared to a different stimulus) appears in the receptive field of the IT neuron (Chelazzi, Miller, Duncan and Desimone 1993, Chelazzi, Duncan, Miller and Desimone 1998). The model shows that this could be implemented by the same system of weakly coupled attractor networks in the prefrontal cortex (PF) and IT shown in Fig. 12.2, as follows. When the target stimulus is shown, it is loaded into the PF module from the IT module, as described for the delayed match to sample task. Later, when the display appears with two or more stimuli present, there is an enhanced response to the target stimulus in the receptive field. This is because the back projected activity from PF to IT adds to the firing produced by the target stimulus itself (Renart, Parga and Rolls 2000, Renart, Moreno, de la Rocha, Parga and Rolls 2001). The interacting spatial and object networks described in Chapters 9–11 (see Fig. 12.1) take this analysis one stage further. They show that once the PF–IT interaction has set up a greater response to the search target in IT, this enhanced response can in turn, by backprojections to topologically mapped earlier cortical visual areas, move the ‘attentional spotlight’ to the place where the search target is located. A further way in which attractor networks can help to account for the responses of IT neurons is described in Section 12.5.
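The last step, in which an enhanced IT response to the target moves the spotlight via backprojections to a topographic map, can be shown as a single static pass. This is a deliberately simplified sketch with no settling dynamics. The function `locate_target`, its parameters, and the additive combination of bottom-up and back projected input are hypothetical assumptions, not the Chapters 9–11 equations.

```python
import numpy as np


def locate_target(display, target, n_objects, w_pf_to_it=0.3, w_it_to_map=0.5):
    """display[k] is the object at location k of an early, topographically mapped area."""
    pf = np.zeros(n_objects)
    pf[target] = 1.0                              # search target held in PF short term memory
    it = np.zeros(n_objects)                      # IT pools: one per object, position invariant
    for obj in display:
        it[obj] += 1.0                            # bottom-up drive from every object in view
    it += w_pf_to_it * pf                         # top-down bias favours the target's pool
    # Each location: bottom-up input plus back projection from the IT pool for its object
    loc = np.array([1.0 + w_it_to_map * it[obj] for obj in display])
    return it, loc, int(np.argmax(loc))


it, loc, where = locate_target(display=[2, 0, 1], target=1, n_objects=3)
print("IT:", it, "map:", loc, "spotlight at location", where)
```

IT itself has no positional information, yet the target's enhanced pool feeds back most strongly to the map location that contains the target (location 2 here). Changing `target` to 0 moves the spotlight to location 1.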

13.17 Cortico-cortical backprojections must be weak relative to forward and intramodular recurrent connections

The evidence and reasons for this are described in Sections 1.11, 7.9 and 12.1.

13.18 Long-term potentiation is needed for the formation but not the reuse of short-term memories

To set up a new short term memory attractor, synaptic modification is needed to form the new stable attractor. Once the attractor connections are set up, the attractor may be used repeatedly when triggered by an appropriate cue to hold the short term memory state active by continued neuronal firing even without any further synaptic modification (see Sections 7.3, 12.1 and Kesner and Rolls (2001)). Thus manipulations that impair the long-term potentiation of synapses (LTP) may impair the formation of new short term memory states, but not the use of previously learned short term memory states.
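The dissociation between forming and reusing an attractor can be sketched with a Hebbian autoassociator in which a flag stands in for intact LTP. The class `Autoassociator`, the flag `ltp_enabled`, the synchronous ±1 dynamics and the network size are all hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100


class Autoassociator:
    """Hebbian attractor network; `ltp_enabled` stands in for intact LTP."""

    def __init__(self, n):
        self.W = np.zeros((n, n))
        self.ltp_enabled = True

    def learn(self, pattern):
        if self.ltp_enabled:                      # synapses change only if LTP is intact
            self.W += np.outer(pattern, pattern) / len(pattern)
            np.fill_diagonal(self.W, 0.0)

    def hold(self, cue, steps=10):
        """Start from the cue and let recurrent firing maintain (or lose) the state."""
        s = cue.copy()
        for _ in range(steps):
            s = np.where(self.W @ s >= 0, 1, -1)
        return s


def overlap(a, b):
    return a @ b / len(a)


net = Autoassociator(N)
old = rng.choice([-1, 1], size=(3, N))
for p in old:                                     # memories formed before the manipulation
    net.learn(p)

net.ltp_enabled = False                           # e.g. an LTP-blocking manipulation
new = rng.choice([-1, 1], size=N)
net.learn(new)                                    # no synaptic change, so no new attractor

print("previously learned state held:", overlap(net.hold(old[0]), old[0]))
print("new state held:", overlap(net.hold(new), new))
```

The previously learned state is maintained by recurrent firing alone, without further synaptic change. The state presented after LTP was blocked drifts away, because no attractor was ever formed for it.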

13.19 “Executive control” functions of the prefrontal cortex may simply reflect the functions of the prefrontal cortex in providing short term memory systems, used for example for attentional targets to be maintained on-line.

The simple architecture described in Section 9.3, and more generally in Chapter 9 (see Fig. 12.1 and Deco and Lee (2001)), allows spatial attention (Helmholtz 1867) and object attention (James 1890) to be accounted for in a symmetric fashion. The two modes of attention emerge depending simply on whether a top-down bias is introduced to the dorsal stream posterior parietal (PP) module or to the ventral stream IT module. In this framework, attention is produced by a simple top-down bias communicated to the dorsal or the ventral stream. The bias comes from the short term memory systems of the brain that hold the target object or location in memory (e.g. in the prefrontal cortex). Moreover, this conceptualization offers a way of understanding the “executive control” that is ascribed to the prefrontal cortex. On phenomenological grounds the prefrontal cortex appears to implement “executive control”. We suggest, however, that at least a major part of this function can be understood as providing the short term memory bias to posterior (parietal and temporal) perceptual systems that enables them to implement attentional effects as described. Of course, without a short term memory system in the prefrontal cortex to hold the target on-line while the perceptual systems are processing sensory input (see Section 12.1), the whole organism would appear to an observer to lack “executive control”, and indeed to be displaying a “dysexecutive syndrome” (Shallice and Burgess 1996).

Fig. 12.1 shows separate working memory systems in the lateral prefrontal cortex, for spatial locations more dorsally (PFCd) and for objects more ventrally (PFCv) towards the inferior convexity, in line with where the inputs from the parietal cortical areas and the temporal lobe object areas may be focused. However, we do not require physically separated short term memory systems for the model to operate. The requirement of the model is a short term memory system, whether for spatial or object information, that is reciprocally connected back to the spatial (PP) and object (IT) processing systems (see Fig. 12.1). We note that there is evidence for at least some mixing of spatial and object short term memory systems (Rao, Rainer and Miller 1997). Indeed, attractor networks can store both continuous representations (for example of physical space) and discrete representations (for example of objects) (see Section 7.5.8 and Rolls, Stringer and Trappenberg (2002c)). Our view is that there may in fact be partly separate and partly overlapping short term memory systems in the prefrontal cortex. They would be partly separate because this is necessary to obtain a total memory capacity greater than that of a single network covering the whole of the prefrontal cortex (see Section 7.9), which would be very limiting. They would be partly overlapping because of the short range spread of recurrent collateral connections in the cortex (see Chapter 1).
Indeed, the recurrent collaterals within a cortical area have a short range, with a relatively high density of connections (up to perhaps 10%) within 1–2 mm (see Chapter 1). This could be seen as a useful neocortical adaptation, in contrast to the hippocampus, that enables nearby cortical areas to operate partly separately. This keeps the total memory capacity high, allows different computations to proceed simultaneously, and allows several items to be held in short term memory at the same time by keeping the attractors separate (see Section 7.9).
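The capacity argument can be made concrete with the approximation for autoassociative attractor networks given by Rolls and Treves (1998): p ≈ k C / (a ln(1/a)). Here C is the number of recurrent collateral synapses per neuron, a is the sparseness of the representation, and k is roughly 0.2–0.3. The values of C, a, k and the number of modules M below are illustrative assumptions only. So is the premise that a single network spanning the whole prefrontal cortex would still have only about C connections per neuron.

```python
import math


def attractor_capacity(C, a, k=0.25):
    """Approximate number of patterns an autoassociative attractor network can store,
    p ~ k * C / (a * ln(1/a)), with C the number of recurrent collateral synapses
    per neuron, a the sparseness of the representation, and k roughly 0.2-0.3."""
    return k * C / (a * math.log(1.0 / a))


C, a = 10_000, 0.05          # illustrative values, not measurements
per_network = attractor_capacity(C, a)

# The estimate depends on C, not on the number of neurons: one network spread over
# the whole prefrontal cortex (still ~C synapses per neuron) stores about as much as
# one module, whereas M partly separate modules each store about this many.
M = 10
print(f"single network: ~{per_network:,.0f} patterns")
print(f"{M} separate modules: ~{M * per_network:,.0f} patterns in total")
```

Because the number of neurons does not appear in the estimate, enlarging one network does not raise its capacity once C is fixed. Dividing the cortex into partly separate attractors multiplies the total.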

13.20 Reward and punishment, and emotion and motivation, are not represented in the object processing stream

It is shown in Section 12.3 that visual sensory processing in the primate brain proceeds as far as the invariant representation of objects (invariant with respect to, for example, size, position on the retina, and even view) independently of reward versus punishment association. Why should this be, in terms of systems-level brain organization? The suggestion made here is that the visual properties of the world about which reward associations must be learned are generally objects (for example the sight of a banana, or of an orange). They are not just raw pixels or edges with no invariant properties, which are what is represented in the retina and V1. The implication is that sensory processing must proceed to the stage of the invariant representation of objects before it is appropriate to learn reinforcement associations. The invariance aspect is important too. Suppose we had different representations for an object at different places in our visual field, and learned, when the object was at one point on the retina, that it was rewarding. We would then not generalize correctly to it when it was presented at another position on the retina. If the object had previously been punishing at that other retinal position, we might find the same object rewarding when at one point on the retina, and punishing when at another. This is inappropriate given the world in which we live, and in which our brains evolved, in that the most appropriate assumption is that objects have the same reinforcement association wherever they are on the retina.
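The generalization argument can be checked with a one-trial Hebbian pattern associator. The sketch assumes a hypothetical sparse binary code for one object (`banana`) and contrasts two input codes. The first is a V1-like code in which each retinal position uses its own block of neurons. The second is an IT-like invariant code in which the same neurons respond at every position. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, n_positions = 50, 4
banana = (rng.random(n_features) < 0.2).astype(float)   # hypothetical sparse object code


def position_specific(obj, pos):
    """Retina/V1-like code: the same object drives different neurons at each position."""
    x = np.zeros(n_features * n_positions)
    x[pos * n_features:(pos + 1) * n_features] = obj
    return x


def invariant(obj, pos):
    """IT-like code: the same neurons respond wherever the object is."""
    return obj.copy()


def reward_prediction(encode, train_pos, test_pos, reward=1.0):
    """One Hebbian pattern-association trial at train_pos, tested at test_pos."""
    x_train = encode(banana, train_pos)
    w = reward * x_train                                 # dw = reward (postsynaptic) x input
    x_test = encode(banana, test_pos)
    return (w @ x_test) / (x_train @ x_train)            # normalized dot-product output


print(reward_prediction(position_specific, 0, 3))  # 0.0: no generalization
print(reward_prediction(invariant, 0, 3))          # 1.0: full generalization
```

The association learned at one retinal position transfers to every other position only when it is learned on the invariant representation.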

The same systems-level principle of brain organization is also likely to be true in other sensory systems, such as those for touch and hearing. For example, we do not generally want to learn that a particular pure tone is associated with reward or punishment. Instead, it might be a particular complex pattern of sounds such as a vocalization that carries a reinforcement signal, and this may be independent of the exact pitch at which it is uttered. The same may be true for touch in so far as one considers associations between objects identified by somatosensory input, and primary reinforcers. An example might be selecting a food object from a whole collection of objects in the dark.

The second point, which complements the first, is that the visual system is not provided with the appropriate primary reinforcers for such pattern association learning, in that visual processing in the primate brain is mainly unimodal up to and through the inferior temporal visual cortex (see Fig. 12.12). It is only after the inferior temporal visual cortex, when it projects to structures such as the amygdala and orbitofrontal cortex, that the appropriate convergence between visual processing pathways and pathways conveying information about primary reinforcers such as taste and touch/pain occurs (Fig. 12.12).

Part of the functional significance of not representing the reward value of visual stimuli until after object representations have been formed is that the object representations may be needed for many different functions, including recognition, short term memory, the formation of long-term episodic memories, etc. The systems-level principle here is that identification of what the visual stimulus or taste is should ideally be performed independently of how rewarding or pleasant the visual stimulus or taste is. The adaptive value of this is that even when neurons that reflect whether the taste (or the sight or smell of the food) is still rewarding have ceased responding because of feeding to satiety, it may still be important to have a representation of the sight (and, for that matter, the smell and taste, see Rolls (1999a)) of food, for then we can still (with other systems) learn, for example, where food is in the environment, even when we do not want to eat it. It would not be adaptive, for example, to become blind to the sight of food after we have eaten it to satiety.

It is shown in Section 12.3 that the learning of associations of visual representations of objects with primary reinforcers occurs in brain regions that receive from IT, such as the orbitofrontal cortex and amygdala. These brain regions contain representations of primary reinforcers such as taste and touch, and receive distributed representations of objects directly from IT that are in a form suitable for pattern association networks to learn the associations between visual stimuli and primary reinforcers. In this way the orbitofrontal cortex and amygdala enable visual representations to become goals for actions.

Outputs from the amygdala and orbitofrontal cortex thus provide a pathway by which visual stimuli become the targets of actions. This function results in the orbitofrontal cortex and amygdala having important functions in emotion and in motivation, as described in Section 12.3.

13.21 Effects of mood on memory and visual processing

Backprojections from brain areas such as the orbitofrontal cortex and amygdala where mood is represented can influence the recall of memories and the visual images that are recalled, in ways analyzed by Rolls and Stringer (2001b) and described in Section 12.3.6.

13.22 Visual outputs to Long Term Memory systems

The representation of objects in the inferior temporal cortical areas is in a suitable form for an input to long term memory mechanisms, as described in Section 12.2. The representation is suitable in that objects are made explicit in the output, and transforms of the objects are not. It also has high capacity, with the neurons carrying almost independent information. It is suitable as an input to associative memories in that much of the information can be read by neurons that perform dot product operations. The outputs that reach the hippocampus may be used for episodic memory, especially when this has a spatial component. The outputs to the perirhinal cortex may be used for recognition memory. Interestingly, it has been shown that the degree of long-term familiarity of stimuli may be represented there, which has implications for understanding amnesia. A part of the orbitofrontal cortex has neurons that respond only for the first few occasions on which novel stimuli are shown, and thus provides a long-term memory useful for detecting novel stimuli.
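The claim that much of the information can be read by dot product operations can be illustrated with a simple decoder. The decoder assigns each response vector to the stimulus whose mean-response template has the largest normalized dot product with it. The tuning (exponentially distributed mean rates), the Poisson spike-count noise, the 0.5 s window and the population sizes are all assumptions made for the sketch, not data.

```python
import numpy as np

rng = np.random.default_rng(3)
n_neurons, n_stimuli, n_trials, window = 100, 20, 50, 0.5   # window in seconds

# Hypothetical tuning: each neuron's mean rate (Hz) to each stimulus drawn from an
# exponential distribution, so different neurons prefer different stimuli.
rates = rng.exponential(10.0, size=(n_stimuli, n_neurons))
templates = rates * window                      # expected spike counts per stimulus


def decode(counts, templates):
    """Pick the stimulus whose template has the largest normalized dot product."""
    t_norm = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    c_norm = counts / max(np.linalg.norm(counts), 1e-12)
    return int(np.argmax(t_norm @ c_norm))


def accuracy(n_used):
    """Fraction of Poisson trials decoded correctly using the first n_used neurons."""
    correct = 0
    for s in range(n_stimuli):
        for _ in range(n_trials):
            counts = rng.poisson(templates[s, :n_used])
            correct += int(decode(counts, templates[:, :n_used]) == s)
    return correct / (n_stimuli * n_trials)


for n in (5, 20, 100):
    print(n, "neurons:", accuracy(n))
```

Decoding improves steadily as more neurons are added. This is what one expects if the neurons carry largely independent information that a downstream neuron could read with a dot product.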

13.23 Episodic memory and the operation of mixed discrete and continuous attractor networks

As shown in Sections 12.2 and 7.5.8, episodic memory, which normally has continuous spatial and discrete (e.g. object) components, could be implemented by an attractor network that combines both continuous and discrete memory representations. This network may be located in medial temporal lobe regions such as the hippocampus. With its spatial view cells, the primate hippocampus provides an allocentric (world-based) representation of space ‘out there’, which is very appropriate for the spatial part of an episodic memory. The hippocampus can maintain this allocentric spatial representation in the dark without visual cues. Moreover, idiothetic (that is, self-motion) cues are interfaced to visual representations in this brain region, in that eye movements update the allocentric representation even in the dark. A model of how this interfacing could operate, using a continuous attractor network that performs ‘path integration’ on the idiothetic inputs, is described in Sections 12.2 and 7.5.5.
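A continuous attractor that holds a position in the dark and is moved by self-motion signals can be caricatured on a ring of neurons. In this sketch an idiothetic velocity signal simply selects a recurrent weight matrix whose peak is offset by one neuron. This is a simplifying assumption standing in for the velocity-gated synapses of the models described in Section 7.5.5. Squaring followed by normalization is likewise a crude stand-in for global inhibition. The ring size and kernel widths are illustrative.

```python
import numpy as np

N = 60                                        # neurons tiling one dimension of space on a ring
theta = 2 * np.pi * np.arange(N) / N


def kernel(shift):
    """Recurrent weights from neuron j to neuron i, peaked at an offset of `shift` neurons."""
    d = theta[:, None] - theta[None, :] - 2 * np.pi * shift / N
    return np.exp(3.0 * np.cos(d))


W = {v: kernel(v) for v in (-1, 0, 1)}        # the velocity signal selects the weight set


def step(r, velocity):
    r = W[velocity] @ r                       # recurrent input (shifted if moving)
    r = r ** 2                                # sharpening nonlinearity keeps the bump narrow
    return r / r.sum()                        # global normalization stands in for inhibition


r = np.exp(5.0 * np.cos(theta - theta[10]))  # visual input sets an activity bump at neuron 10
r /= r.sum()
for _ in range(20):                           # dark, no self-motion: the bump persists
    r = step(r, 0)
print(np.argmax(r))                           # 10
for _ in range(15):                           # dark, moving: the bump is path-integrated
    r = step(r, +1)
print(np.argmax(r))                           # 25
```

Without visual input, the bump stays where it was set, and each self-motion step moves it by one neuron. This illustrates the two properties the text attributes to the hippocampal representation: maintenance in the dark and updating by idiothetic cues.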

13.24 Visual outputs to behavioural response systems

The inferior temporal cortex projects to brain regions such as the tail of the caudate nucleus that may be involved in orienting to changing objects (Caan, Perrett and Rolls 1984, Rolls 1999a), and to parts of the lateral prefrontal cortex which may be involved in conditional visual object to motor-response mapping (see Chapter 12).

13.25 Multimodal representations in different brain areas

Multimodal representations are found in many of the areas to which the inferior temporal visual cortex projects. Examples are the orbitofrontal cortex and amygdala (see Section 12.3), where they are involved in visual stimulus to reinforcement association learning, and the cortex in the superior temporal sulcus, where visual and auditory processing streams converge (see Chapter 5 and Section 7.9).

13.26 Visuo-spatial scratchpad and change blindness

As argued in Section 12.8, the visual scene which we perceive may be implemented largely by at least partially separate and local short term memory attractor networks each representing different locations in the visual scene, and loaded with objects close to the fovea from the representation provided by the inferior temporal visual cortex when we are looking at the object. This system appears to be located in cortex in the parieto-occipital region.

13.27 A unified feature hierarchical model of invariant object recognition and dynamical attention

To help understand some of the computational issues involved in invariant object recognition using a feature hierarchy network, such as seems to be implemented in the brain, we described in Chapter 8 a model that, for tractability, was kept feedforward. The feedback model of attention described in Chapters 9–11 required dynamics, in order to account for many of the interacting processes involved in attention and their temporal phenomena, including “serial” vs “parallel” search. The two approaches as described are complementary, but it is possible to combine them into a single unified model. The implementation of such a model is straightforward. It incorporates the overall network architecture and dynamical equations used in Chapter 9 together with the local lateral inhibition within an area, the hierarchical multistage pyramidal architecture, and the trace learning rule used in VisNet as described in Chapter 8. We have now produced such a model, and have shown that it has the properties described for each model in Chapters 8–12. The computational principles described in this Chapter all arise in such an integrated overall approach to the operation of the ventral visual stream in object identification, the mechanisms of visual attention, and the role in these of interactions between the dorsal and ventral visual streams. Moreover, a single integrated model is allowing new aspects of the operation of the visual system to be explored. One example is the fact that attentional processes produce larger effects in later than in earlier cortical stages of visual information processing (Kastner, Pinsk, De Weerd, Desimone and Ungerleider 1999).
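The role of the trace learning rule can be shown for a single output neuron. The sketch uses the rule in the form ȳ(t) = (1 − η) y(t) + η ȳ(t − 1), Δw = α ȳ(t) x(t), followed by normalization of the weight vector. Several details are assumptions for illustration. The neuron already responds to the first view, the views activate non-overlapping inputs, the trace is reset between sequences, and the firing rate is linear. The values of α, η and the number of epochs are arbitrary.

```python
import numpy as np


def train(views, eta, alpha=0.5, epochs=5):
    """One output neuron learning from a temporal sequence of transforms of one object.

    Trace rule: ybar_t = (1 - eta) * y_t + eta * ybar_{t-1};  dw = alpha * ybar_t * x_t,
    followed by normalization of the weight vector."""
    w = np.zeros(views.shape[1])
    w[0] = 1.0                        # assumption: the neuron already responds to view 0
    for _ in range(epochs):
        y_trace = 0.0                 # trace reset between sequences (a simplification)
        for x in views:               # views seen in temporal order, e.g. as the object rotates
            y = w @ x                 # linear firing rate
            y_trace = (1 - eta) * y + eta * y_trace
            w += alpha * y_trace * x
            w /= np.linalg.norm(w)    # keeps the weights bounded
    return w


views = np.eye(4)                     # four views activating non-overlapping input features
hebb = train(views, eta=0.0)          # eta = 0 reduces the rule to a purely Hebbian one
trace = train(views, eta=0.8)
print("responses after Hebbian learning:", views @ hebb)
print("responses after trace learning:  ", views @ trace)
```

With η = 0 the neuron continues to respond only to the first view. With a trace, the activity carried over from earlier views strengthens synapses from later views of the same object, so the neuron comes to respond to all of them.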

13.28 Conscious visual perception

Conscious visual perception may be more associated with the operation of the ventral than of the dorsal processing stream (see Fig. 1.10 and Section 12.9; Milner and Goodale 1995). The possibility that consciousness is more closely associated with processing in the ventral visual pathways may be related to the fact that we can perform long-term planning about objects. As argued by Rolls (1999a), this process may be closely related to consciousness when it is helped by higher-order thoughts used to correct the plans.

13.29 Attention—future directions

We believe that the theoretical framework presented in this book not only offers a rigorous analysis of visual attention, but also opens the possibility of posing new questions that can be addressed in future investigations. We see two main directions in which this kind of computational model could be extended. The first is the analysis of the functional role of rapid neural oscillations. Rapid oscillations of neural activity could have an important functional role, namely to facilitate dynamical cooperation between neuronal pools in the same or different brain areas. It is well known in the theory of dynamical systems that the synchronization of oscillators is a cooperative phenomenon. Cooperative mechanisms could complement the competitive mechanisms on which our computational cortical model is based. This would allow several competitive pools to be active simultaneously, which is sometimes a convenient solution, for example for binding the features of two or more objects or parts of objects by simultaneous activity. As we have shown, attention helps to bind the different feature dimensions of each object. We can therefore imagine that attention can activate neuronal pools simultaneously and create dynamical states that are synchronized. Synchronization may then generate cooperative effects that allow different clusters of pools to be preferentially activated at the same time. Two rivalling clusters of synchronized pools could therefore coexist if the clusters were out of phase with each other, allowing, for example, two or more objects to be perceived simultaneously. Synchronization might also help to implement perceptual grouping effects. In this scenario, synchronized neuronal activity would be an emergent result of the attentional dynamics, and not the cause of attention, or even of binding as it is usually understood.
To provide this extension to the model, a new mean field theory for the description of the population activity of a group of identical neurons is required. The new theory should go beyond adiabatic assumptions such as those we have used in Chapters 9–11, and should allow rapid oscillations as possible dynamical solutions. Alternatively, the direct simulation of spiking neurons at the microscopic level (e.g. using integrate-and-fire models) would of course include the fine temporal resolution required for rapid oscillations and synchronization effects. A definite advantage of this approach is that we would no longer need to assume the existence of pools of identical neurons, as required by the mean field approach. Indeed, as shown in Chapter 5, neurons tend to have different response tuning profiles to sets of stimuli, and use this to encode information about the stimulus set. We would expect an integrate-and-fire implementation to have the same generic dynamical properties as those described in Chapters 9–11.
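The point made above, that synchronization of oscillators is a cooperative phenomenon, is classically illustrated by the Kuramoto model. This is a generic phase-oscillator model swapped in here for illustration. It is not the attentional model of this book, and it is not the new mean field theory called for above. Each oscillator is pulled towards the population's mean phase with coupling strength K. The order parameter R measures synchrony. The Gaussian spread of natural frequencies, the oscillator count and the Euler step size are assumptions.

```python
import numpy as np


def kuramoto_order(K, n=100, dt=0.01, steps=5000, seed=0):
    """Simulate d(theta_i)/dt = omega_i + K * R * sin(psi - theta_i) and return the final
    order parameter R = |mean(exp(i * theta))| (1 = full synchrony, near 0 = incoherence)."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0, n)              # natural frequencies
    theta = rng.uniform(0.0, 2 * np.pi, n)       # random initial phases
    for _ in range(steps):
        z = np.exp(1j * theta).mean()            # mean field R * exp(i * psi)
        theta += dt * (omega + K * np.abs(z) * np.sin(np.angle(z) - theta))
    return np.abs(np.exp(1j * theta).mean())


for K in (0.0, 1.0, 5.0):
    print(f"coupling K = {K}: order parameter R = {kuramoto_order(K):.2f}")
```

Without coupling, the phases stay scattered. Above a critical coupling (about 1.6 for this frequency spread), most oscillators lock together, so coherent population activity emerges from mutual interaction rather than from a pacemaker.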

A second direction in which the kind of computational model described in Chapters 9–11 could be extended is the formulation of very large-scale models based on global brain connectivity. From an experimental point of view, this development of the theory would allow fMRI brain activity and MEG (magnetoencephalography) signals to be modelled simultaneously, a trend observed in recent research because this combination of techniques offers reasonable spatial resolution with fMRI and excellent temporal resolution with MEG. The formulation of very large-scale models of the brain is linked to the so-called inverse problem of extracting functional brain connectivity from measured fMRI or MEG signals. There are plans to organize standard global public databases of fMRI observations in the near future. Such data would in principle allow global modelling of the brain based on very large-scale simulations of massively connected neuronal pools. The problem is therefore to extract the correct functional connectivity, one that accounts for all the data, and consequently for several brain functions, at the same time. An information-theoretic approach could be formulated to determine the set of observables that maximizes the information about the functional connectivity, thereby helping to constrain the inverse problem. For example, fMRI activity alone is unlikely to be sufficient to extract the global functional connectivity, and measures describing the temporal or synchronization structure of this activity (obtained by means of MEG, for example) may help to complement the temporally averaged information yielded by fMRI signals. Such an approach could be of considerable clinical relevance, and could provide a major breakthrough in computer- and model-based neuropsychological assessment and diagnosis using fMRI and MEG.

13.30 Integrated approaches to understanding vision

One of the aims of this book has been to show how different approaches to analysing how vision is implemented in the brain can be combined in a complementary way to produce an integrated understanding that can be quantitatively implemented, tested, and explored in a quantitatively defined model. In doing this, bottom-up evidence from low levels of investigation (including detailed evidence of what is represented by the computing elements of the brain, single neurons) is taken to be highly relevant to the computational theory. This is, we note, somewhat in contrast to the approach advocated by Marr (1982), who insisted that the first stage should be specification of the computational theory, the second specification of the algorithm, and the third specification of the implementation. In this book, we have shown that the experimental evidence, and what is plausible in biological systems, provide important constraints on the computational theory. Indeed, as a result of these constraints we have argued, and produced evidence to support the view, that the syntactic 3D structural description approach is not how the brain solves object recognition (see Section 8.2.2), and that the brain instead appears to implement something close to a feature hierarchy system.

Moreover, again based on the experimental evidence, instead of adopting a primarily bottom-up or feedforward approach (Marr 1982), we have incorporated top-down and interactive effects in the theory we describe of how attentional processes operate in vision.

However, we are very much in agreement with Marr in emphasizing the importance of the computational theory, and indeed it has been an aim of this book to develop computational theories of object recognition, of how other brain areas use these object representations, and of attention, but incorporating all the constraints from all levels of investigation (including disciplines from molecular biology to computational theory) that are available. Indeed, this interdisciplinary approach has led to quite remarkable advances in the last 35 years in understanding how some parts of the brain could function. Some of these advances, and how they are leading to an understanding of how the computations implemented in different parts of the brain link together to contribute to the overall systems-level operation of the brain to produce behaviour, are charted in this book.

13.31 Apostasis (41)

Extraordinary progress has been made in the last 35 years in understanding how the brain might actually work. Thirty-five years ago, observations of how neurons respond in different brain areas during behaviour were only just starting, and there was very limited understanding of how any part of the brain might work or compute. Many of the observations appeared as interesting phenomena, but there was no conceptual framework in place for understanding how the neuronal responses might be generated, nor how they were part of an overall computational framework. Through the developments that have taken place since then, some of them charted in this book, we now have at least plausible and testable working models, consistent with a great deal of experimental data, of how invariant visual object representation and recognition may be implemented in the brain. We also have a model, described in this book and consistent with a great deal of experimental data, of how short term memory is implemented in the brain, and of how short term memory systems are related to perceptual systems. Closely related to this, we have detailed and testable models which make predictions about how attention works and is implemented in the brain. We also have testable models of long term memory, including episodic memory and spatial memory, which again are based on and consistent with what is being found to be implemented in the primate brain. In addition, we have an understanding, at the evolutionary and adaptive level, of emotion, and at the computational level of how different brain areas implement the information processing underlying emotion (see also Rolls (1999a)). Moreover, at the time of writing our understanding is developing to the stage where we can have not just a computational understanding of each of these systems separately, but also an understanding of how all these processes are linked and act together computationally.
Even more than this, we have shown how many aspects of brain function, such as the operation of short term memory systems, of attention, and the effects of emotional state on memory and perception, can now start to be understood in terms of the reciprocal interactions of connected neuronal systems.

This leads us to emphasize that the understanding of how the brain actually operates is crucially dependent on a knowledge of the responses of single neurons, the computing elements of the brain, for it is through the connected operations in networks of single neurons, each with their own individual response properties, that the interesting collective computational properties of the brain arise.

In describing in this book at least part of this great development over the last 35 years in understanding how a significant number of parts of the brain actually operate, and how they operate together, we wish to make the point that we are at an exciting and conceptually fascinating time in the history of brain research. We are starting to see how a number of parts of the brain could work. There is much that can now be done, and needs to be done, to develop our understanding of how the brain actually works. But the work described in this book does give an indication of some of the types of information processing that take place in the brain, and an indication of the way in which we are now entering the age in which our conceptual understanding of brain function can be based on our developing understanding of how the brain computes.

Understanding how the brain works normally is of course an important foundation for understanding its dysfunctions and its functioning when damaged. Some examples have been given in this book. We regard this as a very important long term aim of the type of work described in this book.

Through neural computation, understanding

Notes:

(41) Apostasis—standing after.