Two-Choice Reaction Times: Basic Ideas and Data
Abstract and Keywords
This chapter begins with a discussion of choice-reaction times and simple-reaction times. It then discusses a conceptual scheme for tradeoffs, discriminability and accuracy, the speed-accuracy tradeoff, and sequential effects.
Keywords: reaction time, tradeoff, speed, accuracy, sequential effects, discriminability
6.1 GENERAL CONSIDERATIONS
6.1.1 Experimental Design
The simplest choice-reaction time experiment has much in common with the simple-reaction time design. The major difference is that on each trial one of several signals is presented, and the subject attempts to identify it as rapidly as is consistent with some level of accuracy. This attempted identification is indicated by making one of several responses that correspond systematically to the stimuli. In terms of the driving example, not only must dangerous obstacles be detected, but they must be identified and appropriate responses—braking, swerving, or accelerating—made, which one depending upon the exact identity, location, and movement of the obstacle.
Formally, we shall suppose that the signal is one of a fixed set of possible signals, s_1, …, s_k, and the corresponding responses are r_1, …, r_k, so that the subject responds r_i if he or she believes s_i to have been presented. One may view the selection of a signal by the experimenter as a random variable, which we denote as s_n on trial n, and the response is another random variable r_n. When there are only two signals and two responses, as will be true throughout this and the following three chapters, I shall avoid the subscripts and, depending upon the context, use either of two notations. In most two-choice situations, I use {a, b} and {A, B} for the names of the stimuli and corresponding responses; the generic symbols are s and r, s = a or b, r = A or B. Occasionally, when the experiment was clearly a Yes-No detection one in which the subject responded Yes if a signal was detected and No otherwise, I will denote the presentation set by {s, n}, where n stands for no signal or noise alone, and the response set by {Y, N}.
Many aspects of the design do not differ from the simple case. There is always a foreperiod—either the time from the preceding response, in which case it is called the response–stimulus interval, RSI, or from a warning signal—and it may be variable or constant. Catch trials can be used (e.g., Alegria, 1978, and Ollman, 1970), although it is not common to do so. Payoffs may be based upon response time as well as upon accuracy. But more interesting are the new possibilities that arise from the increased complexity of the design. I list five:

1. Because a discriminative response is being made, it is more plausible to use a fixed foreperiod design than in the simple reaction case; indeed, it is entirely possible to have a well-marked time interval during which the signal is presented (this is often used in Yes-No and forced-choice designs). Basically, the argument is that if a discriminative response is made, the subject must wait for the signal to occur and so there is no need to worry about anticipations. This argument is compelling so long as the signals are perfectly discriminable and no response errors occur; but the minute some inaccuracy in the performance is present, then the claim becomes suspect. For a discussion of this and the effective use of random foreperiods in a choice experiment to suppress anticipatory responses, see Green, Smith, and von Gierke (1983). And as we shall see in Section 6.6.5, considerable evidence exists that anticipations may be a problem in many choice-reaction time experiments.

2. Since the experimenter has total control over the sequence of signals presented, it is possible to use that schedule in an attempt to elicit information about what the subject is doing. The most commonly used schedule is a purely random one, but in some studies the stimuli are not presented equally often, and in a few, sequential dependencies are built into the schedule in order to affect locally the decision strategies employed by the subjects.

3. In addition to payoffs based upon the observed reaction times, there is clearly the possibility of introducing information feedback and payoffs based upon the accuracy of the responses. By varying the values of the payoffs for both response time and accuracy, the experimenter sets up different monetary conflicts between accuracy and speed. If people are able to alter their strategies to take advantage of this tradeoff—and they are—then we may exploit this technique to gain some information about what those strategies are.

4. The relation between the signals can be manipulated over experimental runs in order to study how that variable affects behavior. For example, one can use two pure tones of frequencies f and f + Δf, where the separation Δf is manipulated. Clearly, the dependence of reaction-time performance on Δf may very well vary with the value of f used and almost surely will depend upon the intensity of the two tones, which can vary from threshold to very loud.

5. As was true for simple reactions, there are many options for the signals—indeed, there are all the possible differences signals may exhibit beyond simple presence and absence. And there are many options for the response, although the most common continues to be finger presses of highly sensitive keys. For an empirical study of the use of different fingers and hands, see Heuer (1981a, b). What is new, and complicating, is the multiple ways in which the possible responses can be related to the possible signals. In most studies, aside from those that undertake to study the impact of different mappings between responses and signals, experimenters attempt to select a mapping that on intuitive grounds is as natural, compatible, and symmetric as possible. Of course, one can study directly the impact of stimulus-response compatibility on response time, and we will do so in Section 10.2.3. In the two-choice situation, three closely related designs are the most common. In each there are two highly sensitive keys, and either the two forefingers are used, each being held over a key, or the subject's preferred forefinger rests on a well-defined spot between the two keys and is moved to the appropriate key, or the forefinger and middle finger of the preferred hand are placed over the two keys.
6.1.2 Response Measures
In any choice experiment there are at least two dependent measures—namely, the choices made and the times they take. One may, in addition, ask of the subject other things such as the confidence felt about the judgment. I shall focus primarily on the first two measures—to my mind the most natural ones. It will prove convenient to partition the story into a number of subpieces. In Sections 6.2 and 6.4 I discuss matters that do not entail much, if any, interaction between the two measures or between the measures on successive trials. In Section 6.5 the focus turns to the interaction between response times and errors—the so-called speed-accuracy tradeoff—that arises when the times are manipulated. Section 6.6 examines interactions of responses and their associated times with the events on preceding trials—sequential effects in the data. The following three chapters discuss a number of models that have been proposed to account for some of these as well as other phenomena. After that, in Chapter 10, attention turns to response times when more than two signals are to be identified.
There is, of course, some option in the exact measures of accuracy and time to be collected and reported. For the choices there is little debate (perhaps there should be) about what data to collect: one postulates the existence of a conditional probability, P(r | s), of response r being made when signal s is presented. Of course, we must be sensitive to what aside from the current signal may affect this probability. For example, it may well depend upon some previous signals and responses. Much of the literature tacitly assumes otherwise and, ignoring any dependencies that may exist, relative frequencies of responding are used to estimate these simple conditional probabilities. Some of the data in Section 6.6 should lead to some concern about this practice.
Suppose the signals a, b and the responses A, B are in natural 1 : 1 correspondence; then there are just two independent conditional probabilities since

P(A | a) + P(B | a) = 1 and P(A | b) + P(B | b) = 1.
We usually elect to use P(A | a) and P(A | b). The study of the behavior of these probabilities is the province of much of traditional psychophysics, a topic that is alive and well today in part because of the theory of signal detectability. This literature offers several possible summary measures of performance accuracy, the most famous of which is d′. We will go into this question more thoroughly in Sections 6.4 and 6.5.3.
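As a concrete illustration (a minimal sketch, not from the text), the two conditional probabilities can be estimated by relative frequencies from a table of response counts, and the equal-variance d′ summary measure is then the difference of their inverse-normal transforms. The counts below are invented for the example.

```python
from statistics import NormalDist

def accuracy_summary(counts):
    """Estimate P(A | a), P(A | b), and d' from raw response counts.

    counts[s][r] is the number of trials on which signal s was presented
    and response r made.  These are relative-frequency estimates that,
    like much of the literature, ignore any sequential dependencies.
    """
    p_A_a = counts["a"]["A"] / (counts["a"]["A"] + counts["a"]["B"])
    p_A_b = counts["b"]["A"] / (counts["b"]["A"] + counts["b"]["B"])
    z = NormalDist().inv_cdf            # inverse standard-normal cdf
    d_prime = z(p_A_a) - z(p_A_b)       # classical equal-variance d'
    return p_A_a, p_A_b, d_prime

# Invented counts: 100 presentations of each signal.
counts = {"a": {"A": 80, "B": 20}, "b": {"A": 30, "B": 70}}
p_hit, p_fa, d = accuracy_summary(counts)
```

By construction P(B | a) and P(B | b) need not be computed; they are determined by the two estimated probabilities.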
For reaction times, a distribution of times can be developed for each of the four signal-response pairs and, unlike the choice probabilities, these are not logically constrained to be related in any particular way. Of course, these distributions may also depend upon something else, such as preceding signals and responses. Again, much of the literature implicitly assumes no such dependence; but as we shall see in Section 6.6, that assumption is highly optimistic. As with simple reaction times, the distributions can be presented in a variety of ways, but in practice little has been published about actual distributions or their hazard functions. For the most part, only mean reaction times have been reported, although in a few recent studies some other statistics, usually the variance, skew, and kurtosis (see Sections 1.3.4 and 1.4.5), have also been provided.
6.2 RELATIONS TO SIMPLE REACTION TIMES
6.2.1 Means and Standard Deviations
Perhaps the best established fact about choice reaction times is that, with everything else as similar as possible, these times are slower than the comparable simple ones by 100 to 150 msec, and they are usually, but not always, somewhat more variable. This has been known for a very long time and has been repeatedly demonstrated; I shall cite just two studies in which the relationship has been explored with some care. The first does not involve any feedback-payoff manipulation; the second does.
In his Experiment 4, Laming (1968) ran 20 subjects for 120 trials in each of five conditions that were organized in a Latin square design. The signals, white bars of width 0.50 in. and height 2.83 in. and 4.00 in., were presented singly in a tachistoscope and the subject had to identify each presentation as being either the longer or shorter stimulus. Signals were presented according to an equally likely random schedule. There were three general types of conditions. In the simple-reaction time design subjects were to respond as rapidly as possible to both signals. In one variant they used only their right forefinger and in a second only their left forefinger. In the choice design, they used the right forefinger response for one signal and the left for the other. And in the recognition design, the situation was as in the choice one except that one of the two responses was withheld. That is, they had to identify the signals and respond to just one of them. In a certain sense, the recognition design resembles a choice one in which withholding a response is itself a response. Table 6.1 presents the median reaction times and the median over subjects of the standard deviation of reaction times. The pattern of medians is typical: the simple reaction times of about 220 msec are normal for visual stimuli; the choice values are about 200 msec slower; and the recognition times are some 35 msec faster than the choice ones. The standard deviations are essentially the same in all conditions.
TABLE 6.1. Response time data from Experiment 4 of Laming (1968)

                                    Response time in msec
    Condition                 Median    Median standard deviation
    Simple
      Right forefinger          228                87
      Left forefinger           220                88
    Choice                      419                88
    Recognition
      4.00 in.                  384                81
      2.83 in.                  385                85
Snodgrass, Luce, and Galanter (1967) ran three subjects in four conditions, including the three run by Laming plus a simple design in which just one of the signals was presented. The two simple cases are identified as simple-1 and simple-2, depending on the number of signals used. The warning and reaction signals were pure tones of 143-msec duration; the foreperiod was 2 sec. The warning signal was a 1100-Hz tone and the two reaction signals were 1000- and 1200-Hz tones, both of which were quite audible. Payoffs were used. They were defined relative to one of two time bands: the fast band ran from 100 to 200 msec following signal onset, and the slow one, from 200 to 300 msec. Responses prior to the band in force were fined 5¢, and those within each successive third of the band received rewards of 3¢, 2¢, and 1¢, respectively. In the choice and recognition designs, correct responses each received 1¢ and incorrect ones lost 5¢ each. At the completion of a trial, the subject was provided information feedback about the amounts won or lost. A total of from 210 to 240 observations were obtained in each combination of band and condition that was run (several possible combinations were omitted). The resulting data are shown in Figure 6.1. The pattern closely resembles that of the Laming experiment. Mean times in both simple conditions fall within the appropriate band and are virtually identical, whereas both the recognition and choice mean times either lie in the slow band or exceed 300 msec independent of which payoff band is used. Apparently, it was impossible for these subjects to identify the signals in less than 200 msec, and the payoff structure was such that to abandon accuracy completely would have been costly. The mean recognition data are from 75 to 100 msec slower and the choice ones are from 100 to 150 msec slower than simple reaction times.
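The band payoff rule just described can be written out as a small function. One detail is not stated above and is an assumption here: responses slower than the band in force are taken to earn nothing.

```python
def payoff_cents(rt_msec, band=(100, 200)):
    """Payoff in cents for a response at rt_msec under the band scheme:
    a 5-cent fine before the band, and rewards of 3, 2, and 1 cents in
    its successive thirds.  Responses after the band are assumed (not
    stated in the text) to earn nothing.
    """
    lo, hi = band
    if rt_msec < lo:
        return -5                  # anticipatory response: fined
    if rt_msec >= hi:
        return 0                   # assumption: too slow earns nothing
    third = (hi - lo) / 3
    if rt_msec < lo + third:
        return 3                   # first third of the band
    if rt_msec < lo + 2 * third:
        return 2                   # second third
    return 1                       # final third
```

With the fast band, for example, a 150-msec response falls in the second third and earns 2¢; the slow band is obtained by passing `band=(200, 300)`.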
The plot of standard deviations, shown in Figure 6.2, exhibits a similar pattern. These data differ appreciably from Laming's, which showed the same variability in all conditions.
Although recognition data typically are somewhat faster than comparable choice reactions, this is not always true. For example, Smith (1978) (as reported in Smith, 1980) found the recognition times (248 msec) to be slower than the choice ones (217 and 220) in a study of vibrotactile stimuli to the index fingers with keypress responses by the stimulated fingers. This reversal of choice and recognition times will reappear as an issue in Section 6.2.2.
Turning to another phenomenon, we know (Section 2.4.1) that simple MRTs become slower as the proportion of catch trials is increased. Alegria (1978) examined their impact in a choice situation where, of course, we must ask how both the error rate and the MRT are affected. A fixed foreperiod of 700 msec was defined by a spot of light moving from left to right, and the signal interval began when it passed a vertical line. The signals were tones of 900 and 2650 Hz, and responses were key presses of the right index and middle fingers. The three conditions of catch trial frequency were 0%, 20%, and 77%. As can be seen in Table 6.2, the error rate was little affected by the proportion of catch trials or by whether the trial previous to the one being examined was a catch or a signal trial. Figure 6.3 shows that MRT is greatly affected by the nature of the preceding trial, and the effect cumulates over trials, but MRT seems little affected by the actual proportion of catch trials. These data are not consistent with the idea that subjects speed up by relaxing their criterion for what constitutes a signal and thereby increase the number of anticipatory responses. The reason for this conclusion is that the error rate is, if anything, smaller following a signal trial than following a catch trial.
Section 2.4.3, on the impact of exponential foreperiods in simple reaction time, reported that MRT increases gradually with the actual foreperiod wait.
TABLE 6.2. Percent error as a function of type of preceding trial and proportion of catch trials (Alegria, 1978)

                                Preceding trial
    Proportion of
    catch trials            Catch        Signal
        .77                  11.1          7.6
        .20                   9.5          5.2
         0                     –           7.7
6.2.2 Donders' Subtraction Idea
Donders (1868), in a seminal paper, proposed that the time to carry out a specific mental subprocess can be inferred by running pairs of experiments that are identical in all respects save that in one the subject must use the particular process whereas in the other it is not used. He put the idea this way (Koster's 1969 translation):
The idea occurred to me to interpose into the process of the physiological time some new components of mental action. If I investigated how much this would lengthen the physiological time, this would, I judged, reveal the time required for the interposed term. (Donders, 1969, p. 418)
He proceeded then to report data for several different classes of stimuli for both simple- and choice-reaction times (some for two-stimulus designs and much for a five-stimulus one), which he spoke of as, respectively, the a- and b-procedures. The difference between the two times he attributed to the difference in what is required—namely, both the identification of the signal presented and the selection of the correct response to make.
He next suggested that the times of the two subprocesses could be estimated separately by collecting times in the recognition or, as he called it, the c-procedure. As we have seen above, this entails a schedule of presentations like that used in the choice design, but instead of there being as many responses as there are stimuli, there is just one. It is made whenever one of the signals is presented and is withheld whenever the other(s) occur. Using vowels as stimuli and himself as subject, he reported that the difference in times between the c- and a-procedures was 36 msec, which he took to be an estimate of the recognition time, and the b − c difference was 47 msec, an estimate of the time to effect the choice between responses. The comparable numbers for the Laming data are 161 and 34 msec and for the Snodgrass et al. data they are 110 and 40 msec.
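The subtraction arithmetic for the Laming medians of Table 6.1 can be checked directly. Averaging the two variants of the simple and recognition conditions is a simplification made here, not a step taken in the text, but it reproduces the quoted values up to rounding.

```python
# Median response times (msec) from Table 6.1 (Laming, 1968, Expt. 4).
simple = (228 + 220) / 2        # average of right- and left-forefinger variants
choice = 419
recognition = (384 + 385) / 2   # average of the two signal heights

# Donders' subtractions: c - a estimates identification (recognition) time,
# b - c estimates the time to choose between responses.
identification = recognition - simple    # roughly 161 msec
response_choice = choice - recognition   # roughly 34 msec
```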
Data in which choices are faster than recognitions, such as Smith (1978), should be impossible in Donders' framework. They may result from subjects using different criteria for responding in the two procedures.
Indeed, Donders expressed one concern over the c-procedure:
[Other people] give the response when they ought to have remained silent. And if this happens only once, the whole series must be rejected: for, how can we be certain that when they had to make the response and did make it, they had properly waited until they were sure to have discriminated? … For that reason I attach much value to the result of the three series mentioned above and obtained for myself as a subject, utilizing the three methods described for each series, in which the experiments turned out to be faultless. (Donders, 1969, p. 242)
Although the subtraction method was very actively pursued during the last quarter of the 19th century and today is often used with relatively little attention given to its theoretical basis (e.g., Posner, 1978), it has not found favor in this century among those who study response times as a speciality. The criticisms have been of four types:
First is the one mentioned by Donders himself—namely, that the recognition method may not, in fact, induce the subjects always to wait until the signal actually has been identified. (This may be a difficulty for the other two methods, as well.) His proposed method of eliminating those runs in which errors occur is not fully satisfactory because, as we noted in Sections 2.2.5 and 2.2.6, it is highly subject to the vagaries of small-sample statistics. As we shall see below in Section 6.6.5, there is evidence from choice designs that anticipations also occur when the time pressure is sufficiently great. This problem can probably be greatly reduced by using a random (exponential) foreperiod, as in the simple-reaction-time design (see Green et al., 1983). The reason that this should work better than catch trials is the greater opportunity it provides the lax criterion to evidence itself, since each trial affords an opportunity to anticipate. Working against it is the fact that many foreperiods are relatively brief.
The second concern, which was more in evidence in the 1970s than earlier, centers on the assumption of a purely serial process in which all of the times of the separate stages simply add. As we know from Chapter 3, this assumption has not been clearly established for the decision latency and the residue. Sternberg (1969a, b), in his classic study on the method of additive factors, suggested a method for approaching the questions of whether the times associated with the several stages are additive, provided that one has empirical procedures for affecting the stages individually. A special case of the method was discussed in Section 3.3.4, and it will be discussed more fully in Section 12.4. This means that we have methods, perhaps not yet as perfected as we would like, to investigate empirically the truth of this criticism.
The third criticism, which is the least easy to deal with, was the one that turned the tide against the method at the turn of the century. It calls into question the assumption of “pure insertion”—namely, that it is possible to add a stage of mental processing without in any way affecting the remaining stages. Sternberg (1969b, p. 422) describes the attack in this way.
[I]ntrospective reports put into question the assumption of pure insertion, by suggesting that when the task was changed to insert a stage, other stages might also be altered. (For example, it was felt that changes in stimulus-processing requirements might also alter a response-organization stage.) If so, the difference between RTs could not be identified as the duration of the inserted stage. Because of these difficulties, Külpe, among others, urged caution in the interpretation of results from the subtraction method (1895, Secs. 69, 70). But it appears that no tests other than introspection were proposed for distinguishing valid from invalid applications of the method.
A stronger stand was taken in later secondary sources. For example, in a section on the “discarding of the subtraction method” in his Experimental Psychology (1938, p. 309), R. S. Woodworth queried “[Since] we cannot break up the reaction into successive acts and obtain the time of each act, of what use is the reaction-time?” And, more recently, D. M. Johnson said in his Psychology of Thought and Judgment (1955, p. 5), “The reaction-time experiment suggests a method for the analysis of mental processes which turned out to be unworkable.”
As Donders seemed unaware of the problem and as introspection is not a wholly convincing argument, an example of the difficulty is in order. Suppose that the decision latencies of simple reaction times are as was described in Section 4.4—namely, a race between change and level detectors, where a level detector is nothing more than the recognition process being put to another use. If in a recognition or choice situation the identification mechanism is no longer available to serve as a level detector because it is being used to identify which signal has been presented, then the insertion of the identification task necessarily alters the detection mechanism, changing it from a race to the use of just the change detector. If so, the detection of signal onset, particularly of a weak signal, is somewhat slower and more variable than it would have been had the same detection mechanisms been used as in simple reaction time, and so the recognition times will be somewhat overestimated.
The idea of stages of processing is very much alive today, as we shall see in Part III, but very few are willing to bank on the idea of pure insertion. It is not to be ruled out a priori, but anyone proposing it is under considerable obligation to explain why the other stages will be unaffected by the insertion.
The fourth criticism, first pointed out by Cattell (1886b), is that the c-reaction involves more than pure identification since the subject must also decide either to respond or to withhold the response, which in a sense is just as much of a choice as selecting one of two positive responses. Wundt suggested a d-procedure in which the subject withholds the response until the signal is identified, but in fact makes the same response to every signal. This procedure caused difficulty for most subjects and has rarely been used. It is possible, although not necessary, to interpret Smith's slow c-reactions as direct evidence favoring Cattell's view.
For additional discussion of these matters see Welford (1980b), who summarized the matter as follows (p. 107):
The evidence suggests that Donders' approach was correct in that, except under very highly compatible or familiar conditions, the c-reaction involves less choice of response than the b-reactions, but was wrong in assuming that all choice of response was eliminated. The difference between b- and c-reactions will therefore underestimate the time taken by choice, and the difference between c- and a-reactions will overestimate the time taken to identify signals.
6.2.3 Subtraction with Independence
An obvious assumption to add to Donders' additivity and pure insertion is statistical independence of the times for the several stages. This, it will be recalled, was the basic assumption of Chapter 3. The main consequence of adding this is the prediction that not only will the mean times add, but so will all of the cumulants (Section 1.4.5).
Taylor (1966) applied these ideas to the following design. Donders' conditions b and c were coupled with two modified conditions. Using the same stimulus presentation schedule as in b and c, condition b′ involves substituting ordinary catch trials for one of the signals, and the subject is required to make the discriminative response. In condition c′, the presentation is as in b′, but only a single response is required on signal trials. The model assumes a total of four stages: signal detection, signal identification, response selection, and response execution. The first and last are common to all four conditions.
TABLE 6.3. Presence of stages in Taylor (1966) design (see text for description of conditions)

                                      Stage
    Condition    Signal identification    Response selection
       b                  Yes                    Yes
       c                  Yes                    No
       b′                 No                     Yes
       c′                 No                     No
Taylor tested this null hypothesis in an experiment in which the stimuli were red and green disks, the warning signal was a 500-Hz tone, and foreperiods of 800, 1000, 1300, and 1500 msec were equally likely. The sample size in each condition was the 32 responses of the preferred hand of each of eight subjects, for a total of 256 per condition. He tested the additivity prediction—that the cumulants of conditions b and c′ should sum to those of c and b′—for the mean, variance, and third cumulant, and in no case was the null hypothesis rejected.
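The logic of Taylor's test can be mimicked in a small simulation. The stage-time distributions below are invented purely for illustration; the point is that, under seriality and independence, each condition's cumulants are the sums of its stage cumulants (Table 6.3), so the mean and variance of b and c′ together should match those of c and b′.

```python
import random
from statistics import fmean, pvariance

random.seed(1)
N = 200_000

# Invented, purely illustrative stage-time distributions (msec).
def detect():   return random.gauss(60, 10)
def identify(): return random.expovariate(1 / 40)   # mean 40 msec
def select():   return random.expovariate(1 / 50)   # mean 50 msec
def execute():  return random.gauss(100, 15)

def sample(stages, n=N):
    """Simulate n reaction times as sums of independent stage times."""
    return [sum(stage() for stage in stages) for _ in range(n)]

b  = sample([detect, identify, select, execute])  # both inserted stages
c  = sample([detect, identify, execute])          # identification only
bp = sample([detect, select, execute])            # response selection only
cp = sample([detect, execute])                    # neither

# Cumulants add over independent serial stages, so cumulant(b) + cumulant(c')
# should equal cumulant(c) + cumulant(b'); the gaps below are sampling noise.
mean_gap = (fmean(b) + fmean(cp)) - (fmean(c) + fmean(bp))
var_gap = (pvariance(b) + pvariance(cp)) - (pvariance(c) + pvariance(bp))
```

The same identity holds for the third cumulant and, in principle, for all higher ones, which is what distinguishes this test from a comparison of means alone.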
Were this to be repeated today, one would probably use the Fast Fourier Transform to verify the additivity of the entire cumulant generating function; however, I am not sure what statistical test one should employ in that case.
As was pointed out in Section 3.2.3, if the component inserted is exponential with time constant λ and the density is f with it in and g without it, then

λ = f′(t) / [g(t) − f(t)].
So an easy test of the joint hypothesis of pure insertion, independence, and the insertion of an exponential stage is that the right side of the above expression be independent of t. Ashby and Townsend (1980) examined this for data reported by Townsend and Roos (1973) on a memory search procedure of the type discussed in Section 11.1, and to a surprising degree constancy was found. Thus, while there are ample a priori reasons to doubt all three assumptions, an unlikely consequence of them was sustained in one data set. This suggests that additional, more detailed work should be carried out on Donders' model.
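A numerical check of this constancy is easy when both densities are known in closed form. Below, the density g without the inserted stage is taken, purely for illustration, to be exponential with rate mu; convolving in an exponential stage of rate lam gives the familiar two-exponential (hypoexponential) density for f, and the ratio f′(t)/[g(t) − f(t)], estimated by central differences, should equal lam at every t.

```python
import math

mu, lam = 1 / 80.0, 1 / 25.0   # illustrative rates (per msec): g is Exp(mu),
                               # the inserted stage is Exp(lam)

def g(t):
    """Decision density without the inserted stage."""
    return mu * math.exp(-mu * t)

def f(t):
    """Density with the exponential stage convolved in (hypoexponential)."""
    return lam * mu * (math.exp(-mu * t) - math.exp(-lam * t)) / (lam - mu)

def ratio(t, h=1e-4):
    """Central-difference estimate of f'(t) / [g(t) - f(t)]."""
    f_prime = (f(t + h) - f(t - h)) / (2 * h)
    return f_prime / (g(t) - f(t))

# The ratio should be (numerically) constant in t and equal to lam.
ratios = [ratio(t) for t in (5, 20, 50, 100, 200)]
```

With empirical histograms in place of the closed forms, the same ratio plotted against t is the test Ashby and Townsend applied.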
6.2.4 Varying Signal Intensity
The effect of intensity on simple reaction times is straightforward: both MRT and VRT decrease systematically as signal intensity is increased (Sections 2.3.1 and 2.3.2), and for audition but not vision an interaction exists between signal intensity and response criterion (Section 2.5.2). Neither of these statements appears to be true for choice reaction times, as was made evident by Nissen (1977) in a careful summary of the results to that point. The independence of intensity and response criterion for visual stimuli was again found. For example, Pachella and Fisher (1969) varied the intensity and linear spacing of 10 lights and imposed a deadline to control the response criterion; they found no evidence for an interaction of intensity with the deadline, although there was one between intensity and spacing. Posner (1978) varied the intensity of visual signals and, using conditions of mixed and blocked intensities, he found no interaction. However, since no interaction was found in simple reactions for visual stimuli, what would happen with auditory signals was not obvious. In 1977 the evidence was skimpy.
This question was taken up by van der Molen, Keuss, and Orlebeke in a series of papers. In 1979 van der Molen and Keuss published a study in which the stimuli were 250-msec tones of 1000 Hz and 3000 Hz with intensities ranging from 70 to 105 dB. There were several foreperiod conditions, which I will not go into except to note that the foreperiods were of the order of a few seconds. Both simple and choice reactions were obtained. The major finding, which earlier had been suggested in the data of Keuss (1972) and Keuss and Orlebeke (1977), was that for the choice condition the MRT is a U-shaped function of intensity. The simple reactions were decreasing, as is usual. This result makes clear that the impact of intensity in the choice context is by no means as simple as it is for simple reactions, where it can be interpreted as simply affecting the rate at which information about the signal accumulates. They raised the possibility that a primary effect of intensity is in the response selection stage of the process rather than just in the accumulation process. That was not a certain conclusion because the error rate had not remained constant as intensity was altered. Additional U-shaped data were exhibited by van der Molen and Orlebeke (1980).
To demonstrate more clearly the role of response selection, van der Molen and Keuss (1981) adapted a procedure introduced by J. R. Simon (1969; Simon, Acosta, & Mewaldt, 1975; Simon, Acosta, Mewaldt, & Speidel, 1976). The signals were as before with a range of 50 to 110 dB. They were presented monaurally through earphones, and responses were key presses with the two hands corresponding to signal frequency. Lights and digits presented just before the beginning of the foreperiod provided additional information about what was to be presented. The digit code was the following: 000 indicated to the subject that the signal was equally likely to be presented to either ear, 001 that it would go to the right ear, and 100 that it would go to the left ear. One colored light indicated that the ear receiving the signal and its frequency would be perfectly correlated, whereas the other color indicated no correlation of location and frequency. Note that the correct response could be either ipsilateral or contralateral to the ear to which the signal was presented. The data showed that responding was fast and a monotonic decreasing function of intensity when the correlated presentation was used and the response was ipsilateral. In any contralateral or uncorrelated ipsilateral condition, MRT was slower and U-shaped. They concluded that these results support the hypothesis that a major impact of intensity in choice reactions is on the response selection stage.
In still another study, Keuss and van der Molen (1982) varied the foreperiod—either a fixed one of 2 sec or a variable one of 20, 25, 30, 35, or 40 sec—and whether the subject had preknowledge of the intensity of the presentation. The effect of preknowledge was to reduce MRT by about 10 msec. More startling was the fact that both simple and choice MRTs decreased with intensity except for the brief 2-sec foreperiod, where MRT again was found to be U-shaped. Moreover, the error rate was far larger in this case than in the others. They seemed to conclude that the foreperiod duration was the important factor, although it was completely confounded with constant versus variable foreperiod. Assuming that it is the duration, they claimed this to be consistent with the idea of intensity affecting the response-selection process. I do not find it as compelling as the earlier studies.
6.3 A CONCEPTUAL SCHEME FOR TRADEOFFS
6.3.1 Types of Parameters
A major theoretical feature of cognitive psychology is its attempt to distinguish both theoretically and experimentally between two classes of variables and mechanisms that underlie behavior. The one class consists of those experimental manipulations that directly affect behavior through mechanisms that are independent of the subjects' motivations. For example, most psychologists and physiologists believe that the neural pulse patterns that arise in the peripheral nervous system, immediately after the sensory transducer converts the physical stimuli into these patterns, are quite independent of what the subject will ultimately do with that information. This means that these representations of the signal are independent of the type of experiment—reaction time, magnitude estimation, discrimination—of the questions we pose to the subject, and of the information feedback and payoffs we provide. Such mechanisms are often called sensory or perceptual ones, and the free parameters that arise in models of the mechanism are called sensory parameters. The experimental variables that activate such mechanisms have no agreed-upon name. In one unpublished manuscript, Ollman (1975) referred to them as "display variables," but in a later paper (1977) he changed it to "task variables." I shall use a more explicit version of his first term, sensory display variables.
The other type of mechanism is the decision process that, in the light of the experimental task posed, the subject brings to bear on the sensory information. These mechanisms have variously been called control, decision, motivation, or strategy mechanisms. Ollman refers in both papers to the experimental variables that are thought to affect these mechanisms directly as strategy variables. I shall follow the terminology of decision mechanism, decision parameters, and decision strategy variables. The latter include a whole range of things having to do with experimental procedure: the task—whether it is detection, absolute identification, item recognition, and so on—the details of the presentation schedule of the signals, the payoffs that are used both to affect the tradeoff of errors and to manipulate response times, and various instructions aimed at affecting the tradeoffs established among various aspects of the situation.
It should be realized that there is a class of motivational variables, of which attention is a prime example, that I shall not attempt to deal with in a systematic fashion. Often, attentional issues lie not far from the surface in our attempts to understand many experiments, and they certainly relate to the capacity considerations of Chapters 11 and 12. Moreover, they are a major concern of many psychologists (Kahneman, 1973). Some believe that such variables affect the sensory mechanism, which, if true, only complicates the story to be outlined.
To the degree that we are accurate in classifying sensory display and decision strategy variables, the former affect the sensory parameters and the latter the decision parameters. But a major asymmetry is thought to exist. Because the decision parameters are under the subject's control, they may be affected by sensory display variables as well as by decision strategy ones, whereas it is assumed that the sensory parameters are not affected by the decision strategy variables. What makes the study of even the simplest sensory or perceptual processes tricky, and so interesting, is the fact that we can never see the impact of the sensory display variables on the sensory mechanism free from their impact on the decision parameters. Even if we hold constant all of the decision strategy variables that are known to affect the decision mechanism, but not the sensory ones, we cannot be sure that the decision parameters are constant. The subject may make changes in these parameters as a joint function of the sensory display and decision strategy variables.
One major issue of the field centers on how to decide whether a particular experimental design has been successful in controlling the decision parameters as intended. This is a very subtle matter, one that entails a careful interplay of theoretical ideas and experimental variations. During the late 1960s and throughout the 1970s this issue was confronted explicitly as it had never been before, and out of this developed considerable sensitivity to the so-called speed-accuracy tradeoff.
In Section 6.3.2 I shall try to formulate this general conceptual framework as clearly as I know how, and various special cases of it will arise in the particular models examined in Chapters 7–10.
6.3.2 Formal Statement
Assuming the distinction just made between sensory and decision mechanisms is real, we may formulate the situation in quite general mathematical terms. Each experimental trial can be thought of as confronting the subject with a particular environment. This consists not only of the experimentally manipulated stimulus presented on that trial, but also the surrounding context, the task confronting the subject, the reward structure of the situation, and the previous history of stimuli and responses. We can think of this environment as described by a vector of variables denoted by $\overrightarrow{E}$. We may partition this into a subvector $\overrightarrow{S}$ of sensory display variables and a subvector $\overrightarrow{D}$ of decision strategy variables: $\overrightarrow{E}=(\overrightarrow{S},\overrightarrow{D})$. The observable information obtained from the trial consists of two random variables, the response r and some measure T of time of occurrence. We assume that (r, T) is governed by a joint probability density function that is conditioned by the environmental vector, and it is denoted $$f(r, t \mid \overrightarrow{E}). \qquad (6.1)$$
In order to estimate f from data, it is essential that we be able to repeat the environment on a number of trials, and so be able to use the empirical histograms as a way to estimate f. Obviously, this is not possible if $\overrightarrow{E}$ really includes all past stimuli and responses, and so in practice we truncate the amount of the past included in our definition of $\overrightarrow{E}$. For more on that, see Section 6.6.
The theoretical structure postulates the existence of a sensory mechanism with a vector $\overrightarrow{\sigma}=({\sigma}_{1},\dots ,{\sigma}_{l})$ of sensory parameters and a decision mechanism with a vector $\overrightarrow{\delta}=({\delta}_{1},\dots ,{\delta}_{m})$ of decision parameters. In general, $\overrightarrow{\sigma}$ and $\overrightarrow{\delta}$ should be thought of as random vectors; that is, their components are random variables and they have a joint distribution that is conditional on $\overrightarrow{E}$. If $\overrightarrow{\sigma}$ and $\overrightarrow{\delta}$ are numerical $l$- and $m$-tuples, respectively, we denote the joint density function of $\overrightarrow{\sigma}$ and $\overrightarrow{\delta}$ by $$p(\overrightarrow{\sigma}, \overrightarrow{\delta} \mid \overrightarrow{E}). \qquad (6.2)$$
When we have no reason to distinguish between the two types of parameters, we simply write $\overrightarrow{\varepsilon}=(\overrightarrow{\sigma},\overrightarrow{\delta})$. For each set of parameter values, it is assumed that the sensory and decision processes relate the (r, T) pair probabilistically to the parameters and to them alone. In particular, it is assumed that (r, T) has no direct connection to $\overrightarrow{E}$ except as it is mediated through the parameters. We denote the joint density function of (r, T) conditional on $\overrightarrow{\varepsilon}$ by $$\Psi(r, t \mid \overrightarrow{\varepsilon}). \qquad (6.3)$$
So Ψ is the theory of how the sensory and decision processes jointly convert a particular set of parameter values into the (r, T) pair. By the law of total probability applied to the conditional probabilities of Eqs. 6.2 and 6.3, we obtain from Eq. 6.1 $$f(r, t \mid \overrightarrow{E}) = \int \Psi(r, t \mid \overrightarrow{\varepsilon})\, p(\overrightarrow{\varepsilon} \mid \overrightarrow{E})\, d\overrightarrow{\varepsilon}. \qquad (6.4)$$
A special case of considerable interest is where r and T are independent random variables for each set of parameters; that is, if $P(r \mid \overrightarrow{\varepsilon})$ and $f(t \mid \overrightarrow{\varepsilon})$ denote the marginal distributions of Ψ, $$\Psi(r, t \mid \overrightarrow{\varepsilon}) = P(r \mid \overrightarrow{\varepsilon})\, f(t \mid \overrightarrow{\varepsilon}). \qquad (6.5)$$
We speak of this as local (or conditional) independence. Observe that it does not in general entail that $f(r, t \mid \overrightarrow{E})$ also be expressed as the product of its marginals. [It is perhaps worth noting that conditional independence is the keystone of the method of latent structure analysis sometimes used in sociology (Lazarsfeld, 1954).] Among the models we shall discuss in Chapters 7 to 9, local independence holds for the fast-guess and counting models, but not for the random-walk or timing models. One happy feature of local independence is the ease with which the marginal distributions are calculated.
Theorem 6.1. If Eqs. 6.4 and 6.5 hold, then $$P(r \mid \overrightarrow{E}) = \int P(r \mid \overrightarrow{\varepsilon})\, p(\overrightarrow{\varepsilon} \mid \overrightarrow{E})\, d\overrightarrow{\varepsilon}, \qquad (6.6)$$ $$f(t \mid \overrightarrow{E}) = \int f(t \mid \overrightarrow{\varepsilon})\, p(\overrightarrow{\varepsilon} \mid \overrightarrow{E})\, d\overrightarrow{\varepsilon}. \qquad (6.7)$$
The easy proof is left to the reader.
6.3.3 An Example: The Fast-Guess Model
A specific, simple example should help fix these ideas and those to follow (especially Section 6.5). I shall use a version of the fast-guess model of Ollman (1966) and Yellott (1967, 1971), which will be studied more fully in Section 7.4. Suppose that prior to the presentation of the signal the subject opts to behave in one of two quite distinct ways. One is to make a simple reaction to signal onset without waiting long enough to gain any idea as to which signal was presented. We assume that the simple-reaction-time density to both signals is g_{0}(t) and that, quite independent of that, response r is selected with probability β_{r}, r = A, B, where β_{A} + β_{B} = 1. The other option is to wait until the information is extracted from the signal and to respond according to that evidence. This time is assumed to have density g_{1}(t) for both signals and, independent of the time taken, the conditional probability of response r to signal s is P_{sr}, where s = a, b, r = A, B, P_{sA} + P_{sB} = 1. Note that we have built in local independence of r and T for each of the two states, 0 representing fast guesses (simple reactions) and 1 the informed responses. Denote by ρ the probability that the subject opts for the informed state. In arriving at the general form for $f(r, t \mid \overrightarrow{E})$, let us suppress all notation for $\overrightarrow{E}$ save for the signal presented, s. From Eqs. 6.4 and 6.5 we obtain $$f(r, t \mid s) = \rho P_{sr}\, g_1(t) + (1 - \rho)\, \beta_r\, g_0(t). \qquad (6.8)$$
Observe that this model has two decision parameters—namely, ρ and β_{A} (recall, β_{B} = 1 − β_{A})—two sensory functions—g_{0} and g_{1}—and two discrimination (sensory) parameters—P_{sA}, s = a, b (recall, P_{sB} = 1 − P_{sA}). For some purposes we can reduce the functions to their means, ν_{0} and ν_{1}. Implicitly, I assume ν_{0} < ν_{1} (see Section 6.2.1).
Either by direct computation from Eq. 6.8 or by using Eqs. 6.6 and 6.7, we obtain for the marginals $$f(t \mid s) = \rho\, g_1(t) + (1 - \rho)\, g_0(t), \qquad (6.9)$$ $$P(r \mid s) = \rho P_{sr} + (1 - \rho) \beta_r. \qquad (6.10)$$
Note that f(t | s) is actually independent of s. From Eq. 6.9 we can compute the overall expected reaction time to the presentation of a particular signal s: $$E(T \mid s) = \rho \nu_1 + (1 - \rho) \nu_0. \qquad (6.11)$$
For some purposes it is useful to compute E(T) for each (s, r) pair separately—that is, the mean of f(t | r, s) = f(r, t | s)/P(r | s), which from Eq. 6.8 we see is $$E(T \mid s, r) = \frac{\rho P_{sr} \nu_1 + (1 - \rho) \beta_r \nu_0}{\rho P_{sr} + (1 - \rho) \beta_r}. \qquad (6.12)$$
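To make the mixture structure concrete, here is a small Monte Carlo sketch of the fast-guess model. The exponential shapes chosen for g_0 and g_1, the particular parameter values, and all function names are assumptions for illustration only; the model itself constrains just the mixture form of Eq. 6.8 and the means ν_0 and ν_1.

```python
import random

def simulate_fast_guess(rho, beta_A, P_sA, nu0, nu1, n_trials=200_000, seed=1):
    """Monte Carlo sketch of the fast-guess model for a single signal s.
    Guess state (prob 1 - rho): respond A with prob beta_A, time ~ exp, mean nu0.
    Informed state (prob rho): respond A with prob P_sA, time ~ exp, mean nu1.
    (The exponential time densities are an illustrative assumption.)"""
    rng = random.Random(seed)
    n_A, total_t = 0, 0.0
    for _ in range(n_trials):
        if rng.random() < rho:                      # informed response
            r_is_A = rng.random() < P_sA
            t = rng.expovariate(1.0 / nu1)
        else:                                       # fast guess
            r_is_A = rng.random() < beta_A
            t = rng.expovariate(1.0 / nu0)
        n_A += r_is_A
        total_t += t
    return n_A / n_trials, total_t / n_trials       # estimates of P(A|s), E(T|s)

rho, beta_A, P_sA, nu0, nu1 = 0.7, 0.5, 0.9, 0.180, 0.350
p_A, mrt = simulate_fast_guess(rho, beta_A, P_sA, nu0, nu1)
print(p_A)   # ≈ rho * P_sA + (1 - rho) * beta_A = 0.78  (Eq. 6.10)
print(mrt)   # ≈ rho * nu1 + (1 - rho) * nu0 = 0.299     (Eq. 6.11)
```

The simulated marginals agree with Eqs. 6.10 and 6.11 to within sampling error, which is the point of the exercise: the decision parameters ρ and β_A and the sensory quantities P_sA, ν_0, ν_1 enter only through these two formulas.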
A few words may be appropriate at this point about how one attempts to confront such a model with data. One immediately obvious problem is that the data provide us with estimates of $f(r, t \mid \overrightarrow{E})$, whereas the theory as stated in Eq. 6.8 has a number of parameters that are not explicit functions of $\overrightarrow{E}$. Of course, from the interpretation of the model, the parameters ρ and β_{r} belong to the decision process and the others belong to the sensory process. Thus, we do not anticipate any dependence of the latter parameters on manipulations of the decision strategy, but the former may depend upon any aspect of $\overrightarrow{E}$. It is quite typical of models throughout psychology—not just those for response times—that no explicit account is offered for the dependence of the parameters on environments. This is a fact and a limitation, not a virtue, of our theorizing.
Because we cannot compute the parameters from knowing $\overrightarrow{E}$, in practice we estimate them in some fashion from the data and then attempt to evaluate how well the model accounts for those data. For example, this will be done for the fast-guess model in Section 7.4. If the fit is reasonably satisfactory and if enough data have been collected, which often is not the case, we can then study empirically how the parameters vary with the different experimental manipulations. Sometimes quite regular relations arise that can be approximated by some mathematical functions and that are then used in later applications of the model to data.
6.3.4 Discrimination of Color by Pigeons
In some data, however, it is reasonably clear without any parameter estimation that something like fast guesses are involved because the two distributions g_{0} and g_{1} are unimodal and so well separated that the overall response-time distribution is bimodal. The clearest example of this that I know is not with human data, but with pigeons. During the training phase, Blough (1978) reinforced the birds for responding to the onset of a 582-nm light (S^{+}), and the onset of the signal was delayed a random amount whenever a response (peck) was made at a time when the light was not on. This is a discrimination, not a choice, design. In the test phase, all lights from 575 to 589 nm in 1-nm steps were presented equally often except for 582 nm, which was three times more frequent than the others and was reinforced on one third of its occurrences. Response probability was a decreasing function of the deviation from 582 nm, and the decay was approximately the same on both sides. To keep the figure from being too complex, only the data for the smaller wavelengths are shown. The pattern of response times, shown for one bird in Figure 6.4, is strikingly bimodal. The earlier mode, the unshaded region of these distributions, clearly does not differ from signal to signal in frequency of occurrence, location in time, or general shape. Since that mode had a central tendency of about 170 msec, I suspect these were simple reaction times to signal onset—that is, fast guesses. However, I am not aware of any simple-reaction-time data for pigeons with which to compare that number. The second mode, the shaded region, is clearly signal dependent in that the number of responses of this type decreases as the signal deviates from the reinforced S^{+} (582 nm); however, its location did not seem to change, and its central tendency was about 350 msec.
6.4 DISCRIMINABILITY AND ACCURACY
6.4.1 Varying the Response Criterion: Various ROC Curves
By now it is commonplace that either by varying the relative frequency of a to b, or by differential payoffs for the four possible signal-response pairs, or just by instructing the subject to favor A or B, one can cause P(A | a) and P(A | b) to vary all the way from both conditional probabilities being 0 to both being 1—that is, from no A responses at all to all A responses. Note that nothing about the stimulating conditions, and so presumably nothing about the sensory mechanism, is altered. Just the presentation probability or the payoffs or the instructions are varied. Moreover, the locus of these pairs of points is highly regular, appearing to form a convex function when P(A | a) is plotted against P(A | b)—that is, an increasing function with a decreasing slope. Typical data are shown in Figure 6.5. Such functions are called ROC curves, the term stemming from the engineering phrase "receiver operating characteristic." Detailed discussions of these curves, of data, and of mathematical models to account for them can be found in Green and Swets (1966, 1974) and Egan (1975). In the case of Yes-No detection in which A = Y, B = N, a = s = signal, and b = n = noise, we speak of P(Y | s) as a "hit," P(N | s) as a "miss," P(Y | n) as a "false alarm," and P(N | n) as a "correct rejection."
Let us work out the ROC for the fast-guess model, which we do by eliminating β_{A} from Eq. 6.10, where s = a, b, r = A: $$P(A \mid a) = P(A \mid b) + \rho\,(P_{aA} - P_{bA}).$$
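Because Eq. 6.10 is linear in β_A for both signals, sweeping the guessing bias β_A moves P(A | a) and P(A | b) by exactly the same amount, so the fast-guess ROC is a straight line of unit slope. A minimal numerical check, with illustrative parameter values of my choosing:

```python
# Fast-guess ROC sketch: sweep the guessing bias beta_A and compute the
# (false alarm, hit) pair from Eq. 6.10. The vertical offset y - x stays
# constant at rho * (P_aA - P_bA). Parameter values are illustrative only.
rho, P_aA, P_bA = 0.6, 0.9, 0.2

def roc_point(beta_A):
    p_hit = rho * P_aA + (1 - rho) * beta_A   # P(A|a), Eq. 6.10 with s = a
    p_fa  = rho * P_bA + (1 - rho) * beta_A   # P(A|b), Eq. 6.10 with s = b
    return p_fa, p_hit

for beta_A in (0.0, 0.25, 0.5, 0.75, 1.0):
    x, y = roc_point(beta_A)
    print(round(x, 3), round(y, 3), round(y - x, 3))  # y - x is always 0.42
```

The constant difference, here 0.6 × (0.9 − 0.2) = 0.42, is the signature of the model: unlike the bowed ROCs of Figure 6.5, the fast-guess ROC is linear with slope 1.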
For some time it had been noted that as the criterion was varied, subjects exhibited a tendency to respond faster when the internal representation of the signal was far from the criterion, as judged by confidence ratings, and slower when it was close to the criterion (Emmerich, Gray, Watson, & Tanis, 1972; Fernberger, Glass, Hoffman, & Willig, 1934; Festinger, 1943a, b; Gescheider, Wright, Weber, Kirchner, & Milligan, 1969; Koppell, 1976; Pike, 1973; Pike & Ryder, 1973). Pike and Ryder (1973) refer to the assumption that E(T) = f(x − c), where c is the criterion, as the latency function hypothesis. Festinger (1943a, b), Garrett (1922), and Johnson (1939) all reported no evidence that the relation between confidence and reaction time is affected by instructions emphasizing either speed or accuracy.
The earlier apparent relation between confidence and reaction time, together with the fact that ROC curves can be quite accurately inferred from confidence judgments, led to the idea of trying to infer the ROC curve from response-time data. One method was proposed by Carterette, Friedman, and Cosmides (1965) and another, closely related, one by Norman and Wickelgren (1969), which is illustrated in Figure 6.6. The idea is this: For each signal, place the two subdistributions, weighted by their probabilities of occurring, back to back, with the A response on the left. From these artificial densities we generate the ROC as follows: Align them at the point of transition from A responses to B responses and vary the criterion. To be more explicit, the locus of points (x, y) is generated as follows. For x ≤ P(A | a), let t be such that $$x = \int_0^t f(A, t' \mid a)\, dt', \qquad y = \int_0^t f(A, t' \mid b)\, dt'.$$ For x > P(A | a), let t be such that $$x = P(A \mid a) + \int_t^\infty f(B, t' \mid a)\, dt', \qquad y = P(A \mid b) + \int_t^\infty f(B, t' \mid b)\, dt'.$$
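One concrete reading of the back-to-back construction is to order all trials from most A-like (fast A responses) through slow A and slow B responses to most B-like (fast B responses), and then sweep a criterion across that ordering, cumulating separately for the two signals. A sketch in Python; the (response, RT) data are made up, and the function name is mine:

```python
import bisect

def rtroc(trials_a, trials_b):
    """trials_s: list of (response, rt) pairs observed when signal s was shown.
    Returns the locus of points (x, y): cumulative fractions for signals a
    and b as a criterion sweeps from the A end to the B end."""
    def score(resp, rt):
        # orders trials from most A-like to most B-like:
        # fast A < slow A < slow B < fast B
        return (0, rt) if resp == "A" else (1, -rt)
    sa = sorted(score(r, t) for r, t in trials_a)
    sb = sorted(score(r, t) for r, t in trials_b)
    pts = [(0.0, 0.0)]
    for c in sorted(set(sa + sb)):
        pts.append((bisect.bisect_right(sa, c) / len(sa),
                    bisect.bisect_right(sb, c) / len(sb)))
    return pts

# Hypothetical (response, RT-in-sec) data for signals a and b:
trials_a = [("A", 0.20), ("A", 0.25), ("A", 0.30), ("B", 0.50)]
trials_b = [("A", 0.40), ("B", 0.30), ("B", 0.20)]
pts = rtroc(trials_a, trials_b)
print(pts)  # passes through (P(A|a), P(A|b)) = (0.75, 0.333...)
```

By construction the curve runs from (0, 0) to (1, 1) and passes through the observed operating point (P(A | a), P(A | b)), exactly as the text requires of the RTROC.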
This locus of points has been called both the latency operating characteristic, abbreviated LOC, and the RTROC. It is important to note that Lappin and Disch (1972a, b and 1973) used the term LOC for a speedaccuracy measure that others call the conditional accuracy function (see Section 6.5.4).
The construction we have just outlined is really quite arbitrary; it does not stem from any particular theoretical view about what the subject is doing. The major motive for computing it is the intuition that reaction time serves as a proxy for the subject's confidence in the response made. According to Gescheider et al. (1969), in a model with a response criterion, such as that of the theory of signal detectability, this function reflects how far the evidence about the stimulus is from the response criterion.
The only serious theoretical study of the relation between the ROC and RTROC is Thomas and Myers (1972). The results, which are rather complicated and will not be reported very fully here, were developed for both discrete and continuous signal detection models; that is, there is an internally observed random variable X_{s} for signal s, which is either discrete or continuous and has density function f(x | s), and there is a response criterion β such that the response is A when X_{s} > β and B when X_{s} < β. They assumed that the latency is a decreasing function of X_{s} − β. They considered the somewhat special case where the distributions are simply a shift family—that is, there is a constant k_{ab} such that for all x, f(x | a) = f(x − k_{ab} | b)—and where the slope of the ROC curve is decreasing (which they showed is equivalent to −d^{2} log f(x | s)/dx^{2} ≥ 0). Under these assumptions, they proved that the RTROC lies below the ROC except, of course, at the point (P(A | a), P(A | b)), which, by construction, they have in common.
Emmerich et al. (1972) reported a detection study of a 400-Hz tone in noise in which detection responses and confidence judgments were made; in addition, without the subjects being aware of it, response times were recorded. The confidence and RTROCs are shown in Figure 6.7. Note that these are not plots of P(A | a) versus P(A | b), but of the corresponding Gaussian z-scores {i.e., z(r | s) is defined by P(r | s) = Φ[z(r | s)], where Φ is the unit Gaussian distribution}, which results in straight-line ROCs if the underlying distributions are Gaussian.
Thus latency-based ROCs should probably not be viewed as a prime source of information about sensory processing alone. Response latencies are known to be influenced by many factors, and this is undoubtedly also the case for latency-based ROCs. Yager and Duncan (1971) reach a similar conclusion….
Additional cause for skepticism is provided by Blough's (1978) study of wavelength discrimination by pigeons, which was discussed in Section 6.3.4. Using the response-time distributions, RTROC curves were developed, with the result shown in Figure 6.8. These do not look much like typical ROC curves. For example, from other pigeon data in which rate of responding was observed, one gets the more typical data shown in Figure 6.9. The reason for the linear relations seen in Figure 6.8 is the bimodal character of the response-time distributions seen in Figure 6.4 (Section 6.3.4).
6.4.2 Varying the Discriminability of the Signals
The most obvious effect of altering the separation between two signals that differ on just one dimension—for example, the intensity of lights—is to alter the probability of correctly identifying them. This information was traditionally presented in the form of the psychometric function: holding signal b fixed and varying a, it is the plot of P(A | a) as a function of the signal separation.
For very careful and insightful empirical and theoretical analysis of psychometric functions, see Laming (1985). He makes very clear the importance of plotting these functions in terms of different physical measures depending on the exact nature of the task. He also provides a most interesting theoretical analysis concerning the information subjects are using in the basic psychophysical experiments.
In order to get a unique measure it is necessary to use something that captures the entire ROC curve. The most standard measure is called d′, and its calculation is well known (see Green and Swets, 1966, 1974, or Egan, 1975, or any other book on signal detection theory); it is the mean separation of the underlying distributions of internal representations normalized by some average of their standard deviations.
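Under the equal-variance Gaussian model, d′ reduces to the difference of the z-transformed hit and false-alarm rates, which is how it is computed in practice. A minimal sketch; the hit and false-alarm rates below are made-up example values, not data from any study cited here:

```python
from statistics import NormalDist

# d' under the equal-variance Gaussian assumption: the separation of the two
# evidence distributions in standard-deviation units, recovered from a single
# (hit, false-alarm) pair as z(hit) - z(false alarm).
z = NormalDist().inv_cdf   # inverse of the unit Gaussian distribution Phi

def d_prime(p_hit, p_fa):
    return z(p_hit) - z(p_fa)

print(round(d_prime(0.84, 0.16), 2))   # ≈ 1.99
```

When the two underlying standard deviations differ, a single pair no longer suffices and some average of the deviations must be used, which is the qualification in the text.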
Navon (1975) showed, under the same assumptions made by Thomas and Myers (1972), that if the false-alarm rate is held constant as stimulus discriminability is varied, then E(T | a, A) − E(T | b, A) is a monotonic function of d′_{ab}.
The second, and almost equally obvious, effect is that subjects are slower when the signals are close and faster when they are farther apart. Moreover, this phenomenon occurs whether the conditions of signal separation are run blocked or randomized. Among the relevant references are Birren and Botwinick (1955), Botwinick, Brinley, and Robbin (1958), Crossman (1955), Henmon (1906), Festinger (1943a, b), Johnson (1939), Kellogg (1931), Lemmon (1927), Link and Tindall (1971), Morgan and Alluisi (1967), Pickett (1964, 1967, 1968), Pike (1968), Vickers (1970), Vickers, Caudrey, and Willson (1971), Vickers and Packer (1981), and Wilding (1974). The fact that the effect appears with randomized separations means that the phenomenon is very much stimulus controlled, since the subject cannot know in advance whether the next discrimination will be easy or difficult.
As Johnson (1939) made clear and as has been replicated many times since, the stimulus ranges over which these two measures—accuracy and time—vary appear to be rather different. At the point where response probability appears to reach its ceiling of 1, response times continue to get briefer with increasing signal separation. Moreover, the same is true for the subject's reports of confidence in the response, which is one reason that reaction time is often thought to reflect confidence in a judgment (or vice versa, since it is not apparent on what the confidence judgments are based). It is unclear to what degree this is a real difference or a case of probabilities being very close to 1 and estimated to be 1 from a finite sample. In the latter case, the ranges corresponding to changes in d′ and E(T) may actually be comparable.
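The finite-sample point is easy to illustrate. With a true probability correct of 0.995 and a condition of 200 trials, the estimated proportion is exactly 1 on roughly a third of replications (0.995^200 ≈ e^{-1}), so accuracy can appear to have hit ceiling while the true probability is still changing. The numbers here are hypothetical:

```python
import random

# With p_true just below 1 and a modest number of trials per condition,
# the sample proportion correct is often exactly 1.0, masking any further
# change in the true probability. Values are illustrative only.
rng = random.Random(7)
p_true, n = 0.995, 200
est = sum(rng.random() < p_true for _ in range(n)) / n
print(est)  # often exactly 1.0, though p_true < 1
```

Response time, by contrast, is a continuous measure and keeps registering changes in this range, which is one reason the two ranges need not be genuinely different.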
Some debate has occurred over what aspect of the stimulus separation is controlling—differences or ratios or something else. Furthermore, there is no very good agreement as to the exact nature of the functions involved (see Vickers, 1980, pp. 36–38). If accuracy increases and time decreases with signal separation, then they must covary, and one can be plotted as a function of the other. This plot, however, is not what is meant when one speaks of a speed-accuracy tradeoff, which is discussed in Section 6.5.
The only study of the dependence of response time on signal separation I shall present here in any detail is Wilding (1974), because he gives more information about the reaction-time distributions than do the others. The task for each of his seven subjects was to decide on each trial whether a 300-msec spot of light of moderate brightness was to the right or left of the (unmarked) center of the visual field. There were four possible locations on each side, forming a horizontal line, numbered from 1 on the left to 8 on the right and spanning a visual angle of about 0.4°. So 1 and 8 were the most discriminable stimuli and 4 and 5 the least. The data were collected in runs of 110 trials, the first 10 of which were discarded. There were four runs in each of two sessions, which differed according to instructions, the one emphasizing accuracy and the other speed.
In analyzing the data for certain things, such as the fastest or the slowest response, one must be cautious about sample sizes. For signals 1 and 8 the probability of being correct was virtually 1, whereas it was very much less for the central locations.
Figure 6.11 shows the latency frequency histograms for each subject for the accuracy condition; the columns correspond to (1, 8), (2, 7), …, (4, 5), as in Figure 6.9. There are not enough data here to tell much about the mathematical form involved except that both the mean and the variance increase as stimulus discriminability decreases.
6.4.3 Are Errors Faster, the Same as, or Slower than the Corresponding Correct Responses?
This question has loomed large because a number of the models to be discussed in Chapters 7 to 9 make strong (and often nonobvious) predictions about the relation. I know of three appraisals of the situation—Swensson (1972a), Vickers (1980), and Wilding (1971a)—and they are inconsistent. The earlier ones, which Vickers seems to have ignored, are the more accurate.
It is actually clear from the data already presented that there is no simple answer to the question. For the pigeons attempting to make a difficult discrimination, but doing it quite rapidly, the data in Figure 6.4 make clear that errors are faster than correct responses; in fact, the larger the error, the faster it is. But for people also engaged in a visual discrimination, we see in Figure 6.10 that errors are slower than the corresponding correct responses. The source of the difference probably is not the species of the subject, although no directly comparable data exist.
According to Swensson the important difference, at least for human beings, can be described as follows. Errors are faster than correct responses when two conditions are met: the discrimination is easy and the pressure to respond rapidly is great.
There are fewer studies in which the discrimination is difficult and the pressure to be fast is great. Henmon (1911) showed that under such conditions the fastest and slowest response times involved higher proportions of errors than did the intermediate times, suggesting that the subjects may have oscillated between two modes of behavior. Rabbitt and Vyas (1970) suggested that errors can arise from a failure of either what they call perceptual analysis or response selection. They state (apparently their belief) that when the failure is in response selection, errors are unusually fast; but when it is in perceptual analysis, error and correct RT distributions are the same. Apparently, perceptual analysis corresponds to what I call the decision process and is the topic of most modeling. In Blough's experiment, the pigeons exhibited fast errors, and judging by the times involved the pigeons were acting as if they were under time pressure. Of course, Wilding found errors to be slower than correct responses under both his speed and accuracy instructions, but one cannot but question the effectiveness of his speed instructions when the fastest times exceeded 450 msec. Link and Tindall (1971) combined four levels of discriminability with three time deadlines—260 msec, 460 msec, and ∞—in a study of same-different discrimination of pairs of line lengths successively presented, separated by 200 msec. Their results are shown in Figure 6.12. Note that under the accuracy condition and for the most difficult discriminations, errors are slower than correct responses; whereas at the 460-msec deadline the pattern is that errors are faster than correct responses and the magnitude of the effect increases with increased discriminability; and at 260 msec, which is only slightly more than simple reaction times, the mean times are constant, a little less than 200 msec, independent of the level of discriminability and of whether the response is correct or in error.
The latter appear to be dominated by fast guesses—that is, simple reactions—but they cannot be entirely that, since accuracy is somewhat above chance. A careful analysis of these data will be presented in Section 7.6.2. Thomas (1973) reported a …
6.5 SPEED-ACCURACY TRADEOFF
6.5.1 General Concept of a Speed-Accuracy Tradeoff Function (SATF)
Within any response-time model of the type formulated in Section 6.3.2, as the decision parameters are varied, changes occur in $\Psi(r, t \mid \overrightarrow{\sigma}, \overrightarrow{\delta})$, which in turn are reflected in $f(r, t \mid \overrightarrow{E})$. The general intuition is that these changes are such that the marginal measures of probability of choice, $P(r \mid \overrightarrow{E})$, and of response time, $f(t \mid \overrightarrow{E})$, covary. The more time taken up in arriving at a decision, the more information available, and so the better its quality. This statement is, perhaps, overly simple since if a great deal of time is allowed to pass between the completion of the signal presentation and the execution of the response, then the accuracy of responding may deteriorate because of some form of memory decay. But for a reasonable range of times the statement appears to be correct. What is probably most relevant is the portion of the stimulus presentation that can be processed before an order to respond is issued. Usually we attempt to control that time indirectly through time deadlines on the response time.
As usual, the theory involves a covariation due to changes in parameters, and the data a covariation due to changes induced in $\overrightarrow{E}$. For example, suppose the experimenter imposes a time deadline on the subject such that any response occurring after signal onset but before the deadline is rewarded for accuracy, but those that are slower than the deadline are fined independent of their accuracy. The effect of changes in the deadline is to vary both the accuracy and the mean response time. Suppose that we suppress all of the notation for $\overrightarrow{E}$ except for the two things that vary from trial to trial—namely, the signal presented, s, and the deadline, δ, imposed. The averaged data then consist of four pairs of numbers: $$[P(r \mid s, \delta),\ E(T \mid s, r, \delta)], \qquad s = a, b; \quad r = A, B.$$
These are called empirical speed-accuracy tradeoff functions (SATFs) or latency-probability functions (LPFs). I use the former term and abbreviation. As four separate functions are a bother, especially since they probably contain much the same information, in practice some simplification is made. Usually a single measure of accuracy—four different ones will be mentioned below—is plotted against the overall MRT, and that is referred to as the SATF. Other terms are found in the literature, such as the speed-accuracy operating characteristic or SAOC (Pew, 1969) and the macro tradeoff (Thomas, 1974). The rationale for the latter term will become apparent later. One must be ever sensitive to the fact that such a collapse of information may discard something of importance.
Observe that if we vary something in the environment, different from the deadline, that affects both speed and accuracy, we may or may not get the same SATF. As an example of a different procedure, Reed (1973) signaled the subjects when to respond, and he varied the time during the presentation of the reaction signal when the response was to be initiated. Theoretically, what SATF we get depends upon which decision parameters are affected by the experimental manipulation. Since we do not usually have a very firm connection between $\overrightarrow{E}$ and $\overrightarrow{\delta}$, there can be disagreement about which theoretical tradeoff goes with which empirical one. For the fast-guess model this is not really an issue, since the only decision parameter affecting $E(T\mid s)$ (Eq. 6.11) is ρ. So if we eliminate it between Eqs. 6.10 and 6.11 we obtain the unique theoretical SATF,
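To make this concrete, here is a minimal simulation of the fast-guess model (a sketch; all parameter values are invented for illustration). Both accuracy and MRT are linear in ρ, so eliminating ρ between them predicts a straight-line SATF:

```python
import random

random.seed(1)

# Fast-guess model: on each trial the subject either truly processes the
# stimulus (probability rho: accuracy P_c, slow times g1) or fast-guesses
# (probability 1 - rho: chance accuracy beta, fast times g0).
P_c, beta = 0.95, 0.5            # accuracy of true responses; guessing bias
mu1, mu0 = 0.45, 0.18            # mean times (sec) of g1 and g0

def simulate(rho, n=100_000):
    correct = rt = 0.0
    for _ in range(n):
        if random.random() < rho:                # stimulus-controlled trial
            correct += random.random() < P_c
            rt += random.expovariate(1 / mu1)    # exponential g1 (arbitrary choice)
        else:                                    # fast guess
            correct += random.random() < beta
            rt += random.expovariate(1 / mu0)    # exponential g0 (arbitrary choice)
    return correct / n, rt / n                   # (accuracy, MRT)

# Eliminating rho gives the linear SATF:
#   accuracy = beta + (P_c - beta) * (MRT - mu0) / (mu1 - mu0)
for rho in (0.2, 0.5, 0.8):
    acc, mrt = simulate(rho)
    pred = beta + (P_c - beta) * (mrt - mu0) / (mu1 - mu0)
    print(f"rho={rho}: accuracy={acc:.3f}  MRT={mrt:.3f}  linear prediction={pred:.3f}")
```

Note that only the means of g0 and g1 enter the relation, which is why the fast-guess SATF is a straight line whatever the shapes of the two densities.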
Swensson and Thomas (1974) described a broad class of models, called fixed-stopping ones, that yields a relation that has played a role in empirical investigations. Suppose that there is a series of n distinct observations X _{i}, each taking time T _{i}, i = 1, …, n, where all of the random variables are independent, the X _{i} are identically distributed, and the T _{i} are also identically distributed. The density of X _{i}, $f(x\mid s)$, depends on the signal presented whereas that of T _{i} does not. If R is the residual time, then the response time is
So
Assume the decision variable to be the logarithm of the likelihood of the observations; that is,
For n reasonably large, the Central Limit Theorem (Appendix A.1.1) implies that Y _{n} is distributed approximately as a Gaussian with mean $E(Y_n\mid s)=\mu_s$ and variance $V(Y_n\mid s)=\sigma_s^2/n$, s = a, b. If a response criterion c is selected for responding A when Y _{n} > c, then we see that
Denoting the z-score of $P(A\mid s)$—that is, the upper limit on the unit normal that yields this probability—by z(s), then eliminating c between z(a) and z(b) yields the linear ROC curve
There are several ways to define a so-called d′ measure of the accuracy of performance described by the ROC curve; perhaps the simplest is that value of z(a) corresponding to z(b) = 0; that is,
Now, if the speed-accuracy tradeoff is achieved by varying n, we see that it is given by
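The resulting linearity can be checked numerically. The following sketch works the equal-variance case analytically, with invented parameter values, using the fact that the d′ just defined then reduces to z(a) − z(b):

```python
from statistics import NormalDist

Phi, z = NormalDist().cdf, NormalDist().inv_cdf

# Fixed-stopping model: after n observations the decision variable Y_n is
# approximately Gaussian with mean +mu (signal a) or -mu (signal b) and
# standard deviation sigma / sqrt(n); respond A when Y_n > c.
mu, sigma, c = 0.25, 1.0, 0.1    # illustrative means, spread, and criterion
t_obs, t_res = 0.04, 0.20        # sec per observation; residual time E(R)

for n in (4, 9, 16):
    p_Aa = 1 - Phi((c - mu) * n ** 0.5 / sigma)   # P(A | a)
    p_Ab = 1 - Phi((c + mu) * n ** 0.5 / sigma)   # P(A | b)
    d = z(p_Aa) - z(p_Ab)        # = 2*mu*sqrt(n)/sigma, independent of c
    mrt = n * t_obs + t_res      # mean response time is linear in n
    print(f"n={n:2d}  d'={d:.2f}  (d')^2={d * d:.2f}  MRT={mrt:.2f}s")
```

Since (d′)² and MRT are both linear in n, (d′)² is linear in MRT, which is the relation Taylor, Lindsay, and Forbes (1967) report empirically (Section 6.5.3).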
They also describe another class of models with an optional stopping rule, in which the value of n depends upon the observations actually made; it is more complex, and versions of it are described in detail in Sections 8.2 and 8.3.
With more complicated models in which $E(T\mid s)$ and/or $P(r\mid s)$ depend upon two or more decision parameters, there can be a great deal of uncertainty as to what theoretical relation to compare with what data. Weatherburn (1978) pointed this out as a serious issue for a number of the models we shall discuss in Chapter 8. Pike and Dalgleish (1982) attempted to counter his admittedly correct observations by showing that for some of the models the locus of possible pairs of speed and accuracy values is sufficiently constrained under all possible (or plausible) values of the parameters that the model can, in principle, be rejected. Weatherburn and Grayson (1982) replied, reemphasizing that the rejoinder rested on interpretations of model parameters, which may not be correct. The fact is that the models they were discussing indeed do have a wide range of possible speed-accuracy pairs consistent with them, not a simple function. Great caution must be exercised when a model has more than one decision parameter that affects both speed and accuracy. This point, which I believe to be of considerable importance, has not been as widely recognized as it should be by those who advocate the use of SATFs. In a sense, whether a tradeoff plot is useful depends, in part, on the complexity of the model one assumes to underlie the behavior. However, ignoring the tradeoff or assuming that certain parameters can be held constant by empirically trying to achieve constancy of accuracy or of time is subject to exactly the same difficulties.
A substantive realization of these observations can be found in Santee and Egeth (1982), in which they argue that in some cognitive tasks—they use letter recognition—experimental procedures that permit the use of accuracy measures draw upon a different aspect of the cognitive processing than do procedures that use responsetime measures. In their study they manipulated exposure duration. They argued that for brief exposures the accuracy is affected by limitations on data processing by the subject, whereas with long exposures, which is typical of cognitive experiments, the behavior studied has to do with response limitations. For example, in their Experiment 1, there were two letters on either side of a fixation point, and an arrow under one indicated that the subject was to respond whether that letter was an A or an E. There were three conditions involving the other letter: it could be the same as the indicated one, or the other target letter, or an irrelevant letter (K and L were used). I refer to these as same, different, and irrelevant, respectively. With the tachistoscope timed to produce about 75% overall accuracy in each subject (8 to 20 msec exposure durations), it was found that subjects were most accurate in the different condition and least accurate in the same one. With an exposure of 100 msec and instructions to respond as rapidly as possible while maintaining a high degree of accuracy, they were fastest for same and slowest for different. They believe these findings to be inconsistent, and so conclude that different aspects of the process are being tapped.
6.5.2 Use of the SATF
Consider assessing the impact of some variable, say the amount of alcohol ingested, upon performance. If as the dosage level is increased both accuracy and MRT change in the same direction, then it can be quite unclear whether there has been a change in the quality of performance or merely a change in the speed-accuracy tradeoff. In particular, it may be quite misleading to plot just one of the two measures against dosage level. For a summary of references in which ambiguous results about alcohol have been presented, see Jennings, Wood, and Lawrence (1976).
Some experimenters have attempted to overcome this problem by experimentally controlling one of the two variables. For example, one can use some sort of time payoff scheme, such as band payoffs or a deadline, to maintain MRT within a narrow range as the independent variable—in this case, amount of alcohol ingested—is manipulated, and to evaluate the performance in terms of accuracy. Alternatively, one can attempt to control accuracy. This approach is widely used in the study of short-term memory, as we shall see in Chapter 11. Often an attempt is made to keep the error rate low, in the neighborhood of 2% to 5%. Not only is it difficult to estimate such a rate with any degree of accuracy, but if the SATF is changing rapidly in the region of small errors—which as we shall see it often appears to be—then this is a region in which very large time changes can correspond to very small changes in the error rate, making it very unlikely that the intended control is effective. Furthermore, one can easily envisage a model having more than one sensory state, one of which exhibits changes only in accuracy and another of which involves a speed-accuracy tradeoff. If experimentally we control accuracy through the first stage, then we will have done nothing whatsoever to control the SATF, which is under the jurisdiction of the second stage.
Because of these difficulties in keeping control of one of the variables, some authors (most notably Wickelgren and those associated with him, but also Ollman, Swensson, and Thomas) have taken the position that the only sensible thing to do is to estimate the entire SATF and to report how it as a whole varies with the experimental manipulation. This attitude parallels
TABLE 6.4. Mean slopes and intercepts of the best-fitting linear regression for each alcohol condition (Jennings et al., 1976)

  Dose (mg/kg)         0      .33    .66    1.00   1.33
  Slope (bits/sec)     6.45   5.71   4.92   4.90   3.38
  Intercept (msec)     168    162    161    173    150
As an example, Jennings et al. (1976) studied the SATF for choice reactions involving the identification of 1000 Hz and 1100 Hz tones. To manipulate the tradeoff they used a variety of deadlines, and subjects were paid for accuracy when responses were faster than the deadline and were fined for the slower responses. Measuring accuracy in terms of information transmitted (see Section 6.5.3 for a general discussion of accuracy measures), they fit linear functions to the curves and Table 6.4 shows how the intercept and slope varied with alcohol dose level. The slope is affected systematically, and the intercept somewhat irregularly.
6.5.3 Empirical Representations of SATFs
A certain amount of discussion has been devoted to the best way to present the empirical SATF. The initial studies^{*} (Fitts, 1966; Pachella & Pew, 1968) separately plotted percent correct and MRT as functions of the decision strategy variable manipulated by the experimenter. Schouten and Bekker (1967), using a somewhat different measure of performance, which will be discussed in detail in Section 6.5.4, plotted their measure of accuracy against MRT. They, and others who have plotted a probability measure of accuracy against MRT, have found the general pattern shown in Figure 6.13, which is composed of three separate pieces that can, to a rough first approximation, be thought of as linear segments. Up to a certain time, the accuracy level remains at chance, after which it grows linearly until it reaches its ceiling of 1, after which it stays at perfect accuracy with increases in time. The important facts are that for sufficiently short times, accuracy is nil; beyond another time, changes in MRT, which do occur, do not seem to affect the accuracy, which is virtually perfect; and between the two times there is a monotonic increase in accuracy. As was noted earlier, it is unclear whether accuracy really does become perfect or whether we are dealing with an asymptotic phenomenon. The models usually imply the latter.
Taylor, Lindsay, and Forbes (1967) replotted the Schouten and Bekker data, replacing the probability measure of accuracy by the d′ measure of signal detectability theory, and they showed that (d′)^{2} was approximately linear with MRT. Pew (1969), noting that log odds = log[P _{c}/(1 − P _{c})], where P _{c} is the probability of being correct, is approximately linear with (d′)^{2} in the two-alternative case, replotted the data existing at the time in log odds, which is shown in Figure 6.14. Another set of data, plotted in the same way, will be presented in Figure 9.6 (Section 9.3.4) when we discuss the timing and counting models. Lappin and Disch (1972a) raised the question as to which of several measures of accuracy gave the most linear plot against MRT. They compared d′; (d′)^{2} (see Eq. 6.14); information transmitted, that is,
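The candidate measures are easy to compute side by side from the two response probabilities of a two-choice experiment; the values below are invented for illustration:

```python
from math import log
from statistics import NormalDist

z = NormalDist().inv_cdf

p_Aa, p_Ab = 0.90, 0.20          # P(A | a) and P(A | b), invented values

dprime = z(p_Aa) - z(p_Ab)       # equal-variance Gaussian d'
dprime_sq = dprime ** 2

p_c = (p_Aa + (1 - p_Ab)) / 2    # percent correct with equal presentations
log_odds = log(p_c / (1 - p_c))

def H(p):                        # binary entropy in bits
    return -p * log(p, 2) - (1 - p) * log(1 - p, 2) if 0 < p < 1 else 0.0

# Information transmitted I(S;R) = H(R) minus the average of H(R | S),
# again assuming equal presentation probabilities
p_A = (p_Aa + p_Ab) / 2
info = H(p_A) - (H(p_Aa) + H(p_Ab)) / 2

print(f"d'={dprime:.3f}  (d')^2={dprime_sq:.3f}  "
      f"log odds={log_odds:.3f}  bits transmitted={info:.3f}")
```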
Why do we concern ourselves with the question of which accuracy measures lead to linear SATFs? One reason is ease of comparison, but that is hardly overriding since, as we shall see shortly, in certain memory studies comparisons are readily made among exponential fits. A more significant reason is formulated as follows by Thomas (1974, p. 449): “Capacity [of an information processing system] is usually defined as the maximum rate at which information is processed, and it is measured by finding that monotonic …
Kantowitz (1978), in response to a general survey of SATFs and their role in cognitive research by Wickelgren (1977), was highly critical of our current uncertainty about which accuracy measure to use. He cited Townsend and Ashby (1978, p. 122) as suggesting some reasons why the Lappin and Disch (1972a) and Swensson (1972a) studies were inconclusive, including the possibility of too variable data, the fact that some of the measures are nearly identical (although not d′ and its square), and that the range of RTs from 100 to 300 msec was simply too small to see much in the way of nonlinearities. Wickelgren's (1978) response, while sharp about other matters, does not really disagree with this point.
Those working on memory rather than sensory discrimination have not found d′ to be very linear with MRT, as a summary by Dosher (1979) of some of these studies makes clear. Corbett (1977), Corbett and Wickelgren (1978), Dosher (1976), and Wickelgren and Corbett (1977) all fit their SATFs by the cumulative exponential
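As a sketch of what such a fit involves, assume the commonly cited cumulative-exponential form d′(t) = λ(1 − e^{−(t−δ)/τ}) for t > δ, and 0 otherwise (the exact parameterization used in these papers may differ); the data points and the crude grid-search fit below are invented for illustration:

```python
from math import exp

def sat_exp(t, lam, delta, tau):
    """Cumulative-exponential SATF: asymptote lam, intercept delta, time
    constant tau (an assumed, commonly cited form)."""
    return lam * (1.0 - exp(-(t - delta) / tau)) if t > delta else 0.0

# Invented (t, d') points, generated near lam=3, delta=0.2, tau=0.4
data = [(0.3, 0.66), (0.5, 1.58), (0.8, 2.33), (1.2, 2.75), (2.0, 2.97)]

def sse(params):
    return sum((dp - sat_exp(t, *params)) ** 2 for t, dp in data)

# Coarse grid-search least squares; a real analysis would use a proper optimizer
grid = [(l / 10, d / 100, u / 100)
        for l in range(20, 41)           # lam   in 2.0  .. 4.0
        for d in range(10, 31)           # delta in 0.10 .. 0.30
        for u in range(10, 61, 5)]       # tau   in 0.10 .. 0.60
best = min(grid, key=sse)
print("fitted (lambda, delta, tau):", best)
```

The three fitted parameters then become the summary statistics whose variation across experimental conditions is studied.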
Reed (1973) used the somewhat more complex
McClelland (1979) used his cascade model (Section 4.3.2) to arrive at another possible formula for the SATF. He assumed that for each s, r pair the level of activation is the deterministic one embodied in Eq. 4.12 perturbed by two sources of noise. One of these is purely additive and he attributed it to noise associated with activation of the response units. He took it to be Gaussian with mean 0 and variance 1, thereby establishing a unit of measurement. Let it be denoted X. The other source is assumed to be additive on the scale factor A _{s} that multiplies the generalized gamma Γ_{n}(t). This random variable Y is also assumed to be Gaussian with mean 0 and variance ${\sigma}_{s}^{2}$. Moreover, the residual time is R. Putting this together, the decision variable is
Assuming the random variables are independent, then its expected value and variance are
Using the usual signal detection approach to this Gaussian decision variable, d′ between signals a and b is easily seen to be
6.5.4 Conditional Accuracy Function (CAF)
For the joint density $f(r,t\mid\overrightarrow{E})$, the plot of
A major difference between the CAF and SATF is that the former can be computed in any experimental condition for which sufficient data are collected to estimate the density functions, whereas the latter is developed only by varying the experimental conditions. For example, by using several response-time deadlines one can generate the SATF, and one can compute a CAF for each deadline separately.
To get some idea of just how distinct the CAF is from the SATF, we compute it for the fast guess model. By Eqs. 6.8 and 6.9.
Obviously, the form of $P(r\mid t,s)$ as a function of t depends entirely upon the forms of g _{0} and g _{1}, whereas the fast-guess SATF obtained in Section 6.5.1 is linear in MRT independently of their forms.
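For concreteness, the fast-guess CAF can be evaluated numerically once forms for g_0 and g_1 are chosen; the exponential densities and all parameter values below are invented for illustration:

```python
import math

rho = 0.7              # probability of a stimulus-controlled (non-guess) trial
P_sr, beta = 0.9, 0.5  # accuracy of true responses; guessing bias
mu0, mu1 = 0.15, 0.40  # mean times (sec) of guesses (g0) and true responses (g1)

def g(t, mu):
    """Exponential density with mean mu -- an arbitrary choice for g0 and g1."""
    return math.exp(-t / mu) / mu

def caf(t):
    """P(correct | T = t): the mixture weights reweighted by the densities at t."""
    w1 = rho * g(t, mu1)          # contribution of stimulus-controlled responses
    w0 = (1 - rho) * g(t, mu0)    # contribution of fast guesses
    return (w1 * P_sr + w0 * beta) / (w1 + w0)

# With these g's, slow responses are mostly true responses, so accuracy rises
# with t; a different choice of g0 and g1 changes the whole shape of the CAF.
print(round(caf(0.05), 3), round(caf(0.4), 3), round(caf(0.8), 3))
# prints: 0.707 0.829 0.884
```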
Actually, we can establish the general relationship between the CAF and SATF as follows (Thomas, 1974). Let r′ denote the response other than r; then, by Bayes' theorem (Section 1.3.2),
Substituting the SATF for P(r  s) establishes the general connection between them. Observe that,
Thus, the general character of the pattern relating the CAF and SATF as one varies MRT must be something of the sort shown in Figure 6.15.
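Written out in the chapter's notation, the Bayes step behind this relationship is the following sketch (standard conditional-probability algebra):

```latex
P(r \mid t, s)
  = \frac{f(t \mid r, s)\, P(r \mid s)}
         {f(t \mid r, s)\, P(r \mid s) + f(t \mid r', s)\, P(r' \mid s)},
```

so the CAF at time t is the SATF value $P(r \mid s)$ reweighted by the conditional time densities; only when those densities carry no information about the response can the two functions agree.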
One nice result involving the CAF has been established by Ollman (unpublished, 1974).
Theorem 6.2. Suppose in a two-choice situation, r′ denotes the response other than r. Let T denote the response-time random variable, and suppress the notation for the environment except for the signal presented. If $P(r\mid s) > 0$ and $P(r'\mid s) > 0$, then
Proof. We use
So, whether errors to a given signal are faster or slower than the corresponding correct responses depends on whether the correlation embodied in the CAF is positive or negative. Note that this is not the correct-versus-error comparison made in Section 6.3.3, since there it was the response, not the stimulus, that was held constant in the comparison.
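A numerical illustration of the theorem (a toy mixture with invented parameters, in which the slow trials are the accurate ones, so the covariance is positive and errors come out faster):

```python
import random
from statistics import mean

random.seed(3)

# Toy model for a fixed signal s: slow, accurate stimulus-controlled
# responses mixed with fast guesses at chance accuracy.
trials = []
for _ in range(100_000):
    if random.random() < 0.7:                 # stimulus-controlled trial
        c = random.random() < 0.95            # correct with high probability
        t = random.expovariate(1 / 0.45)      # slow times
    else:                                     # fast guess
        c = random.random() < 0.5             # chance accuracy
        t = random.expovariate(1 / 0.18)      # fast times
    trials.append((c, t))

t_correct = mean(t for c, t in trials if c)
t_error = mean(t for c, t in trials if not c)
p_c = mean(c for c, t in trials)
t_all = mean(t for c, t in trials)
cov = mean(c * t for c, t in trials) - p_c * t_all   # cov(1{correct}, T)

# Positive covariance goes with errors being faster than corrects,
# as the theorem asserts.
print(f"E(T|correct)={t_correct:.3f}  E(T|error)={t_error:.3f}  cov={cov:+.4f}")
```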
For the fast-guess model,
Since ρ(1 − ρ) > 0 and ν_{1} > ν_{0}, the covariance is positive or negative according as P _{sr} − β_{r} is positive or negative.
6.5.5 Use of the CAF
Lappin and Disch (1972a) and Harm and Lappin (1973) explored the question: does a subject's knowledge of the presentation probability affect
6.5.6 Can CAFs be Pieced Together to get the SATF?
Apparently the first appearance of CAFs in the reaction-time literature was in the Schouten and Bekker (1967) study, in which they manipulated reaction time rather directly. The reaction signals were lights, one above the other, which the subject identified by key presses. In addition, three 20-msec acoustic “pips” spaced at 75 msec were presented, and subjects were instructed to respond in coincidence with the third pip. The time between that pip and the signal onset was manipulated experimentally. The data were reported in terms of CAFs, although that term was not used. In principle, a CAF can be estimated in its entirety for each speed manipulation, but because of the fall-off of the RT density on either side of the mean, the sample sizes become quite small for times more than a standard deviation away from the mean. For this reason, it is tempting to try to piece them together to get one overall function. Schouten and Bekker's data, the means of which were replotted by Pew in Figure 6.14, are shown in Figure 6.18. Their conclusion was that these CAFs lie on top of one another and so, judging by Figure 6.15, they may actually reconstruct the SATF.
Wood and Jennings (1976) discussed whether it is reasonable to expect this piecing together to work—of course, we already know from the example of the fast-guess model that it cannot always work. They presented data from the end of the training period of their alcohol study. The CAFs calculated for each deadline are shown in Figure 6.19, and as is reasonably apparent—they confirmed it by a nonparametric analysis of variance—these estimated CAFs are not samples from a single function.
At a theoretical level, Ollman (1977) raised the question under what conditions the CAF and SATF would coincide. He found a set of sufficient conditions that he dubbed the Adjustable Timing Model (ATM). Because the framework given in Eq. 6.4 is somewhat more general than that postulated by Ollman, we must add a condition not mentioned explicitly in his ATM. This is the postulate that the sensory parameters are fixed, not random variables, which we denote $\overrightarrow{\sigma}=\overrightarrow{\sigma}(\overrightarrow{E})$. Thus, $\Phi(\overrightarrow{\epsilon}\mid\overrightarrow{E})=\Phi(\overrightarrow{\delta}\mid\overrightarrow{E})$, where $\overrightarrow{\delta}$ is the decision parameter vector. Introducing this assumption into
From Eqs. 6.20 and 6.21, which embody the assumptions of the ATM, it is easy to see that
The significance of this is that any manipulation of $\overrightarrow{E}$ that affects $\overrightarrow{\delta}$, and so t, but not $\overrightarrow{\sigma}$, will generate the same CAF, namely, $P(r\mid t,\overrightarrow{\sigma})$.
It does not follow that this function is the same as the SATF; in general, they will differ. However, as Ollman has pointed out, if the ATM holds and if the CAF is linear—that is,
To show this, consider
To my knowledge, no good a priori reasons exist to expect ATM to hold. Indeed, the whole philosophy of the theory of signal detectability is exactly the opposite—namely, that decision parameters do in fact directly affect the
6.5.7 Conclusions about SATFs
I believe the SATF may prove to be of comparable importance to the ROC curves of error tradeoffs in giving us a way to describe the compromises subjects make between accuracy and time demands. In general, enough data should be obtained in order that some summary of the SATF can be reported. As yet, there is no consensus about the best way to summarize the SATF. Presumably, as we better understand the empirical SATFs, we will arrive at a simple measure to summarize them, something comparable to d′ for ROC curves. Some authors have attempted to find an accuracy measure that leads to a linear relation, in which case two parameters summarize it. However, apparent linearity seems to be relatively insensitive to what appear, on other grounds, to be appreciable nonlinear changes in the accuracy measures. Other authors, working with search paradigms (see Chapter 11) have attempted to fit the resulting function of d′ versus MRT by one or another of several families of curves, and they then study how the several parameters of the fitted family vary with experimental manipulations; however, there are no compelling theoretical reasons underlying the choice of some of these families and no real consensus exists as to which is best to use.
Wickelgren (1977) has presented a very spirited argument for preferring SATF analysis to either pure reaction-time or pure error analysis. As was noted, Kantowitz (1978) took exception on, among other grounds, that we do not really know which accuracy measure to use. Weatherburn (1978) observed that if two or more parameters are involved, then the SATF is simply not a well-defined function, but rather a region of possible speed-accuracy pairs that can sometimes be thought of as a family of functions. Schmitt and Scheiver (1977) took exception to the attempts to use SATFs on the grounds that one does not know which family of functions to use, and they claimed that the analysis of data had, in several cases, been incomplete in terms of the family chosen. Dosher (1979), one of those attacked, provided a vigorous defense, pointing out gross errors in the critique and giving a careful appraisal of the situation.
Without denying our uncertainty about how to summarize the data and the real possibility that, because of multiple parameters, there may be no single function relating speed and accuracy, there can be little doubt that presenting SATFs in some form is more informative than data reported just in terms of MRTs, with error rates either listed, usually in an effort to persuade the reader that they have not varied appreciably, or merely described as less than some small amount. As was mentioned previously, the problem is that in many of the plots, such as the exponential relation between d′ and MRT, small changes in the accuracy measure can translate into far larger changes in the MRT when the error rate is small than when it is large. This parallels closely the problem of ill-defined psychometric functions arising because at small values of $P(A\mid b)$ a change of one percentage point corresponds to a very large change in $P(A\mid a)$; that is, the ROC is steep for small $P(A\mid b)$.
The CAF, which has been invoked by some as almost interchangeable with the SATF, is in fact completely distinct from the SATF, and there really is no justifiable reason to treat them as the same. Despite the fact that the CAF is defined for each experimental condition, on the whole I think it is the less useful of the two measures. The CAF does not clearly address the tradeoff of interest, it is certainly far more difficult to pin down empirically over a wide range of times, and in most theoretical models it is analytically less tractable than the SATF.
6.6 SEQUENTIAL EFFECTS^{*}
Up to now I have treated the data as if successive trials are independent. On that assumption, it is reasonable to suppose that $f(r,t\mid\overrightarrow{E})$ on a trial depends on the signal presented on that trial and upon the entire experimental context, but not on the preceding history of stimuli and responses. If that were so, then this density function could be estimated by collecting together all trials on which a particular stimulus was presented and forming the (r, T) histogram. But if the trials are not independent, we are in some danger when we make such an estimate. At the very least, the usual binomial computation for evaluating the magnitude of the variance of an estimate based upon a known sample size is surely incorrect (Norman, 1971). Depending upon the signs of the correlations, the dependence can make the estimate either too large or too small. Beyond that, if the trials are not independent, then we face the major theoretical problem of trying to account systematically for the dependencies.
The evidence for the existence of such dependencies or sequential effects is very simple: we determine whether the estimate of some statistic, usually either the response probabilities $P(r\mid s)$ or the corresponding expected reaction time $E(T\mid s,r)$, differs appreciably depending upon how much of the history is taken into account. As stated by Kornblum (1973b, p. 260), “The term sequential effect may be defined as follows: If a subset of trials can be selected from a series of consecutive trials on the basis of a particular relationship that each of these selected trials bear to their predecessor(s) in the series, and the data for that subset differs significantly from the rest of the trials, then the data may be said to exhibit sequential effects.” Let it be very clear that the principle of selection depends on events prior to the trial in question and does not in any way depend upon the data from that trial.
6.6.1 Stimulus Controlled Effects on the Mean
The most thoroughly studied sequential effects are those arising when the signal on the current trial is the same as or different from that on the preceding trial. These trials are referred to, respectively, as repetitions and alternations (in the case of two stimuli) or as nonrepetitions (in the case of more than two stimuli).
The discussion of sequential effects begins here for the two-stimulus, two-response situation, and I draw heavily upon the survey articles of Kirby (1980) and Kornblum (1973b). One notable omission from the Kirby review is the extensive set of experiments and their detailed analysis in Laming (1968); among other things, his sample sizes (averaged over the subjects) are appreciably larger than those of any of the other studies except Green et al. (1983), who report very large samples on a few subjects. Although we can clearly demonstrate the existence of such effects in the two-choice experiment and discover a number of their properties, many of the hypotheses that arise can only be tested in k-choice designs with k ≥ 3. So our discussion of sequential effects will resume in Section 10.3 when we turn to these more complex reaction-time experiments.
An illustration of the limitations of the k = 2 case may be useful. Suppose we have reason to believe that the magnitude of the sequential effects depends both upon the relative frequency with which the signals are presented, $P(s_n = a) = p$, where s _{n} is the signal presented on trial n, and upon the tendency for the signals to be repeated in the presentation schedule (a sequential structure imposed by the experimenter), which we make independent of the particular signal—that is, $P(s_n = a \mid s_{n-1} = a) = P(s_n = b \mid s_{n-1} = b) = P$. Since $P(s_n = a)$ is independent of n, it must satisfy the constraint
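The constraint referred to here follows from stationarity of the presentation schedule (a reconstruction from the stated conditions):

```latex
p = pP + (1 - p)(1 - P)
  \;\Longrightarrow\; 2p(1 - P) = 1 - P
  \;\Longrightarrow\; p = \tfrac{1}{2} \quad (P \neq 1),
```

so with two signals the presentation probability p and the repetition probability P cannot be varied independently in a stationary schedule, which is one reason such hypotheses can only be separated with more than two alternatives.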
Another problem in studying sequential effects is reduced sample sizes. Suppose we consider the purely random schedule, and we wish to partition the history of stimulus presentations back m trials. If the overall sample is of size N, then each history has an expected sample size of N/2^{m}, and so the standard error of the resulting mean estimate is 2^{m/2}σ/N^{1/2}. So, for example, if the true MRT is 400 msec with a standard deviation of 75 msec and the basic sample N is 2000, the standard error of the overall MRT for one signal is 2.37 msec, that of a two-step history is 3.35 msec, and that of a four-step one is 6.71 msec. Many of the differences in the data are under 10 msec, and so they must be viewed with some skepticism.
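The arithmetic behind these figures is a one-liner (a sketch reproducing the numbers quoted above):

```python
sigma, N = 75.0, 2000            # RT standard deviation (msec); total sample

for m in (1, 2, 4):              # history length; m = 1 splits by signal alone
    n_cell = N / 2 ** m          # expected trials per history cell
    se = sigma / n_cell ** 0.5   # equals 2**(m/2) * sigma / N**0.5
    print(f"m={m}: ~{n_cell:.0f} trials per cell, SE of MRT ~ {se:.2f} msec")
```

which reproduces 2.37, 3.35, and 6.71 msec.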
The first two studies in which the data were partitioned according to their history of stimuli were Laming (1968, Ch. 8) and Remington (1969). The latter experiment involved five subjects who responded by key presses to one of two lights. A warning light and a 1-sec foreperiod were used prior to the signal presentation. Subjects were asked to respond as rapidly as possible, consistent with an error rate of less than 5%. The actual overall level was about 1%. The average interstimulus interval was about 4 sec. (I begin to report this time because, as will soon be evident, it is an important independent variable for sequential effects.) There were two experimental conditions: one with equally likely presentations (50:50) and the other with a ratio of 70:30. Some data were rejected as not having achieved stability. What remained were 800 observations per subject in the 50:50 condition and 1000 in the 70:30 condition. These were partitioned into histories up to five trials back, and they are shown in Figures 6.21 and 6.22. Observe in Figure 6.21 the pronounced repetition effect. Note that in this figure the symbol A denotes the stimulus on the trial under consideration and B the other stimulus, and the past history is read from the current trial on the right and back to the left. Thus, a string of the other stimulus B before the current A presentation makes for a slow response, the time increasing as the
An experiment by Falmagne, Cohen, and Dwivedi (1975) in essence replicated these results of Remington, but in some ways is more striking. The span of times from the slowest to the fastest as a function of presentation pattern is some two to five times as large as in Remington's data. This may be because a fairly brief response-stimulus interval (200 msec) was used by Falmagne et al., as compared with Remington's average of four seconds. Since the data are not qualitatively different and they will be described in some detail relative to a sequential model (Section 7.6.4), I do not present them here.
As was mentioned earlier, Laming's experiments involved the identification of two white bars on a black background presented tachistoscopically. His Experiment 3, which was run without automation, had a 2500-msec RSI. Twenty-four subjects participated in five series of 200 trials
Laming also found that repetitions decreased both the MRT and the errors made. In contrast to Remington, the MRT to an alternation following a string was largely independent of the length of the string, but the error rate increased monotonically with the length.
His Experiment 5 was automated, which permitted use of the intertrial interval as an experimental variable. The same general experiment was run with intertrial intervals of 1, 8, 64, 512, and 4096 msec, arranged in a Latin square design. Each of 25 subjects was run for 100 trials at each ITI, yielding 12,500 observations. The MRT data for selected histories are shown in Figure 6.23 and the corresponding error data in Figure 6.24. The code is read from left to right in time up to the trial before the signal in question. A 0 means that the signal in that position (trial) was the same as the one for which the data are being plotted, and a 1 means that it was the other signal. Thus, for example, 0100 arises from either of the sequences abaaa or babbb on trials n − 4, n − 3, n − 2, n − 1, and n. It is quite clear that the shorter the ITI, the slower the response and the more likely an error, especially after particular histories. Laming (1968, p. 109) summarized the results as follows:
The sequential analyses of Experiments 1, 2, and 3 suggested that if the subject experiences a run of one signal or an alternating sequence, he expects those patterns to continue. On this basis the subjective expectation of … the signal actually presented … increases from left to right [in the left panels of Figures 6.23 and 6.24] and from right to left [in the right panels]. When the intertrial interval is long the mean reaction times and proportions of errors behave as one would expect; they both decrease as the subjective expectation of [the signal] increases. But when the intertrial interval is short they behave differently: after a run of either kind of signal the mean reaction times and proportion of errors all decrease, while after an alternating sequence they increase greatly, irrespective of which signal might have been most expected.
Later Kirby (1976b), apparently unaware of Laming's work, ran a study in which he manipulated the interval between a response and the next presentation, the RSI. His data presentation was the same as Remington's except that he separated it into first and second halves. These are shown in Figure (p.261) 6.25. The most striking fact can be seen in the first-order sequential effects—namely, that a repetition of a signal, AA in his notation, speeds up the MRT for the 50-msec RSI, but slows it down for the 500- and 2000-msec RSIs. It should be noted, however, that at all RSIs the effect of additional repetitions is to speed the MRT slightly beyond the one-step repetition; however, for any length of history at the two longer RSIs, the fastest time arises with pure alternation.
Certain inconsistencies exist between Laming and Kirby's data, and one wonders what might account for them. Two notable differences in the studies are the sample sizes (12,500 versus 3,600) and the fact that Laming included all responses whereas Kirby discarded error responses (less than 4.5%). The first means that for the smaller sample size, it is entirely possible that more of the orderings are inverted due to sampling variability, and so not all differences can be assumed to be real. The second raises the possibility that the experiments were run at different speed-accuracy tradeoffs and that this affects the sequential pattern in some way. Another fact, much emphasized by Green et al. (1983) as probably contributing to instability in all of these experiments, is the use of many, relatively inexperienced subjects for relatively few trials each. In their work, which I discuss more fully in Section 8.4, only three subjects were used, but each was practiced for at least 3500 trials and was run for 21,600 trials. There was evidence of changes in MRT during at least the first 7200 trials of the experiment. Another factor they emphasize is their use of random (exponential) foreperiods, in contrast to most other studies that use constant foreperiods (i.e., RSI). No matter which, if any or all, of these factors are relevant to the different results, the fact is that considerable uncertainty obtains about the basic facts.
That the RSI affects in some manner the significance of alternations and repetitions had been noted earlier.
The point at which this change from a repetition to an alternation effect takes place appears to be approximately half a second. Thus, repetition effects have been found for RSIs of less than approximately half a second by Bertelson (1961, 1963), Bertelson and Renkin (1966), Hale (1967), Kornblum (1967), Hale (1969a), Eichelman (1970), Kirby (1976b) and alternation effects with intervals greater than half a second by Williams (1966), Hale (1967), Moss, Engel, and Faberman (1967), Kirby (1972, 1976b). At intervals of, or close to, half a second, repetition, alternation, and nonsignificant sequential effects have been reported (e.g., Bertelson, 1961; Hale, 1967; Schvaneveldt and Chase, 1969; Eichelman, 1970; Kirby, 1976b). (Kirby, 1980, p. 132)
And, of course, we know from Section 5.4 that a significant interaction exists even in simple reaction times when two signals are separated by less than 300 msec. So far as I know, no attempt has been made to use those ideas in the study of sequential effects in choice paradigms or to place both sets of phenomena in a common framework.
(p.262) Kirby went on to point out that there are at least four anomalous studies in which a repetition effect was in evidence at long RSI: Bertelson and Renkin (1966), Entus and Bindra (1970), Remington (1969), and Hannes (1968). To this list, we must add Laming's (1968) data. As I just remarked, we do not know the source of these differences, but it is interesting that in a second experiment Kirby (1976) was able to produce either repetition or alternation effects at long RSI by instructing the subjects to attend to repetitions or alternations; whereas, at short RSI the instructions had little effect, with a repetition effect occurring under both instructions.
6.6.2 Facilitation and Expectancy
The data make clear that at least the previous signal and probably a considerable string of previous signals affect the MRT. Moreover, the nature of that effect differs depending upon the speed with which new stimuli occur. It seems likely, therefore, that two mechanisms are operative: one having a brief temporal span and the other a much longer one. In terms of the types of mechanisms we have talked about, I suspect the brief one is part of the sensory mechanism and the longer lasting one involves memory phenomena that last some seconds and are a part of the decision process. In this literature, the relevant sensory mechanism is spoken of as “automatic facilitation” (Kirby, 1976; also “intertrial phenomenon” by Bertelson, 1961, 1963, and “automatic after effect” by Vervaeck and Boer, 1980) and the decision one as a “strategy” (or “subjective expectancy” or “anticipation”). And, as we shall see below, there is reason to suspect that there may be at least two distinct strategy mechanisms involved.
The facilitation mechanism is thought to be of one of two types. The first is that the signal leaves some sort of sensory trace that decays to a negligible level in about 750 msec. Sperling (1960) used masking techniques to provide evidence for such a trace. When a second signal occurs in less time than that, the traces are “superimposed.” If the two signals differ, there is little gain and perhaps some interference in the superimposed representation. If they are the same, however, the residual trace somehow facilitates the next presentation of the signal. This could involve either some sort of direct addition of the old to the new representation, making it stronger than it would otherwise be and thereby allowing the identification to proceed more rapidly than usual, or it could entail some sort of priming of a signal coding mechanism, for example, by influencing the order in which certain features are examined. In either event, a repetition effect results. The facts that MRTs in a choice situation run from 300 to 400 msec with signals of 100-msec duration and that the repetition effect gives way to an alternation one at about an RSI of 500 msec suggest that the trace has largely disappeared after 700 to 800 msec.
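One way to make the trace hypothesis concrete is as an exponentially decaying facilitation of the repeated signal. The parameter values below are purely illustrative, chosen only so that the saving has largely vanished by 700 to 800 msec:

```python
import math

def repetition_facilitation(rsi_ms: float, gain_ms: float = 40.0,
                            tau_ms: float = 250.0) -> float:
    """Hypothetical MRT saving (msec) for a repeated signal.

    Models the residual sensory trace as decaying exponentially with
    time constant tau_ms.  Both parameters are invented for
    illustration, not estimated from any data set.
    """
    return gain_ms * math.exp(-rsi_ms / tau_ms)

# Facilitation at a few RSIs: large at 50 msec, negligible by 2000.
for rsi in (50, 500, 750, 2000):
    print(rsi, round(repetition_facilitation(rsi), 2))
```

On this account the crossover to an alternation advantage near an RSI of 500 msec would reflect a slower-developing expectancy process overtaking a trace that is already mostly gone.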
The other facilitation mechanism is somewhat less clearly formulated. It supposes that the effect of a repeated signal is to bypass some of the signal (p.263) processing that is normally involved. Since exactly what is bypassed is unclear, it is not evident how to distinguish facilitation from trace strengthening.
If a sensory-perceptual facilitation mechanism were the whole story, then since it is entirely driven by the stimulus schedule we should not see any impact of previous experience or instructions on it. Kirby (1976) tested this idea in his third experiment, again using lights as signals. There were six conditions, half of which were at an RSI of 1 msec and the others at 2000 msec. Within each condition, the last third of the trials were run at a 50:50 ratio of repetitions and alternations. The first two-thirds were run at three ratios: 70:30, 50:50, and 30:70. He found that for the longer RSI, a repetition effect in the 70:30 condition and an alternation effect in the 30:70 condition persisted into the last third of the data. For the short RSI, the effects did not persist. These data are consistent with the idea that for short RSIs the sequential effects are stimulus determined, but for the long ones something other than the stimulus pattern affects the behavior.
So one considers possible decision strategies. The fact that faster responses occur with alternations suggests that the subjects in some sense anticipate the occurrence of alternations. This is reminiscent of the famed negative recency effect uncovered in probability learning experiments (Jarvik, 1951). It appears that many people have a powerful tendency to act as if a local law of averages exists—that is, as if there were a force altering conditional probabilities in a random sequence—so as to keep the local proportions very close to 50:50. Such a rule or belief would lead to an exaggerated expectation for alternations.
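A quick simulation makes the point that no such local law of averages exists: a predictor that always bets on an alternation gains nothing on a truly random 50:50 sequence. The sketch below (function and parameters mine) scores that strategy:

```python
import random

def gamblers_fallacy_accuracy(n_trials: int = 100_000, seed: int = 1) -> float:
    """Predict each signal by betting against the previous one (the
    exaggerated expectation for alternations described in the text)
    and score the predictions on an independent 50:50 sequence."""
    rng = random.Random(seed)
    prev = rng.choice("ab")
    hits = 0
    for _ in range(n_trials):
        prediction = "b" if prev == "a" else "a"  # expect an alternation
        signal = rng.choice("ab")
        hits += (prediction == signal)
        prev = signal
    return hits / n_trials

print(gamblers_fallacy_accuracy())  # hovers around 0.5
```

The strategy hits at chance, so any alternation advantage in the data must come from the subject's belief, not from any structure in the stimulus schedule.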
This idea that the subject is predicting, albeit incorrectly, which signal will be presented has led to a series of attempts to have the subject make the predictions overt and to see how MRT depends on the prediction. The major idea is that if we single out successive pairs of correct predictions, then the sequential effects on those trials should vanish. This literature was discussed in detail by Kirby (1980, pp. 143–144), who concluded that there is just too much evidence that the act of predicting directly affects the response times, and so overt prediction fails to be a useful experimental strategy. The relevant papers are DeKlerk and Eerland (1973), Geller (1975), Geller and Pitz (1970), Geller, Whitman, Wrenn, and Shipley (1971), Hacker and Hinrichs (1974), Hale (1967), Hinrichs and Craft (1971a), Schvaneveldt and Chase (1969), Whitman and Geller (1971a, b; 1972), and Williams (1966).
Kirby (1980, pp. 145–148) examined the evidence as to whether the strategy effect is, as he appears to believe, one of preparation and/or expectancy or whether the strategy develops after the signal is presented. Although the argument is protracted, it appears to me that its main thread goes as follows. If the strategy comes into play after signal onset, then shortening the RSI should only accentuate it since it reduces the time for memory to decay. Any effect to the contrary must be due to sensory (p.264) facilitation, which as we have seen comes into play with RSIs under half a second. On the other hand, if the strategy is set prior to signal presentation, the shorter the RSI the less time it has to be developed, and so at the shortest times its effect should be negligible. And Kirby contends that the data are more consistent with the latter view. For example, the fact that some well-practiced subjects develop the ability to produce alternation or repetition effects at will, even at short RSIs, he interprets as their becoming more efficient at preparing their strategies. However, these data could just as well be interpreted in terms of a shift from a preparation strategy, which at short RSI does not have time to be effective, to a reactive one that is established after the signal onset. I find the arguments unpersuasive, and I believe that the basic nature of these strategy effects is still an open question. I shall, nonetheless, explore some rather detailed, specific suggestions about them in Section 6.6.5.
Vervaeck and Boer (1980) made some important observations in this connection. First, they noted that an expectancy hypothesis predicts not only that the expected signal will be responded to faster than when no expectancy is involved, but that the unexpected one will be responded to more slowly. In contrast, a general facilitation mechanism of any sort predicts that an increase of the facilitation factor leads to faster responses and a decrease to slower responses independent of which signal is presented. At short RSI, Kirby (1976) reported sequential effects that differed according to whether the signal was a repetition or an alternation, whereas Laming (1968, Experiment 5) obtained facilitation that did not depend upon the signal. Vervaeck and Boer pointed out a significant procedural difference: Laming measured his RSI from the onset of the response-key press, and Kirby measured it from its offset. They judged from other data that the typical duration of a key press was from 100 to 200 msec. They repeated both experiments under the same conditions, found a difference of 106 msec between the two observed RSIs, and replicated both Laming's and Kirby's results.
6.6.3 Stimulus-Response Controlled Sequential Effects
Our discussion of sequential effects to this point should seem a bit odd since nothing has been said about the responses except that their MRT is used as the dependent variable. What about previous responses as a source of sequential effects? To my knowledge, this has never been examined with anything like the care that has gone into stimulus effects. In particular, I do not know of any two-choice studies, analogous to those reported in Figures 6.20–6.24, that partition the data according to the past history of responses or, better, the joint past history of signals and responses. Laming (1968, Section 8.4) reports some linear regression results, but no very clear pattern is evident. There are, however, several studies in which the joint past history for one trial back is examined, and I discuss them here. Concerning the sequential effects exhibited jointly by response probability and time, there is (p.265) but one study, which is reported in the next section. Within the general psychophysical literature there is a fair amount of data about sequential effects in absolute identification and magnitude estimation, but with many more than two signals. The only data for two-stimulus designs of which I am aware are concerned primarily with the impact of an error on the next trial. Some of these data, particularly those in Rabbitt (1966), are based upon more than two signals, but they are too relevant to postpone until Chapter 10.
In his studies, Rabbitt has used a short RSI, often 20 msec and never more than 220 msec. The task was the identification of one of several lights appearing in different positions. His data showed: that errors are faster than correct responses (Section 6.4.3), that the MRT on the trial preceding an error did not differ significantly from the overall MRT, and that the MRT following an error is slower than the overall MRT. The fact that the MRT before an error is not unusually fast suggests that the speed of the error trials is not part of some overall waxing and waning of the reaction time. Rabbitt and Rogers (1977) and Rabbitt (1969), using Arabic numerals as signals and key presses as responses, showed that the delay following an error was considerably greater when the alternative signal was used (and so to be correct in the two-choice situation the response was a repetition of the previously erroneous one) than when the signal was repeated.
Laming (1979b), in discussing Rabbitt's work, cited the data from Experiment 5 of Laming (1968) (described earlier) in which he varied the RSI and found appreciable changes in performance. Recall, at long RSI the probability of an error following an error is sharply reduced below the average error rate independent of whether the signal was repeated or alternated; whereas, at short RSI the error probability is reduced for a repeated signal and greatly increased when the signal is alternated. For Experiments 1, 2, and 3—where the first two were slight variants in which the presentation probability was varied and in 3 the error rate was varied by instructions—the RSI was long: 2500 msec, 1500 msec, and 2500 msec. Because of the pronounced effects that the previous stimulus history is known to have (Section 6.6.1), Laming corrected for it both in his error probabilities and MRTs using a multiple regression analysis described in Appendix C of Laming (1968). The data are broken up according to whether the error stimulus is repeated or alternated, and so the erroneous response is either alternated or repeated in order to be correct. The data are shown in Figure 6.26 as a function of the number of trials since the preceding error. Error probabilities are presented directly whereas times are presented in terms of deviations from the overall MRT for that stimulus history. We see, first, that error trials are faster by about 50 msec than the overall mean times. Second, immediately following an error the time is slower than average, which is true whether the stimulus is repeated or alternated. The effect is somewhat larger for alternations than repetitions. Third, following an error, the error rate for both types of trials is reduced below average (except for the alternative (p.266)
Exp   No. of Conditions   Ss/Cond   Obs/S
 1            4               6       1000
 2            4               6       1000
 3            8               3       1000
 4           10               2        800
 5            5               5       1000

[Figure 1 of Laming (1979b); copyright 1979; reprinted by permission.]
6.6.4 Are the Sequential Effects a Speed-Accuracy Tradeoff?
The answer to the question of the title is not obvious from what has been presented. It could be Yes, but equally well it could be No in that the SATF itself is changed as a result of the previous history. Obviously, the answer to the question may very well depend upon the RSI.
Swensson (1972b) performed the following relevant experiment. There were five distinct but unmarked horizontal locations on a cathode ray tube, (p.267) and a square with one of the two diagonals was shown successively in these locations from left to right, with the choice of the diagonal made at random for each location. The subjects responded by key presses to identify the direction of the diagonal at each presentation. At the end of each sequence of five stimuli, information was fed back on the number of correct responses and the sum of the five response times. Two subjects were run. One major manipulation was the time between each response and the presentation of the next signal in the sequence, the RSI. One at 0 msec was called the immediate serial task (IS) and the other at 1000 msec was called delayed serial (DS). Another manipulation was to shift the emphasis between speed and accuracy between blocks of 50 or 100 trials. In addition, for subject RG an explicit monetary payoff was used to effect the speed-accuracy tradeoff. The total number of trials (each a sequence of five stimuli) run in each condition for each subject varied from 1500 to 3000. The data were pooled over blocks of trials having comparable error rates into four groups. The SATF presented is log odds versus MRT.
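For readers unfamiliar with the transformation, the accuracy axis is simply ln[p/(1 − p)]. The (MRT, proportion-correct) pairs below are invented solely to show the computation, not taken from Swensson's data:

```python
import math

def log_odds(p_correct: float) -> float:
    """Log odds of a correct response, the accuracy measure Swensson
    plotted against MRT to form his speed-accuracy tradeoff function."""
    return math.log(p_correct / (1.0 - p_correct))

# Hypothetical pooled points (MRT in msec, proportion correct):
for mrt, p in [(280, 0.75), (330, 0.90), (400, 0.97)]:
    print(mrt, round(log_odds(p), 2))
```

The transformation spreads out accuracies near ceiling, where proportion correct itself is nearly flat, which is why it is a common choice of ordinate for SATFs.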
Because each trial comprised five presentations, the first question to be raised is whether serial-position effects are in evidence. Figure 6.27 shows that a considerable serial-position effect exists for the IS condition and is very slight for the DS condition. This is not surprising if memory traces dominate studies with brief RSIs.
To study the sequential effects, the data from positions 2 through 5 were combined and then partitioned into four categories according to whether the response (not the signal) in question is a repetition or alternation (coded in
Once again it is clear that RSI makes an important difference. These data seem to accord with earlier evidence that with short RSI there is some tendency for the subject to attempt to correct the error just made, which leads to an unusually fast and, relative to the next signal, largely random response (Burns, 1971; Rabbitt, 1966, 1967, 1968a, 1968b, 1969). For the long RSI this tendency disappears and to a first approximation the sequential effects appear to be not changes in the SATF, but shifts in the tradeoff on that function.
(p.269) 6.6.5 Two Types of Decision Strategy
Judging by Swensson's data, the most obvious idea to account for the impact of an error with a long RSI is as some sort of adjustment on the SATF, the subject becoming more conservative and so more accurate at the expense of being slower. The difficulty with this view is that it predicts that changes in MRT and error probability should covary, even though we know from Figure 6.26 that during the recovery phase following an error they do not.
Laming (1968, pp. 80–82) suggested another possibility. This arose in his discussion of the important classical random walk model (Sections 8.2, 8.3.1, and 8.4.2) which model has as one of its predictions that the distributions of error and of correct responses (same response, different signals) should be identical. As this is contrary to the data, Laming asked if some plausible mechanism would account for the fact that errors are usually faster than correct responses. He suggested that when the subject is under time pressure he or she may err as to the actual onset of the signal. Laming described this tendency as arising from time estimation, but it is just as plausible that the subject adjusts the detection criterion to the point where on a significant
Laming made the interesting observation that the differential recovery of MRT and error probability following an error can, in fact, be accounted for by just the anticipation mechanism grafted onto a standard decision model (the SPRT model of Section 8.3.1). He showed that if the subject starts to accumulate information well in advance of signal presentation, the error rate is substantially increased from what it would have been had accumulation begun exactly at signal presentation. That result is not surprising, but what is surprising is the fact that E(T) is reduced by only a few milliseconds. This arises because the information accumulated prior to the signal does not itself tend to lead to a decision, but rather introduces variability in the starting point of information accumulation at signal onset. He assumed that following an error, the subject becomes really very conservative and begins accumulating information well after signal onset. This tendency both greatly reduces the error probability (on the assumption that signal information is continuing to be available) and delays E(T) by the amount of the delay after signal onset plus the amount of anticipation that was in effect at the time of the error. If on the next trial the time at which accumulation is initiated is moved forward, then E(T) is moved forward by the same amount, but the error probability does not change. In fact, the initiation time can shorten until it coincides with signal onset before the error probability begins to rise to its original value. Clearly, this single mechanism is sufficient qualitatively to account for the apparently separate recovery of MRT and error probability.
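Laming's argument lends itself to a small simulation. The sketch below is mine, not his SPRT development: a discrete random walk between barriers ±10, stepping toward the correct barrier with probability .6 after signal onset and behaving as pure noise before it; all parameter values are illustrative.

```python
import random

# Barriers at +/-BARRIER; steps move toward the correct barrier with
# probability P_STEP after signal onset, and are noise (p = .5) before.
P_STEP, BARRIER = 0.6, 10

def trial(rng, pre_steps=0, delay_steps=0):
    """Return (error, time), with time counted in steps from signal onset.

    pre_steps  : noise steps taken before onset (anticipation); a walk
                 absorbed before onset is restarted, following Laming's
                 assumption that pre-signal information does not itself
                 produce a decision.
    delay_steps: accumulation begins this long after onset (the
                 conservative post-error strategy).
    """
    while True:
        x, absorbed = 0, False
        for _ in range(pre_steps):
            x += 1 if rng.random() < 0.5 else -1
            if abs(x) >= BARRIER:
                absorbed = True
                break
        if not absorbed:
            break
    t = delay_steps
    while abs(x) < BARRIER:
        x += 1 if rng.random() < P_STEP else -1
        t += 1
    return x <= -BARRIER, t

def run(n, **kw):
    rng = random.Random(7)
    errs = times = 0
    for _ in range(n):
        e, t = trial(rng, **kw)
        errs += e
        times += t
    return errs / n, times / n

p0, t0 = run(20_000)                  # accumulation starts at onset
pa, ta = run(20_000, pre_steps=25)    # anticipatory start
pd, td = run(20_000, delay_steps=15)  # conservative post-error start
print(p0, t0, pa, ta, pd, td)
```

Pre-signal accumulation sharply raises the error probability while leaving the mean decision time, counted from onset, nearly unchanged; a delayed start leaves the error probability untouched and simply adds the delay to the mean time. That is qualitatively the separate-recovery pattern described above.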
This model illustrates nicely Weatherburn's point that there may well be more than one source of tradeoff between speed and accuracy. The time to begin accumulating information establishes one tradeoff, and the criterion for responding on the basis of the information accumulated establishes a second one. At the present time, we do not know of any reliable way experimentally to evoke just one of them.
Some authors in discussing the sequential data have spoken of selective preparation for one response rather than the other (Bertelson, 1961; Falmagne, 1965), and others of nonselective preparation (Alegria, 1975; Bertelson, 1967; Granjon & Reynard, 1977). It is difficult to know for sure what is meant by these concepts, but one possibility is the two sorts of mechanisms just discussed, with the criterion for signal detection being nonselective.
(p.271) One final point. Laming (1969b) developed an argument, based on his 1968 data, that the same theoretical ideas that account for the sequential patterns may also underlie the so-called signal-probability effect. This effect is the fact that the more probable signal is responded to more rapidly and more accurately than the less probable one. It is clear that the more probable signal will, in general, be preceded by a longer run of repetitions than the less probable one. Thus to the extent that repetitions lead to both speed and accuracy, the effect follows. He worked out this idea in some detail for the SPRT model. We will return to the relation between sequential effects and presentation probability in Section 10.3.1.
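The run-length claim is easy to verify by simulation. The sketch below (details mine) presents signals by independent Bernoulli draws and averages the run of repetitions immediately preceding each presentation of each signal:

```python
import random

def mean_preceding_run(p_a: float, n: int = 200_000, seed: int = 3):
    """For an independent Bernoulli presentation schedule, return the
    mean length of the run of repetitions immediately preceding each
    presentation of signal a and of signal b."""
    rng = random.Random(seed)
    seq = ["a" if rng.random() < p_a else "b" for _ in range(n)]
    totals = {"a": 0, "b": 0}
    counts = {"a": 0, "b": 0}
    for i in range(1, n):
        s = seq[i]
        run = 0
        j = i - 1
        while j >= 0 and seq[j] == s:  # count preceding repetitions
            run += 1
            j -= 1
        totals[s] += run
        counts[s] += 1
    return totals["a"] / counts["a"], totals["b"] / counts["b"]

run_a, run_b = mean_preceding_run(0.7)
print(run_a, run_b)
```

With P(a) = .7 the mean preceding run is about p/(1 − p) for each signal, roughly 2.3 repetitions before signal a but only about 0.4 before signal b, so repetition benefits accrue mainly to the more probable signal.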
6.7 CONCLUSIONS
Comparing choice-reaction times with simple-reaction times leaves little doubt that somewhat more is going on. It usually takes 100 msec or more longer to respond to the identity of a signal than to its presence. Many believe that most of the additional time has to do with the processing of information required to distinguish among the possible signals. Others believe that some and perhaps much of the time is required to select among the possible responses. This point of view is argued by Sternberg (1969a). To some degree, this distinction can be examined by decoupling the number of responses from the number of signals; some of those studies are taken up in Chapters 11 and 12.
It is also clear from data that subjects effect important compromises or tradeoffs. Perhaps the best known of these is between the two types of error, which is represented by the ROC curves and has become a standard feature not only of modern psychophysics but of much of cognitive psychology. Some models assume that the mind is able to partition the internal evidence about stimuli into categories corresponding to the responses. Once the experimenter records response times as well as choices, a number of questions arise about how the response times relate to the responses made. One controversial question has been how the time depends upon whether the response is correct or in error. The result tends to be this: for highly discriminable signals responded to under considerable time pressure, errors are faster than the corresponding correct responses, but for signals that are difficult to discriminate and with pressure for accurate responding, the opposite is the case. A few studies fail to conform to this pattern, and there is little work on time pressure coupled with difficult-to-discriminate signals.
Another major tradeoff is between speed and accuracy, which is thought to arise when the subject varies the amount of information to be accumulated and processed prior to a response. There is at least the possibility that this tradeoff is a strategy distinct from another tradeoff—namely, the selection of a criterion to determine the actual response to be made once the information (p.272) is in hand. In the fast-guess and fixed stopping-rule models they were distinct; in some of the other models studied in later chapters they are not.
The last body of data examined in the chapter concerned sequential effects. Here we were led to believe that there may be at least three distinct mechanisms at work, the last of which is probably a manifestation of the speed-accuracy tradeoff just discussed. The first is the possibility of direct sensory interaction between successive signal presentations, which some authors have suggested arises from the superposition of sensory traces when a signal is repeated sufficiently rapidly. The evidence for this came from the different effects that occur as the time from a response to the next signal presentation is varied. But even after eliminating this sensory effect by using long RSIs, there is still a rather complex pattern of sequential effects following an error. After an error both the MRT slows and accuracy increases, but on subsequent trials the former returns rapidly—in some cases in one trial—to its overall mean value whereas the error rate increases only slowly over a number of trials. Some have interpreted this as evidence for both a type of nonselective preparation that is not long sustained and a selective one that is. Within the information accrual framework, Laming has suggested that the nonselective one is some mechanism—perhaps time estimation, perhaps a signal detector that is causing anticipations—that often initiates the accrual process prior to the actual signal onset. The other mechanism appears to be some sort of speed-accuracy tradeoff for which there are a number of models (Chapters 7 to 10).
Most theories attempt to provide accounts of the ROC and SATF. Relatively little has been done to account in detail for the sequential effects, in part because it leads to messy mathematics and in part because the phenomena to be explained remain somewhat uncertain. There are a few attempts to wed stochastic learning processes to otherwise static models. Laming has coupled a time estimation model with the random walk one, which was originally designed only to provide an account of the ROC and SATF. I am not aware of any models for the sensory effect found with brief RSIs.
I have elected to partition this theoretical work into three broad classes. In Chapter 7 the models assume that the subject can opt, as in the fast-guess model, to be in one of several states, each of which has its characteristic time and error pattern. Chapters 8 and 9 explore information accrual models for two-signal experiments. They differ only in whether we assume that time is quantized independently of the information accrual process or by that process itself. And Chapter 10 deals with identification data and models when there are more than two signals, a topic far less fully developed than the two-signal case.
Notes:
(*) Perhaps the earliest relevant study is Garrett (1922) in which he said (p. 6), “Everyday knowledge seems to indicate that, in general, accuracy diminishes as speed increases, but there is little detailed information beyond the bare statement.” He then went on to study how accuracy is affected by stimulus exposure time, but he did not directly manipulate the overall response time, as such, and so it was not really an example of a SATF.
(*) I am particularly indebted to D. R. J. Laming for severe, and accurate, criticism of an earlier version of this section. I suspect that he will view my changes as inadequate, especially since his primary recommendation was that I drop the section entirely, in part at least, on the grounds that too little consensus exists about the empirical facts for the material to be of any real use to model builders. My judgment is that, in spite of its complexity and inconsistencies, this literature is simply too important to ignore.