R. Duncan Luce

Print publication date: 1991

Print ISBN-13: 9780195070019

Published to Oxford Scholarship Online: January 2008

DOI: 10.1093/acprof:oso/9780195070019.001.0001



Two-Choice Reaction Times: Basic Ideas and Data

Chapter:
(p.205) 6 Two-Choice Reaction Times: Basic Ideas and Data
Source:
Response Times
Publisher:
Oxford University Press
DOI:10.1093/acprof:oso/9780195070019.003.0006

Abstract and Keywords

This chapter begins with a discussion of choice-reaction times and simple-reaction times. It then discusses a conceptual scheme for tradeoffs, discriminability and accuracy, speed-accuracy tradeoff, and sequential effects.

6.1 GENERAL CONSIDERATIONS

6.1.1 Experimental Design

The simplest choice-reaction time experiment has much in common with the simple-reaction time design. The major difference is that on each trial one of several signals is presented that the subject attempts to identify as rapidly as is consistent with some level of accuracy. This attempted identification is indicated by making one of several responses that correspond systematically to the stimuli. In terms of the driving example, not only must dangerous obstacles be detected, but they must be identified and appropriate responses—braking, swerving, or accelerating—be made, which one depending upon the exact identity, location, and movement of the obstacle.

Formally, we shall suppose that the signal is one of a fixed set of possible signals, s_1, …, s_k, and the corresponding responses are r_1, …, r_k, so that the subject responds r_i if he or she believes s_i to have been presented. One may view the selection of a signal by the experimenter as a random variable, which we denote as s_n on trial n, and the response is another random variable r_n. When there are only two signals and two responses, as will be true throughout this and the following three chapters, I shall avoid the subscripts and, depending upon the context, use either of two notations. In most two-choice situations, I use {a, b} and {A, B} for the names of the stimuli and corresponding responses; the generic symbols are s and r, s = a or b, r = A or B. Occasionally when the experiment was clearly a Yes-No detection one in which the subject responded Yes if a signal was detected and No otherwise, I will denote the presentation set by {s, n}, where n stands for no signal or noise alone, and the response set by {Y, N}.

Many aspects of the design do not differ from the simple case. There is always a foreperiod—either the time from the preceding response, in which case it is called the response-stimulus interval, RSI, or from a warning signal—and it may be variable or constant. Catch trials can be used (e.g., Alegria, 1978, and Ollman, 1970), although it is not common to do so. Payoffs may be based upon response time as well as on its accuracy. But more interesting are the new possibilities that arise from the increased complexity of the design. I list five:

1. Because a discriminative response is being made, it is more plausible to use a fixed foreperiod design than in the simple reaction case; indeed, it is entirely possible to have a well-marked time interval during which the signal is presented (this is often used in Yes-No and forced-choice designs). Basically, the argument is that if a discriminative response is made, the subject must wait for the signal to occur and so there is no need to worry about anticipations. This argument is compelling so long as the signals are perfectly discriminable and no response errors occur; but the minute some inaccuracy in the performance is present, then the claim becomes suspect. For a discussion of this and the effective use of a random foreperiod in a choice experiment to suppress anticipatory responses, see Green, Smith, and von Gierke (1983). And as we shall see in Section 6.6.5, considerable evidence exists that anticipations may be a problem in many choice-reaction time experiments.

2. Since the experimenter has total control over the sequence of signals presented, it is possible to use that schedule in an attempt to elicit information about what the subject is doing. The most commonly used schedule is a purely random one, but in some studies the stimuli are not presented equally often, and in a few, sequential dependencies are built into the schedule in order to affect locally the decision strategies employed by the subjects.

3. In addition to payoffs based upon the observed reaction times, there is clearly the possibility of introducing information feedback and payoffs based upon the accuracy of the responses. By varying the values of the payoffs for both response time and accuracy, the experimenter sets up different monetary conflicts between accuracy and speed. If people are able to alter their strategies to take advantage of this tradeoff—and they are—then we may exploit this technique to gain some information about what those strategies are.

4. The relation between the signals can be manipulated over experimental runs in order to study how that variable affects behavior. For example, one can use two pure tones of frequencies f and f + Δf, where the separation Δf is manipulated. Clearly, the dependence of reaction-time performance on Δf may very well vary with the value of f used and almost surely will depend upon the intensity of the two tones, which can vary from threshold to very loud.

5. As was true for simple reactions, there are many options for the signals—indeed, there are all the possible differences signals may exhibit beyond simple presence and absence. And there are many options for the response, although the most common continues to be finger presses of highly sensitive keys. For an empirical study of the use of different fingers and hands, see Heuer (1981a, b). What is new, and complicating, is the multiple ways in which the possible responses can be related to the possible signals. In most studies, aside from those that undertake to study the impact of different mappings between responses and signals, experimenters attempt to select a mapping that on intuitive grounds is as natural, compatible, and symmetric as possible. Of course, one can study directly the impact of stimulus-response compatibility on response time, and we will do so in Section 10.2.3. In the two-choice situation, three closely related designs are the most common. In each there are two highly sensitive keys, and either the two forefingers are used, each being held over a key, or the subject's preferred forefinger rests on a well-defined spot between the two keys and is moved to the appropriate key, or the forefinger and middle finger of the preferred hand are placed over the two keys.

6.1.2 Response Measures

In any choice experiment there are at least two dependent measures—namely, the choices made and the times they take. One may, in addition, ask of the subject other things such as the confidence felt about the judgment. I shall focus primarily on the first two measures—to my mind the most natural ones. It will prove convenient to partition the story into a number of subpieces. In Sections 6.2 and 6.4 I discuss matters that do not entail much, if any, interaction between the two measures or between the measures on successive trials. In Section 6.5 the focus turns to the interaction between response times and errors—the so-called speed-accuracy tradeoff—that arises when the times are manipulated. Section 6.6 examines interactions of responses and their associated times with the events on preceding trials—sequential effects in the data. The following three chapters discuss a number of models that have been proposed to account for some of these as well as other phenomena. After that, in Chapter 10, attention turns to response times when more than two signals are to be identified.

There is, of course, some option in the exact measures of accuracy and time to be collected and reported. For the choices there is little debate (perhaps there should be) about what data to collect: one postulates the existence of a conditional probability, P(r | s), of response r being made when signal s is presented. Of course, we must be sensitive to what aside from the current signal may affect this probability. For example, it may well depend upon some previous signals and responses. Much of the literature tacitly assumes otherwise and, ignoring any dependencies that may exist, relative frequencies of responding are used to estimate these simple conditional probabilities. Some of the data in Section 6.6 should lead to some concern about this practice.

Suppose the signals a, b and the responses A, B are in natural 1 : 1 correspondence; then there are just two independent conditional probabilities since

  P(A | a) + P(B | a) = 1,   P(A | b) + P(B | b) = 1.

We usually elect to use P(A | a) and P(A | b). The study of the behavior of these probabilities is the province of much of traditional psychophysics, a topic that is alive and well today in part because of the theory of signal detectability. This literature offers several possible summary measures of performance accuracy, the most famous of which is d′. We will go into this question more thoroughly in Sections 6.4 and 6.5.3.
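As a concrete sketch of these two quantities, the snippet below estimates P(A | a) and P(A | b) by relative frequencies and computes the standard equal-variance Gaussian index d′ = z[P(A | a)] − z[P(A | b)]. The response counts are hypothetical, invented for illustration; only the formulas are standard.

```python
from statistics import NormalDist

# Hypothetical counts (not data from the text): response A was made on
# 88 of 100 presentations of a, and on 21 of 100 presentations of b.
z = NormalDist().inv_cdf              # inverse standard-normal CDF

p_A_given_a = 88 / 100                # estimate of P(A | a): "hits"
p_A_given_b = 21 / 100                # estimate of P(A | b): "false alarms"

# Equal-variance Gaussian sensitivity index of signal-detection theory
d_prime = z(p_A_given_a) - z(p_A_given_b)
print(f"d' = {d_prime:.2f}")          # about 1.98 for these counts
```

Note that, by the constraint above, P(B | a) and P(B | b) add nothing: they are determined by the two estimates already in hand.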

For reaction times, a distribution of times can be developed for each of the four signal-response pairs and, unlike the choice probabilities, these are not logically constrained to be related in any particular way. Of course, these distributions may also depend upon something else, such as preceding signals and responses. Again, much of the literature implicitly assumes no such dependence; but as we shall see in Section 6.6, that assumption is highly optimistic. As with simple reaction times, the distributions can be presented in a variety of ways, but in practice little has been published about actual distributions or their hazard functions. For the most part, only mean reaction times have been reported, although in a few recent studies some other statistics, usually the variance, skew, and kurtosis (see Sections 1.3.4 and 1.4.5), have also been provided.
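The statistics just mentioned are easy to compute from a sample of times. In the sketch below the "reaction times" are simulated—a 150-msec constant plus an exponential with a 100-msec mean, an arbitrary stand-in for data rather than a model endorsed here—and the moment-based mean, variance, skew, and excess kurtosis are formed.

```python
import random

# Simulated "RTs" in msec: an illustrative assumption, not real data.
random.seed(1)
rts = [150 + random.expovariate(1 / 100) for _ in range(10_000)]

n = len(rts)
mean = sum(rts) / n
m2 = sum((t - mean) ** 2 for t in rts) / n        # variance
m3 = sum((t - mean) ** 3 for t in rts) / n        # third central moment
m4 = sum((t - mean) ** 4 for t in rts) / n        # fourth central moment
skew = m3 / m2 ** 1.5                             # 2 for an exponential
kurt = m4 / m2 ** 2 - 3                           # excess; 6 for an exponential
print(f"mean = {mean:.0f}  var = {m2:.0f}  "
      f"skew = {skew:.2f}  kurtosis = {kurt:.2f}")
```

The shift by 150 msec moves the mean but leaves the variance, skew, and kurtosis of the exponential component untouched, which is one reason these higher statistics are diagnostic of distributional shape.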

6.2 RELATIONS TO SIMPLE REACTION TIMES

6.2.1 Means and Standard Deviations

Perhaps the best established fact about choice reaction times is that, with everything else as similar as possible, these times are slower than the comparable simple ones by 100 to 150 msec, and they are usually, but not always, somewhat more variable. This has been known for a very long time and has been repeatedly demonstrated; I shall cite just two studies in which the relationship has been explored with some care. The first does not involve any feedback-payoff manipulation; the second does.

In his Experiment 4, Laming (1968) ran 20 subjects for 120 trials in each of five conditions that were organized in a Latin square design. The signals, white bars of width 0.50 in. and height 2.83 in. and 4.00 in., were presented singly in a tachistoscope and the subject had to identify each presentation as being either the longer or shorter stimulus. Signals were presented according to an equally likely random schedule. There were three general types of conditions. In the simple-reaction time design subjects were to respond as rapidly as possible to both signals. In one variant they used only their right forefinger and in a second only their left forefinger. In the choice design, they used the right forefinger response for one signal and the left for the other. And in the recognition design, the situation was as in the choice one except that one of the two responses was withheld. That is, they had to identify the signals and respond to just one of them. In a certain sense, the recognition design resembles a choice one in which withholding a response is itself a response. Table 6.1 presents the median reaction times and the median over subjects of the standard deviation of reaction times. The pattern of medians is typical: the simple reaction times of about 220 msec are normal for visual stimuli; the choice values are about 200 msec slower; and the recognition times are some 35 msec faster than the choice ones. The

TABLE 6.1. Response time data from Experiment 4 of Laming (1968)

                                Response time in msec
  Condition                Median    Median standard deviation
  Simple
    Right forefinger        228               87
    Left forefinger         220               88
  Choice                    419               88
  Recognition
    4 in.                   384               81
    2.83 in.                385               85

variability, which appears to be the same in all conditions, is not so typical. I will say more about it below.

Snodgrass, Luce, and Galanter (1967) ran three subjects in four conditions, including the three run by Laming plus a simple design in which just one of the signals was presented. The two simple cases are identified as simple-1 and simple-2, depending on the number of signals used. The warning and reaction signals were pure tones of 143 msec duration; the foreperiod was 2 sec. The warning signal was a 1100-Hz tone and the two reaction signals were 1000- and 1200-Hz tones, both of which were quite audible. Payoffs were used. They were defined relative to one of two time bands: the fast band ran from 100 to 200 msec following signal onset, and the slow one, from 200 to 300 msec. Responses prior to the band in force were fined 5¢, and those within each successive third of the band received rewards of 3¢, 2¢, and 1¢, respectively. In the choice and recognition designs, correct responses each received 1¢ and incorrect ones lost 5¢ each. At the completion of a trial, the subject was provided information feedback about the amounts won or lost. A total of from 210 to 240 observations were obtained in each combination of band and condition that was run (several possible combinations were omitted). The resulting data are shown in Figure 6.1. The pattern closely resembles that of the Laming experiment. Mean times to both simple conditions fall within the appropriate band and are virtually identical, whereas both the recognition and choice mean times either lie in the slow band or exceed 300 msec independent of which payoff band is used. Apparently, it was impossible for these subjects to identify the signals in less than 200 msec, and the payoff structure was such that to abandon accuracy completely would have been costly. The mean recognition data are from 75 to 100 msec slower and the choice ones are from 100 to 150 msec slower than simple reaction times.

The plot of standard deviations, shown in Figure 6.2, exhibits a similar pattern. These data differ appreciably from Laming's, which showed the same variability in all conditions. It is, perhaps, worth noting that his

FIG. 6.1 MRT to pure tones of 1000 and 1200 Hz under four conditions: simple reactions to a single signal; simple reactions to either signal; recognition of and response to just one of the two signals; and choice reactions where the signal presented is to be identified. Two reinforcement bands, 100–200 msec and 200–300 msec, were used. Each data point is based on a sample of 240 reactions. [Figure 2 of Snodgrass et al. (1967); copyright 1967; reprinted by permission.]

FIG. 6.2 Estimated standard deviations for the experiment discussed in Figure 6.1. [Figure 3 of Snodgrass et al. (1967); copyright 1967; reprinted by permission.]

variability for simple reaction times is rather large—the coefficient of variation (σ/μ) being about .40 as compared with about .17 in the Snodgrass et al. data—whereas the variability for the choice conditions is similar in the two experiments. There are a number of differences that might underlie this difference in results. The two most notable are the modality, auditory versus visual, and the use of feedback and payoffs in the one and not the other.

Although recognition data typically are somewhat faster than comparable choice reactions, this is not always true. For example, Smith (1978) (as reported in Smith, 1980) found the recognition times (248 msec) to be slower than the choice ones (217 and 220) in a study of vibrotactile stimuli to the index fingers with keypress responses by the stimulated fingers. This reversal of choice and recognition times will reappear as an issue in Section 6.2.2.

Turning to another phenomenon, we know (Section 2.4.1) that simple MRTs become slower as the proportion of catch trials is increased. Alegria (1978) examined their impact in a choice situation where, of course, we must ask both how the error rate and the MRT are affected. A fixed foreperiod of 700 msec was defined by a spot of light moving from left to right, and the signal interval began when it passed a vertical line. The signals were tones of 900 and 2650 Hz, and responses were key presses of the right index and middle fingers. The three conditions of catch trial frequency were 0%, 20%, and 77%. As can be seen in Table 6.2, the error rate was little affected by the proportion of catch trials or by whether the trial previous to the one being examined was a catch or a signal trial. Figure 6.3 shows that MRT is greatly affected by the nature of the preceding trial, and the effect cumulates over trials, but MRT seems little affected by the actual proportion of catch trials. These data are not consistent with the idea that subjects speed up by relaxing their criterion for what constitutes a signal and thereby increase the number of anticipatory responses. The reason for this conclusion is the fact that the error rate is, if anything, smaller following a signal trial than it is following a catch trial.

Section 2.4.3, on the impact of exponential foreperiods in simple reaction time, reported that MRT increases gradually with the actual foreperiod wait,

TABLE 6.2. Percent error as a function of type of preceding trial and proportion of catch trials (Alegria, 1978)

                                   Preceding trial
  Proportion of catch trials    Catch        Signal
  .77                            11.1          7.6
  .20                             9.5          5.2
  0                               --           7.7

FIG. 6.3 MRT for choice reactions to two tone frequencies with either 20% or 77% catch trials versus the length of the preceding run of catch trials (−) or of signal trials (+). The solid line, which shows ordinary simple reactions, involved 44,800 observations per subject; the 20% condition involved 164,000 observations per subject; and the 77%, 234,000. There were six subjects. Depending upon the number n of conditioning events, these sample sizes are reduced by a factor of 1/2^n. [Figure 1 of Alegria (1978); copyright 1978; reprinted by permission.]

an increase of from 50 to 100 msec over a range from half a second to 30 sec. For choice reaction times, Green, Smith, and von Gierke (1983) ran a closely comparable auditory experiment in which the signals differed in frequency. They found substantially the same pattern—a rise of about 30 msec in the range from 200 msec to 6 sec. Error responses are slightly faster than correct ones making the same response (see Section 6.4.3), but the curves exhibit basically the same shape. For waits less than 200 msec, the reaction time increases rather sharply as the wait decreases, rising some 20 msec over that interval. The impact of an exponentially distributed foreperiod, therefore, appears to be substantially the same in the choice and simple-reaction time paradigms.

6.2.2 Donders' Subtraction Idea

Donders (1868), in a highly seminal paper, proposed that the time to carry out a specific mental subprocess can be inferred by running pairs of experiments that are identical in all respects save that in the one the subject must use the particular process whereas in the other it is not used. He put the idea this way (Koster's 1969 translation):

The idea occurred to me to interpose into the process of the physiological time some new components of mental action. If I investigated how much this would lengthen the physiological time, this would, I judged, reveal the time required for the interposed term. (Donders, 1969, p. 418)

He proceeded then to report data for several different classes of stimuli for both simple- and choice-reaction times (some for two-stimulus designs and much for a five-stimulus one), which he spoke of as, respectively, the a- and b-procedures. The difference between the two times he attributed to the difference in what is required—namely, both the identification of the signal presented and the selection of the correct response to make.

He next suggested that the times of the two subprocesses could be estimated separately by collecting times in the recognition or, as he called it, the c-procedure. As we have seen above, this entails a schedule of presentations like that used in the choice design, but instead of there being as many responses as there are stimuli, there is just one. It is made whenever one of the signals is presented and is withheld whenever the other(s) occur. Using vowels as stimuli and himself as subject, he reported the difference in times between the c- and a-procedures was 36 msec, which he took to be an estimate of the recognition time, and the difference between the b- and c-procedures was 47 msec, an estimate of the time to effect the choice between responses. The comparable numbers for the Laming data are 161 and 34 msec and for the Snodgrass et al. data they are 110 and 40 msec.

Data in which choices are faster than recognitions, such as Smith (1978), should be impossible in Donders' framework. They may result from subjects using different criteria for responding in the two procedures.

Indeed, Donders expressed one concern over the c-procedure:

[Other people] give the response when they ought to have remained silent. And if this happens only once, the whole series must be rejected: for, how can we be certain that when they had to make the response and did make it, they had properly waited until they were sure to have discriminated? … For that reason I attach much value to the result of the three series mentioned above and obtained for myself as a subject, utilizing the three methods described for each series, in which the experiments turned out to be faultless. (Donders, 1969, p. 242)

Although the subtraction method was very actively pursued during the last quarter of the 19th century and today is often used with relatively little attention given to its theoretical basis (e.g., Posner, 1978), it has not found favor in this century among those who study response times as a speciality. The criticisms have been of four types:

First is the one mentioned by Donders himself—namely, that the recognition method may not, in fact, induce the subjects always to wait until the signal actually has been identified. (This may be a difficulty for the other two methods, as well.) His proposed method of eliminating those runs in which errors occur is not fully satisfactory because, as we noted in Sections 2.2.5 and 2.2.6, it is highly subject to the vagaries of small sample statistics. As we shall see below in Section 6.6.5, there is evidence from choice designs that anticipations also occur when the time pressure is sufficiently great. This problem can probably be greatly reduced by using a random (exponential) foreperiod, as in the simple-reaction-time design (see Green et al., 1983). The reason that this should work better than catch trials is the greater opportunity it provides the lax criterion to evidence itself since each trial affords an opportunity to anticipate. Working against it is the fact that many foreperiods are relatively brief.

The second concern, which was more in evidence in the 1970s than earlier, centers on the assumption of a purely serial process in which all of the times of the separate stages simply add. As we know from Chapter 3, this assumption has not been clearly established for the decision latency and the residue. Sternberg (1969a, b), in his classic study on the method of additive factors, suggested a method for approaching the questions of whether the times associated with the several stages are additive, provided that one has empirical procedures for affecting the stages individually. A special case of the method was discussed in Section 3.3.4, and it will be discussed more fully in Section 12.4. This means that we have methods, perhaps not yet as perfected as we would like, to investigate empirically the truth of this criticism.

The third criticism, which is the least easy to deal with, was the one that turned the tide against the method at the turn of the century. It calls into question the assumption of “pure insertion”—namely, that it is possible to add a stage of mental processing without in any way affecting the remaining stages. Sternberg (1969b, p. 422) describes the attack in this way.

[I]ntrospective reports put into question the assumption of pure insertion, by suggesting that when the task was changed to insert a stage, other stages might also be altered. (For example, it was felt that changes in stimulus-processing requirements might also alter a response-organization stage.) If so, the difference between RTs could not be identified as the duration of the inserted stage. Because of these difficulties, Külpe, among others, urged caution in the interpretation of results from the subtraction method (1895, Secs. 69, 70). But it appears that no tests other than introspection were proposed for distinguishing valid from invalid applications of the method.

A stronger stand was taken in later secondary sources. For example, in a section on the “discarding of the subtraction method” in his Experimental Psychology (1938, p. 309), R. S. Woodworth queried “[Since] we cannot break up the reaction into successive acts and obtain the time of each act, of what use is the reaction-time?” And, more recently, D. M. Johnson said in his Psychology of Thought and Judgment (1955, p. 5), “The reaction-time experiment suggests a method for the analysis of mental processes which turned out to be unworkable.”

As Donders seemed unaware of the problem and as introspection is not a wholly convincing argument, an example of the difficulty is in order. Suppose that the decision latencies of simple reaction times are as was described in Section 4.4—namely, a race between change and level detectors, where a level detector is nothing more than the recognition process being put to another use. If in a recognition or choice situation the identification mechanism is no longer available to serve as a level detector because it is being used to identify which signal has been presented, then the insertion of the identification task necessarily alters the detection mechanism, changing it from a race to the use of just the change detector. This being so means that the detection of signal onset, particularly of a weak signal, is somewhat slower and more variable than it would have been if the same mechanisms for detection were used as in simple reaction time, and so the recognition times will be somewhat overestimated.

The idea of stages of processing is very much alive today, as we shall see in Part III, but very few are willing to bank on the idea of pure insertion. It is not to be ruled out a priori, but anyone proposing it is under considerable obligation to explain why the other stages will be unaffected by the insertion.

The fourth criticism, first pointed out by Cattell (1886b), is that the c-reaction involves more than pure identification since the subject must also decide either to respond or to withhold the response, which in a sense is just as much of a choice as selecting one of two positive responses. Wundt suggested a d-procedure in which the subject withholds the response until the signal is identified, but in fact makes the same response to every signal. This procedure caused difficulty for most subjects and has rarely been used. It is possible, although not necessary, to interpret Smith's slow c-reactions as direct evidence favoring Cattell's view.

For additional discussion of these matters see Welford (1980b), who summarized the matter as follows (p. 107):

The evidence suggests that Donders' approach was correct in that, except under very highly compatible or familiar conditions, the c-reaction involves less choice of response than the b-reactions, but was wrong in assuming that all choice of response was eliminated. The difference between b- and c-reactions will therefore underestimate the time taken by choice, and the difference between c- and a-reactions will overestimate the time taken to identify signals.

6.2.3 Subtraction with Independence

An obvious assumption to add to Donders' additivity and pure insertion is statistical independence of the times for the several stages. This, it will be recalled, was the basic assumption of Chapter 3. The main consequence of adding this is the prediction that not only will the mean times add, but so will all of the cumulants (Section 1.4.5).

Taylor (1966) applied these ideas to the following design. Donders conditions b and c were coupled with two modified conditions. Using the same stimulus presentation schedule as in b and c, condition b′ involves substituting for one of the signals ordinary catch trials, and the subject is required to make the discriminative response. In condition c′, the presentation is as in b′, but only a single response is required on signal trials. The model assumes a total of four stages: signal detection, signal identification, response selection, and response execution. The first and last are common to

TABLE 6.3. Presence of stages in Taylor (1966) design (see text for description of conditions)

                                       Stage
  Condition    Signal identification    Response selection
  b                    Yes                     Yes
  c                    Yes                     No
  b′                   No                      Yes
  c′                   No                      No

all four conditions, and so they can be ignored. The second and third are assumed to be invoked as they are needed, and the pattern assumed is shown in Table 6.3. Assuming that this is correct, then we see that the difference between b and c′ is both stages, that between c and c′ is just the signal identification stage, and that between b′ and c′ is just the response selection stage. So if we let T(i) denote the response time observed in condition i, the following equation embodies the additivity and pure insertion assumptions

  T(b) − T(c′) = [T(c) − T(c′)] + [T(b′) − T(c′)],

which is equivalent to

  T(b) − T(c) − T(b′) + T(c′) = 0.

Taylor tested this null hypothesis in an experiment in which the stimuli were red and green disks, the warning signal was a 500-Hz tone, and foreperiods of 800, 1000, 1300, and 1500 msec were equally likely. The sample size in each condition was the 32 responses of the preferred hand of each of eight subjects, for a total of 256 per condition. He tested the additivity prediction for the mean, variance, and third cumulant, and none rejected the null hypothesis.
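The logic of Taylor's test can be sketched numerically. If the four conditions of Table 6.3 are built from shared, independent stage distributions (additivity plus pure insertion), the double difference b − c − b′ + c′ should vanish, up to sampling error, not just for the mean but for the variance and third cumulant as well. The stage distributions and all numbers below are invented assumptions for the sketch, not values from Taylor's experiment.

```python
import random

random.seed(7)
N = 40_000

# Hypothetical stage times in msec (distributions are illustrative only)
def detect():   return random.gammavariate(4, 15)   # signal detection
def ident():    return random.expovariate(1 / 60)   # signal identification
def select():   return random.expovariate(1 / 45)   # response selection
def execute():  return random.gammavariate(2, 20)   # response execution

def condition(use_ident, use_select):
    """Simulate N total response times for one condition of Table 6.3."""
    return [detect()
            + (ident() if use_ident else 0.0)
            + (select() if use_select else 0.0)
            + execute()
            for _ in range(N)]

def cumulants(xs):
    """First three cumulants: mean, variance, third central moment."""
    n, m = len(xs), sum(xs) / len(xs)
    k2 = sum((x - m) ** 2 for x in xs) / n
    k3 = sum((x - m) ** 3 for x in xs) / n
    return m, k2, k3

b  = cumulants(condition(True, True))     # both stages present
c  = cumulants(condition(True, False))    # identification only
bp = cumulants(condition(False, True))    # selection only (b')
cp = cumulants(condition(False, False))   # neither (c')

dds = []
for j, name in enumerate(("mean", "variance", "third cumulant")):
    dds.append(b[j] - c[j] - bp[j] + cp[j])
    print(f"{name}: double difference = {dds[-1]:.3g}")  # near zero
```

In a real application the four samples come from four experimental conditions rather than a generator, and the question becomes whether the observed double differences are statistically distinguishable from zero.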

Were this to be repeated today, one would probably use the Fast Fourier Transform to verify the additivity of the entire cumulant generating function; however, I am not sure what statistical test one should employ in that case.
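Short of the FFT, one can estimate the cumulant generating function directly from the empirical characteristic function, φ(ω) = mean of exp(iωX), whose logarithm is the cumulant generating function at imaginary argument; additivity then says log φ for b plus that for c′ should match c plus b′ at every frequency. The sketch below checks this on simulated samples; the stage distributions (in msec) are invented assumptions.

```python
import cmath
import random

random.seed(3)
N = 30_000

def sample(use_ident, use_select):
    # Hypothetical stage times in msec, illustrative only
    return [random.gammavariate(4, 15)                           # detection
            + (random.expovariate(1 / 60) if use_ident else 0)   # identification
            + (random.expovariate(1 / 45) if use_select else 0)  # selection
            + random.gammavariate(2, 20)                         # execution
            for _ in range(N)]

def log_phi(xs, w):
    """Log empirical characteristic function at frequency w (1/msec)."""
    return cmath.log(sum(cmath.exp(1j * w * x) for x in xs) / len(xs))

b, c = sample(True, True), sample(True, False)
bp, cp = sample(False, True), sample(False, False)

gaps = []
for w in (0.002, 0.005, 0.01):
    gap = log_phi(b, w) + log_phi(cp, w) - log_phi(c, w) - log_phi(bp, w)
    gaps.append(abs(gap))
    print(f"w = {w}: |gap| = {gaps[-1]:.4f}")   # small under the model
```

Because every cumulant is a derivative of log φ at ω = 0, a gap that is flat at zero across ω is additivity of all the cumulants at once.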

As was pointed out in Section 3.2.3, if the component inserted is exponential with time constant λ and the density is f with it in and g without it, then

  λ = f′(t)/[g(t) − f(t)].

So an easy test of the joint hypothesis of pure insertion, independence, and the insertion of an exponential stage is that the right side of the above expression be independent of t. Ashby and Townsend (1980) examined this for data reported by Townsend and Roos (1973) on a memory search procedure of the type discussed in Section 11.1, and to a surprising degree constancy was found. Thus, while there are ample a priori reasons to doubt all three assumptions, an unlikely consequence of them was sustained in one data set. This suggests that additional, more detailed work should be carried out on Donders' model.
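The constancy test lends itself to a direct numerical check: take any density g for the process without the inserted stage, convolve it with an exponential density λ exp(−λt) to obtain f, and confirm that f′(t)/[g(t) − f(t)] holds steady at λ across t. Treating λ as the rate of the inserted stage and taking g to be a gamma density are assumptions of this sketch.

```python
import math

lam = 2.0                                # rate of the inserted stage
dt, n = 0.01, 1000                       # time grid: t = 0 .. 10

# g: density without the inserted stage; gamma(2, 1) is an arbitrary choice
g = [i * dt * math.exp(-i * dt) for i in range(n)]

# f = g convolved with lam*exp(-lam*t), by the trapezoidal rule
f = []
for i in range(n):
    terms = [g[j] * lam * math.exp(-lam * dt * (i - j)) for j in range(i + 1)]
    f.append(dt * (sum(terms) - 0.5 * (terms[0] + terms[-1])))

ratios = []
for i in (200, 400, 600):                # t = 2, 4, 6
    fprime = (f[i + 1] - f[i - 1]) / (2 * dt)   # central difference
    ratios.append(fprime / (g[i] - f[i]))
print([round(r, 2) for r in ratios])     # each close to lam = 2.0
```

With empirical f and g the same ratio would be formed from density estimates, and the practical difficulty becomes the instability of the quotient wherever g(t) − f(t) passes near zero.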

6.2.4 Varying Signal Intensity

The effect of intensity on simple reaction times is simple: both MRT and VRT decrease systematically as signal intensity is increased (Sections 2.3.1 and 2.3.2), and for audition but not vision an interaction exists between signal intensity and response criterion (Section 2.5.2). Neither of these statements appears to be true for choice reaction times, as was made evident by Nissen (1977) in a careful summary of the results to that point. For visual stimuli, the independence of intensity and response criterion was confirmed. For example, Pachella and Fisher (1969) varied the intensity and linear spacing of 10 lights and imposed a deadline to control response criterion, and did not find evidence for an interaction of intensity with the deadline, although there was one between intensity and spacing. Posner (1978) varied the intensity of visual signals, and using conditions of mixed and blocked intensities he found no interaction. However, since for visual stimuli no interaction was found in simple reactions either, the question of what would happen with auditory signals remained open. In 1977 the evidence was skimpy.

This question was taken up by van der Molen, Keuss, and Orlebeke in a series of papers. In 1979 van der Molen and Keuss published a study in which the stimuli were 250 msec tones of 1000 Hz and 3000 Hz with intensities ranging from 70 to 105 dB. There were several foreperiod conditions, which I will not go into except to note that the foreperiods were of the order of a few seconds. Both simple and choice reactions were obtained. The major finding, which earlier had been suggested in the data of Keuss (1972) and Keuss and Orlebeke (1977), was that for the choice condition the MRT is a U-shaped function of intensity. The simple reactions were decreasing, as is usual. This result makes clear that the impact of intensity in the choice context is by no means as simple as it is for simple reactions, where it can be interpreted as simply affecting the rate at which information about the signal accumulates. They raised the possibility that a primary effect of intensity is in the response selection stage of the process rather than just in the accumulation process. That was not a certain conclusion because the error rate had not remained constant as intensity was altered. Additional U-shaped data were exhibited by van der Molen and Orlebeke (1980).

To demonstrate more clearly the role of response selection, van der Molen and Keuss (1981) adapted a procedure introduced by J. R. Simon (1969; Simon, Acosta, & Mewaldt, 1975; Simon, Acosta, Mewaldt, & (p.218) Speidel, 1976). The signals were as before with a range of 50 to 110 dB. They were presented monaurally through earphones, and responses were key presses with the two hands corresponding to signal frequency. Lights and digits presented just before the beginning of the foreperiod provided additional information about what was to be presented. The digit code was the following: 000 indicated to the subject that the signal was equally likely to be presented to either ear, 001 that it would go to the right ear, and 100 that it would go to the left ear. One colored light indicated that the ear receiving the signal and its frequency would be perfectly correlated, whereas the other color indicated no correlation of location and frequency. Note that the correct response could be either ipsilateral or contralateral to the ear to which the signal was presented. The data showed that responding was fast and a monotonic decreasing function of intensity when the correlated presentation was used and the response was ipsilateral. In any contralateral or uncorrelated ipsilateral condition, MRT was slower and U-shaped. They concluded that these results support the hypothesis that a major impact of intensity in choice reactions is on the response selection stage.

In still another study, Keuss and van der Molen (1982) varied the foreperiod—either a fixed one of 2 sec or a variable one of 20, 25, 30, 35, or 40 sec—and whether the subject had preknowledge of the intensity of the presentation. The effect of preknowledge was to reduce MRT by about 10 msec. More startling was the fact that both simple and choice MRTs decreased with intensity except for the briefer 2-sec foreperiod, where again it was found to be U-shaped. Moreover, the error rate was far larger in this case than in the others. They seemed to conclude that the foreperiod duration was the important factor, although it was completely confounded with constant versus variable foreperiod. Assuming that it is the duration, they claimed this to be consistent with the idea of intensity affecting the response selection process. I do not find it as compelling as the earlier studies.

6.3 A CONCEPTUAL SCHEME FOR TRADEOFFS

6.3.1 Types of Parameters

A major theoretical feature of cognitive psychology is its attempt to distinguish both theoretically and experimentally between two classes of variables and mechanisms that underlie behavior. The one class consists of those experimental manipulations that directly affect behavior through mechanisms that are independent of the subjects' motivations. For example, most psychologists and physiologists believe that the neural pulse patterns that arise in the peripheral nervous system, immediately after the sensory transducer converts the physical stimuli into these patterns, are quite independent of what the subject will ultimately do with that information. This means that these representations of the signal are independent of the type of (p.219) experiment—reaction time, magnitude estimation, discrimination—of the questions we pose to the subject, and of the information feedback and payoffs we provide. Such mechanisms are often called sensory or perceptual ones, and the free parameters that arise in models of the mechanism are called sensory parameters. The experimental variables that activate such mechanisms have no agreed-upon name. In one unpublished manuscript, Ollman (1975) referred to them as “display variables,” but in a later paper (1977) he changed it to “task variables.” I shall use a more explicit version of his first term, sensory display variables.

The other type of mechanism is the decision process that, in the light of the experimental task posed, the subject brings to bear on the sensory information. These are mechanisms that have variously been called control, decision, motivation, or strategy mechanisms. Ollman refers in both papers to the experimental variables that are thought to affect these mechanisms directly as strategy variables. I shall follow the terminology of decision mechanism, decision parameters, and decision strategy variables. The latter include a whole range of things having to do with experimental procedure: the task—whether it is detection, absolute identification, item recognition, and so on—the details of the presentation schedule of the signals, the payoffs used both to affect the tradeoff of errors and to manipulate response times, and various instructions aimed at affecting the tradeoffs established among various aspects of the situation.

It should be realized that there is a class of motivational variables, of which attention is a prime example, that I shall not attempt to deal with in a systematic fashion. Often, attentional issues lie not far from the surface in our attempts to understand many experiments, and they certainly relate to the capacity considerations of Chapters 11 and 12. Moreover, they are a major concern of many psychologists (Kahneman, 1973). Some believe that such variables affect the sensory mechanism, which if true only complicates the story to be outlined.

To the degree that we are accurate in classifying sensory display and decision strategy variables, the former affect the sensory parameters and the latter the decision parameters. But a major asymmetry is thought to exist. Because the decision parameters are under the subject's control, they may be affected by sensory display variables as well as by decision strategy ones; whereas, it is assumed that the sensory parameters are not affected by the decision strategy variables. What makes the study of even the simplest sensory or perceptual processes tricky, and so interesting, is the fact that we can never see the impact of the sensory display variables on the sensory mechanism free from their impact on the decision parameters. Even if we hold constant all of the decision strategy variables that are known to affect the decision mechanism, but not the sensory ones, we cannot be sure that the decision parameters are constant. The subject may make changes in these parameters as a joint function of the sensory display and decision strategy variables.

(p.220) One major issue of the field centers on how to decide whether a particular experimental design has been successful in controlling the decision parameters as intended. This is a very subtle matter, one that entails a careful interplay of theoretical ideas and experimental variations. During the late 1960s and throughout the 1970s this issue was confronted explicitly as it had never been before, and out of this developed considerable sensitivity to the so-called speed-accuracy tradeoff.

In Section 6.3.2 I shall try to formulate this general conceptual framework as clearly as I know how, and various special cases of it will arise in the particular models examined in Chapters 7–10.

6.3.2 Formal Statement

Assuming the distinction just made between sensory and decision mechanisms is real, we may formulate the situation in quite general mathematical terms. Each experimental trial can be thought of as confronting the subject with a particular environment. This consists not only of the experimentally manipulated stimulus presented on that trial, but also the surrounding context, the task confronting the subject, the reward structure of the situation, and the previous history of stimuli and responses. We can think of this environment as described by a vector of variables denoted by $\vec{E}$. We may partition this into a subvector $\vec{S}$ of sensory display variables and a subvector $\vec{D}$ of decision strategy variables: $\vec{E} = (\vec{S}, \vec{D})$. The observable information obtained from the trial is a pair of random variables, the response r and some measure T of the time of occurrence. We assume that (r, T) is governed by a joint probability density function that is conditioned by the environmental vector, and it is denoted

(6.1)
$$f(r, t \mid \vec{E}).$$

In order to estimate f from data, it is essential that we be able to repeat the environment on a number of trials, and so be able to use the empirical histograms as a way to estimate f. Obviously, this is not possible if $\vec{E}$ really includes all past stimuli and responses, and so in practice we truncate the amount of the past included in our definition of $\vec{E}$. For more on that, see Section 6.6.

The theoretical structure postulates the existence of a sensory mechanism with a vector $\vec{\sigma} = (\sigma_1, \ldots, \sigma_l)$ of sensory parameters and a decision mechanism with a vector $\vec{\delta} = (\delta_1, \ldots, \delta_m)$ of decision parameters. In general, $\vec{\sigma}$ and $\vec{\delta}$ should be thought of as random vectors; that is, their components are random variables and they have a joint distribution that is conditional on $\vec{E}$. If $\vec{\sigma}$ and $\vec{\delta}$ are numerical l- and m-tuples, respectively, we denote the joint density function of $\vec{\sigma}$ and $\vec{\delta}$ by

(6.2)
$$\phi(\vec{\sigma}, \vec{\delta} \mid \vec{E}).$$

When we have no reason to distinguish between the two types of parameters, we simply write $\vec{\varepsilon} = (\vec{\sigma}, \vec{\delta})$. For each set of parameter values, it is (p.221) assumed that the sensory and decision processes relate the (r, T) pair probabilistically to the parameters and to them alone. In particular, it is assumed that (r, T) has no direct connection to $\vec{E}$ except as it is mediated through the parameters. We denote the joint density function of (r, T) conditional on $\vec{\varepsilon}$ by

(6.3)
$$\Psi(r, t \mid \vec{\varepsilon}).$$

So Ψ is the theory of how the sensory and decision processes jointly convert a particular set of parameter values into the (r, T) pair. By the law of total probability applied to the conditional probabilities of Eqs. 6.2 and 6.3, we obtain from Eq. 6.1

(6.4)
$$f(r, t \mid \vec{E}) = \int \Psi(r, t \mid \vec{\varepsilon})\, \phi(\vec{\varepsilon} \mid \vec{E})\, d\vec{\varepsilon},$$
where the integral is to be interpreted as just that in the case of continuous parameters and as a sum in the case of discrete ones. We will always assume that the densities are of such a character that the integral exists in some usual sense (Riemann or Lebesgue).

A special case of considerable interest is where r and T are independent random variables for each set of parameters; that is, if $P(r \mid \vec{\varepsilon})$ and $f(t \mid \vec{\varepsilon})$ denote the marginal distributions of Ψ,

(6.5)
$$\Psi(r, t \mid \vec{\varepsilon}) = P(r \mid \vec{\varepsilon})\, f(t \mid \vec{\varepsilon}).$$

We speak of this as local (or conditional) independence. Observe that it does not in general entail that $f(r, t \mid \vec{E})$ also be expressed as the product of its marginals. [It is perhaps worth noting that conditional independence is the keystone of the method of latent structure analysis sometimes used in sociology (Lazarsfeld, 1954).] Among the models we shall discuss in Chapters 7 to 9, local independence holds for the fast guess and counting models, but not for the random walk or timing models. One happy feature of local independence is the ease with which the marginal distributions are calculated.

Theorem 6.1. If Eqs. 6.4 and 6.5 hold, then

(6.6)
$$P(r \mid \vec{E}) = \int P(r \mid \vec{\varepsilon})\, \phi(\vec{\varepsilon} \mid \vec{E})\, d\vec{\varepsilon},$$

(6.7)
$$f(t \mid \vec{E}) = \int f(t \mid \vec{\varepsilon})\, \phi(\vec{\varepsilon} \mid \vec{E})\, d\vec{\varepsilon}.$$

The easy proof is left to the reader.
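Since the proof is brief, it may be worth sketching for the time marginal (the response marginal is obtained the same way, integrating over t instead of summing over r): substituting the local-independence factorization, Eq. 6.5, into Eq. 6.4 and summing over r gives

```latex
f(t \mid \vec{E})
  = \sum_{r} \int \Psi(r, t \mid \vec{\varepsilon})\,
      \phi(\vec{\varepsilon} \mid \vec{E})\, d\vec{\varepsilon}
  = \int \Bigl[\sum_{r} P(r \mid \vec{\varepsilon})\Bigr]
      f(t \mid \vec{\varepsilon})\, \phi(\vec{\varepsilon} \mid \vec{E})\, d\vec{\varepsilon}
  = \int f(t \mid \vec{\varepsilon})\, \phi(\vec{\varepsilon} \mid \vec{E})\, d\vec{\varepsilon},
```

since $\sum_{r} P(r \mid \vec{\varepsilon}) = 1$ for every parameter vector.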

6.3.3 An Example: The Fast-Guess Model

A specific, simple example should help fix these ideas and those to follow (especially Section 6.5). I shall use a version of the fast-guess model of Ollman (1966) and Yellott (1967, 1971), which will be studied more fully in (p.222) Section 7.4. Suppose that prior to the presentation of the signal the subject opts to behave in one of two quite distinct ways. One is to make a simple reaction to signal onset without waiting long enough to gain any idea as to which signal was presented. We assume that the simple-reaction-time density to both signals is g0(t) and, quite independent of that, response r is selected with probability βr, r = A, B, where βA + βB = 1. The other option is to wait until the information is extracted from the signal and to respond according to that evidence. This time is assumed to have density g1(t) for both signals and that, independent of the time taken, the conditional probability of response r to signal s is P sr, where s = a, b, r = A, B, P sA + P sB = 1. Note that we have built in local independence of r and T for each of the two states, 0 representing fast guesses (simple reactions) and 1 the informed responses. Denote by ρ the probability that the subject opts for the informed state. In arriving at the general form for $f(r, t \mid \vec{E})$ let us suppress all notation for $\vec{E}$ save for the signal presented, s. From Eqs. 6.4 and 6.5 we obtain

(6.8)
$$f(r, t \mid s) = \rho\, g_1(t) P_{sr} + (1 - \rho)\, g_0(t) \beta_r .$$

Observe that this model has two decision parameters—namely, ρ and βA (recall, βB = 1 − βA)—two sensory functions—g0 and g1—and two discrimination (sensory) parameters—P sA, s = a, b (recall, P sB = 1 − P sA). For some purposes we can reduce the functions to their means, ν0 and ν1. Implicitly, I assume ν0 < ν1 (see Section 6.2.1).

Either by direct computation from Eq. 6.8 or by using Eqs. 6.6 and 6.7, we obtain for the marginals

(6.9)
$$f(t \mid s) = \rho\, g_1(t) + (1 - \rho)\, g_0(t).$$

(6.10)
$$P(r \mid s) = \rho\, P_{sr} + (1 - \rho)\, \beta_r .$$

Note that f(t | s) is actually independent of s. From Eq. 6.9 we can compute the overall expected reaction time to the presentation of a particular signal s:

(6.11)
$$E(T \mid s) = \rho\, \nu_1 + (1 - \rho)\, \nu_0 .$$

For some purposes it is useful to compute E(T) for each (s, r) pair separately—that is, the mean of f(t | r, s) = f(r, t | s)/P(r | s), which from Eq. 6.8 we see is

(6.12)
$$E(T \mid r, s) = \frac{\rho\, \nu_1 P_{sr} + (1 - \rho)\, \nu_0 \beta_r}{P(r \mid s)},$$
where P(r | s) is given by Eq. 6.10.
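Equations 6.8 to 6.12 are simple enough to transcribe directly. The following sketch (with illustrative parameter values, not estimates from any experiment) computes the marginals and checks that the mean in Eq. 6.11 is recovered as the P(r | s)-weighted average of the conditional means of Eq. 6.12:

```python
# Fast-guess model marginals; all parameter values are illustrative.
rho = 0.8                      # probability of an informed response
beta = {"A": 0.6, "B": 0.4}    # guessing biases (beta_A + beta_B = 1)
P = {("a", "A"): 0.9, ("a", "B"): 0.1,   # informed response probabilities P_sr
     ("b", "A"): 0.2, ("b", "B"): 0.8}
nu0, nu1 = 180.0, 350.0        # mean RT of guesses and informed responses (msec)

def p_response(r, s):
    """Eq. 6.10: P(r | s) = rho * P_sr + (1 - rho) * beta_r."""
    return rho * P[(s, r)] + (1 - rho) * beta[r]

def mean_rt(s):
    """Eq. 6.11: E(T | s) = rho * nu1 + (1 - rho) * nu0 (independent of s)."""
    return rho * nu1 + (1 - rho) * nu0

def mean_rt_given(r, s):
    """Eq. 6.12: mean of f(t | r, s)."""
    return (rho * nu1 * P[(s, r)] + (1 - rho) * nu0 * beta[r]) / p_response(r, s)

# Consistency checks: marginals are proper, and Eq. 6.11 equals the
# P(r|s)-weighted average of the conditional means in Eq. 6.12.
for s in ("a", "b"):
    assert abs(p_response("A", s) + p_response("B", s) - 1) < 1e-12
    avg = sum(p_response(r, s) * mean_rt_given(r, s) for r in ("A", "B"))
    assert abs(avg - mean_rt(s)) < 1e-9
print(mean_rt("a"))
```

The checks pass identically for both signals, which is just Eq. 6.9's observation that the time marginal does not depend on s.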

A few words may be appropriate at this point about how one attempts to confront such a model with data. One immediately obvious problem is that the data provide us with estimates of $f(r, t \mid \vec{E})$ whereas the theory as stated in Eq. 6.8 has a number of parameters that are not explicit functions of $\vec{E}$. (p.223) Of course, from the interpretation of the model the parameters ρ and βr belong to the decision process and the others belong to the sensory process. Thus, we do not anticipate any dependence of the latter parameters on manipulations of the decision strategy, but the former may depend upon any aspect of $\vec{E}$. It is quite typical of models throughout psychology—not just those for response times—that no explicit account is offered for the dependence of the parameters on environments. This is a fact and limitation, not a virtue, of our theorizing.

Because we cannot compute the parameters from knowing $E →$, in practice we estimate them in some fashion from the data and then attempt to evaluate how well the model accounts for those data. For example, this will be done for the fast guess model in Section 7.4. If the fit is reasonably satisfactory and if enough data have been collected, which often is not the case, we can then study empirically how the parameters vary with the different experimental manipulations. Sometimes quite regular relations arise that can be approximated by some mathematical functions and that are then used in later applications of the model to data.

6.3.4 Discrimination of Color by Pigeons

In some data, however, it is reasonably clear without any parameter estimation that something like fast guesses are involved because the two distributions g0 and g1 are unimodal and so separated that the overall response-time distribution is bimodal. The clearest example of this that I know is not with human data, but with pigeons. During the training phase, Blough (1978) reinforced the birds for responding to the onset of a 582-nm light (S+), and the onset of the signal was delayed a random amount whenever a response (peck) was made at a time when the light was not on. This is a discrimination, not a choice design. In the test phase, all lights from 575 to 589 nm in 1-nm steps were presented equally often except for 582 nm, which was three times more frequent than the others and was reinforced on one third of its occurrences. Response probability was a decreasing function of the deviation from 582 nm, with approximately the same decay on both sides. To keep the figure from being too complex, only the data for the shorter wavelengths are shown. The pattern of response times, shown for one bird in Figure 6.4, is strikingly bimodal. The earlier mode, the unshaded region of these distributions, clearly does not differ from signal to signal in frequency of occurrence, location in time, or general shape. Since that mode had a central tendency of about 170 msec, I suspect they were simple reaction times to signal onset—that is, fast guesses. However, I am not aware of any simple reaction time data for pigeons with which to compare that number. The second mode, the shaded region, is clearly signal dependent in that the number of responses of this type decreases as the signal deviates from the reinforced S+ (582 nm); however, its location did not seem to change and its central tendency was about 350 msec. These data, (p.224)

FIG. 6.4 Histograms of response times of a pigeon to signal lights of several frequencies, of which the one marked S + was reinforced during training and partially reinforced during testing. The sample size for each distribution was 1760. Note the striking bimodality of the histograms and the relative independence of the first mode as the wavelength of the light varies. [Figure 4 of Blough (1978); copyright 1978; reprinted by permission.]

which are completely consistent with the fast-guess model, suggest that pigeons' reactions to visual signals are similar to those of people; the mean of 170 msec for simple reactions may be a trifle faster, but the additional delay of about 180 msec for accurate responding is, as we saw in Section 6.2.1, very similar to that for people.

6.4 DISCRIMINABILITY AND ACCURACY

6.4.1 Varying the Response Criterion: Various ROC Curves

By now it is a commonplace that either by varying the relative frequency of a to b, or by differential payoffs for the four possible signal-response pairs, or just by instructing the subject to favor A or B, one can cause P(A | a) and P(A | b) to vary all the way from both conditional probabilities being 0 to both being 1—that is, from no A responses at all to all A responses. Note that nothing about the stimulating conditions, and so presumably nothing about the sensory mechanism, is altered. Just the presentation probability or the payoffs or the instructions are varied. Moreover, the locus of these pairs of points is highly regular, appearing to form a concave function when P(A | a) is plotted against P(A | b)—that is, an increasing function with a decreasing slope. Typical data are shown in Figure 6.5. Such functions are (p.225) called ROC curves, the term stemming from the engineering phrase “receiver operating characteristic.” Detailed discussions of these curves, of data, and of mathematical models to account for them can be found in Green and Swets (1966, 1974) and Egan (1975). In the case of Yes-No detection in which A = Y, B = N, a = s = signal, and b = n = noise, we speak of P(Y | s) as a “hit,” P(N | s) as a “miss,” P(Y | n) as a “false alarm,” and P(N | n) as a “correct rejection.”

Let us work out the ROC for the fast-guess model, which we do by eliminating βA from Eq. 6.10, where s = a, b, r = A:

$$P(A \mid a) = P(A \mid b) + \rho\,(P_{aA} - P_{bA}),$$
which is simply a straight line with slope 1. Other models yield different, usually curved, ROCs. For example, the standard model of the theory of signal detectability assumes that each stimulus is internally represented as a Gaussian random variable and the continuum of possible values is partitioned by a cut—called the response criterion—and the subject responds according to whether the observation is larger or smaller than the criterion. This model is sketched in Figure 6.5.
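The fast-guess ROC just derived is trivial to trace numerically. The sketch below (with illustrative parameter values) sweeps the guessing bias β_A through Eq. 6.10 and confirms that the resulting (P(A | b), P(A | a)) points lie on a line of slope 1 with intercept ρ(P_aA − P_bA):

```python
# Tracing the fast-guess ROC by varying the guessing bias beta_A
# with the sensory parameters held fixed (illustrative values).
rho, P_aA, P_bA = 0.7, 0.9, 0.2

points = []
for beta_A in (0.0, 0.25, 0.5, 0.75, 1.0):
    x = rho * P_bA + (1 - rho) * beta_A   # P(A | b), Eq. 6.10
    y = rho * P_aA + (1 - rho) * beta_A   # P(A | a), Eq. 6.10
    points.append((x, y))

# Every point satisfies y - x = rho * (P_aA - P_bA): slope 1, fixed intercept.
for x, y in points:
    assert abs((y - x) - rho * (P_aA - P_bA)) < 1e-12
print(points[0], points[-1])
```

Because only β_A moves, the decision strategy slides the operating point along the line without changing its height above the diagonal; the sensory parameters alone fix the intercept.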

For some time it had been noted that as the criterion was varied, subjects exhibited a tendency to respond faster when the internal representation of the signal was far from the criterion, as judged by confidence ratings, and slower when it was close to the criterion (Emmerich, Gray, Watson, & Tanis, 1972; Fernberger, Glass, Hoffman, & Willig, 1934; Festinger, 1943a, b; Gescheider, Wright, Weber, Kirchner, & Milligan, 1969; Koppell, 1976; Pike, 1973; and Pike & Ryder, 1973). Pike and Ryder (1973) refer to the assumption that E(T) = f(|x − c|), where c is the criterion, as the latency function hypothesis. Festinger (1943a, b), Garrett (1922), and Johnson (1939) all reported no evidence that the relation between confidence and reaction time is affected by instructions emphasizing either speed or accuracy of responding (Section 6.5). As a result confidence and reaction time were assumed to be very closely related. Recently, however, Vickers and Packer (1981) carried out a careful study using line length discriminations and found a decided difference as a function of instruction. The reason for this difference in results is uncertain.

FIG. 6.5 Example of auditory ROC data obtained under fixed signal conditions with varied payoffs. The theoretical curve arises from Gaussian decision variables postulated in the theory of signal detectability. The inset indicates how the ROC is generated by varying the response criterion. [Figure 4.1 of Green and Swets (1966); copyright 1966; reprinted by permission.]

(p.226)

FIG. 6.6 Schematic of how latency ROC curves are constructed; see text for explanation.

The earlier apparent relation between confidence and reaction time together with the fact that ROC curves can be quite accurately inferred from confidence judgments led to the idea of trying to infer the ROC curve from response-time data. One method was proposed by Carterette, Friedman, and Cosmides (1965) and another, closely related, one by Norman and Wickelgren (1969), which is illustrated in Figure 6.6. The idea is this: For each signal, place the two subdistributions, weighted by their probabilities of occurring, back to back, with the A response on the left. From these (p.227) artificial densities we generate the ROC as follows: Align them at the point of transition from A responses to B responses and vary the criterion. To be more explicit, the locus of points (x, y) is generated as follows. For x ≤ P(A | a), let t be such that

$$x = \int_0^t f(A, t' \mid a)\, dt'$$
and set
$$y = \int_0^t f(A, t' \mid b)\, dt' .$$

For x > P(A | a), let t be such that

$$x = P(A \mid a) + \int_t^{\infty} f(B, t' \mid a)\, dt'$$
and set
$$y = P(A \mid b) + \int_t^{\infty} f(B, t' \mid b)\, dt' .$$

This locus of points has been called both the latency operating characteristic, abbreviated LOC, and the RT-ROC. It is important to note that Lappin and Disch (1972a, b and 1973) used the term LOC for a speed-accuracy measure that others call the conditional accuracy function (see Section 6.5.4).
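One way to realize the construction in code is sketched below. The data-generating model is invented solely for illustration (it is not any of the cited studies), and the sweep assumes, as the latency function hypothesis suggests, that fast A responses correspond to the strictest criterion: A responses are added in order of increasing latency, and once they are exhausted B responses are added in order of decreasing latency, yielding a monotone locus.

```python
# Sketch of a latency-ROC (RT-ROC) construction from (response, RT) pairs.
import random

random.seed(7)

def trial(s):
    """Toy generator: correct responses tend to be faster."""
    correct = random.random() < (0.85 if s == "a" else 0.80)
    r = ("A" if s == "a" else "B") if correct else ("B" if s == "a" else "A")
    t = random.gauss(300 if correct else 380, 40)
    return r, t

data = {s: [trial(s) for _ in range(4000)] for s in ("a", "b")}

def rt_roc_point(t, phase):
    """One (x, y) point of the RT-ROC at criterion time t.

    Phase "A": count A responses with RT <= t (fast, confident A first).
    Phase "B": all A responses plus B responses with RT >= t (slow B first).
    """
    point = []
    for s in ("a", "b"):
        n = len(data[s])
        p_A = sum(r == "A" for r, _ in data[s]) / n
        if phase == "A":
            point.append(sum(r == "A" and rt <= t for r, rt in data[s]) / n)
        else:
            point.append(p_A + sum(r == "B" and rt >= t for r, rt in data[s]) / n)
    return tuple(point)   # (x, y): x from signal a, y from signal b

grid = list(range(150, 551, 25))
locus = [rt_roc_point(t, "A") for t in grid] + \
        [rt_roc_point(t, "B") for t in reversed(grid)]
print(locus[0], locus[-1])   # runs from near (0, 0) to near (1, 1)
```

By construction the locus passes through (P(A | a), P(A | b)) at the phase boundary, which is the one point it must share with the ordinary ROC.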

The construction we have just outlined is really quite arbitrary; it does not stem from any particular theoretical view about what the subject is doing. The major motive for computing it is the intuition that reaction time serves as a proxy for the subject's confidence in the response made. According to Gescheider et al. (1969), in a model with a response criterion, such as that of the theory of signal detectability, this function reflects how far the evidence about the stimulus is from the response criterion.

The only serious theoretical study of the relation between the ROC and RT-ROC is Thomas and Myers (1972). The results, which are rather complicated and will not be reported very fully here, were developed for both discrete and continuous signal detection models; that is, there is an internally observed random variable X s for signal s, which is either discrete or continuous and has density function f(x | s), and there is a response criterion β such that the response is A when X s > β and B when X s < β. They assumed that the latency is a decreasing function of |X s − β|. They considered the somewhat special case where the distributions are simply a shift family; that is, there is a constant k ab such that for all x, f(x | a) = f(x − k ab | b), and that the slope of the ROC curve is decreasing (which they showed is equivalent to −d 2 log f(x | s)/dx 2 ≥ 0). Under these assumptions, they proved that the RT-ROC lies below the ROC except, of course, at the point (P(A | a), P(A | b)), which, by construction, they have in common.

Emmerich et al. (1972) reported a detection study of a 400-Hz tone in noise in which detection responses and confidence judgments were made; in addition, without the subjects being aware of it, response times were recorded. The confidence and RT-ROCs are shown in Figure 6.7. Note that these are not plots of P(A | a) versus P(A | b), but of the corresponding Gaussian z-scores {i.e., z(r | s) is defined by P(r | s) = Φ[z(r | s)], where Φ is the unit Gaussian distribution}, which results in straight-line ROCs if the (p.228)

FIG. 6.7 ROC data of several types presented in z-score coordinates (displaced along the abscissa for clarity), in which case the Gaussian ROCs are straight lines. The three curves on the left are latency ROCs, and the two on the right are conventional ones obtained from choice probabilities, with the squares data from Emmerich (1968) and the triangles from Watson et al. (1964). [Figure 4 of Emmerich et al. (1972); copyright 1972; reprinted by permission.]

underlying distributions are Gaussian. Ordinary Yes-No and confidence ROCs are usually rather closely fit by straight lines in z-score coordinates, but as can be seen the RT-ROCs exhibit a rather distinct elbow. This result is typical of all that have been reported: Moss, Meyers, and Filmore (1970) on same-different judgments of two tones; Norman and Wickelgren (1969) using the first of memorized digit pairs as the stimuli; and Yager and Duncan (1971) using a generalization task with goldfish. The conclusion drawn by Emmerich et al. (1972, p. 72) was:

Thus latency-based ROCs should probably not be viewed as a prime source of information about sensory processing alone. Response latencies are known to be influenced by many factors, and this is undoubtedly also the case for latency-based ROCs. Yager and Duncan (1971) reach a similar conclusion….

Additional cause for skepticism is provided by Blough's (1978) study of wavelength discrimination by pigeons, which was discussed previously in Section 6.3.4. Using the response-time distributions, RT-ROC curves were developed with the result in Figure 6.8. These do not look much like the typical ROC curves. For example, from other pigeon data in which rate of responding was observed, one gets the more typical data shown in Figure 6.9. The reason for the linear relations seen in Figure 6.8 is the bimodal character of the response-time distributions seen in Figure 6.4 (Section 6.3.4).

(p.229) 6.4.2 Varying the Discriminability of the Signals

The most obvious effect of altering the separation between two signals that differ on just one dimension—for example, intensity of lights—is to alter the probability of correctly identifying them. This information was traditionally presented in the form of the psychometric function: holding signal b fixed and varying a, it is the plot of P(A | a) as a function of the signal separation,

FIG. 6.8 Latency ROCs for pigeons; an example of the distributions from which these were constructed was shown in Figure 6.4. [Figure 2 of Blough (1978); copyright 1978; reprinted by permission.]

(p.230)

FIG. 6.9 Conventional choice probability ROCs for pigeons responding to light stimuli. [Figure 3 of Blough (1978); copyright 1978; reprinted by permission.]

usually either the difference, the ratio, or log ratio of the relevant physical measures. This function begins at about 1/2 for two identical signals and grows to 1 when they are sufficiently widely separated—approximately a factor of 4 in intensity. However, given the impact of response criterion just discussed, it is clear that there is no unique psychometric function but rather a family of them. For example, in the fast-guess model (Section 6.3.3) with the interpretation given the parameters above, the impact of signal difference will be on P sr and not directly on ρ or βA, which are thus parameters of the family of psychometric functions for this model (Eq. 6.10).

For very careful and insightful empirical and theoretical analysis of psychometric functions, see Laming (1985). He makes very clear the importance of plotting these functions in terms of different physical measures depending on the exact nature of the task. He also provides a most interesting theoretical analysis concerning the information subjects are using in the basic psychophysical experiments.

In order to get a unique measure it is necessary to use something that captures the entire ROC curve. The most standard measure is called d′, and its calculation is well known (see Green and Swets, 1966, 1974, or Egan, 1975, or any other book on signal detection theory); it is the mean separation of the underlying distributions of internal representations normalized by some average of their standard deviations.
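Under the equal-variance Gaussian model the computation reduces to two inverse-normal transforms of a single (hit, false-alarm) pair; the probabilities below are purely illustrative:

```python
# d' from one (hit, false-alarm) pair, equal-variance Gaussian model:
# d' = z(P(A | a)) - z(P(A | b)), where z is the inverse standard-normal CDF.
from statistics import NormalDist

z = NormalDist().inv_cdf

def d_prime(p_hit, p_fa):
    return z(p_hit) - z(p_fa)

print(round(d_prime(0.84, 0.16), 3))
```

When the underlying variances are unequal, as non-unit-slope z-score ROCs indicate, a single pair of probabilities no longer suffices and some average of the two standard deviations must be chosen, which is why the full ROC is the safer basis for the measure.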

Navon (1975) showed under the same assumptions made by Thomas and Myers (1972) that if the false-alarm rate is held constant as stimulus discriminability is varied, then E(T | a, A) − E(T | b, A) is a monotonic function of d′ ab.

The second, and almost equally obvious, effect is that subjects are slower (p.231) when the signals are close and faster when they are farther apart. Moreover, this phenomenon occurs whether the conditions of signal separation are run blocked or randomized. Among the relevant references are Birren and Botwinick (1955), Botwinick, Brinley, and Robbin (1958), Crossman (1955), Henmon (1906), Festinger (1943a, b), Johnson (1939), Kellogg (1931), Lemmon (1927), Link and Tindall (1971), Morgan and Alluisi (1967), Pickett (1964, 1967, 1968), Pike (1968), Vickers (1970), Vickers, Caudrey, and Willson (1971), Vickers and Packer (1981), and Wilding (1974). The fact that it appears using randomized separations means that the phenomenon is very much stimulus controlled since the subject cannot know in advance whether the next discrimination will be easy or difficult.

As Johnson (1939) made clear and as has been replicated many times since, the stimulus ranges over which these two measures—accuracy and time—vary appear to be rather different. At the point where response probability appears to reach its ceiling of 1, response times continue to get briefer with increasing signal separation. Moreover, the same is true for the subject's reports of confidence in the response, which is one reason that reaction time is often thought to reflect confidence in a judgment (or vice versa, since it is not apparent on what the confidence judgments are based). It is unclear to what degree this is a real difference or a case of probabilities being very close to 1 and estimated to be 1 from a finite sample. In the latter case, the ranges corresponding to changes in d′ and E(T) may actually be comparable.

Some debate has occurred over what aspect of the stimulus separation is controlling, differences or ratios or something else. Furthermore, there is no very good agreement as to the exact nature of the functions involved (see Vickers, 1980, pp. 36–38). If accuracy increases and time decreases with signal separation, then they must covary and one can be plotted as a function of the other. This plot, however, is not what is meant when one speaks of a speed-accuracy tradeoff, which is discussed in Section 6.5.

The only study of the dependence of response time on signal separation I shall present here in any detail is Wilding (1974), because he gives more information about the reaction-time distributions than do the others. The task for each of his seven subjects was to decide on each trial whether a 300-msec spot of light of moderate brightness was to the right or left of the (unmarked) center of the visual field. There were four possible locations on each side forming a horizontal line, numbered from 1 on the left to 8 on the right, spanning a visual angle of about 0.4°. So 1 and 8 were the most discriminable stimuli and 4 and 5 the least. The data were collected in runs of 110 trials, the first 10 of which were discarded. There were four runs in each of two sessions, which differed according to instructions, one emphasizing accuracy and the other speed.

In analyzing the data for certain things, such as the fastest or the slowest response, one must be cautious about sample sizes. For signals 1 and 8 the probability of being correct was virtually 1, whereas it was very much less (p.232)

FIG. 6.10 For signal positions to be discriminated as being to the right or left of an unmarked point, various summary statistics about the response times as a function of signal location and whether the response was correct or in error. (1, 8), (2, 7), (3, 6), (4, 5) are all correct responses from the most distant to the least, and (5, 4) and (6, 3) are incorrect ones for the least distant and the next position. The data on the left of each subpanel were obtained with the subjects instructed to be fast, and on the right, to be accurate. Each subject participated in 400 trials; there were seven subjects. [Figure 1 of Wilding (1974); copyright 1974; reprinted by permission.]

(p.233) than that for 4 and 5, and so the sample sizes of, say, correct responses were not the same. Wilding, therefore, forced comparability by choosing randomly from the larger sets samples of the same size as the smaller ones, and he reported the results both for the original and the truncated samples. Figure 6.10 shows a number of statistics. The measures all come in pairs, with the left one arising from the speed instructions and the right one from the accuracy instructions. We see a number of things. First, all of the times are very slow, indeed, much slower than the choice data discussed in Section 6.2.1. I do not understand why this was so. One possibility is that the stimuli were actually weak, but I cannot be certain from the paper. Second, the effect of the instructions is to produce a mean difference of nearly 500 msec. Third, the times that arise from errors, the pairs denoted (6, 3) and (5, 4), are slower than the times from the corresponding correct responses, (3, 6) and (4, 5). Fourth, easy discriminations are, of course, both faster and less prone to error than are difficult ones. Fifth, the pattern of the standard deviations mimics rather closely that of the means. And sixth, the skewness and kurtosis are both positive and increasing with ease of discrimination. This is in agreement with the data of Vickers, Caudrey, and Willson (1971) discussed in Section 8.5.1.
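The moment statistics reported in Figure 6.10 (mean, standard deviation, skewness, kurtosis) are easily computed. The Python sketch below uses synthetic ex-Gaussian response times, a Gaussian plus an exponential tail, purely as a stand-in for data; it exhibits the positive skewness and excess kurtosis just described:

```python
import numpy as np

def summary(rt):
    """Mean, SD, skewness, and excess kurtosis of a response-time sample."""
    rt = np.asarray(rt, dtype=float)
    m, s = rt.mean(), rt.std()
    z = (rt - m) / s
    return m, s, (z ** 3).mean(), (z ** 4).mean() - 3.0

rng = np.random.default_rng(5)
# Synthetic ex-Gaussian RTs (Gaussian component plus exponential tail),
# a shape commonly used to describe empirical RT distributions
rt = rng.normal(400, 40, 50000) + rng.exponential(100, 50000)
m, s, skew, kurt = summary(rt)
print(m, s, skew, kurt)  # both shape statistics come out positive
```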

Figure 6.11 shows the latency frequency histograms for each subject for the accuracy condition; the rows correspond to (1, 8), (2, 7), …, (6, 3), as in Figure 6.10. There are not enough data here to tell much about the mathematical form involved except that both the mean and variance increase as stimulus discriminability decreases.

6.4.3 Are Errors Faster, the Same as, or Slower than the Corresponding Correct Responses?

This question has loomed important because a number of the models to be discussed in Chapters 7 to 9 make strong (and often nonobvious) predictions about the relation. I know of three appraisals of the situation—Swensson (1972a), Vickers (1980), and Wilding (1971a)—and they are inconsistent. The earlier ones, which Vickers seems to have ignored, are the more accurate.

It is actually clear from the data already presented that there is no simple answer to the question. For the pigeons attempting to make a difficult discrimination, but doing it quite rapidly, the data in Figure 6.7 make clear that errors are faster than correct responses; in fact, the larger the error the faster it is. But for people also engaged in a visual discrimination, we see in Figure 6.10 that errors are slower than the corresponding correct response. The source of the difference probably is not the species of the subject, although no directly comparable data exist.

According to Swensson the important difference, at least for human beings, can be described as follows. Errors are faster than correct responses when two conditions are met: the discrimination is easy and the pressure to (p.234)

FIG. 6.11 For the experiment of Figure 6.10, histograms for response times in the accuracy condition. The rows correspond, from top to bottom, to the abscissa code of Figure 6.10 with (1, 8) at the top; the columns are different subjects. [Figure 2 of Wilding (1974); copyright 1974; reprinted by permission.]

(p.235) be fast is substantial. This is true of the data of Egeth and Smith (1967), Hale (1969a), Laming (1968), Lemmon (1927), Ollman (1966), Rabbitt (1966), Schouten and Bekker (1967), Swensson (1972a), Swensson and Edwards (1971), Weaver (1942), and Yellott (1967). A near exception to this rule is the results of Green et al. (1983). In this study well-practiced observers each making over 20,000 frequency identifications exhibited virtually the same mean times for errors and correct responses. A careful analysis of the distributions, however, did show a difference in the direction of faster errors. I will go into this experiment in detail in Section 8.5. Continuing with Swensson's rule, errors are slower than correct responses when two conditions are met: the discrimination is difficult and the pressure to be accurate is substantial. This is true of Audley and Mercer (1968), Emmerich et al. (1972), Hecker, Stevens, and Williams (1966), Henmon (1911), Kellogg (1931), Pickett (1967), Pierrel and Murray (1963), Pike (1968), Swensson (1972a), Vickers (1970), and Wilding (1971a, 1974).

There are fewer studies in which the discrimination is difficult and the pressure to be fast is great. Henmon (1911) showed that under such conditions the fastest and slowest response times involved higher proportions of errors than did the intermediate times, suggesting that the subjects may have oscillated between two modes of behavior. Rabbitt and Vyas (1970) suggested that errors can arise from a failure of either what they call perceptual analysis or response selection. They state (apparently their belief) that when the failure is in response selection, errors are unusually fast; but when it is in perceptual analysis, error and correct RT distributions are the same. Apparently, perceptual analysis corresponds to what I call the decision process and is the topic of most modeling. In Blough's experiment, the pigeons exhibited fast errors, and judging by the times involved the pigeons were acting as if they were under time pressure. Of course, Wilding found errors to be slower than correct responses under both his speed and accuracy instructions, but one cannot but question the effectiveness of his speed instructions when the fastest times exceeded 450 msec. Link and Tindall (1971) combined four levels of discriminability with three time deadlines—260 msec, 460 msec, and ∞ msec—in a study of same-different discrimination of pairs of line lengths successively presented, separated by 200 msec. Their results are shown in Figure 6.12. Note that under the accuracy condition and for the most difficult discriminations, errors are slower than correct responses; whereas at the 460-msec deadline the pattern is that errors are faster than correct responses and the magnitude of the effect increases with increased discriminability; and at 260 msec, which is only slightly more than simple reaction times, the mean times are constant, a little less than 200 msec, independent of the level of discriminability and of whether the response is correct or in error. 
The latter appear to be dominated by fast guesses—that is, simple reactions—but they cannot be entirely that since accuracy is somewhat above chance. A careful analysis of these data will be presented in Section 7.6.2. Thomas (1973) reported a (p.236)

FIG. 6.12 For same-different judgments of pairs of lines, MRT and proportion of correct judgments as a function of signal discriminability (abscissa) under three deadline conditions. In the MRT plot, the solid circles are obtained by using the fast-guess model's correction for guessing. [Figure 1 of Link and Tindall (1971); copyright 1971; reprinted by permission.]

study in which errors were fast for one foreperiod distribution and one group of subjects and slow for another distribution and group of subjects; I cannot tell from the experimental description if the signals were easy or difficult to discriminate. Heath (1980) ran a study on the discrimination of the order of onset of two lights, where the time between onset varied, and he imposed response deadlines. Unfortunately, the data do not seem to be in a form suited to answering the question of this section.

6.5.1 General Concept of a Speed-Accuracy Tradeoff Function (SATF)

Within any response-time model of the type formulated in Section 6.3.3, as the decision parameters are varied, changes occur in ψ(r, t | σ⃗, δ⃗), which in turn are reflected in f(r, t | E⃗). The general intuition is that these changes are such that the marginal measures of probability of choice, P(r | E⃗), and of response time, f(t | E⃗), covary. The more time taken up in arriving at a decision, the more information available, and so the better its quality. This statement is, perhaps, overly simple since if a great deal of time is allowed to pass between the completion of the signal presentation and the execution of the response, then the accuracy of responding may deteriorate because of some form of memory decay. But for a reasonable range of times the (p.237) statement appears to be correct. What is probably most relevant is the portion of the stimulus presentation that can be processed before an order to respond is issued. Usually we attempt to control that time indirectly through time deadlines on the response time.

As usual, the theory involves a covariation due to changes in parameters, and the data a covariation due to changes induced in E⃗. For example, suppose the experimenter imposes a time deadline on the subject such that any response occurring after signal onset but before the deadline is rewarded for accuracy, but those that are slower than the deadline are fined independent of their accuracy. The effect of changes in the deadline is to vary both the accuracy and the mean response time. Suppose that we suppress all of the notation for E⃗ except for the two things that vary from trial to trial—namely, the signal presented, s, and the deadline, δ, imposed. The averaged data then consist of four pairs of numbers:

(P(r | s, δ), E(T | r, s, δ)), r = A, B; s = a, b,
for each value of δ. If we think of these as functions of δ, then we may solve for the one in terms of the other, eliminating δ, yielding four empirical functions of the form
E(T | r, s) = F_rs[P(r | s)], r = A, B; s = a, b.

These are called empirical speed-accuracy tradeoff functions (SATF) or latency-probability functions (LPF). I use the former term and abbreviation. As four separate functions are a bother, especially since they probably contain much the same information, in practice some simplification is made. Usually a single measure of accuracy—four different ones will be mentioned below—is plotted against the overall MRT, and that is referred to as the SATF. Other terms are found in the literature such as the speed-accuracy operating characteristic or S-A OC (Pew, 1969) and the macro tradeoff (Thomas, 1974). The rationale for the latter term will become apparent later. One must be ever sensitive to the fact that such a collapse of information may discard something of importance.

Observe that if we vary something in the environment, other than the deadline, that affects both speed and accuracy, we may or may not get the same SATF. As an example of a different procedure, Reed (1973) signaled the subjects when to respond, and he varied the time during the presentation of the reaction signal at which the response was to be initiated. Theoretically, what SATF we get depends upon which decision parameters are affected by the experimental manipulation. Since we do not usually have a very firm connection between E⃗ and δ⃗, there can be disagreement about which theoretical tradeoff goes with which empirical one. For the fast-guess model this is not really an issue, since the only decision parameter affecting E(T | s) (Eq. 6.11) is ρ. So if we eliminate it between Eqs. 6.10 and 6.11 we (p.238) obtain the unique theoretical SATF,

(6.13)
E(T | s) = ν_0 + (ν_1 − ν_0)[P(A | s) − β_A]/(P_sA − β_A),
which is the simplest possible relation, a linear one.

Swensson and Thomas (1974) described a broad class of models, called fixed-stopping ones, that yields a relation that has played a role in empirical investigations. Suppose that there is a series of n distinct observations X_i, each taking time T_i, i = 1, …, n, where all of the random variables are independent, the X_i are identically distributed, and the T_i are also identically distributed. The density of X_i, f(x | s), depends on the signal presented whereas that of T_i does not. If R is the residual time, then the response time is

T = T_1 + T_2 + ⋯ + T_n + R.

So

E(T | s) = nE(T_1) + E(R).

Assume the decision variable to be the logarithm of the likelihood of the observations; that is,

Y_n = (1/n) Σ_{i=1}^n log[f(X_i | a)/f(X_i | b)].

For n reasonably large, the Central Limit Theorem (Appendix A.1.1) implies that Y_n is distributed approximately as a Gaussian with mean E(Y_n | s) = μ_s and variance V(Y_n | s) = σ_s²/n, s = a, b. If a response criterion c is selected for responding A when Y_n > c, then we see that

P(A | s) = P(Y_n > c) ≈ 1 − Φ[(c − μ_s)n^{1/2}/σ_s], s = a, b,

where Φ is the unit normal distribution function.

Denoting the z-score of P(A | s)—that is, the upper limit on the unit normal that yields this probability—by z(s), then eliminating c between z(a) and z(b) yields the linear ROC curve

z(a) = (σ_b/σ_a)z(b) + (μ_a − μ_b)n^{1/2}/σ_a.

There are several ways to define a so-called d′ measure of the accuracy of performance described by the ROC curve; perhaps the simplest is that value of z(a) corresponding to z(b) = 0; that is,

d′ = (μ_a − μ_b)n^{1/2}/σ_a.

Now, if the speed-accuracy tradeoff is achieved by varying n, we see that it is given by

(6.14)
(d′)² = [(μ_a − μ_b)²/σ_a²][E(T | s) − E(R)]/E(T_1).
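The implication of Eq. 6.14, that (d′)² grows linearly with mean response time when the tradeoff is achieved by varying n, can be checked by simulation. The following Python sketch is illustrative only; the per-observation means, observation time, and residual time are invented:

```python
import numpy as np
from statistics import NormalDist

z = NormalDist().inv_cdf
rng = np.random.default_rng(1)
mu_a, mu_b, sigma = 0.2, 0.0, 1.0   # per-observation means and SD (illustrative)
t_obs, t_resid = 50.0, 200.0        # msec per observation and residual time (assumed)
trials = 20000

for n in (25, 50, 100):
    crit = n * (mu_a + mu_b) / 2    # symmetric criterion on the summed observations
    hits = (rng.normal(mu_a, sigma, (trials, n)).sum(axis=1) > crit).mean()
    fas = (rng.normal(mu_b, sigma, (trials, n)).sum(axis=1) > crit).mean()
    d = z(hits) - z(fas)            # d' grows as n**0.5 ...
    mrt = n * t_obs + t_resid       # ... while E(T) grows linearly in n
    print(n, mrt, d ** 2)           # so (d')**2 is linear in MRT
```

Doubling n doubles (d′)² while adding a fixed increment to MRT, which is the linearity the fixed-stopping class predicts.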

They also describe another class of models with an optional stopping rule, in which the value of n depends upon the observations actually made; it is more complex and versions of it are described in detail in Sections 8.2 and 8.3.

(p.239) With more complicated models in which E(T | s) and/or P(r | s) depend upon two or more decision parameters, then there can be a great deal of uncertainty as to what theoretical relation to compare with what data. Weatherburn (1978) pointed this out as a serious issue for a number of the models we shall discuss in Chapter 8. Pike and Dalgleish (1982) attempted to counter his admittedly correct observations by showing that for some of the models the locus of possible pairs of speed and accuracy values is sufficiently constrained under all possible (or plausible) values of the parameters that the model can, in principle, be rejected. Weatherburn and Grayson (1982) replied, reemphasizing that the rejoinder rested on interpretations of model parameters, which may not be correct. The fact is that the models they were discussing indeed do have a wide range of possible speed-accuracy pairs consistent with them, not a simple function. Great caution must be exercised when a model has more than one decision parameter that affects both speed and accuracy. This point, which I believe to be of considerable importance, has not been as widely recognized as it should be by those who advocate the use of SATFs. In a sense, whether a tradeoff plot is useful depends, in part, on the complexity of the model one assumes to underlie the behavior. However, ignoring the tradeoff or assuming that certain parameters can be held constant by empirically trying to achieve constancy of accuracy or of time is subject to exactly the same difficulties.

A substantive realization of these observations can be found in Santee and Egeth (1982), in which they argue that in some cognitive tasks—they use letter recognition—experimental procedures that permit the use of accuracy measures draw upon a different aspect of the cognitive processing than do procedures that use response-time measures. In their study they manipulated exposure duration. They argued that for brief exposures the accuracy is affected by limitations on data processing by the subject, whereas with long exposures, which is typical of cognitive experiments, the behavior studied has to do with response limitations. For example, in their Experiment 1, there were two letters on either side of a fixation point, and an arrow under one indicated that the subject was to respond whether that letter was an A or an E. There were three conditions involving the other letter: it could be the same as the indicated one, or the other target letter, or an irrelevant letter (K and L were used). I refer to these as same, different, and irrelevant, respectively. With the tachistoscope timed to produce about 75% overall accuracy in each subject (8 to 20 msec exposure durations), it was found that subjects were most accurate in the different condition and least accurate in the same one. With an exposure of 100 msec and instructions to respond as rapidly as possible while maintaining a high degree of accuracy, they were fastest for same and slowest for different. They believe these findings to be inconsistent, and so conclude that different aspects of the process are being tapped.

(p.240) 6.5.2 Use of the SATF

Consider assessing the impact of some variable, say the amount of alcohol ingested, upon performance. If as the dosage level is increased both accuracy and MRT change in the same direction, then it can be quite unclear whether there has been a change in the quality of performance or merely a change in the speed-accuracy tradeoff. In particular, it may be quite misleading to plot just one of the two measures against dosage level. For a summary of references in which ambiguous results about alcohol have been presented, see Jennings, Wood, and Lawrence (1976).

Some experimenters have attempted to overcome this problem by experimentally controlling one of the two variables. For example, one can use some sort of time payoff scheme, such as band payoffs or a deadline, to maintain MRT within a narrow range as the independent variable—in this case, amount of alcohol ingested—is manipulated, and to evaluate the performance in terms of accuracy. Alternatively, one can attempt to control accuracy. This approach is widely used in the study of short-term memory, as we shall see in Chapter 11. Often an attempt is made to keep the error rate low, in the neighborhood of 2% to 5%. Not only is it difficult to estimate such a rate with any degree of accuracy, but if the SATF is changing rapidly in the region of small errors—which as we shall see it often appears to be—then this is a region in which very large time changes can correspond to very small changes in the error rate, making it very unlikely that the intended control is effective. Furthermore, one can easily envisage a model having more than one sensory state, one of which exhibits changes only in accuracy and another of which involves a speed-accuracy tradeoff. If experimentally we control accuracy through the first stage, then we will have done nothing whatsoever to control the SATF, which is under the jurisdiction of the second stage.

Because of these difficulties in keeping control of one of the variables, some authors (most notably Wickelgren and those associated with him, but also Ollman, Swensson, and Thomas) have taken the position that the only sensible thing to do is to estimate the entire SATF and to report how it as a whole varies with the experimental manipulation. This attitude parallels

TABLE 6.4. Mean slopes and intercepts of the best-fitting linear regression for each alcohol condition (Jennings et al., 1976)

Dose (mg/kg)       0      .33    .66    1.00   1.33
Slope (bits/sec)   6.45   5.71   4.92   4.90   3.38
Intercept (msec)   168    162    161    173    150

(p.241)

FIG. 6.13 Schematic of the general tradeoff relation (SATF) believed to hold between some measure of accuracy and response time. For any time below t_1 accuracy is nil; for times above t_2 accuracy does not change; and in between it is monotonic increasing. [Figure 1 of Wood and Jennings (1976); copyright 1976; reprinted by permission of the publisher.]

closely that expressed by those who say discrimination can only be studied via the ROC curve, and any measure of discriminability should relate to that curve.

As an example, Jennings et al. (1976) studied the SATF for choice reactions involving the identification of 1000 Hz and 1100 Hz tones. To manipulate the tradeoff they used a variety of deadlines, and subjects were paid for accuracy when responses were faster than the deadline and were fined for the slower responses. Measuring accuracy in terms of information transmitted (see Section 6.5.3 for a general discussion of accuracy measures), they fit linear functions to the curves and Table 6.4 shows how the intercept and slope varied with alcohol dose level. The slope is affected systematically, and the intercept somewhat irregularly.

6.5.3 Empirical Representations of SATFs

A certain amount of discussion has been devoted to the best way to present the empirical SATF. The initial studies (Fitts, 1966; Pachella & Pew, 1968) separately plotted percent correct and MRT as functions of the decision strategy variable manipulated by the experimenter. Schouten and Bekker (1967), using a somewhat different measure of performance, which will be discussed in detail in Section 6.5.4, plotted their measure of accuracy against MRT. They, and others who have plotted a probability measure of accuracy against MRT, have found the general pattern shown in Figure 6.13, which is composed of three separate pieces that can, to a rough first approximation, be thought of as linear. Up to a certain time, the accuracy level remains at chance, after which it grows linearly until it reaches its ceiling of (p.242) 1, after which it stays at perfect accuracy with increases in time. The important facts are that for sufficiently short times, accuracy is nil; beyond another time, changes in MRT, which do occur, do not seem to affect the accuracy, which is virtually perfect; and between the two times there is a monotonic increase in accuracy. As was noted earlier, it is unclear whether accuracy really does become perfect or whether we are dealing with an asymptotic phenomenon. The models usually imply the latter.

Taylor, Lindsay, and Forbes (1967) replotted the Schouten and Bekker data, replacing the probability measure of accuracy by the d′ measure of signal detectability theory, and they showed that (d′)² was approximately linear with MRT. Pew (1969), noting that log odds = log[P_c/(1 − P_c)], where P_c is the probability of being correct, is approximately linear with (d′)² in the 2-alternative case, replotted the data existing at the time in log odds, which is shown in Figure 6.14. Another set of data, plotted in the same way, will be presented in Figure 9.6 (Section 9.3.4) when we discuss the timing and counting models. Lappin and Disch (1972a) raised the question as to which of several measures of accuracy gave the most linear plot against MRT. They compared d′; (d′)² (see Eq. 6.14); information transmitted, that is,

T(S; R) = Σ_{s,r} P(s, r) log[P(s, r)/P(s)P(r)]
(see Section 10.2 for a rationale); and
(1/2) log[P(A | a)P(B | b)/P(A | b)P(B | a)],
which becomes log odds in a symmetric situation and was suggested by Luce (1963). For each they established the best linear regression and evaluated the fit by the percent of variance accounted for; it had the small range of .86 to .91, with d′ having a very slight edge over the others. Swensson (1972a) made a similar comparison with similar results. Salthouse (1981) reported for each of four subjects the correlations of MRT with P_c, d′, (d′)², log odds, and information transmitted. Again, the range of values was not large—.706 to .924. Ranking the five measures for each subject and adding the ranks shows information transmitted to be best, (d′)² the worst, and the other three about midway with little difference among them. So among these measures of accuracy none provides a clearly better correlation with MRT, and, as we shall see, different theories suggest different choices.
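The competing accuracy measures can be computed side by side from a single confusion matrix. In the Python sketch below the counts and the equal presentation probabilities are hypothetical:

```python
from math import log, log2
from statistics import NormalDist

z = NormalDist().inv_cdf

# Hypothetical confusion counts: signal a or b, response A or B
counts = {("a", "A"): 80, ("a", "B"): 20, ("b", "A"): 20, ("b", "B"): 80}
n = sum(counts.values())
p_hit = counts[("a", "A")] / (counts[("a", "A")] + counts[("a", "B")])
p_fa = counts[("b", "A")] / (counts[("b", "A")] + counts[("b", "B")])

p_c = (counts[("a", "A")] + counts[("b", "B")]) / n   # percent correct
d_prime = z(p_hit) - z(p_fa)                          # and its square, d_prime**2
log_odds = log(p_c / (1 - p_c))

def entropy(ps):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in ps if p > 0)

# Information transmitted, assuming equal presentation probabilities
p_A = (counts[("a", "A")] + counts[("b", "A")]) / n
info = entropy([p_A, 1 - p_A]) - 0.5 * (
    entropy([p_hit, 1 - p_hit]) + entropy([p_fa, 1 - p_fa]))
print(p_c, d_prime, log_odds, info)
```

All four numbers summarize the same 2×2 table, which is why their correlations with MRT tend to be so similar.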

Why do we concern ourselves with the question of which accuracy measures lead to linear SATFs? One reason is ease of comparison, but that is hardly overriding since as we shall see shortly in certain memory studies comparisons are readily made among exponential fits. A more significant reason is formulated as follows by Thomas (1974, p. 449): “Capacity [of an information processing system] is usually defined as the maximum rate at which information is processed, and it is measured by finding that monotonic (p.243)

FIG. 6.14 SATF, with accuracy measured as log odds, from four studies: a is Schouten and Bekker (1967) and Pachella and Pew (1968), b and c are data from P. Fitts, and d is data from R. Swensson. [Figure 1 of Pew (1969); copyright 1969; reprinted by permission.]

function of accuracy which is linear with reaction time. The slope of this line is taken to be the measure of capacity.…” To a considerable degree, this appears to be definitional rather than descriptive. It is, however, disconcerting that the linear function usually does not pass through the origin. For further discussion of the concept and modeling of capacity, see Townsend and Ashby (1978, 1984).

Kantowitz (1978), in response to a general survey of SATFs and their role in cognitive research by Wickelgren (1977), was highly critical of our current uncertainty about which accuracy measure to use. He cited Townsend and (p.244) Ashby (1978, p. 122) as suggesting some reasons why the Lappin and Disch (1972a) and Swensson (1972a) studies were inconclusive, including the possibility of too variable data, the fact that some of the measures are nearly identical (although not d′ and its square), and that the range of RTs from 100 to 300 msec was simply too small to see much in the way of nonlinearities. Wickelgren's (1978) response, while sharp about other matters, does not really disagree with this point.

Those working on memory rather than sensory discrimination have not found d′ to be very linear with MRT, as a summary by Dosher (1979) of some of these studies makes clear. Corbett (1977), Corbett and Wickelgren (1978), Dosher (1976), and Wickelgren and Corbett (1977) all fit their SATFs by the cumulative exponential

d′(t) = λ{1 − exp[−β(t − δ)]}, t ≥ δ.

Reed (1973) used the somewhat more complex

$Display mathematics$
in order to account for a drop in d′ with sufficiently large times. Ratcliff (1978) proposed
$Display mathematics$
which Dosher (1979) noted is a special case of Reed's formula. Both the exponential and Reed's formula are ad hoc; Ratcliff's follows from a continuous random walk model to be discussed in Section 11.4.4.
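As a sketch of how such fits proceed, the following Python fragment generates synthetic d′(t) values from a cumulative exponential of the form λ{1 − exp[−β(t − δ)]} with invented parameters and recovers them by a crude grid search; a proper least-squares routine would replace the grid in practice:

```python
import numpy as np

def satf(t, lam, beta, delta):
    """Cumulative-exponential SATF: zero up to the intercept delta, then
    rising at rate beta toward the asymptote lam."""
    return np.where(t > delta, lam * (1.0 - np.exp(-beta * (t - delta))), 0.0)

rng = np.random.default_rng(2)
true = (2.5, 0.01, 300.0)            # lam, beta (1/msec), delta (msec): invented
t = np.arange(350, 1000, 50.0)
d_obs = satf(t, *true) + rng.normal(0, 0.05, t.size)  # noisy synthetic d' values

best, best_sse = None, np.inf
for lam in np.arange(2.0, 3.01, 0.05):
    for beta in np.arange(0.005, 0.0201, 0.001):
        for delta in np.arange(250.0, 351.0, 10.0):
            sse = ((d_obs - satf(t, lam, beta, delta)) ** 2).sum()
            if sse < best_sse:
                best, best_sse = (lam, beta, delta), sse
print(best)  # close to the generating parameters
```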

McClelland (1979) used his cascade model (Section 4.3.2) to arrive at another possible formula for the SATF. He assumed that for each s, r pair the level of activation is the deterministic one embodied in Eq. 4.12 perturbed by two sources of noise. One of these is purely additive and he attributed it to noise associated with activation of the response units. He took it to be Gaussian with mean 0 and variance 1, thereby establishing a unit of measurement. Let it be denoted X. The other source is assumed to be additive on the scale factor A_s that multiplies the generalized gamma Γ_n(t). This random variable Y is also assumed to be Gaussian with mean 0 and variance σ_s². Moreover, the residual time is R. Putting this together, the decision variable is

$Display mathematics$

Assuming the random variables are independent, then its expected value and variance are

$Display mathematics$

(p.245) Using the usual signal detection approach to this Gaussian decision variable, d′ between signals a and b is easily seen to be

$Display mathematics$
where σ² = (σ_a² + σ_b²)/2. He showed numerically that for appropriate choices of the parameters, this equation is virtually indistinguishable from Wickelgren's cumulative exponential.

6.5.4 Conditional Accuracy Function (CAF)

For the joint density f(r, t | E⃗), the plot of

(6.15)
P(r | t, E⃗) = f(r, t | E⃗)/[f(A, t | E⃗) + f(B, t | E⃗)]
versus t is a tradeoff function of some interest. Thomas (1974) called it the micro tradeoff, in contrast to the macro tradeoff of the SATF; Lappin and Disch (1972a, b) used the term latency operating characteristic (which of course has also been suggested for other things); Rabbitt and Vyas (1970), the T-function; and Ollman (1977), the conditional accuracy function (CAF), which Lappin (1978) has adopted as better than LOC. As I think CAF is the most descriptive, I too shall use it.

A major difference between the CAF and SATF is that the former can be computed in any experimental condition for which sufficient data are collected to estimate the density functions, whereas the latter is developed only by varying the experimental conditions. For example, by using several response-time deadlines one can generate the SATF, and one can compute a CAF for each deadline separately.
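Computationally, a CAF is just accuracy conditioned on response-time bins. The Python sketch below estimates one from a synthetic fast-guess-like mixture in which all parameters are invented; guesses are fast and at chance, stimulus-controlled responses slower and accurate, so the estimated CAF rises with time:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000
is_guess = rng.random(n) < 0.3                      # mixture probability (assumed)
rt = np.where(is_guess, rng.normal(200, 20, n),     # fast guesses (msec)
              rng.normal(450, 80, n))               # stimulus-controlled responses
correct = np.where(is_guess, rng.random(n) < 0.5,   # guesses at chance
                   rng.random(n) < 0.9)             # controlled responses accurate

# Conditional accuracy function: P(correct | T in bin)
bins = np.arange(150, 701, 50)
idx = np.digitize(rt, bins)
caf = [correct[idx == i].mean() for i in range(1, len(bins)) if (idx == i).any()]
print(caf)  # near chance in the earliest bins, high in the latest
```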

To get some idea of just how distinct the CAF is from the SATF, we compute it for the fast-guess model. By Eqs. 6.8 and 6.9,

P(r | t, s) = [ρP_sr g_1(t) + (1 − ρ)β_r g_0(t)]/[ρg_1(t) + (1 − ρ)g_0(t)].

Obviously, the form of P(r | t, s) as a function of t depends entirely upon the forms of g_0 and g_1, whereas the SATF of Eq. 6.13 is linear in MRT independent of their forms.

Actually, we can establish the general relationship between the CAF and SATF as follows (Thomas, 1974). Let r′ denote the response other than r; then by Bayes' theorem (Section 1.3.2),

(6.16)
P(r | t, s) = f(t | r, s)P(r | s)/[f(t | r, s)P(r | s) + f(t | r′, s)P(r′ | s)].

(p.246)

FIG. 6.15 A possible relation between SATF and conditional accuracy function (CAF).

Substituting the SATF for P(r | s) establishes the general connection between them. Observe that

(6.17)
$Display mathematics$

Thus, the general character of the pattern relating the CAF and SATF as one varies MRT must be something of the sort shown in Figure 6.15.

One nice result involving the CAF has been established by Ollman (unpublished, 1974).

(p.247) Theorem 6.2. Suppose that in a two-choice situation r′ denotes the response other than r. Let T denote the response time random variable and suppress the notation for the environment except for the signal presented. If P(r | s) > 0 and P(r′ | s) > 0, then

(6.18)
$Display mathematics$

Proof. We use

$Display mathematics$
and Eq. 6.17 in the following calculation:
$Display mathematics$

So, whether errors to a given signal are faster or slower than the corresponding correct responses depends on whether the correlation embodied in the CAF is positive or negative. Note that this is not the correct-versus-error comparison made in Section 6.3.3, since there it was the response, not the stimulus, that was held constant in the comparison.
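A small discrete example makes this easy to check; the joint probabilities below are illustrative assumptions, chosen so that correct responses tend to be slower than errors. The computation also verifies the algebraic identity cov(1 r, T) = P(r | s)P(r′ | s)[E(T | r, s) − E(T | r′, s)], which forces the sign of the covariance and the direction of the correct/error speed difference to agree.

```python
# Discrete sketch: joint probabilities P(r, t | s) on a small grid of times.
times     = [0.2, 0.3, 0.4, 0.5]
p_correct = [0.02, 0.05, 0.10, 0.15]   # P(r, t | s): correct response r at time t (assumed)
p_error   = [0.30, 0.20, 0.12, 0.06]   # P(r', t | s): error response r' at time t (assumed)
assert abs(sum(p_correct) + sum(p_error) - 1.0) < 1e-12  # a proper joint distribution

def cond_mean(p):
    """Conditional mean time given the response whose joint probabilities are p."""
    return sum(t * q for t, q in zip(times, p)) / sum(p)

m_corr, m_err = cond_mean(p_correct), cond_mean(p_error)  # E(T | r, s), E(T | r', s)

# cov(1{r}, T): covariance of the correct-response indicator with T
p_r  = sum(p_correct)
e_t  = sum(t * (a + b) for t, a, b in zip(times, p_correct, p_error))
e_1t = sum(t * a for t, a in zip(times, p_correct))
cov  = e_1t - p_r * e_t

print(f"E(T|r) = {m_corr:.3f}, E(T|r') = {m_err:.3f}, cov = {cov:.4f}")
```

Here the covariance is positive and, as the theorem requires, correct responses are on average slower than errors.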

For the fast-guess model,

(6.19)
$Display mathematics$

Since ρ(1 − ρ) > 0 and ν 1 > ν 0, the covariance is positive or negative according as P sr − β r is positive or negative.

6.5.5 Use of the CAF

Lappin and Disch (1972a) and Harm and Lappin (1973) explored the question: does a subject's knowledge of the presentation probability affect (p.248)

FIG. 6.16 CAF for subjects identifying random dot patterns with two different presentation schedules, where response time is manipulated by 20-msec reward bands. The accuracy measure is $\tfrac{1}{2}\ln P(B \mid a)P(A \mid b)/P(A \mid a)P(B \mid b)$. [Figure 1 of Lappin (1978); copyright 1978; reprinted by permission.]

the subject's ability to discriminate signals? Using the CAF, with accuracy measured by
$Display mathematics$
they found the CAF to be essentially unaffected. This was done with perfectly discriminable signals, so Lappin (1978) repeated the study using difficult-to-discriminate signals—namely, pairs of random patterns of eight dots located in an invisible 8×8 matrix. Subjects were required to confine their responses to a 100-msec wide band that was located so as to get about 75% correct responses. There were two conditions: 50:50 and 75:25 presentation schedules. The resulting CAF is shown in Figure 6.16, and we see that the CAF has a slight tendency to be flatter in the biased condition, but without additional data it is probably unwise to assume any real effect. By contrast, the bias measure of choice theory,
$Display mathematics$
is much affected, as seen in Figure 6.17.

6.5.6 Can CAFs be Pieced Together to get the SATF?

Apparently the first appearance of CAFs in the reaction-time literature was in the Schouten and Bekker (1967) study in which they manipulated (p.249) reaction time rather directly. The reaction signals were lights, one above the other, which the subject identified by key presses. In addition, three 20-msec acoustic “pips” spaced at 75 msec were presented, and subjects were instructed to respond in coincidence with the third pip. The time between that pip and the signal onset was manipulated experimentally. The data were reported in terms of CAFs, although that term was not used. In principle, a CAF can be estimated in its entirety for each speed manipulation, but because of the fall-off of the RT density on either side of the mean the sample sizes become quite small for times more than a standard deviation away from the mean. For this reason, it is tempting to try to piece them together to get one overall function. Schouten and Bekker's data, the means of which were replotted by Pew in Figure 6.14, are shown in Figure 6.18. Their conclusion was that these CAFs lie on top of one another and so, judging by Figure 6.15, they may actually reconstruct the SATF.

Wood and Jennings (1976) discussed whether it is reasonable to expect this piecing together to work—of course, we already know from the example of the fast-guess model that it cannot always work. They presented data from the end of the training period of their alcohol study. The CAFs calculated for each deadline are shown in Figure 6.19 and as is reasonably apparent—they confirmed it by a non-parametric analysis of variance—these estimated CAFs are not samples from a single function.

At a theoretical level, Ollman (1977) raised the question of the conditions under which the CAF and SATF would coincide. He found a set of sufficient conditions that he dubbed the Adjustable Timing Model (ATM). Because the framework given in Eq. 6.4 is somewhat more general than that postulated by Ollman, we must add a condition not mentioned explicitly in his ATM. This is the postulate that the sensory parameters are fixed, not random variables, which we denote $σ → = σ → ( E → )$. Thus, $Φ( ∈ → | E → ) = Φ( δ → | E → )$, where $δ →$ is the decision parameter vector. Introducing this assumption into

FIG. 6.17 For the experiment of Figure 6.16, response bias as measured by $\tfrac{1}{2}\ln P(A \mid a)P(A \mid b)/P(B \mid a)P(B \mid b)$ versus band location. Note the substantial shifts due to changes in presentation probability. [Figure 2 of Lappin (1978); copyright 1978; reprinted by permission.]

(p.250)

FIG. 6.18 SATF pieced together from a number of CAFs, where the stimuli were lights and the subjects were induced to use different response times by a series of three auditory pips (duration 20 msec each, separated by 75 msec) and the reaction to the signal light was to coincide with the third pip. These were set at the values τ = 100, 200, 300, 400, 600, and 800 msec. The data are averaged over 20 subjects and have 4000 responses per CAF. [Figure 4 of Schouten and Bekker (1967); copyright 1967; reprinted by permission.]

FIG. 6.19 CAFs for the identification of 1000- and 1100-Hz tones with response deadlines and payoffs for responses within the deadline. They do not appear to be samples from a single function. [Figure 2 of Wood and Jennings (1976); copyright 1976; reprinted by permission of the publisher.]

(p.251) Eqs. 6.4 and 6.15,
(6.20)
$Display mathematics$
where we have used the fact $∑ r P ( r | t , σ → , δ → ) = 1$. Ollman's major assumption is that the impact of the decision parameter vector $δ →$ on r is completely indirect, via its impact on t; that is,
(6.21)
$Display mathematics$

From Eqs. 6.20 and 6.21, which embody the assumptions of the ATM, it is easy to see that

(6.22)
$Display mathematics$

The significance of this is that any manipulation of $E →$ that affects $δ →$ (and so t) but not $σ →$ will generate the same CAF, namely, $P ( r | t , σ → )$.

It does not follow that this function is the same as the SATF; in general, they will differ. However, as Ollman has pointed out, if the ATM holds and if the CAF is linear—that is,

$Display mathematics$
where k and c are independent of t but may very well depend on r and on σ—then the SATF is
$Display mathematics$

To show this, consider

$Display mathematics$
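Ollman's linearity observation is at bottom just linearity of expectation: if P(r | t) = kt + c, then averaging over any latency density whatsoever gives P(r) = k MRT + c, so every manipulation that moves only the timing slides the operating point along the same line as the CAF. The sketch below checks this with an assumed k and c and two quite different latency distributions (two settings of the decision parameters, in ATM terms).

```python
import random

random.seed(1)

# Assumed linear CAF for the sketch: P(correct | t) = k*t + c, kept within [0, 1]
k, c = 0.8, 0.2

def satf_point(times):
    """One SATF point: (mean RT, mean accuracy) for one latency distribution."""
    mrt = sum(times) / len(times)
    acc = sum(k * t + c for t in times) / len(times)
    return mrt, acc

# Two different latency distributions; under ATM only t, not r, is affected.
fast = [random.uniform(0.20, 0.40) for _ in range(50000)]
slow = [random.gauss(0.60, 0.05) for _ in range(50000)]

for mrt, acc in (satf_point(fast), satf_point(slow)):
    print(f"MRT = {mrt:.3f}  accuracy = {acc:.3f}  k*MRT + c = {k * mrt + c:.3f}")
```

Both conditions land exactly on the line k MRT + c, which is the sense in which a linear CAF plus the ATM makes the SATF the same linear function.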

To my knowledge, no good a priori reasons exist to expect ATM to hold. Indeed, the whole philosophy of the theory of signal detectability is exactly the opposite—namely, that decision parameters do in fact directly affect the (p.252)

FIG. 6.20 SATF (solid circles) and CAF (open circles) calculated from the same set of data for two subjects. Note that both are approximately linear, but they are distinct functions. [Figure 3 of Wood and Jennings (1976); copyright 1976; reprinted by permission of the publisher.]

choice probabilities and most of the models we shall examine in the next two chapters fail to meet the conditions of ATM. Moreover, Wood and Jennings (1976) have presented data where both the CAF and SATF appear to be linear, but quite different, as can be seen in Figure 6.20.

I believe the SATF may prove to be of comparable importance to the ROC curves of error tradeoffs in giving us a way to describe the compromises subjects make between accuracy and time demands. In general, enough data should be obtained so that some summary of the SATF can be reported. As yet, there is no consensus about the best way to summarize the SATF. Presumably, as we better understand the empirical SATFs, we will arrive at a simple measure to summarize them, something comparable to d′ for ROC curves. Some authors have attempted to find an accuracy measure that leads to a linear relation, in which case two parameters summarize it. However, apparent linearity seems to be relatively insensitive to what appear, on other grounds, to be appreciable nonlinear changes in the accuracy measures. Other authors, working with search paradigms (see Chapter 11) have attempted to fit the resulting function of d′ versus MRT by one or another of several families of curves, and they then study how the several parameters of the fitted family vary with experimental manipulations; however, there are no compelling theoretical reasons underlying the choice of some of these families and no real consensus exists as to which is best to use.

Wickelgren (1977) has presented a very spirited argument for preferring SATF analysis to either pure reaction time or pure error analysis. As was (p.253) noted, Kantowitz (1978) took exception, arguing among other grounds that we do not really know which accuracy measure to use. Weatherburn (1978) observed that if two or more parameters are involved, then the SATF is simply not a well-defined function, but rather a region of possible speed-accuracy pairs that can sometimes be thought of as a family of functions. Schmitt and Scheirer (1977) took exception to the attempts to use SATFs on the grounds that one does not know which family of functions to use, and they claimed that the analysis of data had, in several cases, been incomplete in terms of the family chosen. Dosher (1979), one of those attacked, provided a vigorous defense, pointing out gross errors in the critique and giving a careful appraisal of the situation.

Without denying our uncertainty about how to summarize the data and the real possibility that, because of multiple parameters, there may be no single function relating speed and accuracy, there can be little doubt that presenting SATFs in some form is more informative than data reported just in terms of MRTs with error rates either listed, usually in an effort to persuade the reader that they have not varied appreciably, or merely described as less than some small amount. As was mentioned previously, the problem is that in many of the plots, such as the exponential relation between d′ and MRT, small changes in the accuracy measure can translate into far larger changes in the MRT when the error rate is small than when it is large. This parallels closely the problem of ill-defined psychometric functions arising because at small values of P(A | b) a change of one percentage point corresponds to a very large change in P(A | a); that is, the ROC is steep for small P(A | b).

The CAF, which has been invoked by some as almost interchangeable with the SATF, is in fact completely distinct from the SATF, and there really is no justifiable reason to treat them as the same. Despite the fact that the CAF is defined for each experimental condition, on the whole I think it is the less useful of the two measures. The CAF does not clearly address the tradeoff of interest, it is certainly far more difficult to pin down empirically over a wide range of times, and in most theoretical models it is analytically less tractable than the SATF.

6.6 SEQUENTIAL EFFECTS*

Up to now I have treated the data as if successive trials are independent. On that assumption, it is reasonable to suppose that $f ( r , t | E → )$ on a trial depends (p.254) on the signal presented on that trial and upon the entire experimental context, but not on the preceding history of stimuli and responses. If that were so, then this density function could be estimated by collecting together all trials on which a particular stimulus was presented and forming the (r, T) histogram. But if the trials are not independent, we are in some danger when we make such an estimate. At the very least, the usual binomial computation for evaluating the magnitude of the variance of an estimate based upon a known sample size is surely incorrect (Norman, 1971). Depending upon the signs of the correlations, the dependence can make the estimate either too large or too small. Beyond that, if the trials are not independent, then we face the major theoretical problem of trying to account systematically for the dependencies.

The evidence for the existence of such dependencies or sequential effects is very simple: we determine whether the estimate of some statistic, usually either the response probabilities P(r | s) or the corresponding expected reaction time E(T | s, r), differs appreciably depending upon how much of the history is taken into account. As stated by Kornblum (1973b, p. 260), “The term sequential effect may be defined as follows: If a subset of trials can be selected from a series of consecutive trials on the basis of a particular relationship that each of these selected trials bear to their predecessor(s) in the series, and the data for that subset differs significantly from the rest of the trials, then the data may be said to exhibit sequential effects.” Let it be very clear that the principle of selection depends on events prior to the trial in question and does not in any way depend upon the data from that trial.

6.6.1 Stimulus Controlled Effects on the Mean

The most thoroughly studied sequential effects are those arising when the signal on the current trial is the same as or different from that on the preceding trial. These trials are referred to, respectively, as repetitions and alternations (in the case of two stimuli) or as non-repetitions (in the case of more than two stimuli).

The discussion of sequential effects begins here for the two-stimulus, two-response situation, and I draw heavily upon the survey articles of Kirby (1980) and Kornblum (1973b). One notable omission from the Kirby review is the extensive set of experiments and their detailed analysis in Laming (1968); among other things, his sample sizes (averaged over the subjects) are appreciably larger than any of the other studies except Green et al. (1983), who report very large samples on a few subjects. Although we can clearly demonstrate the existence of such effects in the two-choice experiment and discover a number of their properties, many of the hypotheses that arise can only be tested in k-choice designs with k > 3. So our discussion of sequential effects will resume in Section 10.3 when we turn to these more complex reaction-time experiments.

An illustration of the limitations of the k = 2 case may be useful. Suppose (p.255) we have reason to believe that the magnitude of the sequential effects depends both upon the relative frequency with which the signals are presented, Pr(s n = a) = p, where s n is the signal presented on trial n, and upon the tendency for the signals to be repeated in the presentation schedule (which is a sequential structure imposed by the experimenter), which we make independent of the particular signal—that is, P(s n = a | s n−1 = a) = P(s n = b | s n−1 = b) = P. Since P(s n = a) is independent of n, it must satisfy the constraint

$p = pP + (1 − p)(1 − P)$
and solving, either P = 1 or p = ½. Thus, if we wish to vary p and P independently, we must either use more than two signals or abandon the condition that the probability of repetition is the same for both signals.
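The constraint factors as (2p − 1)(P − 1) = 0, which is where the two solutions come from; a grid scan confirms that only p = ½ or P = 1 satisfies it.

```python
# Stationarity constraint for a two-signal schedule with a common
# repetition probability P: P(s_n = a) = p must satisfy
#   p = p*P + (1 - p)*(1 - P),  i.e.  (2p - 1)(P - 1) = 0.
def stationary(p, P, tol=1e-12):
    return abs(p * P + (1 - p) * (1 - P) - p) < tol

# Scan a 21 x 21 grid of (p, P) values in steps of 0.05.
grid = [i / 20 for i in range(21)]
solutions = {(p, P) for p in grid for P in grid if stationary(p, P)}

# The solutions are exactly the p = 1/2 row and the P = 1 column.
assert solutions == {(p, P) for p in grid for P in grid if p == 0.5 or P == 1.0}
print(f"{len(solutions)} grid points satisfy the constraint")
```

On this grid that is 21 + 21 − 1 = 41 points, the row p = ½ and the column P = 1 with their intersection counted once.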

Another problem in studying sequential effects is reduced sample sizes. Suppose we consider the purely random schedule, and we wish to partition the history of stimulus presentations back m trials. If the overall sample is of size N, then each history has an expected sample size of $N/2^m$ and so the standard error of the resulting mean estimate is $2^{m/2}\sigma/N^{1/2}$. So, for example, if the true MRT is 400 msec with a standard deviation of 75 msec and the basic sample N is 2000, the standard error of the overall MRT for one signal is 2.37 msec, that of a two-step history is 3.35 msec, and that of a four-step one is 6.71 msec. Many of the differences in the data are under 10 msec, and so they must be viewed with some skepticism.
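The quoted figures follow directly from this standard-error formula. In the check below I read the "overall MRT for one signal" as a history of length m = 1, the two-step case as m = 2, and the four-step case as m = 4; with σ = 75 msec and N = 2000 that reproduces all three numbers.

```python
import math

sigma, N = 75.0, 2000  # msec; overall sample size, from the text's example

def se(m):
    """Standard error of the MRT for one stimulus history of length m.

    The expected sample size for any particular history of length m is
    N / 2**m, so the standard error is 2**(m/2) * sigma / sqrt(N)."""
    return 2 ** (m / 2) * sigma / math.sqrt(N)

for m, label in [(1, "one signal, no further history"),
                 (2, "two-step history"),
                 (4, "four-step history")]:
    print(f"{label}: SE = {se(m):.2f} msec")
```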

The first two studies in which the data were partitioned according to their history of stimuli were Laming (1968, Ch. 8) and Remington (1969). The latter experiment involved five subjects who responded by key presses to one of two lights. A warning light and a 1-sec foreperiod were used prior to the signal presentation. Subjects were asked to respond as rapidly as possible, consistent with an error rate of less than 5%. The actual overall level was about 1%. The average interstimulus interval was about 4 sec. (I begin to report this time because, as will soon be evident, it is an important independent variable for sequential effects.) There were two experimental conditions: one with equally likely presentations (50:50) and the other with a ratio of 70:30. Some data were rejected as not having achieved stability. What remained were 800 observations per subject in the 50:50 condition and 1000 in the 70:30. These were partitioned into histories up to five trials back, and they are shown in Figures 6.21 and 6.22. Observe in Figure 6.21 the pronounced repetition effect. Note that in this figure the symbol A denotes the stimulus on the trial under consideration and B the other stimulus, and the past history is read from the current trial on the right and back to the left. Thus, a string of the other stimulus B before and the current A presentation makes for a slow response, the time increasing as the

FIG. 6.21 MRT versus previous stimulus history for two equally likely lights. The sample size was 800 for the signal not partitioned into any previous history and it decreases to 800/2n for a history of length n. The symbol A designates the signal being responded to and B the other signal. The stimulus history is read from right to left. [Figure 2 of Remington (1969); copyright 1969; reprinted by permission.]

number of Bs increases; a string of As before an A makes for a rapid response. The pattern for the 70:30 data is similar with, of course, the less frequent signal distinctly slower than the more frequent one.

An experiment by Falmagne, Cohen, and Dwivedi (1975) in essence replicated these results of Remington, but in some ways is more striking. The span of times from the slowest to the fastest as a function of presentation pattern is some two to five times as large as in Remington's data. This may be because a fairly brief response-stimulus interval (200 msec) was used by Falmagne et al. as compared with Remington's average of four seconds. Since the data are not qualitatively different and they will be described in (p.257) some detail relative to a sequential model (Section 7.6.4), I do not present them here.

As was mentioned earlier, Laming's experiments involved the identification of two white bars on a black background presented tachistoscopically. His Experiment 3, which was run without automation, had a 2500-msec RSI. Twenty-four subjects participated in five series of 200 trials

FIG. 6.22 MRT versus previous stimulus history with 70:30 presentation probability. Here 1 denotes the more probable signal and 2 the less probable one. [Figure 3 of Remington (1969); copyright 1969; reproduced by permission of the publisher.]

(p.258)

FIG. 6.23 MRT versus previous stimulus history when the signals were equally likely line lengths with the intertrial interval as a parameter. The code used for the history is explained in the text. The sample size was 5000/2n per stimulus per history of length n (see discussion in text). [Figure 8.5 of Laming (1968); copyright 1968; reproduced by permission.]

each, for a total of 24,000 observations. The series differed in both instructions and a point scheme aimed at influencing the speed-accuracy tradeoff. They were such that the error rate was intentionally considerably larger than Remington's. For example, when an alternation followed a string of six or seven identical presentations, it was approximately 20%. As there was little difference among the series, they were pooled for the purposes of sequential analyses.

Laming also found that repetitions decreased both the MRT and the errors made. In contrast to Remington, the MRT to an alternation following a string was largely independent of the length of the string, but the error rate increased monotonically with the length.

His Experiment 5 was automated, which permitted use of the intertrial interval as an experimental variable. The same general experiment was run with intertrial intervals of 1, 8, 64, 512, and 4096 msec, arranged in a Latin (p.259) square design. Each of 25 subjects was run for 100 trials at each ITI, yielding 12,500 observations. The MRT data for selected histories are shown in Figure 6.23 and the corresponding error data in Figure 6.24. The code is read from left to right in time up to the trial before the signal in question. A 0 means that the signal in that position (trial) was the same as the one for which the data are being plotted, and a 1 means that it was the other signal. Thus, for example, 0100 arises either from the sequences on trials n − 4, n − 3, n − 2, n − 1, and n of either abaaa or babbb. It is quite clear that the shorter the ITI, the slower the response and the more likely an error, especially after particular histories. Laming (1968, p. 109) summarized the results as follows:

The sequential analyses of Experiments 1, 2, and 3 suggested that if the subject experiences a run of one signal or an alternating sequence, he expects those patterns to continue. On this basis the subjective expectation of … the signal actually presented … increases from left to right [in the left panels of Figures 6.23 and 6.24] and from right to left [in the right panels]. When the intertrial interval is long the mean reaction times and proportions of errors behave as one would expect; they both decrease as the subjective expectation of [the signal] increases. But when the intertrial interval is short they behave differently: after a run of either kind of signal the mean reaction times and proportion of errors all decrease, while after an alternating sequence they increase greatly, irrespective of which signal might have been most expected.

Later Kirby (1976b), apparently unaware of Laming's work, ran a study in which he manipulated the interval between a response and the next presentation, the RSI. His data presentation was the same as Remington's except that he separated it into first and second halves. These are shown in Figure

FIG. 6.24 The response proportions corresponding to Figure 6.23. [Figure 8.6 of Laming (1968); copyright 1968; reprinted by permission.]

(p.260)

FIG. 6.25 Effect of response-stimulus interval (RSI) and experience on MRT versus previous stimulus history. The stimulus history code is as in Figures 6.21 and 6.22. The sample is 100 trials per run and three runs per panel. [Figure 4.1 of Kirby (1980); copyright 1980; reprinted by permission.]

(p.261) 6.25. The most striking fact can be seen in the first-order sequential effects—namely, that a repetition of a signal, AA in his notation, speeds up the MRT for the 50-msec RSI, but slows it down for the 500- and 2000-msec RSIs. It should be noted, however, that at all RSIs additional repetitions speed the MRT slightly beyond the one-step repetition; nonetheless, for any length of history at the two longer RSIs, the fastest time arises with pure alternation.

Certain inconsistencies exist between Laming's and Kirby's data, and one wonders what might account for them. Two notable differences in the studies are the sample sizes (12,500 versus 3,600) and the fact that Laming included all responses whereas Kirby discarded error responses (less than 4.5%). The first means that for the smaller sample size, it is entirely possible that more of the orderings are inverted due to sampling variability, and so not all differences can be assumed to be real. The second raises the possibility that the experiments were run at different speed-accuracy tradeoffs and that this affects the sequential pattern in some way. Another fact, much emphasized by Green et al. (1983) as probably contributing to instability in all of these experiments, is the use of many, relatively inexperienced subjects for relatively few trials each. In their work, which I discuss more fully in Section 8.4, only three subjects were used, but each was practiced for at least 3500 trials and was run for 21,600 trials. There was evidence of changes in MRT during at least the first 7200 trials of the experiment. Another factor they emphasize is their use of random (exponential) foreperiods, in contrast to most other studies that use constant foreperiods (i.e., RSI). No matter which, if any or all, of these factors are relevant to the different results, considerable uncertainty remains about the basic facts.

That the RSI affects in some manner the significance of alternations and repetitions had been noted earlier.

The point at which this change from a repetition to an alternation effect takes place appears to be approximately half a second. Thus, repetition effects have been found for RSIs of less than approximately half a second by Bertelson (1961, 1963), Bertelson and Renkin (1966), Hale (1967), Kornblum (1967), Hale (1969a), Eichelman (1970), Kirby (1976b) and alternation effects with intervals greater than half a second by Williams (1966), Hale (1967), Moss, Engel, and Faberman (1967), Kirby (1972, 1976b). At intervals of, or close to, half a second, repetition, alternation, and nonsignificant sequential effects have been reported (e.g., Bertelson, 1961; Hale, 1967; Schvaneveldt and Chase, 1969; Eichelman, 1970; Kirby, 1976b). (Kirby, 1980, p. 132)

And, of course, we know from Section 5.4 that a significant interaction exists even in simple reaction times when two signals are separated by less than 300 msec. So far as I know, no attempt has been made to use those ideas in the study of sequential effects in choice paradigms or to place both sets of phenomena in a common framework.

(p.262) Kirby went on to point out that there are at least four anomalous studies in which a repetition effect was in evidence at long RSI: Bertelson and Renkin (1966), Entus and Bindra (1970), Remington (1969), and Hannes (1968). To this list, we must add Laming's (1968) data. As I just remarked, we do not know the source of these differences, but it is interesting that in a second experiment Kirby (1976) was able to produce either repetition or alternation effects at long RSI by instructing the subjects to attend to repetitions or alternations, whereas at short RSI the instructions had little effect, with a repetition effect occurring under both instructions.

6.6.2 Facilitation and Expectancy

The data make clear that at least the previous signal and probably a considerable string of previous signals affect the MRT. Moreover, the nature of that effect differs depending upon the speed with which new stimuli occur. It seems likely, therefore, that two mechanisms are operative: one having a brief temporal span and the other a much longer one. In terms of the types of mechanisms we have talked about, I suspect the brief one is part of the sensory mechanism and the longer lasting one involves memory phenomena that last some seconds and are a part of the decision process. In this literature, the relevant sensory mechanism is spoken of as “automatic facilitation” (Kirby, 1976; also “intertrial phenomenon” by Bertelson, 1961, 1963, and “automatic after effect” by Vervaeck and Boer, 1980) and the decision one as a “strategy” (or “subjective expectancy” or “anticipation”). And, as we shall see below, there is reason to suspect that there may be at least two distinct strategy mechanisms involved.

The facilitation mechanism is thought to be of one of two types. The first is that the signal leaves some sort of sensory trace that decays to a negligible level in about 750 msec. Sperling (1960) used masking techniques to provide evidence for such a trace. When a second signal occurs in less time than that, the traces are “superimposed.” If the two signals differ, there is little gain and perhaps some interference in the superimposed representation. If they are the same, however, the residual trace somehow facilitates the next presentation of the signal. This could involve either some sort of direct addition of the old to the new representation, making it stronger than it would otherwise be and thereby allowing the identification to proceed more rapidly than usual, or it could entail some sort of priming of a signal coding mechanism, for example, by influencing the order in which certain features are examined. In either event, a repetition effect results. The facts that MRTs in a choice situation run from 300 to 400 msec with signals of 100-msec duration and that the repetition effect gives way to an alternation one at about an RSI of 500 msec suggest that the trace has largely disappeared after 700 to 800 msec.

The other facilitation mechanism is somewhat less clearly formulated. It supposes that the effect of a repeated signal is to bypass some of the signal (p.263) processing that is normally involved. Since exactly what is bypassed is unclear, it is not evident how to distinguish facilitation from trace strengthening.

If a sensory-perceptual facilitation mechanism were the whole story, then since it is entirely driven by the stimulus schedule we should not see any impact of previous experience or instructions on it. Kirby (1976) tested this idea in his third experiment, again using lights as signals. There were six conditions, half of which were at an RSI of 1 msec and the others at 2000 msec. Within each condition, the last third of the trials were run at a 50:50 ratio of repetitions and alternations. The first two-thirds were run at three ratios: 70:30, 50:50, and 30:70. He found that for the longer RSI, a repetition effect in the 70:30 condition and an alternation effect in the 30:70 condition persisted into the last third of the data. For the short RSI, the effects did not persist. These data are consistent with the idea that for short RSIs the sequential effects are stimulus determined, but for the long ones something other than the stimulus pattern affects the behavior.

So one considers possible decision strategies. The fact that faster responses occur with alternations suggests that the subjects in some sense anticipate the occurrence of alternations. This is reminiscent of the famed negative recency effect uncovered in probability learning experiments (Jarvik, 1951). It appears that many people have a powerful tendency to act as if a local law of averages exists—that is, as if there were a force altering conditional probabilities in a random sequence—so as to keep the local proportions very close to 50:50. Such a rule or belief would lead to an exaggerated expectation for alternations.

This idea that the subject is predicting, albeit incorrectly, which signal will be presented has led to a series of attempts to have the subject make the predictions overt and to see how MRT depends on the prediction. The major idea is that if we single out successive pairs of correct predictions, then the sequential effects on those trials should vanish. This literature was discussed in detail by Kirby (1980, pp. 143–144), who concluded that there is just too much evidence that the act of predicting directly affects the response times, and so overt prediction fails to be a useful experimental strategy. The relevant papers are DeKlerk and Eerland (1973), Geller (1975), Geller and Pitz (1970), Geller, Whitman, Wrenn, and Shipley (1971), Hacker and Hinrichs (1974), Hale (1967), Hinrichs and Craft (1971a), Schvaneveldt and Chase (1969), Whitman and Geller (1971a, b; 1972), and Williams (1966).

Kirby (1980, pp. 145–148) examined the evidence as to whether the strategy effect is, as he appears to believe, one of preparation and/or expectancy or whether the strategy develops after the signal is presented. Although the argument is protracted, it appears to me that its main thread goes as follows. If the strategy comes into play after signal onset, then shortening the RSI should only accentuate it since it reduces the time for memory to decay. Any effect to the contrary must be due to sensory (p.264) facilitation, which as we have seen comes into play with RSIs under half a second. On the other hand, if the strategy is set prior to signal presentation, the shorter the RSI the less time it has to be developed, and so at the shortest times its effect should be negligible. And Kirby contends that the data are more consistent with the latter view. For example, the fact that some well-practiced subjects develop the ability to produce alternation or repetition effects at will, even at short RSIs, he interprets as their becoming more efficient at preparing their strategies. However, these data could just as well be interpreted in terms of a shift from a preparation strategy, which at short RSI does not have time to be effective, to a reactive one that is established after the signal onset. I find the arguments unpersuasive, and I believe that the basic nature of these strategy effects is still an open question. I shall, nonetheless, explore some rather detailed, specific suggestions about them in Section 6.6.5.

Vervaeck and Boer (1980) made some important observations in this connection. First, they noted that an expectancy hypothesis predicts not only that the expected signal will be responded to faster than when no expectancy is involved, but that the unexpected one will be responded to more slowly. In contrast, a general facilitation mechanism of any sort predicts that an increase of the facilitation factor leads to faster responses and a decrease to slower responses, independent of which signal is presented. At short RSI, Kirby (1976) reported sequential effects that differed according to whether the signal was a repetition or an alternation, whereas Laming (1968, Experiment 5) obtained facilitation that did not depend upon the signal. Vervaeck and Boer pointed out a significant procedural difference: Laming measured his RSI from the onset of the response-key press, whereas Kirby measured it from the key's release (offset). They judged from other data that the typical duration of a key press was from 100 to 200 msec. They repeated both experiments under the same conditions, found a difference of 106 msec between the two observed RSIs, and replicated both Laming's and Kirby's results.
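The procedural difference amounts to simple arithmetic: an RSI measured from key-press onset exceeds the same physical interval measured from key release by the duration of the press. A minimal sketch (the 300-msec figure is purely illustrative; only the 106-msec difference comes from Vervaeck and Boer):

```python
def offset_measured_rsi(onset_rsi_ms, key_press_ms):
    """An RSI measured from response-key onset (Laming's convention)
    exceeds the same physical interval measured from key release
    (Kirby's convention) by the duration of the press."""
    return onset_rsi_ms - key_press_ms

# With the 106-msec difference Vervaeck and Boer observed, a nominal
# onset-measured RSI of, say, 300 msec corresponds to an offset-measured
# RSI of 194 msec:
print(offset_measured_rsi(300, 106))  # 194
```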

6.6.3 Stimulus-Response Controlled Sequential Effects

Our discussion of sequential effects to this point should seem a bit odd since nothing has been said about the responses except that their MRT is used as the dependent variable. What about previous responses as a source of sequential effects? To my knowledge, this has never been examined with anything like the care that has gone into stimulus effects. In particular, I do not know of any two-choice studies, analogous to those reported in Figures 6.20–6.24, that partition the data according to the past history of responses or, better, the joint past history of signals and responses. Laming (1968, Section 8.4) reports some linear regression results, but no very clear pattern is evident. There are, however, several studies in which the joint past history for one trial back is examined, and I discuss them here. Concerning the sequential effects exhibited jointly by response probability and time, there is (p.265) but one study, which is reported in the next section. Within the general psychophysical literature there is a fair amount of data about sequential effects in absolute identification and magnitude estimation, but with many more than two signals. The only data for two-stimulus designs of which I am aware are concerned primarily with the impact of an error on the next trial. Some of these data, particularly those in Rabbitt (1966), are based upon more than two signals, but they are too relevant to postpone until Chapter 10.

In his studies, Rabbitt has used a short RSI, often 20 msec and never more than 220 msec. The task was the identification of one of several lights appearing in different positions. His data showed: that errors are faster than correct responses (Section 6.4.3), that the MRT on the trial preceding an error did not differ significantly from the overall MRT, and that the MRT following an error is slower than the overall MRT. The fact that the MRT before an error is not unusually fast suggests that the speed of the error trials is not part of some overall waxing and waning of the reaction time. Rabbitt and Rogers (1977) and Rabbitt (1969), using Arabic numerals as signals and key presses as responses, showed that the delay following an error was considerably greater when the alternative signal was used (and so to be correct in the two-choice situation the response was a repetition of the previously erroneous one) than when the signal was repeated.

Laming (1979b), in discussing Rabbitt's work, cited the data from Experiment 5 of Laming (1968) (described earlier) in which he varied the RSI and found appreciable changes in performance. Recall, at long RSI the probability of an error following an error is sharply reduced below the average error rate independent of whether the signal was repeated or alternated; whereas, at short RSI the error probability is reduced for a repeated signal and greatly increased when the signal is alternated. For Experiments 1, 2, and 3—where the first two were slight variants in which the presentation probability was varied and in 3 the error rate was varied by instructions—the RSI was long: 2500 msec, 1500 msec, and 2500 msec. Because of the pronounced effects that the previous stimulus history is known to have (Section 6.6.1), Laming corrected for it both in his error probabilities and MRTs using a multiple regression analysis described in Appendix C of Laming (1968). The data are broken up according to whether the error stimulus is repeated or alternated, and so the erroneous response is either alternated or repeated in order to be correct. The data are shown in Figure 6.26 as a function of the number of trials since the preceding error. Error probabilities are presented directly whereas times are presented in terms of deviations from the overall MRT for that stimulus history. We see, first, that error trials are faster by about 50 msec than the overall mean times. Second, immediately following an error the time is slower than average, which is true whether the stimulus is repeated or alternated. The effect is somewhat larger for alternations than repetitions. Third, following an error, the error rate for both types of trials is reduced below average (except for the alternative (p.266)

FIG. 6.26 For three experiments in which the subject is attempting to identify length, the probability of an error and the deviation of MRT from the overall mean versus the number of trials since an error. The data on the left are when the stimulus in question is the same as the one for which an error was made, and on the right, when they differ. The sample sizes were as follows:

Exp    No. of Conditions    Ss/Cond    Obs/S
1            4                 6        1000
2            4                 6        1000
3            8                 3        1000
4           10                 2         800
5            5                 5        1000

[Figure 1 of Laming (1979b); copyright 1979; reprinted by permission.]

signal of Experiment 3). Fourth, the recovery of MRT to its normal value is comparatively rapid; in the case of repeated signals it is complete in one trial. Fifth, the recovery of error probability to its normal level is, except for Experiment 3, not achieved even after five trials. The fact that the recovery patterns for MRT and error probability are quite different suggests two distinct mechanisms, but this is not a necessary conclusion. For example, Laming has shown how a single mechanism can suffice (Section 6.6.5). First, however, it is desirable to examine the sequential problem empirically from the point of view of the SATF.

6.6.4 Are the Sequential Effects a Speed-Accuracy Tradeoff?

The answer to the question of the title is not obvious from what has been presented. It could be Yes, but equally well it could be No in that the SATF itself is changed as a result of the previous history. Obviously, the answer to the question may very well depend upon the RSI.

Swensson (1972b) performed the following relevant experiment. There were five distinct but unmarked horizontal locations on a cathode ray tube, (p.267) and a square with one of the two diagonals was shown successively in these locations from left to right, with the choice of the diagonal made at random for each location. The subjects responded by key presses to identify the direction of the diagonal at each presentation. At the end of each sequence of five stimuli, information was fed back on the number of correct responses and the sum of the five response times. Two subjects were run. One major manipulation was the time between each response and the presentation of the next signal in the sequence, the RSI. One condition, at 0 msec, was called the immediate serial task (IS); the other, at 1000 msec, was called delayed serial (DS). Another manipulation was to shift the emphasis between speed and accuracy between blocks of 50 or 100 trials. In addition, for subject RG an explicit monetary payoff was used to effect the speed-accuracy tradeoff. The total number of trials (each a sequence of five stimuli) run in each condition for each subject varied from 1500 to 3000. The data were pooled, over blocks of trials having comparable error rates, into four groups. The SATF presented is log odds versus MRT.
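Swensson's accuracy axis, the log odds of a correct response, is a simple transform of the block error rate. The sketch below shows the computation; the (error rate, MRT) pairs are hypothetical numbers of mine chosen to show the shape of the calculation, not Swensson's data.

```python
import math

def log_odds(p_correct):
    """Log odds of a correct response -- the accuracy axis Swensson used
    when plotting SATFs as log odds versus MRT."""
    return math.log(p_correct / (1.0 - p_correct))

# Hypothetical pooled-block summaries (error rate, MRT in msec); each
# pair yields one point on the SATF:
for p_err, mrt in [(0.20, 280), (0.10, 330), (0.05, 380), (0.02, 450)]:
    print(f"MRT {mrt} msec -> log odds {log_odds(1.0 - p_err):.2f}")
```

The transform stretches the accuracy scale near ceiling, where raw proportions compress, which is why it is a convenient ordinate for tradeoff functions.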

Because each trial comprised five presentations, the first question to be raised is whether serial position effects are in evidence. Figure 6.27 shows that a considerable serial position effect exists for the IS condition and is very slight for the DS condition. This is not surprising if memory traces dominate studies with brief RSIs.

To study the sequential effects, the data from positions 2 through 5 were combined and then partitioned into four categories according to whether the response (not the signal) in question is a repetition or alternation (coded in

FIG. 6.27 SATF for one of two subjects for each serial position of a sequence of five presentations on each of which the subject was to identify orientation of a line. The IS condition is with zero RSI and DS with a 1000-msec RSI. Sample sizes varied from 1500 to 3500 per condition. The bars show ±1 standard deviation. [Figure 3 of Swensson (1972b); copyright 1972; reprinted by permission.]

(p.268)

FIG. 6.28 SATF sequential effects for two subjects in the zero RSI condition of Figure 6.27. Open symbols are repeated responses and solid ones are alternated ones; circles mean the previous response was correct and triangles mean it was in error. [Figure 5 of Swensson (1972b); copyright 1972; reprinted by permission.]

the figures as open and closed symbols, respectively) and whether the response follows a correct response or an error (coded by circles or triangles, respectively). The data for the IS condition are shown in Figure 6.28. Following a correct response there is little difference between alternations and repetitions, but following an error the performance is rather seriously degraded, the more so for alternations, which in the case of RG are exceedingly fast and have approximately a 50% error rate. By contrast, the data from the DS condition, shown in Figure 6.29, show very little effect on the SATF of these categorizations except that repetitions of the response may be slightly poorer than alternations following a correct response for RG and following an error for RS.

Once again it is clear that RSI makes an important difference. These data seem to accord with earlier evidence that with short RSI there is some tendency for the subject to attempt to correct the error just made, which leads to an unusually fast and, relative to the next signal, largely random response (Burns, 1971; Rabbitt, 1966, 1967, 1968a, 1968b, 1969). For the long RSI this tendency disappears and to a first approximation the sequential effects appear to be not changes in the SATF, but shifts in the tradeoff on that function.

(p.269) 6.6.5 Two Types of Decision Strategy

Judging by Swensson's data, the most obvious idea to account for the impact of an error with a long RSI is as some sort of adjustment on the SATF, the subject becoming more conservative and so more accurate at the expense of being slower. The difficulty with this view is that it predicts that changes in MRT and error probability should covary, even though we know from Figure 6.26 that during the recovery phase following an error they do not.

Laming (1968, pp. 80–82) suggested another possibility. This arose in his discussion of the important classical random walk model (Sections 8.2, 8.3.1, and 8.4.2), one of whose predictions is that the distributions of error and of correct responses (same response, different signals) should be identical. As this is contrary to the data, Laming asked if some plausible mechanism would account for the fact that errors are usually faster than correct responses. He suggested that when the subject is under time pressure he or she may err as to the actual onset of the signal. Laming described this tendency as arising from time estimation, but it is just as plausible that the subject adjusts the detection criterion to the point where on a significant

FIG. 6.29 Same plot as Figure 6.28 for the 1000-msec RSI. [Figure 6 of Swensson (1972b); copyright 1972; reprinted by permission.]

(p.270) fraction of the trials the pre-stimulus background “noise” triggers a detection response in the system. Either way, if a premature cue serves to begin the information accumulation on which a decision is to be based, then the response is both more likely to be in error and to be somewhat faster than it would otherwise be. According to this view, then, there are two criteria at the subjects' disposal. The one has to do with the point at which information for the decision begins to be accumulated—either the criterion for detection or the setting of the parameters of the time estimation process. The other is the criterion for making a response once information begins to be collected.

Laming made the interesting observation that the differential recovery of MRT and error probability following an error can, in fact, be accounted for by just the anticipation mechanism grafted onto a standard decision model (the SPRT model of Section 8.3.1). He showed that if the subject starts to accumulate information well in advance of signal presentation, the error rate is substantially increased from what it would have been had accumulation begun exactly at signal presentation. That result is not surprising, but what is surprising is the fact that E(T) is reduced by only a few milliseconds. This arises because the information accumulated prior to the signal does not itself tend to lead to a decision, but rather introduces variability in the starting point of information accumulation at signal onset. He assumed that following an error, the subject becomes very conservative and begins accumulating information well after signal onset. This tendency both greatly reduces the error probability (on the assumption that signal information continues to be available) and delays E(T) by the amount of the delay after signal onset plus the amount of anticipation that was in effect at the time of the error. If on the next trial the initiation of accumulation is moved earlier, then E(T) is reduced by the same amount, but the error probability does not change. In fact, the initiation time can be moved earlier until it coincides with signal onset before the error probability begins to rise toward its original value. Clearly, this single mechanism is sufficient qualitatively to account for the apparently separate recovery of MRT and error probability.
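A Monte Carlo sketch makes the asymmetry concrete. The walk below is a generic symmetric random walk with Gaussian increments; all parameter values (drift, bound, the 15-step anticipation, the 20-step delay) are illustrative choices of mine, not Laming's fitted values. Starting accumulation before onset feeds the walk zero-drift noise, which raises the error rate severalfold while shortening mean decision time by only a few steps; delaying the start leaves the error rate untouched and lengthens decision time by exactly the delay.

```python
import random

def sprt_trial(start_rel_onset, drift=0.18, sd=1.0, bound=12.0, rng=random):
    """One trial of a symmetric SPRT-style random walk.  Accumulation
    begins `start_rel_onset` steps relative to signal onset (negative =
    anticipation).  Pre-onset increments carry no signal (zero drift);
    post-onset increments drift toward the correct boundary.  Returns
    (correct?, decision time counted in steps from signal onset).
    Parameter values are illustrative, not Laming's."""
    x, t = 0.0, start_rel_onset
    while abs(x) < bound:
        mu = drift if t >= 0 else 0.0
        x += rng.gauss(mu, sd)
        t += 1
    return x >= bound, t

def summarize(start, n=4000, seed=7):
    rng = random.Random(seed)
    results = [sprt_trial(start, rng=rng) for _ in range(n)]
    p_err = sum(not ok for ok, _ in results) / n
    mean_t = sum(t for _, t in results) / n
    return p_err, mean_t

for start in (-15, 0, 20):  # anticipate 15 steps, start at onset, delay 20
    p, m = summarize(start)
    print(f"start {start:+3d}: P(error) = {p:.3f}   E(T) = {m:.1f} steps")
```

The anticipation run saves far fewer steps than it anticipates because the pre-onset noise mostly just scatters the starting point, while the delayed run reproduces the baseline error rate shifted later in time, which is the dissociation Laming exploits.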

This model illustrates nicely Weatherburn's point that there may well be more than one source of tradeoff between speed and accuracy. The time to begin accumulating information establishes one tradeoff, and the criterion for responding on the basis of the information accumulated establishes a second one. At the present time, we do not know of any reliable way experimentally to evoke just one of them.

Some authors in discussing the sequential data have spoken of selective preparation for one response rather than the other (Bertelson, 1961; Falmagne, 1965), and others of nonselective preparation (Alegria, 1975; Bertelson, 1967; Granjon & Reynard, 1977). It is difficult to know for sure what is meant by these concepts, but one possibility is the two sorts of mechanisms just discussed, with the criterion for signal detection being non-selective.

(p.271) One final point. Laming (1969b) developed an argument, based on his 1968 data, that the same theoretical ideas that account for the sequential patterns may also underlie the so-called signal-probability effect. This effect is the fact that the more probable signal is responded to more rapidly and more accurately than the less probable one. It is clear that the more probable signal will, in general, be preceded by a longer run of repetitions than the less probable one. Thus to the extent that repetitions lead to both speed and accuracy, the effect follows. He worked out this idea in some detail for the SPRT model. We will return to the relation between sequential effects and presentation probability in Section 10.3.1.
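The run-length claim can be checked directly. The simulation below is illustrative (the 75:25 schedule is an arbitrary choice of mine): it counts, for each signal in a biased i.i.d. schedule, how many immediately preceding trials showed the same signal. The expected count for a signal of presentation probability p is p/(1 − p), about 3 for the frequent signal versus 1/3 for the rare one.

```python
import random

def mean_preceding_run(p_a=0.75, n=200_000, seed=3):
    """For each signal in a biased i.i.d. schedule, count how many
    immediately preceding trials showed the same signal.  The expected
    count for a signal of probability p is p/(1 - p), so the frequent
    signal is preceded by longer runs of repetitions.  The 75:25
    schedule is an arbitrary illustrative choice."""
    rng = random.Random(seed)
    seq = ['a' if rng.random() < p_a else 'b' for _ in range(n)]
    runs = {'a': [], 'b': []}
    for i in range(1, n):
        k = 0
        while i - 1 - k >= 0 and seq[i - 1 - k] == seq[i]:
            k += 1
        runs[seq[i]].append(k)
    return {s: sum(v) / len(v) for s, v in runs.items()}

m = mean_preceding_run()
print(f"frequent signal: {m['a']:.2f}   rare signal: {m['b']:.2f}")
```

To the extent that long runs of repetitions produce both faster and more accurate responding, this asymmetry alone generates a signal-probability effect, which is the reduction Laming proposed.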

6.7 CONCLUSIONS

Comparing choice-reaction times with simple-reaction times leaves little doubt that somewhat more is going on. It usually takes at least 100 msec longer to respond to the identity of a signal than to its presence. Many believe that most of the additional time has to do with the processing of information required to distinguish among the possible signals. Others believe that some and perhaps much of the time is required to select among the possible responses. This point of view is argued by Sternberg (1969a). To some degree, this distinction can be examined by decoupling the number of responses from the number of signals; some of those studies are taken up in Chapters 11 and 12.

It is also clear from data that subjects effect important compromises or tradeoffs. Perhaps the best known of these is between the two types of error, which is represented by the ROC curves and has become a standard feature not only of modern psychophysics but of much of cognitive psychology. Some models assume that the mind is able to partition the internal evidence about stimuli into categories corresponding to the responses. Once the experimenter records response times as well as choices, a number of questions arise about how the response times relate to the responses made. One controversial question has been how the time depends upon whether the response is correct or in error. The result tends to be this: for highly discriminable signals responded to under considerable time pressure, errors are faster than the corresponding correct responses, but for signals that are difficult to discriminate and with pressure for accurate responding, the opposite is the case. A few studies fail to conform to this pattern, and there is little work on time pressure coupled with difficult-to-discriminate signals.

Another major tradeoff is between speed and accuracy, which is thought to arise when the subject varies the amount of information to be accumulated and processed prior to a response. There is at least the possibility that this tradeoff is a strategy distinct from another tradeoff—namely, the selection of a criterion to determine the actual response to be made once the information (p.272) is in hand. In the fast-guess and fixed stopping-rule models they were distinct; in some of the other models studied in later chapters they are not.

The last body of data examined in the chapter concerned sequential effects. Here we were led to believe that there may be at least three distinct mechanisms at work, the last of which is probably a manifestation of the speed-accuracy tradeoff just discussed. The first is the possibility of direct sensory interaction between successive signal presentations, which some authors have suggested arises from the superposition of sensory traces when a signal is repeated sufficiently rapidly. The evidence for this came from the different effects that occur as the time from a response to the next signal presentation is varied. But even after eliminating this sensory effect by using long RSIs, there is still a rather complex pattern of sequential effects following an error. After an error both the MRT slows and accuracy increases, but on subsequent trials the former returns rapidly—in some cases in one trial—to its overall mean value whereas the error rate returns to its normal level only slowly, over a number of trials. Some have interpreted this as evidence for both a type of non-selective preparation that is not long sustained and a selective one that is. Within the information accrual framework, Laming has suggested that the non-selective one is some mechanism—perhaps time estimation, perhaps a signal detector that is causing anticipations—that often initiates the accrual process prior to the actual signal onset. The other mechanism appears to be some sort of speed-accuracy tradeoff for which there are a number of models (Chapters 7 to 10).

Most theories attempt to provide accounts of the ROC and SATF. Relatively little has been done to account in detail for the sequential effects, in part because it leads to messy mathematics and in part because the phenomena to be explained remain somewhat uncertain. There are a few attempts to wed stochastic learning processes to otherwise static models. Laming has coupled a time estimation model with the random walk one, which was originally designed only to provide an account of the ROC and SATF. I am not aware of any models for the sensory effect found with brief RSIs.

I have elected to partition this theoretical work into three broad classes. In Chapter 7 the models assume that the subject can opt, as in the fast-guess model, to be in one of several states, each of which has its characteristic time and error pattern. Chapters 8 and 9 explore information accrual models for two-signal experiments. They differ only in whether we assume that time is quantized independently of the information accrual process or by that process itself. And Chapter 10 deals with identification data and models when there are more than two signals, a topic far less fully developed than the two-signal case.

Notes:

(*) Perhaps the earliest relevant study is Garrett (1922) in which he said (p. 6), “Everyday knowledge seems to indicate that, in general, accuracy diminishes as speed increases, but there is little detailed information beyond the bare statement.” He then went on to study how accuracy is affected by stimulus exposure time, but he did not directly manipulate the overall response time, as such, and so it was not really an example of a SATF.

(*) I am particularly indebted to D. R. J. Laming for severe, and accurate, criticism of an earlier version of this section. I suspect that he will view my changes as inadequate, especially since his primary recommendation was that I drop the section entirely, in part at least, on the grounds that too little consensus exists about the empirical facts for the material to be of any real use to model builders. My judgment is that, in spite of its complexity and inconsistencies, this literature is simply too important to ignore.