# (p.221) Appendix. Predictivism

# Abstract

Predictivism (or the historical theory of confirmation) is the view that evidence gives more support to a hypothesis if it was discovered after the hypothesis was formulated (for example, in the process of testing the hypothesis) rather than before the hypothesis was formulated. Given objective criteria (especially criteria of the relative simplicity of hypotheses) for how much evidence renders a hypothesis probable, predictivism is false. Nevertheless, we are more likely to find the kind of evidence that gives great probability to a hypothesis if we are looking for it and we only know what to look for when we have formulated the hypothesis. There are, however, very unusual situations in which the time at which evidence was discovered does affect the probability that it gives to a hypothesis.

# The Normal Irrelevance of Novelty Evidence

On the account that I have given in Chapter 4 of whether and how far evidence *e* makes logically probable a hypothesis *h*, it is irrelevant whether *e* was known before *h* was formulated and perhaps taken account of in the formulation of *h*, or whether *h* was formulated first and then *e* was discovered, perhaps in the process of testing *h*. The contrary view that the time or circumstances of the formulation of *h* (relative to when *e* was discovered or utilized) make a difference to whether and how far *e* ‘confirms’ *h*, I shall call predictivism; and I shall contrast it with the timeless view which I endorse that such considerations are irrelevant. The predictivist view has been defended by various writers, some of whom derive their ideas from Karl Popper. These writers have urged that it is always very easy to construct some hypothesis which fits the evidence obtained so far; but once we have constructed a hypothesis, then nature may very well turn up evidence which falsifies it (or at any rate, strongly disconfirms it) in a clear objective manner. Hence subsequent testing provides objective support in a way that merely fitting the data does not, Popper claimed. This appendix will investigate how far there is any truth in the predictivist view.

In order to consider predictivism in its most plausible form, we need to separate off from it two aspects of the way in which Popper himself expressed it. Popper claimed that a hypothesis *h* has a high degree of ‘corroboration’ only in so far as it has passed severe tests. And he claimed further that those tests had to be sincere attempts to show *h* false. (‘Our *c* (*h* | *e*) can be adequately interpreted as degree of corroboration of *h*—or of the rationality of our belief in *h* in the light of tests—only if *e* consists of reports of the outcome of sincere attempts to refute *h*, rather than attempts to verify *h*’^{1}). It might look to the unwary as if Popper is suggesting that a hypothesis is confirmed—that is, has its probability increased—only if it is subjected to ‘sincere attempts’ to refute it. But Popper does not mean by ‘corroborates’ anything like ‘confirms’. For he holds that corroboration is solely a measure of past performance and is no good guide to the future, that it in no way makes the predictions of the hypothesis more probably true. This is, of course, intuitively a highly unappealing view—surely scientific theories will more probably give true predictions in so far as they have passed severe tests. So let us regard the claim of the importance of testing as a claim about confirmation. Secondly,
Popper understood by severe tests ‘sincere attempts to refute *h*, rather than attempts to verify *h*’. But it seems implausible that the scientist's private intention when performing his tests can have any relevance to whether his results confirm or disconfirm *h*. For their confirming effect is a public matter, measurable and equally accessible to other scientists who did not themselves perform the tests—whereas intentions are mental (the subject has privileged access to them), and may be quite unknown to others. Also, I think it unreasonable to require that the evidence be obtained in the course of a (publicly recognizable) ‘test’ of *h*; so many of the most important pieces of evidence for or against a hypothesis turn up by accident. What, I think, Popper and others are best represented as saying is that if you ‘fit’ a hypothesis to the evidence, that evidence does not give as much support to it as it would if it was discovered after the hypothesis was formulated.

What is at stake is, therefore, best phrased in terms of confirmation and of public evidence; and it is in these terms that most predictivists phrase their claims. The issue then is whether certain evidence would make a certain hypothesis more probable if it was in some way novel evidence instead of being already known. To put the issue more formally, suppose that we have some evidence *e* _{1} (about the time of discovery of which we are ignorant), which is such that it gives to a hypothesis *h* on background evidence *k* a probability *P*(*h* | *e* _{1} & *k*). Then does the addition to *e* _{1} of further evidence *e* _{2} to the effect that *e* _{1} was in some sense novel evidence ever affect the value of that probability—for example, by increasing it if it is greater than *P*(*h* | *k*) or greater than $\frac{1}{2}$, or decreasing it if it is less than *P*(*h* | *k*) or less than $\frac{1}{2}$? For simplicity's sake I shall concern myself mainly with the former (the possibility of an increase of *P*(*h* | *e* _{1} & *k*)); if my arguments hold with respect to that, similar arguments will clearly hold with respect to the latter possibility. Is *P*(*h* | *e* _{1} & *e* _{2} & *k*) ever greater than *P*(*h* | *e* _{1} & *k*)?

There are different understandings in the literature of what the novelty of *e* _{1} and so of the priority to it of *h* consists in. *e* _{2} might report the temporal priority of the formulation of *h* to the discovery of *e* _{1}; *e* _{1} is then temporally novel. Or *e* _{2} might report that *e* _{1} was not ‘used’ in the formulation of *h*; it was not taken account of in the (public) justification of *h* and so *e* _{1} is ‘use novel’. And finally there is the epistemic priority of the formulation of *h*; people at the time had no good reason provided by hypotheses then current for believing that *e* _{1} would occur, before *h* was put forward. This I shall call the ‘relative novelty’ of *e* _{1}—*e* _{1} is relatively novel if, while it is quite probable given *h* and *k*, it is very improbable given the actual hypotheses in vogue when *h* was formulated. This ‘relative novelty’ is a historical feature of the circumstances of the discovery of *e* _{1} (in what climate *h* was formulated)—to be distinguished sharply from the low prior probability of *e* _{1}—*P*(*e* _{1}| *k*) or its low probability if *h* is false—*P*(*e* _{1}|∼ *h* & *k*). The latter two values arise from the probability of *e* _{1} on all the various hypotheses that might hold (whether or not in vogue), weighted by their prior probabilities—whether or not they were recognized at the time of the formulation of *h*. It is certainly the case, as I stated on p. 104 in my exposition of the criteria of logical probability, that *e* confirms *h*
more, the less probable is *e* given not‐*h*; more precisely, the lower is *P*(*e*|∼ *h* & *k*), the more *P*(*h* | *e* & *k*) exceeds *P*(*h* | *k*). To the extent to which *e* is to be expected anyway, then, even if it is a prediction of *h*, it is not much evidence in favour of *h*. But that is not a controversial claim, and, as we have seen, follows straightforwardly from the probability calculus.
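This dependence of confirmation on *P*(*e*|∼ *h* & *k*) can be illustrated with a toy Bayesian calculation. A minimal sketch, in which the prior and the likelihoods are illustrative assumptions rather than values from the text:

```python
# How much e raises P(h) depends on how improbable e is if h is false:
# the lower P(e | ~h & k), the more P(h | e & k) exceeds P(h | k).
def posterior(prior, like_h, like_not_h):
    # Bayes's theorem over the two-way partition {h, ~h}
    return prior * like_h / (prior * like_h + (1 - prior) * like_not_h)

prior = 0.1                        # illustrative P(h | k)
for like_not_h in (0.9, 0.5, 0.1, 0.01):
    print(like_not_h, round(posterior(prior, 1.0, like_not_h), 3))
```

With *P*(*e* | *h* & *k*) fixed at 1, the posterior climbs from barely above the prior (when *e* was to be expected anyway) towards certainty as *P*(*e*|∼ *h* & *k*) shrinks.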

Of the three kinds of novelty, most predictivists discount temporal novelty as the relevant kind, preferring either use novelty^{2} or relative novelty^{3}. I shall call any such evidence of the novelty of *e* _{1} novelty evidence. Novelty evidence *e* _{2} is in effect historical evidence of when (relative to the evidence *e* _{1} of what was observed) a hypothesis *h* was formulated, by whom or under what circumstances. I shall argue that, for normal *k*, novelty evidence is irrelevant to the confirmation of *h*, but that, for certain *k*, such *e* _{2} is relevant.

I include within normal *k* not merely the evidence of how things behave in neighbouring fields of enquiry, but also the circumstances in which *e* _{1} occurs. (We could include these within *e* _{1}, but it would make the account of some simple examples that I am about to discuss more complicated.) The crucial condition is that *k* is not to include any historical evidence about who formulates or advocates *h*, when and under what circumstances (and so any historical information about the success rate in prediction of the formulator, or of others before that time or in similar circumstances.)

Here is a trivial example in which *k* is normal and of which, I suggest, the timeless view gives a correct account. Let *h* be ‘all metals expand (of physical necessity) when heated’. (The words ‘of physical necessity’ make the claim that all metals expand when heated a claim about a law of nature, not a mere accidental regularity.) Let *k* be ‘1,000 pieces of metal were heated’, *e* _{1} be ‘those 1,000 pieces of metal expanded’, *e* _{2} be some novelty evidence such as that *h* was put forward before *e* _{1} was known. *k* and *h* (if true) would explain *e* _{1}—the metals expanded because they were heated and all metals expand when heated. The timeless theory then claims that *P*(*h* | *e* _{1} & *k*) = *P*(*h* | *e* _{1} & *e* _{2} & *k*). I suggest that the simplicity of the theory *h* and its ability to explain (given *k*) a large amount of evidence *e* _{1} is what makes it likely to be true, quite independently of when it was put forward. It is also quite independent of what led anyone to formulate *h*, and of whether *e* _{1} was very probable or very improbable given the theories in vogue at the time of the formulation of *h*. Some very crazy astrologically based theories might have been in vogue then that (together with *k*) predicted and explained *e* _{1}. Yet that would not make *e* _{1} any less good evidence for *h*.

In the above example *h* is a universal lawlike hypothesis. The timeless view also works for predictions—given normal *k*. Observational evidence is typically
evidence for predictions by being evidence confirmatory of universal (or statistical) lawlike hypotheses of which the prediction is a deductive (or inductive) consequence; and (although we noted on pp. 109–10 that there are exceptions) normally the most probable prediction is that yielded by the most probable hypothesis. Let *e* _{1} and *e* _{2} be as before, *k* be ‘1,001 pieces of metal were heated’, and *h* be ‘the 1,001st piece of metal expanded’. *e* _{1} with *k* is evidence that confirms ‘all metals expand when heated’, and so that the 1,001st piece of metal will expand when heated. *h* derives its probability from being (with *k*) a consequence of a simple hypothesis able to explain a large amount of evidence, independently of whether or not *e* _{1} was novel in any sense. *P*(*h* | *e* _{1} & *k*) = *P*(*h* | *e* _{1} & *e* _{2} & *k*).

My intuitions on this simple example are also my intuitions on more sophisticated real‐life examples, where my condition on background knowledge holds. They are, for example, my intuitions with respect to Mendeleev's theory, recently discussed by Maher^{4} on the predictivist side, and by Howson and Franklin^{5} on the timeless side. Mendeleev's theory (*h*) entailed and (if true) explained the existence and many of the properties of the newly (1871–8) discovered elements scandium, gallium, and germanium (*e* _{1}). Mendeleev's was not just any old theory that had this consequence; it was not just *e* _{1} plus some unrelated *f*. It was an integrated theory of groups of related elements having analogous properties recurring as atomic weight increased, from which the existence of the sixty or so elements already known (*k*) followed. In virtue of being a more integrated and so simpler theory than any other theory from which *k* followed and by which it could be explained, it was already more likely to be true than any other theory. The further information that other results (*e* _{1}) followed from it and could be explained by it was therefore plausibly further evidence for it independently of when and how they were discovered. Howson and Franklin compare the relation of Mendeleev's theory to chemical elements to the relation of the ‘eightfold way’ to elementary particles; and the prediction of the three elements by the former to the prediction of the Ω^{−} particle by the latter. They cite a passage from Yuval Ne'eman, one of the proponents of the eightfold way, in which he also makes the comparison and comments that ‘the importance attached to a successful prediction is associated with human psychology rather than with scientific methodology. It would not have detracted at all from the effectiveness of the eightfold way if the Ω^{−} had been discovered before the theory was proposed.’

That theories can acquire a very high degree of support simply in virtue of their ability to explain evidence already available is illustrated by the situation of Newton's theory of motion at the end of the seventeenth century. It was judged by very many—and surely correctly—to be very probable when it was first put forward. Yet it made no new immediately testable predictions, only the predictions that were already made by laws that were already known and that it explained (for example, Kepler's laws of planetary motion and Galileo's law of fall). Its high probability arose solely from its being a very simple higher‐level theory from which those diverse laws are deducible. My intuitions tell me that it would have been no more likely to be true, if it had been put forward before Kepler's laws were discovered and had been used to predict them.

So much for my intuitions. But my intuitions clash with those of the predictivist. So I need to show that my intuitions fit into a wider theory of confirmation for which other reasons can be adduced, and I need to explain why the predictivist has the inclination to give a wrong account of cases such as I have cited. My intuitions fit into the whole Bayesian picture, in favour of which there are the other good reasons that have been given in Chapters 3 and 4. Consider my first simple example in which *h* = ‘all metals expand when heated’, *k* = ‘1,000 pieces of metal were heated’, *e* _{1} = ‘those 1,000 pieces of metal expanded’, and *e* _{2} = ‘*h* was put forward before *e* _{1} was known’. On Bayes's theorem

*P*(*h* | *e* _{1} & *e* _{2} & *k*) will be greater than *P*(*h* | *e* _{1} & *k*), as the predictivist typically claims that it is, only if the addition of *e* _{2} to *e* _{1} lowers *P*(*e* _{1}| *k*) (and so *P*(*e* _{1}|∼ *h* & *k*)) by a greater proportion than it lowers *P*(*e* _{1}| *h* & *k*). What this would mean is that, on the mere information that the 1,000 pieces of metal were heated, it would be more likely that the hypothesis that all metals expand when heated would have been proposed before it was known that the 1,000 pieces of metal expanded if the hypothesis was true, than if it was false. That seems to me, and I hope also to the average predictivist, massively implausible. The same applies if we take *e* _{2} as ‘*e* _{1} was not used in formulating *h*’. Then a Bayesian predictivist is committed to: on the mere information that *k*, it would be more likely that *h* would have been proposed without taking *e* _{1} into account if *h* were true than if it were not. And if *e* _{2} is ‘the hypotheses in vogue at the time of the formulation of *h* were such that *e* _{1} is improbable given them’, the Bayesian predictivist is committed to: it would be more likely that *h* would have been proposed when the theories then in vogue did not predict *e* _{1}, if *h* were true than if it were not. All of this is again massively implausible. Hence the predictivist can save his thesis only by abandoning Bayes's theorem; and there are, I suggest, good reasons for not doing so.
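The underlying Bayesian point can be put in a few lines of arithmetic: novelty evidence *e* _{2} raises the posterior only if *e* _{2} is itself more probable given *h* than given ∼ *h*. A minimal sketch, with all the probability values illustrative assumptions:

```python
from fractions import Fraction

# Novelty evidence e2 raises P(h | e1 & e2 & k) above P(h | e1 & k) only if
# e2 is more probable given h (and e1 and k) than given ~h -- i.e. only if a
# hypothesis's being true makes its early formulation more likely.
def add_evidence(p_h, like_h, like_not_h):
    # update the current probability of h on the further evidence e2
    return p_h * like_h / (p_h * like_h + (1 - p_h) * like_not_h)

p_h_given_e1 = Fraction(3, 5)   # assumed P(h | e1 & k)
# If e2 is equally likely whether or not h is true, it is irrelevant:
same = add_evidence(p_h_given_e1, Fraction(1, 10), Fraction(1, 10))
# Only if e2 were *more* likely given h would it confirm h further:
favours = add_evidence(p_h_given_e1, Fraction(2, 10), Fraction(1, 10))
print(same, favours)
```

The first update leaves the probability at $\frac{3}{5}$; the second raises it to $\frac{3}{4}$. The predictivist thus needs the implausible claim that a hypothesis's truth makes its early formulation more probable.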

But what, if predictivism is false, is the source of the temptation to espouse it? I think that there are two sources. First, there is the consideration that moved Popper that any collection of evidence can always accommodate some hypothesis—that is, for any *e* _{1} and *k* one can always devise a theory *h*, such that
*P*(*e* _{1}| *h* & *k*) = 1 or is high. (Indeed, one can always devise an infinite number of such hypotheses.) That has seemed to suggest, totally erroneously, to some that there are no objective criteria for when a hypothesis so constructed is supported by evidence. Whereas, the contrast is made, once we have a hypothesis that makes a prediction, we can look to see whether the prediction comes off, and whether it does or not is a clear objective matter. But, as I have urged throughout this book (and especially in Chapter 4), there are very clear objective criteria for when a hypothesis is supported by a collection of evidence already obtained. In the trivial metal example that I used earlier, equally accommodating to the evidence that 1,000 pieces of metal had been heated and expanded, are *h* ‘all metals expand when heated’, *h* _{1} ‘metals 1–1,000 expand when heated, and other metals do not’, and, supposing all the metals observed so far were observed by physicists in Britain, *h* _{2} ‘all and only metals observed by physicists in Britain expand when heated.’ Quite obviously, *h* _{1} and *h* _{2} are not supported by evidence, whereas *h* is. (Or rather, to put the point more carefully, all of these hypotheses are ‘confirmed’—that is, have their probability increased by the evidence that they predict—since many rival hypotheses are incompatible with it; but only *h*, being far simpler than other hypotheses that predict the evidence, obtains any significant degree of probability.) The obvious reason for this is that *h* is a simple hypothesis, whereas *h* _{1} and *h* _{2} are not simple theories (by the criteria analysed in Chapter 4).

The second source of predictivism, as I see it, is this. A hypothesis *h* that entails (or renders very probable) for some circumstances *k* the occurrence of some event *e* _{1} has, as we have noted, its probability raised by *e* _{1} the more, the less likely *e* _{1} is to occur if *h* were not true—the lower *P*(*e* _{1}|∼ *h* & *k*) and so the lower *P*(*e* _{1}| *k*). If we have already formulated *h*, we know which *e* _{1} to look for that will have this characteristic of *P*(*e* _{1}| *h* & *k*) very high and *P*(*e* _{1}|∼ *h* & *k*) very low. We can bring about *k* and see whether ∼ *e* _{1} or *e* _{1} occurs, and that will provide a ‘severe test’ of the hypothesis. If we formulate *h* after accumulating evidence, we may or may not have among that evidence an *e* _{1} with that characteristic—but we are much more likely to get it if we are actually looking for it. Hence, producing hypotheses and then testing them may indeed be a better way of getting evidence that (if they are true) supports them strongly, than trying to fit them to evidence we already have. But that has no tendency to cast doubt on the fact that, for given evidence *e* _{1}, *P*(*h* | *e* _{1} & *k*) has a value that is unaffected by the addition of evidence about when *e* _{1} was discovered.

# Looking for Favourable Evidence

There is, it is true, always the temptation for the accommodator to use methods of looking for evidence that will make it fairly improbable that he will find evidence against his hypothesis. But the probability that ought to guide action is that relative to total available evidence. All relevant evidence should be taken into account, and that will include evidence about any methods used to obtain other evidence.
If the methods are such as to ensure that only evidence that the chosen hypothesis predicts will be observed, then intuitively that evidence will not give support to the hypothesis. But that gives no support to predictivism or any other non‐Bayesian account of the relation of evidence to hypothesis, for that result follows straightforwardly from Bayes's theorem (without bringing in any evidence about when *h* was proposed relative to the discovery of *e*). For, if *e* and only *e* must be observed, and *e* is entailed by hypothesis *h*, not merely does *P*(*e* | *h* & *k*) = 1 but *P*(*e* | *k*) = 1. And so, by Bayes's theorem, *P*(*h* | *e* & *k*) = *P*(*h* | *k*), and *e* does not increase the probability of *h*. Thus, if you test your hypothesis that all fish in the sea are more than 10 cm in length by using a net with a 10 cm mesh, then the fact that you catch only fish longer than 10 cm is no evidence in favour of your hypothesis—whereas it would be substantial evidence if you had got the same result with a net of much smaller mesh. And this is because, using a 10 cm mesh net, you can get no other result from fishing, whether or not your hypothesis is true.
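The fishing‐net case reduces to one line of Bayes's theorem. A sketch, in which the prior and the fine‐mesh likelihood are illustrative assumptions:

```python
# With the 10 cm mesh, a catch of only long fish is certain whatever the
# truth, so P(e | k) = 1 and the posterior equals the prior; with a fine
# mesh the same catch would be strong evidence for the hypothesis.
def posterior(prior, like_h, like_not_h):
    return prior * like_h / (prior * like_h + (1 - prior) * like_not_h)

prior = 0.5                            # assumed P(h | k)
coarse = posterior(prior, 1.0, 1.0)    # 10 cm mesh: e certain either way
fine = posterior(prior, 1.0, 0.05)     # fine mesh: e improbable if h false
print(coarse, round(fine, 3))
```

The coarse‐mesh posterior stays at the prior (0.5), while the fine‐mesh result lifts it to about 0.95.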

However, it does not follow from that that, if you secure certain evidence by a method designed to obtain that evidence, then necessarily that evidence is less supportive of your hypothesis than it would otherwise be. Everything depends on the relative probability of that evidence being obtained by that method or alternatively by a different method, on the favoured hypothesis and on its rivals. If using a certain method increases the probability that *e* will be found if *h* is true and also if its rivals are true by the same proportion, then the evidence is equally supportive whether it is obtained by that method or not. (The trouble with the fishing‐net case is that using the 10 cm mesh net rather than a net with a smaller mesh increased the probability of the evidence being found by much more on the preferred hypothesis than on such rival hypotheses as that all the fish are longer than 5 cm.)
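The point about proportional increase can be checked directly: scaling both likelihoods by a common factor leaves the posterior unchanged. A sketch with illustrative numbers:

```python
# If a search method scales the probability of finding e by the SAME factor
# on h and on its rivals, the posterior is untouched; only a differential
# effect on the likelihoods alters the evidential force of e.
def posterior(prior, like_h, like_not_h):
    return prior * like_h / (prior * like_h + (1 - prior) * like_not_h)

base = posterior(0.5, 0.8, 0.2)
# the method makes e three times less likely to be found -- on BOTH hypotheses
with_method = posterior(0.5, 0.8 / 3, 0.2 / 3)
print(base, with_method)
```

Both calls give the same posterior (0.8): the common factor cancels in Bayes's theorem.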

Consider the example of ‘optional stopping’. You look for evidence relevant to choosing between *h* _{1} and some rival hypotheses, but—keen to prove *h* _{1} as probable as possible—you go on looking until you have got evidence that (if we ignore the procedure used to obtain it) would make *h* _{1} very much more probable than its rivals. Intuitively, one might suppose that the evidence would not be nearly as favourable to *h* _{1} if it is obtained by optional stopping as it would be if it were obtained by choosing in advance which areas to investigate.^{6} But that will not be so if—as is often the case—the use of the method of optional stopping increases or decreases equally the probability that the evidence will be found both on the preferred hypothesis and on its rivals. In a description of an optional‐stopping experiment, it is important to be precise about what is the evidence obtained by which method of optional stopping.

The paradigm case of an optional‐stopping experiment is where you have two hypotheses about the statistical probability of a coin landing heads (statistical probability in an infinite sequence under a distribution of initial conditions typical of that under which tosses are made). You go on tossing until you reach a proportion of heads to total throws that is the most probable on one of the hypotheses, and stop there. The suggestion is made that the fact that this result was obtained by this means makes it less strong evidence in favour of the chosen hypothesis than if you had decided in advance to stop after exactly that number of tosses. It follows from Bayes's theorem that in this case optional stopping is irrelevant; and this consequence is used as an objection to Bayes's theorem, and so to the use of the probability calculus for calculating logical probability.

Assuming that the alternative hypotheses have equal prior probabilities, everything depends on the ‘likelihood ratio’, the relative probability on the two hypotheses of getting the given result. Let *h* _{1} claim that the statistical probability is $\frac{1}{2}$, *h* _{2} claim that it is $\frac{2}{3}$. You go on tossing until you get exactly 50 per cent heads and then you stop. The 50 per cent ratio can be obtained by many different sequences of heads and tails. For example, a 50 per cent ratio after 4 tosses could be obtained by HTTH or by HTHT or by TTHH and so on. The probability of each different sequence of 2*n* tosses that would result in obtaining the required ratio (that is, *n* heads), given *h* _{1}, is ($\frac{1}{2}$)^{2n}; given *h* _{2} it is ($\frac{1}{3}$)^{n} ($\frac{2}{3}$)^{n}. Given that the only evidence is that the ratio was obtained after 2*n* tosses (that is, there is no evidence of optional stopping), there are $\frac{(2n)!}{n!n!}$ distinct sequences by which the required ratio can be obtained. (*n*! is 1 × 2 × 3 . . . × *n*; (2*n*)! is 1 × 2 × 3 . . . × *n* . . . × 2*n*.) The probability of reaching it given *h* _{1} is $(\frac{1}{2}){}^{2n}\frac{(2n)!}{n!n!}$; the probability of reaching it given *h* _{2} is $(\frac{1}{3}){}^{n}(\frac{2}{3}){}^{n}\frac{(2n)!}{n!n!}$. The larger is 2*n*, the larger is the ratio of the two probabilities, since ($\frac{1}{2}$)^{2n} is greater than ($\frac{1}{3}$)^{n} ($\frac{2}{3}$)^{n} for any *n*. But, if we have the further evidence *e* _{3} that the result (*e* _{1}) was obtained by optional stopping, this amounts to evidence that it was *not* reached after 2 tosses or 4 tosses or 6 tosses . . . up to (2*n*−2) tosses. So, for fixed *n*, the number of ways in which the required ratio can be obtained by optional stopping will be less—indeed very much less—than the number of ways it could otherwise be obtained.
For example, while the 50 per cent ratio can be reached in 6 different ways after 4 tosses, 4 of these involve reaching it after 2 tosses as well; it can be reached after 4 tosses but not after 2 tosses in only 2 ways (HHTT or TTHH). So the result's being obtained after 4 tosses by ‘optional stopping’ is *less* probable than its being obtained after having tossed the coin the exact number of times decided in advance. So, while the probability of each of these ways remains ($\frac{1}{2}$)^{2n} on *h* _{1} and ($\frac{1}{3}$)^{n} ($\frac{2}{3}$)^{n} on *h* _{2}, the number of ways in which the 50 per cent ratio could have been obtained is much less if optional stopping has been used. But that will make no difference to the ratio of the probabilities of getting the evidence on the two hypotheses, which will remain ($\frac{1}{2}$)^{2n} : ($\frac{1}{3}$)^{n} ($\frac{2}{3}$)^{n}. In consequence, while optional stopping makes it in this case less probable that you will find your required ratio after 2*n* tosses, given either *h* _{1} or *h* _{2}, that makes no difference to the probability that the total evidence gives to the
respective hypotheses. For, while *P*(*e* _{1} & *e* _{3}| *h* _{1} & *k*) is less than *P*(*e* _{1}| *h* _{1} & *k*), so is *P*(*e* _{1} & *e* _{3}| *h* _{2} & *k*) less than *P*(*e* _{1}| *h* _{2} & *k*) in the same proportion. Hence *P*(*h* _{1}| *e* _{1} & *e* _{3} & *k*) = *P*(*h* _{1}| *e* _{1} & *k*). Optional stopping has made no difference. And once we see that in this kind of case optional stopping increases or decreases by the same proportion the probability that the required ratio will be obtained both on the preferred hypothesis *and* on rival hypotheses, we recognize that the fact that the result was obtained by optional stopping is irrelevant to its evidential force. Bayesianism yields this initially surprising but on reflection intuitively plausible result.
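Both the counting claims and the exact cancellation just described can be checked by brute‐force enumeration of toss sequences. A sketch, using the two hypotheses and the equal priors of the text:

```python
from fractions import Fraction
from itertools import product

def seq_prob(seq, p_heads):
    # probability of one particular heads/tails sequence given P(heads)
    pr = Fraction(1)
    for toss in seq:
        pr *= p_heads if toss == 'H' else 1 - p_heads
    return pr

def likelihood(n, p_heads, first_time):
    # P(50% heads at toss n; if first_time, at no earlier even toss)
    total = Fraction(0)
    for s in product('HT', repeat=n):
        if s.count('H') * 2 != n:
            continue
        if first_time and any(s[:k].count('H') * 2 == k for k in range(2, n, 2)):
            continue
        total += seq_prob(s, p_heads)
    return total

# counts for 4 tosses: 6 balanced sequences, only 2 first balanced at toss 4
balanced = [s for s in product('HT', repeat=4) if s.count('H') == 2]
late = [s for s in balanced if s[:2].count('H') != 1]
print(len(balanced), len(late))

h1, h2 = Fraction(1, 2), Fraction(2, 3)   # the two statistical hypotheses
prior = Fraction(1, 2)                    # equal priors, as in the text

def post_h1(l1, l2):
    return prior * l1 / (prior * l1 + (1 - prior) * l2)

n = 6
plain = post_h1(likelihood(n, h1, False), likelihood(n, h2, False))
stopped = post_h1(likelihood(n, h1, True), likelihood(n, h2, True))
print(plain == stopped)   # optional stopping cancels exactly
```

Because every sequence with *n* heads in 2*n* tosses has the same probability on each hypothesis, excluding the early‐balanced sequences shrinks both likelihoods in the same proportion, and the posterior is unchanged.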

There are, however, other kinds of optional‐stopping experiment in which the optional stopping does make a difference to the likelihood ratio and so to the probabilities of the hypotheses being tested. Suppose our hypotheses as before. But suppose that all we learn at the end of the experiment is that the 50 per cent ratio was obtained at some stage *within* 2N tosses, but we are not told at what stage. That is, *either* after 2 tosses, there was $\frac{1}{2}$ heads; *or* after 4 tosses, there were $\frac{2}{4}$ heads; *or* after 6 tosses there were $\frac{3}{6}$ heads, and so on. It does, of course, become very likely indeed, if we take a large 2N, that at some stage we will get the 50 per cent result if *h* _{1} is true. It also becomes quite likely that we will get that result if *h* _{2} is true. But, the larger is 2N, the more the likelihood of getting the 50 per cent result if *h* _{1} is true exceeds the likelihood of getting it if *h* _{2} is true. The ratio of the two likelihoods, for 2N = 2 is 1.125; for 2N = 4 is 1.151; for 2N = 6 is 1.171. But compare that sequence with the likelihoods of getting the 50 per cent ratio at an exact number of tosses. The ‘likelihood ratio’ of the likelihood of getting the 50 per cent result if *h* _{1} is true divided by the likelihood of getting it if *h* _{2} is true at exactly 2N tosses, for 2N = 2 is 1.125, for 2N = 4 is 1.266, for 2N = 6 is 1.423. So, if we compare two different sorts of evidence—evidence that the 50 per cent ratio was obtained at some stage within 2N tosses, with evidence that the ratio was obtained at exactly 2N tosses—then clearly the latter evidence is much better evidence in favour of the hypothesis on which it is the most probable outcome; it renders that hypothesis more probable than does the former evidence.^{7} That follows straightforwardly from the probability calculus. 
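The two sequences of ratios just quoted can be verified by enumerating all toss sequences. A sketch (the 2N = 6 ‘exact’ figure comes out as 1.4238…, which the text truncates to 1.423):

```python
from fractions import Fraction
from itertools import product

# 'within' = the 50% ratio is reached at SOME even stage up to n tosses;
# 'exact' = exactly half the n tosses are heads, interim ratios ignored.
def seq_prob(seq, p_heads):
    pr = Fraction(1)
    for toss in seq:
        pr *= p_heads if toss == 'H' else 1 - p_heads
    return pr

def within(n, p):
    total = Fraction(0)
    for s in product('HT', repeat=n):
        if any(s[:k].count('H') * 2 == k for k in range(2, n + 1, 2)):
            total += seq_prob(s, p)
    return total

def exact(n, p):
    return sum((seq_prob(s, p) for s in product('HT', repeat=n)
                if s.count('H') * 2 == n), Fraction(0))

h1, h2 = Fraction(1, 2), Fraction(2, 3)
for n in (2, 4, 6):
    print(n,
          round(float(within(n, h1) / within(n, h2)), 3),
          round(float(exact(n, h1) / exact(n, h2)), 3))
```

The ‘within’ ratios grow slowly (1.125, 1.151, 1.171) while the ‘exact’ ratios grow much faster (1.125, 1.266, 1.424), confirming that the latter evidence discriminates more strongly between the hypotheses.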
It was perhaps the unexplicit assumption that this was the kind of comparison involved in an optional‐stopping experiment—a comparison between a result having been obtained at some stage or other before a number of tosses fixed in advance had been made, and that result having been obtained at exactly that fixed number of tosses—which led to over‐general claims
that any evidence obtained by optional stopping was weaker evidence than that evidence would be if it had not been obtained by optional stopping. The general point is that *if* using a certain method to obtain evidence *e* has the consequence that obtaining *e* is more probable on *h* _{1} than *h* _{2} by a smaller amount than it would be if you used a different method, then the use of that method diminishes the evidential force of *e*; that is, *e* increases the probability of *h* _{1} less than it would do if the other method had been used. But optional stopping as such need not have that consequence.

# The Occasional Relevance of Novelty Evidence

So, reverting to the main theme of this appendix, we have seen no reason to suppose that, for normal background evidence *k*, novelty evidence about when and by whom the hypothesis was formulated has any tendency to affect the probability of that hypothesis on any other evidence; and I have given some explanation of why people have, wrongly, sometimes thought otherwise. But it would be mistaken to suppose that my claim holds for every *k*, since, for any evidence and any hypothesis, there is always some background evidence that makes the former relevant to the latter. In particular, my claim does not hold in many cases where *k* reports evidence of a historical kind, the force of which (together with *e* _{1} and *e* _{2}) is to indicate that someone has access to evidence relevant to *h* that is not publicly available. (*e* _{1} is the evidence of what was observed; *e* _{2} is novelty evidence about *e* _{1} of the kind described in the earlier section.)

Here is an example where *k* is evidence of this historical kind. Let *h* be Grand Unified Field Theory, and *e* _{1} be some observed consequence thereof. Let *k* be that *h* was formulated by Hawks, who always puts forward his theories after assembling many pieces of observational evidence that he does not reveal to the public; and that his theories—so long as, subsequent to being formulated, they make one true new prediction—are always subsequently confirmed and never falsified. Then of course the evidence that a consequence of the theory (*e* _{1}) was observed subsequent to its formulation (*e* _{2}) increases its probability above that given by the mere evidence that a consequence of the theory was observed—*P*(*h* | *e* _{1} & *k*) < *P*(*h* | *e* _{1} & *e* _{2} & *k*).

Now let us consider in more detail an example recently brought forward by Maher in defence of the predictivist thesis that observation of a consequence of a theory subsequent to the formulation of the theory is more evidence in favour of the theory than the mere occurrence of the consequence:

We imagine an experiment in which a coin is tossed 99 times, and a subject records whether the coin landed heads or tails on each toss. The coin seems normal, and the sequence of tosses appears random. The subject is now asked to state the outcome of the first 100 tosses of the coin. The subject responds by reading back the outcome of the first 99 tosses, and adds that the 100th toss will be heads. Assuming that no mistakes have been made in recording the observed tosses, the probability that the subject is right about these 100 tosses (p.231) is equal to the probability that the last toss will be heads. Everyone seems to agree that they would give this a probability of about $\frac{1}{2}$.

Now we modify the situation slightly. Here a subject is asked to predict the results of 100 tosses of the coin. The subject responds with an apparently random sequence of heads and tails. The coin is tossed 99 times, and these tosses are exactly as the subject predicted. The coin is now to be tossed for the 100th time, and the subject has predicted that this toss will land heads. At this point, the probability that the subject is right about all 100 tosses is equal to the probability that the 100th toss will land heads. But in this case, everyone seems to agree that they would give it a probability close to 1.

The difference between the two situations is that in the first the subject has accommodated the data about the first 99 tosses, while in the second that data has been predicted. Clearly the reason for our different attitude in the two situations is that the successful prediction is strong evidence that the subject has a reliable method of predicting coin tosses, while the successful accommodation provides no reason to think that the subject has a reliable method of predicting coin tosses.^{8}

Let *e* _{1} be the outcomes of the first 99 tosses, *h* be *e* _{1} plus the proposition that heads will occur on the 100th toss, *e* _{2} be that *h* was formulated before *e* _{1} was observed, and *k* be a description of the set‐up ‘where the coin seems normal and the sequence of tosses appears random’. *k* will also have to include the information that *h* was the only (or almost the only) hypothesis formulated, for, as Howson and Franklin^{9} point out, if all possible hypotheses have been formulated, the example will not work. *h* would be no more likely to be true than the hypothesis consisting of *e* _{1} plus the proposition that tails will occur on the 100th toss, if that had been formulated. The fact that someone guessed the lottery numbers correctly is no reason for supposing that he will guess the numbers correctly next time, when on the successful occasion all possible numbers had been guessed by someone or other.

However, given *k* above, claims Maher, ‘everyone seems to agree that’ *P*(*h* | *e* _{1} & *e* _{2} & *k*) is close to 1. Everyone is surely correct on this. Yet, Maher also claims, ‘everyone seems to agree’ that *P*(*h* | *e* _{1} & ∼ *e* _{2} & *k*) is about $\frac{1}{2}$. Since *P*(*h* | *e* _{1} & *k*) is a weighted average of these two values, it will follow that *P*(*h* | *e* _{1} & *e* _{2} & *k*) > *P*(*h* | *e* _{1} & *k*), and so historical evidence increases confirmation. Now, even if ‘everyone agrees’ that *P*(*h* | *e* _{1} & ∼ *e* _{2} & *k*) is about $\frac{1}{2}$, it is possible that they are mistaken. That apparently random sequence may not be really random. It may be that there is a pattern of regularity in the first 99 tosses from which it follows that the 100th toss will very probably be heads, a pattern which the subject who put forward *h* has alone spotted. Then *P*(*h* | *e* _{1} & *k*) will also be close to 1 (even though most of us are too stupid to realize that), and the historical information *e* _{2} is irrelevant.

But suppose there is no observable pattern in the tosses. In that case, what ‘everyone agrees’ about the probabilities is correct. So we ask why *P*(*h* | *e* _{1} & *e* _{2} & *k*) is close to 1. The answer is that *k* includes historical information that *h* was the only (p.232) hypothesis put forward. That together with *e* _{1} and *e* _{2}—the fact that his predictions were so accurate—is very strong evidence that the hypothesizer has access to information about bias in the set‐up that we do not (either via some publicly observable evidence other than that of the results of the 99 tosses, or via some internal intuitions—maybe he has powers of telekinesis). This is for the reason^{10} that (*e* _{1} & *e* _{2} & *k*) would be very improbable if the hypothesizer did not have this information. That is, we trust the prediction because of who made it, not because of when it was made. That this is the correct account of what is going on here can be seen from the fact that, if we add to *k* irrefutable evidence that the hypothesizer had no private information, then we must conclude that his correct prediction of the first 99 tosses was a mere lucky guess and provides no reason for supposing that he will be right next time.
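Maher's figures can be reproduced with a minimal Bayesian sketch. The prior of 0.01 that the subject has a reliable method (i.e. private information about the set‐up) is an invented number, and treating a reliable method as always correct is an idealization; neither comes from Maher's text:

```python
prior_reliable = 0.01        # invented prior: subject has a reliable method
p99_if_reliable = 1.0        # idealization: a reliable method never errs
p99_if_guessing = 0.5 ** 99  # probability of 99 correct guesses by luck

# Bayes' theorem: posterior probability that the subject has a reliable
# method, given that all 99 predictions came true
post_reliable = (prior_reliable * p99_if_reliable) / (
    prior_reliable * p99_if_reliable
    + (1 - prior_reliable) * p99_if_guessing
)

# Probability that the predicted 100th toss (heads) also comes true
p_100th = post_reliable * 1.0 + (1 - post_reliable) * 0.5
# post_reliable and p_100th are both effectively 1 (Maher's second case);
# with no prediction made in advance, the probability remains at 0.5
```

On any modest prior the 99 successes are overwhelming, which matches the diagnosis above: what the evidence supports is the hypothesizer's possession of private information, not any virtue of timing as such.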

So Maher's example gives no reason to suppose that in general mere evidence of the novelty of other evidence adds to the confirming force of that other evidence. I know of no plausible example to show that it does so, except in cases where the background evidence *k* includes historical evidence, typically evidence about who formulated or advocated the hypothesis, when the force of the novelty evidence *e* _{2} is to indicate that that person knows more about the subject matter than the public evidence *e* _{1} shows. In this case alone, I suggest, where the historical evidence shows private information, is the ‘method’ by which the hypothesis is generated of any importance for its probability. In general, the method by which the hypothesis was generated is irrelevant to its probability on evidence. Whether or not Mendeleev's theory was generated ‘by the method of looking for patterns in the elements’, its probability depends on whether it *does* correctly entail patterns, not on how it was arrived at. Kekule's theory of the benzene ring is neither more nor less probable on its evidence because it was suggested to Kekule in a dream. Only if the evidence suggests that someone has private information does it become important whether the hypothesis was generated in the light of consideration of that information. If it was, then evidence that the hypothesis has been generated by a method that has had success so far is (indirect) evidence in favour of its truth. But if we have the private evidence for ourselves we can ignore all that, and assess its force directly.

## Notes:

(1)
Karl Popper, *Logic of Scientific Discovery* (Hutchinson, 1959), 414.

(2)
See John Worrall, ‘Fresnel, Poisson, and the White Spot: The Role of Successful Predictions in the Acceptance of Scientific Theories’, in D. Gooding, T. Pinch, and S. Schaffer (eds.), *The Uses of Experiment* (Cambridge University Press, 1989).

(3)
See Alan Musgrave, ‘Logical versus Historical Theories of Confirmation’, *British Journal for the Philosophy of Science*, 25 (1974), 1–23.

(4)
Patrick Maher, ‘Howson and Franklin on Prediction’, *Philosophy of Science*, 60 (1993), 329–40.

(5)
Colin Howson and Allan Franklin, ‘Maher, Mendeleev and Bayesianism’, *Philosophy of Science*, 58 (1991), 574–85.

(6)
See Deborah Mayo, *Error and the Growth of Experimental Knowledge* (University of Chicago Press, 1996), 341–59, for this claim.

(7)
As we make more and more tosses and so N (and 2N) → ∞, the latter ratio—the ratio of the likelihood of getting the 50% result at exactly 2N throws on *h* _{1} divided by its likelihood on *h* _{2}—being 3^{2N}/2^{3N}, gets larger without limit. However, the former ratio—of the likelihood of getting the 50% result at some stage within 2N tosses on *h* _{1} divided by the likelihood of getting it on *h* _{2}—approaches a maximum of 1.5 as 2N → ∞. For *p* as the probability of heads and *q* as the probability of tails on one throw, the probability that ‘no return to equilibrium ever occurs’, i.e. that the 50% ratio is never reached, is |*p*−*q*|: that is, 0 on *h* _{1} and $\frac{1}{3}$ on *h* _{2}. (See W. Feller, *An Introduction to Probability Theory and its Applications*, i (3rd edn., John Wiley & Sons, 1968), 274.) So the probability that the 50% ratio is reached at some stage is 1 on *h* _{1} and $\frac{2}{3}$ on *h* _{2}, and hence the ratio of the likelihoods of getting the 50% ratio within 2N throws, as N → ∞, is $1/\frac{2}{3}$ = $\frac{3}{2}$ = 1.5.
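The limit of 1.5 can be checked numerically. The sketch below assumes *h* _{2} gives heads a probability of 2/3 on each toss (consistent with |*p*−*q*| = $\frac{1}{3}$) and sums the standard first-return probabilities f_{2k} = C(2k, k)(pq)^{k}/(2k−1):

```python
def prob_return_within(p, n_pairs):
    """Probability that a run of tosses with heads-probability p returns to
    an exact 50% heads ratio within 2*n_pairs tosses, summing the
    first-return probabilities f_{2k} = C(2k, k) * (p*q)**k / (2k - 1)
    via the recurrence f_{2(k+1)} = f_{2k} * 2*(2k - 1)*p*q / (k + 1)."""
    q = 1.0 - p
    f = 2.0 * p * q          # f_2: first return on the second toss
    total = f
    for k in range(1, n_pairs):
        f *= 2.0 * (2 * k - 1) * p * q / (k + 1)
        total += f
    return total

n = 2000
ratio = prob_return_within(0.5, n) / prob_return_within(2.0 / 3.0, n)
# the ratio climbs toward 3/2 as n grows (about 1.48 at n = 2000), while
# prob_return_within(2/3, n) settles at 1 - |p - q| = 2/3
```

The recurrence avoids computing huge binomial coefficients directly, so the sum stays within floating-point range for large n.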

(8)
Maher, ‘Howson and Franklin on Prediction’, 330.

(9)
Howson and Franklin, ‘Maher, Mendeleev and Bayesianism’, 577.

(10)
Given by Colin Howson in ‘Accommodation, Prediction, and Bayesian Confirmation Theory’, *Philosophy of Science Association Proceedings, 1988* (Philosophy of Science Association, 1989), 381–92; see pp. 383–4.