Does a difference make a difference?
Abstract and Keywords
This chapter presents a 1987 commentary on clinical trials. It argues that there should be public involvement in formulating criteria on whether one treatment is better than another. Approaches used to quantitate the degree of preference for health states provide a model for incorporating community views in clinical trial design.
‘The most important maxim for data analysis, and one which statisticians have shunned, is this: Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.’
John W. Tukey1
More than eighty years ago, Bernard Shaw wrote (with pen dipped in vitriol) that the doctor of his day drew ‘disastrous conclusions from his clinical experience because he has no conception of scientific method, and believes, like any rustic, that the handling of evidence needs no expertness.’2 Cynics will say that, except for the awareness of male bias in the use of gender pronouns, nothing has changed. But fair witnesses can point to clear signs that the doctor of today is developing a deep respect for the formalities used in the evaluation of everyday experience. Indeed, it can be argued that the medical profession has been cowed into unhealthy submission by the confidence and calculus of number jugglers (especially the emphasis on hypothesis-testing rather than estimation of the size-of-difference of a measured outcome between groups),3 and by the flaunting of P values to impart an aura of mathematical proof.4
Conscientious physicians seeking fair appraisals of treatment possibilities tumbling out of today’s cornucopia of proposals need to keep their feet planted in the real world. At the end of the day, patients, their immediate surrogates, their families and the community at large need to be convinced that all of the heady medical activity is in the sufferer’s best interest. The need for realism is seen very clearly in the effect-size problem in the design of clinical trials comparing treatments: What is an important difference in outcomes?
When planning the dimensions of a clinical trial, researchers are required (in the Neyman–Pearson ‘game’5) to provide a precise answer to the value-laden question: ‘What is the smallest difference between treatments you think is important enough to find?’ And they must wrestle with two uncomfortable questions about levels of error protection: (i) If you say, at the end of a trial, ‘There is an important difference in outcomes,’ how large a risk of being wrong are you willing to take? and (ii) How large a risk are you willing to take in missing the actual existence of an important difference by declaring, ‘There is no statistically significant difference’? It is hard to escape the uneasy feeling that the investigator’s caution in answering these questions has more to do with self-protection (the researcher’s reputation is at stake) than with concern for the overall social consequences.
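The three judgements the investigator is asked for are precisely the inputs of a conventional sample-size calculation. Below is a minimal sketch, assuming a two-sided test comparing two event proportions with the usual normal approximation. The function name `sample_size_per_arm` and the example risks (0.20 versus 0.12) are illustrative assumptions, not values taken from any trial protocol.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control: float, p_treated: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: patients needed in each arm (normal approximation).

    delta = |p_control - p_treated| is the 'smallest important difference';
    alpha is the risk of wrongly declaring a difference (question i);
    1 - power is the risk of missing a real difference (question ii).
    """
    delta = abs(p_control - p_treated)
    if delta == 0:
        raise ValueError("the two risks must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treated) / 2            # pooled risk under 'no difference'
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treated * (1 - p_treated))) ** 2
    return ceil(numerator / delta ** 2)


# Illustrative inputs only: five-year event risk falling from 0.20 to 0.12.
for alpha, power in [(0.05, 0.80), (0.01, 0.90)]:
    n = sample_size_per_arm(0.20, 0.12, alpha, power)
    print(f"alpha={alpha}, power={power}: {n} patients per arm")
```

Lowering alpha, raising power, or shrinking the difference deemed important all enlarge the trial. The value-laden answers to these questions therefore directly set how many patients must be enrolled.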
‘Methods’ sections in reports of clinical trials rarely set out the prior arguments used (i) in deciding what smallest difference would be considered important, and (ii) in deciding to accept stated levels of uncertainty about conclusions. Silence about the details fuels suspicion that the agreements are often narrowly defined. Many investigators give the impression they are embattled. As noted earlier, they seem to think they are forced to conduct formal trials only to persuade sceptical rivals that the new treatment ‘works.’ The beleaguered view was revealed several years ago in a rare confession, ‘We’re convinced that treatment A is far better than B, but we can’t convince Doctors X, Y and Z unless we have a randomised clinical study.’6
Where is the public choice in the matter of what shall be considered an important difference in treatment outcomes? Surely the preferences of those whose lives and well-being are directly affected need to be taken into account if a treatment of medical interest is to be transformed into something of social value. Is it not time to insist that there be public involvement in formulating the exact criteria to decide that ‘treatment A is far better than B’? Approaches used to quantitate the degree of preference for health states (e.g. adaptations of the standard gamble technique for measuring preferences7) provide a model for incorporating community views in clinical trial design. The additional planning time required to obtain public consultation will be well worth the investment if questions asked in treatment trials begin to approach the one proposed by John Burroughs for evaluating the outcome of all human interventions:8 ‘Is life sweeter? That is the test.’
The question of how to improve on Burroughs’s subjective test of ‘an important difference’ in treatment outcomes still goes unanswered. The failure would not have surprised the American Chief Justice, Oliver Wendell Holmes: ‘Most people think dramatically,’ he once observed, ‘not quantitatively.’
What needs to be overcome is a naive ‘all-or-none’ concept: the unrealistic expectation that an effective treatment will cure virtually all patients with a specified illness and, conversely, that practically none of the patients with this illness will improve without specific treatment. But help is on the way. It is encouraging to see that in the past few years, the advocates of evidence-based medicine at McMaster University and at Oxford University have been examining the effect-size question in considerable detail.1 They have been devising ways to express quantitative estimates of effectiveness that focus on the limitations of treatments. The aim is to make this realistic information available to practitioners, patients, families, communities, and even to new players on the medical scene, the bureaucrats in managed care. (Increasingly and ominously, the latter administrators are making many crucial decisions about treatments.)
One innovative approach, introduced by the Canadians, is the ‘number needed to treat’ concept.2 They cite, as an example, a parallel-group trial of hypertension in which 20% of enrolled patients allotted to the placebo group suffered a cerebral stroke within a five-year interval of observation; among patients with moderate hypertension who received active treatment, the risk of this adverse event was reduced to 12%. The estimate of the absolute risk reduction in the trial was 0.20 − 0.12 = 0.08, and the reciprocal of this number, 1/0.08 = 12.5 (rounded up to 13), provides an estimate of the number needed to treat (NNT). The practitioner uses this sober information about limits to tell decision makers that, for patients with moderate hypertension, one would need to treat about 13 patients for five years with the expectation of preventing one stroke. In patients with mild hypertension, on the other hand, the risk of stroke was reduced from a baseline of 0.015 in the placebo-treated group to 0.009 in patients who received specific treatment. Here the magnitude of limitation is unmistakable: the absolute risk reduction is only 0.015 − 0.009 = 0.006, so the estimate of NNT to prevent one stroke is 1/0.006 ≈ 167.
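The NNT arithmetic in the hypertension example can be reproduced in a few lines. This is a minimal sketch, assuming the common convention of rounding NNT up to the next whole patient. The function names are hypothetical, and the inputs are the five-year stroke risks quoted in the text.

```python
from math import ceil


def absolute_risk_reduction(control_risk: float, treated_risk: float) -> float:
    """Event risk without treatment minus event risk with treatment."""
    return control_risk - treated_risk


def number_needed_to_treat(control_risk: float, treated_risk: float) -> int:
    """NNT = 1 / ARR, rounded up to the next whole patient (assumed convention)."""
    arr = absolute_risk_reduction(control_risk, treated_risk)
    if arr <= 0:
        raise ValueError("treatment did not reduce risk; NNT is undefined here")
    # round() first so floating-point noise (e.g. 10.000000000000002)
    # is not pushed up to the next integer by ceil().
    return ceil(round(1 / arr, 9))


# Five-year stroke risks from the text: placebo group vs active treatment.
for label, placebo, treated in [("moderate", 0.20, 0.12), ("mild", 0.015, 0.009)]:
    arr = absolute_risk_reduction(placebo, treated)
    nnt = number_needed_to_treat(placebo, treated)
    print(f"{label}: ARR = {arr:.3f}, NNT = {nnt}")
# moderate: ARR = 0.080, NNT = 13
# mild: ARR = 0.006, NNT = 167
```

The same relative effect (a 40% reduction in risk in both groups) yields very different NNTs, because NNT depends on the baseline risk of the patients being treated.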
This easy-to-grasp information, together with further quantitative statements (confidence intervals for NNT, to indicate the imprecision of an estimate drawn from a single study sample, and a threshold NNT, to take into account the level of impact that warrants a recommendation for treatment), should help to inform the decisions made by all concerned about treating patients with disorders that vary widely across practice settings and from patient to patient.
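One simple way to attach a confidence interval to NNT is to take the reciprocals of the limits of a confidence interval for the absolute risk reduction. This assumes that a Wald interval for the difference of two proportions is adequate. The sketch below uses hypothetical group sizes of 500 per arm, chosen so that the event proportions match the 0.20 and 0.12 risks of the moderate-hypertension example. It also uses a hypothetical threshold NNT of 40. Neither value comes from the trial described in the text, and the function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def nnt_confidence_interval(events_c: int, n_c: int, events_t: int, n_t: int,
                            level: float = 0.95) -> tuple[float, float]:
    """Hypothetical helper: approximate CI for NNT from a Wald CI for the ARR.

    Only a single interval when the whole ARR interval lies above zero.
    """
    p_c, p_t = events_c / n_c, events_t / n_t
    arr = p_c - p_t
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    arr_low, arr_high = arr - z * se, arr + z * se
    if arr_low <= 0:
        raise ValueError("ARR interval includes zero; NNT interval is not a single range")
    # A larger risk reduction means fewer patients to treat, so the bounds swap.
    return 1 / arr_high, 1 / arr_low


# Hypothetical trial: 100/500 strokes on placebo, 60/500 on active treatment.
low, high = nnt_confidence_interval(100, 500, 60, 500)
print(f"NNT 95% CI roughly {low:.1f} to {high:.1f}")

THRESHOLD_NNT = 40  # hypothetical: largest NNT judged worth the burden of treatment
if high <= THRESHOLD_NNT:
    print("even the least favourable bound is within the threshold")
else:
    print("the data cannot rule out an NNT above the threshold")
```

When the interval for the absolute risk reduction crosses zero, the corresponding NNT interval splits into two disjoint ranges, one for benefit and one for harm. For that reason the sketch declines to return a single interval in that case.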