Appendix Mathematical Notes
Introduction
In the main body of this book we have described a quantitative theory of crime. The theory is based on extensive data collected by criminal justice agencies and as a result it might more accurately be described as a theory of conviction and reconviction, or crime as it is experienced by the criminal justice system. We have tried to make our description non-technical, relying mainly on graphical representations of both the data and our models to support our arguments. Inevitably we have had to include mathematical equations where necessary, but we have not fully explained the link between our basic assumptions stated in Chapter 3 and the mathematical formulation of the theory. If the theory is to be used in practical applications, like prison population forecasting or estimating the impact of policy, a more detailed exposition of the mathematics is needed. This Appendix is intended to provide the mathematical and statistical logic and understanding necessary to develop the theory and apply it.
The theory developed in Chapters 2 and 3, and in particular the mathematics of the theory, is based on the concept of constant probabilities. It is important therefore for us to make clear what we understand by probability. There are two schools of thought on probability: the Bayesian School and the Frequentist School. Bayesians see probability as a reflection of the state of knowledge concerning some future event, which is updated in the light of experiment or experience. In the absence of any empirical evidence a purely subjective probability is assigned as a prior probability, which is then updated to form a posterior probability in the light of experience. This approach is ideally suited to activities like horse racing or stock market analysis but, in our view, is less useful when studying the random events occurring in stable stochastic systems, for example in physics (radioactive decay) or in criminology (the large-scale behaviour of criminals in the population). Frequentists, on the other hand, define probability in terms of relative frequencies. Probabilities cannot be assigned to events without some pre-existing data or well-founded theoretical reasoning concerning the system or population in which the event might occur. In practice Bayesians effectively use the Frequentists’ methods where the data are available.
Constant Probability Systems
In the context of our theory, among the events that we consider is ‘one or more criminal convictions in a life time’. If we select an individual at random from the entire population, we define the probability of that individual being convicted within their lifetime as the ratio of the number of individuals who have been or will be convicted to the total number of individuals in the population. There is clearly a problem doing the calculation as we do not know the number of individuals who will be convicted in the future. We must therefore estimate this probability from the information that we do have. Our data source, the Offenders Index, enabled the selection of cohort samples of individuals born in one of four weeks in selected years, 1953, 1958, 1963, 1968, and 1973. In Chapter 2 we made estimates of the whole life conviction probability (the cohort criminality) for each of the cohorts. The whole life criminality was calculated using the age–crime model of Chapter 3 to estimate the number who had or would be convicted.
The criminality estimate was found to vary between the cohorts by more than could be accounted for simply by chance, suggesting that there is also some additional stochastic variation in criminality over time. It is also clear from Chapter 2 that male and female criminalities are very different both from each other and from the overall value. Thus we can only estimate the probability of an event for an individual in the context of the population to which that individual belongs. Probability is a property of the population and not of the individual. To some extent this chimes with the Bayesian view. If we do not know the gender of the individual we would assign the whole population probability and update that when the gender was known.
In developing our theory we make use of the concept of randomness. A random selection is made on the basis of no prior knowledge, except that the selection is from a known parent population. A random selection is assumed to have the same statistical properties as the parent population. Selecting on the basis of some property or characteristic would not be random, but having made the selection a new parent population would be defined from which a random selection could then be made. In the main text we frequently make non-random selections to create subsets (categories) of offenders with specific characteristics: males/females, serious/less serious offenders, etc. In these subsets the statistical structure is often similar, with different parameter values, but this is not guaranteed, as both the structure and the parameters depend on the selection criterion (conditioning).
We also make use of inferred subsets where collectively the subset (group or category) exhibits a frequency distribution from which we infer that the members of the subset share a common parameter value for the property that generates the distribution. The main examples of inferred categories are those with a common constant probability of reconviction (or desistance) or a constant probability of offending (being convicted) in a given time interval. These inferred categories are said to be homogeneous with respect to the specified property. Moving up a level, where the parent population contains two or more of these inferred categories it is said to be heterogeneous with respect to the specified property.
In our analysis of recidivism, for each cohort, the parent population was the set of individuals born in one of four weeks in the cohort year who were convicted of one or more offences in the follow-up period. Recidivism is defined as the proportion of offenders with at least n convictions who are reconvicted. In the parent population the distribution of offence number was found to be heterogeneous with respect to recidivism probability. However, we were able to create two inferred subsets (categories) that were homogeneous with respect to recidivism probability p, which was constant for all conviction numbers. Within the inferred categories the probability of at least n convictions was simply p^{(n−1)}.
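As a minimal sketch of this constant-probability structure, the Python below evaluates p^{(n−1)} for a single homogeneous category and checks that the ratio of successive terms is the constant p. The function names are illustrative, and the value p = 0.84 is borrowed from the high-risk figure quoted later in this Appendix purely as an example.

```python
def prob_at_least(n: int, p: float) -> float:
    """Probability of at least n convictions in a category with reconviction probability p."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return p ** (n - 1)


def mean_convictions(p: float) -> float:
    """Sum over n >= 1 of p**(n - 1), i.e. the average number of convictions, 1 / (1 - p)."""
    return 1.0 / (1.0 - p)


p = 0.84  # illustrative value only
survival = [prob_at_least(n, p) for n in range(1, 6)]
# In a homogeneous category each successive term is the previous one multiplied by p.
ratios = [survival[i + 1] / survival[i] for i in range(len(survival) - 1)]
print(survival)
print(ratios)
print(mean_convictions(p))
```

A heterogeneous parent population would show a ratio that drifts with n, which is the kind of departure that motivates splitting it into inferred categories.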
In our analysis of reconviction times it is not immediately obvious that we are again dealing with a constant probability process, or why we can infer this from the data. By way of explanation, let us assume that the probability p of an event (say an offender committing a crime) is constant in any small interval of time and that on average there are λ events in unit time (say crimes per year). In a time interval t (years) there will on average be λ*t events. If there are n of our small time intervals in time t, then λ*t = n*p. By considering each of the n intervals as an independent opportunity for the event to occur we can calculate the probability of exactly zero, one, two etc events occurring in time t simply by using the binomial expansion:
$(q + p)^{n} = q^{n} + n p\, q^{n-1} + \frac{n(n-1)}{2!}\, p^{2} q^{n-2} + \dots + p^{n}$ (A.1)
where: q = 1 − p
If we now assume that our small time intervals get smaller and smaller so that n tends to ∞ and p tends to zero, and that this happens in such a way that n*p = λ*t remains true and n*(n−1)*p^{2} = (n*p)^{2} etc, then in the limit the right-hand side of Equation A.1 becomes:
$1 + \lambda t + \frac{(\lambda t)^{2}}{2!} + \frac{(\lambda t)^{3}}{3!} + \dots$ (A.2)
But this is the series expansion of e^{λ*t} and, as this is a probability distribution, the sum should be equal to 1. Dividing each term by e^{λ*t} satisfies this requirement, and successive terms in the expansion give the probability of 0, 1, 2, 3 or in general r events in the time interval t. The probability of exactly r events in a time interval t is given by:
$\Pr(r \text{ events in time } t) = \frac{(\lambda t)^{r}\, e^{-\lambda t}}{r!}$ (A.3)
The right-hand side of Equation A.3 is the general term in the Poisson distribution.
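The limiting argument can be checked numerically. The sketch below compares binomial probabilities, with n*p held equal to λ*t, against the Poisson term as n grows. The rate and interval are hypothetical values chosen for illustration, not estimates from the Offenders Index, and the function names are assumptions of this example.

```python
import math


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r events in n independent intervals, each with probability p."""
    return math.comb(n, r) * p**r * (1.0 - p) ** (n - r)


def poisson_pmf(r: int, mean: float) -> float:
    """Probability of exactly r events when the expected number is `mean` (lambda * t)."""
    return mean**r * math.exp(-mean) / math.factorial(r)


def max_gap(n: int, lam: float, t: float, r_max: int = 8) -> float:
    """Largest absolute difference between the two distributions for r = 0 .. r_max - 1."""
    p = lam * t / n  # keep n * p equal to lambda * t as the intervals shrink
    return max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam * t)) for r in range(r_max))


lam, t = 0.8, 2.0  # hypothetical: 0.8 events per year observed over two years
for n in (10, 100, 10_000):
    print(n, max_gap(n, lam, t))
```

The printed gap shrinks as n increases, which is the numerical counterpart of letting the small time intervals tend to zero.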
Of particular interest in studying the Poisson process is the time interval between successive events or, put another way, the probability of no events in the time interval t expressed as a function of t. Putting r = 0 in Equation A.3 gives the probability that the inter-event time is greater than t:
$\Pr(\text{inter-event time} > t) = e^{-\lambda t}$ (A.4)
Equation A.4 is of the form of the survival time distributions used in the analyses of Chapter 2 with average survival time 1/λ. If in the above derivations we assumed that we selected events at random with probability p_{s} then the average number of selected events in time interval t would become λ_{s}*t = p_{s}*λ*t, ie λ_{s} = p_{s}*λ and, substituting λ_{s} for λ in Equations A.2 to A.4, the derivation would proceed in exactly the same way, resulting in a further Poisson process with parameter λ_{s}.
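The selection result can be verified directly: if the number of events is Poisson with mean λ*t and each event is independently selected with probability p_{s}, summing over the unknown total number of events should reproduce a Poisson distribution with mean p_{s}*λ*t. The sketch below does this sum numerically; the parameter values and function names are hypothetical.

```python
import math


def poisson_pmf(r: int, mean: float) -> float:
    """Poisson probability of r events, computed on the log scale for numerical stability."""
    return math.exp(r * math.log(mean) - mean - math.lgamma(r + 1))


def selected_pmf(r: int, mean: float, p_s: float, k_max: int = 100) -> float:
    """Probability of r selected events: sum over k total events of
    Poisson(k; mean) * Binomial(r; k, p_s). k_max truncates a negligible tail."""
    total = 0.0
    for k in range(r, k_max + 1):
        total += poisson_pmf(k, mean) * math.comb(k, r) * p_s**r * (1.0 - p_s) ** (k - r)
    return total


mean_events = 0.8 * 5.0  # hypothetical lambda * t
p_s = 0.25               # hypothetical selection probability
for r in range(6):
    print(r, selected_pmf(r, mean_events, p_s), poisson_pmf(r, p_s * mean_events))
```

The two printed columns agree to rounding error, consistent with λ_{s} = p_{s}*λ.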
We use this result in Chapter 3 to account for the apparent disparity between the time to first conviction and the inter-conviction time distributions.^{1} This result also implies that convictions are in effect a random selection from crimes committed, which also occur as random events in a Poisson process. It should be noted here that any one of the characteristics of the Poisson process implies the others. A negative exponential distribution of inter-event times implies a Poisson distribution of events in any given time interval, which in turn implies a constant probability of an event occurring in any small interval of time within the observation period.
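The equivalence between exponential inter-event times and Poisson counts can be illustrated by simulation. This is a sketch only: the rate, interval, seed, and number of trials are arbitrary choices, and the comparison is statistical rather than exact.

```python
import math
import random


def count_events(lam: float, t: float, rng: random.Random) -> int:
    """Count events in [0, t] when the gaps between events are exponential with rate lam."""
    elapsed, count = rng.expovariate(lam), 0
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(lam)
    return count


rng = random.Random(12345)         # fixed seed so the run is repeatable
lam, t, trials = 0.8, 2.0, 20_000  # hypothetical values
counts = [count_events(lam, t, rng) for _ in range(trials)]

mean_count = sum(counts) / trials
share_zero = counts.count(0) / trials
print(mean_count, lam * t)             # simulated mean against lambda * t
print(share_zero, math.exp(-lam * t))  # simulated share with no events against exp(-lambda * t)
```

The share of trials with no events estimates the survival probability of Equation A.4, while the mean count estimates λ*t.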
The Poisson process is a common feature of criminal career models (see eg Barnett et al 1987; Canela-Cacho et al 1997; Greenberg 1991; Maltz 1996). In criminal career research the rate of offending λ is often estimated from surveys of individuals which are heavily conditioned by the target sample of the survey, which could be anything from the general population to prison inmates. Piquero and Blumstein (2007) in their discussion of incapacitation make many references to λ. They suggest that ‘Rather than focusing on individual-level measurement of λ, it is more reasonable to direct attention to the distribution of λ among various populations’. In estimating the distribution of λ for offenders in general, Greenberg (1991) assumed that the population was heterogeneous with respect to λ and that λ was distributed as a gamma variate leading to a Pareto distribution for the number of crimes in time t. Canela-Cacho et al (1997) also assumed heterogeneity for λ in the offender population. They assumed that the distribution of λ was the sum of r exponential distributions and r equal to 3 was required to fit data from the Rand surveys of prison inmates in the late 1970s.
In our model we also assume heterogeneity, dividing our parent population into just two categories which are homogeneous with respect to λ. This implies that individuals in our inferred categories commit crimes as a Poisson process. We estimate the Poisson rate λ from inter-conviction survival times. This estimation technique has a number of advantages. It automatically accommodates low individual λs, which is very important when we analyse first convictions. The ‘too many zeros’ problem (Chaikin and Rolph 1981) encountered when counting convictions in some short time period does not occur; desistance from crime does not influence the estimation of λ; and censoring, caused by the limit of the observation period, is clearly identifiable on the survival plot and can be compensated for.
However, we can derive a distribution for individual λs. An individual measurement of the number, r, of offences committed in unit time, say one year, by an individual offender can be considered as a random instance from the Poisson distribution of Equation A.3, with t = 1. The estimate of individual λ from that instance is simply r. Therefore Equation A.3 (with t = 1) gives the probability distribution of individual λs, ie a Poisson distribution of r with mean λ. Estimating λ from one-year subsamples for a birth cohort would of course involve the ‘too many zeros’ problem caused by desistance and, unless the one-year samples were drawn from the whole observation period, the estimation process would also suffer from selection bias induced by the age–crime curve. To avoid these problems we have contented ourselves with the inter-conviction survival time method of estimation.
Allocation of Offenders to the Risk/Rate Categories
In Chapter 2 we derived the dual-risk recidivism model and the dual-rate survival time model, Equations 2.4 and 2.8 respectively. We repeat them here for ease of reference:
The risk equation
The rate equation

n is the conviction (court appearance) number,

Y(n) is the number of offenders in a cohort or cross-section sample with at least n convictions,

a is the number of convicted offenders in the cohort or cohort equivalent,

p_{h} is the probability that a high-risk offender will reoffend,

p_{l} is the probability that a low-risk offender will reoffend,

α is the proportion of high-risk offenders,

S(t) is the number of reconvictions of offenders surviving t years from their previous conviction,

B is the total number of reconvictions in the sample,

λ_{h} is 1/(mean time to next conviction) for high-rate offenders,

λ_{l} is 1/(mean time to next conviction) for low-rate offenders,

b is the proportion parameter for rate.
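The parameter definitions above can be turned into a short sketch. It assumes that the dual-risk and dual-rate models take the two-component forms Y(n) = a*(α*p_{h}^{(n−1)} + (1−α)*p_{l}^{(n−1)}) and S(t) = B*(b*e^{−λ_{h}*t} + (1−b)*e^{−λ_{l}*t}), which is what the definitions suggest but should be checked against Equations 2.4 and 2.8. All numerical values below are hypothetical.

```python
import math


def offenders_with_at_least(n: int, a: float, alpha: float, p_h: float, p_l: float) -> float:
    """Assumed dual-risk form: a * (alpha * p_h**(n-1) + (1 - alpha) * p_l**(n-1))."""
    return a * (alpha * p_h ** (n - 1) + (1.0 - alpha) * p_l ** (n - 1))


def reconvictions_surviving(t: float, B: float, b: float, lam_h: float, lam_l: float) -> float:
    """Assumed dual-rate form: B * (b * exp(-lam_h * t) + (1 - b) * exp(-lam_l * t))."""
    return B * (b * math.exp(-lam_h * t) + (1.0 - b) * math.exp(-lam_l * t))


# Hypothetical parameter values, not fitted estimates.
a, alpha, p_h, p_l = 1000.0, 0.3, 0.84, 0.3
B, b, lam_h, lam_l = 500.0, 0.6, 0.85, 0.2

for n in (1, 2, 5, 10):
    print(n, offenders_with_at_least(n, a, alpha, p_h, p_l))
for t in (0.0, 1.0, 5.0):
    print(t, reconvictions_surviving(t, B, b, lam_h, lam_l))
```

On a logarithmic scale each function would appear as two straight-line segments, one per homogeneous category, with the low-risk (or high-rate) component dying away first.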
From Equation A.5 the total number of convictions (court appearances) sustained for a cohort is given by:
This is expanded to:
For a cross-section, Equation A.7 represents the total number of offenders in the sample. Equation A.8, summing over the range of n = 2 to ∞, gives the number of reconvictions for the cohort or the number of offenders with more than one conviction for the cross-section. The summations in Equation A.8 separately provide the estimates for the high- and low-risk offender categories. It is assumed that all low-risk offenders are low-rate but that high-risk offenders can be either high- or low-rate. These assumptions lead to the following relationships:
The total number of reconvictions:
The number of high-risk/low-rate reconvictions:
The number of low-risk/low-rate reconvictions:
And the number of high-risk/high-rate reconvictions:
For a cohort these can be translated into numbers of offenders as follows.
In the low-risk/low-rate category:
In the high-risk/high-rate category:
In the high-risk/low-rate category:
An Alternative Modelling Approach
In Chapter 3 we derived equations for the age–crime (conviction) curve in which we explicitly modelled the apparent rise in crime between the ages of 10 and 18. This approach worked well for first convictions but led to equations requiring numerical rather than analytic solutions for subsequent convictions. The problem was caused by the explicit modelling of the rise in crime. We now consider a system in which we assume that crime itself does not vary with age and in which crimes are committed randomly and in direct proportion to the number of active offenders. In what follows, an ‘offence’ refers to ‘a conviction opportunity’, ie where the offender is caught and could be, but isn’t necessarily, convicted. We also assume that the population is homogeneous with respect to offending.
Let N_{r}(t) be the number of offenders in the population responsible for r offences in the period up to time t. We start our process at time t = 0 with N_{0}(0) potential offenders who have no previous offences, thus N_{r}(0) = 0 for all non-zero r. This is the situation for all potential offenders on their 10th birthday as, by definition, under-10s cannot commit crime. As offenders commit each offence they move from the subpopulation with r offences to that with r + 1 offences and this occurs at rate λ*N_{r}(t). This system can be described as an infinite series of first-order linear differential equations:
$\frac{dN_{r}(t)}{dt} = \lambda N_{r-1}(t) - \lambda N_{r}(t)$ (A.16)
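The general solution to this equation is:
$N_{r}(t) = e^{-\lambda t}\left[N_{r}(0) + \lambda \int_{0}^{t} N_{r-1}(s)\, e^{\lambda s}\, ds\right]$ (A.17)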
Now for r = 0 the term N_{r−1}(t) is undefined in the system and does not exist, thus:
$N_{0}(t) = N_{0}(0)\, e^{-\lambda t}$ (A.18)
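For r > 0, N_{r}(0) = 0, and the integral term in Equation A.17 evaluates to:
$N_{1}(t) = N_{0}(0)\, \lambda t\, e^{-\lambda t}$ (A.19)
and:
$N_{2}(t) = N_{0}(0)\, \frac{(\lambda t)^{2}}{2!}\, e^{-\lambda t}$ (A.20)
and in general:
$N_{r}(t) = N_{0}(0)\, \frac{(\lambda t)^{r}}{r!}\, e^{-\lambda t}$ (A.21)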
Thus the number of offenders with exactly r offences at time t is the product of the total number of potential offenders in the population, N_{0}(0), and the probability of r events in time t in a Poisson process with rate λ (see Equation A.3). We can now derive an expression for the rate of first offences (equal to the rate of decline in the number of offenders with no offences) as a function of t (age), either by substituting for N_{0}(t) from Equation A.18 into Equation A.16 with r = 0, or by differentiating A.18 to give:
$\frac{dN_{0}(t)}{dt} = -\lambda N_{0}(0)\, e^{-\lambda t}$ (A.22)
But this suggests that at t = 0 the offending rate is λ*N_{0}(0) whereas we know that at age 10 (t = 0) the offending rate should be very close to zero. However, if we assume that early conviction opportunities are ignored and do not result in conviction, the probability distribution of the first recorded offence/conviction as a function of time would be some combination of the probability distributions of the second, third, etc offences as functions of time. The probability density of the rth offence occurring at time t is simply derived from the probability of r − 1 offences in time t giving:
$\lambda \cdot \frac{(\lambda t)^{r-1}\, e^{-\lambda t}}{(r-1)!} = \frac{\lambda^{r}\, t^{r-1}\, e^{-\lambda t}}{\Gamma(r)}$ (A.23)
For integer values of r: Γ(r) = (r−1)!
This is the gamma distribution, which we introduced in Chapter 4 as an approximation to account for the rise in crime in the early part of the criminal career.
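The derivation in this section can be checked numerically. The sketch below takes the system to be dN_{r}/dt = λ*N_{r−1} − λ*N_{r}, which is the form implied by offenders moving from r to r + 1 offences at rate λ*N_{r}(t), integrates it with a fourth-order Runge-Kutta scheme, and compares the result with N_{0}(0) times the Poisson probability. It also evaluates the gamma density for the time of the rth offence at t = 0. Parameter values, step counts, and function names are assumptions of this example.

```python
import math


def derivatives(counts, lam):
    """Right-hand side of the assumed system dN_r/dt = lam * N_(r-1) - lam * N_r."""
    rates = []
    for r, n_r in enumerate(counts):
        inflow = lam * counts[r - 1] if r > 0 else 0.0  # nothing flows into r = 0
        rates.append(inflow - lam * n_r)
    return rates


def integrate(lam, n_start, t_end, r_max=10, steps=2000):
    """Runge-Kutta integration for r = 0 .. r_max, starting with everyone at r = 0."""
    counts = [float(n_start)] + [0.0] * r_max
    h = t_end / steps
    for _ in range(steps):
        k1 = derivatives(counts, lam)
        k2 = derivatives([c + 0.5 * h * k for c, k in zip(counts, k1)], lam)
        k3 = derivatives([c + 0.5 * h * k for c, k in zip(counts, k2)], lam)
        k4 = derivatives([c + h * k for c, k in zip(counts, k3)], lam)
        counts = [
            c + h * (a + 2.0 * b + 2.0 * d + e) / 6.0
            for c, a, b, d, e in zip(counts, k1, k2, k3, k4)
        ]
    return counts


def poisson_count(r, n_start, lam, t):
    """N_0(0) times the Poisson probability of r events in time t."""
    return n_start * (lam * t) ** r * math.exp(-lam * t) / math.factorial(r)


def rth_offence_density(t, r, lam):
    """Gamma density for the time of the r-th offence."""
    return lam**r * t ** (r - 1) * math.exp(-lam * t) / math.gamma(r)


lam, n_start, t_end = 0.8, 1000.0, 5.0  # hypothetical values
numeric = integrate(lam, n_start, t_end)
worst = max(abs(numeric[r] - poisson_count(r, n_start, lam, t_end)) for r in range(6))
print(worst)
# At t = 0 the density is lam for the first offence but zero for the second.
print(rth_offence_density(0.0, 1, lam), rth_offence_density(0.0, 2, lam))
```

The zero starting value of the density for r ≥ 2 is what allows the gamma form to reproduce a conviction rate close to zero at age 10.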
Incapacitation
The negative exponential distribution has the property of being memoryless, in the sense that what happens after time t is independent of what happened before. We can therefore choose to start our (re)conviction process at any time. If in Equation A.16 we assume that, instead of moving offenders on into the next offence-count subset, we return recidivists into the active offender pool and move the desisters into a non-offender pool then the differential equation becomes:
$\frac{dN(t)}{dt} = -\lambda (1 - p)\, N(t)$ (A.24)
This has the solution:
$N(t) = N(0)\, e^{-\lambda (1 - p)\, t}$ (A.25)
where N(t) is the number of (active) offenders still offending at time t and N(0) is the number who will offend at some time after we choose to start the process. From this we can see that the average residual career length of active offenders, from our arbitrary start time, is 1/(λ*(1−p)). In this last expression, the operative word is active and this is very important when considering incapacitation. Avi-Itzhak and Shinnar (1973) and Shinnar and Shinnar (1975) in their models of crime made many basic assumptions in common with us. However, in estimating incapacitation, like us they assumed that criminal career length is exponentially distributed, but they also implicitly assumed that for an individual the career is fixed in time, which implies that offenders could terminate their careers whilst incarcerated and those still active on release would have a reduced residual career length (ie active offending time = career length − time in prison). Their result that incapacitation reduces crime relies on this assumption; the dependency on the invariance of criminal career length with respect to CJS interventions is not made explicit in their analysis. Using their formulation and parameter estimates from UK data, Tarling (1993, pp 143–146) estimated that the extant prison populations in England and Wales in 1975, 1980, and 1986 had reduced recorded crime by between 5.8 per cent and 9 per cent. But as we show below even these modest estimates would appear to be gross exaggerations.
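The residual career length and the memoryless property can be illustrated with a short calculation. The sketch assumes the exponential decline N(t) = N(0)*e^{−λ*(1−p)*t} implied by the stated mean of 1/(λ*(1−p)); the values of λ and p are hypothetical, chosen so that the mean comes out at five years.

```python
import math


def active_offenders(t: float, n_start: float, lam: float, p: float) -> float:
    """Assumed form of the solution: N(t) = N(0) * exp(-lam * (1 - p) * t)."""
    return n_start * math.exp(-lam * (1.0 - p) * t)


def mean_residual_career(lam: float, p: float) -> float:
    """Average residual career length of active offenders, 1 / (lam * (1 - p))."""
    return 1.0 / (lam * (1.0 - p))


lam, p = 1.25, 0.84  # hypothetical values

# Midpoint-rule integral of the surviving fraction over 0 to 200 years.
dt = 0.01
numeric_mean = sum(active_offenders((i + 0.5) * dt, 1.0, lam, p) * dt for i in range(20_000))
print(mean_residual_career(lam, p), numeric_mean)

# Memoryless property: the fraction surviving a further 3 years is the same
# whether we start the clock at t = 0 or at t = 4.
from_zero = active_offenders(3.0, 1.0, lam, p)
from_four = active_offenders(7.0, 1.0, lam, p) / active_offenders(4.0, 1.0, lam, p)
print(from_zero, from_four)
```

Because the two survival fractions coincide, the choice of start time does not affect the residual career length of those still active.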
As reported in Chapter 5, from the 1953 cohort, for high-risk offenders with at least one custodial sentence and more than four convictions, the proportion of reconvictions after custody was 84.8 per cent compared with 83.1 per cent after non-custodial disposals. Similarly, from Table 5.2 the four-year reconviction proportions after custody and supervision were 64.4 per cent and 62.5 per cent respectively. In both of these situations, where recidivism risk and seriousness are controlled for, we would have expected some 11 per cent fewer reconvictions after custodial, compared with non-custodial, sentences for fixed-in-time careers. This is because, for high-rate offenders, our estimated residual career length would be about five years and average prison time served about seven months, during which time 11 per cent of offenders, who otherwise would have reoffended, should desist and not reoffend on release, which should result in 11 per cent fewer reconvictions than for non-custodial disposals. Tarling’s shorter residual career length estimates would result in even higher reductions in recidivism. What was actually observed is that recidivism tends to be higher following custody than for non-custodial sentences. There is therefore no evidence in these data that criminal careers terminate during incarceration rather than at the point of conviction; in fact the evidence points to the contrary. We therefore conclude that there is no overall crime reduction brought about by incapacitation except where offenders are incarcerated for a large proportion of their active lives.
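The 11 per cent figure can be reproduced under the fixed-in-time assumption. The sketch below treats the residual career as exponentially distributed with a mean of five years and asks what share of careers would end during seven months in custody; the function name is illustrative.

```python
import math


def share_desisting_in_custody(time_served_years: float, residual_career_years: float) -> float:
    """Share of careers ending during custody if careers were fixed in time and the
    residual career length were exponential with the given mean."""
    return 1.0 - math.exp(-time_served_years / residual_career_years)


share = share_desisting_in_custody(7.0 / 12.0, 5.0)
print(f"{share:.1%}")  # roughly 11 per cent
```

A shorter residual career length, as in Tarling’s estimates, increases this share, which is why the fixed-in-time assumption would predict an even larger drop in reconvictions after custody.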
In our theory we assume that the career termination decision is made at the time of, and as a result of, conviction. This assumption implies that on release from prison the proportion p, destined to reoffend, is the same as for any other disposal, as is the residual career length. Released prisoners simply rejoin the active offender pool in which the residual career length is distributed exponentially with the same constant parameter value. We now consider incapacitation from the viewpoint of our theory. In a simplified situation where the birth rate is constant over time and there is a single offender group in which criminality = c, incarceration probability = p_{c}, and conviction rate = λ are also all constant, we can create a simple model of the impact of prison on crime.
In our theory crime is proportional to the active criminal population and we therefore need to calculate the impact of prison on the number of active offenders. With a constant birth rate, the age–crime curve for all cohorts is the same. If for the moment we ignore crimes committed prior to the first conviction opportunity and start our process at age 10 or at the conviction opportunity before the first conviction, whichever is later, we can calculate, using Equation A.25, the rate of conviction for the cohort at time t. As all of the cohorts have identical rate of conviction profiles, we can use this to calculate the overall number of convictions by integrating A.25 over t = 0 to ∞:
Here N(0) is the number of offenders in one homogeneous category of a cohort sample and N(∞) is the lifetime total number of convictions sustained by the category, but, because all cohorts are equivalent, N(0) and N(∞) are also equal to the number of new, N_{N}(t), and active, N_{A}(t), offenders respectively in the equivalent category of active offenders at time t. If the system is in equilibrium, the number of offenders giving up crime, N_{A}(t)*(1−p), will be balanced by the number of new offenders, N_{N}(t), being convicted for the first time and hence entering the system (N_{N}(t) is a constant because birth rate is assumed constant). This is trivially true for empty prisons. If we now incarcerate a proportion p_{c} of those convicted and sentence them to an average time served 1/λ_{s} then the prison population N_{p}(t) would build up until the prison element of the system was also in equilibrium, thus the rate of change in the active offender population is given by:
During the build-up of the prison population the right-hand bracketed (prison) term in A.27 will become negative as more individuals enter prison than leave; this will cause the active population N_{A}(t) to reduce, causing the left-hand bracketed (active) term to become more positive; this will cause the rate of decrease of N_{A}(t) over time to slow down and change sign to become an increase; eventually the prison population will stabilize and the prison and active terms will both return to zero. In the steady state the active population is given by:
But, remembering that in a stable population N_{A}(t) ≡ N(∞) and N_{N}(t) ≡ N(0), this is the same situation as existed when the prisons were empty; see Equation A.26. Thus, in the steady state, the active population, N_{A}(t), and by inference crime, is independent of the actual prison population. However, following a step change in custodial sentencing policy, both the active population and crime will reduce if the prison population is increasing and increase if it is reducing. The changes in active population (crime rate) are transient but the changes in prison population persist, unless of course the prison population increases/decreases indefinitely. In the steady state the prison population, N_{p}(t), is given by:
The steady state prison population is therefore proportional to the product of the probability of a custodial sentence and sentence length (time served).
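The steady-state argument can be explored with a small two-pool simulation. This is a sketch under assumed flows, not a transcription of Equation A.27: new offenders arrive at a constant rate; active offenders are convicted at rate λ; a proportion p_{c} of those convicted go to prison and are released at rate λ_{s}; and, whatever the disposal, only a proportion p rejoin the active pool while the rest desist. All parameter values are hypothetical.

```python
def simulate(n_new, lam, p, p_c, mean_time_served, years=200.0, dt=0.01):
    """Euler integration of the assumed active/prison system until it settles.
    Returns (active population, prison population)."""
    lam_s = 1.0 / mean_time_served
    active, prison = 0.0, 0.0
    for _ in range(int(round(years / dt))):
        convictions = lam * active  # convictions per year
        releases = lam_s * prison   # releases per year
        # Non-custodial recidivists return at once; released recidivists return later.
        d_active = n_new - convictions + p * (1.0 - p_c) * convictions + p * releases
        d_prison = p_c * convictions - releases
        active += dt * d_active
        prison += dt * d_prison
    return active, prison


# Hypothetical values: 1,000 new offenders a year, seven months served on average.
common = dict(n_new=1000.0, lam=1.25, p=0.84, mean_time_served=7.0 / 12.0)
active_low, prison_low = simulate(p_c=0.14, **common)
active_high, prison_high = simulate(p_c=0.28, **common)

print(active_low, active_high)  # the same active population under both policies
print(prison_low, prison_high)  # prison population scales with p_c
```

Under these assumptions the settled active population equals n_new/(λ*(1−p)) whatever the custody probability, while the prison population is proportional to p_{c} multiplied by the time served.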
If we now assume that the prison population is increased at a constant linear rate r (extra inmates per year) then Equation A.27 would become:
Which has the solution:
Over time the exponential term tends to zero, thus in the steady state:
The right hand side of Equation A.32 is independent of t and therefore constant, thus there is an ongoing reduction in the active population of:
The constant rate of increase in prison population can be expressed as:
Giving the proportionate change in the active population of:
If the probability of custody, p_{c}, is increased in such a way as to result in a constant annual increase in the prison population then the active population will be reduced by a steady state constant proportion.
Although the derivations above assume only one risk category, we can accommodate different risk/rate categories by simply summing the results over all homogeneous categories. The real situation is also more complicated: the active population is determined by demographics (the birth rate at each age weighted by the normalized age–crime curve) and policies are subject to change over time, potentially influencing any or all of the parameters. But the principles still hold.
Over the six-year period from 1993 to 1999 the prison population in England and Wales increased by about 50 per cent, a linear change of 8.3 per cent of the initial value per year. With an overall custody rate of 14 per cent this results in a steady state Δp_{c} = 0.012, and a change in active high- and low-risk populations of −6.6 per cent and −0.6 per cent respectively. Our analysis suggests that overall about half of crime is committed prior to the first conviction and that the risk group proportions in the offender population are 43 per cent high-risk and 57 per cent low-risk. From these estimates and Equation A.34 the percentage change in recorded crime, due to increasing the prison population, during the period 1993–1999 would have been in the region of −1.5 per cent.
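The arithmetic in this paragraph can be retraced as follows. The first two steps follow directly from the quoted figures. The final weighting is only one plausible reading of how the group changes combine (post-first-conviction crime weighted by the risk group proportions) and is an assumption of this sketch rather than a statement of Equation A.34.

```python
total_growth = 0.50  # prison population growth over 1993 to 1999, from the text
years = 6
custody_rate = 0.14  # overall custody rate, from the text

annual_growth = total_growth / years        # about 0.083 of the initial value per year
delta_p_c = custody_rate * annual_growth    # about 0.012

# Assumed weighting: half of crime occurs after the first conviction, and that half
# is split between the groups in the proportions quoted in the text.
share_after_first_conviction = 0.5
group_share = {"high-risk": 0.43, "low-risk": 0.57}
group_change = {"high-risk": -0.066, "low-risk": -0.006}

overall_change = share_after_first_conviction * sum(
    group_share[g] * group_change[g] for g in group_share
)
print(round(annual_growth, 3), round(delta_p_c, 3), round(overall_change, 3))
```

Under this reading the overall change comes out at between −1.5 and −2 per cent, the same order as the figure quoted.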
Steady State Solutions
The derivations above, concerning constant probability systems and the alternative approach to generating the models, demonstrate that, for homogeneous categories of offenders, the models derived in Chapters 3 and 4 follow directly from the basic assumptions of our theory. The distribution fitting and the goodness of fit achieved in Chapter 2 strongly support our assumptions of combinations of homogeneous categories in both recidivism probabilities and offending rates. The Poisson processes derived above (from both constant probability processes and proportional offending approaches) implicitly assume 100 per cent recidivism but we can incorporate recidivism probabilities less than one simply by multiplying the distributions for the rth offences by p^{(r−1)}, leading to the age–crime models of Chapters 2 and 3. Recidivism was also incorporated in our simplified model of active and incarcerated populations derived above in our discussion of incapacitation.
Of interest to planners and policy makers are estimates of overall crime/conviction rates and which aspects of the process are amenable to policy interventions. Equation A.28 showed that the size of the active population is proportional to the number of first convictions and inversely proportional to the rate of desistance λ*(1−p). First convictions at time t are proportional to the weighted sum of birth rates for each age at time t. In A.28 λ is the conviction rate and, as discussed earlier, convictions are a sample of offences committed. Thus doubling the probability of conviction given an offence should halve crime if (cumulatively) conviction truly is the cause of desistance. The factor 1/(1−p) which occurs in A.28, and other formulae, is simply the sum of the series $\sum_{n=0}^{\infty} p^{n}$ and is the average number of convictions for members of the risk category with reconviction probability p.
Thus reducing p for the high-risk category by 10 per cent from 0.84 to 0.76 would reduce the future crime of those offenders by about 40 per cent and crime overall by about 13 per cent. Reducing recidivism for low-risk offenders would have a much smaller impact as relatively few are convicted more than once or twice. The number of first convictions, N_{N}(t), is proportional to the birth rate, B(t, age), and for each group N_{Ng}(t) = B(t, age)*c*q_{g}. The proportionality parameter is made up of population criminality c and the proportion of offenders in each category q_{g}. Overall crime can potentially be reduced by reducing criminality and/or moving offenders from high-recidivism to low-recidivism risk categories. Early intervention programmes and more effective informal and pre-conviction disposals could possibly make these changes.
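The 40 per cent figure can be recovered from the geometric series. The sketch below assumes that ‘future crime’ means the expected number of reconvictions after the current conviction, p/(1−p); that interpretation, and the function names, are assumptions of this example.

```python
def mean_convictions(p: float) -> float:
    """Average number of convictions, the sum of p**n for n >= 0, i.e. 1 / (1 - p)."""
    return 1.0 / (1.0 - p)


def mean_future_reconvictions(p: float) -> float:
    """Expected number of further convictions after the current one, p / (1 - p)."""
    return p / (1.0 - p)


before, after = 0.84, 0.76
reduction = 1.0 - mean_future_reconvictions(after) / mean_future_reconvictions(before)
print(mean_convictions(before), mean_convictions(after))
print(round(reduction, 3))  # close to 0.40
```

The sensitivity arises because 1/(1−p) grows steeply as p approaches 1, so a modest change in a high reconviction probability has a large effect on the expected number of convictions.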
In the above we have assumed that criminal careers start at the first conviction. Although our estimates of crime will implicitly include some offences prior to the first conviction, the majority of early offences will be excluded. From our two modelling approaches we can estimate the extent of crime by unconvicted offenders as follows: by numerically integrating Equation 3.4, with C set equal to 1, over the age range 10 to 70 we obtain an estimate of the average number of offender years between age 10 and the first conviction for each of the rate categories. Multiplying this by λ for the category results in an estimate of the average number of conviction opportunities which have been ignored, otherwise dealt with, or missed due to the reduced probability of detection prior to being known to the police. Table A.1 gives estimates from the three category model of Chapters 2 and 3.
Our approximate model explicitly assumes that early conviction opportunities are ignored and the numbers of these ignored opportunities are estimated in the fitting process. Average values over all cohorts are quoted in Table 4.5 for males and females separately. These approximate model estimates suggest that about 42 per cent of crime is committed by offenders prior to their first convictions for males and about 38 per cent for females.
The three category model of Chapter 3 is likely to overestimate crimes because, for example, acts like playground fights, although strictly assaults, would not generally be regarded as crime but may be the forerunner of more serious violence. The approximate model estimates, on the other hand, are likely to be underestimates as crimes committed by 10-year-olds are omitted completely and the gamma approximation for the high category requires a lower λ, to achieve the fit, which is corrected for by the temporal adjustment δ for actual convictions but not for the ignored conviction opportunities. Both estimates are, however, speculative, as we have very limited information on unsolved crimes and who has committed them. We do
Table A.1 Estimates of the number of conviction opportunities and the proportion of crime committed prior to the first conviction, three group model
Integral of Equation 3.4: $\int_{10}^{70} \left(1 + e^{\alpha \cdot (t - c)}\right)^{\frac{P_{f} \cdot \lambda}{\alpha}} dt$

| Category | q_{g} | Average No. of convictions | $\frac{P_{f} \cdot \lambda}{\alpha}$ | Average offending years prior to 1st conviction | Conviction opportunities prior to 1st conviction | Proportion of crime prior to 1st conviction |
| --- | --- | --- | --- | --- | --- | --- |
| High-rate high-risk | 0.17 | 6.25 | 0.8 | 5.5 | 4.8 | 43% |
| Low-rate low-risk | 0.76 | 1.40 | 0.2 | 13.2 | 2.9 | 67% |
| Low-rate high-risk | 0.07 | 6.25 | 0.2 | 13.2 | 2.9 | 47% |
| Overall | | | | | | 55% |

Note: Estimates are for the 1953 cohort from the Offenders Index.
Estimating the Active Offender Population Size
The definition of an active offender used in our theory differs from that used in most criminal career research, in that our offenders are active from the age of 10 until they desist. We have no intermittency because being active is defined by the constant probabilities of continuing to offend at the constant Poisson rate. Our analysis suggests that offenders do actually desist. This is because, for many offenders, the time between their last recorded conviction and the end of the observation period is long compared with the average inter-conviction time. If these offenders had continued to offend as before, the proportion caught and convicted would have approached 1. Also, offenders who are reconvicted appear in precisely the numbers and at the age predicted by the recidivism and rate parameters. Occam’s razor favours this simple explanation over the rather convoluted changes in λ that would otherwise be required. Such convoluted changes are also not supported by the data.
In a cross-section sample, like the 1997 sentencing sample of Chapter 3, we can estimate the high- and low-rate parameters, λ_{h} and λ_{l}, from an analysis of time since the previous conviction. From the conviction number frequencies we can estimate p_{h}, p_{l} and the proportions of offenders in the risk/rate categories. For each homogeneous category of offenders with rate parameter λ and reconviction probability p, we can calculate the cohort-equivalent category size in the sample, N_{S}, and therefore the total convictions, N_{S}/(1−p), for the category. We now apply this calculation to the offender categories in the 1997 sentencing sample. This sample was in fact six one-week samples from across the year, so the average total convictions for one group in one week is given by N_{S}/(6*(1−p)). Now the expected number of convictions, N_{E}, in a week for a single category is given by:
giving:
where N_{A} is the size of the active offender population.
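One way to read the balance described above is sketched below. It rests on assumptions that are not stated in the text: λ is taken to be an annual rate, so a population of N_{A} active offenders is expected to produce N_{A}·λ/52 convictions in a week, and this is equated with the sample-based weekly figure N_{S}/(6*(1−p)). The function name and the input values are hypothetical.

```python
def active_population(n_s, p, lam, sample_weeks=6, weeks_per_year=52):
    """Estimate N_A for one homogeneous offender category.

    Assumption of this sketch: lam is an annual conviction rate, so the
    expected weekly convictions are N_A * lam / weeks_per_year.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    # Average total convictions for the category in one sample week.
    weekly_convictions = n_s / (sample_weeks * (1.0 - p))
    # Solve weekly_convictions = N_A * lam / weeks_per_year for N_A.
    return weekly_convictions * weeks_per_year / lam


# Hypothetical inputs for illustration only.
n_a = active_population(n_s=600, p=0.8, lam=0.85)
print(f"estimated active offenders: {n_a:,.0f}")
```

Under this reading the number convicted in a year is roughly N_{A}·λ, which is in line with the ratios of convicted to active offenders quoted in this section (about 0.85 for the high-rate category and about 0.21 for the low-rate categories).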
Substituting parameter estimates from the 1997 sentencing sample into Equation A.36 gives estimates of:

• 156,800 active (in the sense that they will be convicted of one or more offences at some time in the future) high-risk/high-rate offenders, of whom approximately 133,700 would have been convicted in 1997;

• 604,700 active low-risk/low-rate offenders, of whom only approximately 127,400 would have been convicted in 1997;

• 438,900 active high-risk/low-rate offenders, of whom only approximately 92,400 would have been convicted in 1997.
In 1997 we estimate that there were just under 1,200,000 individuals in England and Wales who would commit and be convicted of relatively serious (standard list) crime if appropriate opportunities presented themselves and who would do so at least once in the remainder of their lives. Almost 30 per cent of these individuals would have been convicted within 12 months. Not surprisingly, the most active offenders are disproportionately responsible for convictions: the high-risk/high-rate offenders represent 17 per cent of a cohort and 13 per cent of active offenders. They are responsible for 38 per cent of annual convictions and, if clear-up rates are the same for all categories, the same proportion of crime. The low-risk/low-rate offenders make up 76 per cent of a cohort and 50 per cent of the active offender population, and accrue only 36 per cent of annual convictions. The high-risk/low-rate group make up only 7 per cent of a cohort but 37 per cent of active offenders and 26 per cent of annual convictions. In these calculations we have taken no account of early career offending prior to (and including) the last conviction opportunity before the first actual conviction. Including early offending would almost certainly increase the disproportionality of crime committed by high-risk/high-rate offenders.
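The percentages in the preceding paragraph can be recovered from the rounded population figures quoted for the three categories. The short check below uses only those figures; it treats the numbers convicted in 1997 as the annual convictions for each category, which is an assumption of the check, and the variable and function names are illustrative.

```python
# Figures quoted in the text for the 1997 sentencing sample.
active = {
    "high-risk/high-rate": 156_800,
    "low-risk/low-rate": 604_700,
    "high-risk/low-rate": 438_900,
}
convicted_1997 = {
    "high-risk/high-rate": 133_700,
    "low-risk/low-rate": 127_400,
    "high-risk/low-rate": 92_400,
}


def percentage_shares(counts):
    """Return each category's share of the total, in per cent."""
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}


active_shares = percentage_shares(active)
conviction_shares = percentage_shares(convicted_1997)
# Share of all active offenders convicted within the year.
convicted_within_year = 100.0 * sum(convicted_1997.values()) / sum(active.values())

for group in active:
    print(f"{group}: {active_shares[group]:.1f}% of active offenders, "
          f"{conviction_shares[group]:.1f}% of annual convictions")
print(f"convicted within 12 months: {convicted_within_year:.1f}%")
```

Because the inputs are rounded, the three active totals sum to 1,200,400, so small differences from the quoted overall figure are to be expected.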
If offender population estimates are required for specific crime types, drug dealing or burglary for example, then these can be obtained by substituting parameter values for specific crime types into Equation A.36.
Maximum Likelihood Estimation of the Recidivism Parameters
In Chapter 2 we derived an equation for the dual-risk recidivism model. Initially we used a graphical technique which fitted a straight line to the log of the conviction number frequency data (n > 6) from the 1953 cohort, subtracted the fitted line from the data to obtain the residuals and fitted a second straight line to these residuals. This procedure provided us with a good structural model of the data but it was unclear how well the model fitted. Visually the fit was almost unbelievably good but there was no direct measure of the sensitivity of the fit to the parameter values. It is also clear that the parameters are not independent of each other. A small change in p_{h} would give rise to changes in both α and p_{l}. The parameters quoted in Chapter 2 were in fact jointly estimated using a maximum likelihood objective function in an iterative curve-fitting procedure. The objective function was derived as follows:
In the cohort datasets there is one record for every conviction (court appearance) of each offender in the cohort sample. For each offender the convictions are numbered from 1, the first conviction, to the last conviction in the observation period. The likelihood of a record having conviction number n is simply the probability of n under our dual-risk recidivism model:
Where:
In a cohort dataset, x_{n} records have conviction number n and the likelihood of this is:
The likelihood of the whole dataset is given by the product of the likelihoods of the x_{n}s for each conviction number. Therefore:
and:
The parameters p_{h}, p_{l} and α were estimated by minimizing −loglik(data) in the fitting procedure. The proportion of variance accounted for by the model was over 99.9 per cent, an extremely high correlation between the model and the data. For the 1953 cohort data the maximum likelihood estimates were p_{h} = 0.840, p_{l} = 0.313 and α = 0.237. Because the parameters are jointly estimated, conventional confidence intervals for individual parameters are misleading as such intervals would represent a rectangular box around the maximum likelihood estimate (see Figure A.1). The true
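A minimal sketch of a likelihood of this kind is given below. It is not the authors' fitting code. It assumes that the probability of a record having conviction number n is proportional to α·p_{h}^(n−1) + (1−α)·p_{l}^(n−1), normalised over all n ≥ 1, and it uses a simple compass search as a stand-in for the iterative fitting procedure. The counts are synthetic expected counts generated from the estimates quoted above, and all function names are hypothetical.

```python
import math


def record_prob(n, p_h, p_l, alpha):
    """Assumed probability that a conviction record carries conviction number n."""
    weight = alpha * p_h ** (n - 1) + (1.0 - alpha) * p_l ** (n - 1)
    # Sum of the weights over all n >= 1 (two geometric series).
    norm = alpha / (1.0 - p_h) + (1.0 - alpha) / (1.0 - p_l)
    return weight / norm


def neg_loglik(counts, p_h, p_l, alpha):
    """Negative log-likelihood; counts maps conviction number n to x_n."""
    return -sum(x * math.log(record_prob(n, p_h, p_l, alpha))
                for n, x in counts.items())


def fit(counts, start=(0.7, 0.4, 0.3), step=0.05, tol=1e-4):
    """Compass search: try +/- step on each parameter, halve the step when stuck."""
    params = list(start)
    best = neg_loglik(counts, *params)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                # Keep every parameter strictly inside (0, 1).
                if not all(1e-3 < v < 1.0 - 1e-3 for v in trial):
                    continue
                value = neg_loglik(counts, *trial)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return tuple(params), best


# Synthetic expected counts generated from the estimates quoted in the text.
TRUE = (0.840, 0.313, 0.237)
counts = {n: 100_000 * record_prob(n, *TRUE) for n in range(1, 101)}
estimate, value = fit(counts)
print("fitted (p_h, p_l, alpha):", estimate, "-loglik:", value)
```

Because the three parameters are jointly determined, moving one of them away from the generating values raises −loglik unless the others move with it, which is the dependence described in the text. Swapping the two risk groups, (p_{l}, p_{h}, 1−α), gives the same likelihood under this form, so the starting point is chosen with p_{h} > p_{l}.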
Notes:
(^{1}) In Chapter 3 we use this result in the form of the inter-event time T, which is equal to 1/λ; i.e. selecting events at random with probability p results in a stream of random events with inter-event time 1/(p*λ) = T/p.
(^{2}) This might be considered similar to a 95 per cent ‘confidence interval’ and indeed would be exactly equivalent if we were dealing with a two-parameter multivariate normal distribution.