Explaining Criminal Careers: Implications for Justice Policy

John F. MacLeod, Peter Grove, and David Farrington

Print publication date: 2012

Print ISBN-13: 9780199697243

Published to Oxford Scholarship Online: January 2014

DOI: 10.1093/acprof:oso/9780199697243.001.0001



Appendix: Mathematical Notes



In the main body of this book we have described a quantitative theory of crime. The theory is based on extensive data collected by criminal justice agencies and as a result it might more accurately be described as a theory of conviction and reconviction, or crime as it is experienced by the criminal justice system. We have tried to make our description non-technical, relying mainly on graphical representations of both the data and our models to support our arguments. Inevitably we have had to include mathematical equations where necessary but have not fully explained the link between our basic assumptions stated in Chapter 3 and the mathematical formulation of the theory. If the theory is to be used in practical applications, like prison population forecasting or estimating the impact of policy, a more detailed exposition of the mathematics is needed. This Appendix is intended to provide the mathematical and statistical logic and understanding necessary to develop the theory and apply it.

The theory developed in Chapters 2 and 3, and in particular the mathematics of the theory, is based on the concept of constant probabilities. It is important therefore for us to make clear what we understand by probability. There are two schools of thought on probability: the Bayesian School and the Frequentist School. Bayesians see probability as a reflection of the state of knowledge concerning some future event, which is updated in the light of experiment or experience. In the absence of any empirical evidence a purely subjective probability is assigned as a prior probability which is then updated to form a posterior probability in the light of experience. This approach is ideally suited to activities like horse racing or stock market analysis but, in our view is less useful when studying the random events occurring in stable stochastic systems, for example in physics (radioactive decay), or in criminology (the large-scale behaviour of criminals in the population). Frequentists, on the other hand, define probability in terms of relative frequencies. Probabilities cannot be assigned to events without some pre-existing data or well founded theoretical reasoning concerning the system or population in which the event might occur. In practice Bayesians effectively use the Frequentist’s methods where the data is available.

Constant Probability Systems

In the context of our theory, among the events that we consider is ‘one or more criminal convictions in a life time’. If we select an individual at random from the entire population, we define the probability of that individual being convicted within their lifetime as the ratio of the number of individuals who have been or will be convicted to the total number of individuals in the population. There is clearly a problem doing the calculation as we do not know the number of individuals who will be convicted in the future. We must therefore estimate this probability from the information that we do have. Our data source, the Offenders Index, enabled the selection of cohort samples of individuals born in one of four weeks in selected years, 1953, 1958, 1963, 1968, and 1973. In Chapter 2 we made estimates of the whole life conviction probability (the cohort criminality) for each of the cohorts. The whole life criminality was calculated using the age–crime model of Chapter 3 to estimate the number who had or would be convicted.

The criminality estimate was found to vary between the cohorts by more than could be accounted for simply by chance, suggesting that there is also some additional stochastic variation in criminality over time. It is also clear from Chapter 2 that male and female criminalities are very different both from each other and from the overall value. Thus we can only estimate the probability of an event for an individual in the context of the population to which that individual belongs. Probability is a property of the population and not of the individual. To some extent this chimes with the Bayesian view. If we do not know the gender of the individual we would assign the whole population probability and update that when the gender was known.

In developing our theory we make use of the concept of randomness. A random selection is made on the basis of no prior knowledge, except that the selection is from a known parent population. A random selection is assumed to have the same statistical properties as the parent population. Selecting on the basis of some property or characteristic would not be random, but having made the selection a new parent population would be defined from which a random selection could then be made. In the main text we frequently make non-random selections to create subsets (categories) of offenders with specific characteristics: males/females, serious/less serious offenders, etc. In these subsets the statistical structure is often similar but with different parameters, but not necessarily as both will depend on the selection criterion (conditioning).

We also make use of inferred subsets, where collectively the subset (group or category) exhibits a frequency distribution which implies that the members of the subset share a common parameter value for the property that generates the distribution. The main examples of inferred categories are those with a common constant probability of reconviction (or desistance) or a constant probability of offending (being convicted) in a given time interval. These inferred categories are said to be homogeneous with respect to the specified property. Moving up a level, where the parent population contains two or more of these inferred categories it is said to be heterogeneous with respect to the specified property.

In our analysis of recidivism, for each cohort, the parent population was the set of individuals born in one of four weeks in the cohort year who were convicted of one or more offences in the follow-up period. Recidivism is defined as the proportion of offenders with at least n convictions who are reconvicted. In the parent population the distribution of offence number was found to be heterogeneous with respect to recidivism probability. However, we were able to create two inferred subsets (categories) that were homogeneous with respect to recidivism probability p, which was constant for all conviction numbers. Within the inferred categories the probability of at least n convictions was simply p^(n−1).

In our analysis of reconviction times it is not immediately obvious that we are again dealing with a constant probability process, or why we can infer this from the data. By way of explanation, let us assume that the probability p of an event (say an offender committing a crime) is constant in any/all small interval/s of time and on average there are λ events in unit time (say crimes per year). In a time interval t (years) there will on average be λ*t events. If there are n of our small time intervals in time t: then λ*t = n*p. By considering each of the n intervals as an independent opportunity for the event to occur we can calculate the probability of exactly zero, one, two etc events occurring in time t simply by using the binomial expansion:

$$(q + p)^n = q^n + n p q^{n-1} + \frac{n(n-1) p^2 q^{n-2}}{2!} + \frac{n(n-1)(n-2) p^3 q^{n-3}}{3!} + \dots \tag{A.1}$$

where q = 1 − p.

If we now assume that our small time intervals get smaller and smaller, so that n tends to ∞ and p tends to zero, and that this happens in such a way that n*p = λ*t remains true and n*(n−1)*p² = (n*p)² etc, then in the limit the right-hand side of Equation A.1 becomes:

$$1 + \lambda t + \frac{(\lambda t)^2}{2!} + \frac{(\lambda t)^3}{3!} + \dots + \frac{(\lambda t)^r}{r!} + \dots \tag{A.2}$$

But this is the series expansion of e^(λ*t) and, as this is a probability distribution, the sum should be equal to 1. Dividing each term by e^(λ*t) satisfies this requirement, and successive terms in the expansion give the probability of 0, 1, 2, 3 or in general r events in the time interval t. The probability of exactly r events in a time interval t is given by:

$$P(r, t) = \frac{e^{-\lambda t} (\lambda t)^r}{r!} \tag{A.3}$$

The right hand side of Equation A.3 is the general term in the Poisson distribution.
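The limiting argument above can be checked numerically. The following Python sketch compares the binomial probabilities for n small intervals, each with p = λ*t/n, against the Poisson term of Equation A.3. The function names and the parameter values in the usage note are illustrative assumptions, not part of the original text.

```python
from math import comb, exp, factorial


def binomial_pmf(r, n, p):
    """Probability of exactly r events in n independent intervals, each with probability p."""
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)


def poisson_pmf(r, lam, t):
    """Equation A.3: probability of exactly r events in time t at rate lam."""
    return exp(-lam * t) * (lam * t) ** r / factorial(r)


def max_gap(lam, t, n, r_max=10):
    """Largest absolute difference between the two distributions for r = 0..r_max."""
    p = lam * t / n  # keeps n*p = lam*t fixed as n grows
    return max(
        abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam, t))
        for r in range(r_max + 1)
    )
```

For example, with λ = 2 and t = 1, `max_gap(2.0, 1.0, 10)` is noticeably larger than `max_gap(2.0, 1.0, 1000)`, which illustrates the convergence as the intervals shrink.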

Of particular interest in studying the Poisson process is the time interval between successive events or put another way the probability of no events in the time interval t expressed as a function of t. Putting r = 0 in Equation A.3 gives the probability that the inter-event time is greater than t:

$$P(\text{time to next event} > t) = e^{-\lambda t} \tag{A.4}$$

Equation A.4 is of the form of the survival time distributions used in the analyses of Chapter 2 with average survival time 1/λ. If in the above derivations we assumed that we selected events at random with probability ps then the average number of selected events in time interval t would become λs*t = ps*λ*t, ie λs = ps*λ and, substituting λs for λ in Equations A.2 to A.4, the derivation would proceed in exactly the same way, resulting in a further Poisson process with parameter λs.

We use this result in Chapter 3 to account for the apparent disparity between the time to first conviction and the inter-conviction time distributions. This result also implies that convictions are in effect a random selection from crimes committed which also occur as random events in a Poisson process. It should be noted here that any one of the characteristics of the Poisson process implies the others. A negative exponential distribution of inter-event times implies a Poisson distribution of events in any given time interval which in turn implies a constant probability of an event occurring in any small interval of time within the observation period.
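The random-selection result can be illustrated by simulation. The sketch below generates events with exponential gaps at rate λ, keeps each event with probability ps, and returns the gaps between the kept events. If the selected events again form a Poisson process, those gaps should have mean 1/(ps*λ) and the proportion exceeding t should be close to e^(−ps*λ*t). The function name, seed, and parameter values are illustrative assumptions.

```python
import random


def thinned_gaps(lam, p_s, horizon, seed=0):
    """Simulate a Poisson process of rate lam up to `horizon`, keep each event
    with probability p_s, and return the gaps between successive kept events."""
    rng = random.Random(seed)
    t = 0.0
    last_kept = None
    gaps = []
    while True:
        t += rng.expovariate(lam)  # exponential inter-event time, mean 1/lam
        if t > horizon:
            break
        if rng.random() < p_s:  # random selection of this event
            if last_kept is not None:
                gaps.append(t - last_kept)
            last_kept = t
    return gaps
```

With λ = 5 and ps = 0.2 the selected rate is λs = 1, so the mean of `thinned_gaps(5.0, 0.2, 20000.0)` should be close to one time unit.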

The Poisson process is a common feature of criminal career models (see eg Barnett et al 1987; Canela-Cacho et al 1997; Greenberg 1991; Maltz 1996). In criminal career research the rate of offending λ is often estimated from surveys of individuals which are heavily conditioned by the target sample of the survey, which could be anything from the general population to prison inmates. Piquero and Blumstein (2007) in their discussion of incapacitation make many references to λ. They suggest that ‘Rather than focusing on individual-level measurement of λ, it is more reasonable to direct attention to the distribution of λ among various populations’. In estimating the distribution of λ for offenders in general, Greenberg (1991) assumed that the population was heterogeneous with respect to λ and that λ was distributed as a gamma variate leading to a Pareto distribution for the number of crimes in time t. Canela-Cacho et al (1997) also assumed heterogeneity for λ in the offender population. They assumed that the distribution of λ was the sum of r exponential distributions and r equal to 3 was required to fit data from the Rand surveys of prison inmates in the late 1970s.

In our model we also assume heterogeneity, dividing our parent population into just two categories which are homogeneous with respect to λ. This implies that individuals in our inferred categories commit crimes as a Poisson process. We estimate the Poisson rate λ from inter-conviction survival times. This estimation technique has a number of advantages. It automatically accommodates low individual λs, which is very important when we analyse first convictions. The ‘too many zeros’ problem (Chaikin and Rolph 1981) encountered when counting convictions in some short time period does not occur; desistance from crime does not influence the estimation of λ; and censorship, caused by the limit of the observation period, is clearly identifiable on the survival plot and can be compensated for.

However, we can derive a distribution for individual λs. An individual measurement of the number, r, of offences committed in unit time, say one year, by an individual offender can be considered as a random instance from the Poisson distribution of Equation A.3, with t = 1. The estimate of individual λ from that instance is simply r. Therefore Equation A.3 (with t = 1) gives the probability distribution of individual λs, ie a Poisson distribution of r with mean λ. Estimating λ from one year sub-samples for a birth cohort would of course involve the too many zeroes problem caused by desistance and, unless the one year samples were drawn from the whole observation period, the estimation process would also suffer from selection bias induced by the age–crime curve. To avoid these problems we have contented ourselves with the inter-conviction survival time method of estimation.

Allocation of Offenders to the Risk/Rate Categories

In Chapter 2 we derived the dual-risk recidivism model and the dual-rate survival time model, Equations 2.4 and 2.8 respectively. We repeat them here for ease of reference:

The risk equation

$$Y(n) = A \left( a\, p_h^{(n-1)} + (1 - a)\, p_l^{(n-1)} \right) \tag{A.5}$$

The rate equation

$$S(t) = B \left( b\, e^{-\lambda_h t} + (1 - b)\, e^{-\lambda_l t} \right) \tag{A.6}$$

Where:

  • n is the conviction (court appearance) number,

  • Y(n) is the number of offenders in a cohort or cross-section sample with at least n convictions,

  • A is the number of convicted offenders in the cohort or cohort equivalent,

  • ph is the probability that a high-risk offender will reoffend,

  • pl is the probability that a low-risk offender will reoffend,

  • a is the proportion of high-risk offenders,

  • S(t) is the number of reconvictions of offenders surviving t years from their previous conviction,

  • B is the total number of reconvictions in the sample,

  • λh is 1/(mean time to next conviction) for high-rate offenders,

  • λl is 1/(mean time to next conviction) for low-rate offenders,

  • b is the proportion parameter for rate.

From Equation A.5 the total number of convictions (court appearances) sustained for a cohort is given by:

$$Y_{total} = \sum_{n=1}^{\infty} Y(n) \tag{A.7}$$

This is expanded to:

$$Y_{total} = \sum_{n=1}^{\infty} \left( A\, a\, p_h^{(n-1)} \right) + \sum_{n=1}^{\infty} \left( A (1 - a)\, p_l^{(n-1)} \right) \tag{A.8}$$

For a cross-section, Equation A.7 represents the total number of offenders in the sample. Equation A.8, summed over the range n = 2 to ∞, gives the number of reconvictions for the cohort or the number of offenders with more than one conviction for the cross-section. The summations in Equation A.8 separately provide the estimates for the high- and low-risk offender categories. It is assumed that all low-risk offenders are low-rate but that high-risk offenders can be either high- or low-rate. These assumptions lead to the following relationships:

The total number of reconvictions:

$$S_{total} = \sum_{n=2}^{\infty} \left( A\, a\, p_h^{(n-1)} \right) + \sum_{n=2}^{\infty} \left( A (1 - a)\, p_l^{(n-1)} \right) \tag{A.9}$$

The number of high-risk/low-rate reconvictions:

$$S_{hl} = A\, a \sum_{n=2}^{\infty} p_h^{(n-1)} - b\, S_{total} \tag{A.10}$$

The number of low-risk/low-rate reconvictions:

$$S_{ll} = A (1 - a) \sum_{n=2}^{\infty} p_l^{(n-1)} \tag{A.11}$$

And the number of high-risk/high-rate reconvictions:

$$S_{hh} = b\, S_{total} \tag{A.12}$$

For a cohort these can be translated into numbers of offenders as follows.

In the low-risk/low-rate category:

$$N_{ll} = Y(1) (1 - a) \tag{A.13}$$

In the high-risk/high-rate category:

$$N_{hh} = Y(1)\, a\, \frac{S_{hh}}{S_{hh} + S_{hl}} \tag{A.14}$$

In the high-risk/low-rate category:

$$N_{hl} = Y(1)\, a\, \frac{S_{hl}}{S_{hh} + S_{hl}} \tag{A.15}$$
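The allocation relationships above can be sketched in a few lines of Python. The version below replaces the infinite sums with their geometric closed forms (the sum of p^(n−1) from n = 1 is 1/(1−p), and from n = 2 is p/(1−p)) and uses Y(1) = A, which follows from the risk equation. The function name, the dictionary keys, and the parameter values in the usage note are hypothetical and chosen only for illustration.

```python
def allocate_offenders(A, a, p_h, p_l, b):
    """Split reconvictions and offenders into risk/rate categories.

    A: number of convicted offenders, a: proportion high-risk,
    p_h, p_l: reconviction probabilities, b: rate proportion parameter.
    Assumes 0 <= p_h, p_l < 1 and that b * S_total does not exceed the
    number of high-risk reconvictions, so S_hl stays non-negative.
    """
    high_risk_reconv = A * a * p_h / (1.0 - p_h)          # sum over n >= 2
    low_risk_reconv = A * (1.0 - a) * p_l / (1.0 - p_l)   # sum over n >= 2
    y_total = A * a / (1.0 - p_h) + A * (1.0 - a) / (1.0 - p_l)
    s_total = high_risk_reconv + low_risk_reconv
    s_hh = b * s_total                  # high-risk/high-rate reconvictions
    s_hl = high_risk_reconv - s_hh      # high-risk/low-rate reconvictions
    s_ll = low_risk_reconv              # low-risk/low-rate reconvictions
    y1 = A                              # Y(1): everyone has at least one conviction
    return {
        "Y_total": y_total, "S_total": s_total,
        "S_hh": s_hh, "S_hl": s_hl, "S_ll": s_ll,
        "N_ll": y1 * (1.0 - a),
        "N_hh": y1 * a * s_hh / (s_hh + s_hl),
        "N_hl": y1 * a * s_hl / (s_hh + s_hl),
    }
```

Calling `allocate_offenders(1000, 0.25, 0.8, 0.3, 0.5)` with these made-up inputs returns category sizes that add back up to the 1,000 offenders, which is a useful consistency check on any fitted parameter set.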

An Alternative Modelling Approach

In Chapter 3 we derived equations for the age–crime (conviction) curve in which we explicitly modelled the apparent rise in crime between the ages of 10 and 18. This approach worked well for first convictions but led to equations requiring numerical rather than analytic solutions for subsequent convictions. The problem was caused by the explicit modelling of the rise in crime. We now consider a system in which we assume that crime itself does not vary with age and in which crimes are committed randomly and in direct proportion to the number of active offenders. In what follows, an ‘offence’ refers to ‘a conviction opportunity’, ie where the offender is caught and could be, but isn’t necessarily, convicted. We also assume that the population is homogeneous with respect to offending.

Let Nr(t) be the number of offenders in the population responsible for r offences in the period up to time t. We start our process at time t = 0 with N0(0) potential offenders who have no previous offences, thus Nr(0) = 0 for all non-zero r. This is the situation for all potential offenders on their 10th birthday as, by definition, under-10s cannot commit crime. As offenders commit each offence they move from the sub-population with r offences to that with r + 1 offences and this occurs at rate λ*Nr(t). This system can be described as an infinite series of first order linear differential equations:

$$\frac{dN_r(t)}{dt} = \lambda N_{r-1}(t) - \lambda N_r(t), \qquad r \ge 0 \tag{A.16}$$

The general solution to this equation is:

$$N_r(t) = N_r(0)\, e^{-\lambda t} + \lambda e^{-\lambda t} \int_0^t e^{\lambda \tau} N_{r-1}(\tau)\, d\tau \tag{A.17}$$

Now for r = 0 the term Nr-1(t) is undefined in the system and does not exist, thus:

$$N_0(t) = N_0(0)\, e^{-\lambda t} \tag{A.18}$$

For r > 0, Nr(0) = 0, and the integral term in Equation A.17 evaluates to:

$$N_1(t) = N_0(0)\, \lambda t\, e^{-\lambda t} \quad \text{for } r = 1 \tag{A.19}$$

$$N_2(t) = \tfrac{1}{2} N_0(0)\, (\lambda t)^2 e^{-\lambda t} \quad \text{for } r = 2 \tag{A.20}$$

and in general:

$$N_r(t) = \frac{N_0(0)\, (\lambda t)^r e^{-\lambda t}}{r!} \tag{A.21}$$

Thus the number of offenders with exactly r offences at time t is the product of the total number of potential offenders in the population, N0(0), and the probability of r events in time t in a Poisson process with mean λ (see Equation A.3). We can now derive an expression for the rate of first offences (equal to the rate of decline in the number of offenders with no offences) as a function of t (age), either by substituting for N0(t) from Equation A.18 into Equation A.16 with r = 0, or by differentiating A.18 to give:

$$\frac{dN_0(t)}{dt} = -\lambda N_0(0)\, e^{-\lambda t} \tag{A.22}$$

But this suggests that at t = 0 the offending rate is λ*N0(0) whereas we know that at age 10 (t = 0) the offending rate should be very close to zero. However, if we assume that early conviction opportunities are ignored and do not result in conviction, the probability distribution of the first recorded offence/conviction as a function of time would be some combination of the probability distributions of the second, third, etc offences as functions of time. The probability density of the rth offence occurring at time t is simply derived from the probability of r − 1 offences in time t giving:

$$\mathrm{Pdf}\big(\text{offence}(r)\ @\ \text{time}(t)\big) = \frac{\lambda (\lambda t)^{r-1} e^{-\lambda t}}{(r-1)!} = \frac{\lambda (\lambda t)^{r-1} e^{-\lambda t}}{\Gamma(r)} \tag{A.23}$$

For integer values of r: Γ(r) = (r−1)!

This is the gamma distribution, which we introduced in Chapter 4 as an approximation to account for the rise in crime in the early part of the criminal career.
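As a numerical check on the derivation above, the following Python sketch integrates the system of Equation A.16 with a simple forward Euler scheme and compares the result with the closed-form Poisson solution for the number of offenders with exactly r offences. The function names, the step count, and the parameter values in the usage note are illustrative assumptions.

```python
from math import exp, factorial


def integrate_counts(n0, lam, t_end, r_max=5, steps=20000):
    """Forward-Euler integration of dN_r/dt = lam*N_{r-1} - lam*N_r for r = 0..r_max,
    starting with n0 potential offenders who all have zero offences."""
    dt = t_end / steps
    counts = [float(n0)] + [0.0] * r_max
    for _ in range(steps):
        # r = 0 has no inflow because there is no N_{-1} term in the system
        rates = [-lam * counts[0]] + [
            lam * (counts[r - 1] - counts[r]) for r in range(1, r_max + 1)
        ]
        counts = [c + dt * d for c, d in zip(counts, rates)]
    return counts


def poisson_counts(n0, lam, t, r_max=5):
    """Closed-form solution: n0 times the Poisson probability of r events in time t."""
    return [
        n0 * (lam * t) ** r * exp(-lam * t) / factorial(r)
        for r in range(r_max + 1)
    ]
```

With, say, 1,000 potential offenders, λ = 0.5 per year and t = 4 years, the two lists agree to within a small discretisation error. Truncating at `r_max` does not affect the lower counts because each equation depends only on the count below it.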


The negative exponential distribution has the property of being memory-less, in the sense that what happens after time t is independent of what happened before. We can therefore choose to start our (re-) conviction process at any time. If in Equation A.16 we assume that, instead of moving offenders on into the next offence-count subset, we return recidivists into the active offender pool and move the desisters into a non-offender pool then the differential equation becomes:

$$\frac{dN(t)}{dt} = -\lambda N(t) + p \lambda N(t) = -\lambda (1 - p) N(t) \tag{A.24}$$

This has the solution:

$$N(t) = N(0)\, e^{-\lambda (1 - p) t} \tag{A.25}$$

Where N(t) is the number of (active) offenders still offending at time t and N(0) is the number who will offend at some time after we choose to start the process. From this we can see that the average residual career length of active offenders, from our arbitrary start time, is 1/(λ*(1−p)). In this last expression, the operative word is active and this is very important when considering incapacitation. Avi-Itzhak and Shinnar (1973) and Shinnar and Shinnar (1975) in their models of crime made many basic assumptions in common with us. However, in estimating incapacitation, like us they assumed that criminal career length is exponentially distributed, but they also implicitly assumed that for an individual the career is fixed in time, which implies that offenders could terminate their careers whilst incarcerated and those still active on release would have a reduced residual career length (ie active offending time = career length − time in prison). Their result that incapacitation reduces crime relies on this assumption; the dependency on the invariance of criminal career length with respect to CJS interventions is not made explicit in their analysis. Using their formulation and parameter estimates from UK data, Tarling (1993, pp 143–146) estimated that the extant prison populations in England and Wales in 1975, 1980, and 1986 had reduced recorded crime by between 5.8 per cent and 9 per cent. But as we show below even these modest estimates would appear to be gross exaggerations.

As reported in Chapter 5, from the 1953 cohort, for high-risk offenders with at least one custodial sentence and more than four convictions, the proportion of reconvictions after custody was 84.8 per cent compared with 83.1 per cent after non-custodial disposals. Similarly, from Table 5.2 the four year reconviction proportions after custody and supervision were 64.4 per cent and 62.5 per cent respectively. In both of these situations, where recidivism risk and seriousness is controlled for, we would have expected some 11 per cent fewer reconvictions after custodial, compared with non-custodial, sentences for fixed in time careers. This is because, for high-rate offenders our estimated residual career length would be about five years and average prison time served about seven months during which time 11 per cent of offenders, who otherwise would have reoffended, should desist and not reoffend on release, which should result in 11 per cent fewer reconvictions than for non-custodial disposals. Tarling’s shorter residual career length estimates would result in even higher reductions in recidivism. What was actually observed is that recidivism tends to be higher following custody than for non-custodial sentences. There is therefore no evidence in these data that criminal careers terminate during incarceration rather than at the point of conviction, in fact to the contrary. We therefore conclude that there is no overall crime reduction brought about by incapacitation except where offenders are incarcerated for a large proportion of their active lives.

In our theory we assume that the career termination decision is made at the time of, and as a result of, conviction. This assumption implies that on release from prison the proportion p, destined to reoffend, is the same as for any other disposal, as is the residual career length. Released prisoners simply rejoin the active offender pool in which the residual career length is distributed exponentially with the same constant parameter value. We now consider incapacitation from the viewpoint of our theory. In a simplified situation where the birth rate is constant over time and there is a single offender group in which criminality = c, incarceration probability = pc, and conviction rate = λ are also all constant, we can create a simple model of the impact of prison on crime.

In our theory crime is proportional to the active criminal population and we therefore need to calculate the impact of prison on the number of active offenders. With a constant birth rate, the age–crime curve for all cohorts is the same. If for the moment we ignore crimes committed prior to the first conviction opportunity and start our process at age 10 or at the conviction opportunity before the first conviction, whichever is later, we can calculate, using Equation A.25, the rate of conviction for the cohort at time t. As all of the cohorts have identical rate of conviction profiles, we can use this to calculate the overall number of convictions by integrating A.25 over t = 0 to ∞:

$$N(\infty) = N(0) \int_0^{\infty} e^{-\lambda (1 - p) t}\, dt = \frac{N(0)}{\lambda (1 - p)} \tag{A.26}$$

Here N(0) is the number of offenders in one homogeneous category of a cohort sample and N(∞) is the lifetime total number of convictions sustained by the category, but, because all cohorts are equivalent, N(0) and N(∞) are also equal to the number of new, NN(t), and active, NA(t), offenders respectively in the equivalent category of active offenders at time t. If the system is in equilibrium, the number of offenders giving up crime, NA(t)*(1−p), will be balanced by the number of new offenders, NN(t), being convicted for the first time and hence entering the system (NN(t) is a constant because birth rate is assumed constant). This is trivially true for empty prisons. If we now incarcerate a proportion pc of those convicted and sentence them to an average time served 1/λs, then the prison population Np(t) would build up until the prison element of the system was also in equilibrium, thus the rate of change in the active offender population is given by:

$$\frac{dN_A(t)}{dt} = \left[ N_N(t) - \lambda (1 - p) N_A(t) \right] + p \left[ \lambda_s N_p(t) - p_c \lambda N_A(t) \right] \tag{A.27}$$

During the build-up of the prison population the right hand bracketed (prison) term in A.27 will become negative as more individuals enter prison than leave; this will cause the active population NA(t) to reduce, causing the left hand bracketed (active) term to become more positive; this will cause the rate of decrease of NA(t) over time to slow down and change sign to become an increase; eventually the prison population will stabilize and the prison and active terms will both return to zero. In the steady state the active population is given by:

$$N_A(t) = \frac{N_N(t)}{\lambda (1 - p)} \tag{A.28}$$

But, remembering that in a stable population NA(t) ≡ N(∞) and NN(t) ≡ N(0), this is the same situation as existed when the prisons were empty; see Equation A.26. Thus, in the steady state, the active population, NA(t), and by inference crime, is independent of the actual prison population. However, following a step change in custodial sentencing policy, both the active population and crime will reduce if the prison population is increasing and increase if it is reducing. The changes in active population (crime rate) are transient but the changes in prison population persist, unless of course the prison population increases/decreases indefinitely. In the steady state the prison population, Np(t), is given by:

$$N_p(t) = \frac{p_c \lambda}{\lambda_s} N_A(t) \tag{A.29}$$

The steady state prison population is therefore proportional to the product of the probability of a custodial sentence and sentence length (time served).
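The transient behaviour described above can be illustrated with a short simulation. The Python sketch below starts from the empty-prison equilibrium, switches on custody, and steps the active-population equation forward in time. The companion equation for the prison population, dNp/dt = pc*λ*NA − λs*Np (admissions minus releases), is not written out in the text; it is an assumption inferred from the bracketed prison term and from the steady-state prison population. All names and parameter values are hypothetical.

```python
def simulate_prison(n_new, lam, p, p_c, lam_s, t_end=100.0, dt=0.001):
    """Euler simulation of the active and prison populations.

    Returns (lowest active population reached, final active population,
    final prison population)."""
    n_active = n_new / (lam * (1.0 - p))  # equilibrium with empty prisons
    n_prison = 0.0
    lowest = n_active
    for _ in range(int(round(t_end / dt))):
        # releases minus admissions; negative while the prison is filling
        prison_flow = lam_s * n_prison - p_c * lam * n_active
        d_active = (n_new - lam * (1.0 - p) * n_active) + p * prison_flow
        d_prison = -prison_flow  # ASSUMPTION: admissions minus releases
        n_active += dt * d_active
        n_prison += dt * d_prison
        lowest = min(lowest, n_active)
    return lowest, n_active, n_prison
```

With the made-up values `simulate_prison(100.0, 1.0, 0.8, 0.2, 2.0)`, the active population dips below its starting level of 500 while the prison fills, then returns to 500, and the prison population settles at pc*λ*NA/λs = 50.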

If we now assume that the prison population is increased at a constant linear rate r (extra inmates per year) then Equation A.27 would become:

$$\frac{dN_A(t)}{dt} = \left[ N_N(t) - p\, r - \lambda (1 - p) N_A(t) \right] \tag{A.30}$$

Which has the solution:

$$N_A(t) = \frac{N_N(t) - p\, r}{\lambda (1 - p)} \left( 1 - e^{-\lambda (1 - p) t} \right) \tag{A.31}$$

Over time the exponential term tends to zero, thus in the steady state:

$$N_A = N_A(0) - \frac{p\, r}{\lambda (1 - p)} \tag{A.32}$$

The right hand side of Equation A.32 is independent of t and therefore constant, thus there is an ongoing reduction in the active population of:

$$\Delta N_A = -\frac{p\, r}{\lambda (1 - p)}$$

The constant rate of increase in prison population can be expressed as:

$$r = \lambda N_A(0)\, \Delta p_c$$

Giving the proportionate change in the active population of:

$$\frac{\Delta N_A}{N_A(0)} = -\frac{p\, \Delta p_c}{1 - p}$$

If the probability of custody, pc, is increased in such a way as to result in a constant annual increase in the prison population then the active population will be reduced by a steady state constant proportion.

Although the derivations above assume only one risk category, we can accommodate different risk/rate categories by simply summing the results over all homogeneous categories. Also, the real situation is more complicated: the active population is determined by demographics (the birth rate at each age weighted by the normalized age–crime curve), and policies are subject to change over time, potentially influencing any or all of the parameters. But the principles still hold.

Over the six-year period from 1993 to 1999 the prison population in England and Wales increased by about 50 per cent, a linear change of 8.3 per cent of the initial value per year. With an overall custody rate of 14 per cent this results in a steady state Δpc = 0.012, and a change in active high- and low-risk populations of −6.6 per cent and −0.6 per cent respectively. Our analysis suggests that overall about half of crime is committed prior to the first conviction and that the risk group proportions in the offender population are 43 per cent high-risk and 57 per cent low-risk. From these estimates and Equation A.34 the percentage change in recorded crime, due to increasing the prison population, during the period 1993–1999 would have been in the region of −1.5 per cent.
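The arithmetic in this paragraph can be sketched as follows. Two points are assumptions rather than statements from the text: that Δpc is obtained as the product of the annual proportional prison growth and the custody rate (which reproduces the quoted 0.012), and that the proportionate change in the active population is −p*Δpc/(1−p), with the negative sign matching the negative percentages quoted. The reconviction probabilities used to obtain −6.6 and −0.6 per cent are not given in this section, so the value p = 0.84 in the usage note is purely illustrative.

```python
def custody_probability_change(annual_prison_growth, custody_rate):
    """ASSUMPTION: delta p_c taken as growth per year times the custody rate."""
    return annual_prison_growth * custody_rate


def proportionate_change(p, delta_pc):
    """Proportionate change in the active population for reconviction
    probability p and a change delta_pc in the probability of custody."""
    return -p * delta_pc / (1.0 - p)


annual_growth = 0.50 / 6          # 50 per cent over six years, about 8.3 per cent a year
delta_pc = custody_probability_change(annual_growth, 0.14)   # about 0.012
```

For instance, `proportionate_change(0.84, 0.012)` evaluates to −0.063, that is a fall of roughly 6 per cent in the active population of a category with reconviction probability 0.84.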

Steady State Solutions

The derivations above, concerning constant probability systems and the alternative approach to generating the models, demonstrate that, for homogeneous categories of offenders, the models derived in Chapters 3 and 4 follow directly from the basic assumptions of our theory. The distribution fitting and the goodness of fit achieved in Chapter 2 strongly support our assumptions of combinations of homogeneous categories in both recidivism probabilities and offending rates. The Poisson processes derived above (from both constant probability processes and proportional offending approaches) implicitly assume 100 per cent recidivism but we can incorporate recidivism probabilities less than one simply by multiplying the distributions for the rth offences by p^(r−1), leading to the age–crime models of Chapters 2 and 3. Recidivism was also incorporated in our simplified model of active and incarcerated populations derived above in our discussion of incapacitation.

Of interest to planners and policy makers are estimates of overall crime/conviction rates and which aspects of the process are amenable to policy interventions. Equation A.28 showed that the size of the active population is proportional to the number of first convictions and inversely proportional to the rate of desistance λ*(1−p). First convictions at time t are proportional to the weighted sum of birth rates for each age at time t. In A.28 λ is the conviction rate and, as discussed earlier, convictions are a sample of offences committed. Thus doubling the probability of conviction given an offence should halve crime if (cumulatively) conviction truly is the cause of desistance. The factor 1/(1−p) which occurs in A.28, and other formulae, is simply the sum of the series p^n over n = 0 to ∞ and is the average number of convictions for members of the risk category with reconviction probability p.

Thus reducing p for the high-risk category by 10 per cent from 0.84 to 0.76 would reduce the future crime of those offenders by about 40 per cent and crime overall by about 13 per cent. Reducing recidivism for low-risk offenders would have a much smaller impact as relatively few are convicted more than once or twice. The number of first convictions, NN(t) is proportional to the birth rate, B(t, age), and for each group NNg(t) = B(t, age)*c*qg. The proportionality parameter is made up of population criminality c and the proportion of offenders in each category qg. Overall crime can potentially be reduced by reducing criminality and/or moving offenders from high recidivism to low-recidivism risk categories. Early intervention programmes and more effective informal and pre-conviction disposals could possibly make these changes.
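The effect of lowering p can be worked through with the geometric-series factor. In the Python sketch below, the average number of convictions is 1/(1−p) and the average number of reconvictions after the first conviction is p/(1−p). Reading "future crime" as the latter is an assumption; it is the reading that gives a figure close to the "about 40 per cent" quoted for the high-risk category. The function names are illustrative.

```python
def mean_convictions(p):
    """Average number of convictions per offender: sum of p**n for n >= 0."""
    return 1.0 / (1.0 - p)


def mean_reconvictions(p):
    """Average number of convictions after the first: sum of p**n for n >= 1."""
    return p / (1.0 - p)


def reduction_in_reconvictions(p_old, p_new):
    """Proportionate fall in expected reconvictions when p drops from p_old to p_new."""
    return 1.0 - mean_reconvictions(p_new) / mean_reconvictions(p_old)
```

Here `mean_reconvictions(0.84)` is 5.25 and `mean_reconvictions(0.76)` is about 3.17, so `reduction_in_reconvictions(0.84, 0.76)` comes out at roughly 0.40.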

In the above we have assumed that criminal careers start at the first conviction. Although our estimates of crime will implicitly include some offences prior to the first conviction, the majority of early offences will be excluded. From our two modelling approaches we can estimate the extent of crime by unconvicted offenders as follows: by numerically integrating Equation 3.4, with C set equal to 1, over the age range 10 to 70 we obtain an estimate of the average number of offender years between age 10 and the first conviction for each of the rate categories. Multiplying this by λ for the category results in an estimate of the average number of conviction opportunities which have been ignored, otherwise dealt with, or missed due to the reduced probability of detection prior to being known to the police. Table A.1 gives estimates from the three category model of Chapters 2 and 3.
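The numerical integration described above can be sketched as follows. The integrand is an assumption: it is taken to have the form that appears with Table A.1, (1 + e^(α(t − c)))^(−P_f λ/α), and the parameter values are hypothetical placeholders, not the fitted values from Chapters 2 and 3. Simpson's rule stands in for whatever integration routine was actually used.

```python
import math


def not_yet_convicted(t, p_f, lam, alpha, c):
    """Assumed integrand: (1 + exp(alpha*(t - c))) ** (-p_f*lam/alpha)."""
    return (1.0 + math.exp(alpha * (t - c))) ** (-p_f * lam / alpha)


def simpson(f, a, b, n=600):
    """Composite Simpson's rule using an even number n of sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def offending_years_before_first_conviction(p_f, lam, alpha, c,
                                            start=10.0, end=70.0):
    """Average offender years between age 10 and the first conviction."""
    return simpson(lambda t: not_yet_convicted(t, p_f, lam, alpha, c),
                   start, end)


# Hypothetical parameter values, for illustration only.
p_f, lam, alpha, c = 0.25, 0.85, 0.5, 16.0

years = offending_years_before_first_conviction(p_f, lam, alpha, c)
missed_opportunities = lam * years  # multiply by the category's rate

print(f"offending years before first conviction: {years:.2f}")
print(f"conviction opportunities before first conviction: {missed_opportunities:.2f}")
```

Setting `p_f` to zero makes the integrand equal to one throughout, so the integral returns the full 60 years between ages 10 and 70, which is a convenient check on the integration routine.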

Our approximate model explicitly assumes that early conviction opportunities are ignored and the numbers of these ignored opportunities are estimated in the fitting process. Average values over all cohorts are quoted in Table 4.5 for males and females separately. These approximate model estimates suggest that about 42 per cent of crime is committed by offenders prior to their first convictions for males and about 38 per cent for females.

The three category model of Chapter 3 is likely to overestimate crimes because, for example, acts like playground fights, although strictly assaults, would not generally be regarded as crime but may be the forerunner of more serious violence. The approximate model estimates, on the other hand, are likely to be underestimates as crimes committed by 10-year-olds are omitted completely and the gamma approximation for the high category requires a lower λ, to achieve the fit, which is corrected for by the temporal adjustment δ for actual convictions but not for the ignored conviction opportunities. Both estimates are, however, speculative, as we have very limited information on unsolved crimes and who has committed them. We do know that clear-up rates over the period of the cohort samples have been between 35 per cent and 20 per cent and that a significant proportion of crime remains unreported. Also our analysis assumes that most crime is committed by offenders who are eventually convicted. For the purpose of making conservative estimates of the impact of offender based crime reduction policy initiatives, we assume that about 50 per cent of crime is committed by unknown/unconvicted offenders at the time of commission.

Table A.1 Estimates of the number of conviction opportunities and the proportion of crime committed prior to the first conviction, three group model

Columns: integral of Equation 3.4, $\int_{10}^{70} \left(1 + e^{\alpha (t - c)}\right)^{-P_f \lambda / \alpha}\, dt$; average number of convictions; $P_f \lambda / \alpha$; average offending years prior to first conviction; conviction opportunities prior to first conviction; proportion of crime prior to first conviction.

Rows: high-rate high-risk; low-rate low-risk; low-rate high-risk.

Note: Estimates are for the 1953 cohort from the Offenders Index.

Estimating the Active Offender Population Size

The definition of an active offender used in our theory differs from that used in most criminal career research, in that our offenders are active from the age of 10 until they desist. We have no intermittency because being active is defined by the constant probabilities of continuing to offend at the constant Poisson rate. Our analysis suggests that offenders do actually desist. This is because, for many offenders, the time between their last recorded conviction and the end of the observation period is long compared with the average inter-conviction time. If these offenders had continued to offend as before, the proportion caught and convicted would have approached 1. Also, offenders who are reconvicted appear in precisely the numbers and at the age predicted by the recidivism and rate parameters. Occam’s razor favours this simple explanation over the rather convoluted changes in λ that would otherwise be required. Such convoluted changes are also not supported by the data.

In a cross-section sample, like the 1997 sentencing sample of Chapter 3, we can estimate the high and low-rate parameters, λh and λl from an analysis of time since the previous conviction. And from the conviction number frequencies we can estimate ph, pl and the proportions of offenders in the risk/rate categories. For each homogeneous category of offenders with rate parameter λ and reconviction probability p: we can calculate the cohort equivalent category size in the sample, NS, and therefore the total convictions, NS/(1−p), for the category. We now apply this calculation to the offender categories in the 1997 sentencing sample. This sample was in fact six one-week samples from across the year, so the average total convictions for one group in one week is given by NS/(6*(1−p)). Now the expected number of convictions, NE, in a week for a single category is given by:

$$N_E = N_A \left(1 - e^{-\lambda/52}\right)$$

which rearranges to

$$N_A = \frac{N_E}{1 - e^{-\lambda/52}} \qquad \text{(A.36)}$$

where NA is the size of the active offender population.
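The calculation above can be sketched in a few lines. The input values below are hypothetical placeholders, not the 1997 sample estimates, and the function names are illustrative. The annual rate λ is divided by 52 to give a weekly rate, as in the formula for the active population.

```python
import math


def weekly_expected_convictions(n_s, p, n_weeks=6):
    """N_E = N_S / (n_weeks * (1 - p)): average total convictions per week."""
    return n_s / (n_weeks * (1.0 - p))


def active_population(n_e, lam, periods_per_year=52):
    """N_A = N_E / (1 - exp(-lam / periods_per_year))."""
    return n_e / (1.0 - math.exp(-lam / periods_per_year))


# Hypothetical inputs for a single category, for illustration only.
n_s = 1200   # cohort-equivalent category size in the six-week sample
p = 0.84     # reconviction probability
lam = 0.85   # convictions per year

n_e = weekly_expected_convictions(n_s, p)
n_a = active_population(n_e, lam)

print(f"expected convictions per week: {n_e:.0f}")
print(f"active offender population:    {n_a:.0f}")
```

Because λ/52 is small, the result is close to the simpler approximation 52·N_E/λ; the exponential form only matters noticeably for very high conviction rates.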

(p.227) Substituting parameter estimates from the 1997 sentencing sample into A.36 gives estimates of:

  • 156,800 active (in the sense that they will be convicted of one or more offences at some time in the future) high-risk/high-rate offenders of whom approximately 133,700 would have been convicted in 1997;

  • 604,700 active low-risk/low-rate offenders but only approximately 127,400 would have been convicted in 1997;

  • 438,900 active high-risk/low-rate offenders of whom only approximately 92,400 would have been convicted in 1997.

In 1997 we estimate that there were about 1,200,000 individuals in England and Wales who would commit and be convicted of relatively serious (standard list) crime if appropriate opportunities presented themselves and who would do so at least once in the remainder of their lives. Almost 30 per cent of these individuals would have been convicted within 12 months. Not surprisingly the most active offenders are disproportionately responsible for convictions; the high-risk/high-rate offenders represent 17 per cent of a cohort and 13 per cent of active offenders. They are responsible for 38 per cent of annual convictions and, if clear-up rates are the same for all categories, the same proportion of crime. The low-risk/low-rate offenders make up 76 per cent of a cohort, 50 per cent of the active offender population and accrue only 36 per cent of annual convictions. The high-risk/low-rate group make up only 7 per cent of a cohort but 37 per cent of active offenders and 26 per cent of annual convictions. In these calculations we have taken no account of early career offending prior to (and including) the last conviction opportunity before the first actual conviction. Including early offending would almost certainly increase the disproportionality of crime committed by high-risk/high-rate offenders.
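The percentages quoted above follow from the three pairs of figures in the preceding list. A short check, using only those quoted numbers (the variable names are illustrative):

```python
# Figures quoted in the text for 1997: active offenders in each category and
# the number of them who would have been convicted during the year.
active = {
    "high-risk/high-rate": 156_800,
    "low-risk/low-rate": 604_700,
    "high-risk/low-rate": 438_900,
}
convicted_1997 = {
    "high-risk/high-rate": 133_700,
    "low-risk/low-rate": 127_400,
    "high-risk/low-rate": 92_400,
}


def percentage_shares(counts):
    """Each category's share of the total, in per cent."""
    total = sum(counts.values())
    return {name: 100.0 * value / total for name, value in counts.items()}


total_active = sum(active.values())
total_convicted = sum(convicted_1997.values())

active_share = percentage_shares(active)
conviction_share = percentage_shares(convicted_1997)
convicted_within_year = 100.0 * total_convicted / total_active

for name in active:
    print(f"{name}: {active_share[name]:.0f}% of active offenders, "
          f"{conviction_share[name]:.0f}% of annual convictions")
print(f"total active: {total_active:,}; "
      f"convicted within the year: {convicted_within_year:.0f}%")
```

The three active-population estimates sum to 1,200,400, and the 353,500 convicted within the year are a little under 30 per cent of that total.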

If offender population estimates are required for specific crime types, drug dealing or burglary for example, then these can be obtained by substituting parameter values for specific crime types into Equation A.36.

Maximum Likelihood Estimation of the Recidivism Parameters

In Chapter 2 we derived an equation for the dual-risk recidivism model. Initially we used a graphical technique which fitted a straight line to the log of the conviction number frequency data (n > 6) from the 1953 cohort, subtracted the fitted line from the data to obtain the residuals and fitted a second straight line to these residuals. This procedure provided us with a good structural model of the data but it was unclear how well the model fitted. Visually the fit was almost unbelievably good but there was no direct measure of the sensitivity of the fit to the parameter values. It is also clear that the parameters are not independent of each other. A small change in ph would give rise to changes in both α and pl. The parameters quoted in Chapter 2 were in fact jointly estimated using a maximum likelihood objective function in an iterative curve fitting procedure. The objective function was derived as follows:

In the cohort datasets there is one record for every conviction (court appearance) of each offender in the cohort sample. For each offender the convictions are numbered from 1, the first conviction, to the last conviction in the observation period. The likelihood of a record having conviction number n is simply the probability of n under our dual-risk recidivism model:

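$$P(n) = \frac{1}{C}\left[\alpha\, p_h^{\,n-1} + (1-\alpha)\, p_l^{\,n-1}\right]$$

where the normalizing constant C is

$$C = \sum_{n=1}^{\infty}\left[\alpha\, p_h^{\,n-1} + (1-\alpha)\, p_l^{\,n-1}\right] = \frac{1 - p_h + \alpha\,(p_h - p_l)}{(1-p_h)(1-p_l)}$$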

In a cohort dataset, xn records have conviction number n and the likelihood of this is:

$$\text{likelihood}(x_n) = P(n)^{x_n}$$

The likelihood of the whole dataset is given by the product of the likelihoods of the xns for each conviction number. Therefore:

$$\text{likelihood}(\text{data}) = \prod_{n=1}^{N} P(n)^{x_n}$$
where: N is the highest recorded conviction number in the data set.


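Taking logarithms gives:

$$\text{loglik}(\text{data}) = \sum_{n=1}^{N} x_n \left[ \ln\!\left( \frac{(1-p_h)(1-p_l)}{1 - p_h + \alpha\,(p_h - p_l)} \right) + \ln\!\left( \alpha\, p_h^{\,n-1} + (1-\alpha)\, p_l^{\,n-1} \right) \right]$$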
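The log-likelihood above translates directly into an objective function. The sketch below is illustrative only: the counts are synthetic expected frequencies generated from the model itself, not the cohort data, and no particular fitting routine is implied. Any general-purpose minimizer could be applied to `neg_loglik`.

```python
import math


def normalizer(p_h, p_l, alpha):
    """C = (1 - p_h + alpha*(p_h - p_l)) / ((1 - p_h)*(1 - p_l))."""
    return (1.0 - p_h + alpha * (p_h - p_l)) / ((1.0 - p_h) * (1.0 - p_l))


def prob_conviction_number(n, p_h, p_l, alpha):
    """P(n) under the dual-risk recidivism model, for n = 1, 2, ..."""
    mix = alpha * p_h ** (n - 1) + (1.0 - alpha) * p_l ** (n - 1)
    return mix / normalizer(p_h, p_l, alpha)


def neg_loglik(counts, p_h, p_l, alpha):
    """-loglik(data), where counts[n-1] is x_n, the number of records
    carrying conviction number n."""
    return -sum(x * math.log(prob_conviction_number(n, p_h, p_l, alpha))
                for n, x in enumerate(counts, start=1) if x > 0)


# Synthetic expected counts (an assumption for illustration): 100,000 records
# spread over conviction numbers 1..60 according to the model itself.
GENERATING = (0.840, 0.313, 0.237)  # p_h, p_l, alpha
counts = [100_000 * prob_conviction_number(n, *GENERATING)
          for n in range(1, 61)]

at_generating = neg_loglik(counts, *GENERATING)
shifted_p_h = neg_loglik(counts, 0.80, 0.313, 0.237)
shifted_alpha = neg_loglik(counts, 0.840, 0.313, 0.30)

print(at_generating < shifted_p_h, at_generating < shifted_alpha)
```

With these synthetic counts the objective is lower at the generating parameters than at either perturbed triplet, which is the behaviour a minimizer relies on.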

The parameters ph, pl and α were estimated by minimizing –loglik(data) in the fitting procedure. The proportion of variance accounted for by the model was over 99.9 per cent, an extremely high correlation between the model and the data. For the 1953 cohort data the maximum likelihood estimates were ph = 0.840, pl = 0.313 and α = 0.237. Because the parameters are jointly estimated, conventional confidence intervals for individual parameters are misleading as such intervals would represent a rectangular box around the maximum likelihood estimate (see Figure A.1). The true confidence interval is represented by the surface contained within the box, which is defined by the parameter triplets (points) for which the likelihood ratio (likelihood of the point on the surface/maximum likelihood) = 0.05, i.e. 20 times less likely than the estimate (see note 2). Points outside the surface are even less likely.

Figure A.1 Likelihood surface for dual-risk recidivism model

Source: Parameter estimates for the 1953 cohort, Offenders Index.

Note: The surface is equivalent to the more conventional 95% confidence intervals for the parameters.
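Membership of the region bounded by this surface can be tested by comparing log-likelihoods: a triplet lies on or inside the surface when its log-likelihood is within ln(0.05) of the maximum. The sketch below is a minimal illustration under stated assumptions: the counts are synthetic, and the generating triplet is used as a stand-in for the maximum likelihood estimate.

```python
import math


def loglik(counts, p_h, p_l, alpha):
    """Log-likelihood of the dual-risk model; counts[n-1] is x_n."""
    c = (1.0 - p_h + alpha * (p_h - p_l)) / ((1.0 - p_h) * (1.0 - p_l))
    return sum(
        x * math.log((alpha * p_h ** (n - 1)
                      + (1.0 - alpha) * p_l ** (n - 1)) / c)
        for n, x in enumerate(counts, start=1) if x > 0
    )


def inside_surface(counts, point, best, ratio=0.05):
    """True if likelihood(point) / likelihood(best) >= ratio."""
    return loglik(counts, *point) - loglik(counts, *best) >= math.log(ratio)


def model_prob(n, p_h, p_l, alpha):
    """P(n), used here only to build the synthetic counts."""
    c = (1.0 - p_h + alpha * (p_h - p_l)) / ((1.0 - p_h) * (1.0 - p_l))
    return (alpha * p_h ** (n - 1) + (1.0 - alpha) * p_l ** (n - 1)) / c


# Synthetic expected counts for 100,000 records (illustration only).
BEST = (0.840, 0.313, 0.237)  # stand-in for the maximum likelihood estimate
counts = [100_000 * model_prob(n, *BEST) for n in range(1, 61)]

print(inside_surface(counts, BEST, BEST))
print(inside_surface(counts, (0.80, 0.313, 0.237), BEST))
```

Working with differences of log-likelihoods avoids forming the likelihoods themselves, which underflow for datasets of realistic size.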


(1) In Chapter 3 we use this result in the form of the inter-event time T, which is equal to 1/λ; i.e. selecting events at random with probability p results in a stream of random events with inter-event time 1/(p*λ) = T/p.

(2) This might be considered similar to a 95 per cent ‘confidence interval’ and indeed would be exactly equivalent if we were dealing with a two parameter multivariate normal distribution.