Appendix A4
The Mathematics of Darwinian Systems
Abstract: Optimization is studied as the interplay of selection, recombination, and mutation. The underlying model is based on ordinary differential equations (ODEs), which are derived from the chemical kinetics of reproduction, and hence it applies to sufficiently large (in principle infinite) populations. A flowreactor (continuously stirred tank reactor, CSTR) is used as an example of a realistic open system that is suitable for kinetic studies. The mathematical analysis is illustrated for the simple case of selection in the CSTR. In the following sections the kinetic equations are solved exactly for idealized conditions. A brief account of the influence of finite population size is given in the last section.
1. Replication in the flowreactor
Replication and degradation of molecular species I_i (i = 1, 2, …, n) in the flowreactor (Figure A4.1) follows the mechanism
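The mechanism and the kinetic equations are not reproduced in this copy; they can be reconstructed from the stationarity conditions quoted below, with the labels (A4.1) and (A4.2) following the text's own numbering:

```latex
% Reconstructed mechanism (A4.1): influx of A, replication, degradation, outflux
\begin{align*}
  * &\xrightarrow{\;a_0 r\;} \mathrm{A}, &
  \mathrm{A} + \mathrm{I}_i &\xrightarrow{\;k_i\;} 2\,\mathrm{I}_i, &
  \mathrm{I}_i &\xrightarrow{\;d_i\;} \mathrm{B}, &
  \mathrm{A},\,\mathrm{B},\,\mathrm{I}_i &\xrightarrow{\;r\;} \varnothing .
\end{align*}
% Reconstructed kinetic equations (A4.2)
\begin{align*}
  \dot a   &= (a_0 - a)\,r - a \sum_{i=1}^{n} k_i c_i, \\
  \dot b   &= \sum_{i=1}^{n} d_i c_i - b\,r, \\
  \dot c_i &= \bigl(k_i a - d_i - r\bigr)\, c_i, \qquad i = 1, 2, \ldots, n .
\end{align*}
```

Summing these equations reproduces the relaxation law $\dot y = (a_0 - y)\,r$ for the total concentration used below.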
The variables a(t), b(t), and c_{i}(t) are concentrations, [A]=a, [B]=b and [I_{i}] = c_{i}, which are defined by a=N_{A}/(V N_{L}), b=N_{B}/(V N_{L}), and c_i= N_{i}/(V N_{L}) where V is the volume and N_{L} is Avogadro's number, the number of particles in one mole of substance. The particle numbers N are discrete and non‐negative quantities, whereas concentrations are assumed to be continuous because N_{L}=6.023 × 10^{23} is very large.^{1}
The equations (A4.2) sustain (n+1) stationary states fulfilling the conditions $\dot a = 0$, $\dot b = 0$, $\dot c_i = 0$ for i = 1, 2, …, n. The stationarity condition for one particular class of replicating molecules I_i
has two solutions: (i) $\bar c_i = 0$ and (ii) $\bar a = (d_i + r)/k_i$. Since any pair of type (ii) conditions is incompatible,^2 only two types of solutions remain: (i) $\bar c_i = 0\ \forall\, i = 1, 2, \ldots, n$, the state of extinction, because no replicating molecule 'survives', and (ii) n states with $\bar c_j = \bigl(a_0/(d_j + r) - 1/k_j\bigr)\, r$ and $\bar c_k = 0\ \forall\, k \ne j$. Steady-state analysis through linearisation and diagonalisation of the Jacobian matrix at the stationary points yields the result that only one of the n states is asymptotically stable, in particular the one for the species I_m that fulfils
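The selection criterion (A4.3) is elided in this copy; consistent with the discussion of the eigenvalues below, it reads:

```latex
% Reconstructed selection criterion (A4.3)
k_m a_0 - d_m \;=\; \max \bigl\{\, k_j a_0 - d_j \,;\; j = 1, 2, \ldots, n \,\bigr\}.
```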
Accordingly, species I_m is selected and we call this state the state of selection. The proof is straightforward and yields simple expressions for the eigenvalues λ_k (k = 0, 1, …, n) of the Jacobian matrix when degradation is neglected, d_j = 0 (j = 1, 2, …, n). For the state of extinction we find
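The eigenvalues at the state of extinction ($\bar a = a_0$, $\bar c_i = 0\ \forall i$) are omitted here; linearising the reconstructed equations (A4.2) with d_j = 0 gives:

```latex
% Reconstructed eigenvalues at the state of extinction (d_j = 0)
\lambda_0 = -r \ \ \text{(flow modes in the } a \text{ and } b \text{ directions)}, \qquad
\lambda_j = k_j a_0 - r, \quad j = 1, 2, \ldots, n .
```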
It is asymptotically stable as long as r > k_m a_0 is fulfilled. If r > k_m a_0, then r > k_j a_0 ∀ j ≠ m is valid by definition because of the selection criterion (A4.3) for d_j = 0. For all other n pure states $\{\bar c_i = a_0 - r/k_i,\ \bar c_j = 0,\ j \ne i\}$ the eigenvalues of the Jacobian are:
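The eigenvalues at the pure states are likewise elided; linearisation at $\bar a = r/k_i$, $\bar c_i = a_0 - r/k_i$ yields the following sketch (d_j = 0 assumed throughout):

```latex
% Reconstructed eigenvalues at the pure state of species I_i (d_j = 0):
% transversal directions
\lambda_j = \frac{(k_j - k_i)\, r}{k_i}, \qquad j \ne i,
% while the two eigenvalues of the (a, c_i) subspace are the roots of
\lambda^2 + k_i a_0\, \lambda + r\,(k_i a_0 - r) = 0,
% which have negative real parts whenever k_i a_0 > r; the b direction contributes -r.
```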
All pure states except the state at which I_m is selected (state of selection: $\bar c_m = a_0 - r/k_m$, $\bar c_j = 0$, j = 1, …, n, j ≠ m) have at least one positive eigenvalue and are unstable. Therefore we observe indeed selection of the molecular species with the largest value of k_j (or k_j a_0 − d_j, respectively), because only at $\bar c_m \ne 0$ are all eigenvalues of the Jacobian matrix negative.
It is worth indicating that the dynamical system (A4.2) has a stable manifold $\bar y = \bar a + \bar b + \sum_{i=1}^{n} \bar c_i = a_0$, since $\dot y = \dot a + \dot b + \sum_{i=1}^{n} \dot c_i = (a_0 - y)\, r$. The sum of all concentrations, y(t), follows a simple exponential relaxation towards the steady state $\bar y = a_0$:
$$y(t) = a_0 - \bigl(a_0 - y(0)\bigr)\, e^{-rt},$$
with the flow rate r being the relaxation constant.
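The selection behaviour in the flowreactor is easy to check numerically. The sketch below integrates the reconstructed kinetic equations (A4.2) with hypothetical rate constants (degradation neglected, d_i = 0) by a simple forward-Euler scheme:

```python
# Forward-Euler integration of the flowreactor kinetics (A4.2); the rate
# constants are hypothetical and degradation is neglected (d_i = 0).
a0, r = 1.0, 0.5            # influx concentration and flow rate
k = [1.0, 2.0, 4.0]         # replication rate constants; k[2] is largest
a, b = a0, 0.0
c = [0.001, 0.001, 0.001]   # small inoculation of every replicator

dt = 0.001
for _ in range(200_000):    # integrate up to t = 200
    da = (a0 - a) * r - a * sum(ki * ci for ki, ci in zip(k, c))
    db = -b * r
    dc = [(ki * a - r) * ci for ki, ci in zip(k, c)]
    a, b = a + da * dt, b + db * dt
    c = [ci + dci * dt for ci, dci in zip(c, dc)]

# The species with the largest k is selected: c_3 -> a0 - r/k_3 = 0.875,
# a -> r/k_3 = 0.125, the competitors die out, and y -> a0.
print(round(c[2], 3), round(a, 3))
```

The final values agree with the stationary state of the selection analysis above.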
2. Selection
As shown in the previous section the basis of selection is reproduction in the form of a simple autocatalytic elementary step, (A)+I _{i}→2I _{i}. We idealise the system by assuming that the material consumed in the reproduction process, A, is present in excess. Therefore, its concentration is constant and can be subsumed as a factor in the rate constant: f_{i}=k_{i} [A]. In addition we neglect the degradation terms by putting d_{i}=0 ∀ i. In terms of chemical reaction kinetics selection based on reproduction without recombination and mutation is described by the dynamical system
As before the variables c_i(t) are the concentrations of the genotypes I_i, the quantities f_i are reproduction rate parameters corresponding to overall replication rate constants in molecular systems or, in general, the fitness values of the genotypes. A global flux Φ(t) has been introduced in order to regulate the growth of the system. Transformation to relative concentrations, x_i = c_i/c, and adjusting the global flux Φ(t) to zero net-growth simplifies further analysis:^3
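Equations (A4.6) and (A4.7), to which the text refers repeatedly, are missing from this copy; they can be reconstructed from the definitions above, with Φ the mean fitness:

```latex
% Reconstructed growth law (A4.6) with global flux \Phi(t)
\frac{dc_i}{dt} = f_i c_i - \frac{c_i}{c}\,\Phi(t), \qquad
c = \sum_{j=1}^{n} c_j, \qquad i = 1, 2, \ldots, n ;
% and the selection equation (A4.7) in relative concentrations x_i = c_i/c
\frac{dx_i}{dt} = x_i \bigl( f_i - \Phi(t) \bigr), \qquad
\Phi(t) = \sum_{j=1}^{n} f_j x_j .
```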
The relative concentrations x_i(t) fulfil $\sum_{j=1}^{n} x_j(t) = 1$ and the flux Φ(t) is the mean growth rate of the population. Because of this conservation relation only n−1 variables x_i are independent. In the space of n Cartesian variables, ℝ^n, the x variables represent a projection of the positive orthant onto the unit simplex (Figure A4.2)
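The unit simplex referred to here is defined in the usual way:

```latex
% Definition of the unit simplex
S_n^{(1)} \;=\; \Bigl\{\, \mathbf{x} \in \mathbb{R}^n :\;
x_i \ge 0 \;\forall\, i = 1, 2, \ldots, n, \;\;
\textstyle\sum_{i=1}^{n} x_i = 1 \,\Bigr\}.
```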
The simplex $S_n^{(1)}$ is an invariant manifold of the differential equation (A4.7). This means that every solution curve x(t) = (x_1(t), x_2(t), …, x_n(t)) that starts in one point of the simplex will stay on the simplex forever.
According to this assumption all rate parameters are strictly positive, a condition that will be replaced later by the weaker one f_i ≥ 0 ∀ i ≠ k ∧ f_k > 0, and the concentration variables are non-negative quantities. Stability of the simplex requires that all solution curves converge to the unit simplex from every initial condition, $\lim_{t\to\infty} \bigl(\sum_{i=1}^{n} x_i(t)\bigr) = 1$. This conjecture is proved readily: from $\sum_{i=1}^{n} x_i(t) = c(t)$ follows
For $\dot c = 0$ we find the two stationary states: a saddle point at $\bar c = 0$ and an asymptotically stable state at $\bar c = 1$. There are several possibilities to verify its asymptotic stability; we choose to solve the differential equation and find:
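The differential equation for c(t) and its solution, elided above, can be reconstructed as follows (off the simplex, Φ(t) = Σ_j f_j x_j remains positive):

```latex
% Reconstructed equation for the total c(t) = \sum_i x_i(t)
\dot c \;=\; \sum_{i=1}^{n} f_i x_i - c\,\Phi(t) \;=\; \Phi(t)\,(1 - c),
% with the solution
c(t) \;=\; 1 - \bigl(1 - c(0)\bigr)\,
\exp\!\Bigl(-\!\int_0^t \Phi(\tau)\, d\tau\Bigr)
\;\xrightarrow{\;t\to\infty\;}\; 1 .
```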
Starting with any positive initial value c(0) the population approaches the unit simplex. When it starts on $S_n$ it stays there even in the presence of fluctuations.^4 Therefore, we restrict population dynamics to the simplex without losing generality and characterise the state of a population at time t by the vector x(t), which fulfils the $L^{(1)}$ norm $\sum_{i=1}^{n} x_i(t) = 1$.
The necessary and sufficient condition for the stability of the simplex, Φ(t) > 0, enables us to relax the condition for the rate parameters f_i. In order to have a positive flux it is sufficient that one rate parameter is strictly positive provided the corresponding variable is non-zero: f_j ≥ 0 ∀ j ≠ k, f_k > 0, and x_k > 0 imply $\Phi(t) = \sum_{j} f_j x_j \ge f_k x_k > 0$.
For the variable x_k it is sufficient that x_k(0) > 0 holds, because x_k(t) ≥ x_k(0) when all other products f_j x_j were zero at t = 0. This relaxed condition for the flux is important for the handling of lethal mutants with f_j = 0.
The time dependence of the mean fitness or flux Φ is given by
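Equation (A4.9), to which the next sentence refers, is the variance relation obtained by differentiating Φ along solutions of (A4.7):

```latex
% Reconstructed equation (A4.9)
\frac{d\Phi}{dt} \;=\; \sum_{i=1}^{n} f_i \dot x_i
\;=\; \sum_{i=1}^{n} f_i^{\,2} x_i - \Bigl(\sum_{i=1}^{n} f_i x_i\Bigr)^{\!2}
\;=\; \operatorname{var}\{f\} \;\ge\; 0 .
```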
Since a variance is always non-negative, equation (A4.9) implies that Φ(t) is a non-decreasing function of time. The value var{f} = 0 refers to a homogeneous population of the fittest variant, and then Φ(t) cannot increase any further. Hence it has been optimised during selection.
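The optimisation property is easy to verify numerically; the sketch below integrates the selection equation ẋ_i = x_i(f_i − Φ) with hypothetical fitness values and checks that Φ(t) never decreases:

```python
# Euler integration of the selection equation x_i' = x_i (f_i - Phi),
# Phi = sum_j f_j x_j; the fitness values are hypothetical.
f = [1.0, 2.0, 3.0]
x = [0.98, 0.01, 0.01]        # the fittest genotype starts out rare
dt = 0.001

phis = []
for _ in range(20_000):       # integrate up to t = 20
    phi = sum(fi * xi for fi, xi in zip(f, x))
    phis.append(phi)
    x = [xi + xi * (fi - phi) * dt for fi, xi in zip(f, x)]

# Phi(t) is non-decreasing (its derivative equals the variance of f) and
# the population becomes homogeneous in the fittest genotype.
assert all(q >= p - 1e-9 for p, q in zip(phis, phis[1:]))
print(round(x[2], 3), round(phis[-1], 3))
```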
It is also possible to derive analytical solutions for equation (A4.7) by a transform called integrating factors ([5], p.322ff.):
$$z_i(t) = x_i(t)\,\exp\Bigl(\int_0^t \Phi(\tau)\,d\tau\Bigr).$$
Insertion into (A4.7) yields
$$\dot z_i = f_i z_i \quad\Longrightarrow\quad z_i(t) = z_i(0)\, e^{f_i t},$$
where we have used z_i(0) = x_i(0) and the condition $\sum_{i=1}^{n} x_i = 1$. The solution finally is of the form
$$x_i(t) = \frac{x_i(0)\, e^{f_i t}}{\sum_{j=1}^{n} x_j(0)\, e^{f_j t}}, \qquad i = 1, 2, \ldots, n.$$
Under the assumption that the largest fitness parameter is non-degenerate, max{f_i; i = 1, 2, …, n} = f_m > f_i ∀ i ≠ m, every solution curve fulfilling the initial condition x_i(0) > 0 approaches a homogeneous population: $\lim_{t\to\infty} x_m(t) = \bar x_m = 1$ and $\lim_{t\to\infty} x_i(t) = \bar x_i = 0\ \forall\, i \ne m$, and the flux approaches the largest fitness parameter monotonously, Φ(t) → f_m (examples are shown in Figure A4.3).
Qualitative analysis of stationary points and their stability yields the following results:

(i) The only stationary points of equation (A4.7) are the corners of the simplex, represented by the unit vectors e_k = {x_k = 1, x_i = 0 ∀ i ≠ k},

(ii) only one of these stationary points is asymptotically stable, namely the corner where the mean fitness Φ adopts its maximal value on the simplex, $e_m: \bar x_m = 1$, defined by max{f_i; i = 1, 2, …, n} = f_m > f_i ∀ i ≠ m; one corner is unstable in all directions, a source where the value of Φ is minimal, $e_s: \bar x_s = 1$, defined by min{f_i; i = 1, 2, …, n} = f_s < f_i ∀ i ≠ s; and all other n−2 equilibria are saddle points, and

(iii) since x_i(0) = 0 implies x_i(t) = 0 ∀ t > 0, every subsimplex of $S_n^{(1)}$ is an invariant set, and thus the whole boundary of the simplex consists of invariant sets and subsets, down to the corners (which represent members of class $S_1^{(1)}$).
3. Generalised gradient systems
Although Φ(t) represents a Liapunov function for the dynamical system (A4.7), and its existence is sufficient for the proof of global stability for selection of the fittest, i.e. the species with the largest f_k value, it is of interest that the differential equation (A4.7) can be interpreted as a generalised gradient system [6, 7, 8] through the introduction of a Riemann-type metric based on a generalised scalar product defined at position x
In a gradient system,
$$\frac{d\mathbf{x}}{dt} = \nabla V(\mathbf{x}), \qquad (A4.12)$$
the potential V(x) increases steadily along the orbits,
$$\frac{dV}{dt} = \sum_{i=1}^{n} \frac{\partial V}{\partial x_i}\,\frac{dx_i}{dt} = \langle \nabla V, \nabla V \rangle = \bigl\|\nabla V(\mathbf{x})\bigr\|^2 \;\ge\; 0,$$
and it does so at a maximal rate, since the velocity vector, being equal to the gradient, points in the direction of maximum increase of V. In other words, the velocity vector points in the direction of steepest ascent, which is always orthogonal to the constant level sets of V. In gradient systems we observe optimisation of the potential function V(x) along all orbits.
For the purpose of illustration we choose an example, equation (A4.12) with the potential
$$V(\mathbf{x}) = -\tfrac{1}{2}\sum_{i=1}^{n} f_i x_i^2, \qquad f_i > 0\ \forall\, i, \quad\text{and hence}\quad \frac{dx_i}{dt} = \frac{\partial V}{\partial x_i} = -f_i x_i.$$
The time derivative of the potential function is obtained by
$$\frac{dV}{dt} = \sum_{i=1}^{n} \frac{\partial V}{\partial x_i}\,\frac{dx_i}{dt} = \sum_{i=1}^{n} f_i^2 x_i^2 \;\ge\; 0.$$
The potential is increasing until it reaches asymptotically the maximal value V = 0. Solutions of the differential equation are computed by integration: x_i(t) = x_i(0) exp(−f_i t) ∀ i = 1, …, n. The result derived from dV/dt is readily verified, since lim_{t→∞} x_i(t) = 0 ∀ i = 1, …, n and hence lim_{t→∞} V(t) = 0.
Equation (A4.7), on the other hand, is not an ordinary gradient system: it fulfils the optimisation criterion but the orbits are not orthogonal to the constant level sets of V(x) = Φ(x) (see Figure A4.3). In such a situation it is often possible to achieve the full gradient properties through a generalisation of the scalar product, which is tantamount to a redefinition of the angle on the underlying space, ℝ^n or $S_n^{(1)}$, respectively. We shall describe the formalism here by means of the selection equation (A4.7) as an example.
The potential function is understood as a map, $V(\mathbf{x}): \mathbb{R}^n \to \mathbb{R}^1$. The derivative of the potential, $DV_{\mathbf{x}}$, is the unique linear map $L: \mathbb{R}^n \to \mathbb{R}^1$ that fulfils for all $\mathbf{y} \in \mathbb{R}^n$:
where o(y) satisfies o(y)/‖y‖ → 0 as ‖y‖ → 0. To L corresponds a uniquely defined vector ℓ ∈ ℝ^n such that ⟨ℓ, y⟩ = L(y), where ⟨·,·⟩ is the conventional Euclidean inner product defined by $\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{n} u_i v_i$ for u, v ∈ ℝ^n. This special vector ℓ is the gradient of the potential V, which can therefore be defined by the following mapping of y into ℝ^1:
The conventional Euclidean inner product is associated with the Euclidean metric, ║ x ║=〈x, x〉^{1/2}.
It is verified straightforwardly that equation (A4.7) does not fulfil the condition of a conventional gradient (A4.13). The idea is now to replace the Euclidean metric by another, more general metric that restores the properties of a gradient system. We introduce a generalised inner product corresponding to a Riemann-type metric in which the conventional product terms are weighted by the inverse co-ordinates of the position vector z:
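The generalised inner product (A4.14) is not reproduced in this copy; the metric meant here is the Shahshahani inner product, in which each term carries the reciprocal co-ordinate as weight:

```latex
% Reconstructed generalised inner product (A4.14)
[\mathbf{u}, \mathbf{v}]_{\mathbf{z}} \;=\;
\sum_{i=1}^{n} \frac{1}{z_i}\, u_i v_i ,
\qquad \mathbf{z} \in \operatorname{int}\mathbb{R}^{n}_{+} .
```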
Expression (A4.14), [·,·]_z, defines an inner product in the interior of $\mathbb{R}^n_{+}$, because it is linear in u and v, and satisfies [u, u]_z ≥ 0 with equality fulfilled if and only if u = 0.
The differential $DV_{\mathbf{x}}$ is defined completely by V(x) and hence is independent of the choice of an inner product; the gradient, however, is not, because it depends on the definition (A4.15). Shahshahani [6] has shown that the relation dx/dt = Grad[Φ(x)] is fulfilled for Fisher's selection equation (A4.19; see section 5) with $\Phi = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j$. As an example for the procedure we consider here the simple selection equation (A4.7) with $\Phi = \sum_{i=1}^{n} f_i x_i$.
The differential equation (A4.7) is conceived as a generalised gradient system and we find:
By application of equation (A4.15) we obtain
which can be derived by conventional differentiation from
By straightforward computation we find the desired result:
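The chain of omitted equations can be sketched as follows; with the inner product above restricted to tangent vectors y of the simplex ($\sum_i y_i = 0$), the candidate Grad Φ reproduces the right-hand side of (A4.7):

```latex
% Sketch: Grad is defined by [\mathrm{Grad}\,V(\mathbf{x}), \mathbf{y}]_{\mathbf{x}} = DV_{\mathbf{x}}(\mathbf{y})
% for all tangent vectors y with \sum_i y_i = 0. For V = \Phi = \sum_j f_j x_j take
\bigl(\mathrm{Grad}\,\Phi(\mathbf{x})\bigr)_i = x_i\,\bigl(f_i - \Phi(\mathbf{x})\bigr):
% then
[\mathrm{Grad}\,\Phi, \mathbf{y}]_{\mathbf{x}}
= \sum_{i=1}^{n} \frac{x_i (f_i - \Phi)\, y_i}{x_i}
= \sum_{i=1}^{n} f_i y_i - \Phi \sum_{i=1}^{n} y_i
= \sum_{i=1}^{n} f_i y_i = D\Phi_{\mathbf{x}}(\mathbf{y}),
% hence dx/dt = \mathrm{Grad}\,\Phi(\mathbf{x}) coincides with (A4.7).
```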
With the new definition of the scalar product, encapsulated in the definition of 'Grad', the orbits of equation (A4.7) are perpendicular to the constant level sets of Φ(x).
4. Complementary replication
Often the molecular mechanism of replication proceeds through an intermediate represented by a polynucleotide molecule with a complementary sequence: (A) + I_+ → I_+ + I_− and (A) + I_− → I_− + I_+. In analogy to equation (A4.7) and with $f_1 = \tilde f_+ [\mathrm{A}]$, $f_2 = \tilde f_- [\mathrm{A}]$, $x_1 = [\mathrm{I}_+]$, and $x_2 = [\mathrm{I}_-]$ we obtain the following differential equation [9–11]:
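The differential equation for the plus-minus pair, reconstructed in analogy to (A4.7):

```latex
% Reconstructed kinetic equations for complementary replication
\dot x_1 = f_2 x_2 - x_1 \Phi(t), \qquad
\dot x_2 = f_1 x_1 - x_2 \Phi(t), \qquad
\Phi(t) = f_1 x_1 + f_2 x_2 .
```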
Applying the integrating factor transformation (A4.10) yields the linear equation
The eigenvalues and (right‐hand) eigenvectors of the matrix W are
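The matrix of the linear equation ż = W·z and its spectrum, omitted in this copy, follow directly from the mechanism (a reconstruction):

```latex
% Reconstructed value matrix and its eigenvalues/eigenvectors
W = \begin{pmatrix} 0 & f_2 \\ f_1 & 0 \end{pmatrix}, \qquad
\lambda_{1,2} = \pm\sqrt{f_1 f_2}, \qquad
\boldsymbol{\ell}_{1,2} = \begin{pmatrix} \sqrt{f_2} \\ \pm\sqrt{f_1} \end{pmatrix}.
```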
Straightforward calculation yields analytical expressions for the two variables (see the paragraph on mutation) with the initial concentrations x_1(0) and x_2(0), and $\gamma_1(0) = \sqrt{f_1}\,x_1(0) + \sqrt{f_2}\,x_2(0)$ and $\gamma_2(0) = \sqrt{f_1}\,x_1(0) - \sqrt{f_2}\,x_2(0)$ as abbreviations:
After a sufficiently long time the negative exponential has vanished and we obtain the simple result
After an initial period, the plus-minus pair, I_±, grows like a single autocatalyst with a fitness value $f = \sqrt{f_1 f_2}$ and a stationary ratio of the concentrations of complementary strands $x_1/x_2 \approx \sqrt{f_2}/\sqrt{f_1}$.
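A numerical check with hypothetical rate parameters confirms both limits:

```python
# Plus-minus replication in relative concentrations (hypothetical rates):
# x1' = f2*x2 - x1*Phi,  x2' = f1*x1 - x2*Phi,  Phi = f1*x1 + f2*x2.
f1, f2 = 4.0, 1.0
x1, x2 = 0.9, 0.1
dt = 0.001
for _ in range(20_000):       # integrate up to t = 20
    phi = f1 * x1 + f2 * x2
    x1, x2 = (x1 + (f2 * x2 - x1 * phi) * dt,
              x2 + (f1 * x1 - x2 * phi) * dt)

# The strand ratio settles at sqrt(f2/f1) = 0.5 and the mean fitness at
# sqrt(f1*f2) = 2.
print(round(x1 / x2, 3), round(f1 * x1 + f2 * x2, 3))
```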
5. Recombination
Recombination of n alleles at a single locus is described by Ronald Fisher's [12] selection equation,
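Fisher's selection equation (A4.19), elided here, can be reconstructed from the expressions used below (with ā_i and Φ as defined in the text):

```latex
% Reconstructed Fisher selection equation (A4.19)
\frac{dx_i}{dt} = x_i \Bigl( \sum_{j=1}^{n} a_{ij} x_j - \Phi(t) \Bigr)
= x_i \bigl( \bar a_i - \Phi(t) \bigr), \qquad
\Phi(t) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i x_j .
```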
As in the simple selection case the two conditions $a_{ij} > 0\ \forall\, i,j = 1, 2, \ldots, n$ and $x_i \ge 0\ \forall\, i = 1, 2, \ldots, n$ will guarantee Φ(t) ≥ 0. Summation of allele frequencies, $\sum_{i=1}^{n} x_i(t) = c(t)$, yields again equation (A4.8) for ċ, and hence for $\sum_{i=1}^{n} x_i(0) = 1$ the population is confined again to the unit simplex.
The rate parameters a_ij form a square matrix
The dynamics of equation (A4.19) for general matrices A may be very complicated [13]. In the case of Fisher's selection equation, however, we are dealing with a symmetric matrix for biological reasons,^5 and then the differential equation can be subjected to straightforward qualitative analysis.
The introduction of mean rate parameters $\bar a_i = \sum_{j=1}^{n} a_{ij} x_j$ facilitates the forthcoming analysis. The time dependence of Φ is now given by
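For symmetric A the elided derivative of the flux takes the variance form (a reconstruction):

```latex
% Reconstructed time dependence of \Phi for symmetric A
\frac{d\Phi}{dt} = 2 \sum_{i=1}^{n} \bar a_i x_i \bigl( \bar a_i - \Phi \bigr)
= 2 \Bigl( \sum_{i=1}^{n} \bar a_i^{\,2} x_i - \Phi^2 \Bigr)
= 2 \operatorname{var}\{\bar a\} \;\ge\; 0 .
```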
Again we see that the flux Φ(t) is a non-decreasing function of time, and it approaches an optimal value on the simplex. This result is often called Fisher's fundamental theorem of evolution (see, for example, [14]).
Qualitative analysis of equation (A4.19) yields 2^n − 1 stationary points, which, depending on the elements of matrix A, may lie in the interior, on the boundary, or outside the unit simplex $S_n^{(1)}$. In particular, we find at most one equilibrium point on the simplex and one on each subsimplex of the boundary. For example, each corner, represented by the unit vector $e_k = \{\bar x_k = 1, x_i = 0\ \forall\, i \ne k\}$, is a stable or unstable stationary point. Where there is an equilibrium in the interior of $S_n^{(1)}$ it may be stable or unstable depending on the elements of A. In summary, this leads to a rich collection of different dynamical scenarios which share the absence of oscillations or chaotic dynamics. As said above, multiple stationary states do occur and more than one may be stable. This implies that the optimum which Φ(t) is approaching need not be uniquely defined. Instead, Φ(t) may approach one of the local optima and then the outcome of the selection process will depend on initial conditions [14, 15, 16, 17].
Three final remarks are important for a proper understanding of Fisher's fundamental theorem: (i) selection in the one-locus system, when it follows equation (A4.19), optimises the mean fitness of the population, (ii) the outcome of the process need not be unique since the mean fitness Φ may have several local optima on the unit simplex, and (iii) optimisation behaviour that is susceptible to rigorous proof is restricted to the one-locus model, since systems with two or more gene loci may show different behaviours of Φ(t).
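The dependence on initial conditions is easily demonstrated numerically. The sketch below uses a hypothetical symmetric 2×2 matrix A for which both corners of the simplex are stable and the interior equilibrium is unstable:

```python
# Fisher's selection equation for n = 2 alleles, hypothetical symmetric A:
# x_i' = x_i (abar_i - Phi),  abar_i = sum_j a_ij x_j,  Phi = sum_i abar_i x_i.
A = [[2.0, 1.0],
     [1.0, 2.0]]

def run(x1_init, t_end=50.0, dt=0.001):
    x = [x1_init, 1.0 - x1_init]
    for _ in range(int(t_end / dt)):
        abar = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]
        phi = sum(ai * xi for ai, xi in zip(abar, x))
        x = [xi + xi * (ai - phi) * dt for xi, ai in zip(x, abar)]
    return x

# The interior equilibrium x = (1/2, 1/2) is unstable: starting above it the
# first allele is fixed, starting below it the second one wins.
print(round(run(0.6)[0], 3), round(run(0.4)[0], 3))
```

Φ approaches a local optimum in both runs, but the selected allele differs.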
6. Mutation
The introduction of mutation into the selection equation (A4.7) based on knowledge from molecular biology is due to Manfred Eigen [9]. It leads to
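The replication-mutation equation (A4.21), elided here, is reconstructed from the definitions that follow (W = Q·F below):

```latex
% Reconstructed replication-mutation equation (A4.21)
\frac{dx_i}{dt} = \sum_{j=1}^{n} Q_{ij}\, f_j\, x_j - x_i\, \Phi(t), \qquad
\Phi(t) = \sum_{j=1}^{n} f_j x_j, \qquad i = 1, 2, \ldots, n .
```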
Mutations and error‐free replication are understood as parallel reaction channels, and the corresponding reaction probabilities are contained in the mutation matrix
where Q_ij expresses the frequency of a mutation I_j → I_i. Since the elements of Q are defined as reaction probabilities and a replication event yields either a correct copy or a mutant, the columns of Q sum up to one, $\sum_{i=1}^{n} Q_{ij} = 1$, and hence Q is a stochastic matrix. If one assumes equal probabilities for I_j → I_i and I_i → I_j, as is done for example in the uniform error rate model (see equation (A4.29) and [10,18]), Q is symmetric and hence a bistochastic matrix.^6 The mean fitness or flux Φ is described by the same expression as in the selection-only case (A4.7). This implies that the system converges to the unit simplex, as it did without mutations. For initial values of the variables chosen on the simplex, $\sum_{i=1}^{n} x_i(0) = 1$, it remains there.
In the replication-mutation system the boundary of the unit simplex, $S_n^{(1)}$, is not invariant. Although no orbit starting on the simplex will leave it, which is a conditio sine qua non for chemical reactions requiring non-negative concentrations, trajectories flow from outside the positive orthant into $S_n^{(1)}$. In other words, the condition x_i(0) = 0 does not lead to x_i(t) = 0 ∀ t > 0. The chemical interpretation is straightforward: if a variant I_i is not present initially, it can, and depending on Q commonly will, be formed through a mutation event.
6.1 Exact solution
Before discussing the role of the flux Φ in the selection-mutation system, we shall derive exact solutions of equation (A4.21) following a procedure suggested in [19,20]. At first the variables x_i(t) are transformed as in the selection-only case (A4.10):
From $\sum_{i=1}^{n} x_i(t) = 1$ follows straightforwardly, again as in the selection-only case, the equation
What remains to be solved is a linear first-order differential equation,
which is readily done by means of standard linear algebra. We define a matrix W = {W_ij = Q_ij f_j} = Q·F, where F = {F_ij = f_i δ_ij} is a diagonal matrix, and obtain the differential equation in matrix form, ż = W·z. Provided the matrix W is diagonalisable, which will always be the case when the mutation matrix Q is based on real chemical reaction mechanisms, the variables z can be transformed linearly by means of an invertible n × n matrix L = {ℓ_ij; i,j = 1, …, n} with L^{−1} = {h_ij; i,j = 1, …, n} being its inverse,
such that L^{−1}·W·L = Λ is diagonal. The elements of Λ, λ_k, are the eigenvalues of the matrix W. The right-hand eigenvectors of W are given by the columns of L, ℓ_j = (ℓ_ij; i = 1, …, n), and the left-hand eigenvectors by the rows of L^{−1}, h_k = (h_ki; i = 1, …, n), respectively. These eigenvectors can be addressed as the normal modes of selection-mutation kinetics. For strictly positive off-diagonal elements of W, implying the same for Q, which says nothing more than that every mutation I_i → I_j is possible, although the probability might be extremely small, the Perron-Frobenius theorem holds (see, for example, [21] and the next paragraph) and we are dealing with a non-degenerate largest eigenvalue λ_0,
and a corresponding dominant eigenvector ℓ_0 with strictly positive components, ℓ_{i0} > 0 ∀ i = 1, …, n.^7 In terms of components the differential equation in ξ has the solutions $\xi_k(t) = \xi_k(0)\, e^{\lambda_k t}$.
Transformation back into the variables z yields
with the initial conditions encapsulated in the equation
From here we obtain eventually the solutions in the original variables x_i in analogy to equation (A4.11):
$$x_i(t) = \frac{\sum_{k=0}^{n-1} \ell_{ik}\, \xi_k(0)\, e^{\lambda_k t}}{\sum_{j=1}^{n} \sum_{k=0}^{n-1} \ell_{jk}\, \xi_k(0)\, e^{\lambda_k t}}, \qquad \xi_k(0) = \sum_{i=1}^{n} h_{ki}\, x_i(0).$$
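The role of the dominant eigenvector can be illustrated numerically. The sketch below applies power iteration to a hypothetical 3×3 value matrix W = Q·F and recovers the quasi-species as the Perron eigenvector:

```python
# Power iteration on the value matrix W = Q.F (hypothetical parameters):
# the normalised Perron eigenvector is the quasi-species l_0 and the
# corresponding eigenvalue lambda_0 the stationary mean fitness.
f = [4.0, 1.0, 1.0]              # I_1 is the master sequence
Q = [[0.90, 0.05, 0.05],         # column-stochastic mutation matrix
     [0.05, 0.90, 0.05],
     [0.05, 0.05, 0.90]]
n = len(f)
W = [[Q[i][j] * f[j] for j in range(n)] for i in range(n)]

x = [1.0 / n] * n
lam = 0.0
for _ in range(1000):
    y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    lam = sum(y)                 # growth factor of the total concentration
    x = [yi / lam for yi in y]   # renormalise onto the unit simplex

# l_0 is strictly positive, dominated by the master sequence, and
# lambda_0 lies below the master fitness f_1 = 4.
print(round(lam, 3), [round(xi, 3) for xi in x])
```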
6.2 Perron‐Frobenius theorem
The Perron-Frobenius theorem comes in two versions [21], which we shall now consider and apply to the selection-mutation problem. The stronger version provides a proof for six properties of the largest eigenvalue of non-negative primitive matrices^8 T:

(i) The largest eigenvalue is real and positive, λ_0 > 0,

(ii) a strictly positive right eigenvector ℓ_{0} and a strictly positive left eigenvector h _{0} are associated with λ_{0},

(iii) λ_0 > |λ_k| holds for all eigenvalues λ_k ≠ λ_0,

(iv) the eigenvectors associated with λ_{0} are unique up to constant factors,

(v) if 0 ≤ B ≤ T is fulfilled and β is an eigenvalue of B, then |β| ≤ λ_0, and, moreover, |β| = λ_0 implies B = T,

(vi) λ _{0} is a simple root of the characteristic equation of T.
The weaker version of the theorem holds for irreducible matrices^9 T. All assertions given above hold, except that (iii) has to be replaced by the weaker statement
(iii′) λ_0 ≥ |λ_k| holds for all eigenvalues λ_k ≠ λ_0.
Irreducible cyclic matrices can be used straightforwardly as examples to demonstrate the existence of complex conjugate eigenvalues (an example is discussed below). The Perron-Frobenius theorem, in its strict or weaker form, holds not only for strictly positive matrices T > 0 but also for large classes of mutation or value matrices (W ≡ T being a primitive or irreducible non-negative matrix) with off-diagonal zero entries corresponding to zero mutation rates. The occurrence of a non-zero element $t_{ij}^{(m)}$ in T^m implies the existence of a mutation path I_j → I_k → … → I_ℓ → I_i with non-zero mutation frequencies for every individual step. This condition is almost always fulfilled in real systems.
6.3 Complex eigenvalues
In order to address the existence of complex eigenvalues of the value matrix W we start by considering the straightforward case of a symmetric mutation matrix Q. Replication rate parameters f_i are subsumed in a diagonal matrix, F = {f_i δ_ij; i,j = 1, …, n}, the value matrix is obtained as the product W = Q·F, and, in general, W is not symmetric. A similarity transformation,
$$W' = F^{1/2}\cdot W\cdot F^{-1/2} = F^{1/2}\cdot Q\cdot F^{1/2},$$
yields a symmetric matrix [22], since $F^{1/2}\cdot Q\cdot F^{1/2}$ is symmetric if Q is. Symmetric matrices have real eigenvalues, and since a similarity transformation does not change the eigenvalues, W has only real eigenvalues if Q is symmetric.
The simplest way to obtain complex eigenvalues is to introduce cyclic symmetry into the matrix Q in such a way that the symmetry with respect to the main diagonal is destroyed. An example is the matrix
with different entries Q_ij. For equal replication parameters the eigenvalues contain the complex n-th roots of unity, $\gamma_k^n = 1$ or γ_k = exp(2πik/n), k = 0, 1, …, n−1, and for n ≥ 3 most eigenvalues come in complex conjugate pairs. As mentioned earlier, symmetry in mutation frequencies is commonly not fulfilled in nature. For point mutations the replacement of one particular base by another one does not usually occur with the same frequency as the inverse replacement, G→A versus A→G, for example. Needless to say, cyclic symmetry in mutation matrices is also highly improbable in real systems. The validity of the Perron-Frobenius theorem, however, is not affected by the occurrence of complex conjugate pairs of eigenvalues. In addition, it is unimportant for most purposes whether a replication-mutation system approaches the stationary state monotonously or through damped oscillations (see the next paragraph).
6.4 Mutation and optimisation
In order to consider the optimisation problem in the selection‐mutation case, we choose the eigenvectors of W as the basis of a new co‐ordinate system (Figure A4.4):
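The transformed equation, elided here, takes replicator form in the eigenvector co-ordinates ξ_k (a reconstruction consistent with the statement that follows):

```latex
% Reconstructed replication-mutation equation in eigenvector co-ordinates
\frac{d\xi_k}{dt} = \xi_k \bigl( \lambda_k - \Phi(t) \bigr), \qquad
\Phi(t) = \sum_{k=0}^{n-1} \lambda_k \xi_k, \qquad k = 0, 1, \ldots, n-1,
```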
which is identical in form to equation (A4.7), and hence the solutions are the same,
as well as the maximum principle on the simplex
The difference between selection and selection-mutation comes from the fact that the simplex $\tilde S_n$ does not coincide with the physically defined space $S_n$ (see Figure A4.4 for a low-dimensional example). Indeed, only the dominant eigenvector ℓ_0 lies in the interior of $S_n^{(1)}$: it represents the stable stationary distribution of genotypes or quasi-species [10] towards which the solutions of the differential equation (A4.21) converge. All other n−1 eigenvectors, ℓ_1, …, ℓ_{n−1}, lie outside $S_n^{(1)}$ in the non-physical range where one or more variables x_i are negative. The quasi-species ℓ_0 is commonly dominated by a single genotype, called the master sequence I_m, having the largest stationary relative concentration, $\bar x_m \gg \bar x_i\ \forall\, i \ne m$, reflecting, for not too large mutation rates, the same ranking as the diagonal elements of the matrix W, $W_{mm} \gg W_{ii}\ \forall\, i \ne m$. As sketched in Figure A4.4 the quasi-species is then situated close to the unit vector e_m in the interior of $S_n^{(1)}$.
For the discussion of the optimisation behaviour the simplex is partitioned into three zones: (i) the zone of maximisation of Φ(t), the (large) lower white area in Figure A4.4, where equation (A4.9a) holds and which we shall denote as the optimisation cone; (ii) the zone that includes the unit vector of the master sequence, e_m, and the quasi-species, ℓ_0, as corners, and that we shall characterise as the master cone;^10 and (iii) the remaining part of the simplex $S_n^{(1)}$ (zones (iii) are coloured grey in Figure A4.4). It is straightforward to prove that increase of Φ(t) and monotonous convergence towards the quasi-species are restricted to the optimisation cone [23]. From the properties of the selection equation (A4.7) we recall and conclude that the boundaries of the simplex $\tilde S_n$ are invariant sets. This implies that no orbit of the differential equation (A4.21) can cross these boundaries. The boundaries of $S_n^{(1)}$, on the other hand, are not invariant, but they can be crossed exclusively in one direction: from outside to inside.^11 Therefore, a solution curve starting in the optimisation cone or in the master cone will stay inside the cone where it started and eventually converge towards the quasi-species, ℓ_0.
In zone (ii), the master cone, all variables ξ_k except ξ_0 are negative and ξ_0 is larger than one in order to fulfil the $L^{(1)}$ norm condition $\sum_{k=0}^{n-1} \xi_k = 1$. In order to analyse the behaviour of Φ(t) we split the variables into two groups, ξ_0, the frequency of the quasi-species, and the rest [23], {ξ_k; k = 1, …, n−1} with $\sum_{k=1}^{n-1} \xi_k = 1 - \xi_0$:
Next we replace the distribution of λ_k values in the second group by a single λ-value, $\tilde\lambda$, and find:
After a few simple algebraic operations we find eventually
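Carrying out these operations with $\Phi = \lambda_0 \xi_0 + \tilde\lambda\,(1 - \xi_0)$ gives the elided result (A4.28) (a reconstruction):

```latex
% Reconstructed equation (A4.28)
\frac{d\Phi}{dt} = \bigl( \lambda_0 - \tilde\lambda \bigr)^{2}\,
\xi_0 \bigl( 1 - \xi_0 \bigr).
```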
For the master cone with ξ_0 ≥ 1 this implies dΦ(t)/dt ≤ 0, i.e. the flux is a non-increasing function of time. Since we are only interested in the sign of dΦ/dt the result is exact, because we could use the mean value $\tilde\lambda = \bar\lambda = \bigl(\sum_{k=1}^{n-1} \lambda_k \xi_k\bigr)/(1 - \xi_0)$, the largest possible value λ_1, or the smallest possible value λ_{n−1} without changing the conclusion. Clearly, the distribution of λ_k-values matters for quantitative results. It is worth mentioning that equation (A4.28) applies also to the optimisation cone and there gives the result that Φ(t) is non-decreasing. Decrease of the mean fitness or flux Φ(t) in the master cone is readily illustrated. Consider, for example, a homogeneous population of the master sequence as the initial condition: x_m(0) = 1 and Φ(0) = f_m. The population becomes inhomogeneous because mutants are formed. Since all mutants have lower replication constants by definition, f_i < f_m ∀ i ≠ m, Φ becomes smaller. Finally, the distribution approaches the quasi-species ℓ_0 and lim_{t→∞} Φ(t) = λ_0 < f_m.
An extension of the analysis from the master cone to zone (iii), where not all ξ_{k} values with k≠0 are negative, is not possible. It has been shown by means of numerical examples that dՓ(t)/dt may show non‐monotonous behavior and can go through a maximum or a minimum at finite time [23].
6.5 Mutation rates and error threshold
In order to illustrate the influence of mutation rates on the selection process we apply (i) binary sequences, (ii) the uniform error rate approximation,
$$Q_{ij} = p^{d_{ij}}\,(1-p)^{\nu - d_{ij}},$$
with d_ij being the Hamming distance between the two sequences I_i and I_j, ν the chain length, and p the mutation or error rate per site and replication, and (iii) a simple model for the distribution of fitness values known as the single peak fitness landscape [24],
which represents a kind of mean-field approximation. The mutants of the master sequence I_1 are ordered in mutant classes: the zero-error class contains only the reference sequence (I_1), the one-error class comprises all single-point mutations, the two-error class all double-point mutations, etc. Since the error rate p is independent of the particular sequence and all molecules belonging to the same mutant class have identical fitness values f_k, it is possible to introduce new variables for entire mutant classes Γ_k: $y_k = \sum_{i \in \Gamma_k} x_i$, k = 0, 1, …, ν.
The mutation matrix Q has to be adjusted to transitions between classes [24,25]. For mutations from class Γ_l into class Γ_k we calculate:
The mutation matrix Q for error classes is not symmetric, Q_kl ≠ Q_lk, as follows from equation (A4.31).
A typical plot of relative concentrations against error rate is shown in Figure A4.5. At vanishing error rates, lim_{p→0}, the master sequence is selected, $\lim_{t\to\infty} y_0(t) = \bar y_0 = 1$, and all other error classes vanish in the long-time limit. Increasing error rates are reflected by a decrease in the stationary relative concentration of the master sequence and a corresponding increase in the concentrations of all mutant classes. Apart from $\bar y_0(p)$, all concentrations $\bar y_k(p)$ with k < ν/2 go through a maximum and approach pairwise the curves for $\bar y_{\nu-k}$ at values of p that increase with k. At p = 0.5 the eigenvalue problem can be solved exactly: the largest eigenvalue is strictly positive, λ_0 > 0, and it corresponds to an eigenvector ℓ_0 which is the uniform distribution in relative stationary concentrations, $\bar x_1 = \bar x_2 = \ldots = \bar x_n = 1/n$.
(p.690) The mutant distribution ȳ(p) comes close to the uniform distribution already around p ≈ 0.035 in Figure A4.5 and stays practically constant for the remaining values (0.035 < p < 0.5). The narrow transition from ordered replication (0 < p < 0.035) to random replication (p > 0.035) is called the error threshold. An approximation based on neglecting mutational back‐flow and using ln(1−p) ≈ −p yields a simple expression for the position of the threshold [9]:
The equation defines a maximal error rate p _{max} above which no ordered (non‐uniform) stationary distributions of sequences exist (see also section 7). In the current example we calculate p _{max}=0.03466, in excellent agreement with the value observed in computer simulations. RNA viruses commonly have mutation rates close to the error threshold [26]. Error rates can be increased by pharmaceutical drugs interfering with virus replication, and accordingly a new antiviral strategy has been developed that drives virus replication into extinction, either by passing the error threshold [28] or through lethal mutagenesis. Recently, the mechanism of lethal mutagenesis in virus infections has been extensively discussed [29,30].
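Both the threshold formula and the transition it predicts can be sketched numerically. The values σ = 2 and ν = 20 below are assumptions chosen because ln 2/20 reproduces the quoted p _{max} = 0.03466; the stationary class distribution is obtained by power iteration on W = Q·F restricted to error classes.

```python
from math import comb, log

# assumed values: chain length nu = 20, superiority sigma = 2
nu, sigma = 20, 2.0
f = [sigma] + [1.0] * nu          # single peak: master class fitter than all mutants

p_max = log(sigma) / nu           # threshold from (1-p)^nu = 1/sigma, ln(1-p) ~ -p

def Qcl(k, l, p):
    # class-to-class mutation probability for binary sequences
    s = 0.0
    for j in range(l + 1):
        m = k - l + j
        if 0 <= m <= nu - l:
            s += comb(l, j) * comb(nu - l, m) * p**(j + m) * (1 - p)**(nu - j - m)
    return s

def stationary_classes(p, iters=1000):
    # power iteration for the dominant eigenvector of W = Q.F
    W = [[Qcl(k, l, p) * f[l] for l in range(nu + 1)] for k in range(nu + 1)]
    y = [1.0 / (nu + 1)] * (nu + 1)
    for _ in range(iters):
        y = [sum(W[k][l] * y[l] for l in range(nu + 1)) for k in range(nu + 1)]
        tot = sum(y)
        y = [v / tot for v in y]
    return y

y_below = stationary_classes(0.5 * p_max)   # ordered replication: master dominates
y_half  = stationary_classes(0.5)           # random replication
```

At p = 0.5 the computed class distribution equals the binomial profile C(ν,k)/2^ν, i.e. the uniform distribution over sequences.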
Several model landscapes describing fitness by a monotonically decreasing function of the Hamming distance from the master sequence, f(d), are often applied in population genetics; examples are:
Interestingly, none of the three model landscapes sustains a sharp error threshold as observed on the single peak landscape. On the hyperbolic landscape the transition is less sharp than on the single peak landscape and may be called a weak error threshold. The linear and quadratic landscapes show rather gradual and smooth transitions from the quasi‐species towards the uniform mutant distribution (Figure A4.6). Despite the popularity of smooth landscapes in population genetics, they are not supported by knowledge derived from biopolymer structures and functions. On the contrary, the available data provide strong evidence that natural landscapes are rugged and that properties do not change gradually with Hamming distance.
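For concreteness, three monotonically decreasing model landscapes can be sketched as follows; the functional forms are plausible illustrative parameterisations, normalised to f(0) = f_0 and f(ν) = 1, and not the exact functions used in Figure A4.6.

```python
# assumed illustrative values: chain length nu = 20, peak fitness f0 = 2
nu, f0 = 20, 2.0

def f_linear(d):
    # fitness falls off linearly with Hamming distance to the master
    return f0 - (f0 - 1.0) * d / nu

def f_quadratic(d):
    # quadratic fall-off: flat near the master, steep far away
    return f0 - (f0 - 1.0) * (d / nu) ** 2

def f_hyperbolic(d):
    # hyperbolic fall-off: steep near the master, flat far away
    return 1.0 + (f0 - 1.0) * (nu - d) / (nu + d * (nu - 1.0))
```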
In order to generalise the results derived from model landscapes to more realistic situations, random variations of the rate constants of individual sequences were superimposed upon the fitness values of a single peak landscape, whereby the mean value (p.691) f̄_{m} was kept constant [31, pp. 29–60]. The curves for individual sequences within an error class then differ from each other and form a band whose width increases with the amplitude of the random component. Interestingly, the error threshold phenomenon is retained, although the critical value p _{max} is shifted to lower error rates. Another, very general approach, which introduces variation into the value matrix without accounting for the underlying chemical reaction mechanism, was taken by Walter Thirring and coworkers [32]. They also found a sharp transition between ordered and disordered domains.
6.6 Population entropies
Population entropies are suitable measures for the width of mutant distributions. For steady states they are readily computed from the dominant eigenvector of the matrix W:
where the expression on the right‐hand side refers to mutant classes. The pure state at p = 0 has zero entropy, S(0) = 0. For the uniform distribution the entropy is maximal, and for binary sequences we have
Between these two extremes, 0 ≤ p ≤ 0.5, the entropy is a monotonically increasing function of the error rate p. Figure A4.7 shows the entropy S(p) on the four model landscapes applied in Figures A4.5 and A4.6. The curves reflect the threshold behaviour encountered in the previous paragraphs: the entropy on the single peak landscape shows a sharp kink at the position of the error threshold, the curve for the entropy on the hyperbolic landscape has a similar bend at the threshold but the transition is smoother, whereas the entropies on the two other landscapes are curved differently and approach the maximum value, S_{max}=ν ln 2, smoothly.
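The class-based entropy expression can be sketched directly; the evaluation assumes that all C(ν,k) sequences of a class are equally frequent, so the sequence entropy follows from the class variables y_k alone (ν = 20 is an assumed value).

```python
from math import comb, log

def population_entropy(y, nu):
    """Sequence entropy S = -sum_i x_i ln x_i evaluated from class
    variables y_k, assuming the C(nu,k) sequences of class k are
    equally frequent: x_i = y_k / C(nu,k)."""
    s = 0.0
    for k, yk in enumerate(y):
        if yk > 0.0:
            s -= yk * log(yk / comb(nu, k))
    return s

nu = 20
pure = [1.0] + [0.0] * nu                               # p = 0: master only
uniform = [comb(nu, k) / 2**nu for k in range(nu + 1)]  # p = 0.5: uniform over sequences
```

The pure state gives S = 0 and the uniform distribution gives the maximum S_max = ν ln 2, as stated in the text.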
6.7 Lethal mutants
It is important to note that a quasi‐species can also exist in cases where the conditions of the Perron–Frobenius theorem are not fulfilled. As an example we consider an extreme case of lethal mutants: only genotype I _{1} has a positive fitness value, f_1 > 0 and f_2 = … = f_n = 0, and hence only the entries W _{k1}=Q _{k1}f_1 of the matrix W are non‐zero:
Accordingly, W is not primitive in this example, but under suitable conditions x̄ = (Q_{11}, Q_{21},…,Q_{n1}) is a stable stationary mutant distribution, and for Q_{11} > Q_{j1} ∀ j=2,…,n (correct replication occurs more frequently than any particular (p.693) mutation) genotype I _{1} is the master sequence. On the basis of a rather idiosyncratic mutation model consisting of a one‐dimensional chain of sequences, it was claimed that no quasi‐species can be stable in the presence of lethal mutants and that, accordingly, no error thresholds could exist [33]. Recent papers [30,34], however, used a realistic high‐dimensional mutation model and presented analytical results as well as numerically computed examples of error thresholds in the presence of lethal mutations.
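That x̄ = (Q_{11}, Q_{21},…,Q_{n1}) is stationary follows by direct computation: since only the first column of W is non-zero, W x̄ = f_1 Q_{11} x̄. A minimal sketch with a hypothetical mutation column q1 (the numbers are made up for illustration):

```python
# hedged sketch: with only the first column of W non-zero,
# x = (Q_11, Q_21, ..., Q_n1) is an eigenvector of W with
# eigenvalue f_1 * Q_11
f1 = 2.0
q1 = [0.7, 0.1, 0.08, 0.06, 0.04, 0.02]   # Q_k1: sums to one, Q_11 largest

def apply_W(x):
    # W_k1 = Q_k1 * f1 and W_kj = 0 for j > 1
    return [qk1 * f1 * x[0] for qk1 in q1]

x = q1[:]                # candidate stationary mutant distribution
Wx = apply_W(x)
lam = f1 * q1[0]         # predicted eigenvalue f_1 * Q_11
```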
In order to handle the case of lethal mutants properly we have to go back to absolute concentrations in a realistic physical setup, the flowreactor applied in section 1 and shown in Figure A4.1. We neglect degradation and find for I _{1} as the only viable genotype:^{12}
Computation of the stationary states is straightforward and yields two solutions: (i) the state of extinction with ā = a_0 and c̄_i = 0 for all i, and (ii) a state of quasi‐species selection consisting of I _{1} and its mutant cloud at the concentrations ā = r/(Q_{11} k_1), c̄_1 = Q_{11} a_0 − r/k_1, and c̄_i = c̄_1 (Q_{i1}/Q_{11}) for i=2,…,n.
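Both stationary states can be verified by inserting them into the kinetic equations of the flowreactor without degradation; the parameter values and the mutation column below are assumptions for illustration.

```python
# hedged sketch: stationary states of the lethal-mutant flowreactor
# (assumed parameter values; Q holds Q_11, Q_21, ..., summing to one)
a0, r, k1 = 2.0, 0.5, 1.0
Q = [0.8, 0.12, 0.05, 0.03]
n = len(Q)

def rates(a, c):
    # da/dt = -k1*a*c1 + r*(a0 - a);  dci/dt = Q_i1*k1*a*c1 - r*ci
    grow = k1 * a * c[0]
    da = -grow + r * (a0 - a)
    dc = [Q[i] * grow - r * c[i] for i in range(n)]
    return da, dc

# state (ii): quasi-species selection
a_bar = r / (Q[0] * k1)
c1_bar = Q[0] * a0 - r / k1
c_bar = [c1_bar * Q[i] / Q[0] for i in range(n)]

da, dc = rates(a_bar, c_bar)                 # should vanish
da_ext, dc_ext = rates(a0, [0.0] * n)        # state (i): extinction
```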
As an example we compute a maximum error rate for constant flow, r=r_{0}, again applying the uniform error rate model (A4.29):
where d_{i1} again is the Hamming distance between the two sequences I _{i} and I _{1}. Instead of the superiority σ of the master sequence, which diverges since f̄_{−m} = 0 because of f_2=…=f_n=0, we use the dimensionless carrying capacity η, which can be defined to be
for the flowreactor. The value of p at which the stationary concentration of the master sequence, c̄_1(p), and those of all other mutants vanish represents the analogue of the error threshold (A4.32), and for the sake of clarity it is called the extinction threshold. Using ln(1−p) ≈ −p again we obtain:
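The extinction threshold can be sketched numerically: c̄_1(p) = Q_{11} a_0 − r/k_1 with Q_{11} = (1−p)^ν vanishes where (1−p)^ν = 1/η, giving p ≈ ln η/ν under the ln(1−p) ≈ −p approximation. The reactor parameters and chain length below are assumed illustrative values.

```python
from math import log

# assumed illustrative values
nu = 20
a0, r0, k1 = 2.0, 0.5, 1.0
eta = k1 * a0 / r0                # dimensionless carrying capacity

def c1_bar(p):
    """Stationary master concentration in the selection state:
    c1 = Q11*a0 - r0/k1 with Q11 = (1 - p)^nu."""
    return (1.0 - p) ** nu * a0 - r0 / k1

p_ext = log(eta) / nu             # extinction threshold, ln(1-p) ~ -p
```

Below p_ext the master concentration is positive; at the threshold the whole quasi‐species vanishes with it, since c̄_i ∝ c̄_1.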
7. Limitations of the approach
An implicit assumption of the mathematical analysis of Darwinian selection presented here is the applicability of kinetic differential equations to describe selection and mutation in populations. The approach by ordinary differential (p.695) equations (ODEs) neglects finite size effects and hence is exact for infinite population size only. Biological populations, however, may be small, and low‐frequency mutants are often present in a single copy or in very few copies only. The uniform distribution at error rates above the threshold can never be achieved in reality, because the number of possible polynucleotide sequences, 4^{ν} (for example, about 6 × 10^{45} sequences of tRNA length), is huge compared to typical populations ranging from 10^{6} to 10^{15} individuals in replication experiments with bacteria, viruses or RNA molecules. Typical situations in biology may thus differ drastically from scenarios in chemistry, where large populations are distributed over a few chemical species. Are the results derived from the differential equations then representative of real systems? Two situations can be distinguished: (i) individual mutations are rare events and it is extremely unlikely that the same mutation will occur twice or will be precisely reversed after it has occurred, and (ii) mutations are sufficiently frequent and occur in both directions within the time of observation. The second scenario is typical of virus evolution and in vitro evolution experiments with molecules. The first case seems to apply to higher organisms. Bacteria may be in an intermediate situation.
In scenario (i), i.e. at low mutation rates, the exact repetition of a given mutation has very low probability. Back‐mutations, precise reversals of mutations, are also of probability zero for all practical purposes. So‐called compensatory mutations are known, but they are not back‐mutations; rather, they are second mutations that compensate the effect of the first mutation. A phenomenon called Muller's ratchet [35] in population genetics becomes effective in finite populations. Since lost mutants are not replaced, all variants, beginning with the fittest one, will disappear sooner or later, and it is only a matter of time before a situation is reached where all genotypes have been replaced by others, no matter what their fitness values were. For a comparison between the error threshold phenomenon and Muller's ratchet see [33].
The frequent mutation scenario (ii) allows for modelling and studying the kinetic equations of reproduction and selection as stochastic processes [25, 36, 37, 38] (examples are multitype branching and birth‐and‐death processes) as well as for computer simulations [39] (for an overview of stochastic modelling see, for example, [40]). In essence, the solutions of stochastic models are time‐dependent probability distributions instead of solution curves. The mean or expectation value of the distribution coincides with the deterministic (ODE) solution, since all reactions in the kinetic model are (pseudo) first order. The (relative) width of the distribution increases with growing mutation rate and decreasing population size, and the error threshold phenomenon is reproduced as a superposition of error propagation and finite size effects. The expression for the error threshold can be readily extended to finite populations [25]. Formation of stable quasi‐species requires a replication fidelity that is higher the smaller the population size.
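A stochastic treatment of this kind can be sketched with a Gillespie simulation of replication and outflow in the flowreactor; the two-class model below (a master genotype M with a non-replicating error tail E) and all rate values are simplifying assumptions for illustration.

```python
import random

# hedged sketch: Gillespie simulation of replication with mutation
# in the flowreactor; all parameter values are assumptions
random.seed(42)
k1, Q11, r, a = 1.0, 0.95, 0.1, 1.0   # replication rate, fidelity, outflow, [A]
M, E, t = 20, 0, 0.0                  # master copies, mutant copies, time

for _ in range(5000):
    rates = [k1 * a * M * Q11,        # correct replication: M -> 2 M
             k1 * a * M * (1 - Q11),  # mutation: M -> M + E
             r * M,                   # outflow of M
             r * E]                   # outflow of E
    total = sum(rates)
    if total == 0.0:
        break                         # extinction: nothing can react
    t += random.expovariate(total)    # exponential waiting time
    u, acc, event = random.random() * total, 0.0, 0
    for i, rate in enumerate(rates):
        acc += rate
        if u < acc:
            event = i
            break
    if event == 0:
        M += 1
    elif event == 1:
        E += 1
    elif event == 2:
        M -= 1
    else:
        E -= 1
```

Repeating such runs yields the time-dependent probability distributions discussed above; particle numbers stay discrete and non-negative throughout.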
(p.696) At error rates above threshold the kinetic ODEs predict the uniform distribution of sequences as the stationary solution of equation (A4.21).^{13} Differences in fitness values do not matter under these conditions and there is no preferred master sequence. Realistic populations are far too small to form uniform distributions of sequences, and hence the deterministic model fails. Below threshold the quasi‐species can be visualised as a localisation of the population in some preferred region of sequence space with high fitness values (or at least one particularly high fitness value) [18, 41]. Above threshold the population is no longer localised and drifts randomly in sequence space.^{14} At the same time, populations are also too small to occupy a coherent region in sequence space and break up into smaller clones, which migrate in different directions as described for the neutral evolution case [3,44].
How relevant is the error threshold in realistic situations? According to the results presented in section 6.5 the question boils down to an exploration of natural fitness landscapes: are biopolymer landscapes rugged or smooth? All evidence obtained so far points towards a rather bizarre structure of these landscapes. Single nucleotide exchanges may have large effects, small effects or no consequences at all, as in the case of neutral mutations. Since biomolecules are usually optimised with respect to their functions within an organism, most mutations have deleterious effects or no effect. Biopolymer landscapes have three characteristic features, which are hard to visualise: (i) high dimensionality, (ii) ruggedness and (iii) neutrality. Where equally fit genotypes are nearest or next‐nearest neighbours in sequence space they form joint quasi‐species as described in [23]. When they are not closely related, however, neutral evolution in the sense of Motoo Kimura is observed [45]. In the case of neutrality in genotype space a selection model can still be formulated in phenotype space [46, 47]. The variables are concentrations of phenotypes, obtained by lumping together the concentrations of all genotypes that form the same phenotype. Then an analysis similar to the one presented here can be carried out. The genotypic error threshold is relaxed, and the system gives rise to a phenotypic error threshold below which the fittest or master phenotype is conserved in the population. The ODE model is readily supplemented by a theory of phenotype evolution based on a new concept of evolutionary nearness of phenotypes in sequence space [4, 48, 49], which is confirmed by computer simulations of RNA structure optimisation in a flowreactor of the type shown in Figure A4.1 [4, 48, 50]. Random drift of populations on neutral subspaces of sequence space has also been dealt with [50].
A series of snapshots shows the spreading of a population that breaks up into individual clones in (p.697) full agreement with earlier models [44, 51]. Computer simulations were also successful in providing evidence for the occurrence of error thresholds in stochastic replication‐mutation systems [52].
Notes:
(^{1}) An overview of the notation used in this article is found on the last page of this Appendix.
(^{2}) We do not consider degenerate or neutral cases, d_{i}=d_{j} and k_{i}=k_{j}, here (see also section A4.7).
(^{3}) Care is needed for the application of relative co‐ordinates, because the total concentration c(t) might vanish and then relative co‐ordinates become spurious quantities (see subsection 6.7).
(^{4}) Generalisation to arbitrary but finite population sizes c ≠ 1 is straightforward: for Σ_{i=1}^{n} x_i(0) = c_0 the equation ẋ_i = f_i x_i − (x_i/c_0) Σ_{j=1}^{n} f_j x_j, i = 1, 2, …, n, plays the same role as equation (A4.7) did for Σ_{i=1}^{n} x_i(0) = 1.
(^{5}) The assumption for Fisher's equation is based on insensitivity of phenotypes to the origin of the parental alleles on chromosomes. Phenotypes derived from genotype a _{i} a _{j} are assumed to develop the same properties no matter which of the two alleles, a _{i} or a _{j}, at the chromosomal locus comes from the mother and which comes from the father. New results on genetic diseases have shown, however, that this assumption can be questioned.
(^{6}) Symmetry in the direction of mutations is commonly not fulfilled in nature. It is introduced as a simplification, which facilitates the construction of computer models for equation (A4.21). Moreover, the assumption of a symmetric mutation matrix Q is not essential for the analytic derivation of solutions.
(^{7}) We introduce here an asymmetry in numbering rows and columns in order to point at the special properties of the largest eigenvalue λ_{0} and the dominant eigenvector l _{0}.
(^{8}) A square non‐negative matrix T = {t_{ij}; i,j = 1, …, n; t_{ij} ≥ 0} is called primitive if there exists a positive integer m such that T^{m} is strictly positive, T^{m} > 0, i.e. T^{m} = {t_{ij}^{(m)}; i,j = 1, …, n; t_{ij}^{(m)} > 0}.
(^{9}) A square non‐negative matrix T = {t_{ij}; i,j = 1, …, n; t_{ij} ≥ 0} is called irreducible if for every pair (i,j) of its index set there exists a positive integer m_{ij} ≡ m(i,j) such that t_{ij}^{(m_{ij})} > 0. An irreducible matrix is called cyclic with period d if the period of (all) its indices satisfies d > 1, and it is said to be acyclic if d = 1.
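The primitivity definition can be turned into a direct check using Wielandt's bound: a non-negative n×n matrix is primitive exactly when T^{(n−1)²+1} is strictly positive. A sketch working on the zero/non-zero pattern only:

```python
def is_primitive(T):
    """Check primitivity of a non-negative square matrix via
    Wielandt's bound: T is primitive iff T^((n-1)^2 + 1) > 0."""
    n = len(T)
    # boolean pattern of T: 1 where an entry is positive
    P = [[1 if T[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    A = [row[:] for row in P]                  # A = P^1
    for _ in range((n - 1) ** 2):              # raise to power (n-1)^2 + 1
        A = [[1 if any(A[i][k] and P[k][j] for k in range(n)) else 0
              for j in range(n)] for i in range(n)]
    return all(all(row) for row in A)

cyclic_T    = [[0, 1], [1, 0]]   # irreducible but cyclic (period 2): not primitive
primitive_T = [[1, 1], [1, 0]]   # primitive: its square is strictly positive
```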
(^{10}) The exact geometry of the optimisation cone or the master cone is a polyhedron that is better approximated by a pyramid than by a cone. Nevertheless we prefer the inexact notion of a cone because it is easier to memorise and to imagine in high‐dimensional space.
(^{11}) This is shown easily by analysing the differential equation, but follows also from the physical background: no acceptable process can lead to negative particle numbers or concentrations. It can, however, start at zero concentrations and this means the orbit begins at the boundary and goes into the interior of the physical concentration space, here the simplex ${\text{S}}_{n}^{\left(1\right)}$.
(^{12}) We use k_{i} for the rate constants as in section 1, since a(t) is a variable here.
(^{13}) As mentioned before, the uniform distribution is the exact stationary solution of equation (A4.21) for equal probabilities of correct and incorrect incorporation of a nucleotide, which is the case at an error rate p = 1 − p = 0.5 for binary sequences.
(^{14}) The mutation rate can be seen as an analogue of temperature in spin systems, and the error threshold corresponds to a phase transition. The relation between the selection‐mutation equation and spin systems was first studied by Ira Leuthäusser [42,43].