The Nature of Computation

Cristopher Moore and Stephan Mertens

Print publication date: 2011

Print ISBN-13: 9780199233212

Published to Oxford Scholarship Online: December 2013

DOI: 10.1093/acprof:oso/9780199233212.001.0001


Chapter 2: The Basics
Source: The Nature of Computation
Author(s): Cristopher Moore and Stephan Mertens
Publisher: Oxford University Press
DOI: 10.1093/acprof:oso/9780199233212.003.0002

Abstract and Keywords

This chapter discusses the complexity of a problem by thinking about the best possible algorithm, or computer program, that solves it. It shows that computational complexity theory is not about how to write better programs, but about understanding the underlying structure of different problems as well as asking fundamental questions about them. The chapter first explains problems and solutions by considering a Eulerian path and a Hamiltonian path. It then examines Euclid's algorithm, time and space, and the notion of scaling in physics. It also analyzes the distinction between polynomial functions of n and exponential ones, why this distinction is very important, and why it is so robust with respect to changes in the definition of computation. Finally, the chapter looks at tractability and mathematical insight into a problem's structure.

Keywords:   computational complexity theory, problems, solutions, Eulerian path, Hamiltonian path, Euclid's algorithm, time, space, scaling, polynomial functions

  An algorithm is a finite answer to an infinite number of questions.
  Stephen Kleene

As we saw in the Prologue, there seems to be some mysterious difference between Eulerian and Hamiltonian paths that makes one of them much harder to find than the other. To put this differently, Hamiltonianness seems to be a qualitatively more subtle property than Eulerianness. Why is one of these problems easy, while the other is like searching for a needle in a haystack?

If we want to understand the nature of these problems, we need to go beyond particular puzzles like the bridges of Königsberg or the edges of the dodecahedron. We need to ask how hard these problems are in general—for cities with any number of bridges, or graphs of any size—and ask how their complexity grows with the size of the city or the graph. To a computer scientist, the “complexity” of a problem is characterized by the amount of computational resources required to solve it, such as how much time or memory we need, and how these requirements grow as the problem gets larger.

In order to measure the complexity of a problem, we will think about the best possible algorithm, or computer program, that solves it. However, as we will see, computational complexity theory is not about how to write better programs, any more than physics is about building better spaceships. It is about understanding the underlying structure of different problems, and asking fundamental questions about them—for instance, whether they can be broken into smaller pieces that can be solved independently of each other.

2.1 Problems and Solutions

Let's start this chapter by saying precisely what we mean by a “problem,” and what constitutes a “solution.” If you have never thought about computational complexity before, our definitions may seem slightly counterintuitive. But as we will see, they give us a framework in which we can clearly state, and begin to answer, the questions posed in the Prologue.

2.1.1 What's the Problem?

Any particular instance of a problem, such as the Königsberg bridges, is just a finite puzzle. Once we have solved it, there is no more computation to do. On the other hand, Euler's generalization of this puzzle, EULERIAN PATH, is a worthy object of study in computational complexity theory—and we honor it as such by writing its name in elegant small capitals. We can think of this as an infinite family of problems, one for each graph G. Alternately, we can think of it as a function that takes a graph as its input, and returns the output “yes” or “no.”

To drive this point home, let's consider a somewhat comical example. How computationally complex is Chess? Well, if you mean the standard game played on an 8 × 8 board, hardly at all. There are only a finite number of possible positions, so we can write a book describing the best possible move in every situation. This book will be somewhat ungainly—it has about 10^50 pages, making it difficult to fit on the shelf—but once it is written, there is nothing left to do.

Now that we have disposed of Chess, let's consider a more interesting problem:

Now you're talking! By generalizing to boards of any size, and generalizing the rules appropriately, we have made it impossible for any finite book, no matter how large, to contain a complete solution. To solve this problem, we have to be able to solve Chess problems, not just look things up in books. Moreover, generalizing the problem in this way allows us to consider how quickly the game tree grows, and how much time it takes to explore it, as a function of the board size n.

Another important fact to note is that, when we define a problem, we need to be precise about what input we are given, and what question we are being asked. From this point of view, cities, graphs, games, and so on are neither complex nor simple—specific questions about them are.

These questions are often closely related. For instance, yes-or-no questions like whether or not a Hamiltonian cycle exists are called decision problems, while we call the problem of actually finding such a cycle a search problem or function problem. Problem 1.10 showed that if we can solve the decision version of HAMILTONIAN CYCLE, then we can also solve the search version. But there are also cases where it is easy to show that something exists, but hard to actually find it.

2.1.2 Solutions and Algorithms

Now that we've defined what we mean by a problem, what do we mean by a solution? Since there are an infinite number of possible graphs G, a solution to EULERIAN PATH can't consist of a finite list of answers we can just look up. We need a general method, or algorithm, which takes a graph as input and returns the correct answer as output.

While the notion of an algorithm can be defined precisely, for the time being we will settle for an intuitive definition: namely, a series of elementary computation steps which, if carried out, will produce the desired output. For all intents and purposes, you can think of an algorithm as a computer program written in your favorite programming language: C++, JAVA, HASKELL, or even (ugh) FORTRAN. However, in order to talk about algorithms at a high level, we will express them in “pseudocode.” This is a sort of informal programming language, which makes the flow of steps clear without worrying too much about the syntax.

As our first example, let us consider one of the oldest algorithms known. Given two integers a and b, we would like to know their greatest common divisor gcd(a, b). In particular, we would like to know if a and b are mutually prime, meaning that gcd(a, b) = 1.

We can solve this problem using Euclid's algorithm, which appears in his Elements and dates at least to 300 B.C. It relies on the following fact: d is a common divisor of a and b if and only if it is a common divisor of b and a mod b. Therefore,

gcd ( a , b ) = gcd ( b , a mod b ) .
(2.1)

This gives us an algorithm in which we repeatedly replace the pair (a, b) with the pair (b, a mod b ). Since the numbers get smaller each time we do this, after a finite number of steps the second number of the pair will be zero. At that point, the gcd is equal to the current value of the first number, since 0 is divisible by anything.

We can express this as a recursive algorithm. When called upon to solve the problem gcd(a, b), it calls itself to solve the simpler subproblem gcd(b, a mod b), until it reaches the base case b = 0 which is trivial to solve. Note that if a < b in the original input, the first application of this function will switch their order and call gcd(b, a). Here is its pseudocode:

  • Euclid(a, b )
  • begin
  • if b = 0 then return a ;
  • return Euclid(b, a mod b );
  • end

Calling this algorithm on the pair (120,33), for instance, gives

  • Euclid(120,33)
  • = Euclid(33,21)
  • = Euclid(21,12)
  • = Euclid(12,9)
  • = Euclid(9,3)
  • = Euclid(3,0)
  • = 3.
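
The same recursion is easy to try out in a real language. Here is a minimal Python sketch (ours, not the book's code; the function name is arbitrary):

def euclid(a, b):
    """Compute gcd(a, b) using the identity gcd(a, b) = gcd(b, a mod b) from (2.1)."""
    if b == 0:
        return a                # base case: 0 is divisible by anything
    return euclid(b, a % b)

print(euclid(120, 33))          # prints 3, matching the trace above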

Is this a good algorithm? Is it fast or slow? Before we discuss this question, we bring an important character to the stage.

2.1.3 Meet the Adversary

The Creator determines and conceals the aim of the game, and it is never clear whether the purpose of the Adversary is to defeat or assist him in his unfathomable project… But he is concerned, it would seem, in preventing the development of any reasoned scheme in the game.

H. G. Wells, The Undying Fire

Computer scientists live in a cruel world, in which a malicious adversary (see Figure 2.1) constructs instances that are as hard as possible, for whatever algorithm we are trying to use. You may have a lovely algorithm that works well in many cases, but beware! If there is any instance that causes it to fail, or to run for a very long time, you can rely on the adversary to find it.

The adversary is there to keep us honest, and force us to make ironclad promises about our algorithms’ performance. If we want to promise, for instance, that an algorithm always succeeds within a certain amount of time, this promise holds in every case if and only if it holds in the worst case—no matter what instance the adversary throws at us.

This is not the only way to think about a problem's hardness. As we will see in Chapter 14, for some problems we can ask how well algorithms work on average, when the instance is chosen randomly rather than by an adversary. But for the most part, computational complexity theorists think of problems as represented by their worst cases. A problem is hard if there exist hard instances—it is easy only if all its instances are easy.

So, is finding the gcd more like EULERIAN PATH or HAMILTONIAN PATH? Euclid's algorithm stops after a finite number of steps, but how long does it take? We answer this question in the next section. But first, we describe how to ask it in the most meaningful possible way.

2.2 Time, Space, and Scaling

It is convenient to have a measure of the amount of work involved in a computing process, even though it be a very crude one… We might, for instance, count the number of additions, subtractions, multiplications, divisions, recordings of numbers, and extractions of figures from tables.

Alan M. Turing, 1947

Time and space—the running time of an algorithm and the amount of memory it uses—are two basic computational resources. Others include the number of times we evaluate a complicated function, the number of bits that two people need to send each other, or the number of coins we need to flip. For each of these resources, we can ask how the amount we need scales with the size of our problem.

2.2.1 From Physics to Computer Science

The notion of scaling is becoming increasingly important in many sciences. Let's look at a classic example from physics. In 1619, the German astronomer and mathematician Johannes Kepler formulated his “harmonic law” of planetary motion, known today as Kepler's third law.

FIGURE 2.1: The adversary bringing us a really stinky instance.

FIGURE 2.2: Scaling in physics: Kepler's harmonic law.

Each planet has a “year” or orbital period T, and a distance R from the sun, defined as its semimajor axis since orbits are ellipses. Based on extensive observations, Kepler found that the ratio between T² and R³ is the same for all planets in the solar system. We can write this as
T = C R^{3/2},
(2.2)

for some constant C. If we plot T vs. R on a log-log plot as in Figure 2.2, this becomes

log T = C′ + (3/2) log R,

where C’ = log C is another constant. Thus all the planets—including Pluto, for tradition's sake—fall nicely on a straight line with slope 3/2.

What happens if we plot T vs. R for other systems, such as the four Galilean moons of Jupiter? The constant C in (2.2) depends on the central body's mass, so it varies from system to system. But the exponent 3/2, and therefore the slope on the log-log plot, remains the same. The scaling relationship (2.2) holds for any planetary system in the universe. It represents a fundamental property of celestial dynamics, which turns out to be Isaac Newton's celebrated law of gravitation. The constants don't matter—what matters is the way that T scales as a function of R.

In the same way, when we ask how the running time of an algorithm scales with the problem size n, we are not interested in whether your computer is twice as fast as mine. This changes the constant in front of the running time, but what we are interested in is how your running time, or mine, changes when n changes.

What do we mean by the “size” n of a problem? For problems like EULERIAN PATH, we can think of n as the number of vertices or edges in the input graph—for instance, the number of bridges or riverbanks in the city. For problems that involve integers, such as computing the greatest common divisor, n is the number of bits or digits it takes to express these integers. In general, the size of a problem instance is the amount of information I need to give you—say, the length of the email I would need to send—in order to describe it to you.

2.2.2 Euclidean Scaling

Let's look at a concrete example. How does the running time of Euclid's algorithm scale with n? Here n is the total number of digits of the inputs a and b. Ignoring the factor of 2, let's assume that both a and b are n-digit numbers.

The time it takes to calculate a mod b depends on what method we use to divide a by b. Let's avoid this detail for now, and simply ask how many of these divisions we need, i.e., how many times Euclid's algorithm will call itself recursively before reaching the base case b = 0 and returning the result. The following exercise shows that we need at most a linear number of divisions.

Exercise 2.1 Show that if a ≥ b, then a mod b < a/2. Conclude from this that the number of divisions that Euclid's algorithm performs is at most 2 log₂ a.

If a has n digits, then 2 log₂ a ≤ Cn for some constant C. Measuring n in bits instead of digits, or defining n as the total number of digits of a and b, just changes C to a different constant. But the number of divisions is always linear in n.

Following Kepler, we can check this linear scaling by making observations. In Figure 2.3 we plot the worst-case and average number of divisions as a function of n, and the relationship is clearly linear. If you are mathematically inclined, you may enjoy Problems 2.6, 2.7, and 2.8, where we calculate the slopes of these lines analytically. In particular, it turns out that the worst case occurs when a and b are successive Fibonacci numbers, which as Problem 2.4 discusses are well-known in the field of rabbit breeding.
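
If you want to reproduce the flavor of Figure 2.3, a few lines of Python suffice (a rough experiment of our own; the sample sizes are arbitrary, and the maximum over random pairs only approximates the true worst case, which comes from Fibonacci pairs):

import random

def divisions(a, b):
    """Count the divisions Euclid's algorithm performs on the pair (a, b)."""
    count = 0
    while b != 0:
        a, b = b, a % b
        count += 1
    return count

for n in (5, 10, 20, 40, 80):
    counts = [divisions(random.randrange(10**(n - 1), 10**n),
                        random.randrange(10**(n - 1), 10**n)) for _ in range(2000)]
    print(n, max(counts), sum(counts) / len(counts))   # n, sampled worst case, average

Both columns grow roughly linearly in n.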

Now that we know that the number of divisions in Euclid's algorithm is linear in n, what is its total running time? In a single step, a computer can perform arithmetic operations on integers of some fixed size, such as 64 bits. But if n is larger than this, the time it takes to divide one n-digit number by another grows with n. As Problem 2.11 shows, if we use the classic technique of long division, the total running time of Euclid's algorithm scales as n 2, growing polynomially as a function of n.

Let's dwell for a moment on the fact that the size n of a problem involving an integer a is the number of bits or digits of a, which is roughly log a, rather than a itself. For instance, as the following exercise shows, the problem FACTORING would be easy if the size of the input were the number itself:

Exercise 2.2 Write down an algorithm that finds the prime factorization of an integer a, whose running time is polynomial as a function of a (as opposed to the number of digits in a).
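
One way to convince yourself that such an algorithm exists (a sketch of the most naive approach; the exercise may have other solutions in mind) is trial division, which performs O(√a) divisions and is therefore polynomial in a, though exponential in the number of digits:

def factor(a):
    """Prime factorization of a by trial division: time polynomial in a itself."""
    factors = []
    d = 2
    while d * d <= a:
        while a % d == 0:       # divide out each prime factor as often as possible
            factors.append(d)
            a //= d
        d += 1
    if a > 1:
        factors.append(a)       # whatever is left over is prime
    return factors

print(factor(8051))             # [83, 97]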

However, we believe that FACTORING cannot be solved in an amount of time that is polynomial as a function of n = log a, unless you have a quantum computer.

FIGURE 2.3: Scaling in computer science: the number of divisions done by Euclid's algorithm grows linearly with n when given n-digit inputs. In the inset, the typical distribution of a set of random inputs.

2.2.3 Asymptotics

In order to speak clearly about scaling, we will use asymptotic notation. This notation lets us focus on the qualitative growth of a function of n, and ignores the constant in front of it.

For instance, if the running time f(n) of an algorithm grows quadratically, we say that f(n) = Θ(n²). If it grows at most quadratically, but it might grow more slowly, we say that f(n) = O(n²). If it definitely grows less than quadratically, such as n^c for some c < 2, we say that f(n) = o(n²). Buying a faster computer reduces the constant hidden in Θ or O, but it doesn't change how f(n) grows when n increases.

We define each of these symbols in terms of the limit as n → ∞ of the ratio f(n)/g(n), where g(n) is whatever function we are comparing f(n) with. Thus f(n) = o(n²) means that lim_{n→∞} f(n)/n² = 0, while f(n) = Θ(n²) means that f(n)/n² tends neither to zero nor to infinity—typically, that it converges to some constant C. We summarize all this in Table 2.1. In Appendix A.1 you can find formal definitions and exercises involving these symbols.

Exercise 2.3 When we say that f (n) = O(log n ), why don't we have to specify the base of the logarithm?

These definitions focus on the behavior of f(n) in the limit n → ∞. This is for good reason. We don't really care how hard HAMILTONIAN PATH, say, is for small graphs. What matters to us is whether there is an

TABLE 2.1: A summary of asymptotic notation.

symbol              C = lim_{n→∞} f(n)/g(n)      roughly speaking…
f(n) = O(g(n))      C < ∞                        f ≤ g
f(n) = Ω(g(n))      C > 0                        f ≥ g
f(n) = Θ(g(n))      0 < C < ∞                    f = g
f(n) = o(g(n))      C = 0                        f < g
f(n) = ω(g(n))      C = ∞                        f > g

efficient algorithm that works for all n, and the difference between such an algorithm and an exhaustive search shows up when n is large. Even Kepler's law, which we can write as T = Θ(R^{3/2}), only holds when R is large enough, since at small distances there are corrections due to the curvature of spacetime.

Armed with this notation, we can say that Euclid's algorithm performs Θ(n) divisions, and that its total running time is O(n²). Thus we can calculate the greatest common divisor of two n-digit numbers in polynomial time, i.e., in time O(n^c) for a constant c. But what if Euclid's algorithm isn't the fastest method? What can we say about a problem's intrinsic complexity, as opposed to the running time of a particular algorithm? The next section will address this question with a problem, and an algorithm, that we all learned in grade school.

2.3 Intrinsic Complexity

For whichever resource we are interested in bounding—time, memory, and so on—we define the intrinsic complexity of a problem as the complexity of the most efficient algorithm that solves it. If we have an algorithm in hand, its existence provides an upper bound on the problem's complexity—but it is usually very hard to know whether we have the most efficient algorithm. One of the reasons computer science is such an exciting field is that, every once in a while, someone achieves an algorithmic breakthrough, and the intrinsic complexity of a problem turns out to be much less than we thought it was.

To illustrate this point, let's discuss the complexity of multiplying integers. Let T(n) denote the time required to multiply two integers x and y which have n digits each. As Figure 2.4 shows, the algorithm we learned in grade school takes time T(n) = Θ‎(n 2), growing quadratically as a function of n.

This algorithm is so natural that it is hard to believe that one can do better, but in fact one can. One of the most classic ideas in algorithms is to divide and conquer—to break a problem into pieces, solve each piece recursively, and combine their answers to get the answer to the entire problem. In this case, we can break the n-digit integers x and y into pairs of n/2-digit integers as follows:

x = 10^{n/2} a + b   and   y = 10^{n/2} c + d.

Here a and b are the high-order and low-order parts of x, i.e., the first and second halves of x's digit sequence, and c and d are similarly the parts of y. Then

xy = 10^n ac + 10^{n/2} (ad + bc) + bd.
(2.3)

FIGURE 2.4: The grade-school algorithm for multiplication, which takes Θ(n²) time to multiply n-digit integers.

Of course, on a digital computer we would operate in binary instead of decimal, writing x = 2^{n/2} a + b and so on, but the principle remains the same.

This approach lets us reduce the problem of multiplying n-digit integers to that of multiplying several pairs of n/2-digit integers. We then reduce this problem to that of multiplying n/4-digit integers, and so on, until we get to integers so small that we can look up their products in a table. We assume for simplicity that n is even at each stage, but rounding n/2 up or down makes no difference when n is large.

What running time does this approach give us? If we use (2.3) as our strategy, we calculate four products, namely ac, ad, bc, and bd. Adding these products together is much easier, since the grade-school method of adding two n-digit integers takes just O(n) time. Multiplying an integer by 10^n or 10^{n/2} is also easy, since all we have to do is shift its digits to the left and add n or n/2 zeros. The running time T(n) then obeys the equation

T(n) = 4T(n/2) + O(n).
(2.4)

If T(n) scales faster than linearly, then for large n we can ignore the O(n) term. Then the running time is dominated by the four multiplications, and it essentially quadruples whenever n doubles. But as Problem 2.13 shows, this means that it grows quadratically, T(n) = Θ‎(n 2), just like the grade-school method. So we need another idea.

The key observation is that we don't actually need to do four multiplications. Specifically, we don't need ad and bc separately—we only need their sum. Now note that

(a + b)(c + d) − ac − bd = ad + bc.
(2.5)

Therefore, if we calculate (a + b)(c + d) along with ac and bd, which we need anyway, we can obtain ad + bc by subtraction, which like addition takes just Θ‎(n) time. Using this trick changes (2.4) to

T(n) = 3T(n/2) + O(n).
(2.6)

Now the running time only triples when n doubles, and using Problem 2.13 gives

T(n) = Θ(n^α)   where   α = log₂ 3 ≈ 1.585.

So, we have tightened our upper bound on the complexity of multiplication from O(n 2) to O(n 1.585).
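
The whole scheme fits comfortably in a dozen lines of Python (a sketch of our own, following (2.3) and (2.5); the decimal splitting and the base-case cutoff are arbitrary choices):

def karatsuba(x, y):
    """Multiply nonnegative integers using three recursive multiplications."""
    if x < 10 or y < 10:
        return x * y                        # small enough to multiply directly
    half = max(len(str(x)), len(str(y))) // 2
    a, b = divmod(x, 10**half)              # x = 10^half * a + b
    c, d = divmod(y, 10**half)              # y = 10^half * c + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    ad_plus_bc = karatsuba(a + b, c + d) - ac - bd     # the trick of (2.5)
    return ac * 10**(2 * half) + ad_plus_bc * 10**half + bd

print(karatsuba(314159, 271828) == 314159 * 271828)    # True

Counting the recursive calls reproduces the recurrence (2.6).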

Is this the best we can do? To be more precise, what is the smallest α for which we can multiply n-digit integers in O(n^α) time? It turns out that α can be arbitrarily close to 1. In other words, there are algorithms whose running time is less than O(n^{1+ε}) for any constant ε > 0. On the other hand, we have a lower bound of T(n) = Ω(n) for the trivial reason that it takes that long just to read the inputs x and y. These upper and lower bounds almost match, showing that the intrinsic complexity of multiplication is essentially linear in n. Thus multiplication turns out to be much less complex than the grade-school algorithm would suggest.

2.4 The Importance of Being Polynomial

For practical purposes the difference between polynomial and exponential order is often more crucial than the difference between finite and non-finite.

Jack Edmonds, 1965

Finding an algorithm that multiplies n-digit integers in O(n^{1.585}) time, instead of O(n²), reveals something about the complexity of multiplication. It is also of practical interest. If n = 10^6, for instance, this improves the running time by a factor of about 300 if the constants in the Os are the same.

However, the most basic distinction we will draw in computational complexity is between polynomial functions of n—that is, n raised to some constant—and exponential ones. In this section, we will discuss why this distinction is so important, and why it is so robust with respect to changes in our definition of computation.

2.4.1 Until the End of the World

One way to illustrate the difference between polynomials and exponentials is to think about how the size of the problems we can handle increases as our computing technology improves. Moore's Law (no relation) is the empirical observation that basic measures of computing technology, such as the density of transistors on a chip, are improving exponentially with the passage of time.

A common form of this law—although not Moore's original claim—is that processing speed doubles every two years. If the running time of my algorithm is Θ(n), doubling my speed also doubles the size n of problems I can solve in, say, one week. But if the running time grows as Θ(2^n), doubling the speed just increases n by 1.

Thus whether the running time is polynomial or exponential makes an enormous difference in the size of problems we can solve, now and for the foreseeable future. We illustrate this in Figure 2.5, where we compare various polynomials with 2^n and n!. In the latter two cases, solving instances of size n = 100 or even n = 20 would take longer than the age of the universe. A running time that scales exponentially implies a harsh bound on the problems we can ever solve—even if our project deadline is as far away in the future as the Big Bang is in the past.
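
You can reproduce the flavor of Figure 2.5 with a short computation (our own back-of-the-envelope sketch, assuming one microsecond per step, roughly as the figure does):

import math

MICROSECOND = 1e-6
YEAR = 3600 * 24 * 365.25

for n in (20, 50, 100):
    for name, steps in (("n^2", n**2), ("n^3", n**3),
                        ("2^n", 2**n), ("n!", math.factorial(n))):
        print(f"n = {n:3d}   {name:3s}   {steps * MICROSECOND / YEAR:10.3g} years")

The polynomial rows stay tiny, while the exponential and factorial rows soon dwarf the age of the universe.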

Exercise 2.4 Suppose we can currently solve a problem of size n in a week. If the speed of our computer doubles every two years, what size problem will we be able to solve in a week four years from now, if the running time of our algorithm scales as log₂ n, √n, n, n², 2^n, or 4^n?

FIGURE 2.5: Running times of algorithms as a function of the size n. We assume that each one can solve an instance of size n = 1 in one microsecond. Note that the time axis is logarithmic.

FIGURE 2.6: Euler's algorithm for EULERIAN PATH. The variable y counts the number of odd-degree vertices.

2.4.2 Details, and Why they Don't Matter

In the Prologue we saw that Euler's approach to EULERIAN PATH is much more efficient than exhaustive search. But how does the running time of the resulting algorithm scale with the size of the graph? It turns out that a precise answer to this question depends on many details. We will discuss just enough of these details to convince you that we can and should ignore them in our quest for a fundamental understanding of computational complexity.

In Figure 2.6 we translate Euler's Theorem into an algorithm, and express it in pseudocode. Quantifying the running time of this algorithm is not quite as trivial as it seems. To get us started, how many times will the for loop run? In the worst case, all vertices—or all but the last two—have even degree. Thus the for loop will, in general, run O(|V|) times.
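
In Python, the heart of the algorithm might look like this (a sketch assuming the graph is given as a dictionary of adjacency lists and is already known to be connected; the pseudocode in Figure 2.6 may differ in its details):

def has_eulerian_path(adj):
    """By Euler's theorem, a connected graph has an Eulerian path if and only if
    at most two of its vertices have odd degree."""
    y = 0                            # number of odd-degree vertices, as in Figure 2.6
    for v in adj:
        if len(adj[v]) % 2 == 1:
            y += 1
    return y <= 2                    # y = 0 gives an Eulerian cycle, y = 2 a path

# The Königsberg multigraph: four land masses, seven bridges.
konigsberg = {"A": ["B", "B", "C", "C", "D"],
              "B": ["A", "A", "D"],
              "C": ["A", "A", "D"],
              "D": ["A", "B", "C"]}
print(has_eulerian_path(konigsberg))   # False: all four vertices have odd degree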

Next we need to specify how we measure the running time. The physical time, measured in seconds, will vary wildly from computer to computer. So instead, we measure time as the number of elementary steps that our algorithm performs. There is some ambiguity in what we consider “elementary,” but let us assume for now that assignments like y := 0, arithmetical operations like y + 1, and evaluations of inequalities like y > 2 are all elementary and take constant time.

The next question is how long it takes to determine the degree of a vertex, which we denote deg(v ). Clearly this is not very hard—we just have to count the number of neighbors v has. But analyzing it precisely depends on the format in which we are given the input graph G.

One common format is the adjacency matrix. This is the |V| × |V| matrix A such that

A_{ij} = 1 if (i, j) ∈ E, i.e., if there is an edge from vertex i to vertex j, and A_{ij} = 0 otherwise.

We could give this matrix directly as a |V|×|V| array of 0s and 1s. However, if G is a sparse graph that only has a few edges, then there are just a few pairs i, j such that Aij = 1. As Problem 2.18 shows, we can then describe G more efficiently by giving a list of these pairs, or equivalently a list of edges.

Determining deg(v) takes different numbers of steps depending on which of these formats we use. Given the entire adjacency matrix, we would use a for loop with i ranging from 1 to |V|, and increment a counter each time A_{vi} = 1. Given a list of edges (i, j), we could scan the entire list and increment a counter each time either i = v or j = v, and so on.
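
For concreteness, here is what those two computations look like in Python (a small sketch of our own; the input conventions are assumptions rather than the book's):

def degree_from_matrix(A, v):
    """Degree of vertex v given the full adjacency matrix: Theta(|V|) steps."""
    return sum(A[v][i] for i in range(len(A)))

def degree_from_edge_list(edges, v):
    """Degree of vertex v given a list of edges (i, j): Theta(|E|) steps."""
    return sum(1 for (i, j) in edges if i == v or j == v)

A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
edges = [(0, 1), (0, 2)]
print(degree_from_matrix(A, 0), degree_from_edge_list(edges, 0))   # 2 2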

However, it is not even obvious that we can carry out instructions like “check if A_{ij} = 1” in a single elementary step. If the input is stored on a magnetic tape—an ancient memory technology which our reader is surely too young to remember—it might take a long time to roll the tape to the location of the data we wish to read. Among theoretical models of computation, a Turing machine, which we will discuss in Chapter 7, takes t steps to move to the t-th location on its tape, while a machine with random access memory (RAM) can access any location in its memory in a single step. Thus moving from one of these models to another could change our running time considerably.

Finally, we need to agree how we specify the size of an instance. In general, this is the number n of bits it takes to describe it—if you like, the length of the email I would have to send you to tell you about it. This depends on our choice of input format, and n can be smaller or larger depending on whether this format is efficient or inefficient.

All these considerations make it difficult to quantify the running time precisely, and how it scales with the input size, without going into a great deal of detail about our input format, the particular implementation of our algorithm, and the type of machine on which we run our program. These are worthy engineering questions, but the goal of computational complexity theory is to take a larger view, and draw deep qualitative distinctions between problems. So, rather than studying the art of grinding lenses and mirrors, let us turn our attention to the stars.

2.4.3 Complexity Classes

As we saw in the previous section, one of the most basic questions we can ask about a problem is whether it can be solved in polynomial time as a function of its size. Let's consider the set of all problems with this property:

Conversely, a problem is outside P if no algorithm exists that solves it in polynomial time—for instance, if the most efficient algorithm takes exponential time 2^{εn} for some ε > 0.

P is our first example of a complexity class—a class of problems for which a certain kind of algorithm exists. We have defined it here so that it includes both decision problems, such as “does there exist an Eulerian path,” and function problems, such as “construct an Eulerian path.” Later on, many of our complexity classes will consist just of decision problems, which demand a yes-or-no answer.

More generally, for any function f (n) we can define TIME(f (n)) as follows:

In particular, P contains TIME(n), TIME(n²), and so on, as well as noninteger exponents like TIME(n^{log₂ 3}) which we met in Section 2.3. Formally,

P = ⋃_{c > 0} TIME(n^c).

The essential point is that we allow any exponent that is constant with respect to n. Exponents that grow as n grows, like n^{log n}, are excluded from P. Throughout the book, we will use poly(n) as a shorthand for “O(n^c) for some constant c,” or equivalently for n^{O(1)}. In that case, we can write

P = TIME ( poly ( n ) ) .

If we wish to entertain running times that are exponentially large or even greater, we can define EXP = TIME(2^{poly(n)}), EXPEXP = TIME(2^{2^{poly(n)}}), and so on. This gives us a hierarchy of complexity classes, in which the amount of computation we can do becomes increasingly astronomical:

P ⊆ EXP ⊆ EXPEXP ⊆ ⋯

But back down to earth. Why is the question of whether a problem can be solved in polynomial time or not so fundamental? The beauty of the definition of P is that it is extremely robust to changes in how we measure running time, and what model of computation we use. For instance, suppose we change our definition of “elementary step” so that we think of multiplying two integers as elementary. As long as they have only a polynomial number of digits, each multiplication takes polynomial time anyway, so this at most changes our running time from a polynomial to a smaller polynomial.

Similarly, going from a Turing machine to a RAM, or even a massively parallel computer—as long as it has only a polynomial number of processors—saves at most polynomial time. The one model of computation that seems to break this rule is a quantum computer, which we discuss in Chapter 15. So, to be clear, we define P as the class of problems that classical computers, like the ones we have on our desks and in our laps today, can solve in polynomial time.

The class P is also robust with respect to most input formats. Any reasonable format for a graph, for example, has size n which is polynomial in the number of vertices, as long as it isn't a multigraph where some pairs of vertices have many edges between them. Therefore, we can say that a graph problem is in P if the running time is polynomial in |V|, and we will often simply identify n with |V|.

However, if we change our input format so drastically that n becomes exponentially larger or smaller, the computational complexity of a problem can change quite a bit. For instance, we will occasionally represent an integer a in unary instead of binary—that is, as a string of a ones. In that case, the size of the input is n = a instead of n = log₂ a. Exercise 2.2 shows that if we encode the input in this way, FACTORING can be solved in polynomial time.

Of course, this is just a consequence of the fact that we measure complexity as a function of the input size. If we make the input larger by encoding it inefficiently, the problem becomes “easier” in an artificial way. We will occasionally define problems with unary notation when we want some input parameter to be polynomial in n. But for the most part, if we want to understand the true complexity of a problem, it makes sense to provide the input as efficiently as possible.

Finally, P is robust with respect to most details of how we implement an algorithm. Using clever data structures, such as storing an ordered list in a binary tree instead of in an array, typically reduces the running time by a factor of n or n 2. This is an enormous practical improvement, but it still just changes one polynomial to another with a smaller power of n.

The fact that P remains unchanged even if we alter the details of our computer, our input format, or how we implement our algorithm, suggests that being in P is a fundamental property of a problem, rather than a property of how we humans go about solving it. In other words, the question of whether HAMILTONIAN PATH is in P or not is a mathematical question about the nature of Hamiltonian paths, not a subjective question about our own abilities to compute. There is no reason why computational complexity theory couldn't have been invented and studied thousands of years ago, and indeed there are glimmers of it here and there throughout history.

2.5 Tractability and Mathematical Insight

It is often said that P is the set of tractable problems, namely those which can be solved in a reasonable amount of time. While a running time of, say, O(n^{10}) is impractical for any interesting value of n, we encounter such large powers of n very rarely. The first theoretical results proving that a problem is in P sometimes give an algorithm of this sort, but within a few years these algorithms are usually improved to O(n³) or O(n⁴) at most.

Of course, even a running time of O(n³) is impractical if n = 10^6—for instance, if we are trying to analyze an online social network with 10^6 nodes. Now that fields like genomics and astrophysics collect vast amounts of data, stored on stacks of optical disks, containing far more information than your computer can hold at one time, some argue that even linear-time algorithms are too slow. This has given rise to a new field of sublinear algorithms, which examine only a small fraction of their input.

But for us, and for computational complexity theorists, P is not so much about tractability as it is about mathematical insight into a problem's structure. Both EULERIAN PATH and HAMILTONIAN PATH can be solved in exponential time by exhaustive search, but there is something different about EULERIAN PATH that yields a polynomial-time algorithm. Similarly, when we learned in 2004 that the problem of telling whether an n-digit number is prime is in P, we gained a fundamental insight into the nature of PRIMALITY, even though the resulting algorithm (which we describe in Chapter 10) is not very practical.

The difference between polynomial and exponential time is one of kind, not of degree. When we ask whether a problem is in P or not, we are no longer just computer users who want to know whether we can finish a calculation in time to meet a deadline. We are theorists who seek a deep understanding of why some problems are qualitatively easier, or harder, than others.

Problems

If there is a problem you can't solve, then there is an easier problem you can solve: find it.

George Pólya, How to Solve It

2.1 Upgrades.

The research lab of Prof. Flush is well-funded, and they regularly upgrade their equipment. Brilliant Pebble, a graduate student, has to run a rather large simulation. Given that the speed of her computer doubles every two years, if the running time of this simulation exceeds a certain T, she will actually graduate earlier if she waits for the next upgrade to start her program. What is T?

2.2 Euclid extended.

Euclid's algorithm finds the greatest common divisor gcd(a, b) of integers a and b. Show that with a little extra bookkeeping it can also find (possibly negative) integers x and y such that

a x + b y = gcd ( a , b ) .
(2.7)

Now assume that b < a and that they are mutually prime. Show how to calculate the multiplicative inverse of b modulo a, i.e., the y such that 1 ≤ y < a and by ≡ 1 mod a.

Hint: the standard algorithm computes the remainder r = a mod b. The extended version also computes the quotient q in a = qb + r. Keep track of the quotients at the various levels of recursion.
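
In code, the bookkeeping the hint describes can look like this (a Python sketch, not a substitute for the argument the problem asks for):

def extended_euclid(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    q, r = divmod(a, b)              # a = q*b + r
    g, x, y = extended_euclid(b, r)  # so b*x + r*y = g
    return g, y, x - q * y           # substitute r = a - q*b

g, x, y = extended_euclid(120, 33)
print(g, x, y, 120 * x + 33 * y)     # 3 -3 11 3
# When gcd(a, b) = 1, the inverse of b modulo a is y mod a.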

2.3 Geometrical subtraction.

Euclid's original algorithm calculated a mod b by repeatedly subtracting b from a (by marking a line of length b off a line of length a) until the remainder is less than b. If a and b have n or fewer digits, show that this method can take exponential time as a function of n.

2.4 Fibonacci's rabbits.

Suppose that it takes a year for a baby rabbit to mature, and that mature rabbits produce one baby per year. (This also assumes that rabbits are immortal and reproduce asexually, but perhaps you have heard of physicists assuming that cows are spherical.) Suppose that I start my first year as a rabbit farmer with one baby rabbit. Then if F_ℓ is the rabbit population on the ℓth year, show that the first few values of F_ℓ are 1, 1, 2, 3, 5, 8, … and that in general, these obey the equation

F_ℓ = F_{ℓ−1} + F_{ℓ−2},
(2.8)

with the initial values F_1 = F_2 = 1. These are called the Fibonacci numbers. Show that they grow exponentially, as rabbits are known to do. Specifically, show that

F_ℓ = Θ(φ^ℓ)

where φ‎ is the “golden ratio,”

φ = (1 + √5)/2 = 1.618…

In other words, find constants A and B for which you can prove by induction on ℓ that for all ℓ ≥ 1,

A φ^ℓ ≤ F_ℓ ≤ B φ^ℓ.

Hint: φ is the largest root of the quadratic equation φ² − φ − 1 = 0. Equivalently, it is the unique positive number such that

φ / 1 = (1 + φ) / φ.

2.5 Exponential growth, polynomial time.

Using Problem 2.4, show that the problem of telling whether an n-digit number x is a Fibonacci number is in P. Hint: how many Fibonacci numbers are there between 1 and 10^n?
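
A sketch of the resulting algorithm (our Python rendering of the hint; the proof that it runs in polynomial time is still up to you):

def is_fibonacci(x):
    """Decide whether x >= 1 is a Fibonacci number by listing all Fibonacci numbers up to x.
    Since F_l grows like phi^l, only O(n) of them have at most n digits."""
    a, b = 1, 1
    while a < x:
        a, b = b, a + b
    return a == x

print([m for m in range(1, 30) if is_fibonacci(m)])   # [1, 2, 3, 5, 8, 13, 21]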

2.6 Euclid at his worst.

Let's derive the worst-case running time of Euclid's algorithm. First, prove that the number of divisions is maximized when a and b are two adjacent Fibonacci numbers. Hint: using the fact that the smallest a such that a mod b = c is b + c, work backwards from the base case and show that, if b ≤ a and finding gcd(a, b) takes ℓ divisions, then a ≥ F_{ℓ+1} and b ≥ F_ℓ.

Now suppose that a and b each have n digits. Use Problem 2.4 to show that the number of divisions that Euclid's algorithm performs is C_worst n + O(1) where

C_worst = 1 / log₁₀ φ ≈ 4.785.

This is the slope of the upper line in Figure 2.3.

2.7 Euclid and Gauss.

Now let's derive the average-case running time of Euclid's algorithm. First, let x denote the ratio b/a. Show that each step of the algorithm updates x as follows,

x′ = g(x),   where   g(x) = (1/x) mod 1.
(2.9)

Here by a number mod 1, we mean its fractional part. For instance, π‎ mod 1 = 0.14159… This function g(x) is called the Gauss map, and its graph is shown in Figure 2.7. It also plays an important role in the theory of continued fractions: as we iterate (2.9), the integers ⌊1/x⌋ = (1/x) – (1/x mod 1) label the different “branches” of g(x) we land on, and give the continued fraction series of x.

If we start with a random value of x and apply the Gauss map many times, we would expect x to be distributed according to some probability distribution P(x). This distribution must be stationary: that is, it must remain the same when x is updated according to (2.9). This means that P(x) must equal a sum of probabilities from all the values that x could have had on the previous step, adjusted to take the derivative of g into account:

P(x) = Σ_{y : g(y) = x} P(y) / |g′(y)|.

Show that one such distribution—which turns out to be essentially unique—is

P(x) = (1 / ln 2) · 1 / (x + 1).

Of course, we haven't shown that the probability distribution of x converges to this stationary distribution. Proving this requires much more sophisticated arguments.
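
A quick numerical experiment is reassuring, though (our own sketch; the number of samples and of iterations are arbitrary):

import math, random

def gauss_map(x):
    return (1.0 / x) % 1.0

samples = []
for _ in range(100000):
    x = random.random()
    for _ in range(10):                 # iterate the map a few times from a random start
        if x == 0.0:
            break
        x = gauss_map(x)
    samples.append(x)

for lo in (0.0, 0.25, 0.5, 0.75):
    hi = lo + 0.25
    empirical = sum(lo <= s < hi for s in samples) / len(samples)
    predicted = (math.log(1 + hi) - math.log(1 + lo)) / math.log(2)
    print(f"[{lo:.2f}, {hi:.2f})   empirical {empirical:.3f}   predicted {predicted:.3f}")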

FIGURE 2.7: The Gauss map g(x) = (1/x) mod 1.

2.8 Euclid on average.

Continuing from the previous problem, argue that the average number of divisions Euclid's algorithm does when given n-digit numbers grows as C_avg n where

C_avg = −1 / E[log₁₀ x].

Here E[·] denotes the expectation given the distribution P(x),

E[log₁₀ x] = ∫₀¹ P(x) log₁₀ x dx.

Evaluate this integral to obtain

C_avg = (12 ln 2 ln 10) / π² ≈ 1.941.

This is the slope of the lower line in Figure 2.3. Clearly it fits the data very well.

2.9 The golden ratio again.

To connect Problems 2.6 and 2.7, show that 1/φ = 0.618… is the largest fixed point of the Gauss map. In other words, it is the largest x such that g(x) = x. This corresponds to the fact that if a and b are successive Fibonacci numbers, the ratio x = b/a stays roughly constant as Euclid's algorithm proceeds. Then show that, since ⌊φ⌋ = 1, the golden ratio's continued fraction expansion is

φ = 1 + 1/(1 + 1/(1 + 1/(1 + ⋯)))


FIGURE 2.8: Using long division to calculate a mod b. In this case a = 35 500, b = 113, ⌊a/b⌋ = 314, and a mod b = 18.

Finally, show that cutting off this expansion after ℓ steps gives an approximation for φ‎ as the ratio between two successive Fibonacci numbers,
φ ≈ F_{ℓ+1} / F_ℓ.

2.10 Euclid and Fibonacci.

Use Euclid's algorithm to show that any two successive Fibonacci numbers are mutually prime. Then, generalize this to the following beautiful formula:

gcd(F_a, F_b) = F_{gcd(a,b)}.

Note that we use the convention that F_1 = F_2 = 1, so F_0 = 0. Hint: you might want to look ahead at Problem 3.19.

2.11 Long division.

Show that if a has n digits, b ≤ a, and the integer part ⌊a/b⌋ of their ratio has m digits, then we can obtain a mod b in O(nm) time. Hint: consider the example of long division shown in Figure 2.8, where n = 5 and m = 3. If you are too young to have been taught long division in school, humble yourself and ask your elders to teach you this ancient and beautiful art.

Now consider the series of divisions that Euclid's algorithm does, and show that the total time taken by these divisions is O(n 2).

2.12 Divide and conquer.

Show that for any constants a, b > 0, the recursive equation

T ( n ) = a T ( n / b )
(2.10)

has the exact solution

T(n) = C n^{log_b a},

where C = T(1) is given by the base case.

2.13 Divide and a little more conquering.

Now show that if a > b > 1 and we add a linear term to the right-hand side, giving

T ( n ) = a T ( n / b ) + C n ,

then we still have T(n) = Θ(n^{log_b a}). In other words, prove by induction on n that there are constants A and B, depending on C and the initial conditions T(1), such that

A n^{log_b a} ≤ T(n) ≤ B n^{log_b a}

for all n ≥ 1. Feel free to assume that n is a power of b.

2.14 Toom's algorithm, part 1.

After reading the divide-and-conquer algorithm for multiplying n-digit integers in Section 2.3, the reader might well ask whether dividing these integers into more than two pieces might yield an even better algorithm. Indeed it does!

To design a more general divide-and-conquer algorithm, let's begin by thinking of integers as polynomials. If x is an n-digit integer written in base b, and we wish to divide it into r pieces, we will think of it as a polynomial P(z) of degree r − 1, where z = b^{n/r}. For instance, if x = 314159265 and r = 3, then x = P(10³) where

P(z) = 314 z² + 159 z + 265.

Now, if x = P(z) and y = Q(z), their product xy is R(z), where R(z) = P(z)Q(z) is a polynomial of degree 2r – 2. In order to find R's coefficients, it suffices to sample it at 2r – 1 values of z, say at integers ranging from – r + 1 to r – 1. The coefficients of R are then linear combinations of these samples.

As a concrete example, suppose that r = 2, and that P(z) = az + b and Q(z) = cz + d. Then R(z) = Az 2 + Bz + C is quadratic, and we can find A, B, and C from three samples, R(–1), R(0) and R(1). Write the 3 × 3 matrix that turns (R(–1), R(0), R(+1)) into (A, B, C), and show that the resulting algorithm is essentially identical to the one described in the text.
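
To see the sample-and-interpolate idea in action, here is a small numerical sketch (ours, using numpy, r = 3, and 3-digit chunks; it solves the Vandermonde system directly rather than writing down the matrix the problem asks for):

import numpy as np

def split(x, r, chunk=3):
    """Coefficients of x as a degree-(r-1) polynomial in z = 10**chunk, low order first."""
    coeffs = []
    for _ in range(r):
        x, c = divmod(x, 10**chunk)
        coeffs.append(c)
    return coeffs

def evaluate(coeffs, z):
    return sum(c * z**i for i, c in enumerate(coeffs))

r = 3
x, y = 314159265, 271828182
P, Q = split(x, r), split(y, r)                     # x = P(1000), y = Q(1000)

points = list(range(-(r - 1), r))                   # 2r - 1 sample points
samples = [evaluate(P, z) * evaluate(Q, z) for z in points]     # R(z) = P(z) Q(z)
V = np.vander(points, 2 * r - 1, increasing=True)   # V[i][j] = points[i] ** j
R = [int(round(c)) for c in np.linalg.solve(V, samples)]        # coefficients of R
print(evaluate(R, 1000) == x * y)                   # True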

2.15 Toom's algorithm, part 2.

Now let's generalize the approach of the previous problem to larger values of r. Each sample of R(z) requires us to multiply two numbers, P(z) and Q(z), which have essentially n/r digits each. If we ignore the time it takes to multiply by M, the running time of this algorithm is

T(n) = (2r − 1) T(n/r)
(2.11)

which by Problem 2.12 has the solution

T(n) = Θ(n^α)   where   α = log(2r − 1) / log r.

Show that α tends to 1 as r tends to infinity, so by taking r large enough we can achieve a running time of O(n^{1+ε}) for arbitrarily small ε.

2.16 Toom's algorithm, part 3.

Let's continue from the previous problem. As r grows, the constant hidden in the Θ of (2.11) grows too, since we have to multiply the vector of samples by a matrix M of size O(r²). This suggests that the optimal value of r is some function of n. Show that, in the absence of any information about M's structure, a nearly optimal choice of r is

r = 2^{√(log n)}.

Then show that the running time of the resulting algorithm is

T(n) = n · 2^{o(log n)}

which is less than n^{1+ε} for any ε > 0.

2.17 Fast matrix multiplication.

Suppose we need to compute the product C of two n × n matrices A, B. Show that the naive algorithm for this takes Θ(n³) individual multiplications. However, we can do better, by again using a divide-and-conquer approach. Write

A = ( A_{1,1}  A_{1,2} ; A_{2,1}  A_{2,2} ),   B = ( B_{1,1}  B_{1,2} ; B_{2,1}  B_{2,2} ),   and   C = ( C_{1,1}  C_{1,2} ; C_{2,1}  C_{2,2} ),

where A_{1,1} and so on are n/2 × n/2 matrices. Now define the following seven n/2 × n/2 matrices,

M_1 = (A_{1,1} + A_{2,2})(B_{1,1} + B_{2,2})
M_2 = (A_{2,1} + A_{2,2}) B_{1,1}
M_3 = A_{1,1} (B_{1,2} − B_{2,2})
M_4 = A_{2,2} (B_{2,1} − B_{1,1})
M_5 = (A_{1,1} + A_{1,2}) B_{2,2}
M_6 = (A_{2,1} − A_{1,1})(B_{1,1} + B_{1,2})
M_7 = (A_{1,2} − A_{2,2})(B_{2,1} + B_{2,2}).

Then show that C is given by

C_{1,1} = M_1 + M_4 − M_5 + M_7
C_{1,2} = M_3 + M_5
C_{2,1} = M_2 + M_4
C_{2,2} = M_1 − M_2 + M_3 + M_6.

Again using the fact that the cost of addition is negligible, show that this gives an algorithm whose running time is Θ(n^α) where α = log₂ 7 ≈ 2.807. The optimal value of the exponent α is still not known.
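
Here is a direct transcription you can use to check the formulas numerically (a Python sketch using numpy, with n a power of 2 and none of the optimizations a serious implementation would need):

import numpy as np

def strassen(A, B):
    """Multiply two n x n matrices using seven recursive multiplications."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])

A = np.random.randint(-9, 10, (8, 8))
B = np.random.randint(-9, 10, (8, 8))
print(np.array_equal(strassen(A, B), A @ B))   # True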

2.18 How to mail a matrix.

Given a graph G = (V,E) with |V| = n vertices and |E| = m edges, how many bits do we need to specify the adjacency matrix, and how many do we need to specify a list of edges? Keep in mind that it takes log n bits to specify an integer between 1 and n. When are each of these two formats preferable?

In particular, compare sparse graphs where m = O (n) with dense graphs where m = Θ‎(n 2). How do things change if we consider a multigraph, like the graph of the Königsberg bridges, where there can be more than one edge between a pair of points?

2.19 Polynomial subroutines.

Another sense in which P is robust is that it allows one program to use another as a subroutine. If an algorithm B runs in polynomial time, and an algorithm A performs a polynomial number of steps, some of which call B as a subroutine, show that A also runs in polynomial time. Note that the inputs A sends to B are not the original input, but rather some subproblem that A is interested in, which may be polynomially larger.

Hint: show that the set of polynomial functions is closed under composition. In other words, if f(n) and g(n) are both poly(n), so is their composition f(g(n)).

2.20 A little bit more than polynomial time.

A quasipolynomial is a function of the form f(n) = 2^{Θ(log^k n)} for some constant k > 0, where log^k n denotes (log n)^k. Let us define QuasiP as the class of problems that can be solved in quasipolynomial time. First show that any quasipolynomial f(n) with k > 1 is ω(g(n)) for any polynomial function g(n), and that f(n) = o(h(n)) for any exponential function h(n) = 2^{Θ(n^c)} for any c > 0. Thus P ⊆ QuasiP ⊆ EXPTIME.

Then show that the set of quasipolynomial functions is closed under composition. Therefore, QuasiP programs can use each other as subroutines just as P programs can (see Problem 2.19) even if the main program gives an instance to the subroutine whose size is a quasipolynomial function of the original input size.

Notes

2.1 The Book of Chess.

Schaeffer et al. [715] estimated that the number of legal positions in Checkers is 10^18. For Chess, the number of possible positions in a game 40 moves long was estimated at 10^43 by Claude Shannon [731] in 1950. In 1994, Victor Allis [37] proved an upper bound of 5 × 10^52 for the number of Chess positions and estimated the true number to be 10^50.

2.2 Dixit Algorizmi.

The word algorithm goes back to the Persian astronomer Muhammad ibn Musa al-Khwarizmi, born about 780 A.D. in Khwarezm (now Khiva in Uzbekistan). He worked in Baghdad, serving the caliph Abd-Allah al Mamun, son of the caliph Harun al-Rashid of 1001 Arabian Nights fame. Al-Khwarizmi brought the Hindu number system to the Arab world, from where it spread to Europe, allowing us to write 31 × 27 = 837 instead of XXXI × XXVII = DCCCXXXVII.

In medieval times, arithmetic was identified with al-Khwarizmi's name, and the formula dixit Algorizmi (thus spake al-Khwarizmi) was a hallmark of clarity and authority. Al-Khwarizmi's legacy is also found in the Spanish word guarismo (digit) and in the word algebra, which can be traced back to al-Khwarizmi's book on the solution of equations, the Kitab al-muhtasar fi hisab al-gabr w'al-muqabalah. A good reference on al-Khwarizmi, and the role of algorithms in Mathematics and Computer Science, is Knuth [484].

2.3 Euclid's algorithm.

Euclid's algorithm appeared in his Elements in the 3rd century B.C. in Proposition 2 of Book VII. However, there is some evidence that it was known to Aristarchus and Archimedes [149]. The first proof that it finds the gcd of two n-digit numbers in O(n) steps was given by Pierre-Joseph-Étienne Finck in 1841, who used the argument of Exercise 2.1 to get an upper bound of 2 log₂ a = (2 log₂ 10) n on the number of divisions. This is probably the first nontrivial mathematical analysis of the running time of an algorithm. Three years later, Gabriel Lamé gave a slightly better bound of 5 log₁₀ a = 5n using Fibonacci numbers. For a history of these results, see Shallit [726]. For a history of the average running time, discussed in Problems 2.7 and 2.8, see Knuth [486].

2.4 The adversary.

Why should we focus on worst cases? Certainly other sciences, like physics, assume that problems are posed, not by a malicious adversary, but by Nature. Norbert Wiener draws a distinction between two kinds of devil, one that works cleverly against us, and another that simply represents our own ignorance [807, pp. 34–36]. Nature, we hope, is in the second category:

The scientist is always working to discover the order and organization of the universe, and is thus playing a game against the arch-enemy, disorganization. Is this devil Manichaean or Augustinian? Is it a contrary force opposed to order or is it the very absence of order itself?

…This distinction between the passive resistance of nature and the active resistance of an opponent suggests a distinction between the research scientist and the warrior or the gameplayer. The research physicist has all the time in the world to carry out his experiments, and he need not fear that nature will in time discover his tricks and method and change her policy. Therefore, his work is governed by his best moments, whereas a chess player cannot make one mistake without finding an alert adversary ready to take advantage of it and to defeat him. Thus the chess player is governed more by his worst moments than by his best moments.

There are actually three kinds of instance we might be interested in: worst-case ones, random ones, and real ones. In Problem 2.8 we derived the performance of Euclid's algorithm on random numbers, and in Chapter 14 we will consider problems based on random graphs and random Boolean formulas. But for many problems, there doesn't seem to be any natural way to define a random instance. Real-world problems are the most interesting to an engineer, but they typically have complicated structural properties that are difficult to capture mathematically. Perhaps the best way to study them is empirically, by going out and measuring them instead of trying to prove theorems.

Finally, one important reason why computer science focuses on the adversary is historical. Modern computer science got its start in the codebreaking efforts of World War II, when Alan Turing and his collaborators at Bletchley Park broke the Nazis' Enigma code. In cryptography, there really is an adversary, doing his or her best to break your codes and evade your algorithms.

2.5 Fast multiplication: Babbage, Gauss, and Fourier.

The idea of multiplying two numbers recursively by dividing them into high- and low-order parts, and the fact that its running time is quadratic, was known to Charles Babbage—the 19th-century inventor of the Difference and Analytical Engines, whom we will meet in Chapter 7. He wanted to make sure that his Analytical Engine could handle numbers with any number of digits. He wrote [70, p. 125]:

Thus if a · 10^50 + b and a′ · 10^50 + b′ are two numbers each of less than a hundred places of figures, then each can be expressed upon two columns of fifty figures, and a, b, a′, b′ are each less than fifty places of figures… The product of two such numbers is

a a′ · 10^100 + (a b′ + a′ b) · 10^50 + b b′.

This expression contains four pairs of factors, aa', ab', a'b, bb', each factor of which has less than fifty places of figures. Each multiplication can therefore be executed in the Engine. The time, however, of multiplying two numbers, each consisting of any number of digits between fifty and one hundred, will be nearly four times as long as that of two such numbers of less than fifty places of figures…

Thus it appears that whatever may be the number of digits the Analytical Engine is capable of holding, if it is required to make all the computations with k times that number of digits, then it can be executed by the same Engine, but in an amount of time equal to k 2 times the former.

The trick of reducing the number of multiplications from four to three, and the resulting improvement in how the running time scales with the number of digits, is the sort of thing that Babbage would have loved. We will spend more time with Mr. Babbage in Chapter 7.

The first O(n^(log_2 3)) algorithm for multiplying n-digit integers was found in 1962 by Karatsuba and Ofman [447]. However, the fact that we can reduce the number of multiplications from four to three goes back to Gauss! He noticed that in order to calculate the product of two complex numbers,

(a + bi)(c + di) = (ac − bd) + (ad + bc)i

we only need three real multiplications, namely ac, bd, and (a + b)(c + d), since we can get the real and imaginary parts by adding and subtracting these products. The idea of [447] is then to replace i with 10^(n/2), and to apply this trick recursively.
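To make the recursion concrete, here is a minimal Python sketch of the resulting scheme (our own illustration, not code from [447]; the base case and variable names are arbitrary choices). It splits each number into high- and low-order halves and uses Gauss's trick to get by with three recursive products:

    def karatsuba(x, y):
        # Multiply nonnegative integers with three recursive products per level.
        if x < 10 or y < 10:                       # single digits: multiply directly
            return x * y
        half = max(len(str(x)), len(str(y))) // 2
        a, b = divmod(x, 10 ** half)               # x = a * 10^half + b
        c, d = divmod(y, 10 ** half)               # y = c * 10^half + d
        ac = karatsuba(a, c)
        bd = karatsuba(b, d)
        cross = karatsuba(a + b, c + d) - ac - bd  # equals ad + bc, by Gauss's trick
        return ac * 10 ** (2 * half) + cross * 10 ** half + bd

    assert karatsuba(12345678, 87654321) == 12345678 * 87654321

Babbage's scheme corresponds to computing all four products ac, ad, bc, bd, giving the recurrence T(n) = 4 T(n/2) + O(n) and hence quadratic time; with three products we get T(n) = 3 T(n/2) + O(n) = O(n^(log_2 3)).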

Toom [778] recognized that we can think of multiplication as interpolating a product of polynomials as described in Problems 2.14–2.16, and thus achieved a running time of O(n^(1+ε)) for arbitrarily small ε. This is generally called the Toom-Cook algorithm, since Stephen Cook also studied it in his Ph.D. thesis.

In 1971, Schönhage and Strassen [718] gave an O(n · log n · log log n) algorithm. The idea is to think of an integer x as a function, where x(i) is its ith digit. Then, except for carrying, the product xy is the convolution of the corresponding functions, and the Fourier transform of their convolution is the product of their Fourier transforms. They then use the Fast Fourier Transform algorithm, which as we will discuss in Section 3.2.3 takes O(n log n) time. We outline this algorithm in Problems 3.15 and 3.16; we can also think of it as a special case of the Toom-Cook algorithm, where we sample the product of the two polynomials at the 2^r-th roots of unity in the complex plane. An excellent description of this algorithm can be found in Dasgupta, Papadimitriou and Vazirani [215].
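As a rough illustration of the convolution idea only (this is not the Schönhage–Strassen algorithm, which works over finite rings to keep the arithmetic exact; here we use a floating-point FFT via numpy, which we assume is available, so the sketch only works for modest sizes):

    import numpy as np

    def fft_multiply(x, y):
        # Multiply nonnegative integers by convolving their digit sequences.
        a = [int(d) for d in str(x)[::-1]]            # least-significant digit first
        b = [int(d) for d in str(y)[::-1]]
        size = len(a) + len(b)                        # room for the full product
        fa, fb = np.fft.rfft(a, size), np.fft.rfft(b, size)
        conv = np.rint(np.fft.irfft(fa * fb, size))   # digitwise convolution
        result, carry = 0, 0
        for i in range(size):                         # carrying restores decimal form
            c = int(conv[i]) + carry
            result += (c % 10) * 10 ** i
            carry = c // 10
        return result + carry * 10 ** size

    assert fft_multiply(31415926, 27182818) == 31415926 * 27182818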

In 2007, Fürer [302] improved this algorithm still further, obtaining a running time of n · log n · 2^(O(log* n)). Here log* n is the number of times we need to iterate the logarithm to bring n below 2; for instance, log* 65536 = 4 and log* 10^10000 < 5. Since log* n is “nearly constant,” it seems likely that the true complexity of multiplication is Θ(n log n).
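For the curious, log* n is easy to compute; a throwaway Python definition (our own, following the convention of iterating until the value drops below 2):

    from math import log2

    def log_star(n):
        # Iterated logarithm: how many times must we take log2 before n < 2?
        count = 0
        while n >= 2:
            n = log2(n)
            count += 1
        return count

    assert log_star(65536) == 4          # 65536 -> 16 -> 4 -> 2 -> 1
    assert log_star(10 ** 10000) < 5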

2.6 Matrix multiplication.

The problem of calculating matrix products also has a long and interesting history. Multiplying two n × n matrices requires n^3 multiplications if we use the textbook method, but algorithms that work in time O(n^α) have been achieved for various α < 3. In 1969, Strassen obtained the algorithm of Problem 2.17, for which α = log_2 7 ≈ 2.807. At the time we write this, the best known algorithm is that of Coppersmith and Winograd [202] with α ≈ 2.376.

(p.38) While clearly α ≥ 2 since we need Ω(n^2) time just to read the input, it is not known what the optimal value of α is. However, there is some very promising recent work on algebraic approaches by Cohn and Umans [190] and Cohn, Kleinberg, Szegedy and Umans [188]. These include reasonable conjectures which would imply that α = 2, or more precisely, that we can multiply matrices in time O(n^(2+ε)) for any ε > 0.
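For readers who want to see the seven products, here is a compact and deliberately unoptimized Python sketch of Strassen's recursion (assuming numpy and matrices whose dimension is a power of two; a practical implementation would fall back to ordinary multiplication below some cutoff):

    import numpy as np

    def strassen(A, B):
        # Multiply two n x n matrices with 7 recursive products per level (n a power of 2).
        n = A.shape[0]
        if n == 1:
            return A * B
        h = n // 2
        A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        M1 = strassen(A11 + A22, B11 + B22)
        M2 = strassen(A21 + A22, B11)
        M3 = strassen(A11, B12 - B22)
        M4 = strassen(A22, B21 - B11)
        M5 = strassen(A11 + A12, B22)
        M6 = strassen(A21 - A11, B11 + B12)
        M7 = strassen(A12 - A22, B21 + B22)
        return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                         [M2 + M4,           M1 - M2 + M3 + M6]])

    A = np.random.randint(0, 10, (8, 8))
    B = np.random.randint(0, 10, (8, 8))
    assert np.array_equal(strassen(A, B), A @ B)

Seven products instead of eight gives the recurrence T(n) = 7 T(n/2) + O(n^2), and hence α = log_2 7. These are one standard choice of Strassen's identities; Problem 2.17 may organize them differently.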

2.7 Moore's Law.

Gordon Moore, a co-founder of Intel, originally claimed in 1965 that the number of transistors in an integrated circuit roughly doubled each year. He later changed the doubling time to two years, and “Moore's Law” came to mean a similar claim about speed, memory per dollar, and so on. While clock speeds have recently leveled off, the real speed of computation measured in instructions per second continues to rise due to improvements in our computers' architecture, such as having multiple processors on a single chip, speeding up memory access by cleverly predicting what data the program will need next, and so on.

It could be argued that, at this point, Moore's Law has become a self-fulfilling prophecy driven by consumer expectations—now that we are used to seeing multiplicative improvements in our computers every few years, this is what we demand when we buy new ones.

Some technologists also believe that improvements in computing technology are even better described by Wright's Law. This states that as manufacturers and engineers gain more experience, technology improves polynomially as a function of the number of units produced. In this case, the exponential improvements we see are due to the exponential growth in the number of computers produced so far.

However, these improvements cannot continue forever without running up against fundamental physical constraints. If the current exponential growth in chip density continues, by around 2015 or 2020 our computers will use one elementary particle for each bit of memory. At these scales, we cannot avoid dealing with quantum effects, such as electron “tunneling” through potential barriers and jumping from one location to another. These effects will either be a source of noise and inconvenience or, as we discuss in Chapter 15, a source of new computational power.

2.8 Exponentials.

Some readers may find it jarring that functions of the form 2^(n^c) for any constant c are called simply “exponential.” However, allowing the exponent to be polynomial in n, rather than simply linear, gives the class EXP the same robustness to the input format that P possesses.

2.9 Elementary steps.

We need to be somewhat careful about how liberally we define the notion of an “elementary step.” For instance, Schönhage [717] showed that machines that can multiply and divide integers of arbitrary size in a single step can solve PSPACE-complete problems in polynomial time (we will meet the class PSPACE in Chapter 8). Hartmanis and Simon [364] found that the same is true of machines that can perform bitwise operations on strings of arbitrary length in a single step. Finally, Bertoni, Mauri, and Sabadini [108] showed that such machines can solve #P-complete problems, which we will meet in Chapter 13. A review can be found in van Emde Boas [790].

Both PSPACE and #P are far above P in the complexity hierarchy. Thus if we assume that arithmetic operations can be performed in constant time regardless of the size of the numbers involved, we lose our ability to draw distinctions between hard problems and easy ones. We get a more meaningful picture of complexity if we assume that the cost of arithmetic operations is logarithmic in the size and accuracy of the numbers, i.e., linear in the number of digits or bits that are being manipulated. And this assumption is certainly more realistic, at least where digital computers are concerned.
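A tiny experiment (ours, not from the text) makes the point: under the unit-cost assumption, n squarings look like n elementary steps, yet they produce a number with about 2^n bits, so the real work per “step” grows exponentially:

    x = 3
    for step in range(1, 11):
        x = x * x                        # one "elementary" multiplication
        print(step, x.bit_length())      # the bit length roughly doubles each time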

2.10 The history of polynomial time.

Computational complexity theory as we know it began with the 1965 paper of Juris Hartmanis and Richard Stearns [365], for which they received the Turing Award in 1993. Their paper defines classes of functions by how much time it takes to compute them, proves by diagonalization (as we will discuss in Chapter 6) that increasing the computation time yields an infinite hierarchy of more and more powerful classes, and notes that changing the type of machine can alter the running time from, say, Θ(n) to Θ(n^2). They also make what has turned out to be a rather impressive understatement: (p.39)

FIGURE 2.9: A graph is planar if and only if it does not contain either of these graphs as a minor.

It is our conviction that numbers and functions have an intrinsic computational nature according to which they can be classified… and that there is a good opportunity here for further research.

At around the same time, the idea that polynomial time represents a good definition of tractable computation appeared in the work of Cobham [181] and Edmonds [258]. Cobham says:

The subject of my talk is perhaps most directly indicated by simply asking two questions: first, is it harder to multiply than to add? and second, why? … There seems to be no substantial problem in showing that using the standard algorithm it is in general harder—in the sense that it takes more time or more scratch paper—to multiply two decimal numbers than to add them. But this does not answer the question, which is concerned with … properties intrinsic to the functions themselves and not with properties of particular related algorithms.

He goes on to define a class of functions that can be computed in polynomial time as a function of the number of digits of their input, and recognizes that changing from one type of machine to another typically changes the power of the polynomial, but preserves polynomiality overall.

Edmonds studied a polynomial-time algorithm for MAX MATCHING, an optimization problem that asks how to form as many partnerships as possible between neighboring vertices in a graph. He presents his result by calling it a good algorithm, and says:

There is an obvious finite algorithm, but that algorithm increases in difficulty exponentially with the size of the graph. It is by no means obvious whether or not there exists an algorithm whose difficulty increases only algebraically [i.e., polynomially] with the size of the graph. The mathematical significance of this paper rests largely on the assumption that the two preceding sentences have mathematical meaning…

He then proposes that any algorithm can be broken down into a series of elementary steps, and that once we agree on what types of steps are allowed, the question of whether an algorithm exists with a given running time becomes mathematically well-defined.
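As a small aside on MAX MATCHING itself, polynomial-time matching algorithms in the spirit of Edmonds' are now standard library fare. For instance, assuming the networkx package is available, a maximum-cardinality matching can be computed like this (a usage sketch, not Edmonds' own formulation):

    import networkx as nx

    # The Petersen graph has 10 vertices and a perfect matching;
    # a blossom-type algorithm finds it in polynomial time.
    G = nx.petersen_graph()
    matching = nx.max_weight_matching(G, maxcardinality=True)
    print(len(matching))   # 5 edges, i.e. a perfect matching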

For an even earlier discussion of whether mathematical proofs can be found in polynomial time, see the 1956 letter of Gödel to von Neumann discussed in Section 6.1.

2.11 The Robertson-Seymour Theorem.

There are strange circumstances in which we can know that a problem is in P, while knowing essentially nothing about how to solve it. To see how this could be the case, let's start with a simple graph property.

A graph is planar if it can be drawn in the plane without any edges crossing each other. Kuratowski [501] and Wagner [800] showed that G is planar if and only if it does not contain either of the graphs K_5 or K_3,3 shown in (p.40) Figure 2.9 as a minor, where a minor is a graph we can obtain from G by removing vertices or edges, or by shrinking edges and merging their endpoints. With some work, we can check for both these minors in polynomial time. While this is far from the most efficient algorithm, it shows that PLANARITY is in P.
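As an aside, one would not test planarity this way in practice; assuming the networkx package is available, an off-the-shelf planarity test (which uses a different, constructive algorithm) is a one-liner, and K_5 and K_3,3 behave as advertised:

    import networkx as nx

    # K_5 and K_3,3 are the two excluded minors for planarity.
    for G in (nx.complete_graph(5), nx.complete_bipartite_graph(3, 3)):
        print(nx.check_planarity(G)[0])       # False, False

    # Removing any single edge from K_5 makes it planar again.
    H = nx.complete_graph(5)
    H.remove_edge(0, 1)
    print(nx.check_planarity(H)[0])           # True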

Planarity is an example of a minor-closed property. That is, if G is planar then so are all its minors. Other examples of minor-closed properties include whether G can be drawn on a torus with no edge crossings, or whether it can be embedded in three-dimensional space in such a way that none of its cycles are knotted, or that no two cycles are linked. For any fixed k, the property that G has a vertex cover of size k or less (see Section 4.2.4) is also minor-closed.

Wagner conjectured that for every minor-closed property, there is a finite list {K_1, K_2,…} of excluded minors such that G has that property if and only if it does not contain any of them. After a series of 20 papers, Neil Robertson and Paul Seymour proved this conjecture in 2004 [689]. Along the way, they proved that for any fixed K, we can check in O(n^3) time whether a graph with n vertices contains K as a minor.

As a result, we know that for any minor-closed property, the problem of telling whether a graph has it or not is in P. But Robertson and Seymour's proof is nonconstructive: it tells us nothing about these excluded minors, or how big they are. Moreover, while their algorithm runs in O(n^3) time, the constant hidden in O depends in a truly horrendous way on the number of vertices in the excluded minors K_i (see [433] for a review). We are thus in the odd position of knowing that an entire family of problems is in P, without knowing polynomial-time algorithms for them, or how long they will take.