(p.173) Appendix: On Complexity Issues
(p.173) Appendix: On Complexity Issues
One of the central theses of this book is that homologies may be established at the level of computational phenotypes, understood as abstract characterizations of the activity of the natural computational systems implemented in the central nervous systems of most animals. Computational homologies are therefore homology relations among the models of computation that may be realized in the said nervous systems “under every variety of form and function” (Owen 1843). Since in our characterization of computational phenotypes we have been appealing to data about computational complexity coming from two different sources—the Chomsky Hierarchy and the theory of computational complexity, we would like to devote this appendix to present a more detailed exposition of these theories, focusing, in particular, on what specific aspects of computation they concentrate on, how the two relate to each other, and how they are applied in our model.
A very important point to be made from the outset is that both theories are concerned with the study of “languages,” which is a technical term in mathematics referring to sets (also in the technical sense) represented in a particular format. A “language” in this sense is a (possibly infinite) set of strings, where a “string” is a concatenation of symbols taken from a finite alphabet. The motivation for this focus on languages is that any set we can think of may be represented in language form, thereby facilitating the study of its properties for reasons that should become clear as the following presentation unfolds. Thus, for example the set of integers ℤ = {…, -1, 0, 1,…} may be represented in binary by the language made up of strings over the alphabet Σ = {0, 1} as ℤ = {…, 10000001, 00000000, 00000001,…}, for an 8-bit system.
Also, both theories are interested in classifying languages into equivalence classes and to find out how these classes relate to each other. Complexity classes are therefore sets of languages sharing some specific property. Classes are organized hierarchically such that both the Chomsky Hierarchy and the hierarchy of computational complexity are in fact sets of classes, and one important subject of study in both theories is to find out answers to such questions as: Is class A a subset of class B? Given that we know that class C is a subset of class D, is this subset relation proper or is it the case that both classes are equal? By offering definite answers to these questions, it is possible to have a better understanding of how each hierarchy is structured. Of course, each theory focuses on different properties of languages in order to determine their membership in a specific class and, thence, their specific position in the hierarchy. We shall discuss these in turn, starting with the Chomsky Hierarchy.
(p.174) The Chomsky Hierarchy is a hierarchy established on the basis of the structural complexity of languages. Thus, the traditional four classes, hierarchically organized as Type3 ⊂ Type2 ⊂ Type1 ⊂ Type0, define four degrees of increasing structural complexity, with the Type3 class being the lowest and Type0 being the highest. Recall that complexity classes are just sets of languages and that languages are themselves sets of strings defined over finite alphabets—there is no direct indication of the complexity of some language in its representation as a set of strings. Chomsky’s insight was that one can nonetheless determine the complexity of languages by studying the properties of those finite devices capable of generating them or, in other words, by setting the minimal requirements for a model of computation to be capable of generating some specific type of language. Chomsky defined two equivalent models of computation appropriate for this: Grammars and automata. Without getting into their formal definitions, one can think of grammars as finite sets of rules that, applied in a sequential manner, eventually yield a string of the language. Automata, on the other hand, are abstract machines that can be run such that after a number of well-defined steps they will output a string of the language. Alternatively, an automaton may be seen not as a generator, but as an acceptor, such that if it is given an arbitrary string it will output a “yes” or a “no,” signaling whether the string belongs to the language in question or not.^{1}
Chomsky also showed that in order to determine the basic structural differences that set apart the languages belonging to different classes in the hierarchy it was just necessary to introduce minimal adjustments to their respective models of computation. Thus, in the case of grammars, Chomsky demonstrated that by gradually relaxing the constraints on the format of rules, it became possible to generate languages of increasing complexity. Now we need to introduce some formal details to make this more explicit. The general format of a grammar rule is x → y, where x and y are strings of symbols, such that the interpretation of the rule is something like the following: “At this step of the derivation of the string, replace substring x with substring y.” Now, grammars have two types of symbols, terminals and nonterminals, where the set of terminals is the alphabet from which the strings of the language are defined, whereas nonterminals never show up in the final string, since they are just auxiliary symbols that help to drive the derivation. Thus, for example, the rule S → bQ contains a single nonterminal on its left-hand side and a terminal followed by a nonterminal on its right-hand side, where we follow the established convention of representing nonterminal symbols with capital letters. Now, suppose we are trying to (p.175) derive a string in some language and that we have reached a point in the derivation in which we have the string aaaS. Given the rule above, we can apply it to get the string aaabQ. The process will only end when all nonterminals have been successfully replaced with terminals. In principle, both the left-hand side and the right-hand side of a rule can be a string made up of any combination of terminals and nonterminals, but, by defining constraints on the composition of the left- and the right-hand strings, we get different types of grammars capable of generating different types of languages; this is summarized in table A.1.
There are a number of important things to note here. Recall, first, that from the mere inspection of a string we cannot decide whether it belongs to a specific set, since the string bears no record of its structural properties. The only way to decide this question in the affirmative or in the positive is by reference to some grammar or automaton. Thus, the question “Does string s belong to set C?” must in fact be posed as “Given grammar G or automaton A, does string s belong to set C?” This is a decision problem and, as we have just seen, decision problems can only be framed
Table A.1. Constraints on rules of grammars
Type of language |
Constraints on rules |
---|---|
Recursively enumerable |
x → y, where x and y are strings of both terminals and nonterminals. |
Context-sensitive (recursive) |
x → y, |x| ≤ |y|, where x and y are strings of both terminals and nonterminals.^{a} |
Context-free |
A → y, where A is a single nonterminal and y is a string of both terminals and nonterminals.^{b} |
Regular |
A → wB, A → w, where A and B are nonterminals and w is a string of terminals. |
Type 0 grammars have no constraints on their rules, hence the term “unrestricted” often used to refer to them; context-sensitive grammars require that the left-hand side string be no longer than the right-hand side one; context-free grammars require that the left-hand side be a single nonterminal; regular (also left- or right-linear) grammars additionally require that the right-hand side be either a string of terminals or a string of terminals followed by a single nonterminal (or in the reverse order if left-linear).
^{a} Strictly speaking, this constraint defines the rules for grammars capable of generating any recursive set. Context-sensitive grammars would add the following, slightly stronger, constraint: xAy → xwy, i.e., that the substring made up of the single nonterminal A rewrites as the substring w; note that the (possibly null) substrings x and y remain constant on both sides of the arrow and thus define the “context” in which A is to be rewritten as w, hence the term context-sensitive grammar.
^{b} In other words, a context-free rule is the result of requiring that the substrings defining the context in a context-sensitive rule be null. Any context-free grammar can moreover be normalized to the much more familiar format A → BC, A → a, where A, B, and C are nonterminals and a is a single terminal, such that there are rules introducing only nonterminal symbols and rules introducing only terminal symbols. Grammars in this format are said to be in Chomsky Normal Form.
Thus, if the structural complexity of a set can only be assessed with respect to one or another model of computation, it should not come as a surprise that Chomsky has always insisted on the fact that what is important in the study of a language (any language, natural languages included) is its grammar, not just the strings that make it up. This observation is at the core of the whole generative linguistics enterprise, since, as also pointed out by Chomsky, there are many possible different grammars one can think of capable of generating exactly the same stringset; we shall say in this case that these grammars are all “weakly equivalent” or that they have the same “weak generative capacity.” The challenge, when our focus of interest is natural language, is, therefore, which of all the imaginable weakly equivalent grammars is the one that really captures the actual structure of natural language expressions or, using Chomsky’s own words, which grammar is the descriptively adequate one. Note that descriptive adequacy is defined not just with respect to weak generative capacity but, rather, with respect to a stronger condition incorporating the notion of structural description. Thus, two weakly equivalent grammars are not necessarily also “strongly equivalent,” i.e., they do not necessarily assign the same structural descriptions to the strings of a set or have the same “strong generative capacity.” Note that the notion of strong generation somehow transcends the notion of model of computation, since grammars and automata, as defined in formal language theory, are only weak generators, not strong ones; they do not assign structural descriptions to strings in a set.^{2} Importantly, though, this does not make all the complexity results presented in the preceding discussion irrelevant, because another crucial outcome of Chomsky’s original work is that, whatever the descriptively adequate grammar for natural language eventually turns out to be, its power will be beyond that of context-free grammars, and, as shown in later work already reported in other chapters of this book, it most probably lies within the power of a mildly context-sensitive one. Therefore, the Chomsky Hierarchy sets a minimal lower bound of complexity for any language—also for natural language, which is our model of reference for other complexity results used in this book.
(p.177) That said, it should be clear by now why our computational phenotypes must be interpreted as abstract characterizations of models of computation implemented by natural computational systems. If we wanted to build a performance model of, say, a human’s or a Campbell’s monkey’s verbal behavior, or of a weaverbird’s knotting abilities, the model would, among other things, incorporate a descriptively adequate grammar for the system to be able to assign the correct structural descriptions to the corresponding representations or, in other words, to strongly generate the appropriate set of structural descriptions. Strong generation, however, presupposes weak generation and, therefore, a computational phenotype defines the minimal computational architecture for a performance model to be able to account for the complexity of the behavior it subserves.
Thus, along with the Chomsky Hierarchy of languages, we can assume that there exist two additional hierarchies, a hierarchy of grammars and a hierarchy of automata, corresponding to the models of computation capable of generating the languages in the said hierarchy of languages. Another important claim in this book is that models of computation—and, hence, also computational phenotypes—can be differentiated only by reference to the kind of memory regime they incorporate. This is perhaps not entirely obvious in our previous discussion on the different types of grammars, but a closer inspection of the constraints on rules presented in table A.1 reveals that this is in fact the case. Consider, for example, right-linear grammars, a type of regular grammars in which nonterminal symbols can only be introduced as the rightmost symbol of a string made up of just terminals. This constraint has the consequence that each step in the derivation is strictly dependent on the immediately preceding step, such that rules do not carry any information whatsoever of what was done in the preceding steps nor can they condition what will happen two or more steps ahead. If we now turn to context-free grammars, we see that the role of nonterminals is precisely that of carrying information of what has been done in order to determine what will have to be done a number of steps ahead.^{3} Similarly with the other grammars, although by letting the rules be more complex, also more complex operations can be carried out. The role of memory is probably also not obvious from the traditional characterization of automata. Thus, even though this is clear in the case of finite-state automata vs pushdown automata, the definition of a linear-bounded automaton for context-sensitive languages is not based on that of the pushdown automaton and this blurs the connection between the different types of automata.
Informally, a finite-state automaton (acceptor version) is defined as a mechanical device with an input tape and a finite control unit. In the former the input string is stored and the latter incorporates a read-only head that scans the input sequentially (p.178) from left to right. We assume that the grammar is somehow hardwired in the control unit and that it drives the unit towards the end of the input string as the head reads the symbols that make it up. If the unit reaches the end of the string and halts in a final state, we say that the automaton has accepted the string. A pushdown automaton is built over this basic architecture with the addition of the pushdown stack and an extra write-erase head capable of manipulating its contents. Thus, as the read-only head scans the input, the write-erase head can push and pop symbols onto and off the stack, respectively, but following a first-in-last-out regime. The conditions for acceptance are similar to those for the finite-state automaton.^{4} Now, a linear-bounded automaton is defined as a device with a finite control unit and a single read-write head pointing to the input tape. The unit can both move to the left and to the right and the head can perform any of the two operations on symbols of the input string, either just reading them or overwriting them with another (possibly the same) symbol, with the only constraint being that the movements of the control unit are limited by the length of the input; i.e., the head cannot go to the left of the first symbol nor to the right of the last one. This constraint has the net effect of imposing limitations on the working space of the device and it is in fact a limitation on memory, but it is hard to see the connection with the memory stack of the pushdown automaton, since a linear-bounded automaton is after all a Turing machine with memory limitations.
Our claims concerning memory find however stronger support in the work by David Weir on embedded pushdown automata and their generalization, which defines a progression of increasingly more complex automata where at each stage the degree of nesting of the store is increased to give the next member of the progression. Along with this hierarchy of automata, a hierarchy of languages is defined that is properly contained in the context-sensitive languages and which includes the context-free languages as its structurally simplest family. It is within this context that our claims concerning working memory need to be understood, since here it can more clearly be seen that the structural complexity of a language is a direct function of the sophistication of the storage type associated to the corresponding acceptor, which is always of the pushdown type. Our space of computational phenotypes could then be increased accordingly by including those phenotypes lying between CompuType3 and CompuType0 and equivalent to the corresponding automata within the pushdown hierarchy. It is eventually an empirical question which of these phenotypes would characterize some possible natural computational system and which would, along with CompuType0, fall within the class of theoretically possible but naturally impossible phenotypes. It may be the case, for example, (p.179) that constraints on the nature and structure of neural phenotypes impose limitations on the types of storage systems that a brain can implement, but we don’t know for the time being whether this is the case or not; this is an open question and a new avenue of research suggested by our proposals in this book.
What we know for sure is that CompuType0 is impossible because it is equivalent to a Turing machine and Turing machines have no memory limitations; indeed, they have an infinite amount of time and space to do their work. Note, then, that the Chomsky Hierarchy is only concerned with those languages that can be mechanically generated by a Turing machine (or some grammar), but recall that an important point in the study of languages concerns finding out whether a given string is a member in some specific set. Remember also that this can only be established by reference to some model of computation. Now, it happens that this kind of question is not always answerable, and that there is a class of languages within the hierarchy for which the decision problem has no solution. This point deserves some discussion, since here is where the Chomsky Hierarchy and the theory of computational complexity meet each other.
Let us take the Turing machine as our basic model of computation. Now, if we were to start playing the “decision game” by asking some appropriate Turing machine whether some arbitrary string s is a member of some language L in the Chomsky Hierarchy, we would find out that, for some languages, there will be at least a Turing machine M capable of answering “yes” if s ∈ L and “no” if s ∉ L. We will say in this case that the machine “decides” the language or that the language is “decidable;” if a language is decidable, we call it a “recursive language.” Note that the property of being recursive has an interesting consequence: We can build a Turing machine M′ that works in exactly the opposite way to M above, such that it will answer “yes” when s ∉ L and “no” when s ∈ L. In other words, machine M′ is able to decide the complement set of L, L′, which implies that the complement of a recursive set is also recursive, or that recursive languages are closed under complementation—a language is recursive if and only if its complement is also recursive.
We keep on playing the decision game and eventually we come across the following situation: There are certain languages for which a Turing machine is able to answer “yes” if s ∈ L, but that is unable to halt and answer “no” if s ∉ L. In this case we simply say that the Turing machine “accepts” the language and call the language “recursively enumerable.” Note that, by the definition of acceptance just given, any recursive language is also recursively enumerable, but not the other way around—some recursively enumerable languages are not recursive and are in fact undecidable. Now, if a language is not recursive, then, given the set-theoretic definition of recursive, its complement is not recursive either, indeed, it is not even recursively enumerable, because recursively enumerable languages are not closed under complementation: For a language and its complement to be both recursively enumerable, at least one of them has to be recursive.
(p.180) Thus, there are recursively enumerable languages that are not recursive and, even worse, there are languages that are not even recursively enumerable. This shows that the universe of languages very much transcends that of the Chomsky Hierarchy—there are languages that the Hierarchy does not even contemplate, but it most crucially shows that we can look at languages as if they were “problems.” Patently, one thing we want to know about a problem is if it is solvable at all, and we have just seen that some problems are not: The undecidable ones. There are many other things we may want to know about a problem apart from its solvability—we may want to know, for example, how hard it is to solve it. And here enters the theory of computational complexity.
At the beginning of this appendix we saw that this theory is also concerned with sets, and we have just seen that it looks at sets as if they were problems in order to determine how hard it is to solve them. To be sure, one important assumption of the theory of computational complexity is that any problem we can think of may be reduced to a language recognition problem in order to assess its complexity. Thus, the very same languages that fall under the focus of the Chomsky Hierarchy fall also under the focus of the theory of computational complexity, with the additional fact that the latter is also interested in languages within the realm of the undecidable and beyond, since some problems may indeed turn out to be not just hard to solve but actually unsolvable.
There are many ways to assess the hardness of a problem, its complexity, but the most common one in computational complexity is to determine the maximal amount of time and space resources consumed by a Turing machine for its solution. Thus, the theory of computational complexity assumes a fixed model of computation with reference to which complexity measures are defined, but not necessarily a fixed mode of computation: The same model (the Turing machine) operating in different modes (e.g., deterministic vs nondeterministic) may yield different complexity results, because nondeterministic devices are more powerful than deterministic ones and certain problems that do not find an efficient solution in deterministic mode may have one in nondeterministic mode.
Switching from one to another mode of computation is nevertheless too crude a measure of complexity, especially when the focus is on efficiency. In order to measure the complexity of algorithms, the theory of computational complexity also pays attention to the resources consumed by the device to solve the problem. The most common resources of interest (there are others) are time and space, which, for a Turing machine, are defined respectively as the number of steps to halting and the number of cells in the tape visited during the whole computation. Note, however, that Turing machines are devices with an unlimited amount of resources, meaning that, if a problem is decidable then a Turing machine will eventually halt, but possibly after having spent a much too large amount of time and space. Inasmuch as one is concerned with the efficiency of algorithms one needs hence to find out whether a (p.181) solution is reachable within some feasible time and/or space bounds. The usual practice in this case (and also the easiest result one can prove for an algorithm) is to determine an upper bound of complexity, i.e., the worst case beyond which efficiency severely decays. It is nonetheless also possible (but also much more difficult) to determine a lower bound of complexity, meaning that if we are able to locate the lower complexity bound for a problem in some class C, then any possible algorithm we might devise for it will always be at least as complex, i.e., that the problem in question is as hard as any of the hardest problems in C (it could be harder, but not easier); in those cases, we say that the problem is C-hard.
Back to complexity measures in terms of time and space, these are defined as functions on the length of the input string telling us the rate at which time or space grows as the length of the input grows, expressed, in the so-called “big oh notation,” as f(n) = O(g(n)), where n is the length of the input. These functions may be seen as limits where n approaches ∞ (there is no theoretical upper bound for the length of the input) and, consequently, telling us the rate at which time or space resources grow as the length of the input grows. While the rate of growth of the latter is always linear, that of the former need not be linear, but it may rather be polynomial (i.e., O(n^{c}), c a constant), exponential (i.e., O(c^{n}), c a constant), or even factorial (i.e., O(n!)). Thus, if for example we have an algorithm running in time O(n^{2}) and another running in time O(2^{n}), little difference will be observed for small values of n, but it is clear that the exponential one will soon become inefficient as exponential functions grow much faster than polynomial ones. Of course, some functions may identify algorithms showing better values than polynomial ones, such as linear (i.e., O(n)), logarithmic (i.e., O(log n)), or constant (i.e., O(c)) values. Now remember that, unless we are aiming at solving the more difficult task of setting a lower bound of complexity for some problem, these functions are to be interpreted as worst-case solutions fixing an upper bound of complexity. Thus, and assuming we were studying time bounds, the results for the two algorithms above should be interpreted in the sense that, in the worst case, we will reach a solution in a time equal to the square of the input in the polynomial case, but equal to 2 to the power of the length of the input in the exponential case. We will then classify the corresponding languages as members of the classes TIME(n^{2}) and TIME(${2}^{{n}^{1}}$), respectively.^{5} Given that the exponent in a polynomial may be any integer, there are infinitely many polynomial and exponential classes identifying more or less complex languages and defining a hierarchy of complexity within the polynomial and the exponential realm. These two hierarchies, defined as the union of all the polynomial classes on the one hand, and the union of all exponential classes on the other, specify two important complexity classes denoted respectively as P and EXP. Since P and EXP are themselves hierarchies, it follows that (p.182) some problems in P, for example, are harder than others. The hardest problems in P are the P-complete problems, where a C-complete problem, C a complexity class, is a problem that is at least as hard as any of the hardest problems in C (i.e., it is C-hard) and it is known to be in C. Complete problems are very important because they are “model” problems for which both a lower and an upper bound of complexity is known and these are defined by the class in which the problem belongs; in other words, a P-complete problem is at least as hard as any other problem in P but not harder. Accordingly, if we manage to show that some problem P_{n} we are studying is equivalent to some known complete problem P_{c},^{6} we will have an exact and precise characterization in complexity terms of the problem in question, since we will know that P_{n} is at least as hard as P_{c} and it may eventually be the case that it is also a complete problem (i.e., in the same class as P_{c}).
Now, P and EXP are both deterministic classes, meaning that any problem within these classes can be solved efficiently in, respectively, polynomial or exponential time by a Turing machine working in deterministic mode. Nondeterministic time classes are a bit different. This is mostly due to the fact that nondeterminism is still a poorly understood notion and that the definition of recognition is weaker for nondeterministic Turing machines than it is for deterministic ones. Given the fact that a nondeterministic Turing machine, at any point of the computation, has at least two choices to follow, time measures do not refer to all the possible steps in a single computation (these would be too many) and are calculated differently assuming that at least one path yields to acceptance. The most important nondeterministic time class is NP (for Nondeterministic Polynomial) and, like P, is defined as the union of all nondeterministic classes with polynomial characteristic functions. The discovery that some problem is in NP means that the Turing machine will reach a solution following some path and that the correctness of the solution can be verified through a succinct certificate (or polynomial witness) in polynomial time. The succinct certificate is an external device that can be consulted every time a “yes” state has been reached and that provides an efficient procedure to check the result.
Turning briefly to space complexity classes, these are constructed in exactly the same way as time complexity classes, with the proviso that space is taken to be a more costly resource than time because it can be reused. In the case of space, then, polynomial functions identify very hard problems, close to intractability. For this reason, when dealing with space, logarithmic or linear bounds are preferred over polynomial bounds, although the class that will be of interest for us here is, precisely, PSPACE (for Polynomial Space), which is a deterministic class.
(p.183) Finally, the four classes considered here are related by inclusion, composing the following hierarchy of increasing complexity: P ⊆ NP ⊆ PSPACE ⊆ EXP. Note that the inclusions are not known to be proper, meaning that the classes might turn out to be equal. Indeed, the question whether P = NP is one of the most important unsolved problems in complexity theory.
Now that we have a detailed definition of the two complexity hierarchies, the structural and the computational, let us see how the two are connected. Take first the case of natural language, which as we have seen is structurally more complex than context-free but less so than context-sensitive, with a structural complexity equivalent to that of a mildly context-sensitive language. Recall that these complexity results refer to sets and that these very same sets will appear as members of some computational complexity class represented as recognition problems. The problem CONTEXT-FREE RECOGNITION is in P; the problem CONTEXT-SENSITIVE RECOGNITION is in PSPACE;^{7} then, the problem MILD-CONTEXT-SENSITIVE RECOGNITION should fall somewhere in between, or within any of the two bounding classes—a conjecture that finds its confirmation in the results reported in Chapter 7 that language computations are NP-complete. Note that these inferences are possible because both theories deal with the very same entities (i.e., languages), but classify them according to different criteria. Therefore, two languages known to be in the same structural class (e.g., context-free) will fall within the same complexity class (i.e., P), and vice versa; two languages known to be in the same computational complexity class will necessarily fall within the same structural class. Similarly, if natural language is NP-complete and knot recognition is NP-complete too, then both problems are equivalent on computational complexity grounds, but, qua languages, they will be represented by sets of an equivalent structural complexity and, hence, each specifiable by an automaton/grammar also of equivalent complexity.
It is thus in this sense that for example our claims concerning the equivalence of language and knot recognition must be interpreted and that this implies the presence of at least a computational phenotype of a specific kind.
Further Reading
In the foregoing presentation we tried to provide an informal and accessible introduction both to formal language and automata theory and to computational complexity theory, hopefully sufficient to understand our proposals concerning computational phenotypes and their homologies. There may be readers, however, willing to go deeper into the details and it is for them that we compiled this brief bibliographical note.
(p.184) Chomsky’s original work on formal languages and automata is scattered across a number of papers published between 1953 and 1963, but Chomsky (1956a), Chomsky (1959b), and especially Chomsky (1963) remain perhaps as the most complete and comprehensive references. Chomsky (1957) is also a good summary of these results, in particular for his argument against the adequacy of context-free grammars. The issue of weak vs strong generative capacity is discussed at length in Chomsky (1965). For additional information on languages and automata, and the equivalence between grammars and automata, the reader may want to consult Hopcroft and Ullman (1979) and Lewis and Papadimitriou (1981), although the latter focuses mostly on regular and context-free languages. As for the theory of computational complexity, the paper by Fortnow and Homer (2003) offers a short historical survey of its main past and present concerns. For a more complete presentation, Papadimitriou (1994) is an excellent source of information, in particular for all concerning the issues of determinism vs nondeterminism, the relations among different complexity classes and their characterization, and for descriptions of some model problems in computational complexity theory. The differences between the structural or qualitative approach of formal language theory and the quantitative one of complexity theory are discussed in Hopcroft and Ullman (1979: Ch. 12) and Lewis and Papadimitriou (1981: Ch. 7); Chapter 12 of Hopcroft and Ullman’s book also contains a good introduction to time and space bounds that may be a good complement to that of Papadimitriou (1994). Finally, an excellent—and updated—text with a clear and comprehensive introduction to both formal language and complexity theories and the connections between each other is Webber (2008).
Notes:
(^{1}) There are other ways to conceptualize acceptance by an automaton without assuming that it actually outputs anything. For example, we can define acceptance as the situation occurring when the automaton enters a final state and halts, while rejection would correspond to either not halting or halting in a non-final state, for example. Later on we will see that we need to refine the notion of rejection and that the alternatives presented here are in fact not equivalent.
(^{2}) Thus, our discussion in Chapter 5, Section 5.2, of the differences between the strings a_{i}a_{j}a_{k}b_{k}b_{j}b_{i} and a_{i}a_{j}a_{k}b_{i}b_{j}b_{k} dealt with issues of strong generative capacity. Indeed, from the point of view of weak generative capacity both strings are the same, but the alternative structural assignment suggested by the subindices in the second string could only be captured by a grammar more powerful than a context-free grammar. Similarly, our discussion at the end of Chapter 7 (Section 7.3) of aural pattern-recognition experiments was motivated by the fact, presented in detail here, that many different grammars, even with different computational power, can weakly generate the same set of strings, so additional information is needed in order to assess exactly what computational model underlies the observed behavior.
(^{3}) This is most clearly seen in the demonstration that each context-free grammar is accepted by some pushdown automaton; see the Further reading section at the end of this Appendix for references.
(^{4}) But not exactly the same, since pushdown automata are nondeterministic machines and, unlike finite-state automata, their deterministic version is not equivalent to the nondeterministic one. Deterministic pushdown automata only accept a subfamily of context-free languages, lying properly between regular sets and the rest of the context-free languages.
(^{5}) In exponential classes, it is usually assumed that the base is fixed to 2, while the exponent is in fact a polynomial n^{k}, such that exponential growth is determined by variation on the value of k.
(^{6}) Where equivalence is demonstrated by the possibility of reducing the complete problem to the problem under study. Reductions are conversions of instances of some problem of known complexity to instances of the problem whose complexity we wish to determine.
(^{7}) It is in fact PSPACE-complete.