Issue #18 -- February 2007. Pp. 109-128
Copyright © by Paul Bowyer and Sorites
Facts, Truth, and Realism: Toward a Multi-level Theory of Knowledge
It is a striking fact of modern life that the predictions of our best scientific theories in the physical and biological sciences are routinely confirmed to a high level of accuracy. This is what makes our modern technology possible. The question of why scientific theories are able to make such accurate predictions has received considerable attention in recent years (e. g. Psillos, 1999; Sankey, 2001; Stanford, 2003). The purpose of this paper is to suggest a possible answer to that question. It will defend a version of what is called «scientific realism,» which says that theories make accurate predictions because they are approximately true.
Scientific theories are expressions of human thought. It is natural to think of thought as being largely imagination, and of imagination as consisting of images that resemble things in the world. On that hypothesis, theories can predict events by a sort of simulation, the way we predict the behavior of a jet plane in flight from the behavior of a model jet in a wind tunnel. But there are convincing arguments that this picture, sometimes called the «copy theory of representation,» is wrong (Cummins, 1989, pp. 27-34; Fodor, 1975, pp. 174-195).[Footnote 1] There is also good evidence, although in this case it is more controversial, that thoughts are more like sentences than images, and that the principle that causes one thought to succeed another is essentially logical inference from premises to conclusions.[Footnote 2] At the same time, for a theory to make accurate predictions, it must act as a sort of map to reality. We can liken thoughts to the dots on a map, and facts to the cities represented by the dots. Then the principle by which one thought leads to another must parallel that by which one fact leads to another, much as the lines between dots on the map must parallel the roads between cities. This kind of parallel is an isomorphism, so that in order to make correct predictions thoughts must be at least partially isomorphic with facts. Thus when we abandon the copy theory of representation, we are forced to search for another explanation of the relationship between thoughts and facts, which can tell us how thoughts or ideas in the mind, being sentences connected by logical inference, can be isomorphic with events in the world. The purpose of this paper is to supply that explanation.
I will argue in the remainder of this introduction that thoughts have a special capacity to mirror and anticipate facts because both thoughts and facts are subject to the rules of logic, that is, to the rules by which one idea «follows from» another.[Footnote 3] But in order to state this relationship in our theory we must be able to talk about facts, and to do that we need to incorporate our present well-established scientific theories into an overarching logical structure, containing a metatheory consisting of descriptions of the sentences of those scientific theories. Using this metatheory, we can view facts as sentences related by logical inference. At the same time, the language of thought hypothesis allows us to view the physical states of the human brain as encodings of sentences, where these encodings are related by a physical process that is equivalent to logical inference. Thus from the point of view of the metatheory there is an isomorphism between thoughts and facts, which means that there is an isomorphism between theories and facts, since theories are expressions of thought. We will see that this isomorphism provides a possible solution to certain problems in epistemology, such as the problem of representation (Cummins, 1989), and the semantics of the propositional attitudes (Quine, 1986). Our metatheory also suggests how statements about such concepts as «reference» and «truth» can be incorporated into a parsimonious theory that makes verifiable predictions about observable facts -- in this case the fact of the success of science. In the following sections, I will introduce a thought experiment by means of which we can understand in detail the nature of the isomorphism between mental sentences and facts. The last two sections include a brief defense of the theory and the conclusion.[Footnote 4]
I have given a central role to the relationship between sentences that we call «logical inference.» When we try to get a clear idea of the nature of this relationship, we find that it consists of the process of formal inference. Let us remind ourselves how formal inference works by means of a simple example. Consider how we would derive the conclusion «Socrates is mortal» from the premise «all men are mortal and Socrates is a man.» This premise is a combination of «all men are mortal,» which is called the «major premise,» and «Socrates is a man,» which is called the «minor premise.» In formal logic, we would paraphrase «all men are mortal» as «for all values of the variable x, if x is a man, then x is mortal.» We then match the phrase «x is a man» against «Socrates is a man,» comparing the symbols at corresponding positions, assigning the word «Socrates» as the value of the variable «x,» and then substituting this value in «x is mortal» to get the conclusion, «Socrates is mortal.» We see that formal logic acts something like an assembly line for sequences of symbols, processing old sentences into new ones according to mechanical rules. We can call such a series of sentences a «chain of inference.» Notice that our explanation of formal logic required us to describe the sentence «Socrates is a man» as consisting of the words «Socrates,» «is,» «a,» and «man,» in that order, since otherwise the phrase «corresponding positions» would have had no meaning. We also needed to know that the symbol «x» functions as a variable. Let us define the «description» of a sentence as the collection of all these statements taken together, that is, the collection of all the various pieces of information that are important to know about a sentence when we apply formal inference to it. Let us also define the phrase «higher logical level» as follows: one sentence is at a higher logical level than another if the first sentence is part of a logical system that contains a description of the second.
Thus the sentence, «`Socrates is mortal' consists of the words `Socrates,' `is,' and `mortal,' in that order,» is at a higher logical level than the sentence, «Socrates is mortal.»
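The matching-and-substitution procedure of the «Socrates is mortal» example can be given a concrete, mechanical form. The following sketch is purely illustrative: the function names, the tuple representation of sentences, and the set of variable symbols are my own assumptions, not part of the theory.

```python
# A minimal sketch of formal inference as mechanical symbol processing,
# following the «Socrates is mortal» example. Sentences are tuples of
# words; «x» is the only symbol treated as a variable here.

VARIABLES = {"x"}  # symbols that function as variables

def match(pattern, sentence):
    """Compare the symbols at corresponding positions; return the
    variable assignments, or None if two constants disagree."""
    if len(pattern) != len(sentence):
        return None
    bindings = {}
    for p, s in zip(pattern, sentence):
        if p in VARIABLES:
            bindings[p] = s
        elif p != s:
            return None
    return bindings

def substitute(template, bindings):
    """Replace each variable in the template with its assigned value."""
    return tuple(bindings.get(sym, sym) for sym in template)

# Major premise: for all x, if x is a man then x is mortal.
antecedent = ("x", "is", "a", "man")
consequent = ("x", "is", "mortal")

# Minor premise:
minor = ("Socrates", "is", "a", "man")

bindings = match(antecedent, minor)           # {'x': 'Socrates'}
conclusion = substitute(consequent, bindings)
print(" ".join(conclusion))                   # Socrates is mortal
```

Note that the program never consults the meanings of «Socrates» or «mortal»: the derivation depends only on the description of the sentences, that is, on which symbol occupies which position.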
If we want to develop a theoretical approach that can deal with the isomorphism between theories and facts, we must be able to refer to such things as «theories» and «facts,» for the same reason that we must be able to refer to atoms and electrons if we wish to discuss chemistry. We can think of facts as whatever it is that causes our experience. In science, they are whatever makes our experiments and other observations turn out the way they do. It is a difficult question how to refer to facts in a theory (we will return to this question later), but let us provisionally identify facts with the sentences of an idealized theory of nature -- the ultimate theory that accounts for all observations (see Peirce, 1878/2001). When we need, for expository purposes, to identify particular facts, we will equate them with sentences of our well-established physical and biological theories, since these are the sort of thing that, in everyday life, we provisionally regard as facts. Because our metatheory consists of descriptions of those sentences that we have chosen to call «facts,» we are thus able to refer to facts in our metatheory. Notice that scientists, when they criticize and test their theories, must form descriptions of the sentences of those theories in order to determine whether those sentences are consistent with each other and what their logical implications are. They must do this whether or not they are clearly aware that that is what they are doing. Thus our metatheory resembles a formal statement of what scientists say about their theories when they criticize them. For example, one of our physical facts will be «electrons have a negative charge,» which we can translate into a somewhat more formal language as «charge electron negative,» putting «charge» as the predicate and «electron» and «negative» as the arguments. 
Now we can say that the description of this sentence in our metatheory will include the sentences, «The sentence `charge electron negative' consists of the symbols `charge,' `electron,' and `negative,' in that order,» «the predicate of the sentence is `charge,'» «the first and second arguments of the predicate are `electron' and `negative,'» and so on.
We have seen how we can talk about facts in our metatheory, but we also need to be able to talk about theories. In this connection the word «theory» refers, not to those idealized statements that we regard as facts, but to hypotheses that we might want to test. Let us refer to speculative hypotheses of this kind as «thoughts,» since they are the thoughts of scientists trying to understand the causes of some set of phenomena. What role should thoughts play in our overall theoretical structure? I will adopt here the «language of thought» (or LOT) hypothesis (Fodor, 1975), because I think it is by far the simplest explanation we have of human behavior. According to that hypothesis, as I will use it here, the cognitive states of the human brain can be thought of as sentences in a mental language, or «mental sentences.» But if this view is correct, then there are physiological states of the brain that can be described at a certain level of abstraction as sequences of symbols that form sentences. (In much the same sense we refer to certain states of the memory of a computer as encodings of sentences.) Since descriptions of the state of the brain belong to neurophysiology, which is part of biology, a statement like, «Einstein believed that electrons have a negative charge» is really a statement about the structure of Einstein's brain, and so is part of the logical system of statements that constitute the science of biology. Here the sentence, «Electrons have a negative charge,» is being used to identify a biological state of Einstein's brain.
Because we know so little about how information is encoded in the human brain, we need to turn to computers to see how it is possible for a sentence to identify a state of a physical system. We can encode the symbols in a computer's memory as sequences of «on» and «off» settings of electronic switches. For example the letter «A» might be encoded as the pattern «11000001,» where «1» means «on» and «0» means «off.» By arranging these encodings of letters into longer sequences we can encode words and other symbols, and whole sentences. If an engineer wants to describe a certain state of the computer's memory, she can express it as a certain pattern of ones and zeroes, but she can also express it, for example, by saying that it contains encodings of the words «charge,» «electron,» and «negative,» in that order, and that «charge» is the first symbol in the sequence, and so on. In other words, the engineer describing the state of the computer's memory can use very much the same types of expressions as our metatheory uses to describe the facts. Furthermore, if our computer is programmed to carry out the procedures of formal inference as described in the «Socrates is mortal» example, then one of the principles by which its internal states succeed one another will be formal inference. And according to the view presented in this paper, the internal states of the brain that we refer to as thoughts are analogous to the internal states of our computer, and so they also sometimes succeed one another according to the rules of formal inference. But since we view facts as sentences of idealized theories, they are also connected by formal inference. Thus we have reestablished the isomorphism between thoughts and facts that was lost when the copy theory of representation was abandoned.
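The engineer's two descriptions of the same memory state can be made concrete as follows. In this sketch the particular eight-bit codes, and the function names, are my own illustrative assumptions; any consistent encoding would serve the argument equally well.

```python
# A sketch of how a sentence can be identified with a discrete state of
# a machine: each symbol is encoded as an eight-bit pattern, and the
# same memory state can be described either as ones and zeroes or as a
# sequence of symbols in positions. The codes below are arbitrary.

ENCODING = {"charge": "10000000", "electron": "01100100", "negative": "11001000"}
DECODING = {bits: sym for sym, bits in ENCODING.items()}

def store(sentence):
    """Return the memory state (a bit string) that encodes the sentence."""
    return "".join(ENCODING[sym] for sym in sentence)

def describe(memory):
    """Describe the memory state the way the engineer does: as symbols
    occupying numbered positions."""
    chunks = [memory[i:i + 8] for i in range(0, len(memory), 8)]
    return [(DECODING[c], n + 1) for n, c in enumerate(chunks)]

state = store(["charge", "electron", "negative"])
print(state)            # 100000000110010011001000
print(describe(state))  # [('charge', 1), ('electron', 2), ('negative', 3)]
```

The point of the sketch is that «the memory contains the words `charge,' `electron,' and `negative,' in that order» and «the memory holds this pattern of ones and zeroes» are two descriptions of one and the same physical state.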
Our complete theory thus has a structure made up of three logical levels, the metatheory, the fact level, and the thought level, each of which can be viewed as a set of sentences. The sentences of the metatheory constitute descriptions of those of the fact level, and some of the sentences of the fact level are descriptions of the sentences of the thought level. However, although for clarity in introducing the theory I have spoken of the fact level as composed of sentences, it should properly be thought of as sets of possible worlds, as those are described in intensional semantics (Carnap, 1956, Montague, 1974). That is, everything we say in the metatheory should properly be translated into the language of intensional semantics, with sentences being translated as sets of possible worlds, predicates as functions from possible worlds to sets of n-tuples, and so on. Of course the word «fact» should only be applied to those sets of possible worlds that correspond to true sentences, that is, those sets that contain the actual world (the word «proposition» can be used to refer to any set of possible worlds, whether the sentence it corresponds to is true or false). These are what we should regard as the causes of our experience. We will find that none of the predictions of our three-level system depend on identifying any specific fact. They require only that we posit that sets of possible worlds exist and have certain properties. Thus we do not need to assume an idealized ultimate theory, but only that the sentences of our theories can become approximately isomorphic to sets of possible worlds that contain the actual world. This means that we should regard the objects that are spoken of in intensional semantics as real things. 
The sentences of our physical and biological theories are true to the extent that they correspond to sets of possible worlds that contain the actual world, and in this way our system inherits all their predictions, while making the additional prediction that those theories will be successful under the proper conditions. But because intensional objects are isomorphic with sentences and their parts, I will continue to speak here as though facts were sentences.Foot note 5
I want to make a very important point here -- that one sentence can imply another only if they are at the same logical level. For example, in a physiological theory of how the brain perceives objects, there will be a logical chain of inference leading from a description of the perceived object to a description of the resulting brain state, which, I contend, can be translated into a description of a set of mental sentences. The point to note is that the description of the object is logically connected, not to the mental sentences themselves, but to a description of them. Thus any given axiomatic system is confined to a single logical level. There is a sense in which a higher level can contain a sentence at a lower level, but only in the sense that the higher level contains a description of the lower-level sentence. This is, I believe, the nature of the embedding of one sentence in another that we see in the propositional attitudes. For example, if we state it as a fact that Einstein believed that electrons have a negative charge, we mean that the fact level contains a description of the sentence «electrons have a negative charge» as part of its description of the state of Einstein's brain.
Our three-level system can make the metaprediction, or prediction about other predictions, that theories that are strongly supported by evidence will tend to make accurate predictions, because of the isomorphism between theory and fact (compare Cummins' [1989, Chapter 8] concept of an isomorphism between mental states and their «interpretations»). With regard to the truth of scientific theories, which is viewed as a relation of isomorphism between the thought and fact levels, our system is a form of correspondence theory (see Kirkham, 1992). But with regard to the system as a whole I take an «instrumentalist» approach (see, e. g., van Fraassen, 1989), since our reason for accepting it is that it provides the simplest account of a set of observable facts -- the success of scientific theories. It is therefore not subject to Putnam's (1981) criticism that correspondence theories require us to take a «God's eye view.» Our isomorphism between thoughts and facts is analogous to the isomorphism between a formal system and its Goedel numbers, used by Kurt Goedel in his proof of the incompleteness of arithmetic (Nagel & Newman, 1986), in that our metatheory corresponds to the metalanguage in which Goedel's isomorphism is described, our fact level to the formal system that is the subject of Goedel's proof, and our thought level to the Goedel numbers themselves. It is also related to Tarski's (1983) theory of truth, in that our thought level corresponds to his «object language,» our fact level to his «metalanguage,» and our metatheory to his «metametalanguage.» I think that a number of difficulties with Tarski's theory, pointed out by, for example, Soames (1999) and Field (2001), are due to the attempt to define truth in the metalanguage, and are resolved when we define it in the metametalanguage, which corresponds to our metatheory.
Before continuing, let us get a clearer understanding of the contents of the thought level, that is, what it means to have a mental sentence «encoded» in the brain. How can we justify talking about the human brain as if it were a sheet of paper with sentences written on it? To answer this question we must first recognize that a sentence is not an object but a state of an object, in the same way that the «on» or «off» position of an ordinary light switch is a state of the switch. The state of a computer's memory, for example, consists of bit patterns that are made up of the «on» or «off» settings of electronic switches that serve as memory elements. Each «on» or «off» setting of a memory element, represented by the digits «1» and «0» respectively, is an arrangement of its electrons that is measured by a quantity called the voltage. These settings, like those of a light switch, are discrete states in that the exact voltage does not matter to the functioning of the computer, but only whether the voltage falls in the «off» or the «on» range.[Footnote 6] We saw earlier that these bit patterns can be used to encode sentences. But the concept of a sentence is more abstract than that of a bit pattern, in that a sentence can be encoded in a number of ways -- as a stream of sounds in a spoken sentence, a stream of printed letters on paper, magnetized areas on a magnetic tape, and so on. I suggest that the brain is a discrete system, and thoughts are its mental sentences, making the brain what can be called a «sentential processor,» that is, a system to which the language of thought (LOT) hypothesis applies (see Aydede, 1997, for a discussion of the issues involved in defining the range of application of the LOT hypothesis). That it is possible to characterize the brain in this way is suggested, for example, by the work of Shastri and his associates (Shastri, 1999), and of Marcus (1998, 2001).
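The sense in which a bit is a discrete state can be illustrated in a few lines. The 1.5-volt threshold and the sample voltages below are my own arbitrary assumptions; the point is only that many slightly different physical states count as the same discrete state.

```python
# A sketch of the claim that a memory element's setting is a discrete
# state: the exact voltage does not matter, only which range it falls
# in. The threshold value here is an illustrative assumption.

def read_bit(voltage, threshold=1.5):
    """Map a continuous voltage onto a discrete «off» (0) or «on» (1) state."""
    return 1 if voltage >= threshold else 0

# Eight memory elements with slightly varying voltages nevertheless
# encode one definite bit pattern -- here, the pattern «11000001»
# used earlier as a possible encoding of the letter «A».
cell_voltages = [3.2, 3.1, 0.2, 0.1, 0.3, 0.2, 0.1, 3.0]
bits = [read_bit(v) for v in cell_voltages]
print(bits)  # [1, 1, 0, 0, 0, 0, 0, 1]
```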
The thought experiment
We will now examine closely the internal structure of our three-level theory, to see how our two formal chains of inference -- one at the thought level and one at the fact level -- can be constructed, and under what circumstances they will be isomorphic. We will look at the «internal clockwork» of our system to see how it can make metapredictions about the success of scientific theories. This can be done via a «thought experiment,» in which we apply the principles of our theory to a hypothetical situation which is designed to reveal certain consequences of the theory. We will choose our hypothetical situation so that it contains a sentential processor whose internal structure is known to us. For this purpose we will posit a robot named «Rob,» controlled by a computer, which is programmed to carry out formal logical inference on sentences encoded in its memory. In our posit, which belongs to the fact level of our three-level system, we describe both Rob's internal structure and his environment, which is a «blocks world» consisting of solid blocks in various arrangements. We can think of the posit as a sort of «user manual» for Rob the robot. In it we describe Rob as having a visual system consisting of a mechanical «eye» that can focus an image on an electronic «retina,» which sends to his computer a matrix of numbers representing the light intensity at each point on the image. This matrix is processed in Rob's computer by a program similar to Winograd's SHRDLU (see Dennett, 1991, pp. 92-93), containing a component developed by Winston that translates a matrix of intensities into a simple description of the scene that Rob is looking at, thus giving him a simple perceptual system. 
The output of this perceptual system is a set of bit patterns, each of which is a series of «on» or «off» settings of electronic switches that fits our definition of a «mental sentence.» These bit patterns are composed of shorter bit patterns called «symbols,» which are the basic units that are strung together to form Rob's mental sentences. We will assume that, like SHRDLU, his computer contains a «translation table» that pairs Rob's symbols (i.e., the symbols he uses as predicates and individual constants) with English words and so allows his mental sentences to be translated into English sentences. So, for example, if Rob's environment contains two blocks, with one shaped like a pyramid and the other like a cube, and the pyramid is on top of the cube, and if Rob is in a position to see this arrangement of blocks, then his visual system will produce a bit pattern that can be translated into English as «the pyramid is on the cube.» Since these bit patterns represent formal sentences subject to formal inference, a better translation (and more convenient for our purposes) would be the formal English sentence «(pyramid on cube).»
Let us use our thought experiment to give a more definite form to the ideas discussed in the previous sections. I said there that the metatheory contains descriptions of the sentences of the fact level, and that some of the sentences of the fact level were descriptions of sentences of the thought level. We want to examine how these descriptions are formulated and how they relate to each other. These relationships are quite complex, involving as they do not only descriptions of sentences, but also descriptions of those descriptions. But I believe that such an examination is the key to understanding the relationship between mind and reality, and the semantics of the propositional attitudes (Quine, 1986). We can formulate these descriptions of sentences as what I will call «positional descriptions,» which are a form of what Tarski (1983, pp. 156-157) calls a «structural descriptive name.» For example, the positional description of the formal sentence «(pyramid on cube)» says essentially «the sentence consists of the words `pyramid,' `on,' and `cube,' in that order.» We can express this as follows:
The posit contains a sentence we will refer to as «S1.»
The symbol in position 1 of S1 is «pyramid.»
The symbol in position 2 of S1 is «on.»
The symbol in position 3 of S1 is «cube.»
We can abbreviate this by adopting the convention of using «(pos S1 <pyramid> 1)» to mean «the symbol in position 1 of S1 is `pyramid.'» Now we have:
1) The posit contains S1
and (pos S1 <pyramid> 1) and (pos S1 <on> 2) and (pos S1 <cube> 3)
Here the angle brackets surrounding the words «pyramid,» «on,» and «cube» serve a function similar to quotation marks. The advantage of formulating the description in this way is that it conforms to the syntax of formal logic and lends itself easily to manipulation by the rules of logic. We should view sentence (1) as being contained in the metatheory, part of which consists of sentence-by-sentence descriptions of our posit, which is our user manual. Let us call this description of the sentences of the posit, contained in the metatheory, the «metaposit.»
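The conjuncts of a positional description such as sentence (1) can be generated mechanically from the described sentence. In this sketch the function name and the string format are my own, chosen only to mirror the «pos» notation of the text.

```python
# A sketch of building a positional description: one «pos» conjunct
# per symbol, with positions counted from 1, as in sentence (1).

def positional_description(name, symbols):
    """Return the conjuncts «(pos name <symbol> position)» for the
    sentence referred to by `name` and composed of `symbols`."""
    return [f"(pos {name} <{sym}> {n})" for n, sym in enumerate(symbols, start=1)]

print(positional_description("S1", ["pyramid", "on", "cube"]))
# ['(pos S1 <pyramid> 1)', '(pos S1 <on> 2)', '(pos S1 <cube> 3)']
```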
But it is also the case that the posit contains descriptions of Rob's mental sentences, which are bit patterns in Rob's computer. Let us see how these descriptions of Rob's mental sentences can be formulated as positional descriptions. An example of one of Rob's bit patterns might be «10000000/01100100/11001000.» Suppose that each series of eight bits, as I have indicated by the slashes, forms a piece of information that tends to be processed as a unit. Let us then develop a vocabulary for referring to these units, which we can call «symbols.» We can use the fact that each sequence of ones and zeroes can be interpreted as a number, so that «10000000» is equivalent to the number 128, «01100100» to 100, and «11001000» to 200. Let us form terms for Rob's symbols by placing a «B» in front of the corresponding number, so that we have «B128,» «B100,» and «B200» as terms for our three symbols. Then we can write the longer bit pattern composed of these symbols as «B128, B100, and B200, in that order,» or in our positional description format, as follows:
2) Rob contains BP1
and (pos BP1 B128 1) and (pos BP1 B100 2) and (pos BP1 B200 3)
Here we use «BP1» to refer to the entire bit pattern. (We have used the same predicate «pos» as in the metaposit, but we could have used a different term without affecting any of our conclusions.) Let us now take a further step to simplify our description of Rob's bit patterns, using our translation table. Suppose the translation table associates the symbol «B128» with the word «pyramid,» «B100» with «on,» and «B200» with «cube.» Then we can develop an alternative term for each of our symbols by enclosing the corresponding English word in angle brackets. For example, «<pyramid>» will refer to the same bit pattern as «B128.» Our positional description can then be expressed as follows:
3) Rob contains BP1
and (pos BP1 <pyramid> 1) and (pos BP1 <on> 2) and (pos BP1 <cube> 3)
Comparing sentence (3) with sentence (1), we see that our positional descriptions of Rob's bit patterns in the posit resemble our descriptions of sentences of the posit in the metaposit. This depended critically on our use of the translation table between Rob's mental symbols and English words. By using the translation table in this way we have temporarily bypassed an important problem, called by Quine (1960) the «indeterminacy of translation.» I will return to this problem later.
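The step from description (2) to description (3) is a purely mechanical use of the translation table. In the following sketch, representing each «pos» conjunct as a (sentence, symbol, position) triple is my own convention, as are the function name and the table itself.

```python
# A sketch of the translation table's role: trading the «B»-terms for
# Rob's symbols in description (2) for the angle-bracketed English
# words of description (3).

TRANSLATION = {"B128": "pyramid", "B100": "on", "B200": "cube"}

def translate(description):
    """Replace each B-term with the angle-bracketed English word paired
    with it in the translation table; positions are unchanged."""
    return [(s, f"<{TRANSLATION[term]}>", n) for (s, term, n) in description]

description_2 = [("BP1", "B128", 1), ("BP1", "B100", 2), ("BP1", "B200", 3)]
print(translate(description_2))
# [('BP1', '<pyramid>', 1), ('BP1', '<on>', 2), ('BP1', '<cube>', 3)]
```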
But Rob's mental sentences are bit patterns, and the sequence of the symbols they contain must depend on the set of electrical connections between the switches in Rob's computer that tells the system which symbol comes next in the sequence. These connections define the meaning of the word «position» and of the predicate «pos,» as these terms apply to the order of the symbols in Rob's mental sentences. On the other hand, the same predicate «pos,» when applied in the metaposit to sentences of the posit, refers to the left-to-right order of words on the printed pages of our user manual. But in a sense the predicate «pos» in our positional descriptions has the same meaning in both cases, because both the sentences of the posit and Rob's mental sentences are related to each other by formal inference. We can take advantage of this fact by a process which I will call «schematization,» which translates a positional description into a sort of «picture» of the sentence described. Let us see how our positional description in sentence (3) would be schematized. The rule we will follow has the following form:
If x, s, y, z, and w are variables, and the positional description has the form «(x contains s) and (pos s <y> 1) and (pos s <z> 2) and (pos s <w> 3),» then schematize it as «(x contains <y z w>).»
We simply take our terms for the three words in the described sentence, remove the angle brackets surrounding them, arrange them in the sequence prescribed by the positional description, and enclose the entire sequence in angle brackets. Thus the description in (3) would be schematized in the posit as follows:
(Rob contains <pyramid on cube>)
Notice that in the schema inside the angle brackets, we are using the order of words in the user manual as a sort of picture of the order of symbols in Rob's computer.
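The schematization rule itself can be sketched mechanically. The function name and the (symbol, position) pair representation below are my own illustrative choices.

```python
# A sketch of schematization: recover the described sequence of symbols
# from a positional description and print it between angle brackets,
# producing a «picture» of the described sentence.

def schematize(container, description):
    """description: (symbol, position) pairs, as in sentence (3)."""
    ordered = [sym for sym, pos in sorted(description, key=lambda pair: pair[1])]
    return f"({container} contains <{' '.join(ordered)}>)"

print(schematize("Rob", [("pyramid", 1), ("on", 2), ("cube", 3)]))
# (Rob contains <pyramid on cube>)
```

Note that the result is the same whatever order the «pos» conjuncts are listed in, since the positions, not the listing order, determine the schema.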
Now let us examine in some detail how the isomorphism comes about between the sentences of the posit and Rob's mental sentences. We saw that the isomorphism is due to the fact that the sentences of both fact and thought levels are related by logical inference. In our thought experiment, this means that the rules of inference apply both to the sentences of the posit and to Rob's mental sentences. We can give explicit form to these rules of inference as follows:
For-all σ ρ
IF for-all n, α, β
(IF (n <= max)
THEN (IF (pos σ α n) and (pos ρ β n)
THEN (α = β) or ((variable α) and (name β))))
THEN (match σ ρ)
For-all σ ρ α β
IF there-exists n
((match σ ρ) and (pos σ α n) and (variable α) and (pos ρ β n) and (name β))
THEN (assign σ ρ α β)
For-all φ ι α β
(subst φ ι α β)
IF AND ONLY IF
for-all n
(IF (pos φ α n) and (pos ι β n)
THEN (α = β) or (assign φ ι α β))
Complete rule of inference:
For-all ζ ι π ρ σ φ α β
IF (major-premise ζ π) and (minor-premise ζ ρ) and (antecedent π σ)
and (consequent π φ) and (assign σ ρ α β) and (subst φ ι α β)
THEN (logically-implies ζ ι)
These rules state in a formal way the procedure we applied in the «Socrates is mortal» example in the introduction. This particular statement of the rules of inference may not be free of errors, and is certainly not complete; for example, it does not deal with compound antecedents. Its purpose is only to provide an explicit statement of these rules, as a basis for discussion. Notice that these rules of inference have the special property of being rules for applying other rules. They operate on positional descriptions of the major and minor premises of a logical inference, and produce a positional description of the conclusion. To see how this works, consider a simplified example, using a principle which I will call the «support rule,» that might apply to Rob's world. The support rule states the trivial fact that if anything is on top of the cube then the cube supports it. It will serve as our major premise, and can be formulated as follows:
4) (For-all x if (x on cube) then (cube supports x))
We will apply our rules of inference to the support rule (remember that the rules of inference are rules for applying rules), in combination with the minor premise «(pyramid on cube)» to obtain the conclusion «(cube supports pyramid).» We then have the following variable assignments:
ζ = «(For-all x if (x on cube) then (cube supports x)) and (pyramid on cube)»
π = major premise = «(For-all x if (x on cube) then (cube supports x))»
ρ = minor premise = «(pyramid on cube)»
σ = antecedent of major premise = «(x on cube)»
φ = consequent of major premise = «(cube supports x)»
ι = conclusion = «(cube supports pyramid)»
The first of our rules of inference, called the «match rule,» concludes that the antecedent of the support rule «(x on cube)» matches the minor premise «(pyramid on cube)» because at every position, either they contain the same symbol or the first has a variable where the second has a name. The other rules of inference carry out the remainder of the inference process, assigning a value to each variable and substituting those values in the consequent of the major premise to get the conclusion. Notice particularly that the following part of the match rule:
(pos σ α n) and (pos ρ β n)
is the antecedent of a higher-order major premise that applies, in our example, to the higher-order minor premise «(pos S1 <x> 1) and (pos S2 <pyramid> 1),» where «(pos S1 <x> 1)» is part of the positional description of the antecedent of the support rule «(x on cube),» and «(pos S2 <pyramid> 1)» is part of the positional description of the minor premise «(pyramid on cube).» In the course of applying this higher-order major premise to the higher-order minor premise, we make a set of variable assignments, which include the following:
α = <x>, β = <pyramid>
Then we substitute these values into the consequent «(α = β) or ((variable α) and (name β)),» which becomes «(<x> = <pyramid>) or ((variable <x>) and (name <pyramid>)).» This statement is true, since <x> is a variable and <pyramid> is a name. The important fact to notice is that in applying our rules of inference we are carrying out, at a higher logical level, the same procedure as in the «Socrates is mortal» example, and that is the very procedure that is described by these rules. It is clear that if we want to describe in formal language the procedure we use in applying our rules of inference, that description must again take the form of these same rules of inference.
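The match-and-substitute procedure just described lends itself to a short sketch in code. The following Python fragment is only an illustration, not part of the formal theory: the function names are invented, and a «?» prefix marks variables in place of an explicit for-all quantifier.

```python
def match(pattern, sentence):
    """Match rule: at every position either the two symbols are the
    same, or the pattern has a variable where the sentence has a name.
    Returns a dict of variable assignments, or None if matching fails."""
    if len(pattern) != len(sentence):
        return None
    bindings = {}
    for p, s in zip(pattern, sentence):
        if p.startswith("?"):                  # p is a variable
            if bindings.setdefault(p, s) != s:
                return None                    # inconsistent assignment
        elif p != s:                           # p is a name; must agree
            return None
    return bindings

def substitute(consequent, bindings):
    """Substitute the assigned values into the consequent."""
    return tuple(bindings.get(sym, sym) for sym in consequent)

# The support rule: (for-all x if (x on cube) then (cube supports x))
antecedent = ("?x", "on", "cube")
consequent = ("cube", "supports", "?x")

bindings = match(antecedent, ("pyramid", "on", "cube"))  # minor premise
print(substitute(consequent, bindings))  # -> ('cube', 'supports', 'pyramid')
```

Applying `match` to the antecedent and the minor premise assigns «pyramid» to the variable «x»; `substitute` then yields the conclusion «(cube supports pyramid),» just as in the worked example above.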
Now we can see more clearly the reason for the isomorphism between the fact and thought levels, represented in our thought experiment by the isomorphism between sentences of the posit and Rob's mental sentences. This isomorphism depends on the fact that we find a copy of the rules of inference in the metaposit, which we can call the «higher-order rules of inference,» and another copy in the posit, which we can call the «lower-order rules of inference.» This gives concrete form to the idea that both facts and thoughts are related by logical inference. The higher-order rules of inference that we find in the metaposit describe the fact that the sentences of the posit must be logically consistent, and that any sentence that follows logically from sentences of the posit can be treated just as though it were itself written in our posit/user manual. The lower-order rules of inference that we find in the posit describe Rob's capacity for applying formal inference to his mental sentences. When bit patterns encoding the major and minor premise of a syllogism are presented to Rob's inference mechanism, his internal electronic connections are such that the bit pattern that encodes the conclusion of the syllogism is produced. Thus the lower-order rules of inference summarize certain facts about Rob's electronic connections and the laws of physics that apply to them.
To see the isomorphism more clearly, imagine that the support rule is a fact stated in the user manual and is also known to Rob, so that it is encoded as a mental sentence in his brain. We can list the sentences of the metaposit that would constitute a description of this situation as follows:
One-level inference:
(The posit contains R1) and (pos R1 <for-all> 1) and (pos R1 <x> 2) and (pos R1 <if> 3)
and (pos R1 <C1> 4) and (pos R1 <then> 5) and (pos R1 <C2> 6)
and (pos C1 <x> 1) and (pos C1 <on> 2) and (pos C1 <cube> 3)
and (pos C2 <cube> 1) and (pos C2 <supports> 2) and (pos C2 <x> 3)
(The posit contains S1) and (pos S1 <pyramid> 1) and (pos S1 <on> 2) and (pos S1 <cube> 3)
(The posit contains S2) and (pos S2 <cube> 1) and (pos S2 <supports> 2) and (pos S2 <pyramid> 3)
Two-level inference (abbreviating «(the posit contains S3)» as «(PC S3)»):
(PC S23) and (pos S23 <Rob> 1) and (pos S23 <contains> 2) and (pos S23 <RP1> 3)
and (PC S24) and (pos S24 <pos> 1) and (pos S24 <RP1> 2) and (pos S24 <<for-all>> 3) and (pos S24 <1> 4)
and (PC S4) and (pos S4 <pos> 1) and (pos S4 <RP1> 2) and (pos S4 <<x>> 3) and (pos S4 <2> 4)
and (PC S5) and (pos S5 <pos> 1) and (pos S5 <RP1> 2) and (pos S5 <<if>> 3) and (pos S5 <3> 4)
and (PC S6) and (pos S6 <pos> 1) and (pos S6 <RP1> 2) and (pos S6 <<CP1>> 3) and (pos S6 <4> 4)
and (PC S7) and (pos S7 <pos> 1) and (pos S7 <RP1> 2) and (pos S7 <<then>> 3) and (pos S7 <5> 4)
and (PC S8) and (pos S8 <pos> 1) and (pos S8 <RP1> 2) and (pos S8 <<CP2>> 3) and (pos S8 <6> 4)
and (PC S9) and (pos S9 <pos> 1) and (pos S9 <CP1> 2) and (pos S9 <<x>> 3) and (pos S9 <1> 4)
and (PC S10) and (pos S10 <pos> 1) and (pos S10 <CP1> 2) and (pos S10 <<on>> 3) and (pos S10 <2> 4)
and (PC S11) and (pos S11 <pos> 1) and (pos S11 <CP1> 2) and (pos S11 <<cube>> 3) and (pos S11 <3> 4)
and (PC S12) and (pos S12 <pos> 1) and (pos S12 <CP2> 2) and (pos S12 <<cube>> 3) and (pos S12 <1> 4)
and (PC S13) and (pos S13 <pos> 1) and (pos S13 <CP2> 2) and (pos S13 <<supports>> 3) and (pos S13 <2> 4)
and (PC S14) and (pos S14 <pos> 1) and (pos S14 <CP2> 2) and (pos S14 <<x>> 3) and (pos S14 <3> 4)
(PC S15) and (pos S15 <Rob> 1) and (pos S15 <contains> 2) and (pos S15 <BP1> 3)
and (PC S16) and (pos S16 <pos> 1) and (pos S16 <BP1> 2) and (pos S16 <<pyramid>> 3) and (pos S16 <1> 4)
and (PC S17) and (pos S17 <pos> 1) and (pos S17 <BP1> 2) and (pos S17 <<on>> 3) and (pos S17 <2> 4)
and (PC S18) and (pos S18 <pos> 1) and (pos S18 <BP1> 2) and (pos S18 <<cube>> 3) and (pos S18 <3> 4)
(PC S19) and (pos S19 <Rob> 1) and (pos S19 <contains> 2) and (pos S19 <BP2> 3)
and (PC S20) and (pos S20 <pos> 1) and (pos S20 <BP2> 2) and (pos S20 <<cube>> 3) and (pos S20 <1> 4)
and (PC S21) and (pos S21 <pos> 1) and (pos S21 <BP2> 2) and (pos S21 <<supports>> 3) and (pos S21 <2> 4)
and (PC S22) and (pos S22 <pos> 1) and (pos S22 <BP2> 2) and (pos S22 <<pyramid>> 3) and (pos S22 <3> 4)
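Every positional description above follows the same mechanical pattern, so a schematized sentence can be expanded into its description automatically. Here is a minimal Python sketch of that expansion; the function name is invented for illustration:

```python
def positional_description(label, symbols):
    """Expand a labeled sentence into its positional description:
    one containment statement plus one (pos label <symbol> n)
    conjunct for each symbol, numbered from 1."""
    parts = [f"(The posit contains {label})"]
    parts += [f"(pos {label} <{sym}> {n})"
              for n, sym in enumerate(symbols, start=1)]
    return " and ".join(parts)

print(positional_description("S1", ["pyramid", "on", "cube"]))
# -> (The posit contains S1) and (pos S1 <pyramid> 1) and (pos S1 <on> 2) and (pos S1 <cube> 3)
```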
Under the heading «one-level inference» we see the major premise, the minor premise, and the conclusion of the support-rule inference in the posit, as they would appear in the form of positional descriptions in the metaposit. Under the heading «two-level inference,» we see the description in the metaposit of the description in the posit of the mental sentences in Rob's computer that encode the two premises and the conclusion. (Here we have an example of a description of a description.) For simplicity, let us schematize these sentences. (The words in bold letters are those that will appear in the schematizations.) In the case of the sentences labeled «two-level inference,» we will use «double schematization.» We can illustrate how double schematization works as follows:
(pos S15 <Rob> 1) and (pos S15 <contains> 2) and (pos S15 <BP1> 3)
schematized as: S15 = <Rob contains BP1>
(pos S16 <pos> 1) and (pos S16 <BP1> 2) and (pos S16 <<pyramid>> 3) and (pos S16 <1> 4)
schematized as: S16 = <pos BP1 <pyramid> 1>
(pos S17 <pos> 1) and (pos S17 <BP1> 2) and (pos S17 <<on>> 3) and (pos S17 <2> 4)
schematized as: S17 = <pos BP1 <on> 2>
(pos S18 <pos> 1) and (pos S18 <BP1> 2) and (pos S18 <<cube>> 3) and (pos S18 <3> 4)
schematized as: S18 = <pos BP1 <cube> 3>
Those three schemas can in turn be schematized as: BP1 = <<pyramid on cube>>
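Schematization in the opposite direction, recovering a schema from a collection of positional facts, can be sketched the same way. The function name and the triple-based data layout below are my own, chosen only for illustration:

```python
def schematize(label, pos_facts):
    """Recover the schema of a labeled sentence from positional facts,
    given as (label, symbol, position) triples: select the facts about
    this label, order them by position, and join the symbols."""
    ordered = sorted((n, sym) for (l, sym, n) in pos_facts if l == label)
    return "<" + " ".join(sym for _, sym in ordered) + ">"

facts = {("S15", "Rob", 1), ("S15", "contains", 2), ("S15", "BP1", 3)}
print(schematize("S15", facts))  # -> <Rob contains BP1>
```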
As a result we obtain two parallel chains of inference, the first of which is derived from the sentences labeled «one-level inference» as follows:
5) The posit contains <For-all x if (x on cube) then (cube supports x)>
6) The posit contains <pyramid on cube>
7) The posit contains <cube supports pyramid>
and its parallel, from the sentences labeled «two-level inference» as follows:
8) The posit contains <Rob contains <For-all x if (x on cube) then (cube supports x)>>
9) The posit contains <Rob contains <pyramid on cube>>
10) The posit contains <Rob contains <cube supports pyramid>>
We see that there is an exact parallel between our chain of inference in the posit (as described in the metaposit) and our chain of inference linking Rob's mental sentences as they are described in the posit, those descriptions being in turn described in the metaposit.
Notice that schemas can provide us with an enormous simplification in the process of deriving logical inferences from positional descriptions. This is particularly striking in the case of sentences (8), (9), and (10), as compared to the sentences above labeled «two-level inference.» We see that one way to get from the premises to the conclusion is by means of an extremely complicated process of applying the higher-order rules of inference to a positional description of the lower-order ones, and by that means applying the lower-order rules to positional descriptions of mental sentences. But we can get the same result by applying our own capacity for logical inference directly to the schematized sentences inside the angle brackets. Thus schemas can serve as «contexts» in the sense in which that term is used in artificial intelligence (see Sowa, 2003; compare the concept of «situation» as used, for example, by Recanati, 2000). I contend that schemas are the normal way in which our brains represent positional descriptions, and that the propositional attitudes are expressions of these schemas, so that, for example, the sentence «(Rob contains <pyramid on cube>)» has essentially the same meaning as the sentence «Rob thinks that the pyramid is on the cube.» Notice that sentences (8), (9), and (10) are schematizations of positional descriptions that contain symbols in double angle brackets, such as «<<pyramid>>.» The symbol «<<pyramid>>» is a name in the metaposit of a name in the posit for one of the symbols in Rob's computer. That symbol in Rob's computer is in turn Rob's name for an object, the pyramid. Here we see the relationship between terms that occur inside the content of propositional attitudes and those occurring outside it (Quine, 1986, pp. 32-34).
Our higher-order inference rules provide the means of giving a definition in the metaposit of the relation of logical implication between sentences of the posit, which we can call «logically-implies1,» and the combination of the higher- and lower-order inference rules provides the means of defining the corresponding relation between bit patterns in Rob's computer, which we can call «logically-implies2,» using the subscript to distinguish these two uses of the word. We can then state the isomorphism in the metaposit by saying that whenever one of Rob's bit patterns logically-implies2 another, the sentence of the posit corresponding to the first logically-implies1 that corresponding to the second.
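This isomorphism can be pictured as a commuting square: translating one of Rob's internal codes into a posit sentence and then applying logically-implies1 gives the same result as applying logically-implies2 first and then translating. The following Python fragment checks this on a single inference step; the numeric codes, the translation table, and the one-step inference dictionaries are all invented for the example:

```python
# Translation table from Rob's internal codes to sentences of the posit.
translate = {0b01: "pyramid on cube",
             0b10: "cube supports pyramid"}

# One inference step at each level, given here as explicit relations:
# logically-implies2 over codes, logically-implies1 over sentences.
implies2 = {0b01: 0b10}
implies1 = {"pyramid on cube": "cube supports pyramid"}

# Isomorphism: whenever one code logically-implies2 another, the
# corresponding posit sentences stand in logically-implies1.
for code, next_code in implies2.items():
    assert implies1[translate[code]] == translate[next_code]
print("isomorphism holds on this fragment")
```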
We have not yet provided in our model for the fact that we are able to think in terms of these three levels. To represent that ability, we need to break the thought level into three sublevels, corresponding to the three levels of the model. Suppose, for example, that the metaposit contains the sentence, «It is a fact that Rob thinks that it is a fact that Sally thinks that the book is on the table,» which in our more formal terminology is, «The posit contains <Rob contains <the facts contain <Sally contains <book on table>>>>.» Then the sentence, «<<<<book on table>>>>,» will be placed in the lowest of these three sublevels of the thought level. Of course, the metaposit can also contain the sentence, «It is a fact that Rob thinks that it is a fact that Rob thinks that the book is on the table,» allowing him to attribute beliefs to himself, and thus to reason about the possibility of his being wrong, and about how he might go about correcting errors in his thinking. This makes the system capable of «metareasoning» (Russell and Wefald, 1991; Costantini, 2002). These multi-level mental sentences never need to take the form of positional descriptions, because they will always be in schematized form. The information provided by positional descriptions can always be obtained from schemas as needed. In the terms used by Cosmides and Tooby (2000), the sublevels of the thought level represent different levels of «decoupling.» Each of these five levels, the metatheory, fact level, and three sublevels of the thought level, can be divided into one or more sections, one section containing sentences about things, another descriptions of such sentences, another descriptions of descriptions, and so on. The relation of reference can then be represented as a relation between symbols at the top sublevel of the thought level and symbols in the corresponding section of the fact level.
Some objections to the theory
I have concentrated in this paper on explaining the three-level theory, rather than on defending it. I want here to respond briefly to some criticisms of the sort of theory presented in this paper. The first criticism is based on the concept of the indeterminacy of radical translation (Quine, 1960; Dennett, 1987, pp. 40-42), which says essentially that it is meaningless to talk of a correspondence between mental sentences and facts, because any brain state could always be translated into sentences in more than one way. But suppose that all meaningful logical systems share a subset of their axioms, and this subset contains all the undefined terms of the system. Suppose further that this subset of axioms has a high density, in Hayes' (1985a) sense that the ratio of statements to concepts is high. Then this subset could act as a sort of «Rosetta stone» for establishing the translation table between the languages in which different systems were expressed.Foot note 7 Hayes (1985a) calls such a subset of axioms a «kernel theory.» Let us call the concepts contained in it the «conceptual base.» It seems to me that recent research in artificial intelligence on formal theories of common-sense reasoning shows that such a conceptual base may exist, consisting of basic geometrical and physical axioms (e. g., Hayes, 1985b; Varzi, 1997; Davis, 1998; see also Smith and Casati, 1994, for a different perspective on this research).Foot note 8 The indeterminacy argument assumes without good reason that such a conceptual base does not exist. In response, we could reverse the argument and say that the fact that widely different cultures can communicate, and the demonstrated success of modern science, indicate respectively that mental sentences in different brains, and mental sentences and facts, do in fact share a conceptual base. 
As an example of how such principles might operate in everyday problem-solving, I was once faced with the problem of having to cut a bagel for my son's lunch using only a plastic fork. I solved the problem by repeatedly pushing the tines of the fork into the bagel at different spots and moving it back and forth to cut away the substance, although as far as I remember I had never before used a fork to cut something in that way. Using as a model the sort of commonsense reasoning discussed in Varzi (1997) and Davis (1998), we can explain my behavior by saying that I used my intuitive knowledge of geometry and physics to understand that I needed to cut away the soft material between the two halves of the bagel in order to separate them, and I knew that moving the fork in that way would necessarily accomplish that.
The language of thought hypothesis has also been criticized on the grounds that it would be too slow to account for the rapidity with which we are able to respond intelligently to circumstances (e. g., Dennett, 1998, pp. 86-87; Churchland, 1986, pp. 459-460). This criticism is contradicted by the work of Shastri and his associates (Shastri, 1999), who have developed a model demonstrating how a sentential processor can be implemented in a brain-like system in a way that accounts for the speed of human problem solving. Another objection (e. g., Dennett, 1998, pp. 279-281), that sentential processing cannot explain human emotion, is contradicted, for example, by the work of Ortony, Clore, and Collins (1988), and of Anderson and Lebiere (1998), who suggest how such an explanation could be developed. In connection with the treatment of emotion in our system, I am of course not saying that logical inference is the only kind of causal relationship that can exist between thoughts, but only that it is the logical connection between thoughts that makes it possible for them to be isomorphic with facts. Since thoughts are states of the brain, they can have properties, such as activity levels and degrees of belief, that do not affect their logical implications. Because we often do not know which premises are correct, but instead have to make the best of a mass of contradictory information, the control of the thought process that is exercised by emotions is necessary for our survival.
The apparent problem of externalism raised by Putnam's (1981) «Twin Earth» thought experiment can be answered if we recognize that the entry in our translation table for the word «I» is conditional, in that a token of this word refers to whatever system contains the sentence that contains that token, so that identical mental sentences in different brains can have different referents, just as identical thoughts in different minds can have different meanings. Fodor's (2001) objection to his own CTM theory that it cannot be a general theory of cognition fails to take into account the fact that, while each individual computation is local, each successive computation applies to a different overlapping domain, so that any global interaction can be computed in a finite number of steps.
I have argued in this paper that, in order to provide an adequate explanation of the success of scientific theories, we need to add a level of theory, which I call a metatheory, above that of our ordinary physical and biological theories. The metatheory makes it possible for us to describe the isomorphism between thoughts and facts, as well as the isomorphism between theories and facts, since theories are expressions of thought. This explains how theories can serve as maps to the facts, allowing us to make correct predictions. Consider what it would mean in our thought experiment for one of Rob's predictions to come true. The truth in Rob's world is whatever the posit says it is, much as the truth in a novel is whatever the author says it is. The purpose of the posit is to tell us what would happen if there really were a robot like Rob who lived in a blocks world environment like the one described. Under what circumstances would such a robot be able to make correct predictions about its environment? Suppose that one sentence of the posit says, «At 9:50 A. M. Rob contained the sentence <the cube will fall off the block at 10:00>,» and another says, «the cube fell off the block at 10:00.» Then the posit is telling us that Rob's prediction was confirmed, in other words, what he thought would happen did happen. So in general, we predict that a robot like Rob will make correct predictions if its mental sentences correspond with the relevant sentences of the fact level (which includes our posit). This amounts to a metaprediction (a prediction about other predictions) about the conditions under which predictions will be correct.Foot note 9 But this metaprediction could not have been stated or proved in the posit, because it requires us to be able to refer to the sentences of the posit, and this we can only do in the metaposit. 
Since our posit consists of the physical theory of Rob's environment and his internal electronics, our metaprediction of the success of Rob's predictions is a generalization that is true of our physical theory of Rob and his environment, but cannot be stated or proved within that physical theory. This suggests that our ordinary theories of nature have the property that mathematicians call «omega incompleteness,» which means that there are generalizations, each instance of which is a theorem, although the generalization itself is not a theorem. We see that including the additional level of the metaposit allows us to account for the success of predictions in a way that would otherwise not be possible. Notice that this isomorphism is implicit in our accepted scientific theories, since those theories predict that computers and other similar mechanisms will be capable of formal logical inference. (Consider, for example, a physical theory that includes a description of an exploding star as well as a description of a computer programmed to simulate the exploding star. We would see a parallel between the theory's predictions about the internal states of the computer and its predictions about the star, but the theory itself would not contain a general statement of this parallel.)
So far, I have shown only how it is possible to make accurate predictions if we know the correct premises, but not how we come to know what those premises are. How do we know, for example, that all men probably are mortal? I want to finish by suggesting very briefly what direction such an explanation might take. In science as well as in everyday life, we assign probabilities to hypotheses based on evidence. For these hypotheses to make reliable predictions, a combination of randomness and regularity is required: randomness to prevent the evidence from routinely containing spurious patterns, and regularity so that there are real patterns to discover. Under what circumstances can we expect our world to show the necessary combination of regularity and randomness? To see how this is possible, let us add a quantitative aspect to our metatheory, by assigning probabilities to possible worlds, forming what is called an «initial distribution.» This makes our metatheory into a version of the Bayesian approach to the theory of probability (see, for example, Earman, 1992; Howson, 2000; Franklin, 2001).
To see how this extension of our metatheory can help us, consider a simplified example, in which each of our possible worlds consists of a series of one thousand tosses of a fair coin, and we choose our initial distribution so that half of these worlds have heads on the first toss and half tails; half of the resulting two groups have heads and half tails on the second toss; half of the resulting four groups have heads and half tails on the third toss; and so on. Then the vast majority of these worlds will show the familiar statistical pattern of a series of coin tosses, with an apparently random sequence of heads and tails, about one half being heads. Notice that there is also regularity here, in that the coin always falls either heads or tails. The results depend on the symmetries of the coin and of the law of gravity -- that the coin is physically the same on both sides; that it is shaped so that it always falls on one side or the other; and that the same coin is used on every toss. So an initial distribution constructed according to simple rules has conferred on the vast majority of our possible worlds the properties of regularity and randomness that we were looking for.
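This construction is easy to carry out exhaustively for a small number of tosses. The following Python sketch uses ten tosses instead of one thousand; the tolerance chosen for «close to one half» (within two heads of five) is an arbitrary illustrative choice:

```python
from itertools import product

n = 10
# The initial distribution: all 2**n sequences of tails (0) and
# heads (1), each world receiving the same probability 2**-n.
worlds = list(product([0, 1], repeat=n))

# Regularity: every toss is heads or tails, by construction.
# Randomness: in the vast majority of worlds the number of heads
# is close to n/2.
near_half = sum(1 for w in worlds if abs(sum(w) - n / 2) <= 2)
print(len(worlds), near_half / len(worlds))  # -> 1024 0.890625
```

Already at ten tosses about 89% of the worlds have between three and seven heads; with one thousand tosses the concentration around one half is far sharper.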
I want to suggest very briefly, as a program for future investigation, that we look for a set of rules that produce in a similar manner an initial distribution that can account for the pattern of events in the actual world. These rules would, perhaps, resemble what is called «symmetry breaking» in modern physics (Pagels, 1985), where the randomness is introduced by broken symmetries and the regularity by those that remain unbroken. Such symmetries are expressed in terms of geometric distributions of matter or fields, so that their use in generating the initial distribution requires that all the possible worlds in our distribution share the principles of geometry and physics that make up our conceptual base. An initial distribution of this kind would confer on the vast majority of worlds the property that evidence obtained by the inhabitants of any given world would tend to be representative of the actual patterns that hold in that world. If we make the assumption that what holds in the vast majority of possible worlds holds also in the actual world, this gives us an explanation for the past success of science in our own world.
But can we predict that science will be successful in the future? Let us take our initial distribution as the basis for a definition of the concept of probability. To see the significance of having a definition of probability, consider that we call a factual statement «knowledge» when its probability is high enough that it can provisionally be treated as though it were certainly true (see Carnap, 1962). Thus if we know how to define probability we know how to justify our factual statements. In this way we introduce into our three-level theory a medicinal dose of metaphysics, the need for which is, I think, the main lesson to be drawn from Goodman's (1983) «new riddle of induction.» Because of their special role in the rules for constructing our initial distribution, we have singled out a priori the predicates contained in the conceptual base as, in Goodman's terms, «projectible.» (Projectibility is, after all, a symmetry between examined and unexamined cases.) We have also phrased our definition of probability so that it expresses a priori conditions under which the inhabitants of a world can come to an approximate knowledge of that world through the systematic collection and careful analysis of experience. We may say that our theory of truth is non-epistemic, whereas our theory of probability is epistemic (Kirkham, 1992). The isomorphism discussed in the previous sections of this paper accounted for our ability to apply knowledge to the making of accurate predictions. The addition of an initial distribution to our metatheory accounts for our ability to obtain that knowledge, in the form of statements of probability, by means of the scientific method.
[Foot Note 1]
Fodor (1975, p. 174) gives an excellent summary of the arguments against the doctrine that «thoughts are mental images and they refer to their objects just insofar as (and just by virtue of the fact that) they resemble them.» He does not deny that images may play an essential role in thought.
[Foot Note 2]
This is the «language of thought» hypothesis (Fodor, 1975), which is defended, for example, by Marcus (1998, 2001) and Pinker (1994). For arguments against the language of thought hypothesis see, for example, Churchland, P. S., 1986, pp. 386-399; Dennett, 1987, pp. 112-116; and Millikan, 1995.
[Foot Note 3]
For a related argument that sentential processing evolved because it has an adaptive advantage resulting from its correspondence with the structure of nature, see McGinn (1995).
[Foot Note 4]
There is not sufficient space in this paper to both explain the theory and give a proper defense of it. I will therefore concentrate on the explanation and provide arguments wherever possible.
[Foot Note 5]
What I call the «fact level» might better be called the «proposition level,» since it contains all sets of possible worlds whether or not they include the actual world. But I will continue to use the more evocative term «fact level.»
[Foot Note 6]
I have defined a «discrete state» by the all-or-nothing character of the response of the system to which the state belongs. But it should properly be defined in terms of the system's state space, which in the case of discrete states can be divided into compartments, so that the system's processes move its state from one compartment to the next in such a way that we can predict the next compartment it will enter without knowing the exact location of the state within the compartment it currently occupies.
[Foot Note 7]
In our three-level theory, Cummins' (1989) «specification problem» reduces to the problem of finding a translation table between the mental sentences of any given sentential processor on the one hand, and the sentences of the fact level on the other.
[Foot Note 8]
I do not mean to say that these authors draw the same conclusions I do. In fact Hayes (1985a, p. 17) expresses doubt that there is such a kernel theory for common-sense. But I think that Hayes' doubts are based on the fact that most common-sense concepts cannot be defined in terms of those of the conceptual base. My suggestion is, rather, that our qualitative concepts are efficient approximations to the correct quantitative theory that includes the kernel theory; that we use metareasoning to create and modify these qualitative approximations as needed; and that we can fall back on the underlying quantitative theory when necessary, for example when a husband and wife are confronted in a store with an unlabeled piece of furniture intermediate between a chair and a loveseat, and one says to the other, «Let's sit down on it and if we both fit, it's a loveseat!» Thus qualitative and quantitative concepts are not related as a building to its foundation, but as a ship at anchor to the sea-floor where the anchor rests.
[Foot Note 9]
In order for an organism to make use of such predictions, it must have the capacity for perception and action. Perception, when successful, creates mental sentences that correspond to observable facts, and action, when successful, does the opposite, starting from mental sentences of the kind we call «intentions,» and causing motions of the body that in turn bring about states of affairs that correspond to those intentions.