Logic

On The Logical Principles Of Scientific Explanation

Maarten Maartensz



Abstract: This paper clarifies some of the fundamental logical principles of scientific explanation. It does so by giving ten conditions for scientific explanation that are then explained and commented on, after which some more details are given about probability, abduction and induction. The paper presupposes some knowledge of set theory and of elementary probability theory. The requisite knowledge can be found in Halmos and Adams, or in Stegmüller, as listed in the literature.

Sections:

(0) Introduction

(1) On explanations
(1.1) There is a domain D, described by some set theory
(1.2) There is a language L that represents D
(1.3) Language L represents set D symbolically and numerically
(1.4) There is a set K of factual statements properly contained in L
(1.5) There is a set F of counter-factual statements properly contained in L
(1.6) T is properly contained in L
(1.7) T has no consequences in F
(1.8) T deductively entails something about D in L that is not in K and not in F

(1.9) T satisfies the abductive condition
(1.10) T satisfies the inductive condition

(2) On probability
(2.1) Measure theory and probability theory
(2.2) Six kinds of axiomatizations
(2.3) Five kinds of Interpretations
(2.4) Derivation of Kolmogorov's axioms from proportions

(3) On abduction
(3.1) Peirce on abduction
(3.2) The logical status of the abductive condition

(4) On induction
(4.1) Statistical induction and Bayesian reasoning
(4.2) Learning from experience
(4.3) The logical status of the inductive condition
 

(5) Summary and discussion

See also: Fundamental principles of valid reasoning; The measurement of reality by truth and probability; and Classical Probability Theory and Learning from Experience.



(0) Introduction:

What is a scientific explanation?

This paper provides an answer to this question, and does so in logical set-theoretical terms, by stating, proposing and explaining a number of conditions that a theory should satisfy to be considered a minimal scientific explanation.

Here I take "theory" as "a set of statements that describes some thing in some presupposed domain of things, about which there may be much or little knowledge"; I take "knowledge" as "true statements, verified by some methods that are taken for granted"; and I suppose the reader has some familiarity with standard set-theory.

The intuitive ideas about explanations I start from are these:

Explanations are sets of statements, about some thing that is somehow characterized and that is situated in some set of things somehow characterized. Valid scientific  explanations are consistent with the evidence, not false, entail deductively at least part of the evidence, and can be confirmed probabilistically and be refuted.

And the intuitive ideas about reasoning I start from are these:

There are three basic kinds of reasoning, where reasoning involves argumentation of any kind using assumptions and inferences of conclusions:

1. Deductions: To find conclusions that follow from given assumptions
2. Abductions: To find assumptions from which given conclusions follow
3. Inductions: To confirm or infirm assumptions by showing their conclusions do (not) conform to the observable facts.

Normally in reasoning all three kinds are involved: We explain supposed facts by abductions; we check the abduced assumptions by deductions of the facts they were to explain; and we test the assumptions arrived at by deducing consequences from them and then revising, by inductions, the probabilities of the assumptions through probabilistic reasoning when these consequences are verified or falsified.
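
As a rough illustration of how these three kinds of reasoning interlock, here is a minimal sketch in Python; the bean-bag scenario, the prior and the likelihoods are invented purely for illustration.

```python
# Toy illustration of deduction, abduction and induction (hypothetical numbers).

# Deduction: from "all beans in this bag are white" and "this bean comes from
# the bag" we conclude "this bean is white".
def deduce(all_beans_white: bool, bean_from_bag: bool) -> bool:
    return all_beans_white and bean_from_bag

# Abduction: given the observed fact "this bean is white", we propose an
# assumption from which it would follow.
def abduce(observed_white: bool) -> str:
    return "the bean was taken from the bag of white beans" if observed_white else "no hypothesis"

# Induction: we revise the probability of the assumption by Bayes' Theorem
# as its predicted consequences are verified or falsified.
def induce(prior: float, p_evidence_if_true: float, p_evidence: float) -> float:
    return p_evidence_if_true * prior / p_evidence

print(deduce(True, True))        # True
print(abduce(True))              # a proposed explanation of the observation
print(induce(0.5, 1.0, 0.75))    # 0.666...: the assumption is confirmed
```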


And indeed we shall see that all three kinds of reasoning are involved in explaining.

The conditions I want to propose are as follows, with my explanations following in as many sections.

(1) There is a domain D, described by some set theory S(D)
(2) There is a language L
(3) Language L represents set D symbolically and numerically
(4) There is a set K of true statements properly contained in L
(5) There is a set F of false statements properly contained in L
(6) There is a set T of theoretical statements properly contained in L
(7) T has no deductive consequences in F
(8) T entails something about D in L that is not in T and not in F

(9) T satisfies the abductive condition
(10) T satisfies the inductive condition


(1) On explanations:

Informally and intuitively every human being who can speak can explain, and knows in general terms how and when to do so:

One sets up explanations for presumed facts one does not know how to deduce from such knowledge one has, and one does so by inventing a number of assumptions from which one can deduce what one seeks to explain.

The basic difference between scientific and other explanations is that non-scientific explanations, even if they have been formulated carefully, tend to remain merely plausible in subjective terms and tend not to rely on or refer to explicit principles of inference. For this reason it is difficult to criticize them rationally: many of the assumptions that are used to generate or defend the explanations are either not explicitly formulated or are not rational to start with (but are often instead kinds of wishful thinking), and often it is not quite clear what the precise logical relation between assumptions and conclusions is.


(1.1) There is a domain D, described by some set theory:

D is the collection of things that comprises what the theory T is about, and is supposed to be describable by and also to some extent truly described by some standard set theory.

The main logical import of being described by some standard set theory is that then one has by way of the power set axiom a way of talking about all logical possibilities contained in D. (The power set axiom asserts the existence of the set of subsets of any set X. One standard formulation is: (X)(∃Y)(Y = {Z : Z ⊆ X}).)
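
As a small concrete illustration, here is a minimal Python sketch that enumerates the power set of a tiny, invented finite domain, i.e. all the logical possibilities one can form from its elements:

```python
from itertools import chain, combinations

def power_set(domain):
    """Return all subsets of a finite domain as a list of frozensets."""
    items = list(domain)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

D = {"a", "b", "c"}          # a hypothetical three-element domain
print(len(power_set(D)))     # 8 = 2**3 subsets, from the empty set up to D itself
```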

And the main logical import of having knowledge of D is that it consists of presumed factually true statements about what the theory T is about, and about what may be related or relevant to this.

It is not necessary that one has much true knowledge about D, but one needs at least some, for without it one has no presumed facts to reason about or to subject to investigation and experimentation, and hence no basis for a scientific explanation.


(1.2) There is a language L that represents D:

One needs some way of formulating a theory about some things in D, and so one needs some language L to do this in, and one also needs to specify in what way L does describe things in D.

Of course, the more clearly and precisely this language has been formulated, the better one is able to reason with it and write logical arguments in it. In what follows L is supposed to be some standard set theory. The main reason to choose set theory as the language L is that it makes the present treatment simplest, and that (almost) any other formal language one could choose in its stead may be translated into set theory.

Also, choosing set theory as the language L makes it easy to state what the basic sense of representing is.

In general terms, if something A represents something B that means that one can infer some properties, relations or things in B from some properties, relations or things in A. Thus, a menu represents the courses of a dinner, and a map represents the lay of the land.

What we need in general terms can be written as "r(X,Y)" for "X represents Y" and can be defined thus if X and Y are sets:

           r(X,Y) IFF (∃f)(f : X |-> Y & (x∈X)(Xi⊆X)(x∈Xi IFF f(x)∈f(Xi)))   (Note 1)

In words: Set X represents set Y if and only if there is a function that maps the elements and subsets of X to those of Y in a way that preserves the element-relation.
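
A minimal sketch of this idea in Python (the sets and the map are invented for illustration): it checks whether a given assignment of elements and subsets preserves the element-relation in the sense just defined.

```python
def represents(f_elem, f_set, X_elems, X_subsets):
    """Check r(X,Y): x in Xi iff f(x) in f(Xi), for all elements x and subsets Xi of X."""
    return all((x in Xi) == (f_elem[x] in f_set[Xi])
               for x in X_elems for Xi in X_subsets)

# A toy "menu" X representing the "courses" Y of a dinner (hypothetical example).
X_elems = {"soup-line", "fish-line"}
X_subsets = [frozenset({"soup-line"}), frozenset({"soup-line", "fish-line"})]
f_elem = {"soup-line": "soup", "fish-line": "fish"}
f_set = {X_subsets[0]: {"soup"}, X_subsets[1]: {"soup", "fish"}}

print(represents(f_elem, f_set, X_elems, X_subsets))   # True: membership is preserved
```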


(1.3) Language L represents set D symbolically and numerically:

Here we have arrived at the main formal assumption of this paper, which may strike the reader as a rather technical set-theoretical definition of what is involved in representing something symbolically and numerically.

This is mostly appearance, for what follows merely adds to the presumed property of representing those assumptions that allow one to represent with symbols and to represent numerical information.

We write "rsn(L,D)" for "L represents D symbolically and numerically" and use D* for the powerset of D, i.e. D*={Z: ZaD}, and Di and Dj for subsets of D:

rsn(L,D) IFF (
je)(j#)(jp)
               (r(L,D)                                                     &

                 e : L |-> D*                                           &
                 # : D* |-> N                                          &
                 p : D* |-> R                                           &

                 D    = e(Ti) U (e~Ti)                              & 
                 e(Ti) = e(Ti&Tj) U e(Ti&~Tj)               &

                 #(Di)= #(Dj) IFF (Ef)(f : Di 1-1 Dj)   &
 
                 #(D)  = #(Di) + #(-Di)                        &
                 #(Di) = #(Di
ODj) + #(DiO-Dj)            &

                 p(Di)      = #(Di) : #(D)                        &
                 p(Di|Dj) = #(Di
ODj) : #(Dj))

In words, the above reads as follows. The general motivation for the whole conjunction of conditions is that if we want to formulate claims in L about parts of D, then we must somehow correlate the terms and statements of L with the things and sets in D. The above does just that, by assuming three functions with certain stated properties, which also lay down the conditions for the cardinal number of a set and for the proportions of cardinal numbers.

(∃e)(∃#)(∃p): There are a function e, say extension, a function #, say number, and a function p, say proportion.

These will concern, by the assumptions that follow, respectively the set of things a statement represents; the cardinal number of that set; and relative proportions that can be formed using cardinal numbers.

r(L,D):  Language L represents domain D.

This was defined above.

e : L |-> D*: extension maps the terms and statements of L to the subsets of D, and
# : D* |-> N: number maps the subsets of D to the natural numbers, and
p : D* |-> R: proportion maps the subsets of D to the real numbers.

Note that for extensions we have just two basic assignments for a statement, namely e(Ti)=Ø and e(Ti)≠Ø, which will be used below to define truth. Number refines this by enabling us to use the cardinal number of a set, and proportion uses this to lay the foundation of probability in terms of cardinal numbers. 

D = e(Ti) ∪ e(~Ti): D is the union of the extensions of a term and its negation, and
e(Ti) = e(Ti&Tj) ∪ e(Ti&~Tj): the extension of a term is the union of the extensions of its conjunctions with any term and with that term's negation.

The properties given for extensions guarantee that the denials, conjunctions and disjunctions in L are preserved respectively set-theoretically as complements, intersections and unions in D. Also, extension will enable us to define truth, which is done below when considering representing.

#(Di) = #(Dj) IFF (∃f)(f : Di 1-1 Dj): Two subsets have the same number iff there is a 1-1 function between them.

This is fairly called "Hume's Principle": in words, two subsets have the same number iff there is a 1-1 function between them. It is so named because the philosopher David Hume was one of the first to see that the right-hand side can be used to define the left-hand side, and thus gives a start to explaining what number is. Also, the number that thus gets defined is known as the cardinal number of the set, and will be used below to define proportion, which can be used to define probability.

#(D) = #(Di) + #(-Di): The number of D is the sum of the numbers of any subset of D and its complement, and
#(Di) = #(Di∩Dj) + #(Di∩-Dj): The number of a subset Di of D is the sum of the numbers of its intersections with any subset Dj and with the complement of Dj.

The properties given for numbers guarantee that the number of the domain equals the sum of the numbers of any subset and its complement. Thus complements, intersections and exclusive unions are preserved numerically in terms of subtractions and additions.

p(Di) = #(Di) : #(D): The proportion of a subset is its number divided by the number of D, and
p(Di|Dj) = #(Di∩Dj) : #(Dj): The conditional proportion of a subset Di in a subset Dj is the number of their intersection divided by the number of Dj.

Note that in fact proportion is defined using the cardinal numbers of the subsets of the domain plus the ordinary rules for arithmetic that one has with cardinal numbers. Also p(.) is so much like standard probability that it entails the standard axiomatization of probability, as will be shown below. The point to notice here is only that proportion is definable in terms of number.
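
As a minimal numerical illustration (the domain and the two subsets are invented), proportions and conditional proportions can be computed directly from cardinal numbers on a small finite domain:

```python
from fractions import Fraction

# A toy finite domain D and two of its subsets (hypothetical example).
D  = set(range(12))
Di = {n for n in D if n % 2 == 0}    # extension of "even"
Dj = {n for n in D if n % 3 == 0}    # extension of "divisible by 3"

def p(A, B=None):
    """Proportion p(A) = #(A):#(D); conditional proportion p(A|B) = #(A∩B):#(B)."""
    if B is None:
        return Fraction(len(A), len(D))
    return Fraction(len(A & B), len(B))

print(p(Di))               # 1/2
print(p(Di, Dj))           # 1/2: of the multiples of 3 in D, half are even
print(p(Di) + p(D - Di))   # 1: a subset and its complement exhaust D
```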


(1.4) There is a set K of true statements properly contained in L:
       (∃K)(K ⊆ L & K ≠ L & (Ki∈K)(p(e(Ki))=1))

There is a set K of true statements properly contained in L. Why these statements are supposed to be true is not a directly relevant concern. One must assume something, and indeed K is named K because it is taken to contain all background knowledge. And what is assumed is supposed to be true unless and until one knows that what is assumed entails something false, to which we turn in the next condition.

Note that (1.4) covers statements of probability as well, namely in the form p(X=x)=1, and that K also includes whatever is known about D. In brief, K consists of all background knowledge one assumes and may use. And normally K contains much more than any specific theory one adds hypothetically to it.


(1.5) There is a set F of false statements properly contained in L:
          (∃F)(F ⊆ L & F ≠ L & (Fi∈F)(p(e(Fi))=0))

There is a set F of false statements properly contained in L. These are supposed to be false. Why these statements are supposed to be false is not a directly relevant concern. But one must be able to contradict some theories, and indeed one has a start for F by taking statements from K and putting "~" before them.

Together (1.4) and (1.5) assert that whenever we propose a theory this is done in a context of presumed knowledge. This makes both intuitive sense, for it is hard to see how or why one could propose theories without any background knowledge, and logical sense, because we need to appeal to background knowledge to check out how good our proposed theories are in explaining the facts as we know them.

It is also worth remarking that what belongs to K and F is presumptive knowledge, which itself may be qualified or withdrawn in the face of new evidence.


(1.6) T is properly contained in L:
         (T ⊆ L & T ≠ L)

T is what one abduces i.e. proposes as explanation. There are conditions on it below, but the general intuitive reason is obvious: One sets up a theory made up from assumptions if such knowledge as one has does not allow one to deduce certain facts one does want to explain, whereas the assumptions in the theory together with one's background knowledge do allow one to deduce the facts one wants to explain.

Note also that the present condition implies that T is not inconsistent, for if it were it would entail and contain all of L; and note that it is not assumed that T is true.


(1.7) T has no false consequences:
          ~(∃C)(T |= C & C∈F)

Though when a theory is proposed one does not know it to be true, one must know that it is not refuted by such facts as one knows, and this (1.7) expresses.

This is a minimal condition on theories, and it is used to refute any theory which does have false consequences.


(1.8) T explains something about D in L:

This is another minimal condition on theories, which is formulated here in probabilistic terms: There is a statement S about D in L such that K&T is positively relevant to S, such that S is at least as probable as not on K&T, and such that S follows from K, while the probability of T on K is the probability of T on K&S.

Here we must mention a logical point and introduce some notation.

The logical point is that the proportion we introduce is so much like probability that we may use it as probability. This will be shown below. The notation to be introduced relates to this and is as follows:

Convention: p(S')=x  =def  (∃S∈L)(∃T∈L)(∃Di⊆D)
                             (S' = Di = e(S) & #(e(S)):#(D) = x)

Note that given two statements S and T the assumptions we made entail that the right-hand side of the definition is true, and that we introduce accents to maintain a notational reference to the statements of which the sets on the left are the extensions. This convention will be followed where appropriate in the rest of this text.

Assuming this, (1.8) may be formalized as follows:

        (∃S∈L)
            (p(S'|K'&T') > p(S'|K'&~T') &
             p(S'|K') = 1)

Note that the simplest version of (1.8) is (∃X')(K'&T' |= X' & X'∈K), i.e. T does entail some consequence that is known to be true. Also note that this need only apply to all of T, and not necessarily to parts of T.

However, all that is needed is that K&T makes S more probable than K&~T, i.e. K without T, does.
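
A minimal numerical sketch of this condition (the probabilities are invented) simply checks whether K&T makes a statement S more probable than K&~T does:

```python
# Hypothetical probabilities of S conditional on background knowledge K,
# with and without the theory T (numbers invented for illustration).
p_S_given_K_and_T = 0.9
p_S_given_K_and_not_T = 0.4

def explains_something(p_with_T: float, p_without_T: float) -> bool:
    """Condition (1.8): T, added to K, is positively relevant to some statement S."""
    return p_with_T > p_without_T

print(explains_something(p_S_given_K_and_T, p_S_given_K_and_not_T))   # True
```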


(1.9) T satisfies the abductive condition:

What we have so far by our conditions is a theory T that is consistent, not false, and that explains at least something. But we need more to confirm T, and specifically we need a probability for T.

Such a probability does not follow from what we have assumed so far, for while we know from the assumptions we made that T represents some set in D we do not know what its (relative or absolute) size is.

This requirement may be put in words as follows: The probability of a theory T on background knowledge K is the probability of its least probable known proper consequence on K.

This is an assumption, which needs a justification. This comes in several parts.

First, there is the point that it amounts to a strengthening of the following theorem of any standard formal probability theory:

         (T)(Q)(T |= Q --> p(T') ≤ p(Q'))

That is: For any statements T and Q, if T does entail (explain) Q, then the probability of T is not larger than the probability of Q.

The abductive condition strengthens this inequality to an equality in the case that Q' is the least probable of T's known consequences, and it does so to obtain a probability for T' - that then can be changed by any incoming evidence by using Bayes' Theorem.

Bayes' Theorem is a rather elementary theorem of formal probability theory, which may be written like

       p(T'|Q') = p(Q'|T')·p(T') : p(Q')

and read as "The probability of a theory given evidence Q' equals the probability of the evidence given the theory, times the probability of the theory, divided by the probability of the evidence". Clearly, this requires a probability for T, and the abductive condition gives that, and gives it based on the available evidence.

Second, what is used in the abductive condition is not the above theorem as applied to any consequence of T, since this would also cover everything in K, but only the proper consequences of T on K. This is defined as follows: 

        Q∈pc(K&T) =def (K&T |= Q) & ~(K&~T |= Q)

which is to say that Q is a proper consequence of background knowledge K and theory T iff Q follows from K&T but not from K&~T.

Third, a formal version of the abductive condition looks thus:

        p(T|K)=x IFF (∃Q∈L)(p(Q|K&T)=x &
                            Q∈pc(K&T) &
                            (S)(S∈pc(K&T) --> p(S|K&T) ≥ p(Q|K&T)))

The probability of a theory T on background knowledge K equals x precisely if there is a statement Q about D in L such that the probability of Q on K and T is x and Q is a proper consequence of K&T and every other known proper consequence of K&T is at least as probable as Q on K&T.

Note that this is "so far as known", i.e. "so far as such proper consequences have been derived". Thus, it may happen that someone refutes your theory or makes it very improbable by showing that, on the background knowledge you both share, it can be proved that your theory entails a far less probable proposition than you originally believed. But this is as it should be: reasoning logically, in a way consistent with such evidence as one has.
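
A minimal sketch of the abductive condition (the consequence names and probabilities are invented): the probability of T on K is set equal to the probability, on K&T, of its least probable known proper consequence.

```python
# Hypothetical known proper consequences of K&T, with their probabilities on K&T.
proper_consequences = {
    "Q1": 0.95,
    "Q2": 0.80,
    "Q3": 0.60,   # the least probable known proper consequence
}

def abductive_probability(pc_probs: dict) -> float:
    """Abductive condition: p(T|K) equals the minimum over the known proper consequences."""
    return min(pc_probs.values())

print(abductive_probability(proper_consequences))   # 0.6
```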


(1.10) T satisfies the inductive condition:

It would seem we now have all we need to apply Bayesian confirmation to a theory T, but this is a - very common - mistake. At this point we have probabilities for consequences of theories and for theories, but to suppose one can now as it were automatically apply Bayesian reasoning is to forget that any actual application of Bayes' Theorem happens in a context in which, besides the evidence Q' for theory T', many other things happen to be true, all of which may be probabilistically relevant.

Thus, if T is our theory and P something it predicts that may be used as a test of T, then if P is true there will always be other truths, like Q, that may seem wholly unrelated to whatever T explains, but also might in reality be relevant. So what is needed is some way to move from p(T|K&Q&P) to p(T|K&P).

Therefore we need some condition that allows us to abstract from whatever facts happen to be the case when testing a theory but are not relevant to it. In words, it amounts to the following: Theory T on background knowledge K is adequate in entailing all that is relevant to whatever it explains. 

To state this in a convenient formal format, it is helpful to introduce three definitions that belong to standard probability theory

         PrT|K     =def p(P|K&T)>p(P|K&~T)
         QiP|K     =def p(Q|K&P)=p(Q|K)
         QiT|K&P =def p(Q|K&T&P)=p(Q|K&P)

The first defines what it is for T to be relevant to P given K, namely if P on K&T differs from P on K&~T. This is what one needs for a prediction to be of any probabilistic use. The second defines what it is for Q to be irrelevant to P given K, namely if Q on K&P is the same as Q on K&~P. (The definition states a logical equivalent of this, because that is more convenient in the argument that follows.) And the third defines what it is for Q to be irrelevant to T given K&P, and is just along the same lines as the previous definition, except for conditionalizing on a conjunction.

Using these definitions, the inductive condition we shall need is this:

         (Q∈L)(P∈L)(PrT|K --> (QiP|K IFF QiT|K&P))

This reads as: For all statements Q and P in L about D, if T is relevant to P on K, then Q is irrelevant to P on K iff Q is irrelevant to T on K&P. This can be seen to amount to the above verbal claim that theory T on background knowledge K is adequate in entailing all that is relevant to whatever it explains, when we reflect that relevance is the denial of irrelevance: then the condition says that whatever is in fact relevant to a prediction P of T must also be relevant given both T and P, and conversely that if the theory entails that something is relevant, then indeed it is relevant in fact.

If we expand the definitions what we obtain is

 (Q)(P)(p(P|K&T)>p(P|K&~T) --> (p(Q|K&P)=p(Q|K) IFF p(Q|K&T&P)=p(Q|K&P)))

This allows us to disregard as irrelevant all statements that are not entailed by T to be relevant, and thus enables us to use Bayes' Theorem for confirmation, namely as follows, supposing PrT|K and QiP|K:

p(T|K&Q&P) = p(Q|K&T&P)·p(K&T&P) : p(Q|K&P)·p(K&P)
                       = p(Q|K)·p(K&T&P) : p(Q|K)·p(K&P)
                       = p(K&T&P) : p(K&P)
                       = p(T|K&P)

In this argument the inductive condition is applied in the second line, and the rest of the argument only involves standard probability theory. Obviously, this delivers what we wanted: We can wholly abstract from Q if T does not entail it is relevant.
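
As a minimal numerical check of this argument (the joint distribution is invented so that the irrelevance conditions hold, with background knowledge K left implicit), one can verify that p(T|Q&P) = p(T|P) when Q is irrelevant in the sense of the inductive condition:

```python
from itertools import product

# Hypothetical joint distribution over (T, P, Q), built so that Q is
# probabilistically independent of T and P.
p_TP = {(1, 1): 0.40, (1, 0): 0.10, (0, 1): 0.20, (0, 0): 0.30}
p_Q = {1: 0.7, 0: 0.3}
joint = {(t, p, q): p_TP[(t, p)] * p_Q[q]
         for (t, p), q in product(p_TP, p_Q)}

def cond(event, given):
    """p(event | given), where event and given are predicates on outcomes (t, p, q)."""
    num = sum(pr for w, pr in joint.items() if event(w) and given(w))
    den = sum(pr for w, pr in joint.items() if given(w))
    return num / den

p_T_given_QP = cond(lambda w: w[0] == 1, lambda w: w[1] == 1 and w[2] == 1)
p_T_given_P  = cond(lambda w: w[0] == 1, lambda w: w[1] == 1)
print(round(p_T_given_QP, 6), round(p_T_given_P, 6))   # equal: Q can be disregarded
```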


(2) On probability:

There are many interpretations of probability and also quite a few axiomatizations of probability. The main reason for the differences in interpretation and axiomatization is that "probability" has many meanings and many uses.

It is not possible to chart these meanings and uses adequately in less than a book's length of text, and I shall not even attempt to do so. All I want to do in this section is to note what the currently standard mathematical approach to probability is; to briefly discuss three contrasts between kinds of axiomatizations; to introduce five interpretations; and to show how the standard Kolmogorov axioms may be derived from the assumption that L represents D symbolically and numerically, as this has been defined above.


(2.1) Measure theory and probability theory:

The current mathematical approach to probability theory is to see probability theory as a part of measure theory, which is a set-theoretical tool to measure sets, to represent sums, and to explain integration (which concerns infinite sums). It is mathematically the clearest approach to probability, and to much besides, such as the theory of integration. An excellent reference is P. Halmos, Measure Theory.


(2.2) Six kinds of axiomatizations:

Broadly speaking - and see Fine and Stegmüller for much more information - there are three main contrasts between axiomatizations, and hence six kinds, some of which fit easily into a measure-theoretical framework and some of which do not:

Qualitative or quantitative: Standard mathematical probability theory is numerical, and concerns equalities and inequalities of probabilistic formulas which are (usually) mapped to the real numbers between 0 and 1 inclusive.

The problems with this numerical approach are that quite a lot of intuitive reasoning with probabilities is not quantitative but qualitative - such and such is more probable than so and so, but one cannot say by how much, or one merely knows that one considers an event more probable than not, without knowing how probable  - and that often there are no good measurements of probabilities or no measurements possible. Therefore, there have been several proposals of axiomatizations of qualitative probability. 

Absolute or conditional: Standard mathematical probability theory concerns absolute probabilities, and then adds conditional probabilities as an apparent afterthought and by way of definition, like proportion was introduced above. 

The problem with absolute probabilities is that the vast majority of the probabilities one considers in everyday life or in science are somehow conditional. Therefore, there are several proposed axiomatizations in which conditional probability is basic.

Finitistic or infinitistic: The probabilities one meets in everyday life, such as are related to coin-tossing, dice-throwing, card-playing and gambling, are finite in a plausible sense, in that there are finitely many possible outcomes.

One can deal with the finite case by algebra and by combinatorial theory, but most interesting cases in physics, and most statistical applications of probability theory, require sums of infinitely many possible outcomes. The problems that arise here are currently normally stated and solved in a measure-theoretical context.


(2.3) Five kinds of Interpretations:

There have been many interpretations of what probability is. I will sketch five, namely two more or less old-fashioned ones; two currently fashionable ones; and my own theory of cardinal probability, that is compatible with the last two.

Again, to treat the subject of the interpretations of probability well, one needs at least a book, and so I will limit myself to the four interpretations that make some sense, and refer the reader to Weatherford for a book-length survey of the field, followed by my own theory, which will be considered in some more detail later on in this paper.

Logical interpretation: The logical interpretation seems to be the oldest interpretation of what probability is, and is often rendered as "probability is the ratio of selected cases to possible cases". Thus, the probability of throwing a 4 with an ordinary die is 1/6, and the probability of throwing an even number with an ordinary die is 3/6.

As will be seen, there is an underlying assumption that all possibilities that are distinguished count for the same and as one, which much simplifies the treatment of problems involving probability, but cannot easily, or at all, deal with weighted dice or with different and unpredictable lengths of life.

Empirical interpretation: The empirical interpretation soon followed the logical interpretation, and tends to look at the actual frequencies with which distinguished possibilities in fact happen as information for what the probability of an event is.

This works well in practice with subjects where one can easily establish frequencies or samples, but this also makes it difficult to say what probability really is, both for such things as have frequencies, since these may change and anyway are partial information, and for such things as have no frequencies, like unique events and future events. 

Both the logical and the empirical interpretation are somewhat old-fashioned, though one still meets the empirical interpretation in social statistics.

Objective interpretation: This can be best rendered in the form of two claims, namely (1) there is real chance in the world, in the form of chance processes and chance events in physics, and real contingency in life and free choice and (2) probability theory provides the tools to represent its basic properties.

This can be seen as motivated by physics: According to quantum-mechanics there are real chance processes in nature. Until the rise of quantum-mechanics, all physical theories were deterministic, and probabilities only entered because one nearly always has incomplete knowledge and samples of populations. With the rise of quantum-mechanics this supposed determinism of nature had to be given up.

Subjective interpretation: There are various subjective or personal theories of probability. One way of rendering their intuitive basis is in terms of two claims:
(1) Persons have their own personal estimates of probabilities, which, if they are consistent, indeed behave according to probability-theory, and (2) these probabilities can be used for Bayesian confirmation.

This does justice to the fact that different persons may have different estimates of what is the probability of something, and enables each person to recalculate his original probabilities when given new evidence.

The first claim can be spelled out in quite a few different ways, based on different considerations, but these will not occupy us here since we assume its conclusion anyway.

It is especially the second claim which makes subjective interpretations useful. The reason is that while Bayes' Theorem is a rather elementary theorem of formal probability theory, applying Bayes' Theorem requires that one has p(T), and this one does not have on the standard non-subjective interpretations, for whatever theories represent, these things cannot be counted like cherries, and anyway will at least start to be largely unknown for new theories.

The reason one does not have this on the standard non-subjective interpretations is a fundamental lack of knowledge about the hypothesis T. And the reason one does have this on subjective interpretations is that then one may make any guess about the probability of any statement, provided only it is consistent with one's further assumptions. The set-back of this is that if this is wholly subjective, one can in principle fix it so that almost any evidence will have hardly any effect on it. Thus, not only need subjective probabilities not be based on the evidence, but they also can be chosen so extreme as to make almost any evidence have almost no effect.

Cardinal interpretation: The interpretation of probability I propose I call the cardinal interpretation, because it rests on the existence of cardinal numbers, which are guaranteed by non-probabilistic assumptions, namely, those given for extension and number and which exist anyway.

Hence there always will be some probability for any statement, and this probability will exist objectively because it derives from the cardinal numbers of the sets that are involved.

One set-back is that normally one does not know the cardinal probability, though one can normally establish evidence for such statements as represent things that can be counted empirically, rather as in the empirical interpretation of probability.

Another set-back is that one cannot count the things that are represented by a theory. The way to solve that problem is to make an assumption about the probability of a theory that is consistent with the rest of probability theory, and does not depend on personal whim but on logic. It is what I called the abductive condition: The probability of a theory T on background knowledge K is the probability of its least probable known proper consequence on K.

This was treated above, and all that needs to be remarked here is that it amounts to a strengthening of the following theorem of any standard formal probability theory:

      (T)(Q)(T |= Q --> p(T') ≤ p(Q'))

That is: For any statements T and Q, if T does entail (explain) Q, then the probability of T is not larger than the probability of Q.

The abductive condition strengthens this inequality to an equality in the case that Q is the least probable of T's known consequences, and it does so to obtain a probability for T - that then can be changed by any incoming evidence by using Bayes' Theorem.

The cardinal interpretation of probability is compatible with the objective interpretation, and is like the subjective interpretation in enabling the use of Bayesian confirmation, but it does not make this subjective, though it does make this dependent on such evidence as one has, including such consequences of the theory one has established.

Note it also has the interesting consequence that wherever we have a domain of sets we have implied probabilities for the sets, which exist as much as do the cardinal numbers of these sets - but that very often we don't have enough information to determine these cardinal numbers, and accordingly the best we can do is to make a guess about it, and try to confirm or infirm that guess by evidence.


(2.4) Derivation of Kolmogorov's axioms from proportions

It is an interesting fact that the standard axioms for finite probability theory, that were first stated by Kolmogorov, can be derived from the assumptions in section (1), and especially those in (1.3).

The derivation is given in this section, which you may skip if you believe the result anyway.

These standard Kolmogorov axioms for probability are normally stated in such terms as:

 

Kolmogorov axioms for probability theory:

Suppose that $ is a set of propositions P, Q, R etc. and that this set is closed for negation, conjunction and disjunction, which is to say that whenever (P ∈ $) and (Q ∈ $), so are ~P, (P&Q) and (PVQ). Now we introduce pr(.) as a function that maps the propositions in $ into the real numbers in the following way, that is, satisfying the following three axioms:

A1. For all P ∈ $ the probability of P, written as pr(P), is some non-negative real number.
A2. If P is logically valid, pr(P)=1.
A3. If ~(P&Q) is logically valid, pr(PVQ)=pr(P)+pr(Q).


To derive Kolmogorov's standard axioms for finite probabilities from the properties for proportions assumed in (1.3), it is convenient to first state three simple theorems about proportions:

Theorem 1: Di=Dj --> #(Di)=#(Dj)
      Proof: By Di ⊆ Dj --> #(Di) <= #(Dj), applied in both directions.

Theorem 2: p(Di|Dj) = p(Di∩Dj) : p(Dj)
      Proof: p(Di|Dj) = #(Di∩Dj):#(Dj) = (#(Di∩Dj):#(D)):(#(Dj):#(D)) = p(Di∩Dj):p(Dj)

Theorem 3: p(Di) = p(Di|D)
      Proof: p(Di) = #(Di) : #(D) = #(Di∩D) : #(D) = p(Di|D)

Second, since we have something much like probability, we can use what we have to define truth-values, namely as follows

v(T'i)=1 IFF p(T'i)>0 & p(~T'i)=0
v(T'i)=0 IFF v(T'i) ≠ 1

What we have done here amounts in fact to identifying the truth-values with the two extremes of a proportional or probabilistic distribution, which we have from the assumptions made in (1.3).

Third, we use the definition of truth-values to derive three results, which we will need below.

Theorem 4:  P1: v(Ti)=1 --> p(T'i)=1                          &
            P2: v(Ti --> Tj)=1 --> p(T'i) <= p(T'j)           &
            P3: p(T'i) = p(T'i&T'j) + p(T'i&~T'j)

Proof: The first is a direct consequence of the definitions for v(.). The third follows from Theorems 2 and 3, which entail that if Di=e(Ti) and Dj=e(Tj) then p(~Dj|Di)=1-p(Dj|Di), whence p(Di)=p(Dj|Di)p(Di)+p(~Dj|Di)p(Di), whence p(T'i)=p(T'i&T'j)+p(T'i&~T'j). And now the first and third entail the second, for by the first v(Ti --> Tj)=1 --> p(T'i --> T'j)=1. Now the consequent of this is equivalent with p(~(T'i & ~T'j))=1, which is equivalent with p(T'i & ~T'j)=0. And this entails by the third equality that p(T'i) <= p(T'j).

To begin with, there is the fundamental theorem that permits inferences from logical equivalences to probabilities:


T5: |- (A iff B) --> pr(A)=pr(B)          (Equivalent propositions have the same probability)

(1)  |- (A iff B) --> |- (A --> B) --> pr(A) <= pr(B)         P2
(2)  |- (A iff B) --> |- (B --> A) --> pr(B) <= pr(A)         P2
(3)  |- (A iff B) --> pr(A) <= pr(B) & pr(B) <= pr(A)         (1), (2)
(4)  |- (A iff B) --> pr(A) = pr(B)                           (3), Algebra

Next, it is proved contradictions have probability 0:

T6: pr(A&~A)=0          (Contradictory propositions have zero probability)

(1)  pr(A)=pr(A&A)+pr(A&~A)                                   P3
(2)  pr(A)=pr(A&A)                                            T5 with (|- A iff (A&A))
(3)  pr(A)=pr(A)+pr(A&~A)                                     (1), (2)
(4)  pr(A&~A)=0                                               (3), Algebra


It is often helpful to have in propositional logic two special constants, such as Taut (from "tautology") and Contrad (from "contradiction"). These are defined as: Taut iff AV~A and Contrad iff A&~A. Taking this for granted:

T7: 0 <= pr(A) <= 1          (Probabilities are between 0 and 1 inclusive)

(1)   |- A --> pr(A)=1                                        P1
(2)   |- pr(Taut)=1                                           (1) and |- Taut
(3)   |- A --> Taut                                           Logic
(4)   pr(A) <= pr(Taut)                                       (3), P2
(5)   pr(A) <= 1                                              (2), (4)
(6)   pr(Contrad)=0                                           T6
(7)   |- (Contrad --> A)                                      Logic
(8)   pr(Contrad) <= pr(A)                                    (7), P2
(9)   0 <= pr(A)                                              (6), (8)
(10)  0 <= pr(A) <= 1                                         (5), (9)

Next, we need to prove the probabilistic theorem for denial. We do it in two steps:

T8: pr(AV~A)=pr(A)+pr(~A)          (Probability of disjunction of exclusives is sum of probability of factors)

(1)  pr(AV~A)=pr((AV~A)&A)+pr((AV~A)&~A)                      P3
(2)  pr(A)=pr((AV~A)&A)                                       T5, as |- ((AV~A)&A) iff A by T1
(3)  pr(~A)=pr((AV~A)&~A)                                     As under (2)
(4)  pr(AV~A)=pr(A)+pr(~A)                                    (1), (2), (3)

And now:

T9: pr(~A)=1-pr(A)          (Probability of denial is complementary probability)

(1)  pr(AV~A)=pr(A)+pr(~A)                                    T8
(2)  1=pr(A)+pr(~A)                                           P1 since |- AV~A
(3)  pr(~A)=1-pr(A)                                           (2), Algebra

Next, we have this parallel to P1:

T10: |-~A --> pr(A)=0          (Provable non-truths have zero probability)

(1)  |-~A                                                     Assumption
(2)  pr(~A)=1                                                 (1), P1
(3)  1-pr(A)=1                                                (2), T9
(4)  pr(A)=0                                                  (3), Algebra

The main point of T10 and P1 is that if one can prove that A (or ~A), then thereby it follows that pr(A)=1 (or pr(A)=0 if |-~A). This is normally important in comparing the supposed truths and non-truths one can logically infer from a theory with what the facts are (so that if one can prove that |-T A, while in fact one finds ~A, one thereby has learned that the assumptions of theory T can't be all true, if the proof of |-T A was without mistakes in reasoning).

Next, we need a theorem that serves as a lemma to the next theorem, but that needs a remark itself. The theorem is:

T11: pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1          (Full disjunctive probabilistic sum of two factors)

(1)  pr(A)+pr(~A)=1                                           T9
(2)  pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1                    (1), P3

The promised remark is that T11 differs essentially from the similar theorem in CPL (classical propositional logic) minus the probabilities: In CPL ([A&B]+[A&~B]+[~A&B]+[~A&~B]) is true and implies that precisely one of the four factors is true. In PT (probability theory) the sum pr(A&B) + pr(A&~B) + pr(~A&B) + pr(~A&~B) is certain, but normally none of the four alternatives is provably true by itself; normally none of the four alternatives is known to be true; and normally several or all of the alternatives will have a probability between 0 and 1 (conforming to T7).

Indeed, a very interesting aspect of PT is that it assigns numerical measures to all alternatives the underlying logic can distinguish, regardless of whether these alternatives are true or have ever been true. And part of the interest is that there normally are far more logically possible alternatives than logically provable alternatives.

To finish the proof that CPT indeed implies all of Kolmogorov's axioms for PT we need to derive his A3:

T12: |-~(A&B) --> pr(AVB)=pr(A)+pr(B)          (Conditional sums)

(1)  |-~(A&B)                                                 Assumption
(2)  pr(A&B)=0                                                T10
(3)  pr(A)=pr(A&~B)                                           (2), P3
(4)  pr(B)=pr(~A&B)                                           (2), P3, T5
(5)  pr(AVB)=1-pr(~A&~B)                                      T9, T5 with |-(~(~A&~B)) iff (AVB)
(6)  pr(AVB)=pr(A&B)+pr(A&~B)+pr(~A&B)                        T11
(7)  pr(AVB)=pr(A&~B)+pr(~A&B)                                (2), (6)
(8)  pr(AVB)=pr(A)+pr(B)                                      (3), (4), (7)

I have now proved all of Kolmogorov's axioms for the finite case: A1 follows from T7; A2 is P1; and A3 is T12.
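
As a minimal numerical check of this result (the small finite domain is invented), one can verify that proportions over the subsets of such a domain satisfy A1-A3:

```python
from fractions import Fraction
from itertools import chain, combinations

D = set(range(6))          # a small hypothetical domain
subsets = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(D), r) for r in range(len(D) + 1))]

def pr(A):
    """Proportion-based probability of a subset A of D: #(A):#(D)."""
    return Fraction(len(A), len(D))

# A1: every probability is a non-negative real number.
assert all(pr(A) >= 0 for A in subsets)
# A2: a logically valid proposition has extension D, and pr(D) = 1.
assert pr(frozenset(D)) == 1
# A3: for extensions that exclude one another, pr of the union is the sum.
assert all(pr(A | B) == pr(A) + pr(B)
           for A in subsets for B in subsets if not (A & B))
print("Kolmogorov's A1-A3 hold for proportions on this domain")
```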


(3) On abduction

There is and has been a lot of confusion during many ages about the ideas for and definitions of deduction, abduction and induction, of which the beginnings can be found in Aristotle.

The least problematical of these is deduction, and the 19th and 20th Centuries have produced many quite sophisticated mathematical and logical analyses of what is involved in deduction, and how one may set up proofs. It is easy to state the criterion a deduction C from premisses A1 .. An must satisfy to be valid: C is a valid deduction from premisses A1 .. An iff it is impossible that C is not true in any case in which all of A1 .. An are true.

The matters of abduction and induction are far less clear, and indeed Aristotle was the first to confuse them, or not clearly distinguish them. In this section I will outline my views of abduction, and in the next my views of induction.


(3.1) Peirce on abduction

To my knowledge, the first person to clearly single out abduction as an inference, namely of assumptions A1 .. An for a given conclusion C, was C.S. Peirce. Here is a relevant quotation:

"Abduction. (..) "Hypothesis [or abduction] may be defined as an argument which proceeds upon the assumption that a character which is known necessarily to involve a certain number of others, may be probably predicated of any object which has all the characteristics which this character is known to involve." (5.276) "An abduction is [thus] a method of forming a general prediction." (2.269) But this prediction is always in reference to an observed fact; indeed, an abductive conclusion "is only justified by its explaining an observed fact." (1.89) If we enter a room containing a number of bags of beans and a table upon which there is a handful of white beans, and if, after some searching, we open a bag which contains white beans only, we may infer as a probability, or fair guess, that the handful was taken from this bag. This sort of inference is called making an hypothesis or abduction. (J. Feibleman, "An Introduction to the Philosophy of Charles S. Peirce", p. 121-2. The numbers referred to are to paragraphs in Peirce's "Collected Papers".)

It is well to make some additional points here:

  • Abductions indeed are inferences.
  • Abductions go beyond the evidence.
  • Apart from a number of general conditions on the theory inferred, such as that it should be consistent with the known evidence and background knowledge, should not be false, should deductively entail what it is meant to explain, and should have some probability so that it can be confirmed, there is not much that can be said about abductions. The reason follows:
  • Abductions may involve highly creative hypotheses, that open a completely new perspective on something.
  • The best texts I know relating to abduction are by Peirce and by the mathematician Polya.

(3.2) The logical status of the abductive condition

The abductive condition amounts to the assumptions that, first, there is such a thing as the probability of a theory, and second, that it may be initially and conveniently settled by supposing that this probability equals the maximum it may be, given probability theory and such knowledge as one presumes, while one also fully expects that this initial probability will be adjusted by further reasoning and by new evidence still to be discovered.

So the abductive condition seems not so much a truth about nature as a truth about the ways and procedures human beings may use to discover the truth about nature. And indeed the abductive postulate seems safe and warranted in the sense that any probability it introduces can be - and usually will be - rationally corrected and adjusted; that indeed it may be increased or decreased by inductions; and that it is based on such evidence as one has. 

And an abductive condition is needed because we need to have some factually based probability for theories that we decide are good explanations, if only to have a start to test them inductively: Without a probability for theories we can only try to refute them but not confirm them.


(4) On induction

Induction has been widely confused with abduction, and besides, the term has been used in quite a few ways. In statistics-related texts it mostly refers to some sort of generalizing hypothesis that might be tested by means of a sample from some population, a hypothesis about the distribution of some attribute in the population, and a statistical test and calculation that finds to what extent the hypothesis concerning the distribution is supported, or within which confidence interval a supposed probability derived from such a distribution and the data falls.

Much of this statistical theorizing is useful and has many applications, but I would rather call it by some such name as "statistical hypothesis-testing", because this indicates much better what is really involved.


(4.1) Statistical induction and Bayesian reasoning

What I prefer to call induction, as explained above, involves the use of Bayes' Theorem, and aims at revising the probability of an assumption on the basis of evidence. The reason to call it induction is that indeed here one usually tests generalizations and other hypotheses that go beyond the evidence, which is what "induction" has been concerned with since Hume's critique of it.

Besides, there is the following rather direct relation to deduction. The most fundamental principle of deductive inference is Modus Ponens, which may be written thus

If T deductively entails P and T is true, then P is true.

Now the intuitions of many persons say that, conversely, if T deductively entails P and P is true, then T is usually more probable than it was before. Indeed, that is just the sort of reasoning that was explained above, but as will be seen, the ins and outs of it are not precisely obvious or self-evident. Also, one needs special assumptions of a probabilistic nature, for the inference that "if T deductively entails P and P is true, then T is true" is the well-known deductive fallacy of confirmation, which indeed is a fallacy in deductive logic.
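
A minimal numerical sketch of this contrast (the prior and the likelihoods are invented): when T entails P and P is verified, Bayes' Theorem raises the probability of T without making it certain, which is exactly the difference between probabilistic confirmation and the deductive fallacy of affirming the consequent.

```python
# Hypothetical prior for theory T and probabilities of its prediction P (on K).
p_T = 0.3
p_P_given_T = 1.0        # T deductively entails P
p_P_given_not_T = 0.5    # P might also hold for other reasons

p_P = p_P_given_T * p_T + p_P_given_not_T * (1 - p_T)
p_T_given_P = p_P_given_T * p_T / p_P    # Bayes' Theorem

print(round(p_P, 4))          # 0.65
print(round(p_T_given_P, 4))  # 0.4615 > 0.3: T is confirmed, but not proved true
```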


(4.2) Learning from experience

Nearly everything in the above was motivated by a desire to explain how we can rationally explain our experiences, and thereby to explain, at least in general principle, how we can learn from experience.

Many of the results illustrate Bishop Butler's saying "Probability is the guide to life". In the foregoing sections I have shown what the general logical principles and assumptions are that go into scientific explanations, and have shown how much these depend on and involve probability.

In fact, there is much more to elementary probability theory, for we can also use it to prove principles like the following ones:

Confirmation:

The probability of a theory increases as its consequences are verified.

Support:

The probability of a theory increases as relevant circumstances are verified.

Competition:

The probability of a theory increases as its competing theories are falsified.

Undermining:

The probability of a theory decreases as its assumptions are falsified

In the present paper I shall not provide these proofs, but the reader can try for himself, or consult Polya's books in the Bibliography, or my "Classical Probability Theory and Learning from Experience".
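
A minimal numerical sketch of two of these principles (all numbers invented): verifying a consequence raises a theory's probability (Confirmation), and falsifying a competing theory also raises it (Competition).

```python
# Three mutually exclusive, jointly exhaustive hypotheses with invented priors.
priors = {"T1": 0.4, "T2": 0.4, "T3": 0.2}

# Confirmation: T1 entails prediction P; P is also somewhat probable otherwise.
likelihood_P = {"T1": 1.0, "T2": 0.5, "T3": 0.5}
p_P = sum(priors[h] * likelihood_P[h] for h in priors)
posterior_T1 = priors["T1"] * likelihood_P["T1"] / p_P
print(round(posterior_T1, 3))   # 0.571 > 0.4: T1 is confirmed by verifying P

# Competition: a finding refutes T2 outright; renormalizing over the survivors
# raises the probability of T1 (and of T3).
survivors = {h: pr for h, pr in priors.items() if h != "T2"}
total = sum(survivors.values())
print(round(survivors["T1"] / total, 3))   # 0.667 > 0.4: T1 gains when T2 is falsified
```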


(4.3) The logical status of the inductive condition

The basic reason to assume the inductive postulate is that one needs some assumption to deal with the very many facts that are true besides the theory one is interested in testing, since each of these very many facts may be relevant to the truth of the fact one is interested in; and so it seems a good demand to make of a theory that is to be true that it should truly and fully entail all that is relevant to whatever it explains.

Also, in fact this postulate seems to be necessarily true if human beings can come to know nature by testing such theories as they have, for all such tests must include knowledge of what is relevant to what is tested and in what degree it is relevant and also of what is irrelevant to it, for relevancies and irrelevancies are facts that are as real as the facts they concern. In brief: one just cannot rely on any experimental evidence if one cannot rely on one's abstraction from much of the surrounding factual details as irrelevant, which is necessary in any experiment.

On the other hand, one cannot normally prove in complete or even considerable detail that any given theory that is to be tested in fact does correctly entail all that is relevant to it and does not entail as relevant anything that is in fact irrelevant. (Indeed, normally only a few known relevant factors are listed in any report of a scientific experiment together with an indication how these have been dealt with in the experimental set-up. Yet any design of experiments must involve assumptions about factors that are relevant and that are irrelevant to what is to be tested.)

But since true theories must properly entail the true degrees of relevancies of their predictions, all one can do is to assume that one's theories do so, and to take care of all relevancies one does know.

So the inductive postulate seems not so much a truth about nature as a truth about the ways and procedures human beings use to discover the truth about nature, and one which is true to the extent that human beings have true theories about nature, for true theories must satisfy the inductive postulate, even if no human being is able to survey all of the universe and establish that all its presumed relevancies and irrelevancies are factually correct. And indeed the inductive postulate seems safe and warranted in the sense that any probabilities it introduces can be - and usually will be - rationally corrected and adjusted by later evidence. Also it suggests a reason for experiments that fail or turn out unexpected results: One may have disregarded as irrelevant some factor that is relevant, i.e. one may have falsely assumed that one's theory T satisfied the inductive postulate. Finally, the inductive postulate is needed because in any experimental test of a theory we need to abstract from very many accompanying circumstances.


(5) Summary and discussion

I have in this paper in section (1) proposed, stated both informally and formally, and discussed in some detail ten conditions which explanations must satisfy to be called rational or scientific. To my knowledge, the key postulates here, namely what I call the abductive and inductive conditions are new, as are the notions of representing and, especially, representing symbolically and numerically.

In section (2) I have discussed various interpretations and axiomatizations of probability theory, and shown how the condition of representing symbolically and numerically entails the standard axioms for probability.

In section (3) I have discussed abduction, including Peirce who first saw its fundamental importance; and the question what the logical status of my abductive condition is: Much like an assumption we must make in order to explain rationally, but which also is rationally corrigible by the evidence whenever it has been made.

In section (4) I have discussed induction; compared my usage of the term with what used to be common in statistics; noted that once we have probability theory we can explain human learning from experience in a much better way than is possible in standard logic without probability; and considered the question what the logical status of my inductive condition is: Again, much like an assumption we must make in order to learn experimentally from experience, but which also is rationally corrigible by the evidence whenever it has been made, since an excellent reason for the failure of a theory is that it was falsely assumed it satisfies the inductive condition - in which case the theory fails to imply the relevancy of certain facts which are in reality relevant to whatever it attempts to explain.

Maarten Maartensz
Amsterdam, June 2004


Bibliography:

  • Ernest Adams:
  • Mario Bunge: Treatise on Basic Philosophy
  • Arthur Burks: Chance, Cause and Reason
  • Terence Fine: Theories of Probability
  • Klausner - Kuntz: Philosophy - The study of alternative beliefs
  • David Hawkins: The language of nature
  • Paul Halmos: Naive Set Theory
  • Paul Halmos:  Measure Theory
  • C.S. Peirce: Collected Papers
  • G. Polya: Principles of Plausible Reasoning
  • G. Polya: How to solve it
  • Bertrand Russell: History of Western Philosophy
  • Bertrand Russell: Problems of Philosophy
  • Bertrand Russell: Human Knowledge - Its scope and limits
  • Wolfgang Stegmüller: Probleme und Resultate der Wissenschaftstheorie und Analytische Philosophie
  • R. Weatherford: Interpretations of Probability
  • W.G. Wood & D.H. Martin: Experimental Method

 


Notes:

Note 1: Here I introduce a convention that can be formulated in general terms as: A condition in a universally quantified implication may be written inside the universal quantifier. Thus

     (x∈X)(Xi⊆X)(x∈Xi IFF f(x)∈f(Xi))  IFF  (x)(Xi)(x∈X & Xi⊆X --> (x∈Xi IFF f(x)∈f(Xi)))

This convention will be used in the rest of this paper where appropriate, since it results in clearer and shorter formulas.

 


Colophon:
First draft version: 24 June 2004.
Last draft version: 26 June 2004.
This is a pre-publication.
Copyright Maarten Maartensz.