Abstract: This paper clarifies some of the fundamental logical principles of
scientific explanation. It does so by giving ten conditions for scientific
explanation that are then explained and commented on, after which some more
details are given about probability, abduction and induction.
The paper presupposes some knowledge of set theory and of elementary
probability theory. The requisite knowledge can be found in Halmos and Adams,
or in Stegmüller as listed in the literature.
Sections:
(0) Introduction
(1) On explanations
(1.1) There is a domain D, described by some set theory
(1.2) There is a language L that represents D
(1.3) Language L represents set D symbolically and numerically
(1.4) There is a set K of factual statements properly contained in L
(1.5) There is a set F of counter-factual statements properly contained
in L
(1.6) T is properly contained in L
(1.7) T has no consequences in F
(1.8) T deductively entails something about D in L that is not in K and
not in F
(1.9) T satisfies the abductive condition
(1.10) T satisfies the inductive condition
(2) On probability
(2.1) Measure theory and probability theory
(2.2) Six kinds of axiomatizations
(2.3) Five kinds of Interpretations
(2.4) Derivation of Kolmogorov's axioms from proportions
(3) On abduction
(3.1) Peirce on abduction
(3.2) The logical status of the abductive condition
(4) On induction
(4.1) Statistical induction and Bayesian reasoning
(4.2) Learning from experience
(4.3) The logical status of the inductive condition
(5) Summary and discussion
See also:
Fundamental principles of valid reasoning;
The measurement of reality by truth and
probability; and Classical Probability Theory and
Learning from Experience.
(0) Introduction:
What is a scientific explanation?
This paper provides an answer to
this question, and does so in logical set-theoretical terms, by stating,
proposing and explaining a number of conditions that a theory should satisfy to
be considered a minimal scientific explanation.
Here I take "theory" as "a set of
statements that describes some thing in some presupposed domain of things, about
which there may be much or little knowledge"; I take "knowledge" as "true
statements, verified by some methods that are taken for granted"; and I suppose
the reader has some familiarity with standard set-theory.
The intuitive ideas about explanations I start from are these:
Explanations are sets of
statements, about some thing that is somehow characterized and that is situated
in some set of things somehow characterized. Valid scientific explanations
are consistent with the evidence, not false, entail deductively at least part of
the evidence, and can be confirmed probabilistically and be refuted.
And the intuitive ideas about
reasoning I start from are these:
There are three basic kinds of reasoning, where
reasoning involves argumentation of any kind using assumptions and inferences
of conclusions:
1. Deductions: To find conclusions that follow from given assumptions
2. Abductions: To find assumptions from which given conclusions follow
3. Inductions: To confirm or infirm assumptions by showing their conclusions
do (not) conform to the observable facts.
Normally in reasoning all three kinds are involved: We
explain supposed facts by abductions; we check the
abduced assumptions by deductions of the facts they were to explain; and we
test the assumptions arrived at by deducing consequences and then revising,
by inductions, the probabilities of the assumptions when these consequences
are verified or falsified.
And indeed we shall see
that all three kinds of reasoning are involved in explaining.
The conditions I want to propose
are as follows, with my explanations following in as many sections.
(1) There is a domain D, described by some set theory S(D)
(2) There is a language L
(3) Language L represents set D symbolically and numerically
(4) There is a set K of true statements properly contained in L
(5) There is a set F of false statements properly contained in L
(6) There is a set T of theoretical statements properly contained in L
(7) T has no deductive consequences in F
(8) T entails something about D in L that is not in T and not in F
(9) T satisfies the abductive condition
(10) T satisfies the inductive condition
(1) On explanations:
Informally and intuitively every
human being who can speak can explain, and knows in general terms how and when
to do so:
One sets up explanations for
presumed facts one does not know how to deduce from such knowledge one has, and
one does so by inventing a number of assumptions from which one can deduce what
one seeks to explain.
The basic difference between
scientific and other explanations is that non-scientific explanations, even if
they have been formulated carefully, tend to remain merely plausible in
subjective terms and tend not to rely on or refer to explicit principles of
inference. For this reason it is difficult to criticize them rationally: many
of the assumptions that are used to generate or defend the explanations are
either not explicitly formulated or not rational to start with (but often
instead kinds of wishful thinking), and often it is not quite clear what
the precise logical relation between assumptions and conclusions is.
(1.1) There is a domain D, described by some set theory:
D is the collection of things that
comprises what the theory T is about, and is supposed to be describable by and
also to some extent truly described by some standard set theory.
The main logical import of being
described by some standard set theory is that then one has by way of the power
set axiom a way of talking about all logical possibilities contained in D. (The
power set axiom asserts the existence of the set of subsets of any set X. One
standard formulation is: (X)(EY)(Y = {Z : Z ⊆ X}).)
And the main logical import of
having D is that it comes with presumed factually true statements
about what the theory T is about, and about what may be related or relevant to
this.
It is not necessary that one has
much true knowledge about D, but one needs at least some, for without it one has
no presumed facts to reason about or to subject to investigation and
experimentation, and hence no basis for a scientific explanation.
(1.2) There is a language L
that represents D:
One needs some way of formulating
a theory about some things in D, and so one needs some language L to do this in,
and one also needs to specify in what way L does describe things in D.
Of course, the more clearly and
precisely this language has been formulated, the better is one able to reason
with it and write logical arguments in it. In what follows L is supposed to be
some standard set theory. The main reasons to choose set theory as the
language L are that it makes the present treatment the simplest, and that
(almost) any other formal language one could choose in its stead may be
translated into set theory.
Also, choosing set theory as the
language L makes it easy to state what the basic sense of representing is.
In general terms, if something A
represents something B that means that one can infer some properties, relations
or things in B from some properties, relations or things in A. Thus, a menu
represents the courses of a dinner, and a map represents the lay of the land.
What we need in general terms can
be written as "r(X,Y)" for "X represents Y" and can be defined thus if X and Y
are sets:
r(X,Y) IFF (Ef)(f : X |-> Y & (xЄX)(Xi⊆X)(xЄXi IFF f(x)Єf(Xi))) (Note 1)
In words: Set X represents set Y
if and only if there is a function that maps the elements and sets from X to
those of Y that preserves the element-relation.
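On small finite sets the definition of r(X,Y) can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the helper names `powerset` and `preserves_membership` and the menu example are hypothetical and not part of this paper. It tests, for a given mapping f, whether x belongs to Xi exactly when f(x) belongs to the image f(Xi), for every element x and every subset Xi of X.

```python
from itertools import combinations

def powerset(xs):
    """Return all subsets of xs as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def preserves_membership(f, X):
    """True iff for every x in X and every subset Xi of X:
    x in Xi  IFF  f(x) in f(Xi), where f(Xi) is the image of Xi under f."""
    for Xi in powerset(X):
        image = {f[x] for x in Xi}
        for x in X:
            if (x in Xi) != (f[x] in image):
                return False
    return True

# Hypothetical example: menu entries (X) mapped to courses (Y).
menu = {"item 1", "item 2", "item 3"}
to_course = {"item 1": "soup", "item 2": "fish", "item 3": "cake"}
conflated = {"item 1": "soup", "item 2": "soup", "item 3": "cake"}

print(preserves_membership(to_course, menu))   # distinct entries, distinct courses
print(preserves_membership(conflated, menu))   # two entries share one course
```

The first mapping passes the check and the second does not: on finite sets the condition holds exactly when the mapping is one-to-one, since two entries that share a course make membership in the image ambiguous.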
(1.3) Language L represents set
D symbolically and numerically:
Here we have arrived at the main
formal assumption of this paper, which may strike the reader as a rather
technical set-theoretical definition of what is involved in representing
something symbolically and numerically.
This is mostly appearance, for
what follows adds those assumptions to the presumed property of representing
that allows one to represent with symbols and represent numerical information.
We write "rsn(L,D)" for "L
represents D symbolically and numerically" and use D* for the powerset of
D, i.e. D* = {Z : Z ⊆ D}, and Di and Dj for subsets of D:
rsn(L,D) IFF (Ee)(E#)(Ep)
(r(L,D) &
e : L |-> D* &
# : D* |-> N &
p : D* |-> R &
D = e(Ti) U e(~Ti) &
e(Ti) = e(Ti&Tj) U e(Ti&~Tj) &
(#(Di) = #(Dj) IFF (Ef)(f : Di 1-1 Dj)) &
#(D) = #(Di) + #(-Di) &
#(Di) = #(Di∩Dj) + #(Di∩-Dj) &
p(Di) = #(Di) : #(D) &
p(Di|Dj) = #(Di∩Dj) : #(Dj))
In words, the above reads as
follows. The general motivation for the whole conjunction of conditions is
that if we want to formulate claims in L about parts of D then we must somehow
correlate the terms and statements of L with the things and sets in D. The
above does just that by assuming three functions, and it also states the
conditions these must satisfy for the cardinal number of a set and for the
proportions of cardinal numbers.
(Ee)(E#)(Ep):
There are a function e, say extension, a function #, say number,
and a function p, say proportion.
These will concern, by the
assumptions that follow, respectively the set of things a statement represents;
the cardinal number of that set; and relative proportions that can be formed
using cardinal numbers.
r(L,D): Language L
represents domain D.
This was defined above.
e : L |-> D*: extension
maps the terms and statements of L to the subsets of D, and
# : D* |-> N: number maps the subsets of D to the natural numbers,
and
p : D* |-> R: proportion maps the subsets of D to the real
numbers.
Note that for extensions we need just
two basic assignments for a statement, namely
e(Ti)=Ø and e(Ti)≠Ø,
which will be used below to define truth. Number refines this by enabling us to
use the cardinal number of a set, and proportion uses this to lay the foundation
of probability in terms of cardinal numbers.
D = e(Ti) U e(~Ti):
D is the union of
the extensions of a term and its negation, and
e(Ti) = e(Ti&Tj) U e(Ti&~Tj):
the extension of a term is the union of the extensions of its conjunctions with
any term and with that term's negation.
The properties given for
extensions guarantee that the denials, conjunctions and disjunctions in L are
preserved respectively set-theoretically as complements, intersections and
unions in D. Also, extension will enable us to define truth, which is done below
when considering representing.
#(Di) = #(Dj) IFF (Ef)(f : Di 1-1 Dj):
Two subsets have the same number iff there is a 1-1 function between them.
This is fairly called "Hume's
Principle", because the philosopher David Hume was one of the first
to see that the right-hand side can be used to define the left-hand side, and
thus gives a start to explaining what number is. Also, the number that thus gets
defined is known as the cardinal number of the set, and will be used
below to define proportion, which can be used to define probability.
#(D) = #(Di) + #(-Di):
The number of D is the sum of the numbers of any subset of D and its complement,
and
#(Di) = #(Di∩Dj) + #(Di∩-Dj):
The number of a subset of D is the sum of the numbers of its intersections with
any subset and with that subset's complement.
The properties given for numbers
guarantee that the number of the domain equals the sum of the numbers of any
subset and its complement. Thus complements, intersections and exclusive unions
are preserved numerically in terms of subtractions and additions.
p(Di) = #(Di) : #(D):
The proportion of a subset is its number divided by the number of D, and
p(Di|Dj) = #(Di∩Dj) : #(Dj):
The conditional proportion of a subset Di
in a subset Dj is the number of
their intersection divided by the number of Dj.
Note that in fact proportion is
defined using the cardinal numbers of the subsets of the domain plus the
ordinary rules for arithmetic that one has with cardinal numbers. Also p(.) is
so much like standard probability that it entails the standard axiomatization of
probability, as will be shown below. The point to notice here is only that
proportion is definable in terms of number.
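For a finite domain the three functions can be written out directly. The following Python sketch is an illustration only, under assumptions that are not in the paper: the domain (the six faces of a die), the two statements "even" and "high", and the function names `e`, `number` and `p` are all hypothetical. It shows extension as a set of elements, number as the size of that set, and proportion as a ratio of sizes.

```python
from fractions import Fraction

# Hypothetical finite domain D: the six faces of a die.
D = frozenset(range(1, 7))

# Hypothetical statements of L, given as predicates on elements of D.
statements = {
    "even": lambda d: d % 2 == 0,
    "high": lambda d: d >= 4,
}

def e(name, negated=False):
    """Extension: the subset of D of which the statement (or its denial) holds."""
    holds = statements[name]
    return frozenset(d for d in D if holds(d) != negated)

def number(Di):
    """Number: the cardinal number of a finite subset of D."""
    return len(Di)

def p(Di, Dj=None):
    """Proportion #(Di):#(D), or conditional proportion #(Di∩Dj):#(Dj)."""
    if Dj is None:
        return Fraction(number(Di), number(D))
    return Fraction(number(Di & Dj), number(Dj))

even, not_even = e("even"), e("even", negated=True)
high = e("high")

print(even | not_even == D)                           # D = e(Ti) U e(~Ti)
print(number(D) == number(even) + number(not_even))   # #(D) = #(Di) + #(-Di)
print(number(even) == number(even & high) + number(even - high))
print(p(even), p(even, high))
```

In this example p(even) is 1/2 and p(even|high) is 2/3. The conditional proportion is undefined when Dj is empty, so the sketch would raise a ZeroDivisionError in that case.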
(1.4) There is a set K of true
statements properly contained in L:
(EK)(K⊆L
& K≠L & (KiЄK)(p(e(Ki))=1))
There is a set K of true
statements properly contained in L. Why these statements are supposed to be true
is not a directly relevant concern. One must assume something, and indeed K is
named K because it is taken to contain all background knowledge. And what is
assumed is supposed to be true unless and until one knows that what is assumed
entails something false, to which we turn in the next condition.
Note (1.4) covers statements of
probability as well, namely in the form p(X=x)=1, and that K also includes
whatever is known about D. In brief, K consists of all background knowledge
one assumes and may use. And normally K contains much more than any specific
theory one adds hypothetically to it.
(1.5) There is a set F of false
statements properly contained in L:
(EF)(F⊆L
& F≠L & (FiЄF)(p(e(Fi))=0))
There is a set F of false statements properly contained in L. Why these
statements are supposed to be false is not a directly
relevant concern. But one must be able to contradict some theories, and indeed
one has a start for F by taking statements from K and putting "~" before them.
Together (1.4) and (1.5) assert
that whenever we propose a theory this is done in a context of presumed
knowledge. This makes both intuitive sense, for it is hard to see how or why one
could propose theories without any background knowledge, and logical sense,
because we need to appeal to background knowledge to check out how good our
proposed theories are in explaining the facts as we know them.
It is also worth remarking that
what belongs to K and F is presumptive knowledge, which itself may be qualified
or withdrawn in the face of new evidence.
(1.6) T is properly contained
in L:
(T⊆L
& T≠L)
T is what one abduces i.e.
proposes as explanation. There are conditions on it below, but the general
intuitive reason is obvious: One sets up a theory made up from assumptions if
such knowledge as one has does not allow one to deduce certain facts one does
want to explain, whereas the assumptions in the theory together with one's
background knowledge do allow one to deduce the facts one wants to explain.
Note also that the present
condition implies that T is not inconsistent, for if it were it would imply and
contain all of L, and that it is not assumed that T is true.
(1.7) T has no false
consequences:
~(EC)(T
|= C & CЄF)
Though when a theory is proposed one does not know it to be true, one must know
that it is not refuted by such facts as one knows, and this (1.7) expresses.
This is a minimal condition on
theories, and it is used to refute any theory which does have false
consequences.
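For a small propositional language, condition (1.7) can be tested by brute force. The following Python sketch is an illustration only: it assumes statements are represented as Boolean functions of a truth-value assignment, and the names `entails` and `has_false_consequence`, the atoms "rain" and "wet", and the two example theories are hypothetical. Entailment is decided by a truth table, which is a stand-in for whatever deductive system L actually has.

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Truth-table entailment: every assignment that makes all
    premises true also makes the conclusion true."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(prem(v) for prem in premises) and not conclusion(v):
            return False
    return True

def has_false_consequence(T, F, atoms):
    """True iff some statement in F follows deductively from T."""
    return any(entails(T, C, atoms) for C in F)

atoms = ["rain", "wet"]

# Hypothetical set F: it is presumed false that the street is not wet.
F = [lambda v: not v["wet"]]

# Theory 1: it rains, and if it rains the street is wet.
T_good = [lambda v: v["rain"], lambda v: (not v["rain"]) or v["wet"]]
# Theory 2: it rains, and if it rains the street is not wet.
T_bad = [lambda v: v["rain"], lambda v: (not v["rain"]) or (not v["wet"])]

print(has_false_consequence(T_good, F, atoms))  # T_good passes condition (1.7)
print(has_false_consequence(T_bad, F, atoms))   # T_bad is refuted by F
```

The first theory has no consequence in F, while the second entails a member of F and is therefore refuted. An inconsistent theory entails every statement, so with a non-empty F it would also fail this check.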
(1.8) T explains something
about D in L:
This is another minimal condition on theories, which is formulated here in
probabilistic terms: There is a statement S about D in L such that K&T is
positively relevant to S and such that S is at least as probable as not on K&T
and S follows from K, while the probability of T on K is the probability of T on
K&S.
Here we must mention a logical
point and introduce some notation.
The logical point is that the
proportion we introduce is so much like probability that we may use it as
probability. This will be shown below. The notation to be introduced relates to
this and is as follows:
Convention: p(S')=x =def (ESЄL)(ETЄL)(EDi⊆D)(S'=Di=e(S) & #(e(S)):#(D)=x)
Note that given two statements S
and T the assumptions we made entail that the right-hand side of the definition
is true, and that we introduce accents to maintain a notational reference to
the statements that the sets on the left are the extensions of. This convention
will be followed where appropriate in the rest of this text.
Assuming this, (1.8) may be
formalized as follows:
(ESЄL)(p(S'|K'&T') > p(S'|K'&~T') & p(S'|K')=1)
Note the simplest version of (1.8)
is (EX')(K'&T' |= X' & X'ЄK) i.e. T does entail some consequence that is known
to be true. Also note this needs only to apply to all of T, and not
necessarily to parts of T.
However, all that is needed is
merely that K&T makes S more probable than K&~T i.e. K without T.
(1.9) T satisfies the abductive
condition:
What we have so far by our
conditions is a theory T that is consistent, not false, and that explains at
least something. But we need more to confirm T, and specifically we need a
probability for T.
Such a probability does not follow
from what we have assumed so far, for while we know from the assumptions we made
that T represents some set in D we do not know what its (relative or absolute)
size is.
This requirement may be put in
words as follows: The probability of a theory T on background knowledge K is
the probability of its least probable known proper consequence on K.
This is an assumption, which needs
a justification. This comes in several parts.
First, there is the point that it
amounts to a strengthening of the following theorem of any standard formal
probability theory:
(T)(Q)(T |= Q --> p(T') ≤ p(Q'))
That is: For any statements T and
Q, if T does entail (explain) Q, then the probability of T is not larger
than the probability of Q.
The abductive condition
strengthens this inequality to an equality in the case that Q' is the least
probable of T's known consequences, and it does so to obtain a probability for
T' - that then can be changed by any incoming evidence by using Bayes' Theorem.
Bayes' Theorem is a rather
elementary theorem of formal probability theory, which may be written like
p(T'|Q') = p(Q'|T').p(T') : p(Q')
and read as "The probability of a
theory given evidence Q' equals the probability of the evidence given the theory
times the probability of the theory, divided by the probability of the
evidence". Clearly, this requires a probability for
T, and the abductive condition gives that, and gives it based on the available
evidence.
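In full, Bayes' Theorem divides the product p(Q'|T').p(T') by the probability of the evidence, p(Q'). The following Python sketch is an illustration only: the function name `bayes` and the numbers are hypothetical. It shows how a prior probability for T is revised once a consequence Q of T is verified.

```python
from fractions import Fraction

def bayes(p_q_given_t, p_t, p_q):
    """p(T|Q) = p(Q|T) * p(T) : p(Q), defined only for p(Q) > 0."""
    if p_q == 0:
        raise ValueError("p(Q) must be positive")
    return p_q_given_t * p_t / p_q

# Hypothetical numbers: T entails Q, so p(Q|T) = 1, and p(T) <= p(Q).
p_t = Fraction(1, 5)
p_q = Fraction(1, 2)

posterior = bayes(Fraction(1), p_t, p_q)
print(posterior)
```

With these numbers the probability of T rises from 1/5 to 2/5 when Q is verified. Without some value for p(T) the computation cannot start, which is the gap the abductive condition is meant to fill.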
Second, what is used in the
abductive condition is not the above theorem as applied to any consequence of T,
since this would also cover everything in K, but only the proper consequences of
T on K. This is defined as follows:
QЄpc(K&T) =def (K&T|=Q) & ~(K&~T|=Q)
which is to say that Q is a proper
consequence of background knowledge K and theory T iff Q follows from K&T but
not from K&~T.
Third, a formal version of the
abductive condition looks thus:
p(T|K)=x IFF (EQЄL)
(p(Q|K&T)=x &
QЄpc(K&T) &
(S)(SЄpc(K&T) --> p(S|K&T)≥p(Q|K&T)))
The probability of a theory T on
background knowledge K equals x precisely if there is a statement Q about D in
L such that the probability of Q on K and T is x and Q is a proper consequence
of K&T and every other known proper consequence of K&T is at least as probable
as Q on K&T.
Note that this is "so far as
known", or rather "so far as such proper consequences have been derived". Thus,
it may happen that someone refutes your theory or makes it very improbable by
showing that on the background knowledge you both share it can be proved that
your theory entails a far less probable proposition than you originally
believed. But this is as it should be: One reasons in a way that is logically
consistent with such evidence as one has.
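The verbal form of the abductive condition can be sketched in a few lines of Python. This is an illustration only, and it rests on assumptions that are not in the paper: the function name `abductive_probability`, the labels Q1 to Q3 and the numbers are hypothetical, and each value is read as the probability of a known proper consequence on K, following the verbal statement that the probability of T on K is that of its least probable known proper consequence.

```python
from fractions import Fraction

def abductive_probability(proper_consequences):
    """Return (Q, p), where Q is the least probable known proper consequence
    and p is its probability on K, taken as p(T|K) by the abductive condition."""
    if not proper_consequences:
        raise ValueError("T needs at least one known proper consequence")
    q = min(proper_consequences, key=proper_consequences.get)
    return q, proper_consequences[q]

# Hypothetical proper consequences of K&T derived so far.
known = {"Q1": Fraction(3, 4), "Q2": Fraction(2, 5)}
print(abductive_probability(known))

# Someone derives a further, far less probable proper consequence.
known["Q3"] = Fraction(1, 10)
print(abductive_probability(known))
```

The second call illustrates the remark above: deriving a new and less probable proper consequence lowers the probability assigned to the theory, here from 2/5 to 1/10.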
(1.10) T satisfies the inductive condition:
It would seem we now have all we
need to apply Bayesian confirmation to a theory T, but this is a - very common -
mistake. At this point we have probabilities for consequences of theories and
for theories, but to suppose one can now as it were automatically apply Bayesian
reasoning is to forget that any actual application of Bayes' Theorem happens in
a context in which, besides the evidence Q' for theory T', many other things
happen to be true, all of which may be probabilistically relevant.
Thus, if T is our theory and P
something it predicts that may be used as a test of T, then if P is true there
will always be other truths, like Q, that may seem wholly unrelated to whatever
T explains, but also might in reality be relevant. So what is needed is some way
to move from p(T|K&Q&P) to p(T|K&P).
Therefore we need some condition
that allows us to abstract from whatever facts that happen to be the case when
testing a theory that are not relevant. In words, it amounts to the following:
Theory T on background knowledge K is adequate in entailing all that is
relevant to whatever it explains.
To state this in a convenient
formal format, it is helpful to introduce three definitions that belong to
standard probability theory:
PrT|K =def p(P|K&T)>p(P|K&~T)
QiP|K =def p(Q|K&P)=p(Q|K)
QiT|K&P =def p(Q|K&T&P)=p(Q|K&P)
The first defines what it is for T to be relevant to P given K, namely if
P is more probable on K&T than on K&~T. This is what one needs for a prediction
to be of any probabilistic use. The second defines what it is for Q to be
irrelevant to P given K, namely if Q on K&P is the same as Q on K&~P. (The
definition states a logical equivalent of this, because that is more convenient
in the argument that follows.) And the third defines what it is for Q to be
irrelevant to T given K&P, and is just along the same lines as the previous
definition, except for conditionalizing on a conjunction.
Using these definitions, the
inductive condition we shall need is this:
(QЄL)(PЄL)(PrT|K --> (QiP|K IFF QiT|K&P))
This reads as: For all statements Q and
P in L about D, if P is relevant to T on K then Q is irrelevant to P on K IFF Q
is irrelevant to T on K&P. This can be seen to amount to the above verbal claim
that theory T on background knowledge K is adequate in entailing all that is
relevant to whatever it explains, when we reflect that relevance is the denial
of irrelevance, for then we see that the condition says that whatever is
relevant in fact to a prediction P of T must also be relevant given both T and
P, and conversely that if the theory entails that something is relevant, then
indeed it is relevant in fact.
If we expand the definitions what
we obtain is
(Q)(P)(p(P|K&T)>p(P|K&~T) -->
(p(Q|K&P)=p(Q|K) IFF p(Q|K&T&P)=p(Q|K&P)))
This allows us to disregard as
irrelevant all statements that are not entailed by T as relevant, and thus
enables us to use Bayes' Theorem for confirmation. Namely as follows,
supposing PrT|K and QiP|K:
p(T|K&Q&P) = p(Q|K&T&P)*p(K&T&P) : p(Q|K&P)*p(K&P)
= p(Q|K)*p(K&T&P) : p(Q|K)*p(K&P)
= p(K&T&P) : p(K&P)
= p(T|K&P)
In this argument the inductive
condition is applied in the second line, and the rest of the argument only
involves standard probability theory. Obviously, this delivers what we wanted:
We can wholly abstract from Q if T does not entail it is relevant.
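The argument can be checked numerically on a small finite domain. The following Python sketch is an illustration only, built on assumptions that are not in the paper: the domain, its counts and the helper `prop` are hypothetical, background knowledge K is taken to hold of every element, and P names the prediction of T. The domain is constructed so that Q holds of one third of the elements in every combination of T and P, which makes Q irrelevant in both senses defined above.

```python
from fractions import Fraction

# Hypothetical finite domain: each element is a triple (t, p, q) recording
# whether T, P and Q hold of it. K is taken to hold throughout.
counts = {(True, True): 3, (True, False): 1,
          (False, True): 2, (False, False): 4}
D = [(t, p, q)
     for (t, p), n in counts.items()
     for q, m in ((True, 1), (False, 2))
     for _ in range(n * m)]

T = lambda d: d[0]
P = lambda d: d[1]
Q = lambda d: d[2]
not_T = lambda d: not d[0]

def prop(A, *given):
    """Conditional proportion of A among the elements satisfying all of `given`."""
    base = [d for d in D if all(g(d) for g in given)]
    return Fraction(sum(1 for d in base if A(d)), len(base))

print(prop(P, T) > prop(P, not_T))     # PrT|K: T is positively relevant to P
print(prop(Q, P) == prop(Q))           # QiP|K: Q is irrelevant to P
print(prop(Q, T, P) == prop(Q, P))     # QiT|K&P: Q is irrelevant to T given P
print(prop(T, Q, P) == prop(T, P))     # so p(T|K&Q&P) = p(T|K&P)
print(prop(T, P))
```

All four comparisons hold in this domain, and p(T|K&P) comes out as 3/5 whether or not Q is added to the evidence. The sketch only confirms the derivation for one constructed case; it does not show that the inductive condition holds for any actual theory.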
(2) On probability:
There are many interpretations of
probability and also quite a few axiomatizations of probability. The main reason
for the differences in interpretation and axiomatization is that "probability"
has many meanings and many uses.
It is not possible to chart these
meanings and uses adequately in less than a book's length of text, and I shall
not even attempt to do so. All I want to do in this section is to note what is
the currently standard mathematical approach to probability; to briefly discuss
three distinctions between kinds of axiomatizations; to introduce five
interpretations; and to show how
the standard Kolmogorov axioms may be derived from the assumption that L
represents D symbolically and numerically, as this has been defined above.
(2.1) Measure theory and
probability theory:
The current mathematical approach
to probability theory is to see probability theory as a part of measure theory,
which is a set-theoretical tool to measure sets, to represent sums, and to
explain integration (which concerns infinite sums). It is mathematically the
clearest approach to probability, and to much besides, such as the theory of
integration. An excellent reference is P. Halmos, Measure Theory.
(2.2) Six kinds of
axiomatizations:
Broadly speaking - and see Fine
and Stegmüller for much more information - there are three distinctions between
kinds of axiomatizations, giving six kinds in all, some of which fit easily
into a measure-theoretical framework and some of which don't.
Qualitative or quantitative:
Standard mathematical probability theory is numerical, and concerns equalities
and inequalities of probabilistic formulas which are (usually) mapped to the
real numbers between 0 and 1 inclusive.
The problems with this numerical
approach are that quite a lot of intuitive reasoning with probabilities is not
quantitative but qualitative - such and such is more probable than so and so,
but one cannot say by how much, or one merely knows that one considers an event
more probable than not, without knowing how probable - and that often
there are no good measurements of probabilities or no measurements possible.
Therefore, there have been several proposals of axiomatizations of qualitative
probability.
Absolute or conditional: Standard mathematical probability theory
concerns absolute probabilities, and then adds conditional probabilities as an
apparent afterthought and by way of definition, as conditional proportion was
introduced above.
The problem with absolute
probabilities is that the vast majority of the probabilities one considers in
everyday life or in science are somehow conditional. Therefore, there are
several proposed axiomatizations in which conditional probability is basic.
Finitistic or infinitistic: The probabilities one meets in everyday life,
such as are related to coin-tossing, dice-throwing, card-playing and gambling, are finite in a
plausible sense, in that there are finitely many possible outcomes.
One can deal with the finite case
by algebra and by combinatorial theory, but most interesting cases in physics,
and most statistical applications of probability theory require sums of
infinitely many possible outcomes. The problems that arise here are currently
normally stated and solved in a measure-theoretical context.
(2.3) Five kinds of Interpretations:
There have been many
interpretations of what probability is. I will sketch five, namely two more or
less old-fashioned ones; two currently fashionable ones; and my own theory of
cardinal probability, that is compatible with the last two.
Again, to treat the subject of the
interpretations of probability well, one needs at least a book, and so I will
limit myself to the four interpretations that make some sense, and refer the
reader to Weatherford for a book-length survey of the field, followed by my own
theory, which will be considered in some more detail later on in this paper.
Logical interpretation: The logical interpretation seems to be the oldest
interpretation of what probability is, and is often rendered as "probability is
the ratio of selected cases to possible cases". Thus, the probability of
throwing a 4 with an ordinary die is 1/6, and the probability of throwing an
even number with an ordinary die is 3/6.
As will be seen, there is an
underlying assumption that all possibilities that are distinguished count for
the same and as one, which much simplifies the treatment of problems involving
probability, but cannot easily or at all deal with weighted dice or different
and unpredictable length of life.
Empirical interpretation: The empirical interpretation soon followed the
logical interpretation, and tends to look at the actual frequencies with which
distinguished possibilities in fact happen as information for what the
probability of an event is.
This works well in practice with
subjects where one can easily establish frequencies or samples, but this also
makes it difficult to say what probability really is, both for such things as
have frequencies, since these may change and anyway are partial information, and
for such things as have no frequencies, like unique events and future events.
Both the logical and the empirical
interpretation are somewhat old-fashioned, though one still meets the empirical
interpretation in social statistics.
Objective interpretation: This can be best rendered in the form of two
claims, namely (1) there is real chance in the world, in the form of chance
processes and chance events in physics, and real contingency in life and free
choice and (2) probability theory provides the tools to represent its basic
properties.
This can be seen as motivated by
physics: According to quantum-mechanics there are real chance processes in
nature. Until the rise of quantum-mechanics, all physical theories were
deterministic, and probabilities only entered because one nearly always has
incomplete knowledge and samples of populations. With the rise of
quantum-mechanics this supposed determinism of nature had to be given up.
Subjective interpretation: There are various subjective or personal
theories of probability. One way of rendering their intuitive basis is in terms
of two claims:
(1) Persons have their own personal estimates of probabilities, which, if they
are consistent, indeed behave according to probability-theory, and (2) these
probabilities can be used for Bayesian confirmation.
This does justice to the fact that
different persons may have different estimates of what is the probability of
something, and enables each person to recalculate his original probabilities
when given new evidence.
The first claim can be spelled out
in quite a few different ways, based on different considerations, but these will
not occupy us here since we assume its conclusion anyway.
It is especially the second claim
which makes subjective interpretations useful. The reason is that while Bayes'
Theorem is a rather elementary theorem of formal probability theory, applying
Bayes' Theorem requires that one has p(T), and this one does not have on the
standard non-subjective interpretations, for whatever theories represent, these
things cannot be counted like cherries, and anyway will at least start to be
largely unknown for new theories.
The reason one does not have this
on the standard non-subjective interpretations is a fundamental lack of
knowledge about the hypothesis T. And the reason one does have this on
subjective interpretations is that then one may make any guess about the
probability of any statement, provided only it is consistent with one's further
assumptions. The set-back of this is that if this is wholly subjective, one can
in principle fix it so that almost any evidence will have hardly any effect on
it. Thus, not only need subjective probabilities not be based on the evidence,
but they also can be chosen so extreme as to make almost any evidence have
almost no effect.
Cardinal interpretation:
The interpretation of probability I propose I call the cardinal interpretation,
because it rests on the existence of cardinal numbers, which are guaranteed by
non-probabilistic assumptions, namely, those given for extension and number and
which exist anyway.
Hence there always will be some
probability for any statement, and this probability will exist objectively
because it derives from the cardinal numbers of the sets that are involved.
One set-back is that normally one
does not know the cardinal probability, though one can normally establish
evidence for such statements as represent things that can be counted
empirically, rather as in the empirical interpretation of probability.
Another set-back is that one
cannot count the things that are represented by a theory. The way to solve that
problem is to make an assumption about the probability of a theory that is
consistent with the rest of probability theory, and does not depend on personal
whim but on logic. It is what I called the abductive condition: The
probability of a theory T on background knowledge K is the probability of its
least probable known proper consequence on K.
This was treated above, and all that needs to be remarked here is that it
amounts to a strengthening of the following theorem of any standard formal
probability theory:
(T)(Q) (T |= Q --> p(T') <= p(Q'))
That is: For any statements T and
Q, if T does entail (explain) Q, then the probability of T is not larger than
the probability of Q.
The abductive condition
strengthens this inequality to an equality in the case that Q is the least
probable of T's known consequences, and it does so to obtain a probability for T
- that then can be changed by any incoming evidence by using Bayes' Theorem.
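As an illustration of how the abductive condition and Bayes' Theorem work together, here is a minimal sketch in Python. It is an assumption-laden example, not the paper's own formalization: the names abductive_prior and update_on_consequence and the probabilities assigned to Q1 .. Q4 are hypothetical. It relies only on the fact that if T entails Q then p(Q|T)=1, so that p(T|Q)=p(T):p(Q).

```python
from fractions import Fraction


def abductive_prior(consequence_probs):
    """Abductive condition: set p(T|K) equal to the probability on K of the
    least probable known proper consequence of T."""
    return min(consequence_probs.values())


def update_on_consequence(p_theory, p_consequence):
    """If T entails Q then p(Q|T)=1, so Bayes' Theorem gives p(T|Q)=p(T):p(Q)."""
    return p_theory / p_consequence


# Hypothetical probabilities on K of three known consequences of T.
known = {"Q1": Fraction(3, 4), "Q2": Fraction(1, 5), "Q3": Fraction(1, 2)}
prior = abductive_prior(known)  # 1/5, the probability of Q2

# A further consequence Q4, with assumed p(Q4|K) = 2/5, is then verified.
posterior = update_on_consequence(prior, Fraction(2, 5))  # 1/2

print(prior, posterior)
```

Note that the update is only meaningful when p(T) is not larger than p(Q), which is exactly what the theorem above guarantees for any consequence Q of T.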
The cardinal interpretation of
probability is compatible with the objective interpretation, and is like the
subjective interpretation in enabling the use of Bayesian confirmation, but it
does not make this subjective, though it does make this dependent on such
evidence as one has, including such consequences of the theory one has
established.
Note it also has the interesting
consequence that wherever we have a domain of sets we have implied probabilities
for the sets, which exist as much as do the cardinal numbers of these sets - but
that very often we don't have enough information to determine these cardinal
numbers, and accordingly the best we can do is to make a guess about it, and try
to confirm or infirm that guess by evidence.
(2.4) Derivation of Kolmogorov's axioms from proportions
It is an interesting fact that the standard
axioms for finite probability theory, that were first stated by Kolmogorov, can
be derived from the assumptions in section (1), and especially those in (1.3).
The derivation is given in this section, which
you may skip if you believe the result anyway.
These standard Kolmogorov axioms for probability are
normally stated in such terms as:
Kolmogorov axioms for probability theory:
Suppose that $ is a set of propositions P, Q, R etc. and that this set is closed under negation, conjunction and disjunction, which is to say that whenever (P e $) and (Q e $), so are ~P, (P&Q) and (PVQ). Now we introduce pr(.) as a function that maps the propositions in $ into the real numbers in such a way that it satisfies the following three axioms:
A1. For all P e $ the probability of P, written as pr(P), is some non-negative real number.
A2. If P is logically valid, pr(P)=1.
A3. If ~(P&Q) is logically valid, pr(PVQ)=pr(P)+pr(Q).
To derive Kolmogorov's standard axioms for finite probabilities from
the properties of proportions assumed in (1.3), it is convenient to first
state three simple theorems about proportions:
Theorem 1: Di=Dj --> #(Di)=#(Dj)
Proof: By Di ⊆ Dj --> #(Di) <= #(Dj), applied in both directions, since Di=Dj gives both Di ⊆ Dj and Dj ⊆ Di.
Theorem 2: p(Di|Dj) = p(Di∩Dj) : p(Dj)
Proof: p(Di|Dj) = #(Di∩Dj) : #(Dj)
= (#(Di∩Dj):#(D)) : (#(Dj):#(D))
= p(Di∩Dj) : p(Dj)
Theorem 3: p(Di) = p(Di|D)
Proof: p(Di) = #(Di) : #(D) = #(Di∩D) : #(D) = p(Di|D)
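Theorems 2 and 3 can be checked on a concrete case. The following is a minimal sketch in Python under stated assumptions: the domain D of twelve numbers and the sets Di and Dj are hypothetical examples, and the function p is an illustrative rendering of a proportion as a ratio of cardinal numbers.

```python
from fractions import Fraction

# Hypothetical finite domain and two of its subsets.
D = frozenset(range(1, 13))
Di = frozenset(x for x in D if x % 2 == 0)  # the even numbers in D
Dj = frozenset(x for x in D if x % 3 == 0)  # the multiples of 3 in D


def p(A, given=D):
    """Proportion of A within `given`: the cardinal number of their
    intersection divided by the cardinal number of `given`."""
    return Fraction(len(A & given), len(given))


# Theorem 2: the conditional proportion is a ratio of proportions.
theorem_2_holds = p(Di, Dj) == p(Di & Dj) / p(Dj)

# Theorem 3: an unconditional proportion is a proportion given D.
theorem_3_holds = p(Di) == p(Di, D)

print(p(Di, Dj), theorem_2_holds, theorem_3_holds)
```

A single example of course proves nothing in general; it only shows what the two theorems assert.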
Second, since we have something much like
probability, we can use what we have to define truth-values, namely as follows:
v(T'i)=1 IFF p(T'i)>0 & p(~T'i)=0
v(T'i)=0 IFF v(T'i)≠1
What we have done here amounts in fact to
identifying the truth-values with the two extremes of a proportional or
probabilistic distribution, which we have from the assumptions made in (1.3).
Third, we use the definition of truth-values to
derive three results, which we will need below.
Theorem 4:
P1: v(Ti)=1 --> p(T'i)=1 &
P2: v(Ti --> Tj)=1 --> p(T'i) <= p(T'j) &
P3: p(T'i)=p(T'i&T'j)+p(T'i&~T'j)
Proof: The first is a direct
consequence of the definitions for v(.). The third follows from Theorems 2 and
3, which entail that if Di=e(T'i)
and Dj=e(T'j) then p(~Dj|Di)=1-p(Dj|Di),
whence p(Di)=p(Dj|Di)p(Di)+p(~Dj|Di)p(Di),
whence p(T'i)=p(T'i&T'j)+p(T'i&~T'j). And
now the first and third entail the second, for by the first v(Ti -->
Tj)=1 --> p(Ti --> Tj)=1. Now the consequent of
this is equivalent to p(~(Ti & ~Tj))=1, which in turn is
equivalent to p(Ti & ~Tj)=0. And this entails by the
third equality that p(Ti)=p(Ti & Tj), which is at most p(Tj), so that p(Ti) <= p(Tj).
First, there
is the fundamental theorem that permits inferences from logical equivalences
to probabilities:
| T5: | |- (A iff B) --> pr(A)=pr(B) | Equivalent propositions have the same probability |
| (1) | |- (A iff B) --> |- (A --> B) --> pr(A) <= pr(B) | P2 |
| (2) | |- (A iff B) --> |- (B --> A) --> pr(B) <= pr(A) | P2 |
| (3) | |- (A iff B) --> pr(A) <= pr(B) & pr(B) <= pr(A) | (1), (2) |
| (4) | |- (A iff B) --> pr(A) = pr(B) | (3), Algebra |
Next, it is proved contradictions have probability 0:
| T6 | pr(A&~A)=0 | Contradictory propositions have zero probability |
| (1) | pr(A)=pr(A&A)+pr(A&~A) | P3 |
| (2) | pr(A)=pr(A&A) | T5 with (|- A iff (A&A)) |
| (3) | pr(A)=pr(A)+pr(A&~A) | (1), (2) |
| (4) | pr(A&~A)=0 | (3), Algebra |
It is often helpful to have in propositional logic two special constants,
such as Taut (from "tautology") and Contrad (from
"contradiction"). These are defined as: Taut iff AV~A and Contrad
iff A&~A. Taking this for granted:
| T7 | 0 <= pr(A) <= 1 | Probabilities are between 0 and 1 inclusive |
| (1) | |- A --> pr(A)=1 | P1 |
| (2) | |- pr(Taut)=1 | (1) and |- Taut |
| (3) | |- A --> Taut | Logic |
| (4) | pr(A) <= pr(Taut) | (3), P2 |
| (5) | pr(A) <= 1 | (2), (4) |
| (6) | pr(Contrad)=0 | T6 |
| (7) | |- (Contrad --> A) | Logic |
| (8) | pr(Contrad) <= pr(A) | (7), P2 |
| (9) | 0 <= pr(A) | (6), (8) |
| (10) | 0 <= pr(A) <= 1 | (5), (9) |
Next, we need to prove the probabilistic theorem for
denial. We do it in two steps:
| T8 | pr(AV~A)=pr(A)+pr(~A) | Probability of disjunction of exclusives is sum of probability of factors |
| (1) | pr(AV~A)=pr((AV~A)&A)+pr((AV~A)&~A) | P3 |
| (2) | pr(A)=pr((AV~A)&A) | T5, as |- ((AV~A)&A) iff A by T1 |
| (3) | pr(~A)=pr((AV~A)&~A) | As under (2) |
| (4) | pr(AV~A)=pr(A)+pr(~A) | (1), (2), (3) |
And now:
| T9 | pr(~A)=1-pr(A) | Probability of denial is complementary probability |
| (1) | pr(AV~A)=pr(A)+pr(~A) | T8 |
| (2) | 1=pr(A)+pr(~A) | P1 since |- AV~A |
| (3) | pr(~A)=1-pr(A) | (2), Algebra |
Next, we have this parallel
to P1:
| T10 | |-~A --> pr(A)=0 | Provable non-truths have zero probability |
| (1) | |-~A | Assumption |
| (2) | pr(~A)=1 | (1), P1 |
| (3) | 1-pr(A)=1 | (2), T9 |
| (4) | pr(A)=0 | (3), Algebra |
The main point of T10 and P1 is that if
one can prove A (or ~A), it follows that pr(A)=1 (or
pr(A)=0 if |-~A). This is normally important in comparing the supposed truths
and non-truths one can logically infer from a theory with what the facts are:
if one can prove that |-T A, while in fact one finds ~A,
one has thereby learned that the assumptions of theory T can't all be true, provided the
proof of |-T A was without mistakes in reasoning.
Next, we
need a theorem that serves as a lemma to the next theorem, but that needs a
remark itself. The theorem is:
| T11 | pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1 | Full disjunctive probabilistic sum of two factors |
| (1) | pr(A)+pr(~A)=1 | T9 |
| (2) | pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1 | (1), P3 |
The promised remark is that T11 differs essentially from
the similar theorem in CPL minus the probabilities: In CPL
([A&B]+[A&~B]+[~A&B]+[~A&~B]) is true and implies that
precisely one of the four factors is true. In PT it is certain that
pr(A&B) + pr(A&~B) + pr(~A&B) + pr(~A&~B) = 1, but normally
none of the four alternatives is provably true by itself; normally
none of the four alternatives is known to be true; and normally several or all of the
alternatives will have a probability between 0 and 1 (conforming to T7).
Indeed, a very interesting aspect of PT
is that it assigns numerical measures to all alternatives the underlying
logic can distinguish, regardless of whether these alternatives are true or
have ever been true. And part of the interest is that there normally are far
more logically possible alternatives than logically provable alternatives.
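The remark can be made concrete with a small case. This is a minimal sketch in Python under stated assumptions: the domain D and the sets A and B are hypothetical, and pr is taken to be a simple proportion of D. It shows four alternatives whose probabilities sum to 1 while each lies strictly between 0 and 1.

```python
from fractions import Fraction

# Hypothetical domain and two properties, given by their extensions.
D = set(range(1, 13))
A = {x for x in D if x % 2 == 0}  # even numbers
B = {x for x in D if x % 3 == 0}  # multiples of 3


def pr(S):
    """Probability of a set as its proportion of the domain D."""
    return Fraction(len(S), len(D))


# The four alternatives that two factors allow.
cells = {
    "A&B": pr(A & B),
    "A&~B": pr(A - B),
    "~A&B": pr(B - A),
    "~A&~B": pr(D - (A | B)),
}

total = sum(cells.values())
none_certain = all(0 < value < 1 for value in cells.values())

print(cells, total, none_certain)
```

Here the four probabilities are 1/6, 1/3, 1/6 and 1/3, so the sum is certain although no single alternative is.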
To finish
the proof that CPT indeed implies all of Kolmogorov's axioms for PT we need to
derive his A3:
| T12 | |-~(A&B) --> pr(AVB)=pr(A)+pr(B) | Conditional sums |
| (1) | |-~(A&B) | AI |
| (2) | pr(A&B)=0 | (1), T10 |
| (3) | pr(A)=pr(A&~B) | (2), P3 |
| (4) | pr(B)=pr(~A&B) | (2), P3, T5 |
| (5) | pr(AVB)=1-pr(~A&~B) | T9, T5 with |- (~(~A&~B)) iff (AVB) |
| (6) | =pr(A&B)+pr(A&~B)+pr(~A&B) | T11 |
| (7) | =pr(A&~B)+pr(~A&B) | (2), (6) |
| (8) | =pr(A)+pr(B) | (3), (4), (7) |
I have now proved all of Kolmogorov's axioms for the finite case: A1 follows
from T7; A2 is P1; and A3 is T12.
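The result can also be checked by brute force on a small case. The following is a minimal sketch in Python and no substitute for the derivation: it assumes a hypothetical four-element domain D, identifies each proposition with its extension (a subset of D), takes pr to be the proportion of D, reads "P is logically valid" as "the extension of P is D", and reads "~(P&Q) is logically valid" as "the extensions of P and Q are disjoint". The names D, subsets and pr are illustrative.

```python
from fractions import Fraction
from itertools import combinations

D = frozenset(range(4))  # hypothetical domain with four elements


def subsets(S):
    """All subsets of S, standing in for all propositions about D."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def pr(S):
    """Probability of a proposition as the proportion of D in its extension."""
    return Fraction(len(S), len(D))


events = subsets(D)

# A1: every probability is a non-negative real number.
a1 = all(pr(P) >= 0 for P in events)

# A2: the logically valid proposition, with extension D, has probability 1.
a2 = pr(D) == 1

# A3: probabilities of mutually exclusive propositions add up.
a3 = all(pr(P | Q) == pr(P) + pr(Q)
         for P in events for Q in events if not (P & Q))

print(a1, a2, a3)
```

All three checks come out true for this domain, as the derivation leads one to expect for any finite domain.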
(3) On abduction
There is and has been much confusion, over
many centuries, about the ideas for and definitions of deduction, abduction and
induction, of which the beginnings can be found in Aristotle.
The least problematical of these is deduction,
and the 19th and 20th Centuries have produced many quite sophisticated
mathematical and logical analyses of what is involved in deduction, and how
one may set up proofs. It is easy to state the criterion a deduction of C from
premisses A1 .. An must satisfy to be valid: C is a valid deduction from
premisses A1 .. An iff it is impossible that C is not true in any case in which all of
A1 .. An are true.
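For propositional logic this criterion can be tested mechanically by running through all truth-value assignments. The following is a minimal sketch in Python, assuming that premisses and conclusion are given as functions of an assignment; the names is_valid and implies are illustrative, and the method does not extend as it stands beyond the propositional case.

```python
from itertools import product


def implies(a, b):
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


def is_valid(premisses, conclusion, atoms):
    """True iff there is no assignment of truth-values to the atoms in which
    all premisses are true and the conclusion is not true."""
    for values in product([True, False], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if all(prem(env) for prem in premisses) and not conclusion(env):
            return False
    return True


# Modus Ponens: from (T --> P) and T infer P.
modus_ponens = is_valid(
    [lambda e: implies(e["T"], e["P"]), lambda e: e["T"]],
    lambda e: e["P"],
    ["T", "P"],
)

# The converse pattern: from (T --> P) and P infer T.
converse = is_valid(
    [lambda e: implies(e["T"], e["P"]), lambda e: e["P"]],
    lambda e: e["T"],
    ["T", "P"],
)

print(modus_ponens, converse)
```

The first pattern is valid and the second is not, since T false and P true makes both premisses of the second true and its conclusion false.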
The matters of abduction and induction are far
less clear, and indeed Aristotle was the first to confuse them, or not clearly
distinguish them. In this section I will outline my views of abduction, and in
the next my views of induction.
(3.1) Peirce on abduction
To my knowledge, the first person to clearly
single out abduction as an inference, namely of assumptions A1 .. An for a
given conclusion C, was C.S. Peirce. Here is a relevant quotation:
"Abduction. (..) "Hypothesis [or abduction] may be defined as an argument
which proceeds upon the assumption that a character which is known necessarily
to involve a certain number of others, may be probably predicated of any
object which has all the characteristics which this character is known to
involve." (5.276) "An abduction is [thus] a method of forming a general
prediction." (2.269) But this prediction is always in reference to an observed
fact; indeed, an abductive conclusion "is only justified by its explaining an
observed fact." (1.89) If we enter a room containing a number of bags of beans
and a table upon which there is a handful of white beans, and if, after some
searching, we open a bag which contains white beans only, we may infer as a
probability, or fair guess, that the handful was taken from this bag. This
sort of inference is called making an hypothesis or abduction. (J.
Feibleman, "An Introduction to the Philosophy of Charles S. Peirce",
p. 121-2. The numbers referred to are to paragraphs in Peirce's "Collected
Papers".)
It is well to make some additional points
here:
- Abductions indeed are inferences.
- Abductions go beyond the evidence.
- Apart from a number of general conditions
on the theory inferred, such as that it should be consistent with the known
evidence and background knowledge, should not be false, should deductively
entail what it is meant to explain, and should have some probability so that
it can be confirmed, there is not much that can be said about abductions.
The reason follows:
- Abductions may involve highly creative
hypotheses that open a completely new perspective on something.
- The best texts I know relating to
abduction are by Peirce and by the mathematician Polya.
(3.2) The logical status of the abductive condition
The abductive condition amounts to the assumptions
that, first, there is such a thing as the probability of a theory, and second, that it may be initially and conveniently
settled by supposing this probability equals the maximum of what it may be
given in probability theory on such knowledge as one presumes, while one also
fully expects that this initial probability will be adjusted by further
reasoning and by new evidence still to be discovered.
So the abductive condition seems not so much a
truth about nature as a truth about the ways and procedures human beings may use
to discover the truth about nature. And indeed the abductive postulate seems
safe and warranted in the sense that any probability it introduces can be - and
usually will be - rationally corrected and adjusted; that indeed it may be
increased or decreased by inductions; and that it is based on such evidence one
has.
And an abductive condition is needed because we
need to have some factually based probability for theories that we decide are
good explanations, if only to have a start to test them inductively: Without a
probability for theories we can only try to refute them but not confirm them.
(4) On induction
Induction has been widely confused with
abduction, and besides the term has been used in quite a few ways. Especially
in statistics-related texts it mostly refers to some sort of generalizing
hypothesis that might be tested using some
sample from some population, some hypothesis about the distribution of some
attribute in the population, and a statistical test and calculation that finds
to what extent the hypothesis concerning the distribution is supported, or
within which confidence-interval a supposed probability derived from such a
distribution and the data falls.
Much of this statistical theorizing is useful
and has many applications, but I would rather call it by some such name as
"statistical hypothesis-testing", because this indicates much better what is
really involved.
(4.1) Statistical induction and Bayesian reasoning
What I prefer to call induction, and explained
above, involves the use of Bayes' Theorem, and aims at revising the probability
of an assumption on the basis of evidence. The reason to call it induction
is that here one usually tests generalizations and other hypotheses that
go beyond the evidence, which is what "induction" has been concerned with
since Hume's critique of it.
Besides, there is the following rather direct
relation to deduction. The most fundamental principle of deductive inference
is Modus Ponens, which may be written thus:
If T deductively entails P and T is true, then P
is true.
Now the intuitions of many persons say that,
conversely, if T deductively entails P and P is true, then T is usually more
probable than it was before. Indeed, that is just the sort of reasoning that
was explained above, but as will be seen, the ins and outs of it are not
precisely obvious or self-evident. Also, one needs special assumptions of a
probabilistic nature, for the inference that "if T deductively entails P and P
is true, then T is true" is the well-known deductive fallacy of confirmation
(affirming the consequent), which indeed is a fallacy in deductive logic.
(4.2) Learning from experience
Nearly everything in the above was
motivated by a desire to explain how we can rationally explain our experiences,
and thereby to explain, at least in general principle, how we can learn from
experience.
Many of the results illustrate
Bishop Butler's saying "Probability is the guide to life". In the foregoing
sections I have shown what are the general logical principles and assumptions
that go into scientific explanations, and have shown how much these depend on
and involve probability.
In fact, there is much more to
elementary probability theory, for we can also use it to prove principles like
the following ones:
| Confirmation: | The probability of a theory increases as its consequences are verified. |
| Support: | The probability of a theory increases as relevant circumstances are verified. |
| Competition: | The probability of a theory increases as its competing theories are falsified. |
| Undermining: | The probability of a theory decreases as its assumptions are falsified. |
In the present paper I shall
not provide these proofs, but the reader can try for himself, or consult
Polya's books in the Bibliography, or my "Classical Probability Theory and
Learning from Experience".
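Though no proofs are given here, two of the listed principles, Confirmation and Competition, can be illustrated numerically. The following is a minimal sketch in Python under stated assumptions: the three theories T1, T2 and T3 are taken to be mutually exclusive and jointly exhaustive, and their priors and the likelihoods of a statement Q are hypothetical numbers; the function names posterior and falsify are illustrative. T1 is assumed to entail Q, so p(Q|T1)=1.

```python
from fractions import Fraction

# Hypothetical priors for three mutually exclusive, exhaustive theories.
priors = {"T1": Fraction(1, 5), "T2": Fraction(3, 10), "T3": Fraction(1, 2)}

# Hypothetical likelihoods p(Q|Ti) of a statement Q; T1 entails Q.
likelihood = {"T1": Fraction(1), "T2": Fraction(1, 2), "T3": Fraction(1, 4)}


def posterior(priors, likelihood):
    """Bayes' Theorem over a partition of theories: p(Ti|Q) for each Ti."""
    p_q = sum(priors[t] * likelihood[t] for t in priors)
    return {t: priors[t] * likelihood[t] / p_q for t in priors}


def falsify(priors, refuted):
    """Probabilities after one theory is refuted: it gets 0 and the
    remaining theories are renormalized."""
    rest = 1 - priors[refuted]
    return {t: Fraction(0) if t == refuted else priors[t] / rest
            for t in priors}


# Confirmation: verifying the consequence Q raises p(T1) from 1/5 to 8/19.
after_q = posterior(priors, likelihood)

# Competition: refuting T3 raises p(T1) from 1/5 to 2/5.
after_refuting_t3 = falsify(priors, "T3")

print(after_q["T1"], after_refuting_t3["T1"])
```

In both cases the probabilities still sum to 1, so what one theory gains the others lose.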
(4.3) The logical status of the inductive condition
The basic
reason to assume the inductive postulate is that one needs some assumption to
deal with the very many facts that are true besides the theory one is interested
in testing, since each of these very many facts may be relevant to the truth of
the fact one is interested in. So it seems a good demand to make of a
theory that is to be true that it should truly and fully entail all it is relevant to.
Also, in fact this postulate seems to be
necessarily true if human beings can come to know nature by testing such
theories as they have, for all such tests must include knowledge of what is relevant to
what is tested and in what degree it is relevant and also of what is irrelevant to
it, for relevancies and irrelevancies are facts that are as real as the facts
they concern. In brief: one just cannot rely on any experimental evidence
if one cannot rely on one's abstraction from much of the surrounding factual
details as irrelevant, which is necessary in any experiment.
On the other hand, one cannot normally prove in
complete or even considerable detail that any given theory that is to be tested
in fact does correctly entail all that is relevant to it and does not
entail as relevant anything that is in fact irrelevant. (Indeed, normally only a
few known relevant factors are listed in any report of a scientific experiment
together with an indication how these have been dealt with in the experimental
set-up. Yet any design of experiments must involve assumptions about factors
that are relevant and that are irrelevant to what is to be tested.)
But since true theories must properly entail
the true degrees of relevancies of their predictions, all one can do is to
assume that one's theories do so, and to take care of all relevancies one does
know.
So the inductive postulate seems not so much a
truth about nature as a truth about the ways and procedures human beings use to
discover the truth about nature, and one which is true to the extent human
beings have true theories about nature, for true theories must satisfy the
inductive postulate, even if no human being is able to survey all of the
universe and establish all its presumed relevancies and irrelevancies are
factually correct. And indeed the inductive postulate seems safe and warranted
in the sense that any probabilities it introduces can be - and usually will be -
rationally corrected and adjusted by later evidence. Also it suggests a reason for experiments
that fail or turn out unexpected results: One may have disregarded as irrelevant
some factor that is relevant i.e. one may have falsely assumed that one's theory
T satisfied the inductive postulate. Finally, the inductive postulate is needed
because in any experimental test of a theory we need to abstract from very many
accompanying circumstances.
(5) Summary and discussion
I have in this paper in section
(1) proposed, stated both informally and formally, and discussed in some detail
ten conditions which explanations must satisfy to be called rational or
scientific. To my knowledge, the key postulates here, namely what I call the
abductive and inductive conditions, are new, as are the notions of representing
and, especially, representing symbolically and numerically.
In section (2) I have discussed
various interpretations and axiomatizations of probability theory, and shown
how the condition of representing symbolically and numerically entails the
standard axioms for probability.
In section (3) I have discussed
abduction, including Peirce who first saw its fundamental importance; and the
question what the logical status of my abductive condition is: Much like an
assumption we must make in order to explain rationally, but which also is
rationally corrigible by the evidence whenever it has been made.
In section (4) I have discussed
induction; compared my usage of the term with what used to be common in
statistics; noted that once we have probability theory we can explain human
learning from experience in a much better way than is possible in standard logic
without probability; and considered the question what the logical status of my
inductive condition is: Again, much like an assumption we must make in order to
learn experimentally from experience, but which also is rationally corrigible by
the evidence whenever it has been made, since an excellent reason for the
failure of a theory is that it was falsely assumed it satisfies the inductive
condition - in which case the theory fails to imply the relevancy of certain
facts which are in reality relevant to whatever it attempts to explain.
Maarten Maartensz
Amsterdam, June 2004
Bibliography:
- Ernest Adams:
- Mario Bunge: Treatise on Basic Philosophy
- Arthur Burks: Chance, Cause and Reason
- Terence Fine: Theories of Probability
- Klausner - Kuntz: Philosophy - The study of alternative beliefs
- David Hawkins: The language of nature
- Paul Halmos: Naive Set Theory
- Paul Halmos: Measure Theory
- C.S. Peirce: Collected Papers
- G. Polya: Principles of Plausible Reasoning
- G. Polya: How to solve it
- Bertrand Russell: History of Western Philosophy
- Bertrand Russell: Problems of Philosophy
- Bertrand Russell: Human Knowledge - Its scope and limits
- Wolfgang Stegmüller: Probleme und Resultate der Wissenschaftstheorie und Analytische Philosophie
- R. Weatherford: Interpretations of Probability
- W.G. Wood & D.H. Martin: Experimental Method
Notes:
Note 1: Here I
introduce a convention that can be formulated in general terms as: A condition
in a universally quantified implication may be written inside the universal
quantifier. Thus
(xЄX)(Xi⊆X)(xЄXi IFF f(x)Єf(Xi)) IFF (x)(Xi)(xЄX & Xi⊆X --> xЄXi IFF f(x)Єf(Xi))
This convention will be used in the rest of this paper where appropriate,
since it results in clearer and shorter formulas.