Abstract: This paper clarifies some of the fundamental logical principles of scientific explanation. It does so by giving ten conditions for scientific explanation that are then explained and commented upon, after which some more details are given about probability, abduction and induction. The paper presupposes some knowledge of set theory and of elementary probability theory. The requisite knowledge can be found in Halmos and Adams, or in Stegmüller, as listed in the literature.
Sections:
(0) Introduction
(1) On explanations
(1.1) There is a domain D, described by some set theory
(1.2) There is a language L that represents D
(1.3) Language L represents set D symbolically and numerically
(1.4) There is a set K of true statements properly contained in L
(1.5) There is a set F of false statements properly contained in L
(1.6) T is properly contained in L
(1.7) T has no false consequences
(1.8) T explains something about D in L
(1.9) T satisfies the abductive condition
(1.10) T satisfies the inductive condition
(2) On probability
(2.1) Measure theory and probability theory
(2.2) Six kinds of axiomatizations
(2.3) Five kinds of interpretations
(2.4) Derivation of Kolmogorov's axioms from proportions
(3) On abduction
(3.1) Peirce on abduction
(3.2) The logical status of the abductive condition
(4) On induction
(4.1) Statistical induction and Bayesian reasoning
(4.2) Learning from experience
(4.3) The logical status of the inductive condition
(5) Summary and discussion
See also: Fundamental principles of valid reasoning; The measurement of reality by truth and probability; and Classical Probability Theory and Learning from Experience.
(0) Introduction:
What is a scientific explanation?
This paper provides an answer to this question, and does so in logical set-theoretical terms, by stating, proposing and explaining a number of conditions that a theory should satisfy to be considered a minimal scientific explanation.
Here I take "theory" as "a set of statements that describes some thing in some presupposed domain of things, about which there may be much or little knowledge"; I take "knowledge" as "true statements, verified by some methods that are taken for granted"; and I suppose the reader has some familiarity with standard set theory.
The intuitive ideas about explanations I start from are these:
Explanations
are sets of statements, about some thing that is somehow characterized
and that is situated in some set of things somehow characterized. Valid
scientific explanations are consistent with the evidence, not
false, entail deductively at least part of the evidence, and can be
confirmed probabilistically and be refuted.
And the
intuitive ideas about reasoning I start from are these:
There are three basic
kinds of reasoning, where reasoning involves argumentation of any kind
using assumptions and inferences of conclusions:
1. Deductions:
To find conclusions that follow from given assumptions
2. Abductions: To find assumptions from which given conclusions follow
3. Inductions: To confirm or infirm assumptions by showing their conclusions do (not) conform to the observable facts.
Normally in reasoning all three kinds are involved: We explain supposed facts by abductions; we check the abduced assumptions by deductions of the facts they were to explain; and we test the assumptions arrived at by deducing further consequences and then, by inductions, revising the probabilities of the assumptions when these consequences are verified or falsified.
And indeed we shall see that all three kinds of reasoning are involved in explaining.
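As a toy illustration of this interplay (the rules, facts and scoring below are invented for the example, not part of the paper's apparatus), one can abduce candidate assumptions for an observed fact, deduce their consequences, and score the candidates against the further evidence:

```python
# Toy illustration: abduction proposes assumptions, deduction unfolds
# their consequences, induction scores them against the observations.
# The "knowledge base" of rules is invented for this example.

rules = {                      # assumption -> facts it entails
    "rained": {"street_wet", "sky_cloudy"},
    "sprinkler_on": {"street_wet"},
}
observed = {"street_wet", "sky_cloudy"}

# Abduction: find assumptions from which the given fact follows.
candidates = [a for a, facts in rules.items() if "street_wet" in facts]

# Deduction and induction combined: deduce each candidate's
# consequences and prefer candidates whose consequences match the
# evidence, heavily penalizing consequences that were not observed.
def score(assumption):
    consequences = rules[assumption]
    return len(consequences & observed) - 10 * len(consequences - observed)

best = max(candidates, key=score)
print(candidates, best)   # ['rained', 'sprinkler_on'] rained
```

Here "rained" wins because it explains more of the evidence without entailing anything contrary to it, which is the pattern the conditions below make precise.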
The conditions I want to propose are as follows, with my explanations following in as many sections.
(1) There is a domain D, described by some set theory S(D)
(2) There is a language L
(3) Language L represents set D symbolically and numerically
(4) There is a set K of true statements properly contained in L
(5) There is a set F of false statements properly contained in L
(6) There is a set T of theoretical statements properly contained in L
(7) T has no deductive consequences in F
(8) T entails something about D in L that is not in T and not in F
(9) T satisfies the abductive condition
(10) T satisfies the inductive condition
(1) On explanations:
Informally
and intuitively every human being who can speak can explain, and knows
in general terms how and when to do so:
One sets up
explanations for presumed facts one does not know how to deduce from
such knowledge one has, and one does so by inventing a number of
assumptions from which one can deduce what one seeks to explain.
The basic difference between scientific and other explanations is that non-scientific explanations, even if they have been formulated carefully, tend to remain merely plausible in subjective terms and tend not to rely on or refer to explicit principles of inference. For this reason it is difficult to criticize them rationally: many of the assumptions that are used to generate or defend such explanations are either not explicitly formulated or are not rational to start with (but are often instead kinds of wishful thinking), and often it is not quite clear what the precise logical relation between assumptions and conclusions is.
(1.1) There is a domain D, described by some set
theory:
D is the
collection of things that comprises what the theory T is about, and is
supposed to be describable by and also to some extent truly described
by some standard set theory.
The main logical import of being described by some standard set theory is that then one has, by way of the power set axiom, a way of talking about all logical possibilities contained in D. (The power set axiom asserts the existence of the set of subsets of any set X. One standard formulation is: (∀X)(∃Y)(Y = {Z : Z ⊆ X}).)
And the main logical import of having D is that there are presumed factually true statements about what the theory T is about, and about what may be related or relevant to this.
It is not necessary that one has much true knowledge about D, but one needs at least some, for without it one has no presumed facts to reason about or to subject to investigation and experimentation, and hence no basis for a scientific explanation.
(1.2) There is a language L that represents D:
One needs some way of formulating a theory about some things in D, and so one needs some language L to do this in, and one also needs to specify in what way L does describe things in D.
Of course, the more clearly and precisely this language has been formulated, the better one is able to reason with it and write logical arguments in it. In what follows L is supposed to be some standard set theory.
The main reason to choose set theory as the language L is that it makes the present treatment the simplest, and because (almost) any other formal language one could choose in its stead may be translated into set theory.
Also,
choosing set theory as the language L makes it easy to state what the
basic sense of representing is.
In general terms, if something A represents something B, that means that one can infer some properties, relations or things in B from some properties, relations or things in A. Thus, a menu represents the courses of a dinner, and a map represents the lay of the land.
What we need in general terms can be written as "r(X,Y)" for "X represents Y" and can be defined thus if X and Y are sets:
r(X,Y) IFF (∃f)(f : X → Y & (∀x∈X)(∀Xi⊆X)(x∈Xi IFF f(x)∈f(Xi))) (Note 1)
In words: Set X represents set Y if and only if there is a function that maps the elements and sets of X to those of Y and that preserves the element relation.
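For finite sets the definition can be checked mechanically. The following sketch (the function name and the details of the menu example are mine, assuming the family of subsets to be respected is listed explicitly) tests whether a candidate mapping preserves the element relation:

```python
# Check the representation relation r(X, Y): X represents Y iff some
# function f maps elements and subsets of X to those of Y while
# preserving the element relation. Here f is a dict and the subsets
# of X to be respected are listed explicitly.

def represents(X, subsets_of_X, f):
    """True iff for every x in X and listed Xi: x in Xi IFF f(x) in f(Xi)."""
    for x in X:
        for Xi in subsets_of_X:
            image = {f[e] for e in Xi}       # f(Xi), the image of Xi
            if (x in Xi) != (f[x] in image):
                return False
    return True

# The menu example from the text: menu items map to courses of a dinner.
menu = {"soup", "steak", "cake"}
sections = [{"soup"}, {"steak"}, {"soup", "steak"}]
courses = {"soup": "tomato soup", "steak": "ribeye", "cake": "cheesecake"}
print(represents(menu, sections, courses))    # True

# A mapping that conflates two items fails to represent:
conflated = {"soup": "dish", "steak": "dish", "cake": "cheesecake"}
print(represents(menu, sections, conflated))  # False
```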
(1.3) Language L represents set D symbolically and
numerically:
Here we have arrived at the main formal assumption of this paper, which may strike the reader as a rather technical set-theoretical definition of what is involved in representing something symbolically and numerically.
This is mostly appearance, for what follows merely adds to the presumed property of representing those assumptions that allow one to represent with symbols and to represent numerical information.
We write "rsn(L,D)" for "L represents D symbolically and numerically" and use D* for the power set of D, i.e. D* = {Z : Z ⊆ D}, and Di and Dj for subsets of D:
rsn(L,D) IFF (∃e)(∃#)(∃p)
( r(L,D)
& e : L → D*
& # : D* → N
& p : D* → R
& D = e(Ti) ∪ e(~Ti)
& e(Ti) = e(Ti&Tj) ∪ e(Ti&~Tj)
& #(Di) = #(Dj) IFF (∃f)(f : Di 1-1 Dj)
& #(D) = #(Di) + #(~Di)
& #(Di) = #(Di∩Dj) + #(Di∩~Dj)
& p(Di) = #(Di) : #(D)
& p(Di|Dj) = #(Di∩Dj) : #(Dj) )
In words, the above reads as follows. The general motivation for the whole conjunction of conditions is that if we want to formulate claims in L about parts of D, then we must somehow correlate the terms and statements of L with the things and sets in D. The above does just that by assuming three functions with certain properties, and it also states the conditions for the cardinal number of a set and for the proportions of cardinal numbers.
(∃e)(∃#)(∃p): There are a function e, say extension, a function #, say number, and a function p, say proportion.
These will
concern, by the assumptions that follow, respectively the set of things
a statement represents; the cardinal number of that set; and relative
proportions that can be formed using cardinal numbers.
r(L,D): Language L represents domain D.
This was
defined above.
e : L → D*: extension maps the terms and statements of L to the subsets of D, and
# : D* → N: number maps the subsets of D to the natural numbers, and
p : D* → R: proportion maps the subsets of D to the real numbers.
Note that for extensions we have just two basic assignments for a statement, namely e(Ti)=Ø and e(Ti)≠Ø, which will be used below to define truth. Number refines this by enabling us to use the cardinal number of a set, and proportion uses this to lay the foundation of probability in terms of cardinal numbers.
D = e(Ti) ∪ e(~Ti): D is the union of the extensions of a term and its negation, and
e(Ti) = e(Ti&Tj) ∪ e(Ti&~Tj): the extension of a term is the union of the extensions of its conjunctions with any term and its negation.
The properties given for extensions guarantee that the denials, conjunctions and disjunctions in L are preserved set-theoretically as complements, intersections and unions in D. Also, extension will enable us to define truth, which is done below when considering representing.
#(Di) = #(Dj) IFF (∃f)(f : Di 1-1 Dj): Two subsets have the same number iff there is a 1-1 function between them.
This is fairly called "Hume's Principle", because the philosopher David Hume was one of the first to see that the right-hand side can be used to define the left-hand side, and thus gives a start to explaining what number is. Also, the number that thus gets defined is known as the cardinal number of the set, and will be used below to define proportion, which can be used to define probability.
#(D) = #(Di) + #(~Di): The number of D is the sum of the numbers of any subset of D and its complement, and
#(Di) = #(Di∩Dj) + #(Di∩~Dj): The number of a subset of D is the sum of the numbers of its intersections with any subset and its complement.
The properties given for numbers guarantee that the number of the domain equals the sum of the numbers of any subset and its complement. Thus complements, intersections and exclusive unions are preserved numerically in terms of subtractions and additions.
p(Di) = #(Di) : #(D): The proportion of a subset is its number divided by the number of D, and
p(Di|Dj) = #(Di∩Dj) : #(Dj): The conditional proportion of a subset Di in a subset Dj is the number of their intersection divided by the number of Dj.
Note that in fact proportion is defined using the cardinal numbers of the subsets of the domain plus the ordinary rules of arithmetic that one has for cardinal numbers. Also p(.) is so much like standard probability that it entails the standard axiomatization of probability, as will be shown below. The point to notice here is only that proportion is definable in terms of number.
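On a finite domain the identities just listed can be verified directly, with # as set cardinality and p as a ratio. A minimal sketch (the domain and the two subsets are invented for illustration):

```python
# Finite sketch of number #(.) and proportion p(.) from rsn(L, D):
# number is set cardinality and proportion is an exact ratio.
from fractions import Fraction

D = set(range(12))                     # a small domain of 12 things
Di = {x for x in D if x % 2 == 0}      # an example subset ("even" things)
Dj = {x for x in D if x < 5}           # another subset ("small" things)

def num(X):                            # number: cardinality of a subset
    return len(X)

def p(X):                              # proportion of X in D
    return Fraction(num(X), num(D))

def p_cond(X, Y):                      # conditional proportion of X in Y
    return Fraction(num(X & Y), num(Y))

# The number of D is the sum of the numbers of a subset and its complement:
assert num(D) == num(Di) + num(D - Di)
# The number of a subset is the sum of its intersections with Dj and ~Dj:
assert num(Di) == num(Di & Dj) + num(Di & (D - Dj))
# Proportion behaves like probability: p(Di) + p(~Di) = 1:
assert p(Di) + p(D - Di) == 1
print(p(Di), p_cond(Di, Dj))           # 1/2 3/5
```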
(1.4) There is a set K of true statements properly contained in L:
(∃K)(K ⊆ L & K ≠ L & (∀Ki∈K)(p(e(Ki))=1))
There is a set K of true statements properly contained in L. Why these statements are supposed to be true is not a directly relevant concern. One must assume something, and indeed K is named K because it is taken to contain all background knowledge. And what is assumed is supposed to be true unless and until one knows that what is assumed entails something false, to which we turn in the next condition.
Note (1.4) covers statements of probability as well, namely in the form p(X=x)=1, and that K also includes whatever is known about D. In brief, K consists of all background knowledge one assumes and may use. And normally K contains much more than any specific theory one adds hypothetically to it.
(1.5) There is a set F of false statements properly contained in L:
(∃F)(F ⊆ L & F ≠ L & (∀Fi∈F)(p(e(Fi))=0))
There is a set F of false statements properly contained in L. These are supposed to be false. Why these statements are supposed to be false is not a directly relevant concern. But one must be able to contradict some theories, and indeed one has a start for F by taking statements from K and putting "~" before them.
Together (1.4) and (1.5) assert that whenever we propose a theory this is done in a context of presumed knowledge. This makes both intuitive sense, for it is hard to see how or why one could propose theories without any background knowledge, and logical sense, because we need to appeal to background knowledge to check how good our proposed theories are in explaining the facts as we know them.
It is also worth remarking that what belongs to K and F is presumptive knowledge, which itself may be qualified or withdrawn in the face of new evidence.
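Conditions (1.4) and (1.5) can be sketched on a finite domain by modeling statements as predicates, with K the statements whose extension has proportion 1 and a start for F obtained by negating members of K. The predicate names below are invented:

```python
# Toy sketch of (1.4) and (1.5): statements modeled as named predicates
# over a finite domain; K holds those with proportion 1, and a start
# for F is made by negating the members of K.

D = set(range(10))                    # a small finite domain

def e(pred):                          # extension of a statement in D
    return {x for x in D if pred(x)}

def p(pred):                          # proportion of its extension in D
    return len(e(pred)) / len(D)

statements = {                        # invented example statements
    "all_small": lambda x: x < 10,    # true of everything in D
    "even": lambda x: x % 2 == 0,     # true of only some things
}

# (1.4): K holds the statements whose extension has proportion 1.
K = {name for name, s in statements.items() if p(s) == 1}
# (1.5): a start for F, by prefixing "~" to the statements in K.
F = {"~" + name for name in K}
print(K, F)                           # {'all_small'} {'~all_small'}
```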
(1.6) T is properly contained in L:
(T ⊆ L & T ≠ L)
T is what one abduces, i.e. proposes as explanation. There are conditions on it below, but the general intuitive reason is obvious: One sets up a theory made up from assumptions if such knowledge as one has does not allow one to deduce certain facts one does want to explain, whereas the assumptions in the theory together with one's background knowledge do allow one to deduce the facts one wants to explain.
Note also that the present condition implies that T is not inconsistent, for if it were it would imply and contain all of L, and that it is not assumed that T is true.
(1.7) T has no false consequences:
~(∃C)(T ⊢ C & C∈F)
Though when a theory is proposed one does not know it to be true, one must know that it is not refuted by such facts as one knows, and this is what (1.7) expresses.
This is a minimal condition on theories, and it is used to refute any theory which does have false consequences.
(1.8) T explains something about D in L:
This is another minimal condition on theories, which is formulated here in probabilistic terms: There is a statement S about D in L such that K&T is positively relevant to S, such that S is at least as probable as not on K&T, and such that S follows from K, while the probability of T on K is the probability of T on K&S.
Here we
must mention a logical point and introduce some notation.
The logical
point is that the proportion we introduce is so much like probability
that we may use it as probability. This will be shown below. The
notation to be introduced relates to this and is as follows:
Convention:
p(S')=x =def (∃S∈L)(∃T∈L)(∃Di⊆D)(S' = Di = e(S) & #(e(S)) : #(D) = x)
Note that given two statements S and T the assumptions we made entail that the right-hand side of the definition is true, and that we introduce accents to maintain a notational reference to the statements of which the sets on the left are the extensions. This convention will be followed where appropriate in the rest of this text.
Assuming this, (1.8) may be formalized as follows:
(∃S∈L)(p(S'|K'&T') > p(S'|K'&~T') & p(S'|K')=1)
Note the simplest version of (1.8) is (∃X')(K'&T' ⊢ X' & X'∈K), i.e. T does entail some consequence that is known to be true. Also note this needs only to apply to all of T, and not necessarily to parts of T.
However, all that is needed is merely that K&T makes S more probable than K&~T, i.e. K without T.
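With proportions as probabilities, the positive-relevance clause of (1.8) can be checked by counting. A sketch (the domain and the extensions standing in for K, T and S are invented):

```python
# Count-based sketch of (1.8): check whether K & T makes a statement S
# more probable than K & ~T does, using proportions as probabilities.
from fractions import Fraction

D = set(range(20))                       # invented finite domain
K_ = D                                   # background: no restriction here
T_ = {x for x in D if x % 2 == 0}        # extension claimed by the theory
S_ = {x for x in D if x % 4 == 0}        # extension of the statement S

def p_cond(X, Y):                        # conditional proportion
    return Fraction(len(X & Y), len(Y))

lhs = p_cond(S_, K_ & T_)                # p(S | K & T)
rhs = p_cond(S_, K_ & (D - T_))          # p(S | K & ~T)
print(lhs, rhs, lhs > rhs)               # 1/2 0 True
```

Here the theory is positively relevant to S: every member of S's extension falls under T's, so conditioning on ~T drives S's proportion to zero.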
(1.9) T satisfies the abductive condition:
What we
have so far by our conditions is a theory T that is consistent, not
false, and that explains at least something. But we need more to
confirm T, and specifically we need a probability for T.
Such a
probability does not follow from what we have assumed so far, for while
we know from the assumptions we made that T represents some set in D we
do not know what its (relative or absolute) size is.
This
requirement may be put in words as follows: The probability of a
theory T on background knowledge K is the probability of its least probable
known proper consequence on K.
This is an
assumption, which needs a justification. This comes in several parts.
First, there is the point that it amounts to a strengthening of the following theorem of any standard formal probability theory:
(∀T)(∀Q)(T ⊢ Q → p(T') ≤ p(Q'))
That is: For any statements T and Q, if T does entail (explain) Q, then the probability of T is not larger than the probability of Q.
The abductive condition strengthens this inequality to an equality in the case that Q' is the least probable of T's known consequences, and it does so to obtain a probability for T' that then can be changed by any incoming evidence by using Bayes' Theorem.
Bayes' Theorem is a rather elementary theorem of formal probability theory, which may be written like
p(T'|Q') = p(Q'|T')·p(T') : p(Q')
and read as "The probability of a theory given evidence Q' equals the probability of the evidence given the theory times the probability of the theory, divided by the probability of the evidence". Clearly, this requires a probability for T, and the abductive condition gives that, and gives it based on the available evidence.
Second,
what is used in the abductive condition is not the above theorem as
applied to any consequence of T, since this would also cover everything
in K, but only the proper consequences of T on K. This is defined as
follows:
Q∈pc(K&T) =def (K&T ⊢ Q) & ~(K&~T ⊢ Q)
which is to say that Q is a proper consequence of background knowledge K and theory T iff Q follows from K&T but not from K&~T.
Third, a formal version of the abductive condition looks thus:
p(T|K)=x IFF (∃Q∈L)(p(Q|K&T)=x & Q∈pc(K&T) & (∀S)(S∈pc(K&T) → p(S|K&T) ≥ p(Q|K&T)))
The probability of a theory T on background knowledge K equals x precisely if there is a statement Q about D in L such that the probability of Q on K and T is x, and Q is a proper consequence of K&T, and every other known proper consequence of K&T is at least as probable as Q on K&T.
Note that this holds "so far as known", that is, "so far as such proper consequences have been derived". Thus, it may happen that someone refutes your theory or makes it very improbable by showing that on the background knowledge you both share it can be proved that your theory entails a far less probable proposition than you originally believed. But this is as it should be: reasoning logically, consistent with such evidence as one has.
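Numerically the abductive condition amounts to taking a minimum, after which Bayes' Theorem applies. A sketch with invented consequence probabilities and an invented likelihood for new evidence:

```python
# Toy sketch of the abductive condition: p(T|K) is set to the
# probability of T's least probable known proper consequence on K,
# and is then updated by Bayes' Theorem. All numbers are invented.
from fractions import Fraction

# Invented probabilities p(Q | K & T) of the known proper consequences:
proper_consequences = {
    "Q1": Fraction(9, 10),
    "Q2": Fraction(3, 5),
    "Q3": Fraction(4, 5),
}
# The abductive condition: p(T|K) is the least of these.
p_T_given_K = min(proper_consequences.values())
print(p_T_given_K)                 # 3/5

# Bayesian update on new evidence E (likelihoods invented):
# p(T | K & E) = p(E | K & T) * p(T | K) : p(E | K)
p_E_given_KT = Fraction(9, 10)
p_E_given_K = Fraction(7, 10)
posterior = p_E_given_KT * p_T_given_K / p_E_given_K
print(posterior)                   # 27/35
```

Deriving a further, less probable proper consequence would lower the minimum and hence the theory's probability, exactly as the "so far as known" proviso above allows.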
(1.10) T satisfies the inductive condition:
It would seem we now have all we need to apply Bayesian confirmation to a theory T, but this is a very common mistake. At this point we have probabilities for consequences of theories and for theories, but to suppose one can now as it were automatically apply Bayesian reasoning is to forget that any actual application of Bayes' Theorem happens in a context in which, besides the evidence Q' for theory T', many other things happen to be true, all of which may be probabilistically relevant.
Thus, if T is our theory and P something it predicts that may be used as a test of T, then if P is true there will always be other truths, like Q, that may seem wholly unrelated to whatever T explains, but also might in reality be relevant. So what is needed is some way to move from p(T|K&Q&P) to p(T|K&P).
Therefore we need some condition that allows us to abstract from whatever facts happen to be the case when testing a theory but are not relevant. In words, it amounts to the following:
Theory T on background knowledge K is adequate in entailing all that is relevant to whatever it explains.
To state this in a convenient formal format, it is helpful to introduce three definitions that belong to standard probability theory:
PrT|K =def p(P|K&T) > p(P|K&~T)
QiP|K =def p(Q|K&P) = p(Q|K)
QiT|K&P =def p(Q|K&T&P) = p(Q|K&P)
The first defines what it is for T to be relevant to P given K, namely if P on K&T differs from P on K&~T. This is what one needs for a prediction to be of any probabilistic use. The second defines what it is for Q to be irrelevant to P given K, namely if Q on K&P is the same as Q on K&~P. (The definition states a logical equivalent of this, because that is more convenient in the argument that follows.) And the third defines what it is for Q to be irrelevant to T given K&P, and is just along the same lines as the previous definition, except for conditionalizing on a conjunction.
Using these definitions, the inductive condition we shall need is this:
(∀Q∈L)(∀P∈L)(PrT|K → (QiP|K IFF QiT|K&P))
This reads as: For all statements Q and P in L about D, if T is relevant to P on K, then Q is irrelevant to P on K IFF Q is irrelevant to T on K&P.
This can be seen to amount to the above verbal claim that theory T on background knowledge K is adequate in entailing all that is relevant to whatever it explains, when we reflect that relevance is the denial of irrelevance: then we see that the condition says that whatever is in fact relevant to a prediction of T must also be relevant given both T&P, and conversely that if the theory entails that something is relevant, then indeed it is relevant in fact.
If we expand the definitions, what we obtain is
(∀Q)(∀S)(p(S|K&T) > p(S|K&~T) → (p(Q|K&S) = p(Q|K) IFF p(Q|K&T&S) = p(Q|K&S)))
This allows us to disregard as irrelevant all statements that are not entailed by T as relevant, and thus enables us to use Bayes' Theorem for confirmation. Namely as follows, supposing that S is relevant to T on K and that Q is irrelevant to S on K:
p(T|K&Q&S)
= p(Q|K&T&S)·p(K&T&S) : p(Q|K&S)·p(K&S)
= p(Q|K)·p(K&T&S) : p(Q|K)·p(K&S)
= p(K&T&S) : p(K&S)
= p(T|K&S)
In this argument the inductive condition is applied in the second line, and the rest of the argument only involves standard probability theory. Obviously, this delivers what we wanted: We can wholly abstract from Q if T does not entail that it is relevant.
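The payoff of the inductive condition can be checked numerically: when Q is irrelevant in the required sense, conditioning on Q leaves the probability of T given its prediction unchanged. A sketch with an invented joint distribution in which Q is independent of T and S:

```python
# Numerical check of the screening-off payoff: with Q independent of
# T and S, p(T | S & Q) equals p(T | S). The joint distribution is
# invented for illustration; all arithmetic is exact.
from fractions import Fraction
from itertools import product

# Joint distribution p(T, S) over truth values, chosen so S depends on T:
p_TS = {(1, 1): Fraction(3, 10), (1, 0): Fraction(1, 10),
        (0, 1): Fraction(2, 10), (0, 0): Fraction(4, 10)}
p_Q1 = Fraction(1, 4)        # Q is true with probability 1/4, independently

def joint(t, s, q):
    return p_TS[(t, s)] * (p_Q1 if q else 1 - p_Q1)

def cond_T(**given):
    """p(T=1 | the given settings of s and/or q)."""
    num = den = Fraction(0)
    for t, s, q in product((0, 1), repeat=3):
        if all({"s": s, "q": q}[k] == v for k, v in given.items()):
            den += joint(t, s, q)
            if t == 1:
                num += joint(t, s, q)
    return num / den

# Conditioning on the irrelevant Q changes nothing:
print(cond_T(s=1), cond_T(s=1, q=1))   # 3/5 3/5
```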
(2) On probability:
There are
many interpretations of probability and also quite a few
axiomatizations of probability. The main reason for the differences in
interpretation and axiomatization is that "probability" has many
meanings and many uses.
It is not possible to chart these meanings and uses adequately in less than a book's length of text, and I shall not even attempt to do so. All I want to do in this section is to note what is the currently standard mathematical approach to probability; to briefly discuss six kinds of axiomatizations; to introduce five interpretations; and to show how the standard Kolmogorov axioms may be derived from the assumption that L represents D symbolically and numerically, as this has been defined above.
(2.1) Measure theory and probability theory:
The current mathematical approach to probability theory is to see probability theory as a part of measure theory, which is a set-theoretical tool to measure sets, to represent sums, and to explain integration (which concerns infinite sums). It is mathematically the clearest approach to probability, and it clarifies much besides, like the theory of integration. An excellent reference is P. Halmos, Measure Theory.
(2.2) Six kinds of axiomatizations:
Broadly speaking (and see Fine and Stegmüller for much more information) there are six kinds of axiomatizations, in three contrasting pairs, some of which fit easily into a measure-theoretical framework and some of which don't.
Qualitative or quantitative: Standard mathematical probability theory is numerical, and concerns equalities and inequalities of probabilistic formulas which are (usually) mapped to the real numbers between 0 and 1 inclusive.
The problems with this numerical approach are that quite a lot of intuitive reasoning with probabilities is not quantitative but qualitative (such and such is more probable than so and so, but one cannot say by how much, or one merely knows that one considers an event more probable than not, without knowing how probable), and that often there are no good measurements of probabilities, or no measurements possible. Therefore, there have been several proposals of axiomatizations of qualitative probability.
Absolute or conditional: Standard mathematical probability theory concerns absolute probabilities, and then adds conditional probabilities as an apparent afterthought and by way of definition, as proportion was introduced above.
The problem with absolute probabilities is that the vast majority of the probabilities one considers in everyday life or in science are somehow conditional. Therefore, there are several proposed axiomatizations in which conditional probability is basic.
Finitistic or infinitistic: The probabilities one meets in everyday life, such as are related to coin-tossing, dice-throwing, card-playing and gambling, are finite in a plausible sense, in that there are finitely many possible outcomes.
One can deal with the finite case by algebra and by combinatorial theory, but most interesting cases in physics, and most statistical applications of probability theory, require sums of infinitely many possible outcomes. The problems that arise here are currently normally stated and solved in a measure-theoretical context.
(2.3) Five kinds of interpretations:
There have been many interpretations of what probability is. I will sketch five, namely two more or less old-fashioned ones; two currently fashionable ones; and my own theory of cardinal probability, which is compatible with the last two.
Again, to treat the subject of the interpretations of probability well, one needs at least a book, and so I will limit myself to the four interpretations that make some sense (referring the reader to Weatherford for a book-length survey of the field), followed by my own theory, which will be considered in some more detail later on in this paper.
Logical interpretation: The logical interpretation seems to be the oldest interpretation of what probability is, and is often rendered as "probability is the ratio of selected cases to possible cases". Thus, the probability of throwing a 4 with an ordinary die is 1/6, and the probability of throwing an even number with an ordinary die is 3/6.
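The die examples are just the ratio of selected to possible cases, which is easily computed:

```python
# The logical interpretation's "ratio of selected cases to possible
# cases", computed for the die examples in the text.
from fractions import Fraction

possible = {1, 2, 3, 4, 5, 6}            # the equally counted possibilities

def prob(selected):                      # ratio of selected to possible cases
    return Fraction(len(selected & possible), len(possible))

print(prob({4}))                          # 1/6
print(prob({2, 4, 6}))                    # 1/2, i.e. the text's 3/6
```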
As will be seen, there is an underlying assumption that all possibilities that are distinguished count for the same and as one, which much simplifies the treatment of problems involving probability, but which cannot easily, or at all, deal with weighted dice or with differing and unpredictable lengths of life.
Empirical interpretation: The empirical
interpretation soon followed the logical interpretation, and tends to
look at the actual frequencies with which distinguished possibilities
in fact happen as information for what the probability of an event is.
This works well in practice with subjects where one can easily establish frequencies or samples, but it also makes it difficult to say what probability really is: both for such things as have frequencies, since these may change and anyway are partial information, and for such things as have no frequencies, like unique events and future events.
Both the logical and the empirical interpretation are somewhat old-fashioned, though one still meets the empirical interpretation in social statistics.
Objective interpretation: This can be best
rendered in the form of two claims, namely (1) there is real chance in
the world, in the form of chance processes and chance events in
physics, and real contingency in life and free choice and (2)
probability theory provides the tools to represent its basic properties.
This can be seen as motivated by physics: According to quantum mechanics there are real chance processes in nature. Until the rise of quantum mechanics, all physical theories were deterministic, and probabilities only entered because one nearly always has incomplete knowledge and samples of populations. With the rise of quantum mechanics this supposed determinism of nature had to be given up.
Subjective interpretation: There are various subjective or personal theories of probability. One way of rendering their intuitive basis is in terms of two claims:
(1) Persons have their own personal estimates of probabilities, which, if they are consistent, indeed behave according to probability theory, and (2) these probabilities can be used for Bayesian confirmation.
This does
justice to the fact that different persons may have different estimates
of what is the probability of something, and enables each person to
recalculate his original probabilities when given new evidence.
The first
claim can be spelled out in quite a few different ways, based on
different considerations, but these will not occupy us here since we
assume its conclusion anyway.
It is especially the second claim which makes subjective interpretations useful. The reason is that while Bayes' Theorem is a rather elementary theorem of formal probability theory, applying Bayes' Theorem requires that one has p(T), and this one does not have on the standard non-subjective interpretations, for whatever theories represent, these things cannot be counted like cherries, and anyway will at least at the start be largely unknown for new theories.
The reason one does not have this on the standard non-subjective interpretations is a fundamental lack of knowledge about the hypothesis T. And the reason one does have this on subjective interpretations is that then one may make any guess about the probability of any statement, provided only it is consistent with one's further assumptions. The setback of this is that if this is wholly subjective, one can in principle fix it so that almost any evidence will have hardly any effect on it. Thus, not only need subjective probabilities not be based on the evidence, but they also can be chosen so extreme as to make almost any evidence have almost no effect.
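This complaint can be made vivid numerically: with the same evidence, a moderate prior responds strongly while a sufficiently extreme prior barely moves. The priors and likelihoods below are invented:

```python
# Invented numbers illustrating the complaint above: an extreme
# subjective prior makes strong evidence nearly ineffective.
from fractions import Fraction

def bayes(prior, lik_if_true, lik_if_false):
    """Posterior p(T|E) from prior p(T) and the two likelihoods of E."""
    num = lik_if_true * prior
    return num / (num + lik_if_false * (1 - prior))

lik_T, lik_notT = Fraction(9, 10), Fraction(1, 10)   # evidence favors T 9:1

moderate = bayes(Fraction(1, 2), lik_T, lik_notT)
extreme = bayes(Fraction(1, 1000000), lik_T, lik_notT)
print(moderate)   # 9/10: a moderate prior responds to the evidence
print(extreme)    # 1/111112: the extreme prior barely moves
```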
Cardinal interpretation: The interpretation of probability I propose I call the cardinal interpretation, because it rests on the existence of cardinal numbers, which are guaranteed by non-probabilistic assumptions, namely those given for extension and number, and which exist anyway.
Hence there
always will be some probability for any statement, and this probability
will exist objectively because it derives from the cardinal numbers of
the sets that are involved.
One
setback is that normally one does not know the cardinal probability,
though one can normally establish evidence for such statements as
represent things that can be counted empirically, rather as in the
empirical interpretation of probability.
Another
setback is that one cannot count the things that are represented by a
theory. The way to solve that problem is to make an assumption about
the probability of a theory that is consistent with the rest of
probability theory, and does not depend on personal whim but on logic.
It is what I called the abductive condition: The probability of a
theory T on background knowledge K is the probability of its least
probable known proper consequence on K.
This was treated above, and all that needs to be remarked here is that
it amounts to a strengthening of the following theorem of any standard
formal probability theory:
(T)(Q)( T ├ Q > p(T') <= p(Q') )
That is:
For any statements T and Q, if T does entail (explain) Q, then the
probability of T is not larger than the probability of Q.
The
abductive condition strengthens this inequality to an equality in the
case that Q is the least probable of T's known consequences, and it
does so to obtain a probability for T that can then be changed by any
incoming evidence by using Bayes' Theorem.
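The procedure can be sketched numerically. The function names and all probabilities below are illustrative assumptions, not anything established in this paper:

```python
# Sketch of the abductive condition: the initial probability of a theory T
# is set to the probability of its least probable known proper consequence
# on K, after which Bayes' Theorem revises it as evidence comes in.
# All numbers below are illustrative assumptions.

def abductive_prior(consequence_probs):
    """p(T|K) := probability of T's least probable known consequence on K."""
    return min(consequence_probs)

def bayes_update(p_T, p_E_given_T, p_E):
    """Bayes' Theorem: p(T|E) = p(E|T) * p(T) / p(E)."""
    return p_E_given_T * p_T / p_E

# Suppose the known consequences of T have these probabilities on K:
p_T = abductive_prior([0.9, 0.7, 0.4])   # least probable consequence: 0.4

# New evidence E that T entails (so p(E|T) = 1), with p(E) = 0.8:
p_T_new = bayes_update(p_T, 1.0, 0.8)    # verifying E raises p(T)
assert p_T_new > p_T
```

The point of the sketch is only that the abductively fixed probability is a starting value, not a fixed point: every verified consequence moves it by Bayes' Theorem.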
The
cardinal interpretation of probability is compatible with the objective
interpretation, and is like the subjective interpretation in enabling
the use of Bayesian confirmation, but it does not make this subjective,
though it does make it dependent on such evidence as one has,
including such consequences of the theory as one has established.
Note it
also has the interesting consequence that wherever we have a domain of
sets we have implied probabilities for the sets, which exist as much as
do the cardinal numbers of these sets; but very often we don't
have enough information to determine these cardinal numbers, and
accordingly the best we can do is to make a guess about them, and try to
confirm or infirm that guess by evidence.
(2.4) Derivation of Kolmogorov's axioms from
proportions
It is an interesting
fact that the standard axioms for finite probability theory, that were
first stated by Kolmogorov, can be derived from the assumptions in
section (1), and especially those in (1.3).
The derivation is given
in this section, which you may skip if you believe the result anyway.
These
standard Kolmogorov axioms for probability are normally stated in such
terms as:

Kolmogorov axioms for probability theory:

Suppose that $ is a set of propositions P, Q, R etc. and that this set
is closed under negation, conjunction and disjunction, which is to say
that whenever (P Є $) and (Q Є $), so are ~P, (P&Q) and (PVQ). Now we
introduce pr(.) as a function that maps the propositions in $ into the
real numbers in the following way, that is, satisfying the following
three axioms:

A1. For all P Є $ the probability of P, written as pr(P), is some
nonnegative real number.
A2. If P is logically valid, pr(P)=1.
A3. If ~(P&Q) is logically valid, pr(PVQ)=pr(P)+pr(Q).

To derive Kolmogorov's standard axioms for finite
probabilities from the properties of proportions assumed in (1.3), it
is convenient to first state three simple theorems about proportions:

Theorem 1: D_i = D_j > #(D_i) = #(D_j)
Proof: By D_i ⊆ D_j > #(D_i) <= #(D_j), applied in both directions.

Theorem 2: p(D_i|D_j) = p(D_i∩D_j) : p(D_j)
Proof: p(D_i|D_j) = #(D_i∩D_j) : #(D_j)
= (#(D_i∩D_j):#(D)) : (#(D_j):#(D)) = p(D_i∩D_j) : p(D_j)

Theorem 3: p(D_i) = p(D_i|D)
Proof: p(D_i) = #(D_i) : #(D) = #(D_i∩D) : #(D) = p(D_i|D)
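Theorems 2 and 3 can be checked mechanically by direct counting on a small finite domain. The domain and its subsets below are illustrative assumptions:

```python
from fractions import Fraction

# Check Theorems 2 and 3 by counting on a small finite domain D.
# The domain and the two subsets are illustrative assumptions.
D  = set(range(10))          # a domain of 10 elements
Di = {0, 1, 2, 3}            # a subset (say, the extension of T'_i)
Dj = {2, 3, 4, 5, 6}         # another subset (the extension of T'_j)

def p(X, given=None):
    """Proportion #(X ∩ given) : #(given), as in section (1.3)."""
    base = D if given is None else given
    return Fraction(len(X & base), len(base))

# Theorem 2: p(Di|Dj) = p(Di ∩ Dj) : p(Dj)
assert p(Di, given=Dj) == p(Di & Dj) / p(Dj)

# Theorem 3: p(Di) = p(Di|D)
assert p(Di) == p(Di, given=D)
```

Exact rational arithmetic (Fraction) is used so the identities hold without rounding error.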
Second, since we have
something much like probability, we can use what we have to define
truth-values, namely as follows:

v(T'_i)=1 IFF p(T'_i)>0 & p(~T'_i)=0
v(T'_i)=0 IFF v(T'_i)≠1

What we have done here
amounts in fact to identifying the truth-values with the two extremes
of a proportional or probabilistic distribution, which we have from the
assumptions made in (1.3).
Third, we use the
definition of truth-values to derive three results, which we will need
below.

Theorem 4:
P1: v(T_i)=1 > p(T'_i)=1
P2: v(T_i > T_j)=1 > p(T'_i) <= p(T'_j)
P3: p(T'_i) = p(T'_i&T'_j) + p(T'_i&~T'_j)

Proof: The first
is a direct consequence of the definitions for v(.). The third follows
from Theorems 2 and 3, which entail that if D_i=e(T'_i)
and D_j=e(T'_j) then p(~D_j|D_i) = 1-p(D_j|D_i),
whence p(D_i) = p(D_j|D_i)p(D_i) + p(~D_j|D_i)p(D_i),
whence p(T'_i) = p(T'_i&T'_j) + p(T'_i&~T'_j). And
now the first and third entail the second, for by the first v(T_i
> T_j)=1 > p(T'_i > T'_j)=1.
Now the consequent of this is equivalent with p(~(T'_i & ~T'_j))=1,
which in turn is equivalent with p(T'_i & ~T'_j)=0. And
this entails by the third equality that p(T'_i) <= p(T'_j).
Fourth, we derive from P1-P3 the theorems we need. First, there is the
fundamental theorem that permits inferences from logical equivalences
to probabilities:
T5: ├(A iff B) > pr(A)=pr(B)   [Equivalent propositions have the same probability]
(1) ├(A iff B) > ├(A > B) > pr(A) <= pr(B)   [P2]
(2) ├(A iff B) > ├(B > A) > pr(B) <= pr(A)   [P2]
(3) ├(A iff B) > pr(A) <= pr(B) & pr(B) <= pr(A)   [(1), (2)]
(4) ├(A iff B) > pr(A) = pr(B)   [(3), Algebra]
Next, it is proved that contradictions have probability 0:

T6: pr(A&~A)=0   [Contradictory propositions have zero probability]
(1) pr(A)=pr(A&A)+pr(A&~A)   [P3]
(2) pr(A)=pr(A&A)   [T5 with ├(A iff (A&A))]
(3) pr(A)=pr(A)+pr(A&~A)   [(1), (2)]
(4) pr(A&~A)=0   [(3), Algebra]

It is often helpful to have in propositional logic two special
constants, such as Taut (from "tautology") and Contrad (from
"contradiction"). These are defined as: Taut iff AV~A and Contrad iff
A&~A. Taking this for granted:
T7: 0 <= pr(A) <= 1   [Probabilities are between 0 and 1 inclusive]
(1) ├A > pr(A)=1   [P1]
(2) pr(Taut)=1   [(1) and ├Taut]
(3) ├(A > Taut)   [Logic]
(4) pr(A) <= pr(Taut)   [(3), P2]
(5) pr(A) <= 1   [(2), (4)]
(6) pr(Contrad)=0   [T6]
(7) ├(Contrad > A)   [Logic]
(8) pr(Contrad) <= pr(A)   [(7), P2]
(9) 0 <= pr(A)   [(6), (8)]
(10) 0 <= pr(A) <= 1   [(5), (9)]

Next, we need to prove the probabilistic theorem for denial. We do it
in two steps:

T8: pr(AV~A)=pr(A)+pr(~A)   [Probability of a disjunction of exclusives is the sum of the probabilities of its members]
(1) pr(AV~A)=pr((AV~A)&A)+pr((AV~A)&~A)   [P3]
(2) pr(A)=pr((AV~A)&A)   [T5, as ├(((AV~A)&A) iff A) by logic]
(3) pr(~A)=pr((AV~A)&~A)   [As under (2)]
(4) pr(AV~A)=pr(A)+pr(~A)   [(1), (2), (3)]

And now:

T9: pr(~A)=1-pr(A)   [Probability of the denial is the complementary probability]
(1) pr(AV~A)=pr(A)+pr(~A)   [T8]
(2) 1=pr(A)+pr(~A)   [P1, since ├(AV~A)]
(3) pr(~A)=1-pr(A)   [(2), Algebra]

Next, we have this parallel to P1:

T10: ├~A > pr(A)=0   [Provable non-truths have zero probability]
(1) ├~A   [Assumption]
(2) pr(~A)=1   [(1), P1]
(3) 1-pr(A)=1   [(2), T9]
(4) pr(A)=0   [(3), Algebra]

The main point of T10 and P1
is that if one can prove that A (or ~A), then thereby it follows that
pr(A)=1 (or pr(A)=0 if ├~A). This is normally important in comparing
the supposed truths and non-truths one can logically infer from a
theory with what the facts are (so that if one can prove that T ├ A,
while in fact one finds ~A, one thereby has learned that the assumptions
of theory T can't all be true, if the proof of T ├ A was
without mistakes in reasoning).
Next, we need a theorem
that serves as a lemma to the theorem after it, but that needs a remark
itself. The theorem is:

T11: pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1   [Full disjunctive probabilistic sum of two factors]
(1) pr(A)+pr(~A)=1   [T9]
(2) pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1   [(1), P3]

The
promised remark is that T11 differs essentially from the similar
theorem in CPL minus the probabilities: In CPL
([A&B] V [A&~B] V [~A&B] V [~A&~B]) is true and implies that precisely
one of the four alternatives is true. In PT
pr(A&B) + pr(A&~B) + pr(~A&B) + pr(~A&~B) = 1
is certain, but normally none of the four
alternatives is provably true by itself; normally none of the
four alternatives is known to be true; and
normally several or all of the alternatives will have a probability
between 0 and 1 (conforming to T7).
Indeed, a very interesting aspect of PT is that it
assigns numerical measures to all alternatives the underlying logic can
distinguish, regardless of whether these alternatives are true or have
ever been true. And part of the interest is that there normally are far
more logically possible alternatives than logically provable
alternatives.
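On a finite domain this contrast can be made concrete: the four alternatives receive proportions that sum to 1, though typically none of them is itself 0 or 1. A sketch with illustrative sets:

```python
from fractions import Fraction

# T11 on a finite domain: the four alternatives A&B, A&~B, ~A&B, ~A&~B
# partition the domain, so their proportions sum to 1, though normally
# none of them has proportion 0 or 1. Sets are illustrative assumptions.
D = set(range(12))
A = {0, 1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

def p(X):
    return Fraction(len(X), len(D))

# The four alternatives, as sets: A&B, A&~B, ~A&B, ~A&~B
alternatives = [A & B, A - B, B - A, D - (A | B)]
probs = [p(X) for X in alternatives]

assert sum(probs) == 1                 # the full disjunctive sum is certain
assert all(0 < q < 1 for q in probs)   # yet no single alternative is 0 or 1
```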
To finish the proof that CPT
indeed implies all of Kolmogorov's axioms for PT, we need to derive his A3:

T12: ├~(A&B) > pr(AVB)=pr(A)+pr(B)   [Conditional sums]
(1) ├~(A&B)   [Assumption]
(2) pr(A&B)=0   [(1), T10]
(3) pr(A)=pr(A&~B)   [(2), P3]
(4) pr(B)=pr(~A&B)   [(2), P3, T5]
(5) pr(AVB)=1-pr(~A&~B)   [T9, T5 with ├((~(~A&~B)) iff (AVB))]
(6) = pr(A&B)+pr(A&~B)+pr(~A&B)   [(5), T11]
(7) = pr(A&~B)+pr(~A&B)   [(2), (6)]
(8) = pr(A)+pr(B)   [(3), (4), (7)]

I
have now proved all of Kolmogorov's axioms for the finite case: A1
follows from T7; A2 is P1; and A3 is T12.
(3) On abduction
There is
and has been a lot of confusion through the ages about the ideas for
and definitions of deduction, abduction and induction, the
beginnings of which can be found in Aristotle.

The least
problematical of these is deduction, and the 19th and 20th centuries
have produced many quite sophisticated mathematical and logical
analyses of what is involved in deduction, and how one may set up
proofs. It is easy to state the criterion a deduction must satisfy to
be valid: C is a valid deduction from premisses A1 .. An iff it is
impossible that C is not true in any case in which all of A1 .. An are
true.

The
matters of abduction and induction are far less clear, and indeed
Aristotle was the first to confuse them, or at least not to distinguish
them clearly. In this section I will outline my views on abduction, and
in the next my views on induction.
(3.1) Peirce on abduction
To my
knowledge, the first person to clearly single out abduction as an
inference, namely of assumptions A1 .. An for a given conclusion C, was
C.S. Peirce. Here is a relevant quotation:
"Abduction. (..) "Hypothesis [or abduction] may be defined as an
argument which proceeds upon the assumption that a character which is
known necessarily to involve a certain number of others, may be
probably predicated of any object which has all the characteristics
which this character is known to involve." (5.276) "An abduction is
[thus] a method of forming a general prediction." (2.269) But this
prediction is always in reference to an observed fact; indeed, an
abductive conclusion "is only justified by its explaining an observed
fact." (1.89) If we enter a room containing a number of bags of beans
and a table upon which there is a handful of white beans, and if, after
some searching, we open a bag which contains white beans only, we may
infer as a probability, or fair guess, that the handful was taken from
this bag. This sort of inference is called making an hypothesis or
abduction. (J.
Feibleman, "An Introduction to the Philosophy of Charles S. Peirce", p. 1212. The numbers
refer to paragraphs in Peirce's "Collected Papers".)
It is well to make some
additional points here:
- Abductions indeed are inferences.
- Abductions go beyond the evidence.
- Apart from a number of general conditions on the theory inferred,
such as that it should be consistent with the known evidence and
background knowledge, should not be false, should deductively entail
what it is meant to explain, and should have some probability so that
it can be confirmed, there is not much that can be said about
abductions. The reason follows:
- Abductions may involve highly creative hypotheses that open a
completely new perspective on something.
- The best texts I know relating to abduction are by Peirce and by the
mathematician Polya.
(3.2)
The logical status of the abductive condition
The abductive condition
amounts to the assumptions that, first, there is such a thing as
the probability of a theory, and, second, that it may be initially and
conveniently settled by supposing this probability equals the
maximum of what it may be, given probability theory and such knowledge
as one presumes, while one also fully expects that this initial
probability will be adjusted by further reasoning and by new evidence
still to be discovered.
So the abductive
condition seems not so much a truth about nature as a truth
about the ways and procedures human beings may use to discover the
truth about nature. And indeed the abductive postulate seems safe
and warranted in the sense that any probability it introduces can be,
and usually will be, rationally corrected and adjusted; that indeed it
may be increased or decreased by inductions; and that it is based on
such evidence as one has.

And an abductive
condition is needed because we need to have some factually based
probability for theories that we decide are good explanations, if only
to have a start in testing them inductively: Without a probability for
theories we can only try to refute them, but not confirm them.
(4) On induction
Induction
has been widely confused with abduction, and besides, the term has been
used in quite a few ways. Especially in statistics-related texts it is
mostly used for some sort of generalizing hypothesis that might be
tested by some sample from some population, together with some
hypothesis about the distribution of some attribute in the population,
and a statistical test and calculation that finds to what extent the
hypothesis concerning the distribution is supported, or within which
confidence interval a supposed probability derived from such a
distribution and the data falls.

Much of
this statistical theorizing is useful and has many applications, but I
would rather call it by some such name as "statistical
hypothesis-testing", because this indicates much better what is really
involved.
(4.1) Statistical induction and Bayesian reasoning
What I
prefer to call induction, and explained above, involves the use of
Bayes' Theorem, and aims at revising the probability of an assumption
on the basis of evidence. The reason to call it induction is that
here one usually tests generalizations and other hypotheses that
go beyond the evidence, which is what "induction" has been concerned
with since Hume's critique of it.

Besides,
there is the following rather direct relation to deduction. The most
fundamental principle of deductive inference is Modus Ponens, which may
be written thus:

If T deductively entails P and T is true, then P is true.

Now the
intuitions of many persons say that if, conversely, T deductively
entails P and P is true, then T is usually more probable than it was
before. Indeed, that is just the sort of reasoning that was explained
above, but as will be seen, the ins and outs of it are not precisely
obvious or self-evident. Also, one needs special assumptions of a
probabilistic nature, for the inference that "if T deductively
entails P and P is true, then T is true" is the well-known fallacy of
confirmation, which indeed is a fallacy in deductive logic.
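The contrast can be made precise with Bayes' Theorem: if T entails P, then p(P|T) = 1, so verifying P multiplies p(T) by 1/p(P) >= 1. It raises the probability of T without proving T. The numbers below are illustrative assumptions:

```python
# If T deductively entails P, then p(P|T) = 1, so by Bayes' Theorem
#   p(T|P) = p(P|T) * p(T) / p(P) = p(T) / p(P) >= p(T).
# Verifying P thus confirms T without committing the deductive fallacy
# of confirmation: T is not thereby proved true.
# All numbers are illustrative assumptions.

p_T = 0.2      # prior probability of the theory T
p_P = 0.5      # probability of the prediction P

p_T_given_P = 1.0 * p_T / p_P    # Bayes, with p(P|T) = 1

assert p_T_given_P >= p_T        # confirmation: the probability increased
assert p_T_given_P < 1           # but T is not thereby proved true
```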
(4.2) Learning from experience
Nearly
everything in the above was motivated by a desire to explain how we can
rationally explain our experiences, and thereby to explain, at least in
general principle, how we can learn from experience.
Many of these
results illustrate Bishop Butler's saying "Probability is the guide to
life". In the foregoing sections I have shown what the general
logical principles and assumptions are that go into scientific
explanations, and have shown how much these depend on and involve
probability.
In fact,
there is much more to elementary probability theory, for we can also
use it to prove principles like the following ones:
Confirmation: The probability of a theory increases as its consequences are verified.
Support: The probability of a theory increases as relevant circumstances are verified.
Competition: The probability of a theory increases as its competing theories are falsified.
Undermining: The probability of a theory decreases as its assumptions are falsified.

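The Competition principle, for one, can be illustrated with mutually exclusive and jointly exhaustive theories: falsifying one redistributes its probability over the survivors. The three theories and their prior probabilities below are illustrative assumptions:

```python
from fractions import Fraction

# Competition: with mutually exclusive, jointly exhaustive theories
# T1, T2, T3, falsifying T3 renormalizes the remaining probabilities,
# so p(T1) increases. The priors are illustrative assumptions.
priors = {"T1": Fraction(1, 2), "T2": Fraction(3, 10), "T3": Fraction(1, 5)}
assert sum(priors.values()) == 1

# Evidence falsifies T3: condition on ~T3 by renormalizing the rest.
survivors = {t: p for t, p in priors.items() if t != "T3"}
total = sum(survivors.values())
posterior = {t: p / total for t, p in survivors.items()}

assert posterior["T1"] > priors["T1"]     # T1's probability increased
assert sum(posterior.values()) == 1       # and the posteriors still sum to 1
```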
In the
present paper I shall not provide these proofs, but the reader can try
for himself, or consult Polya's books in the Bibliography, or my
"Classical Probability Theory and Learning from Experience".
(4.3) The logical status of the inductive condition
The basic reason to
assume the inductive postulate is that one needs some assumption to
deal with the very many facts that are true besides the theory one is
interested in testing, since each of these very many facts may be
relevant to the truth of the facts one is interested in; and so it
seems a good demand on a theory that is to be true that it should truly
and fully entail all that is relevant to it.
Also, in fact this
postulate seems to be necessarily true if human beings can come
to know nature by testing such theories as they have, for all such
tests must include knowledge of what is relevant to what is tested and
in what degree it is relevant and also of what is irrelevant to it, for
relevancies and irrelevancies are facts that are as real as the facts
they concern. In brief: one just cannot rely on any
experimental evidence if one cannot rely on one's abstraction from much
of the surrounding factual details as irrelevant, which is necessary in
any experiment.
On the other hand, one
cannot normally prove in complete or even considerable detail that any
given theory that is to be tested in fact does correctly entail all
that is relevant to it and does not entail as relevant anything that is
in fact irrelevant. (Indeed, normally only a few known relevant factors
are listed in any report of a scientific experiment together with an
indication how these have been dealt with in the experimental setup.
Yet any design of experiments must involve assumptions about factors
that are relevant and that are irrelevant to what is to be tested.)
But since true theories
must properly entail the true degrees of relevancies of their
predictions, all one can do is to assume that one's theories
do so, and to take care of all relevancies one does know.
So the inductive
postulate seems not so much a truth about nature as a truth
about the ways and procedures human beings use to discover the truth
about nature, and one which is true to the extent that human beings have
true theories about nature, for true theories must satisfy the
inductive postulate, even if no human being is able to survey all of
the universe and establish that all its presumed relevancies and
irrelevancies are factually correct. And indeed the inductive postulate
seems safe and warranted in the sense that any probabilities it
introduces can be, and usually will be, rationally corrected and
adjusted by later evidence. Also, it suggests a reason for experiments
that fail or turn out unexpected results: One may have disregarded as
irrelevant some factor that is relevant, i.e. one may have falsely
assumed that one's theory T satisfied the inductive postulate. Finally,
the inductive postulate is needed because in any experimental
test of a theory we need to abstract from very many accompanying
circumstances.
(5) Summary and discussion
I have in
this paper in section (1) proposed, stated both
informally and formally, and discussed in some detail ten conditions
which explanations must satisfy to be called rational or scientific. To
my knowledge, the key postulates here, namely what I call the abductive
and inductive conditions are new, as are the notions of representing
and, especially, representing symbolically and numerically.
In section (2) I have discussed various
interpretations and axiomatizations of probability theory, and shown
how the condition of representing symbolically and numerically entails
the standard axioms for probability.
In section (3) I have discussed abduction, including
Peirce who first saw its fundamental importance; and the question what
the logical status of my abductive condition is: Much like an
assumption we must make in order to explain rationally, but which also
is rationally corrigible by the evidence whenever it has been made.
In section (4) I have discussed induction; compared
my usage of the term with what used to be common in statistics; noted
that once we have probability theory we can explain human learning from
experience in a much better way than is possible in standard logic
without probability; and considered the question what the logical
status of my inductive condition is: Again, much like an assumption we
must make in order to learn experimentally from experience, but which
also is rationally corrigible by the evidence whenever it has been
made, since an excellent reason for the failure of a theory is that it
was falsely assumed to satisfy the inductive condition, in which
case the theory fails to imply the relevancy of certain facts which are
in reality relevant to whatever it attempts to explain.
Maarten Maartensz
Amsterdam, June 2004
Bibliography:
- Ernest Adams:
- Mario Bunge: Treatise on Basic Philosophy
- Arthur Burks: Chance, Cause and Reason
- Terence Fine: Theories of Probability
- Klausner & Kuntz: Philosophy - The study of alternative beliefs
- David Hawkins: The language of nature
- Paul Halmos: Naive Set Theory
- Paul Halmos: Measure Theory
- C.S. Peirce: Collected Papers
- G. Polya: Principles of Plausible Reasoning
- G. Polya: How to solve it
- Bertrand Russell: History of Western Philosophy
- Bertrand Russell: Problems of Philosophy
- Bertrand Russell: Human Knowledge - Its scope and limits
- Wolfgang Stegmüller: Probleme und Resultate der Wissenschaftstheorie und Analytische Philosophie
- R. Weatherford: Interpretations of Probability
- W.G. Wood & D.H. Martin: Experimental Method
Notes:
Note
1: Here I introduce a convention that can be formulated in
general terms as: A condition in a universally quantified implication
may be written inside the universal quantifier. Thus

(xЄX)(Xi⊆X)(xЄXi IFF f(x)Єf(Xi)) IFF (x)(Xi)(xЄX & Xi⊆X > (xЄXi IFF f(x)Єf(Xi)))

This convention will be used in the rest of this
paper where appropriate, since it results in clearer and shorter
formulas.