"Probability is the very guide to life."
"Every year, if not every day,
we have to wager our salvation upon some prophecy based upon
imperfect knowledge."
"Life is the art of drawing sufficient
conclusions from insufficient premises."
"The actual science of logic is conversant at
present only with things either certain, impossible, or entirely
doubtful, none of which (fortunately) we have to reason on.
Therefore the true logic for this world is the calculus of
Probabilities, which takes account of the magnitude of the
probability which is, or ought to be, in a reasonable man's mind."
(James Clerk Maxwell, quoted by E.T. Jaynes)
"Probability theory is nothing but common sense reduced to calculation."
I continue being not well, and otherwise also as before, so I cannot do much.
Today I reproduce part of something I wrote on
induction, since it seems to me a good introduction to the very
basis of the mathematics of probability,
at least for those who are willing to think for themselves and are fairly
naive about both
mathematics and probability, since I
explain most of it and have added notes for
clarity and background.
And the reason for my wanting to give some
reasonable explanations of the very
basis of the mathematics of probability is given by the opening
quotations, and by my last note, which shows why the concept of
irrelevance is so important.
Introduction to probability:
There are a number of different axiomatizations of probability,
and also at least three different interpretations of any given
axiomatization. Before considering
the different interpretations I shall give what has become in this
century the standard mathematical axiomatization of probability
theory, namely Kolmogorov's. For
the finite case Kolmogorov
proposed three axioms, which may be stated as follows:
Suppose that $ is a set of propositions P, Q, R etc. and that this set is
closed under negation, conjunction and disjunction, which is to say
that whenever (P e $) and (Q e $), so are ~P, (P&Q) and (PVQ).
Now we introduce pr(.) as a function that
maps the propositions in $ into the real numbers
in such a way that it satisfies the following axioms:
A1. For all P e $ the probability of P, written as pr(P), is
some non-negative real number.
A2. If P is logically valid, pr(P)=1.
A3. If ~(P&Q) is logically valid, then pr(PVQ)=pr(P)+pr(Q).
In the present formulation I have chosen to attribute probabilities to
propositions. This is not necessary, for probabilities may be
attributed to sets as well. Since the mathematics is the same,
and since the present reading is more convenient, I have chosen to say
that probabilities apply to propositions. There is more to be said on
this choice but as this properly belongs to the interpretations of
probability, I shall return to it there.
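Since the mathematics is the same for propositions and sets, the axioms can be tried out on a small finite model. The Python sketch below is an illustration under stated assumptions, not a definitive implementation: a proposition is represented by the set of truth assignments ("worlds") to three hypothetical atoms p, q, r in which it is true, ~, & and V become complement, intersection and union, and the weights of the worlds are arbitrary fractions that sum to 1. It checks A1 to A3 for all 256 propositions of that model.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model (an assumption for illustration): a world is a
# truth assignment to three atoms p, q, r; a proposition is the frozenset of
# worlds in which it is true.
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
# Arbitrary weights 1/36, 2/36, ..., 8/36; they sum to 1.
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}
# All 2**8 = 256 propositions; this set is closed under ~, & and V.
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

@lru_cache(maxsize=None)
def pr(a):
    """pr(A): the total weight of the worlds in which A is true."""
    return sum((WEIGHT[w] for w in a), Fraction(0))

def neg(a):
    """~A: the worlds in which A is false. P&Q and PVQ are a & b and a | b."""
    return ALL - a

def valid(a):
    """A is logically valid iff it is true in every world."""
    return a == ALL

def check_axioms():
    a1 = all(pr(a) >= 0 for a in PROPS)
    a2 = all(pr(a) == 1 for a in PROPS if valid(a))
    # A3: if ~(P&Q) is valid, then pr(PVQ) = pr(P) + pr(Q).
    a3 = all(pr(a | b) == pr(a) + pr(b)
             for a in PROPS for b in PROPS if valid(neg(a & b)))
    return a1, a2, a3

print(check_axioms())  # (True, True, True)
```

Exact fractions are used instead of floats so that the equalities in A2 and A3 are tested without rounding error.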
Absolute theorems: Irrespective of the axiomatization or interpretation
of probability, there are a number of important theorems which we
shall need - just as we need laws like (a+b)=(b+a) for counting,
irrespective of axioms used to prove them or of what we choose to
count. The advantage and use of axioms is that one can use them to
prove the theorems one needs - and having given a valid proof one
knows that any objection against the theorem must be directed against
the axioms, for the theorem was proved to follow from them.
So what we shall do first is to derive some useful theorems. First,
then, there is
T1. pr(~P)=1-pr(P).
This shows how we can find the probability of ~P from pr(P). T1 is proved by
noting that pr(PV~P)=pr(P)+pr(~P) by A3, since ~(P&~P) is logically
valid, and also pr(PV~P)=1 by A2, since (PV~P) is
logically valid. It should be noted that here and in the rest of the
chapter I merely indicate the proofs, so that the reader can do the
rest. Next there is
T2. 0 <= pr(P) <= 1.
This says that all probabilities are in the interval from 0 to 1 inclusive. That
pr(~P) is not less than 0 follows from A1. Now if
pr(P) were to exceed 1, pr(~P) would be less than 0 by T1,
which contradicts this. So pr(P) does not exceed 1, and
T2 now follows by A1.
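T1 and T2 can be checked for every proposition of a small finite model. The sketch below assumes a hypothetical model in which a proposition is the set of truth assignments to three atoms in which it is true, with arbitrary weights summing to 1; exact fractions are used so that pr(~P)=1-pr(P) is tested without rounding error.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model (an assumption for illustration): worlds are truth
# assignments to three atoms; propositions are frozensets of worlds.
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}  # sums to 1
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

@lru_cache(maxsize=None)
def pr(a):
    """pr(A): the total weight of the worlds in which A is true."""
    return sum((WEIGHT[w] for w in a), Fraction(0))

def neg(a):
    """~A: the worlds in which A is false."""
    return ALL - a

def t1_holds():
    """T1: pr(~P) = 1 - pr(P), for every P."""
    return all(pr(neg(a)) == 1 - pr(a) for a in PROPS)

def t2_holds():
    """T2: 0 <= pr(P) <= 1, for every P."""
    return all(0 <= pr(a) <= 1 for a in PROPS)

print(t1_holds(), t2_holds())  # True True
```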
T3. If P |= Q, then pr(P) <= pr (Q).
This says that if P logically entails Q, then pr(P) is not larger than pr(Q). It
can be proved by noting that if P does indeed logically entail Q, then
~(P&~Q) is logically valid, and so A3 entails pr(PV~Q) = pr(P)+pr(~Q). By
A1 and T2 the LS
is <= 1, and so pr(P)+pr(~Q) <= 1, from which the theorem follows on
substituting 1-pr(Q) for pr(~Q) (by T1) and transposing. T3 immediately entails
T4. If P is logically equivalent to Q, then pr(P)=pr(Q),
since logical equivalence amounts to: P |= Q and Q |= P. So logical equivalents have
the same probability. This is a very important theorem, and is used
all the time.  Thus we can use it
to prove the following expansion of any proposition P:
T5. pr(P)=pr(P&Q) + pr(P&~Q), for arbitrary P
and Q in $.
It is noteworthy that this expansion can be repeated: pr(P&Q) + pr(P&~Q) =
pr(P&Q&R)+ pr(P&Q&~R)+pr(P&~Q&R)+pr(P&~Q&~R), and repeated ad lib.
T5 is proved by noting that T4 entails that
pr(P)=pr((P&Q)V(P&~Q)), since P is logically equivalent to (P&Q)V(P&~Q), which turns into T5 upon noting that
~((P&Q)&(P&~Q)) is logically valid, and applying A3.
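T3, T4 and T5 can be illustrated in a small hypothetical model (an assumption, not from the text): propositions are sets of truth assignments to three atoms p, q, r, with arbitrary weights summing to 1. There P |= Q amounts to set inclusion, and a formula becomes a proposition by evaluating it in each world. The helper names and the De Morgan example are illustrative choices.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model (an assumption for illustration).
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}  # sums to 1
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def neg(a):
    return ALL - a

def entails(a, b):
    """P |= Q: Q is true in every world in which P is true."""
    return a <= b

def truth_set(formula):
    """The proposition expressed by a Boolean formula in p, q, r."""
    return frozenset(w for w in WORLDS if formula(*w))

# T3: if P |= Q then pr(P) <= pr(Q).
t3 = all(pr(a) <= pr(b) for a in PROPS for b in PROPS if entails(a, b))

# T4: two different but logically equivalent formulas (De Morgan).
lhs = truth_set(lambda p, q, r: not (p and q))
rhs = truth_set(lambda p, q, r: (not p) or (not q))
t4 = lhs == rhs and pr(lhs) == pr(rhs)

# T5: pr(P) = pr(P&Q) + pr(P&~Q), for all P and Q.
t5 = all(pr(a) == pr(a & b) + pr(a & neg(b)) for a in PROPS for b in PROPS)

# The repeated expansion, for one arbitrary P and the atoms q and r.
p, q, r = (frozenset(w for w in WORLDS if w[i]) for i in range(3))
P = p | r
four_terms = (pr(P & q & r) + pr(P & q & neg(r))
              + pr(P & neg(q) & r) + pr(P & neg(q) & neg(r)))
print(t3, t4, t5, pr(P) == four_terms)  # True True True True
```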
From T5 we can conveniently prove
T6. pr(PVQ)=pr(P)+pr(Q)-pr(P&Q).
This extends A3, and shows how we may calculate the probability of any
disjunction. To prove this, we first note that by T4
pr(PVQ) = pr((P&Q) V (P&~Q) V (~P&Q)) = pr(P&Q) + pr(P&~Q) + pr(~P&Q)
by A3, applied twice, since the three disjuncts exclude each other. By T5,
pr(P) = pr(P&Q) + pr(P&~Q) and pr(Q) = pr(P&Q) + pr(~P&Q), so the sum
pr(P)+pr(Q) contains the common term pr(P&Q) twice; subtracting it once
yields pr(PVQ), i.e. T6. If we now
combine T5 and T6 we get
T7. (pr(P&Q) <= pr(P)) & (pr(P) <= pr(PVQ)).
The probability of a conjunction is not larger than the probability of
any of its conjuncts, and the probability of a disjunction is not
smaller than the probability of any of its disjuncts.
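A quick exhaustive check of T6 and T7 over all pairs of propositions can be run in a small hypothetical model (an assumption for illustration): propositions are sets of truth assignments to three atoms, and the worlds carry arbitrary weights summing to 1.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model (an assumption for illustration).
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}  # sums to 1
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

# T6: pr(PVQ) = pr(P) + pr(Q) - pr(P&Q).
t6 = all(pr(a | b) == pr(a) + pr(b) - pr(a & b)
         for a in PROPS for b in PROPS)
# T7: pr(P&Q) <= pr(P) <= pr(PVQ).
t7 = all(pr(a & b) <= pr(a) <= pr(a | b)
         for a in PROPS for b in PROPS)
print(t6, t7)  # True True
```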
Conditional theorems: Most probabilities are not, as they were
in this chapter so far, absolute, but are conditional: Rather
than saying "the probability of Q = x" we usually introduce a
condition and say, "the probability of Q, if P is true, = y". This
idea, that of the probability of a proposition Q given that one or
more propositions P1, P2 etc. are true is formalised by the following
definition:
Def 1: pr(Q/P) = pr(P&Q):pr(P)
The conditional probability of Q, given or assumed that P is true,
equals the probability that (P&Q) is true, divided by the probability
that (P) is true. Note well (NB) the following, as this fact
has important implications for the interpretation and application of
probability theory: A conditional probability is defined in
terms of absolute probabilities, and therefore we need absolute
probabilities to establish conditional ones.
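Def 1 translates directly into a small function. The sketch below assumes a hypothetical finite model (propositions as sets of truth assignments to three atoms p, q, r, with arbitrary weights summing to 1); the name `cond` is illustrative, and a conditional probability with pr(P)=0 is returned as None rather than as a number, in line with the usual rule that a fraction with denominator 0 is undefined.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

# Hypothetical finite model (an assumption for illustration).
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}  # sums to 1

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def cond(q, p):
    """Def 1: pr(Q/P) = pr(P&Q):pr(P), or None when pr(P) = 0 (undefined)."""
    if pr(p) == 0:
        return None
    return pr(p & q) / pr(p)

p, q, r = (frozenset(w for w in WORLDS if w[i]) for i in range(3))
print(cond(q, p))            # 15/26 with the weights above
print(cond(q, frozenset()))  # None: the condition is a contradiction
```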
Def 1 has many applications, and many of these turn on the fact that it
also provides an implicit definition of pr(P&Q), namely as
pr(P)pr(Q/P) (simply by multiplying both sides of Def 1 by pr(P)).
Consequently, we have as a theorem (if pr(P)>0 and pr(Q)>0), since P&Q
and Q&P are logically equivalent:
T8. pr(P&Q) = pr(P)pr(Q/P) = pr(Q)pr(P/Q).
The second equality is, of course, also an application of Def 1,
and T8 accordingly says that the probability of a conjunction equals
the probability of one conjunct times the probability of the other
given that the one is true.
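T8 can be checked for every pair of propositions with positive probability in a small hypothetical model (an assumption for illustration: propositions as sets of truth assignments to three atoms, arbitrary weights summing to 1). The printed values show the product form for the atoms p and q.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model (an assumption for illustration).
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}  # sums to 1
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def cond(q, p):
    """Def 1: pr(Q/P) = pr(P&Q):pr(P); None when pr(P) = 0."""
    if pr(p) == 0:
        return None
    return pr(p & q) / pr(p)

# T8, for every pair with pr(P) > 0 and pr(Q) > 0.
t8 = all(pr(a & b) == pr(a) * cond(b, a) == pr(b) * cond(a, b)
         for a in PROPS for b in PROPS if pr(a) > 0 and pr(b) > 0)

p, q, r = (frozenset(w for w in WORLDS if w[i]) for i in range(3))
print(t8, pr(p & q), pr(p) * cond(q, p))  # True 5/12 5/12
```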
Another consequence of Def 1 is that, if pr(P)>0, pr(Q/P)+pr(~Q/P)=1.
This results from T5 and Def 1 upon
division by pr(P), and says that the probability of Q if P plus the
probability of ~Q if P equals 1. Of course, this is one way of stating
that conditional probabilities are like unconditional ones. A theorem
to the same effect, which parallels T2, is
T11. 0 <= pr(Q/P) <= 1.
That 0 <=
pr(Q/P) follows from Def 1, because the numerator and the denominator of the
quotient are both >=0 by A1; and that pr(Q/P)<=1 is equivalent to
pr(P&Q) <= pr(P), which holds by T7. A theorem in
the vein of T4 is
T12. If P |= Q, then pr(P&~Q)=0
This is proved by noting that if P |= Q holds, then ~(P&~Q) is logically valid, which, by
A3, entails that pr(PV~Q)=pr(P)+pr(~Q). As by
T6 pr(PV~Q)=pr(P)+pr(~Q)-pr(P&~Q), it follows
pr(P&~Q)=0 if P |= Q. From this it easily follows that
T13. If P |= Q, then pr(Q/P)=1 provided pr(P)>0,
which is to
say that if Q is a logical consequence of P, the probability of Q if P
is true is 1. The proviso is interesting, for it denies the
possibility of inferring Q from a logical contradiction or known
falsehood. This means that the definition P |= Q =df pr(Q/P)=1
strengthens the logical "|=" by adding that proviso.
 T13 immediately follows from
T5, T12 and Def 1.
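The complement rule pr(Q/P)+pr(~Q/P)=1, T11, T12 and T13 can be checked in a small hypothetical model (an assumption for illustration: propositions as sets of truth assignments to three atoms, arbitrary weights summing to 1). The last line shows the proviso of T13 at work: a contradiction entails everything, but conditioning on it yields no probability at all.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model (an assumption for illustration).
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}  # sums to 1
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def neg(a):
    return ALL - a

def cond(q, p):
    """Def 1: pr(Q/P) = pr(P&Q):pr(P); None when pr(P) = 0."""
    if pr(p) == 0:
        return None
    return pr(p & q) / pr(p)

def entails(a, b):
    """P |= Q as inclusion of truth sets."""
    return a <= b

positive = [a for a in PROPS if pr(a) > 0]  # propositions one may condition on

# pr(Q/P) + pr(~Q/P) = 1 whenever pr(P) > 0.
complement = all(cond(b, a) + cond(neg(b), a) == 1
                 for a in positive for b in PROPS)
# T11: 0 <= pr(Q/P) <= 1.
t11 = all(0 <= cond(b, a) <= 1 for a in positive for b in PROPS)
# T12: if P |= Q then pr(P&~Q) = 0.
t12 = all(pr(a & neg(b)) == 0 for a in PROPS for b in PROPS if entails(a, b))
# T13: if P |= Q and pr(P) > 0 then pr(Q/P) = 1.
t13 = all(cond(b, a) == 1 for a in positive for b in PROPS if entails(a, b))
print(complement, t11, t12, t13)  # True True True True
print(cond(ALL, frozenset()))    # None: no inference from a contradiction
```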
Def 1 may,
of course, list any finite number of premises, as in pr(Q/P1&....&Pn)
= pr(Q&P1&....&Pn):pr(P1&....&Pn). Such long conjunctions admit of a
theorem like T8:
T14. pr(P1&....&Pn) = pr(P1)pr(P2/P1)pr(P3/P1&P2)....pr(Pn/P1&....&Pn-1).
This says that the probability that n propositions are true equals the
probability that the first (in any convenient order) is true times the
probability that the second is true if the first is true times the
probability that the third is true if the first and the second are
true etc. The pattern of proof can be seen by noting that for n=3
pr(P1)pr(P2/P1)pr(P3/P1&P2) = pr(P1&P2)pr(P3/P1&P2) = pr(P3&P2&P1)
because the denominators successively drop out by Def 1.
That the premises can be taken in any order is a consequence of
T4: Conjuncts taken in any order are equivalent to
the same conjuncts in any other order.
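T14, and the claim that the order of the premises does not matter, can be illustrated in a small hypothetical model (an assumption: propositions as sets of truth assignments to three atoms p, q, r, with arbitrary weights summing to 1). The three example propositions are arbitrary, chosen so that their conjunction has positive probability, which makes every factor defined in every order.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import permutations, product

# Hypothetical finite model (an assumption for illustration).
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}  # sums to 1

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def neg(a):
    return ALL - a

def cond(q, p):
    """Def 1: pr(Q/P) = pr(P&Q):pr(P); None when pr(P) = 0."""
    if pr(p) == 0:
        return None
    return pr(p & q) / pr(p)

def conj_all(props):
    """P1&...&Pn: the worlds in which every Pi is true."""
    out = ALL
    for a in props:
        out = out & a
    return out

def chain_product(props):
    """pr(P1)pr(P2/P1)...pr(Pn/P1&...&Pn-1), or None if a factor is undefined."""
    result, so_far = pr(props[0]), props[0]
    for nxt in props[1:]:
        factor = cond(nxt, so_far)
        if factor is None:
            return None
        result *= factor
        so_far = so_far & nxt
    return result

p, q, r = (frozenset(w for w in WORLDS if w[i]) for i in range(3))
example = [p | q, neg(p) | r, q | r]  # arbitrary; their conjunction has pr > 0
orders = {chain_product(list(o)) for o in permutations(example)}
print(orders == {pr(conj_all(example))})  # True: all 6 orders agree with T14
```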
T11 and T13, together with
T9 and T10, show that
conditional probabilities are probabilities. We need just one further
theorem:
T15. If R |= ~(P&Q), then pr(PVQ/R) = pr(P/R)+pr(Q/R).
This parallels A3. It is easily proved by noting that
pr(PVQ/R) = (pr(P&R)+pr(Q&R)-pr(P&Q&R)):pr(R) by Def 1,
T4 and T6, and that pr(P&Q&R)=0
by T12 and T4 on the hypothesis. The conclusion
then follows by Def 1.
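T15 can be checked on a sample of triples in a small hypothetical model (an assumption: propositions as sets of truth assignments to three atoms p, q, r, with arbitrary weights summing to 1); taking every seventh proposition is an arbitrary choice to keep the loop small. The final example uses an R that excludes P&Q although P&Q itself is not a contradiction.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model (an assumption for illustration).
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}  # sums to 1
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def neg(a):
    return ALL - a

def cond(q, p):
    """Def 1: pr(Q/P) = pr(P&Q):pr(P); None when pr(P) = 0."""
    if pr(p) == 0:
        return None
    return pr(p & q) / pr(p)

SAMPLE = PROPS[::7]  # every seventh proposition, an arbitrary sample

def t15_holds(sample):
    """T15: if R |= ~(P&Q) and pr(R) > 0, then pr(PVQ/R) = pr(P/R) + pr(Q/R)."""
    return all(cond(x | y, z) == cond(x, z) + cond(y, z)
               for z in sample if pr(z) > 0
               for x in sample for y in sample
               if z <= neg(x & y))

p, q, r = (frozenset(w for w in WORLDS if w[i]) for i in range(3))
R = r & neg(p & q)  # R excludes P&Q, although P&Q is not a contradiction
print(t15_holds(SAMPLE))                        # True
print(cond(p | q, R), cond(p, R) + cond(q, R))  # 5/6 5/6
```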
Irrelevance: A second important concept which now can be
defined is that of irrelevance. Two propositions P and Q are said to
be - probabilistically - irrelevant, abbreviated PirrQ if the
following is true:
Def 2: PirrQ iff pr(P&Q)=pr(P)pr(Q)
It follows at once that irrelevance is symmetric (by T4):
T16. PirrQ iff QirrP
There are more interesting results. Let's call a logically valid statement a
tautology and a logically false statement a contradiction.
Then we can say:
T17. Any proposition is irrelevant to any
tautology and to any contradiction.
Note that this entails that tautologies are also mutually irrelevant. To prove
T17, first suppose that P is a tautology. By A2
pr(P)=1. Since tautologies are logically entailed by any proposition,
Q |= P, and so pr(Q&~P)=0 by T12. Consequently, it
follows pr(Q)=pr(Q&P) by T5, and so
pr(P).pr(Q)=1.pr(Q&P)= pr(P&Q) and we have irrelevance. Next, suppose
(P) is a contradiction. If so, ~(P) is a tautology, and so pr(P)=0 by
T1. By T7 pr(P&Q) <= pr(P) and
as by A1 all probabilities are >= 0, it follows
pr(P&Q)=0. But then pr(P)pr(Q)=0.pr(Q)= 0=pr(P&Q), and again we have irrelevance.
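Def 2, T16 and T17 can be checked in a small hypothetical model (an assumption: propositions as sets of truth assignments to three atoms p, q, r, with arbitrary weights summing to 1); the name `irr` is illustrative. With these particular weights the atoms p and q happen not to be irrelevant to each other, which illustrates that irrelevance depends on the probabilities and not merely on logical form.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model (an assumption for illustration).
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
WEIGHT = {w: Fraction(i + 1, 36) for i, w in enumerate(WORLDS)}  # sums to 1
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def irr(a, b):
    """Def 2: P irr Q iff pr(P&Q) = pr(P)pr(Q)."""
    return pr(a & b) == pr(a) * pr(b)

CONTRADICTION = frozenset()  # true in no world; ALL is the tautology
t16 = all(irr(a, b) == irr(b, a) for a in PROPS for b in PROPS)
t17 = all(irr(a, ALL) and irr(a, CONTRADICTION) for a in PROPS)

p, q, r = (frozenset(w for w in WORLDS if w[i]) for i in range(3))
print(t16, t17, irr(p, q))  # True True False
```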
Def 2 is often stated in two other forms, which are
both slightly less general, as they require respectively that pr(P)>0
or that pr(P)>0 and pr(~P)>0, in both cases to prevent division by 0.
Both alternative definitions depend on Def 1, and
the first is given by
T18. If pr(P)>0, then PirrQ iff pr(Q/P)=pr(Q).
This is an
immediate consequence of Defs 1 and 2. It states clearly the important
property that irrelevance signifies: If P is irrelevant to Q, the fact
that P is true does not alter anything about the probability that Q is
true - and conversely, by T16, supposing that Q is
not also a contradiction. So irrelevance of one proposition to another
is always mutual, and means that the truth of the one makes no
difference to the probability of the truth of the other.
Def 2 can again be stated in yet another form, with once again a slightly
strengthened premise, for now it is required that both pr(P) and
pr(~P) are > 0:
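T19. If 0 < pr(P) < 1, then PirrQ iff pr(Q/P)=pr(Q/~P).
That is, P and Q are irrelevant iff Q is just as probable if P is true as
if P is false, provided the hypothesis, which may be taken as meaning that
P is an empirical proposition, is true. T19 may now be proved by noting the following: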
pr(Q/P) = pr(Q/~P)
iff pr(Q&P):pr(P) = pr(Q&~P):(1-pr(P))
iff pr(Q&P) - pr(P)pr(Q&P) = pr(P)pr(Q&~P)
iff pr(Q&P) = pr(P)(pr(Q&P)+pr(Q&~P))
iff pr(Q&P) = pr(P)pr(Q).
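To see T18 and T19 at work one needs pairs that are irrelevant in a non-trivial way. The sketch below assumes a hypothetical model in which propositions are sets of truth assignments to three atoms p, q, r, and the atoms are true with probabilities 1/2, 1/3 and 1/4, independently (an assumption made only for the illustration). Both alternative forms are compared with Def 2 wherever they are defined.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model with product weights (an assumption for
# illustration): p, q, r are true with probabilities 1/2, 1/3, 1/4.
P_TRUE = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

def weight(world):
    """Multiply pr(atom) or 1 - pr(atom), according to the atom's truth value."""
    out = Fraction(1)
    for truth, prob in zip(world, P_TRUE):
        out *= prob if truth else 1 - prob
    return out

WEIGHT = {w: weight(w) for w in WORLDS}  # sums to 1

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def neg(a):
    return ALL - a

def cond(q, p):
    """Def 1: pr(Q/P) = pr(P&Q):pr(P); None when pr(P) = 0."""
    if pr(p) == 0:
        return None
    return pr(p & q) / pr(p)

def irr(a, b):
    """Def 2: P irr Q iff pr(P&Q) = pr(P)pr(Q)."""
    return pr(a & b) == pr(a) * pr(b)

# T18: if pr(P) > 0, then P irr Q iff pr(Q/P) = pr(Q).
t18 = all(irr(a, b) == (cond(b, a) == pr(b))
          for a in PROPS if pr(a) > 0 for b in PROPS)
# T19: if 0 < pr(P) < 1, then P irr Q iff pr(Q/P) = pr(Q/~P).
t19 = all(irr(a, b) == (cond(b, a) == cond(b, neg(a)))
          for a in PROPS if 0 < pr(a) < 1 for b in PROPS)

p, q, r = (frozenset(w for w in WORLDS if w[i]) for i in range(3))
print(t18, t19, irr(p, q), cond(q, p), cond(q, neg(p)))  # True True True 1/3 1/3
```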
Another important property of irrelevance is that if P and Q are irrelevant,
then so are their denials:
T20. PirrQ iff (~P)irrQ iff Pirr(~Q) iff (~P)irr(~Q).
T20 can be proved by noting some series of equivalences that yield
irrelevance. First consider pr(P&~Q), assuming PirrQ. Then pr(P&~Q) =
pr(P)-pr(P&Q) = pr(P)-pr(P)pr(Q) = pr(P)(1-pr(Q))= pr(P)pr(~Q)
(using T1). So
Pirr(~Q) if PirrQ. The converse can be proved by running the argument
in reverse order, and so Pirr Q iff Pirr(~Q). The other equivalences
are proved similarly.
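T20 can be checked for all pairs of propositions in a small hypothetical model (an assumption: propositions as sets of truth assignments to three atoms p, q, r, which are true independently with probabilities 1/2, 1/3 and 1/4, so that non-trivial irrelevance occurs).

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model with product weights (an assumption for illustration).
P_TRUE = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

def weight(world):
    """Multiply pr(atom) or 1 - pr(atom), according to the atom's truth value."""
    out = Fraction(1)
    for truth, prob in zip(world, P_TRUE):
        out *= prob if truth else 1 - prob
    return out

WEIGHT = {w: weight(w) for w in WORLDS}  # sums to 1

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def neg(a):
    return ALL - a

def irr(a, b):
    """Def 2: P irr Q iff pr(P&Q) = pr(P)pr(Q)."""
    return pr(a & b) == pr(a) * pr(b)

# T20: PirrQ iff (~P)irrQ iff Pirr(~Q) iff (~P)irr(~Q), for all P and Q.
t20 = all(irr(a, b) == irr(neg(a), b) == irr(a, neg(b)) == irr(neg(a), neg(b))
          for a in PROPS for b in PROPS)

p, q, r = (frozenset(w for w in WORLDS if w[i]) for i in range(3))
print(t20, irr(neg(p), neg(q)))  # True True
```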
Next, the concept of irrelevance, which so far has been used in an
unconditional form, may be given a conditional form, when we want to
say that P and Q are irrelevant if T is true:
Def 3: PirrQ/T iff pr(Q/T&P) = pr(Q/T)
This says that the probability that Q is true if T is true is just the same as
when T and P are both true - i.e. P's truth makes no difference to Q's
probability, if T is true. It should be noted that Def 3
requires that pr(T&P) > 0 (which makes pr(T) > 0), but that on this
condition T18 shows that Def 3 is just a simple
extension of Def 2. And as with Def 2 there is
T21. PirrQ/T iff QirrP/T.
To prove this, note that by Def 3 PirrQ/T iff pr(Q/T&P)=pr(Q/T) iff
pr(Q&T&P):pr(T&P)=pr(Q&T):pr(T) by Def 1. This is so
iff pr(Q&T&P):pr(Q&T) = pr(T&P):pr(T) iff pr(P/Q&T)=pr(P/T) iff
QirrP/T by Def 3.
Moreover, the conditional irrelevance of Q to P if T holds not only in case P
is true, but also in case P is false. That is:
T22. PirrQ/T iff (~P)irrQ/T.
To prove this, assume PirrQ/T, i.e. pr(Q/T&P) = pr(Q/T). By Def 1 this is
equivalent to pr(Q&T&P):pr(T&P) = pr(Q&T):pr(T) iff pr(Q&T&P) =
pr(T&P)pr(Q&T):pr(T).
Now pr(Q&T&P) = pr(Q&T)-pr(Q&T&~P) by T5, and so we obtain the equivalent
pr(Q&T&~P) = pr(Q&T)-(pr(T&P)pr(Q&T):pr(T)) =
pr(Q&T)(1-(pr(T&P):pr(T))) = pr(Q&T)((pr(T)-pr(T&P)):pr(T)) =
pr(Q&T)(pr(T&~P):pr(T)), from which we finally obtain as equivalent to
PirrQ/T: pr(Q&T&~P):pr(T&~P) = pr(Q&T):pr(T), which is by
Def 3 the same as (~P)irrQ/T. Qed.
T21 and T22 yield the same result for conditional irrelevance as T16 and
T20 yield for unconditional irrelevance:
T23. PirrQ/T iff QirrP/T
     PirrQ/T iff (~P)irrQ/T
     PirrQ/T iff Pirr(~Q)/T
     PirrQ/T iff (~P)irr(~Q)/T
The proof is: The first line is T21, the second
T22. The third results thus: By T21 and T22, PirrQ/T iff QirrP/T iff
(~Q)irrP/T iff Pirr(~Q)/T. The fourth results from the second by
substituting (~Q) for Q and combining the result with the third
line. Qed.
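Def 3 and T21 to T23 can be checked in a small hypothetical model (an assumption: propositions as sets of truth assignments to three atoms p, q, r, true independently with probabilities 1/2, 1/3 and 1/4). The sketch returns None where Def 3 is undefined (pr(T&P)=0) and, for a sample of triples, checks that every defined version of the conditional irrelevance claims related by T21 to T23 gives the same verdict. The sampling step and all names are illustrative assumptions.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations, product

# Hypothetical finite model with product weights (an assumption for illustration).
P_TRUE = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))
WORLDS = sorted(product((True, False), repeat=3))
ALL = frozenset(WORLDS)
PROPS = [frozenset(c) for k in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, k)]

def weight(world):
    """Multiply pr(atom) or 1 - pr(atom), according to the atom's truth value."""
    out = Fraction(1)
    for truth, prob in zip(world, P_TRUE):
        out *= prob if truth else 1 - prob
    return out

WEIGHT = {w: weight(w) for w in WORLDS}  # sums to 1

@lru_cache(maxsize=None)
def pr(a):
    return sum((WEIGHT[w] for w in a), Fraction(0))

def neg(a):
    return ALL - a

@lru_cache(maxsize=None)
def cond(q, p):
    """Def 1: pr(Q/P) = pr(P&Q):pr(P); None when pr(P) = 0."""
    if pr(p) == 0:
        return None
    return pr(p & q) / pr(p)

def cirr(a, b, t):
    """Def 3: P irr Q / T iff pr(Q/T&P) = pr(Q/T); None when pr(T&P) = 0."""
    if pr(t & a) == 0:
        return None
    return cond(b, t & a) == cond(b, t)

def forms(a, b, t):
    """The claims related by T21 to T23 (None where undefined)."""
    return [cirr(a, b, t), cirr(b, a, t), cirr(neg(a), b, t),
            cirr(a, neg(b), t), cirr(neg(a), neg(b), t)]

def t21_to_t23_hold(sample):
    """True if, for every sampled triple, all defined forms agree."""
    for t in sample:
        for a in sample:
            for b in sample:
                verdicts = {v for v in forms(a, b, t) if v is not None}
                if len(verdicts) > 1:
                    return False
    return True

SAMPLE = PROPS[::9]  # every ninth proposition, to keep the loop small
p, q, r = (frozenset(w for w in WORLDS if w[i]) for i in range(3))
print(t21_to_t23_hold(SAMPLE), forms(p, q, r))  # True [True, True, True, True, True]
```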
So much for the mathematics of probability, for the moment.
It is
interesting to note that while the ancient Greeks and Romans already
gambled, the mathematical theory of probability only started in the
17th Century, when
Fermat laid some of its foundations,
mainly related to games of chance or combinatorics, followed by
Jacob Bernoulli's Ars Conjectandi, published a little after 1700, and
Laplace's Théorie analytique des probabilités, published a little after 1800.
Even so, a sound and widely accepted set of axioms for probability
theory, that is, a set of propositions that, if true, logically imply
many other propositions of the
theory, only came about in the 1930s.
The "three interpretations" I have in mind are (1) ontological:
probabilities are somehow real and relate to real facts, such as the
chance that this bit of radioactive material emits a particle the next
minute or the
chance that 90% of the gas in a container is in the container's left
part or the chance that one's teeth are bad given one has a low
income; (2) epistemological: probabilities are somehow related to
knowledge one has about something, that may be far from complete or
certain, and thus gets expressed as a fraction of certainty; (3)
subjective: probabilities are related to persons' willingness to bet
on events, and are not so much related to what the real facts are or
may be, nor to the knowledge there is about the real facts or that the person has, but to
the odds a given person is willing to accept that a certain event will
be found to be true.
Here I have been sketching,
and within each interpretation there are sub-interpretations, while
many entertain several interpretations, e.g. real probabilities for
quantum mechanics and insurance; epistemic probabilities for historical
events and testimonies of witnesses; and subjective probabilities when
one is betting on the horses.
The interesting fact is
that the mathematics of probability is generally supposed to be the
same whatever the interpretation, the underlying reason being that
probabilities are in any case much like
proportions. Also, one can see the three interpretations combined,
when considering e.g. the probability that a given bit of radioactive
material will emit a particle the next minute: Presumably, there is a
real chance, about which there is some but not full scientific
knowledge, that some person may know some fraction of, while the
person may be willing to accept certain odds on the outcome for
subjective reasons of his own.
As I
mentioned in an earlier note,
Kolmogorov's axiomatization, given in the text,
dates from the 1930s. It was rapidly and widely accepted by
mathematicians, physicists and statisticians, because it is simple,
elegant, true for the intended interpretation - which is to say, in
other words, that what one takes probabilities to be intuitively (something
like proportions, with an interpretation as in the note on the three
interpretations) satisfies Kolmogorov's axioms - and powerful in that it
logically implies many theorems about probability that should be true of
probability in any intuitive sense (and that often had been proved
before, from more specific assumptions). See also the next note.
Kolmogorov also provided an axiom for infinite sums, which is much needed for the
more intricate applications of probabilities that involve the
calculus, since integrals are infinite sums (sums of infinitely many,
perhaps infinitely small, quantities). This will be left out here, but
it should be mentioned that since Kolmogorov proposed it, his
axiomatization, and indeed the purely mathematical
theory of probability, have become part of
measure theory, which is a
yet more general theory of things having some measure, in a fairly
intuitive and mathematically precise sense.
As noted above, the problems relating to probability these days are not so
much mathematical (standard probability theories are all part and
parcel of measure theory, mathematically speaking) but philosophical:
How is this all to be interpreted? What is probability? (But
those much inclined to skepticism should also realize that standard
probability works, as in applied technology, and physics, and also for
insurance and many kinds of statistics.)
Notational note: "e" is read as "is" or "is an element of (the set
of)"; "~" is read as "not" or "it is not true that"; "&" is read as
"and" or "both"; "V" is read as "or" in the inclusive sense, i.e. when
the one or the other or both are true; and "$" as said is an arbitrary
set of propositions (e.g. about a given subject) that is closed under
the just mentioned logical operators (which accordingly means that the
propositions that one can form with "~", "&" and "V" from propositions
in the set are also in the set, as indeed seems intuitively desirable).
Finally, "iff" is "if and
only if" (and [(P iff Q) iff ((P&Q) V (~P&~Q))]).
A function is a
rule that assigns precisely one value to some expression. Many
ordinary terms, such as "your age", "your height", "your gender" are
functional in this sense, also depending on time, as one has - at any
given moment - just one age, height and gender. The rule refers to the
procedure by which the value for the expression is to be found.
Sums, products, quotients
etc. are also functions and one learns the rules for these at school.
The real numbers are
numbers which may have infinitely long fractional parts that do not
recur, and are needed as soon as one wants to calculate roots of
arbitrary numbers, such as the square root of 2, which is a real
number in the sense defined.
The reader should note that the three axioms are intuitively true for
proportions, and that "logically valid" is as in standard bivalent
propositional logic, where it means "is true in each and any possible
circumstance", such as "it is true it rains or it is not true it
rains", which is so regardless of the weather. Finally, the
hypothesis in A3 means that P and Q are not both true.
With these provisos, it will - probably (!) - be clear to the reader
that the three axioms are true of proportions, as when "P" and "Q" are
interpreted as areas or regions, possibly overlapping, as in Venn diagrams.
The reason it is said "the
mathematics is the same" for propositions and sets in this case is
that sets generally are taken to be given by statements that are
equivalent to statements involving "&", "V" and "~". Thus, the
complement of a set consists of all elements that are not - "~" - in
the set; the intersection of two sets is the set that consists of all
elements that are in both - "&" - sets, and so on.
See the note on the interpretations of probability.
Another reason to prefer propositions and propositional logic is that
it is simpler than alternatives like set theory.
This is one reason why a mathematical or logical theory and proofs
from axioms or assumptions are so important for coming to know things:
Everything can be shifted back to one's assumptions, for in a good
mathematical or logical
theory everything else follows logically from these. (So one also
knows that if a provable consequence is not true in fact, at least one
of the assumptions used to deduce it cannot be true in fact, for a valid deduction cannot lead from true assumptions to a false conclusion.)
And "the rest" is here mostly seeing that as
pr(PV~P)=1 it follows that pr(P)+pr(~P)=1 (by
substituting into pr(PV~P)=pr(P)+pr(~P)), from which it follows,
subtracting pr(P) from both sides, that pr(~P)=1-pr(P), which is the theorem.
Notational note: "LS" is "left side". (Mathematicians also use LHS:
left hand side).
Here "logical equivalent" is meant: "by standard propositional logic".
(Generally, the logical equivalences are also intuitively equivalent,
but it is nice to have a procedure to appeal to when it is not
intuitively obvious or needs proof.)
Here it should be added for clarity that the usual arithmetical rule
for fractions is followed: If in a fraction n/d it is the case that
d=0 the fraction is undefined and does not exist. Other rules might be
adopted, conceivably, but then there is the problem that
the usual arithmetical rule is as
Also, for conditional
probabilities one may intuitively appeal to areas or regions,
as in the note on the axioms, and say that in these terms pr(Q/P)
means that within the area corresponding to P, the proportion of Q is the
conditional probability of Q if P.
Here it should be added for
clarity that what is claimed holds in the present context, but that
there are axiomatizations of probability - Renyi's, for example -
which are based on conditional probabilities. Some prefer such
axiomatizations for philosophical reasons, which generally amount to
the - true - claim that most statements of probability are somehow
conditional rather than absolute.
"Def" and "df" are short for "definition", and the claim in the text
is conditional: If one defines entailment in probability theory
in terms of conditional probability, it follows that the ex falso rule
- "from a false assumption anything follows" - no longer holds under
that definition.
The concept of irrelevance (aka independence) is of very great cognitive and
theoretical importance, for various reasons such as the following:
A. In any
empirical test, for any theory whatsoever, we must make some
assumption to the effect that almost all circumstances that also
are the case, besides the prediction and theory we are testing, of
which there generally is almost a whole universe full of other facts,
are irrelevant for the outcome.
We must make such an
assumption, because otherwise the test, whatever the outcome, is not
conclusive in any way, since then, apart from the assumption (which
generally is supported by methodological designs, randomizations
etc.), any of the facts that obtain while the test is done may be relevant
to the outcome of the test - and while we can exclude possibly
relevant conditions we know of by methodological design, we cannot
methodologically design away all of the also occurring surrounding
facts, and thus we must make the assumption. (This is widely missed,
and I lectured on it - and related
logical matters - some 21 years ago.)
B. In anything
we do - also apart from testing theories - we generally assume
that very much that also is happening at the time, close by or far
away, is practically speaking quite irrelevant for what we do.
Indeed, we may be mistaken,
but this is what we do do - disregard as irrelevant most of the things we
don't already know to be relevant - and this seems to be
necessary for wanting to do most things one does at all, for there is little point
in trying to do things if one assumes the outcome may depend as much
on anything whatsoever, apart from one's efforts and knowledge, as on
oneself.
C. It is very hard
to see how
a human intellect may come to know and understand much of
the universe it finds itself in, if very many processes in that
universe are not, once certain conditions are met, independent
of very many other ongoing or preceding processes: Copper conducts
electricity, regardless of the political situation; water remains H2O
also if the earth gets blown up, and so on.
D. See my
of Reasoning in Philosophy for some brief fairly systematic
explanations, including probability theory and
So much for the short exposition of basic
probability theory, which was originally written in the early eighties,
essentially because I hadn't found anything like it - an elementary,
simple, fast, mathematically correct explanation, with proofs of many
fundamental basic theorems - that suited my own purposes. The 17 notes
were written today.
Finally, another reason - apart from its
fundamental importance for human reasoning - to compile and write this
Nederlog is that I love mathematics, and the above is a good basic piece
of mathematics, and besides I try to find relief from the miseries of
everyday life by reconnoitering the beauties of real science and of
mathematics - see:
Real science & real psychology = joy.