Introduction
In this page I shall present a version of standard classical probability theory that I shall call CPT. Just as with CPL in chapter 2, it is based on standard elementary algebra and English, and it yields all the standard theorems of standard probability theory on a slightly different new basis.
Here is an outline and preview with links:
Sections
1. A new set of axioms for probability
2. Kolmogorov's standard axioms for probability are derived
3. Over 20 standard theorems of probability theory:
A. Basic unconditional theorems
B. Basic conditional theorems
C. Basic theorems about
irrelevance
4. How probability theory explains learning from experience
Confirmation
Undermining
Competition
Support
5.
The utility of probability theory
You can take the first section for granted and go straight to
the second or third, while the
fourth is the most interesting and involves some fundamental
applications of the basic ideas of probability theory to reasoning and to
learning from experience. Something much like this, and much more, is in G.
Polya's two volumes on Plausible Reasoning.
In particular, in section 4 it shall be shown that there are a number of important and
intuitive principles of confirmation which are always used and which can be
proved with the help of probability theory.
These principles
are of four kinds and may be
summarised as:
Confirmation:

The probability of a theory increases as its
consequences are verified.

Support:

The probability of a theory increases as relevant
circumstances are verified.

Competition:

The probability of a theory increases as its
competing theories are falsified.

Undermining:

The probability of a theory decreases as its
assumptions are falsified.
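Since these four principles are provable in probability theory, the first of them can already be illustrated numerically. Here is a minimal Python sketch, not part of the text, with invented weights: a tiny model in which a theory T implies a consequence E, so that verifying E raises the probability of T.

```python
# Toy model: four "worlds" given by the truth values of T (a theory) and
# E (one of its consequences). T implies E, so the world (T true, E false)
# gets weight 0; all other weights are invented for illustration.
weights = {
    (True,  True):  0.2,   # T and E
    (True,  False): 0.0,   # T without E: excluded, since T implies E
    (False, True):  0.3,
    (False, False): 0.5,
}

def pr(prop):
    """Probability of a proposition: the total weight of the worlds where it holds."""
    return sum(w for (t, e), w in weights.items() if prop(t, e))

pr_T = pr(lambda t, e: t)                                      # prior: 0.2
pr_T_given_E = pr(lambda t, e: t and e) / pr(lambda t, e: e)   # 0.2 : 0.5
print(pr_T, pr_T_given_E)  # verifying the consequence E raises pr(T) from 0.2 to 0.4
```

The same model also shows Undermining in reverse: conditioning on ~E would drive pr(T) to 0, since T has no weight in worlds where E fails.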

These principles
are then
proved on the basis of what was established in the earlier sections.
All reasoning and mathematics in what follows is elementary,
but some knowledge of propositional logic is presupposed, if not strictly
necessary, since most formulas are (initially) given English readings.
1.
New axioms for probability theory
What I shall provide is a set of three
new axioms that imply the standard axioms for probability of Kolmogorov.
These new axioms make it easier to join probability theory to propositional
logic than Kolmogorov's axioms, and they are more elementary and simpler than
his in several respects, as shall be shown.
Here are the axioms, where all that is assumed about
"pr(A)" is that it is equal to some number and is read as "the
probability of A". This means that one must add syntactical rules to that
effect:
Notation:

"pr(A)" is "the probability of A"

"|-T A" is "A is a theorem of theory T"

CPT syntax:

As for
CPL plus:

CPTpr(): If
"A" is a proposition of CPL and "a" any number
between 0 and 1 inclusive,
"pr(A)=a"
is a proposition of CPT.

CPT|-T: If "A" is a proposition of CPL and "T" a name
for a set of statements of CPL,
"|-T A" is a proposition of CPT.

This means that CPT is syntactically an extension of CPL:
"pr(A)" refines [A] in that (as we shall prove) 0 <= pr(A) <=
1.
The notation "|-T" is
introduced to facilitate the link to Kolmogorov's statement and to have a
convenient abbreviation for "A is a theorem of theory T". Introducing it is
not necessary, for "[A]=1 holds in theory T" and "pr(A)=1 holds in theory T"
are taken to mean the same. Also it is noteworthy that it does not follow that
one can iterate either "|-T", as in "|-T(|-T A)", or "pr()",
as in "pr(pr(A)=a)=b".
Now the semantical axioms for CPT are:

If A and B are any
propositions in CPL:

Alternatively expressed:

AxA.

(|- A) > pr(A)=1

[A]=1 > pr(A)=1

AxB.

(|- (A > B)) > pr(A)<=pr(B)

[A>B]=1 > pr(A) <= pr(B)

AxC.

pr(A)=pr(A&B)+pr(A&~B)

pr(A)=pr(A&B)+pr(A&~B)

Here "|- A" formalizes the
notion that "A is a theorem in the presumed theory", where "a
theory" is "a set of assumptions added to the axioms of logic"
and the "theorems of the theory" are all statements that can be
deduced from the theory by the inference rules of a presumed logic, such as CPL.
Note that in what follows the reference
to a theory T is abstracted from (though in any application this will be what
one will want to find logical consequences from), and that therefore, while
"[A]=1 iff |- A" is useful, the notation "|-T A" makes
reference to a theory T that "[A]=1" doesn't (though it could be easily
added).
Also, it is noteworthy that mere factual
truth of A is not sufficient to make the hypotheses of AxA and AxB true:
Indeed, what one normally wants is an assurance (and so a proof) that a given
theory T does logically imply or fail to imply a certain proposition P,
after which one has an external check on theory T, by finding whether the
proposition P is in fact true or false.
I abstract from reference to theories to
simplify and eliminate clutter, but it is useful to state a version with such
references and provide readings, i.a. because this shows how neatly the
axioms tie PT to PL in the present formulations:

If A and B are any
propositions in CPL:

Alternatively expressed:

AxA.

(|-T A) > pr(A|T)=1

[A|T]=1 > pr(A|T)=1

AxB.

(|-T (A > B)) > pr(A|T)<=pr(B|T)

[A>B|T]=1 > pr(A|T) <= pr(B|T)

AxC.

pr(A|T)=pr(A&B|T)+pr(A&~B|T)

pr(A|T)=pr(A&B|T)+pr(A&~B|T)

Here is a reading with the various optional references to a
supposed theory T (a sequence of statements of CPL) left out:

CPT axioms
in words:

AxA:

A is a theorem only
if the probability of A is 1.

AxB:

(A only if B) is a
theorem only if the probability of A is less than or equal to the
probability of B.

AxC:

The probability of A
is the sum of the probabilities of (A and B) and of (A and not-B).

It is from the formal statement of these
axioms, dropping references to T, that we shall now derive Kolmogorov's
axioms, which also do not explicitly refer to a theory that may be used in the
hypotheses of its axioms.
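Before the formal derivation, the three axioms can be sanity-checked on a tiny finite model. The following Python sketch (the weights are an invented example, not part of the text) represents propositions as functions of truth values and pr() as a sum of world-weights:

```python
from itertools import product

# A finite model: a probability distribution over the four truth-value
# combinations of two propositions A and B (the weights are invented).
weights = dict(zip(product([True, False], repeat=2), [0.1, 0.4, 0.2, 0.3]))

def pr(prop):
    """pr(A): the total weight of the truth-value combinations where A holds."""
    return sum(w for (a, b), w in weights.items() if prop(a, b))

# AxA: a theorem such as A V ~A, true in every world, gets probability 1.
assert abs(pr(lambda a, b: a or not a) - 1.0) < 1e-12

# AxB: since (A&B) > A is a theorem, pr(A&B) <= pr(A).
assert pr(lambda a, b: a and b) <= pr(lambda a, b: a)

# AxC: pr(A) = pr(A&B) + pr(A&~B).
assert abs(pr(lambda a, b: a)
           - pr(lambda a, b: a and b) - pr(lambda a, b: a and not b)) < 1e-12

print("AxA, AxB and AxC hold in this model")
```

Any nonnegative weights summing to 1 would do; the point is only that the axioms are satisfied by every such model.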
2. The proof of
the standard Kolmogorov axioms for probability theory:
These standard Kolmogorov axioms for probability are
normally stated in such terms as:

Kolmogorov axioms
for probability theory:


Suppose that $ is a
set of propositions P, Q, R etc. and that this set is closed for negation,
conjunction and disjunction, which is to say that whenever (P e $) and (Q e
$), so are ~P, (P&Q) and (PVQ). Now we introduce pr(.) as a function
that maps the propositions in $ into the real numbers in the following way,
that is, satisfying the following three axioms:



A1.

For all P e $ the
probability of P, written as pr(P), is some nonnegative real number.

A2.

If P is logically
valid, pr(P)=1.

A3.

If ~(P&Q) is
logically valid, pr(PVQ)=pr(P)+pr(Q).

In fact, we don't need the initial statement, since we simply presume CPL,
which does meet the specifications of the initial statement. What we do need
is proofs of A1, A2 and A3. Here they come.
First, there is the fundamental theorem
that permits inferences from logical equivalences to probabilities:
T*1:

(|- (A iff B)) >
pr(A)=pr(B)

Equivalent
propositions have the same probability

(1)

(|- (A iff B)) > (|- (A > B)) > pr(A) <= pr(B)

AxB

(2)

(|- (A iff B)) > (|- (B > A)) > pr(B) <= pr(A)

AxB

(3)

(|- (A iff B)) >
pr(A) <= pr(B) & pr(B) <= pr(A)

(1), (2)

(4)

(|- (A iff B)) >
pr(A) = pr(B)

(3), Algebra

Next, it is proved that contradictions have probability 0:
T*2

pr(A&~A)=0

Contradictory
propositions have zero probability

(1)

pr(A)=pr(A&A)+pr(A&~A)

AxC

(2)

pr(A)=pr(A&A)

T*1 with |- (A iff
(A&A))

(3)

=pr(A)+pr(A&~A)

(1), (2)

(4)

pr(A&~A)=0

(3), Algebra

It is often helpful to have in propositional logic two special constants,
such as Taut (from "tautology") and Contrad (from
"contradiction"). These are defined as: Taut iff AV~A and Contrad
iff A&~A. Taking this for granted:
T*3

0 <= pr(A) <= 1

Probabilities are
between 0 and 1 inclusive

(1)

(|- A) >
pr(A)=1

AxA

(2)

pr(Taut)=1

(1) and |- Taut

(3)

|- (A > Taut)

Logic

(4)

pr(A) <= pr(Taut)

(3), AxB

(5)

pr(A) <= 1

(2), (4)

(6)

pr(Contrad)=0

T*2

(7)

|- (Contrad >
A)

Logic

(8)

pr(Contrad) <=
pr(A)

(7), AxB

(9)

0 <=
pr(A)

(6), (8)

(10)

0 <= pr(A) <=
1

(5), (9)

Next, we need to prove the probabilistic theorem for
denial. We do it in two steps:
T*4

pr(AV~A)=pr(A)+pr(~A)

Probability of
disjunction of exclusives is sum of probability of factors

(1)

pr(AV~A)=pr((AV~A)&A)+pr((AV~A)&~A)

AxC

(2)

pr(A)=pr((AV~A)&A)

T*1, as |- (((AV~A)&A) iff A)

(3)

pr(~A) =
pr((AV~A)&~A)

T*1, as |- (((AV~A)&~A) iff ~A)
(4)

pr(AV~A) = pr(A)+pr(~A)

(1),(2),(3)

T*5

pr(~A)=1-pr(A)

Probability of denial
is complementary probability

(1)

pr(AV~A)=pr(A)+pr(~A)

T*4

(2)

1=pr(A)+pr(~A)

AxA, since |- (AV~A)

(3)

pr(~A) =
1-pr(A)

(2), Algebra

Next, we have this parallel to AxA:
T*6

(|- ~A) > pr(A)=0

Provable nontruths
have zero probability

(1)

|- ~A

Assumption

(2)

pr(~A)=1

(1), AxA

(3)

1-pr(A)=1

(2), T*5

(4)

pr(A)=0

(3), Algebra

The main point of T*6 and AxA is that if
one can prove that A (or ~A), then thereby it follows that pr(A)=1 (or
pr(A)=0 if ~A is proved). This is normally important in comparing the supposed
truths and nontruths one can logically infer from a theory with what the facts
are (so that if one can prove that |-T A, while in fact one finds ~A,
one thereby has learned that the assumptions of theory T can't all be true, if
the proof of |-T A was without mistakes in reasoning). Incidentally,
this shows one should not define "|-T A" as "Nec
A", with "Nec" the modality of necessary truth: This amounts
to the presumption that T is true.
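The complement laws just proved can likewise be checked on a finite model. A small Python sketch, with weights invented for illustration:

```python
from itertools import product

# A model over the truth values of A and B; the four weights are invented.
weights = dict(zip(product([True, False], repeat=2), [0.25, 0.10, 0.45, 0.20]))

def pr(prop):
    """Total weight of the truth-value combinations where the proposition holds."""
    return sum(w for (a, b), w in weights.items() if prop(a, b))

A    = lambda a, b: a
notA = lambda a, b: not a

# T*4: pr(A V ~A) = pr(A) + pr(~A), since A and ~A exclude each other.
assert abs(pr(lambda a, b: a or not a) - (pr(A) + pr(notA))) < 1e-12
# T*5: pr(~A) = 1 - pr(A)
assert abs(pr(notA) - (1 - pr(A))) < 1e-12
# T*6: the provable falsehood A & ~A holds in no world, so its probability is 0.
assert pr(lambda a, b: a and not a) == 0
print("T*4, T*5 and T*6 confirmed in this model")
```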
Next, we
need a theorem that serves as a lemma to the next theorem, but that needs a
remark itself. The theorem is:
T*7

pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1

Full disjunctive
probabilistic sum of two factors

(1)

pr(A)+pr(~A)=1

T*5

(2)

pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1

(1), AxC

The promised remark is that T*7 differs essentially from
the similar theorem in CPL minus the probabilities: In CPL
([A&B]+[A&~B]+[~A&B]+[~A&~B])=1 is true and implies that
precisely one of the four factors is true. In PT
pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1 is true, but normally
none of the four alternatives is by itself provably true, and normally
several or all of the
alternatives will have a probability between 0 and 1 (in conformity with T*3).
Indeed, a very interesting aspect of PT
is that it assigns numerical measures to all alternatives the underlying
logic can distinguish, regardless of whether these alternatives are true or
have ever been true. And part of the interest is that there normally are far
more logically possible alternatives than logically provable alternatives.
To finish
the proof that CPT indeed implies all of Kolmogorov's axioms for PT, we need to
derive his A3:
T*8

(|- ~(A&B)) >
pr(AVB)=pr(A)+pr(B)

Conditional sums

(1)

|- ~(A&B)

Assumption

(2)

pr(A&B)=0

(1), T*6

(3)

pr(A)=pr(A&~B)

(2), AxC

(4)

pr(B)=pr(~A&B)

(2), AxC, T*1

(5)

pr(AVB)=1-pr(~A&~B)

T*5, T*1 with
|- ((AVB) iff ~(~A&~B))

(6)

=pr(A&B)+pr(A&~B)+pr(~A&B)

T*7

(7)

=pr(A&~B)+pr(~A&B)

(2),(6)

(8)

=pr(A)+pr(B)

(3),(4),(7)

I have now proved all of Kolmogorov's axioms for the finite case: A1 follows
from T*3; A2 is AxA; and A3 is T*8.
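The derivation can be double-checked numerically. In the following Python sketch (weights invented for illustration) the exclusive propositions for A3 are taken to be A&B and A&~B, whose conjunction is provably false:

```python
from itertools import product

# A finite model over the truth values of P and Q; the weights are invented.
weights = dict(zip(product([True, False], repeat=2), [0.15, 0.35, 0.30, 0.20]))

def pr(prop):
    """Total weight of the truth-value combinations where the proposition holds."""
    return sum(w for (p, q), w in weights.items() if prop(p, q))

# A1: probabilities are nonnegative real numbers.
assert all(pr(f) >= 0 for f in (lambda p, q: p,
                                lambda p, q: q,
                                lambda p, q: p and q))

# A2: a logically valid proposition, e.g. P V ~P, has probability 1.
assert abs(pr(lambda p, q: p or not p) - 1.0) < 1e-12

# A3: ~((P&Q) & (P&~Q)) is valid, so pr((P&Q) V (P&~Q)) = pr(P&Q) + pr(P&~Q).
lhs = pr(lambda p, q: (p and q) or (p and not q))
rhs = pr(lambda p, q: p and q) + pr(lambda p, q: p and not q)
assert abs(lhs - rhs) < 1e-12

print("A1, A2 and A3 hold in this model")
```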
3.
Some fundamental theorems of CPT
Irrespective of the axiomatization or interpretation of
probability, there are a number of important theorems which we shall need,
just as we need laws like (a+b)=(b+a) for counting, irrespective of the axioms
used to prove them or of what we choose to count. The advantage and use of
axioms is that one can use them to prove the theorems one needs, and having
given a valid proof one knows that any objection against the theorem must be
directed against the axioms, for the theorem was proved to follow from them.
So what we shall do first is to derive some useful theorems.
A. Basic unconditional theorems
First, then, there is a group of theorems
that the reader may derive from Kolmogorov's axioms (from which they do
follow) and that I derived above from my axioms:
T1

pr(~P)=1-pr(P)

T*5

T2

0 <= pr(P) <= 1

T*3

T3

If P |- Q, then pr(P)
<= pr(Q)

AxB

T4

If P is logically
equivalent to Q, then pr(P)=pr(Q)

T*1

T5

pr(P)=pr(P&Q) +
pr(P&~Q)

AxC

T6

pr(PVQ)=pr(P)+pr(Q)-pr(P&Q)

T*7, T5

These were all proved in section (4.1). We only
add
T7

pr(P&Q) <= pr(P)
<= pr(PVQ)


that is: The probability of a conjunction is not larger than the probability
of any of its conjuncts, and the probability of a disjunction is not smaller
than the probability of any of its disjuncts. It follows from T5 and T6 or
from AxB and logic.
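T6 and T7 admit a quick numerical check in the same style as before; a Python sketch with invented weights:

```python
from itertools import product

# A finite model over the truth values of P and Q; the weights are invented.
weights = dict(zip(product([True, False], repeat=2), [0.22, 0.18, 0.33, 0.27]))

def pr(prop):
    """Total weight of the truth-value combinations where the proposition holds."""
    return sum(w for (p, q), w in weights.items() if prop(p, q))

P     = lambda p, q: p
Q     = lambda p, q: q
PandQ = lambda p, q: p and q
PorQ  = lambda p, q: p or q

# T6: pr(PVQ) = pr(P) + pr(Q) - pr(P&Q)
assert abs(pr(PorQ) - (pr(P) + pr(Q) - pr(PandQ))) < 1e-12
# T7: pr(P&Q) <= pr(P) <= pr(PVQ)
assert pr(PandQ) <= pr(P) <= pr(PorQ)
print("T6 and T7 hold in this model")
```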
In what follows I'll state and prove the most important
theorems of elementary finite probability theory, firstly because I have
never seen this done properly in one paper, secondly because it seems to me
one of the cornerstones of human reasoning, and thirdly to be able to show
how we can learn from experience using probability theory. (The last subject
starts in section 4.6. It deserves to be better known than it is, for it
could help to defuse, refute or ridicule much improbable nonsense that people
believe in.)
In what follows proofs when referring to
axioms refer to Kolmogorov's. Readers thoroughly familiar with elementary
probability theory may choose to skip the rest of this chapter, but are
advised to read the last sections, 4.11 and 4.12.
B.
Basic conditional theorems
Most probabilities are not,
as they were in this chapter so far, absolute, but are conditional: Rather
than saying "the probability of Q = x" we usually introduce a
condition and say, "the probability of Q, if P is true, = y". This
idea, that of the probability of a proposition Q given that one or more
propositions P1, P2 etc. are true is formalised by the following important
definition:
Definition 1

pr(Q|P) =
pr(P&Q):pr(P)


That is: The conditional probability of
Q, given or assumed that P is true, equals the probability that (P&Q) is
true, divided by the probability that (P) is true. NB, as this fact has
important implications for the interpretation and application of probability
theory: A conditional probability is defined in terms of absolute
probabilities, and therefore we need absolute probabilities to establish
conditional ones.
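Def 1 can be put to work directly on a finite model. A Python sketch (the weights are invented; pr_given raises an error when pr(P)=0, where the conditional probability is undefined):

```python
from itertools import product

# A finite model over the truth values of P and Q; the weights are invented.
weights = dict(zip(product([True, False], repeat=2), [0.12, 0.28, 0.36, 0.24]))

def pr(prop):
    """Total weight of the truth-value combinations where the proposition holds."""
    return sum(w for (p, q), w in weights.items() if prop(p, q))

def pr_given(q_prop, p_prop):
    """Def 1: pr(Q|P) = pr(P&Q) : pr(P), defined only when pr(P) > 0."""
    denom = pr(p_prop)
    if denom == 0:
        raise ZeroDivisionError("pr(P) = 0: the conditional probability is undefined")
    return pr(lambda p, q: p_prop(p, q) and q_prop(p, q)) / denom

# pr(Q|P) = pr(P&Q) : pr(P) = 0.12 : 0.40 in this model
print(pr_given(lambda p, q: q, lambda p, q: p))
```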
Definition 1 has many applications, and
many of these turn on the fact that it also provides an implicit definition
of pr(P&Q), namely as pr(P)pr(Q|P) (simply by multiplying both sides of
Def 1 by pr(P)). Consequently, we have as a theorem (if pr(P)>0 and
pr(Q)>0)
T8

pr(P&Q)=pr(P)pr(Q|P)=pr(Q)pr(P|Q)


The second equality is, of course, also an application of
Def 1, and T8 accordingly says that the probability of a conjunction equals
the probability of one conjunct times the probability of the other given that
the one is true. Another consequence of Def 1 is

T9

pr(Q|P)+pr(~Q|P)=1

which
results from T5 and Def 1 upon division by pr(P), and says that the
probability of Q if P plus the probability of ~Q if P equals 1. Of course,
this admits of a statement like T1:

T10

pr(~Q|P)=1-pr(Q|P)

which shows that conditional probabilities are like
unconditional ones. A theorem to the same effect, that parallels T2, is

T11

0 <= pr(Q|P) <= 1

That 0 <= pr(Q|P) follows from Def 1,
because the components of a conditional probability are both >= 0 by A1; and that
pr(Q|P)<=1 is equivalent to pr(P&Q) <= pr(P), which holds by T7. A
theorem in the vein of T4 is
T12

If P |- Q, then
pr(P&~Q)=0


This is
proved by noting that if P |- Q holds, then so does |- ~(P&~Q), which, by
A3, entails that pr(PV~Q)=pr(P)+pr(~Q). As by T6
pr(PV~Q)=pr(P)+pr(~Q)-pr(P&~Q), it follows that pr(P&~Q)=0 if P |- Q. From
this it easily follows that
T13

If P |- Q, then
pr(Q|P)=1, provided pr(P)>0


which is to say that if Q is a logical consequence
of P, the probability of Q is P is true is 1. The proviso is interesting, for
it denies the possibility of inferring Q from a logical contradiction or
known falsehood. This means that the def: P  Q =df pr(QP)=1 strengthens
the logical "" by adding that proviso. T13 immediately follows
from T5, T12 and Def 1.
Def 1 may,
of course, list any finite number of premises, as in pr(Q|P1&....&Pn)
= pr(Q&P1&....&Pn) : pr(P1&....&Pn). Such long
conjunctions admit of a theorem like T8:
T14

pr(P1&.....&Pn)=pr(P1)pr(P2|P1)pr(P3|P1&P2).....pr(Pn|P1&.....&Pn-1)


This says that the probability that n
propositions are true equals the probability that the first (in any
convenient order) is true, times the probability that the second is true if
the first is true, times the probability that the third is true if the first
and the second are true, etc. The pattern of proof can be seen by noting that
for n=3 pr(P1)pr(P2|P1)pr(P3|P1&P2) = pr(P1&P2)pr(P3|P1&P2) =
pr(P3&P2&P1), because the denominators successively drop out by Def 1.
That the premises can be taken in any order is a consequence of T4:
Conjuncts taken in any order are equivalent to the same conjuncts in any
other order.
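T14 for n=3 can be verified on a model with three propositions. A Python sketch with eight invented weights summing to 1:

```python
from itertools import product

# A finite model over the truth values of P1, P2, P3; the weights are invented.
triples = list(product([True, False], repeat=3))
weights = dict(zip(triples, [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.25, 0.10]))

def pr(prop):
    """Total weight of the truth-value triples where the proposition holds."""
    return sum(w for t, w in weights.items() if prop(*t))

P1     = lambda a, b, c: a
P1P2   = lambda a, b, c: a and b
P1P2P3 = lambda a, b, c: a and b and c

# T14: pr(P1&P2&P3) = pr(P1) pr(P2|P1) pr(P3|P1&P2); the denominators cancel.
chain = pr(P1) * (pr(P1P2) / pr(P1)) * (pr(P1P2P3) / pr(P1P2))
assert abs(chain - pr(P1P2P3)) < 1e-12
print("chain rule verified:", chain)
```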
T11 and T13, together with T9 and T10,
show that conditional probabilities are probabilities. We need just one
further theorem:
T15

If R |- ~(P&Q), then
pr(PVQ|R) = pr(P|R)+pr(Q|R)


which parallels A3. It is easily proved by noting that
pr(PVQ|R) = (pr(P&R)+pr(Q&R)-pr(P&Q&R)):pr(R) by Def 1, T4
and T6, and that pr(P&Q&R)=0 by T12 and T4 on the hypothesis. The
conclusion then follows by Def 1.
C.
Basic theorems about irrelevance
A second important concept
which now can be defined is that of irrelevance. Two propositions P and Q are
said to be (probabilistically) irrelevant, abbreviated PirrQ, if the
following is true:
Def 2

PirrQ iff
pr(P&Q)=pr(P)pr(Q)


Evidently,
irrelevance is symmetric:

T16

PirrQ iff QirrP

This follows at once from Def 2, since pr(P&Q)=pr(Q&P) by T4 and
pr(P)pr(Q)=pr(Q)pr(P). But there are more interesting results. Let's call a logically
valid statement a tautology and a logically false statement a contradiction. Then
we can say:
valid statement a tautology and a logically false statement a contradiction. Then
we can say:
T17

Any proposition is
irrelevant to any tautology and to any contradiction.


Note that this entails that tautologies are
also mutually irrelevant. To prove T17, first suppose that P is a tautology. By
A2 pr(P)=1. Since tautologies are logically entailed by any proposition, Q |-
P, and so pr(Q&~P)=0 by T12. Consequently, it follows that pr(Q)=pr(Q&P)
by T5, and so pr(P)pr(Q)=1.pr(Q&P)=pr(P&Q), and we have irrelevance.
Next, suppose (P) is a contradiction. If so, ~(P) is a tautology, and so
pr(P)=0 by T1. By T7 pr(P&Q) <= pr(P), and as by A1 all probabilities
are >= 0, it follows that pr(P&Q)=0. But then pr(P)pr(Q)=0.pr(Q)=
0=pr(P&Q), and again we have irrelevance.
Def 2 is
often stated in two other forms, which are both slightly less general, as
they require respectively that pr(P)>0 or that pr(P)>0 and pr(~P)>0,
in both cases to prevent division by 0. Both alternative definitions depend
on Def 1, and the first is given by
T18

If pr(P)>0, then
PirrQ iff pr(Q|P)=pr(Q)


This is an
immediate consequence of Defs 1 and 2. It states clearly the important
property that irrelevance signifies: If P is irrelevant to Q, the fact that P
is true does not alter anything about the probability that Q is true, and
conversely, by T16, supposing that Q is not also a contradiction. So
irrelevance of one proposition to another is always mutual, and means that
the truth of the one makes no difference to the probability of the truth of
the other.
This can
again be stated in yet another form, with once again a slightly strengthened
premise, for now it is required that both pr(P) and pr(~P) are > 0:
T19

If 0 < pr(P) < 1,
then PirrQ iff pr(Q|P)=pr(Q|~P)


Suppose the hypothesis, which may be
taken as meaning that P is an empirical proposition, is true. T19 may now be
proved by noting the following: pr(Q|P)=pr(Q|~P) iff pr(Q&P):pr(P) =
pr(Q&~P):(1-pr(P)) iff pr(Q&P) - pr(P)pr(Q&P) = pr(P)pr(Q&~P)
iff pr(Q&P) = pr(P)(pr(Q&P)+pr(Q&~P)) iff pr(Q&P) =
pr(P)pr(Q).
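T18 and T19 can be illustrated on a model built to make P and Q irrelevant: each world-weight is the product of a weight for P's truth value and one for Q's (the numbers 0.4 and 0.5 are invented). A Python sketch:

```python
from itertools import product

# Each weight factorises: the pr-part for P (0.4 / 0.6) times the pr-part
# for Q (0.5 / 0.5), which is exactly what Def 2 demands.
weights = {(p, q): (0.4 if p else 0.6) * (0.5 if q else 0.5)
           for p, q in product([True, False], repeat=2)}

def pr(prop):
    """Total weight of the truth-value combinations where the proposition holds."""
    return sum(w for (p, q), w in weights.items() if prop(p, q))

P    = lambda p, q: p
Q    = lambda p, q: q
notP = lambda p, q: not p

# Def 2: pr(P&Q) = pr(P)pr(Q)
assert abs(pr(lambda p, q: p and q) - pr(P) * pr(Q)) < 1e-12
# T18: pr(Q|P) = pr(Q)
assert abs(pr(lambda p, q: p and q) / pr(P) - pr(Q)) < 1e-12
# T19: pr(Q|P) = pr(Q|~P)
assert abs(pr(lambda p, q: p and q) / pr(P)
           - pr(lambda p, q: (not p) and q) / pr(notP)) < 1e-12
print("PirrQ: Def 2, T18 and T19 all agree in this model")
```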
Another
important property of irrelevance is that if P and Q are irrelevant, then so
are their denials:
T20

PirrQ iff (~P)irrQ iff Pirr(~Q)
iff (~P)irr(~Q).


This too can be proved by noting some
series of equivalences that yield irrelevance. First
consider pr(P&~Q), assuming PirrQ. Then pr(P&~Q) = pr(P)-pr(P&Q)
= pr(P)-pr(P)pr(Q) = pr(P)(1-pr(Q)) = pr(P)pr(~Q). So
Pirr(~Q) if PirrQ. The converse can be proved by running the argument in
reverse order, and so PirrQ iff Pirr(~Q). The other equivalences are proved
similarly.
Finally, the
concept of irrelevance, which so far has been used in an unconditional form,
may be given a conditional form, when we want to say that P and Q are
irrelevant if T is true:
Def 3

PirrQ|T iff
pr(Q|T&P) = pr(Q|T)


This says that the probability that Q is true if T is
true is just the same as when T and P are both true, i.e. P's truth makes no
difference to Q's probability, if T is true. It should be noted that Def 3
requires that pr(T&P) > 0 (which makes pr(T) > 0), but that on this
condition T19 shows that Def 3 is just a simple extension of Def 2. And
as with Def 2 there is symmetry:

T21

PirrQ|T iff QirrP|T

For suppose PirrQ|T. By Def 3
pr(Q|T&P)=pr(Q|T) iff pr(Q&T&P):pr(T&P)=pr(Q&T):pr(T) by
Def 1. This is so iff pr(Q&T&P):pr(Q&T) = pr(T&P):pr(T) iff
pr(P|Q&T)=pr(P|T) iff QirrP|T by Def 3.
And this
conditional irrelevance of P to Q given T does not only hold in case P is
true, but also in case P is false. That is:

T22

PirrQ|T iff (~P)irrQ|T

For suppose PirrQ|T, i.e. pr(Q|T&P) =
pr(Q|T). By Def 1 this is equivalent to pr(Q&T&P):pr(T&P) =
pr(Q&T):pr(T) iff pr(Q&T&P) = pr(T&P)pr(Q&T):pr(T). Now
pr(Q&T&P) = pr(Q&T)-pr(Q&T&~P), and so we obtain the
equivalent pr(Q&T&~P) = pr(Q&T) - pr(T&P)pr(Q&T):pr(T) =
pr(Q&T)(1-(pr(T&P):pr(T))) = pr(Q&T)((pr(T)-pr(T&P)) : pr(T))
= pr(Q&T)(pr(T&~P):pr(T)), from which we finally obtain as equivalent
to PirrQ|T that pr(Q&T&~P):pr(T&~P) = pr(Q&T) : pr(T), which is by
Def 3 the same as (~P)irrQ|T. Qed.
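Conditional irrelevance and its symmetry properties can also be checked on a three-proposition model constructed so that the weights factorise within each truth value of T. A Python sketch; all six numbers are invented for illustration:

```python
from itertools import product

# A model with three propositions P, Q, T, built so that P and Q are
# irrelevant given T (Def 3): within each truth value of T the weights
# factorise. The numbers (0.6, 0.5, 0.3, 0.4, 0.2, 0.7) are invented.
def weight(p, q, t):
    w_t, w_p, w_q = (0.6, 0.5, 0.3) if t else (0.4, 0.2, 0.7)
    return w_t * (w_p if p else 1 - w_p) * (w_q if q else 1 - w_q)

worlds = list(product([True, False], repeat=3))

def pr(prop):
    """Total weight of the worlds where the proposition holds."""
    return sum(weight(*s) for s in worlds if prop(*s))

def given(q_prop, c_prop):
    """Def 1: pr(Q|C) = pr(Q&C) : pr(C)."""
    return pr(lambda *s: q_prop(*s) and c_prop(*s)) / pr(c_prop)

P = lambda p, q, t: p
Q = lambda p, q, t: q
T = lambda p, q, t: t
TandP    = lambda p, q, t: t and p
TandQ    = lambda p, q, t: t and q
TandNotP = lambda p, q, t: t and not p

# Def 3: PirrQ|T, i.e. pr(Q|T&P) = pr(Q|T)
assert abs(given(Q, TandP) - given(Q, T)) < 1e-12
# T21 (symmetry): pr(P|T&Q) = pr(P|T)
assert abs(given(P, TandQ) - given(P, T)) < 1e-12
# T22: pr(Q|T&~P) = pr(Q|T)
assert abs(given(Q, TandNotP) - given(Q, T)) < 1e-12
print("Def 3, T21 and T22 confirmed in this model")
```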
And finally
T21 and T22 yield the same result for conditional irrelevance as for
irrelevance:
T23

PirrQ|T iff QirrP|T

(1)

iff (~P)irrQ|T

T22

(2)

iff Pirr(~Q)|T

T21, T22, (1)

(3)

iff (~P)irr(~Q)|T

T22, (2)

The proof is: The equivalence in the
statement of T23 is T21, and line (1) is T22. Line (2) results thus: By T21
and T22, QirrP|T iff (~Q)irrP|T, whence PirrQ|T iff Pirr(~Q)|T by T21 again.
Line (3) results from (2) by T22. Qed.
