And now for something completely
different, namely probability and
confirmation, and learning from experience,
and more specifically a solution of what may
to some seem a small or at least neglected
problem of probabilistic confirmation.
Sections

0. Introduction
1. Basic probabilistic formulas, tables and schemes
2. The problem of probabilistic confirmation
3. Dissolving the problem of probabilistic confirmation
4. Axiomatic approach for probabilities in time
5. Schematic tables for probabilities in time
Endremarks
A note on probability and confirmation

0. Introduction
In elementary probability theory, and indeed also in not so elementary probability theory, probabilities have a lot in common with proportions and themselves tend to be left rather unclarified.
In this note, I will leave the question of what probabilities are likewise unclarified, except by noting that (1) treating them as if they are proportions gets one quite far and accords with such intuitions as most people have about probabilities, and (2) this is the case for any plausible interpretation of probability, varying from personal degree of belief in a statement to a frequency in a population of a suitable kind.
I will state a basic problem
with probabilistic confirmation in any theory
of probability that I have seen and then
propose a solution. To clarify what I mean by
probability, here is a reference:
1. Basic probabilistic formulas, tables and schemes
Probabilistic confirmation
normally is presented in terms of conditional
probabilities and schemes like the following.
First, the standard definition
or axiom for conditional probability is
assumed, that relates conditional probability
to conjunction, namely along the following
lines:
(1) p(P&T)=p(PT).p(T)
Here "p(X)" abbreviates "the probability of X", where "probability" is supposed to be understood and taken in a proportional sense and "X" is a statement; "p(YX)" abbreviates "the probability of Y given that X is true", or some similar statement; and "p(X&Y)" abbreviates "the probability that X and Y are both true".
There are different notations,
and the explanations and motivations that are
given vary, but an equation like (1) results
in all standard treatments of probability,
understood in whatever sense, varying from
frequencies to degrees of belief.
Second, it is generally noted
that tabular schemes like the following are
enabled by (1), that outline the various
probabilities and their relations:

        T       ~T
P       a       c       a+c
~P      b       d       b+d
        a+b     c+d     1

        T               ~T
P       p(PT).p(T)      p(P~T).p(~T)    p(P)
~P      p(~PT).p(T)     p(~P~T).p(~T)   p(~P)
        p(T)            p(~T)
Here the two tabular schemes are
different statements of the same presupposed
equations that depend on (1), which is to say
(2) a = p(P&T)   = p(PT).p(T)
    b = p(~P&T)  = p(~PT).p(T)
    c = p(P&~T)  = p(P~T).p(~T)
    d = p(~P&~T) = p(~P~T).p(~T)
The sums work out as in
probability theory and suggested by the
schema, in which
(3) a + b = p(P&T)+p(~P&T) = p(PT).p(T)+p(~PT).p(T) = p(T)
    a + c = p(P&T)+p(P&~T) = p(PT).p(T)+p(P~T).p(~T) = p(P)
and so on, as the tabular
schemes also state.
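The two schemes can be checked with a few lines of code. A minimal sketch, with the three input probabilities assumed purely for the sake of illustration:

```python
# Build the four cells a, b, c, d of scheme (2) from p(T), p(PT) and p(P~T),
# then check the marginal sums of scheme (3).

def cells(p_T, p_P_given_T, p_P_given_notT):
    a = p_P_given_T * p_T                  # p(P&T)  = p(PT).p(T)
    b = (1 - p_P_given_T) * p_T            # p(~P&T) = p(~PT).p(T)
    c = p_P_given_notT * (1 - p_T)         # p(P&~T) = p(P~T).p(~T)
    d = (1 - p_P_given_notT) * (1 - p_T)   # p(~P&~T)= p(~P~T).p(~T)
    return a, b, c, d

# Illustrative values, assumed for the example:
a, b, c, d = cells(p_T=0.4, p_P_given_T=0.9, p_P_given_notT=0.2)

print(round(a + b, 10))          # p(T), as in (3)
print(round(a + c, 10))          # p(P), as in (3)
print(round(a + b + c + d, 10))  # the whole table sums to 1
```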
2. The problem of probabilistic confirmation
Given the above, the problem of probabilistic confirmation arises as follows, and though the above is standard in all treatments, the problem is generally neither noticed nor treated.
The notion of conditional
probability as rendered in formula (1) is
taken as allowing probabilistic confirmation
along the following lines:
From (1) and the mathematics of
probability theory, that is the same or
equivalent for the various interpretations,
one obtains the converse of (1)
(4) p(TP) = p(PT).p(T) : p(P)
on the grounds that
p(T&P)=p(P&T) and p(P) =
p(PT).p(T)+p(P~T).p(~T) =
p(P&T)+p(P&~T) and then reasons thus,
generally not formally but in natural language
(5) "Hence if P is true, the
probability of T gets to be the probability of
T given P"
where "the probability of T
given P" is rendered as p(TP) in (4).
The problem is that this is not
cogent:
Suppose that P is given, or simply that P is true. Now if either implies p(P)=1, which is what either does on standard interpretations of probability, then

(6) p(TP) = p(PT).p(T) : p(P)   by standard Probability Theory (PrTh)
          = p(P&T) : p(P)       by (1)
          = p(T) : p(P)         since by PrTh p(P&T)=p(T) if p(P)=1
          = p(T)                if p(P)=1
This is what the mathematics of standard Probability Theory logically implies, which is not how it is normally used: in fact, normally p(TP) is taken as p(TP) = p(PT).p(T) : p(P) = p(P&T) : p(P), with all probabilities in the equation, including p(P), less than 1.
And this is the problem. It can be restated by reference to time: if P is found to be true at some time, then p(T) is revalued as p(TP) = p(PT).p(T) : p(P) = p(P&T) : p(P), with the probabilistic values as they were before P was found to be true.
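The problem can also be put numerically. With illustrative (assumed) values, conditionalizing while p(P) < 1 raises p(T); but once p(P) = 1, standard Probability Theory forces p(P&T) = p(T), and the same formula gives back p(T) unchanged:

```python
# p(TP) = p(P&T) : p(P), with p(P&T) = p(PT).p(T) as in (1).

p_T = 0.4
p_P_given_T = 0.9
p_P_given_notT = 0.2

# Before P is known: p(P) < 1, and conditioning raises p(T).
p_P = p_P_given_T * p_T + p_P_given_notT * (1 - p_T)   # = 0.48
print(round((p_P_given_T * p_T) / p_P, 10))            # 0.75: confirmation

# Once P is true: p(P) = 1 and p(P&T) = p(T), so p(TP) = p(T).
print(p_T / 1.0)                                       # 0.4: no confirmation left
```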
3. Dissolving the problem of probabilistic confirmation
More briefly, we can introduce a reference to times t and t+1 and write

(7) P_{t+1} > p(TP)_{t} = p(PT)_{t}.p(T)_{t} : p(P)_{t} = p(P&T)_{t} : p(P)_{t}
This is all as in standard Probability
Theory, except for the references to time,
with t the last time, and t+1 the next time,
say, and with the temporalized form of (1)
used, with p(P)_{t}
< 1, to avoid the problem stated in the
last section.
This fully accords with the
standard working of confirmation in standard
probability theory, except that a formula like
(7) is rarely stated explicitly,
though something like it does get used.
In fact, since the central points
of probabilistic confirmation are that (i)
there is a new probability
of T upon learning (at time t+1) that P is
true and (ii) this new probability for T
upon learning that P is true at t+1 does
satisfy the old conditional
probability p(TP) at t, the following is
what's involved:
(8) P_{t+1} > p(T)_{t+1} = p(TP)_{t} = p(PT)_{t}.p(T)_{t} : p(P)_{t} = p(P&T)_{t} : p(P)_{t}
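Formula (8) amounts to a small update rule: upon learning P at t+1, the new p(T) is the old conditional probability p(TP)_{t}, computed with the old p(P)_{t} < 1. A sketch, with assumed numbers:

```python
# (8): P_{t+1} > p(T)_{t+1} = p(TP)_{t} = p(PT)_{t}.p(T)_{t} : p(P)_{t}

def update_on_P(p_T, p_P_given_T, p_P_given_notT):
    """New probability of T at t+1, given that P turned out to be true."""
    p_P = p_P_given_T * p_T + p_P_given_notT * (1 - p_T)   # old p(P)_t < 1
    return (p_P_given_T * p_T) / p_P

p_T_new = update_on_P(p_T=0.4, p_P_given_T=0.9, p_P_given_notT=0.2)
print(round(p_T_new, 10))   # 0.75: p(T) has risen, since P was likelier under T
```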
In the usual treatments of any standard theory of probability this is what gets used in fact, though it is rarely if ever stated as explicitly, that is, with references to time and the notes that, if P is true, the new probability for T (at t+1) is the old conditional probability of T given P (at t), in which the old probability for P (at t), which is not equal to 1, gets used.
At this point it might be supposed that (8) can be taken as the axiom involved in Bayesian Confirmation, for (8) shows how Bayesian Confirmation is generally worked with, and it must be an axiom (or be derived from some axiom), since it is about how new information alters old probabilities, which is not explicit in the standard axioms of standard probability theory (which in fact do not refer to times at all).
But actually more is required, for the above may in fact result in p(T1)_{t+1} + ... + p(Tn)_{t+1} unequal to 1, supposing that T1 .. Tn are n distinct theories to account for a prediction P, even if we suppose that before learning that P is true, p(T1)_{t} + ... + p(Tn)_{t} = 1.
This may be illustrated by considering the simple case of just p(T1) + p(T2) = 1 initially, with p(T1P) > p(T1) and p(T2P) = p(T2), that is, when T1 is relevant to P and T2 is not. Clearly, p(T1P) + p(T2P) > 1 if p(T1) + p(T2) = 1.
It is not difficult to see how
this may be settled mathematically in general,
namely by the following formula, that
normalizes it all proportionally:
(9) P_{t+1} > p(Ti)_{t+1} = p(TiP)_{t} : (∑j p(TjP)_{t})
            = p(TiP)_{t} : (p(T1P)_{t} + ... + p(TnP)_{t})

If P at t+1, then p(Ti) at t+1 is the conditional probability of (Ti if P) divided by the sum of all conditional probabilities for T1 .. Tn if P, or more briefly: if P_{t+1}, then p(Ti) at t+1 is the normalized probability of (Ti if P) at t.
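A sketch of (9) in code, using the two-theory illustration above (the numbers are assumed): p(T1P) has risen while p(T2P) stayed put, so the conditional probabilities sum to more than 1 and are normalized proportionally:

```python
# (9): P_{t+1} > p(Ti)_{t+1} = p(TiP)_{t} : sum_j p(TjP)_{t}

def normalize(conditionals):
    total = sum(conditionals)        # may exceed 1, as in the T1, T2 example
    return [p / total for p in conditionals]

p_T1_given_P = 0.8   # rose from p(T1) = 0.5: T1 is relevant to P
p_T2_given_P = 0.5   # unchanged from p(T2) = 0.5: T2 is not

new_probs = normalize([p_T1_given_P, p_T2_given_P])
print(new_probs)                   # about [0.615, 0.385], i.e. 8/13 and 5/13
print(round(sum(new_probs), 10))   # 1.0 again
```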
This makes intuitive sense and can be
supplemented by the following assumptions,
that again, like (9), are in accordance with
standard probability theory but are rarely
or never explicitly stated while standardly
used:
(10) p(PTi)_{t} and p(P~Ti)_{t} are given for all t and are the same at all t
The intuitive reason for (10) is that these two conditional probabilities are in fact the hypotheses for P given Ti and given ~Ti, and are what is to be tested experimentally, namely by finding whether at some time P or ~P is true, and then applying (9).
(11) p(Ti)_{t+1} is calculated by (9), which applies to any of P, ~P and PV~P, depending on what is given at t+1 (*), and on the understanding that if one of the former two is given it is used in preference to the third for the new p(Ti)_{t+1}.
(12) P_{t+1} does not imply p(P)_{t+1}=1, which comes instead from p(PTi)_{t} and p(P~Ti)_{t} and p(Ti)_{t+1}.

Formally, that is

(13) p(P)_{t+1} = p(PTi)_{t+1}.p(Ti)_{t+1} + p(P~Ti)_{t+1}.p(~Ti)_{t+1}
which accords again with standard Probability Theory, and involves (10) and (11), which also allow equating p(PTi)_{t+1} and p(PTi)_{t}.
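Formula (13) in code: even after P has been observed, the new p(P) is computed from the hypotheses of (10) and the new p(Ti), and stays below 1. The numbers are again assumed for illustration:

```python
# (13): p(P)_{t+1} = p(PTi)_{t+1}.p(Ti)_{t+1} + p(P~Ti)_{t+1}.p(~Ti)_{t+1},
# where by (10) the conditional probabilities are the same at t and t+1.

def p_P_next(p_P_given_Ti, p_P_given_notTi, p_Ti_next):
    return p_P_given_Ti * p_Ti_next + p_P_given_notTi * (1 - p_Ti_next)

p_Ti_next = 0.75   # the revalued p(Ti)_{t+1} after observing P
print(round(p_P_next(0.9, 0.2, p_Ti_next), 10))   # higher than before, yet still < 1
```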
4. Axiomatic approach for probabilities in time
Having come this far, we may as
well propose a set of axioms, so as to make
clear what is in fact assumed:
A1 0 <= p(Ai)_{t} <= 1
A2 p(~PAi)_{t} = 1 - p(PAi)_{t}
A3 (Ai>P)_{t} > p(Ai)_{t} <= p(P)_{t}
In fact, all of the above accords
with standard probability theory: A1 has it
that probabilities are between 0 and 1
inclusive; A2 says a conditional probability
and its denial on the same assumption sum to
1; and A3 says that if it is a thesis that
Ai implies P then the probability of Ai is
not higher than the probability of P.
All of these are either theorems or
axioms in any formulation of any standard
probability theory, as is the following
definition of probabilistic conjunction:
D1 p(P&Ai)_{t}=p(PAi)_{t}.p(Ai)_{t}
Also, all of this with reference to times t is supposed to hold for each and any time t, provided the time is the same for all statements in a formula.
The above is sufficient for the
usual Kolmogorov axioms for probability
theory, since we have from A2 and D1 the
theorem
T1 p(Ai)_{t} = p(P&Ai)_{t} +
p(~P&Ai)_{t}
again for each and any time t. We also have from A3 for each and any time t that

T2 (X IFF Y)_{t} > p(X)_{t} = p(Y)_{t}
If X and Y are logically
equivalent, they have the same probability, at
the same time (and indeed at any time, for
logical equivalents hold at any time).
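Theorem T1 can be verified numerically from A2 and D1 alone; a small sketch with values for p(Ai) and p(PAi) assumed for the example:

```python
# T1: p(Ai)_t = p(P&Ai)_t + p(~P&Ai)_t, derived from A2 and D1.

p_Ai = 0.3
p_P_given_Ai = 0.65

p_notP_given_Ai = 1 - p_P_given_Ai          # A2: complements sum to 1
p_P_and_Ai = p_P_given_Ai * p_Ai            # D1: probabilistic conjunction
p_notP_and_Ai = p_notP_given_Ai * p_Ai      # D1 again

print(round(p_P_and_Ai + p_notP_and_Ai, 10))   # 0.3, which is p(Ai)
```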
This allows the resurrection of
all of standard elementary probability theory,
but we have seen that more is needed for
probabilistic confirmation if we want to
avoid the above stated problem:
A4 (P)_{t+1} > p(Ti)_{t+1} = p(TiP)_{t} : (∑j p(TjP)_{t})
             = p(TiP)_{t} : (p(T1P)_{t} + ... + p(TnP)_{t})
This is (9) stated above:
If (P)_{t+1}, then p(Ti) at
t+1 is the normalized probability of
(Ti if P) at t. This resulting
normalized probability not only sums to
1 as it should, but also changes all old
probabilities so as to cohere with the
changes in some of them, upon learning that
(P)_{t+1}.
(**)
A5 p(PT)_{t+1}=p(PT)_{t}
And this is (10) stated above: The
hypotheses for P given T and given ~T are what
is to be tested experimentally, namely by
finding whether at some time P or ~P is true,
and this requires that these are the same for
each and any time t.
5. Schematic tables for probabilities in time
Normally, one has several
hypotheses to account for a class of facts,
and one tests these hypotheses by finding
out whether their predictions hold.
A very convenient tabular schema, conforming to the above schema for one T, is also involved. Basically it looks like this, with details left out:

        T0      T1      ..      Ti      ..      Tn
P
~P
        p(T0)   p(T1)   ..      p(Ti)   ..      p(Tn)
Here P is some prediction, T0 is
the background knowledge, and T1 ..
Tn are alternative explanations
for P, that may or may not be relevant to P,
and if relevant may or may not be relevant
to the same extent as the others. (***)
Note this leaves out the denials of
the Ti's, though these can all be recovered
from the following:
A6 For 1 .. i .. n: p(Ti)_{t}, p(PTi)_{t}, p(P~Ti)_{t}
That is, for all of the alternative theories we are given somehow (or dream up, to begin with) the probability of the theory Ti, the probability of P if Ti, and the probability of P if ~Ti. Only the first may vary with time.
As the earlier schema shows 
see (2)  these three probabilities are
sufficient to find all of a, b, c and d by
standard probability theory.
Next, we make an additional assumption that simplifies things rather a lot:

A7 For 1 .. i .. n: p(P~Ti)_{t} = p(PT0)_{t}
That is, rather than breaking one's head over the issue what p(P~Ti) might be for any of T0, T1 .. Tn, we assume that it equals the probability of P given the background knowledge T0. (****)
This simplifies things
considerably, not least because now for n
alternative theories one needs but 2n
probabilities, rather than 3n.
Note that we have by A7 for background knowledge T0 the following:

        T0                       ~T0
P       p(PT0)_{t}.p(T0)_{t}     p(PT0)_{t}.p(~T0)_{t}
~P      p(~PT0)_{t}.p(T0)_{t}    p(~PT0)_{t}.p(~T0)_{t}

So p(T0P)_{t} = p(PT0)_{t}.p(T0)_{t} : (p(PT0)_{t}.p(T0)_{t} + p(PT0)_{t}.p(~T0)_{t}) = p(T0)_{t}:
The probability of the background knowledge does not alter if P_{t}.
Note this does not mean it also remains the
same on normalization: If alternatives get
higher, it gets lower, and the other way
around, when probabilities of theories get
normalized.
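This invariance is easy to verify: under A7 the ~T0 column uses the same p(PT0), which cancels. A small check with assumed numbers:

```python
# Under A7 p(P~T0) = p(PT0), so conditioning on P leaves p(T0) unchanged.

def p_T0_given_P(p_T0, p_P_given_T0):
    num = p_P_given_T0 * p_T0
    den = p_P_given_T0 * p_T0 + p_P_given_T0 * (1 - p_T0)   # A7 in the ~T0 cell
    return num / den

print(round(p_T0_given_P(p_T0=0.9, p_P_given_T0=0.5), 10))   # 0.9: unchanged
```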
Now we can write the table with a
little more detail and bring out what
happens at t and at t+1 if P. To start with
the former, and leaving unstated what should
be clear:
t       T'0      T'1      ..      T'i             ..      T'n

P                                 p(PTi).p(Ti)
~P                                p(~PTi).p(Ti)
        p(T0)    p(T1)    ..      p(Ti)           ..      p(Tn)

        probabilities at t of T0 .. Tn

t+1     p(T0P)   p(T1P)   ..      p(TiP)          ..      p(TnP)

        conditional probabilities at t+1 of T0 .. Tn:
        the p(TiP)_{t} may not sum to 1

t+1     p(T'0)   p(T'1)   ..      p(T'i)          ..      p(T'n)

        normalized conditional probabilities at t+1 of T0 .. Tn:
        p(Ti)_{t+1} = p(TiP)_{t} : ∑j p(TjP)_{t}. Do sum to 1.

P                                 p(PTi).p(T'i)
~P                                p(~PTi).p(T'i)
        p(T'0)   p(T'1)   ..      p(T'i)          ..      p(T'n)

So this is how we learn from experience, using background knowledge T0 and T1 .. Tn as a class of alternative explanations for a kind of facts, and needing no more than 2n assumptions of probabilities, given A1 .. A7 stated above.
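The whole table amounts to a short update loop. The sketch below (function names and numbers are my own, for illustration) takes the probabilities of A6, simplified by A7, computes each p(TiP)_{t} via Bayes' formula, and normalizes via (9); note that the background knowledge T0 is unchanged by conditioning but still moves under normalization:

```python
# Learning from experience per the tables: given p(Ti)_t and p(PTi)_t for
# background knowledge T0 (index 0) and alternatives T1 .. Tn (A6), observing
# P or ~P yields the normalized p(Ti)_{t+1} of (9), with each conditional
# probability p(TiP)_t computed under A7: p(P~Ti) = p(PT0).

def update(p_T, p_P_given_T, P_observed=True):
    p_P_given_T0 = p_P_given_T[0]            # A7: the background hypothesis
    conditionals = []
    for pT, pP in zip(p_T, p_P_given_T):
        if P_observed:
            num = pP * pT                    # p(P&Ti)
            alt = p_P_given_T0 * (1 - pT)    # p(P&~Ti), by A7
        else:
            num = (1 - pP) * pT
            alt = (1 - p_P_given_T0) * (1 - pT)
        conditionals.append(num / (num + alt))   # p(TiP)_t
    total = sum(conditionals)                # may not sum to 1 ...
    return [c / total for c in conditionals] # ... so normalize, as in (9)

p_T = [0.3, 0.4, 0.3]            # p(T0), p(T1), p(T2) at t
p_P_given_T = [0.5, 0.9, 0.5]    # the fixed hypotheses of A5/A6

new = update(p_T, p_P_given_T, P_observed=True)
print([round(p, 4) for p in new])   # T1 gains; T0 and T2 shrink on normalization
print(round(sum(new), 10))          # 1.0
```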
Note also that while A7 allows simplifications, and the schematic tables do not list the p(P&~Ti)_{t} = p(P~Ti)_{t}.p(~Ti)_{t} components, and indeed also do not list the negations of the T0, T1 .. Tn, these in fact do enter the conditional probabilities through probability theory on the stated axioms:

(14) p(TiP)_{t} = p(PTi)_{t}.p(Ti)_{t} : (p(PTi)_{t}.p(Ti)_{t} + p(P~Ti)_{t}.p(~Ti)_{t})
Finally, while this solves the
problem of probabilistic confirmation that I
noted, it does so by restating probability
theory, and in a way that seems compatible
with personal and subjective interpretations
of
probability, and with probability as degree
of belief, but perhaps not with other
interpretations.
For this reason, the
probabilities spoken of in this essay are best
taken as degrees of belief, that indeed may in
some cases be derived from or be close to
frequentist or empirical probabilities, but
that do not coincide with them.
Endremarks:
As I remarked before, I have some relief with large doses of B12, and when I feel better I tend to think about the things that interest me but that I often disregard because I am too miserable to do anything, or much, that is useful for them.
This note treats a problem I have been aware of for a long time, and few if any saw before. Now I may be mistaken about my solution, but whether or not I am: the above text is a factor of 100 or 1000 more sophisticated about what theories are than the whole APA seems to be capable of, which is a great moral, scientific, medical and legal shame, and besides is very dangerous for very many.