
July 19, 2011           

A note on probability and confirmation


And now for something completely different, namely probability and confirmation, and learning from experience - more specifically, a solution to what may seem to some a small, or at least neglected, problem of probabilistic confirmation.


0. Introduction
1. Basic probabilistic formulas, tables and schemes
2. The problem of probabilistic confirmation
3. Dissolving the problem of probabilistic confirmation
4. Axiomatic approach for probabilities in time
5. Schematic tables for probabilities in time



0. Introduction

In elementary probability theory, and indeed also in not so elementary probability theory, probabilities have a lot in common with proportions, yet tend themselves to be left rather unclarified.

In this note, I will leave the question what probabilities are also unclarified, except by noting that (1) treating them as if they are proportions gets one quite far and accords with such intuitions as most people have about probabilities, and (2) this is the case for any plausible interpretation of probability, varying from personal degree of belief in a statement to a frequency in a population of a suitable kind.

I will state a basic problem with probabilistic confirmation in any theory of probability that I have seen, and then propose a solution. To clarify what I mean by probability, here first is a brief restatement of the basics:

1. Basic probabilistic formulas, tables and schemes

Probabilistic confirmation is normally presented in terms of conditional probabilities and schemes like the following.

First, the standard definition or axiom for conditional probability is assumed, that relates conditional probability to conjunction, namely along the following lines:

(1) p(P&T)=p(P|T).p(T)

Here "p(X)" abbreviates "the probability of X", where "probability" is supposed understood and taken in a proportional sense and "X" is a statement; "p(Y|X)" abbreviates "the probability of Y given that X is true", or some similar statement; and "p(X&Y)" abbreviates "the probability that X and Y are both true".

There are different notations, and the explanations and motivations that are given vary, but an equation like (1) results in all standard treatments of probability, understood in whatever sense, varying from frequencies to degrees of belief.
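As a quick numerical check of (1), here is a minimal sketch in Python; the two input probabilities are made-up illustrations, not data:

```python
# Sketch of (1): p(P&T) = p(P|T).p(T), probabilities treated as proportions.
# The numerical values are made-up illustrations.
p_T = 0.5            # p(T)
p_P_given_T = 0.6    # p(P|T)

p_P_and_T = p_P_given_T * p_T   # (1): p(P&T) = p(P|T).p(T)
print(p_P_and_T)     # 0.3
```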

Second, it is generally noted that (1) enables tabular schemes like the following, which outline the various probabilities and their relations:

        T      ~T
  P     a      c      a+c
 ~P     b      d      b+d
        a+b    c+d    1

        T                 ~T
  P   p(P|T).p(T)     p(P|~T).p(~T)     p(P)
 ~P   p(~P|T).p(T)    p(~P|~T).p(~T)    p(~P)
      p(T)            p(~T)             1

Here the two tabular schemes are different statements of the same presupposed equations that depend on (1), which is to say

(2)  a = p(P&T)   = p(P|T).p(T)
     b = p(~P&T)  = p(~P|T).p(T)
     c = p(P&~T)  = p(P|~T).p(~T)
     d = p(~P&~T) = p(~P|~T).p(~T)

The sums work out as in probability theory and as suggested by the schema, in which

(3) a + b =  p(P&T)+p(~P&T) =  p(P|T).p(T)+p(~P|T).p(T)   = p(T)
     a + c =  p(P&T)+p(P&~T) =  p(P|T).p(T)+p(P|~T).p(~T) = p(P)

and so on, as the tabular schemes also state.
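The scheme in (2) and (3) can be checked with a small sketch; the three input probabilities are made-up illustrations:

```python
# Build the four cells a, b, c, d of the tabular scheme from p(T), p(P|T)
# and p(P|~T), then check the row and column sums of (3). Values made up.
p_T = 0.5
p_P_given_T = 0.8
p_P_given_notT = 0.2

a = p_P_given_T * p_T                  # p(P&T)   = p(P|T).p(T)
b = (1 - p_P_given_T) * p_T            # p(~P&T)  = p(~P|T).p(T)
c = p_P_given_notT * (1 - p_T)         # p(P&~T)  = p(P|~T).p(~T)
d = (1 - p_P_given_notT) * (1 - p_T)   # p(~P&~T) = p(~P|~T).p(~T)

assert abs((a + b) - p_T) < 1e-12      # (3): a + b = p(T)
p_P = a + c                            # (3): a + c = p(P)
assert abs((a + b + c + d) - 1) < 1e-12
print(p_P)
```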

2. The problem of probabilistic confirmation

Given the above, the problem of probabilistic confirmation arises as follows - and though the above is standard in all treatments, the problem is generally neither noticed nor treated.

The notion of conditional probability as rendered in formula (1) is taken as allowing probabilistic confirmation along the following lines:

From (1) and the mathematics of probability theory, that is the same or equivalent for the various interpretations, one obtains the converse of (1)

(4) p(T|P) = p(P|T).p(T) : p(P)

on the grounds that p(T&P)=p(P&T) and p(P) = p(P|T).p(T)+p(P|~T).p(~T) = p(P&T)+p(P&~T), and then reasons thus, generally not formally but in natural language:

(5) "Hence if P is true, the probability of T gets to be the probability of T given P"

where "the probability of T given P" is rendered as p(T|P) in (4).

The problem is that this is not cogent:

Suppose that either |-P or simply P is true. Now if either implies p(P)=1, which is what either does on standard interpretations of probability, then

(6) p(T|P) = p(P|T).p(T) : p(P)    by standard Probability Theory (PrTh)
           = p(P&T) : p(P)         by (1)
           = p(T) : p(P)           since by PrTh p(P&T)=p(T) if p(P)=1
           = p(T)                  if p(P)=1

This is what the mathematics of standard Probability Theory logically implies - which is not how it is normally used:

In fact, normally p(T|P) is taken as p(T|P) = p(P|T).p(T) : p(P) = p(P&T) : p(P) with all probabilities in the equation, including p(P), < 1.

And this is the problem. It can be restated by reference to time: if |-P is found to be true at some time, then p(T) is revalued as p(T|P) = p(P|T).p(T) : p(P) = p(P&T) : p(P), with the probabilistic values as they were before |-P.
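The problem can be put into numbers; the values below are made up, and cond is just (4) written out:

```python
# Demonstration of the problem: once p(P) = 1 (and hence p(P|T) = 1),
# conditioning on P returns p(T) unchanged. Values are made up.
def cond(p_P_given_T, p_T, p_P):
    return p_P_given_T * p_T / p_P   # (4): p(T|P) = p(P|T).p(T) : p(P)

p_T = 0.3
# Normal use, with the old p(P) < 1: P confirms T.
raised = cond(0.9, p_T, 0.5)
assert raised > p_T
# With p(P) = 1, also p(P|T) = 1, and confirmation vanishes.
unchanged = cond(1.0, p_T, 1.0)
print(unchanged)   # equals p(T): no confirmation
```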

3. Dissolving the problem of probabilistic confirmation

More briefly, we can introduce a reference to times t and t+1 and write

(7) |-Pt+1 --> p(T|P)t = p(P|T)t.p(T)t : p(P)t
                       = p(P&T)t : p(P)t

This is all as in standard Probability Theory, except for the references to time, with t the last time and t+1 the next time, say, and with the temporalized form of (1) used, with p(P)t < 1, to avoid the problem stated in the last section.

This fully accords with the standard working of confirmation in standard probability theory, except that a formula like (7) is rarely stated explicitly, though something like it does get used.

In fact, since the central points of probabilistic confirmation are that (i) there is a new probability of T upon learning (at time t+1) that P is true and (ii) this new probability for T does satisfy the old conditional probability p(T|P) at t, the following is what's involved:

(8) |-Pt+1 --> p(T)t+1 = p(T|P)t
                       = p(P|T)t.p(T)t : p(P)t
                       = p(P&T)t : p(P)t

In the usual treatments of any standard probability theory this is what gets used in fact, though it is rarely if ever stated as explicitly, that is, with references to time and the notes that, if P is true, the new probability for T (at t+1) is the old conditional probability of T given P (at t), in which the old probability for P (at t), which is not equal to 1, gets used.
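A sketch of how (8) works in practice, with made-up values:

```python
# Sketch of the temporal rule (8): on learning |-P at t+1, the new p(T)
# is the old p(T|P), computed with the old p(P)t < 1. Values are made up.
def update(p_T, p_P_given_T, p_P_given_notT):
    p_P = p_P_given_T * p_T + p_P_given_notT * (1 - p_T)   # p(P)t < 1
    return p_P_given_T * p_T / p_P                         # p(T)t+1 = p(T|P)t

p_T_next = update(0.5, 0.8, 0.2)
print(p_T_next)   # learning P raised p(T) from 0.5 to 0.8
```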

At this point it might be supposed that (8) might be taken as the axiom involved in Bayesian Confirmation, for (8) shows how Bayesian Confirmation is generally worked with, and it must be an axiom (or be derived from some axiom), since it is about how new information alters old probabilities, which is not explicit in the standard axioms of standard probability theory (which in fact do not refer to times at all).

But actually more is required - for the above may in fact result in p(T1)t+1 + ... + p(Tn)t+1 unequal to 1, supposing that T1 .. Tn are n distinct theories to account for a prediction P, even if we suppose that before learning that P is true, p(T1)t + ... + p(Tn)t = 1.

This may be illustrated by considering the simple case of just p(T1) + p(T2) = 1 initially, with p(T1|P) > p(T1) and p(T2|P) = p(T2), that is, when T1 is relevant to P and T2 is not. Clearly, p(T1|P) + p(T2|P) > 1 if p(T1) + p(T2) = 1.
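The simple case just described, with made-up numbers:

```python
# T1 relevant to P, T2 not: the conditional probabilities overshoot 1,
# although p(T1) + p(T2) = 1. Values are made-up illustrations.
p_T1, p_T2 = 0.5, 0.5
p_T1_given_P = 0.7   # > p(T1): T1 is relevant to P
p_T2_given_P = 0.5   # = p(T2): T2 is irrelevant to P

total = p_T1_given_P + p_T2_given_P
print(total)   # 1.2 > 1
```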

It is not difficult to see how this may be settled mathematically in general, namely by the following formula, which normalizes it all proportionally:

(9) |-Pt+1 --> p(Ti)t+1 = p(Ti|P)t : (∑j p(Tj|P)t)
                        = p(Ti|P)t : (p(T1|P)t + ... + p(Tn|P)t)

If |-P at t+1, then p(Ti) at t+1 is the conditional probability of (Ti if P) divided by the sum of all conditional probabilities for T1 .. Tn given P; or more briefly: if |-Pt+1, then p(Ti) at t+1 is the normalized probability of (Ti if P) at t.
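Formula (9) in a minimal sketch, with the same kind of made-up values as before:

```python
# Sketch of (9): the conditional probabilities p(Ti|P)t need not sum to 1,
# so each is divided by their sum. Values are made-up illustrations.
cond_probs = [0.7, 0.5]   # p(T1|P)t, p(T2|P)t
total = sum(cond_probs)   # here 1.2, not 1
new_probs = [p / total for p in cond_probs]   # p(Ti)t+1 by (9)
print(new_probs)
assert abs(sum(new_probs) - 1) < 1e-12   # normalized probabilities sum to 1
```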

Formula (9) makes intuitive sense and can be supplemented by the following assumptions, which, like (9), are in accordance with standard probability theory but are rarely if ever explicitly stated while standardly used:

(10) p(P|Ti)t and p(P|~Ti)t are given for all t and are the same at all t

The intuitive reason for (10) is that these two conditional probabilities are in fact the hypotheses for P given Ti and given ~Ti, and are what is to be tested experimentally, namely by finding whether at some time P or ~P is true, and then applying (9).

(11) p(Ti)t+1 is calculated by (9), which applies to each of |-P, |-~P
      and |-PV~P, depending on what is given at t+1 (*)

and on the understanding that if one of the former two is given, it is used in preference to the third for the new p(Ti)t+1.

(12) |-Pt+1 does not imply p(P)t+1=1, which comes instead from
      p(P|Ti)t and p(P|~Ti)t and p(Ti)t+1.

Formally, that is

(13) p(P)t+1 = p(P|Ti)t+1.p(Ti)t+1 + p(P|~Ti)t+1 .p(~Ti)t+1

which accords again with standard Probability Theory, and involves (10) and (11), which also allow equating p(P|Ti)t+1 and p(P|Ti)t.
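A numerical sketch of (12) and (13), again with made-up values:

```python
# Sketch of (12)-(13): learning |-P at t+1 does not set p(P)t+1 = 1;
# p(P)t+1 is recomputed from the fixed likelihoods of (10) and the new
# p(Ti)t+1. Values are made-up illustrations.
p_P_given_T = 0.8      # p(P|Ti), the same at all t by (10)
p_P_given_notT = 0.2   # p(P|~Ti), likewise fixed
p_T_next = 0.8         # p(Ti)t+1, e.g. as delivered by (9)

p_P_next = p_P_given_T * p_T_next + p_P_given_notT * (1 - p_T_next)  # (13)
print(p_P_next)
assert p_P_next < 1    # P remains uncertain even after being observed
```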

4. Axiomatic approach for probabilities in time

Having come this far, we may as well propose a set of axioms, so as to make clear what is in fact assumed:

A1  0 <= p(Ai)t <= 1
A2  p(P|Ai)t + p(~P|Ai)t = 1
A3  |-(Ai-->P)t --> p(Ai)t <= p(P)t

In fact, all of the above accords with standard probability theory: A1 has it that probabilities are between 0 and 1 inclusive; A2 says a conditional probability and its denial on the same assumption sum to 1; and A3 says that if it is a thesis that Ai implies P then the probability of Ai is not higher than the probability of P.

All of these are either theorems or axioms in any formulation of any standard probability theory, as is the following definition of probabilistic conjunction:

D1  p(P&Ai)t = p(P|Ai)t.p(Ai)t

Also, all of this with reference to times t is supposed to hold for each and any time t, provided the time is the same for all statements in the formulas.

The above is sufficient for the usual Kolmogorov axioms for probability theory, since we have from A2 and D1 the theorem

T1 p(Ai)t = p(P&Ai)t + p(~P&Ai)t

again for each and any time t. We also have from A3 for each and any time t that

T2 |- (X IFF Y)t --> p(X)t=p(Y)t

If X and Y are logically equivalent, they have the same probability, at the same time (and indeed at any time, for logical equivalents hold at any time).

This allows the recovery of all of standard elementary probability theory, but we have seen that more is needed for probabilistic confirmation if we want to avoid the above stated problem:

A4  |-Pt+1 --> p(Ti)t+1 = p(Ti|P)t : (∑j p(Tj|P)t)
                        = p(Ti|P)t : (p(T1|P)t + ... + p(Tn|P)t)

This is (9) stated above:
If |-Pt+1, then p(Ti) at t+1 is the normalized probability of (Ti if P) at t. The resulting normalized probability not only sums to 1, as it should, but also changes all old probabilities so as to cohere with the changes in some of them, upon learning that |-Pt+1. (**)

A5  p(P|Ti)t+1 = p(P|Ti)t

And this is (10) stated above: The hypotheses for P given Ti and given ~Ti are what is to be tested experimentally, namely by finding whether at some time P or ~P is true, and this requires that these are the same for each and any time t.

5. Schematic tables for probabilities in time

Normally, one has several hypotheses to account for a class of facts, and one tests these hypotheses by finding out whether their predictions hold.

A very convenient tabular schema, which conforms to the above schema for a single T, is also involved. Basically, it looks like this, with details left out:

    T0      T1      ..     Ti      ..     Tn
  p(T0)   p(T1)     ..   p(Ti)     ..   p(Tn)

Here P is some prediction, T0 is the background knowledge, and T1 .. Tn are  alternative explanations for P, that may or may not be relevant to P, and if relevant may or may not be relevant to the same extent as the others. (***)

Note this leaves out the denials of the Ti's, though these can all be recovered from the following:

A6     For 1 .. i .. n: p(Ti)t, p(P|Ti)t, p(P|~Ti)t

That is, for all of the alternative theories we are given somehow (or dream up, to begin with) the probability of the theory Ti, the probability of P if Ti, and the probability of P if ~Ti. Only the first may vary with time.

As the earlier schema shows - see (2) - these three probabilities are sufficient to find all of a, b, c and d by standard probability theory.

Next, we make an additional assumption that simplifies things rather a lot

A7     For 1 .. i .. n: p(P|~Ti)t = p(P|T0)t

That is, rather than breaking one's head over what p(P|~Ti) might be for any of T0, T1 .. Tn, we assume that it equals the probability of P on background knowledge T0. (****)

This simplifies things considerably, not least because now for n alternative theories one needs but 2n probabilities, rather than 3n.

Note that we have by A7 for background knowledge T0 the following:

                T0                    ~T0
  P    p(P|T0)t.p(T0)t       p(P|T0)t.p(~T0)t
 ~P    p(~P|T0)t.p(T0)t      p(~P|T0)t.p(~T0)t

So p(T0|P)t = p(P|T0)t.p(T0)t : (p(P|T0)t.p(T0)t + p(P|T0)t.p(~T0)t) = p(T0)t: the probability of the background knowledge does not alter if |-Pt. Note this does not mean it also remains the same after normalization: if alternatives get higher, it gets lower, and the other way around, when the probabilities of the theories get normalized.

Now we can write the table with a little more detail and bring out what happens at t and at t+1 if P. To start with the former, and leaving unstated what should be clear:

t    T0        T1        ..     Ti        ..     Tn
     p(T0)     p(T1)     ..    p(Ti)      ..    p(Tn)      probabilities at t of T0 .. Tn

t+1  p(T0|P)   p(T1|P)   ..    p(Ti|P)    ..    p(Tn|P)    conditional probabilities at t+1 of T0 .. Tn: p(Ti|P)t - may not sum to 1
t+1  p(T'0)    p(T'1)    ..    p(T'i)     ..    p(T'n)     normalized conditional probabilities at t+1 of T0 .. Tn: p(Ti)t+1 = p(Ti|P)t : ∑j p(Tj|P)t - do sum to 1

So this is how we learn from experience, using background knowledge T0 and T1 .. Tn as a class of alternative explanations for a class of facts, and needing no more than 2.n assumed probabilities, given A1 .. A7 stated above.
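The whole scheme of this section can be sketched in a few lines; all names and numbers below are illustrative assumptions, not part of the text above:

```python
# Minimal sketch of learning from experience under A1 .. A7: background
# knowledge T0 plus rival theories, each with a prior p(Ti)t and a fixed
# likelihood p(P|Ti)t; on |-P the priors are conditioned by (14) and
# normalized by A4. All numbers are made-up illustrations.
priors = {"T0": 0.4, "T1": 0.3, "T2": 0.3}        # sum to 1
likelihoods = {"T0": 0.2, "T1": 0.6, "T2": 0.2}   # p(P|Ti); p(P|~Ti) = p(P|T0) by A7

def learn(priors, likelihoods):
    p_P_given_not = likelihoods["T0"]             # A7
    cond = {}
    for name, prior in priors.items():
        p_P = likelihoods[name] * prior + p_P_given_not * (1 - prior)
        cond[name] = likelihoods[name] * prior / p_P   # (14): p(Ti|P)t
    total = sum(cond.values())                    # may differ from 1
    return {name: c / total for name, c in cond.items()}   # A4

posteriors = learn(priors, likelihoods)
assert abs(sum(posteriors.values()) - 1) < 1e-12
assert posteriors["T1"] > priors["T1"]   # the theory that predicted P gains
```

Note that p(T0|P) equals p(T0) before normalization, as shown in section 5, yet T0 still loses probability once the normalization step runs, because a rival gained.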

Note also that while A7 allows simplifications, and the schematic tables do not list the p(P&~Ti)t = p(P|~Ti)t.p(~Ti)t components, and indeed also do not list the negations of T0, T1 .. Tn, these do in fact enter the conditional probabilities through probability theory on the stated axioms:

(14) p(Ti|P)t = p(P|Ti)t.p(Ti)t : (p(P|Ti)t.p(Ti)t + p(P|~Ti)t.p(~Ti)t)

Finally, while this solves the problem of probabilistic confirmation that I noted, it does so by restating probability theory, and in a way that seems compatible with personal and subjective interpretations of probability, and with probability as degree of belief, but perhaps not with other interpretations.

For this reason, the probabilities spoken of in this essay are best taken as degrees of belief, that indeed may in some cases be derived from or be close to frequentist or empirical probabilities, but that do not coincide with them.

Endremarks: As I remarked before, I have some relief with large doses of B12, and when I feel better I tend to think about the things that interest me but that I often disregard because I am too miserable to do anything much that is useful for them.

This note treats a problem I have been aware of for a long time, and which few if any saw before. Now I may be mistaken about my solution, but whether or not I am: the above text is a factor of 100 or 1000 more sophisticated about what theories are than the whole APA seems to be capable of, which is a great moral, scientific, medical and legal shame, and besides is very dangerous for very many.


(*)  Actually p(Ti)t+1 if ~|-P and ~|-~P, i.e. if just |-PV~P, will work out by later assumptions as equal to p(Ti)t.

(**) Here is an example: If Lord Mors got killed, and either the butler, the cook, his mistress, or his son did it, as respectively T1 .. T4, with T0 the background knowledge, and it turns out he was killed with the kitchen-knife, that only the butler and the cook have a higher probability of using, then this finding should increase T1 and T2, while decreasing T3 .. T4. This all gets done by the normalization in A4, supposing p(K|T1) > p(K) and p(K|T2) > p(K), and supposing p(K|T3) = p(K) and p(K|T4) = p(K), with K = the killer used the kitchen knife.
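The example can be put into numbers; the priors and likelihoods below are made-up illustrations:

```python
# The Lord Mors example with assumed numbers: four suspects T1 .. T4 with
# equal priors; butler and cook (T1, T2) are likelier than the others to
# use the kitchen knife, with K = "the killer used the kitchen knife".
priors = [0.25, 0.25, 0.25, 0.25]    # p(T1) .. p(T4), made up
p_K_given_T = [0.7, 0.7, 0.4, 0.4]   # p(K|T1) .. p(K|T4), made up

p_K = sum(l * p for l, p in zip(p_K_given_T, priors))      # total probability
cond = [l * p / p_K for l, p in zip(p_K_given_T, priors)]  # p(Ti|K)
total = sum(cond)
posteriors = [c / total for c in cond]   # normalized as in A4

assert posteriors[0] > priors[0] and posteriors[1] > priors[1]  # butler, cook rise
assert posteriors[2] < priors[2] and posteriors[3] < priors[3]  # mistress, son fall
```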

(***) It may be that P is only relevant to one or a few of T0, T1 ... Tn, and in general a set of theories will be selected because, given background knowledge T0, the selected set explains a set of facts, to some extent. All that matters at present is that T0 + T1 + ... + Tn must sum to 1, if necessary by normalization.

(****) In standard probability theory p(P|~Ti) = p(P|T0 V T1 V ... V Tn) with T0 V T1 V ... V Tn the disjunction of all of T0, T1 etc. except Ti. A7 makes intuitive sense, in that it simply says that p(P|Ti) depends on Ti while p(P|~Ti) is just the probability of P on background knowledge.

P.S. Corrections, if any are necessary, have to be made later.
-- Jul 20, 2011: Corrected a few small typos and added the B12 link
(which is what the reader owes the above outburst to).
-- Aug 18, 2011: Clarified the last table some by adding a row.

