
 Maarten Maartensz:    Philosophical Dictionary | Filosofisch Woordenboek                      

 R - Reasoning Probabilistic: Rules of

 

Rules of Probabilistic Reasoning: Rules of Reasoning with probabilities.

This lemma effectively gives a new theory of, and a new approach to, probability. Fortunately for those who are familiar with probability theory, the standard theorems are preserved.

1. Introduction: One family of theories concerning theoretical probabilities in the given sense is Bayesianism, which is based on Bayes' Theorem or, equivalently, on the observation that conditional probability in principle may explain how one can learn from experience.

In the following discussion some knowledge about probability theory, logic and Bayesianism is presupposed. A useful and readable introduction to the latter is Howson & Urbach: 'Scientific Reasoning: The Bayesian Approach'.

Bayesianism is attractive but comes with several problems, such as the assumption of omniscience (it seems as if one should know all consequences of one's theories, and all probabilities of all possible consequences); the problem of priors (how does one arrive at the probability of theories); the problem of predictions (how does one arrive at the probability of the predictions of theories); the problem of old evidence (what if one finds that a theory entails something one knew already, but wasn't aware that the theory entailed); and problems with the subjective interpretation of probability (that many Bayesians presuppose).

2. Preliminaries: There are various answers to these problems by Bayesians, but the present lemma sketches another approach. It is based on taking time seriously, including the fact that actual theories are developed in time, by actual people, and do not arrive ready-made in scientific journals with all their logical consequences. It seeks to answer the above problems by some postulates that serve as a basis for Rules of Reasoning with probabilities, and whose statement requires a few preliminary assumptions about time, propositions and notation.

First, it is assumed that all propositions to be considered have a temporal index, which satisfies assumptions for a temporal logic.

In the present lemma only the skeleton of such a logic is used, namely the temporal index, for which a suffix is used that is supposed to range over moments or stretches of time, with (P).t+x later than (P).t if 'x' is positive and '(P)' a proposition, and (P).t-x earlier than (P).t. Here '(P).t' is read as 'P at t', and it is also tacitly assumed that the temporal adfix distributes properly over logical connectives, as e.g. in (P&Q).t IFF ((P).t&(Q).t), for this should be implied by the presupposed temporal logic.

Second, it is assumed propositions come in two kinds: Theoretical propositions, indicated by the predicate 'THE', that may contain all manner of generalizations, abstract entities, hypothetical entities etc., and empirical propositions, indicated by the predicate 'EMP', that must state a claim that is decidable by some experience, and thus cannot be equivalent to a generalization that goes beyond experience etc.

The predictions of an empirical theory are supposed to include empirical propositions, and though much may be said here, e.g. of a methodological nature, about how to ascertain that something is an empirical proposition, these problems also will not be further discussed, as they are not very relevant to what this lemma is about.

Third, we assume standard predicate logic and standard probability theory, and some basic knowledge about intervals and functions.

Intervals will be rendered as '[X,Y]' with 'X' for the lower boundary and 'Y' for the upper boundary, while 'f(Z)' is used for the phrase 'is a function of Z' i.e. depends on Z. Thus 'x e [10-1/2f(10), 10+1/2f(10)]' says that x belongs to the interval that depends on f(10) and lies symmetrically around 10, and if f(10)=10 this means x is between 5 and 15 inclusive. Both intervals and functions could be left out, but are included for resp. clarity and generality.
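To make the notation concrete, here is a minimal sketch in Python of the worked example just given (the function f and the numbers are merely the example from the text, not part of any postulate):

# Sketch of the interval notation: x e [Z - 1/2 f(Z), Z + 1/2 f(Z)].
def interval(center, f):
    half = f(center) / 2.0
    return (center - half, center + half)

f = lambda z: z            # the example dependency f(10) = 10
lo, hi = interval(10, f)
print(lo, hi)              # 5.0 15.0, i.e. x is between 5 and 15 inclusive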

Fourth, it is both convenient and adequate to explicitly include reference to the background knowledge that is assumed. This is written as 'K', and it is assumed that p(K).t = 1 if t=now: What one counts presently as background knowledge is presently certain (but may be found false tomorrow in some respect). Note this implies for any Q at t=now that p(Q|K).t = p(Q).t .

3. Postulates for real probability: Now the postulates are as follows, and will be briefly explained and discussed after their statement:

A. Real probabilities are proportions of cardinal numbers of sets in some
    domain D.
    (Di)(Dj)(D)( Di inc D & Dj inc D -->
    p(Di)    = #(Di) : #(D) &
    p(Di|Dj) = #(Di ∩ Dj) : #(Dj) )

This is a new interpretation of what probabilities are: proportions of cardinal numbers. This has the great advantage that, thus defined, probabilities exist objectively if the sets they are derived from exist objectively, since such sets have cardinal numbers. This is also why they are called real probabilities.
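For finite sets this interpretation can be computed directly. A minimal sketch in Python (the example sets are invented purely for illustration):

# Postulate A for finite sets: probabilities as proportions of
# cardinal numbers. The example sets are invented for illustration.
D  = set(range(1, 11))                  # domain D, #(D) = 10
Di = {n for n in D if n % 2 == 0}       # even numbers, #(Di) = 5
Dj = {n for n in D if n > 4}            # numbers above 4, #(Dj) = 6

p_Di          = len(Di) / len(D)        # p(Di)    = #(Di) : #(D)       = 0.5
p_Di_given_Dj = len(Di & Dj) / len(Dj)  # p(Di|Dj) = #(Di ∩ Dj) : #(Dj) = 0.5
print(p_Di, p_Di_given_Dj)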

The justification is that proportions as defined go a long way towards the standard axioms for probability theory but need some supplementation to reason with probabilities. (See: Measurement of reality by truth and probability)

Also the definition has the disadvantage that often the cardinalities of sets are not known or only imperfectly known, and anyway different people may have different ideas about them.

A convenient concept that can be defined here is that of a random set:

B. A set D is a random set iff for every element of the set the
    probability that it belongs to any subset of the set equals the
    probability of the subset in the set:
    RandomSet(D) IFF (Di inc D)(deD)( p(deDi)=p(Di) )

Normally it is easy to make a random set D' for a given set D: One useful way is to put the name of each of the elements of D on a slip of paper (of the same size and kind for all names); put these slips in a vase, bowl or urn; thoroughly shake it; and blindly select from it.
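A minimal simulation of this urn procedure in Python (the sets and the number of draws are invented for illustration):

import random

# Postulate B sketched by simulation: blind selection from a well-
# shaken urn makes the relative frequency with which a drawn element
# belongs to a subset Di approach the proportion of Di in D.
D  = list(range(100))                 # the urn: 100 slips
Di = set(range(30))                   # a subset with p(Di) = 30 : 100

draws = [random.choice(D) for _ in range(100000)]
freq  = sum(1 for d in draws if d in Di) / len(draws)
print(freq)                           # close to 0.3 = #(Di) : #(D)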

C. Probabilities are objective, but there are personal probabilities:
    What a person a believes an objective probability to be.
    aB(p(Di)=d) IFF p(a,Di)=d

Thus, a personal probability is no more and no other than a personal belief about what a real probability is or might be. The personal probability exists, if it does, because a person has thought about what the real probability might be; the real probability exists because real sets have cardinal numbers.

Note that the real probability is neatly indicated and kept apart by the notations: the personal probability of a is p(a,Di)=d and the real probability p(Di)=x. This allows us to say that a believes truly iff p(a,Di) = p(Di), i.e. if his personal probability for Di is the same as the real one. And all that 'aB(p(Di)=d)' means is that 'a believes that the probability of Di equals d'.
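A minimal sketch of this notation in Python (the persons and numbers are invented for illustration), keeping personal probabilities as a mapping from persons to believed values, apart from the real value:

# Postulate C sketched: p(a,Di) is a's belief about the real p(Di).
p_Di     = 0.5                       # the real probability (postulate A)
personal = {"a": 0.5, "b": 0.7}      # p(a,Di) = 0.5, p(b,Di) = 0.7

for person, belief in personal.items():
    believes_truly = (belief == p_Di)   # true belief iff p(a,Di) = p(Di)
    print(person, belief, believes_truly)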

The justification of the postulate is that there is evidently a need for it. This immediately introduces the temporal complication and relativization spoken of above: Real probabilities may exist in time, but they do so in their own sense, which need not be discussed here, whereas personal probabilities definitely exist in time and depend on one's evidence and knowledge.

4. Postulates for personal probability: How personal probabilities exist in time and depend on one's evidence and knowledge needs some assumptions to get straight, first for deductions:

D. Deductions are independent of time once made:
    (T)(P)(K)(t)(x>0) (K & T |= P).t |= (K & T |= P).t+x

That is: If P is a prediction that has been deduced from background knowledge K and theory T at t, then this remains so at any later time.

This seems obviously true if - as is assumed - predictions always are deductions from K&T. Or, in other words: if one insists that the relation between an explanation and what it explains is deductive.

Next, there is an assumption that explains how to arrive at conditional probabilities:

E. Probabilities of predictions P from theories T&K with background-
    knowledge K are deductions from T&K:
    (T)(P)(K)(q)(t) ( p(P | T & K).t = q IFF ( T & K |=  p(P)=q ).t )

That is: The statement of a conditional probability of P given T&K at any time amounts to a deduction of the probability of P from T&K. The fact that conditional probability is explained in terms of a deductive theory means that the probability of the prediction must depend on assumptions made in T or K.

The justification of this postulate is that it gives a neat and intuitive explanation for conditional probabilities of the stated kind, which also says whence the probabilities of empirical propositions come: from assumptions in one's theories or background knowledge about the unconditional probabilities of events. And in the end - given the earlier assumptions above - these depend on what one knows or assumes about the cardinal numbers of the sets of things one theorizes about.

Also, the last two assumptions have the great benefit of explaining why Bayes' Theorem would work and how it is to be used in time and in general, which is not clear from standard probability theory. For we have the following theorem:

T1: p(P).t+x=1 --> p(T|K).t+x = p(P|T&K).t * p(T|K).t : p(P|K).t
Proof:
(1) Suppose p(P).t+x = 1
(2) p(T|P&K).t+x = p(T|K).t+x                            by (1, PT)
(3) p(T|P&K).t   = p(T|K).t+x                            by (2, C, D)
(4) p(T|P&K).t   = p(P|T&K).t * p(T&K).t : p(P&K).t      by PT
(5) p(T|K).t+x   = p(P|T&K).t * p(T|K).t : p(P|K).t      by (2-4)

This explains the working of Bayes' Theorem in time: If at a later time we verify a prediction of a theory, we can recalculate the probability of the theory using probabilities from an earlier time (presumably the last time for which we have the required evidence). Note that the new probability of the theory is the same as the probability the theory had at the earlier time if and only if the theory was in fact irrelevant to the prediction at that earlier time.
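A numerical sketch of T1 in Python (all the probabilities are invented for illustration):

# Theorem T1 sketched: updating p(T|K) when the prediction P is
# verified at t+x. All numbers are invented for illustration.
p_T_given_K_t  = 0.2     # p(T|K).t, at the earlier time t
p_P_given_TK_t = 0.9     # p(P|T&K).t, deduced from T&K (postulate E)
p_P_given_K_t  = 0.3     # p(P|K).t

# Suppose p(P).t+x = 1; then by T1:
p_T_given_K_tx = p_P_given_TK_t * p_T_given_K_t / p_P_given_K_t
print(p_T_given_K_tx)    # 0.6: the theory's probability has risen

# Had T been irrelevant to P, p(P|T&K).t = p(P|K).t would have held,
# and the theory's probability would have stayed at 0.2.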

F. Empirical probabilities depend on empirical samples.
    (P)(K)(t) (P e EMP).t |=
    p(P|K).t e [freq[P|K]-1/2.f(freq[P|K]), freq[P|K]+1/2.f(freq[P|K])].t

That is: The probability of an empirical proposition at t given background knowledge K falls in an interval that depends on the empirical frequency of P on K at t. It is here, with '1/2.f(freq[P|K])', that the interval notation and the functional notation mentioned above are used. The simplest case is that the dependency is identity, and then p(P|K) is supposed to be somewhere in a symmetrical interval around freq[P|K].

The functional relation that determines the size of the interval may depend on the size of the sample, for example. Also, it is useful to have an interval-estimate rather than a point-estimate, if only to account for uncertainties and for statistical estimates.
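A sketch of postulate F in Python, with one possible choice of f that shrinks the interval as the sample grows; this particular dependency f = 1/sqrt(n) on the sample size n is an assumption made for illustration, not part of the postulate:

import math

def probability_interval(successes, n):
    # Interval around the observed frequency; the dependency
    # f = 1/sqrt(n) on sample size n is assumed for illustration.
    freq = successes / n
    f    = 1 / math.sqrt(n)
    return (max(0.0, freq - f / 2), min(1.0, freq + f / 2))

print(probability_interval(30, 100))     # sample of 100:  (0.25, 0.35)
print(probability_interval(300, 1000))   # sample of 1000: narrower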

The justification here is that one needs, for empirical theories, real and intersubjectively valid empirical evidence, and the only good kind that one has here is such as is based on empirical samples (that have been collected in proper methodological ways).

Note also that frequencies are not probabilities: They are summaries of evidence for probabilities, and in the end no more than a list of particular data recording what has been found in experience or experiment.

G. Theoretical probabilities depend on their least probable proper
    consequence.
    (T)(K)(t) (T e THE).t |=
    p(T|K).t e [Min(T|K)-1/2.f(Min(T|K)), Min(T|K)+1/2.f(Min(T|K))].t

This postulate for theoretical propositions is similar to the former postulate for empirical propositions, in that it proposes an interval within which the probability falls that depends functionally on a quantity. But the quantity is not a frequency, for this one cannot have for theories. Instead, the quantity proposed, written as 'Min(T|K).t', is the least probable of the known proper consequences of T&K at t, where the proper consequences of T&K are the Q that are entailed by (T&K) at t but fail to be entailed by (~T&K) at t.

Formally, using '=d' for 'is by definition' and noting that p(K).t = 1, which makes Min(T&K).t = Min(T|K).t, we define:

pc(K&T).t  =d {Q: (K&T |= Q).t & ~((K&~T) |= Q).t}
Min(T|K).t =d {q: p(Q).t=q & Q e pc(K&T).t & (S)(S e pc(K&T).t --> p(Q).t <= p(S).t)}

The justification here is that the least probable of the known proper consequences of T&K at t is something one can intersubjectively agree on at t: It depends on the known evidence and on what can be deduced from T&K. And it is consistent with probability theory in that it is the maximum p(T|K) is capable of, given K, for the probability of a theory cannot be higher than that of its least probable proper consequence (since if T |= P then p(T) <= p(P)).
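A minimal sketch of Min(T|K) in Python for a finite list of known proper consequences (the consequences and their probabilities are invented for illustration):

# Min(T|K).t sketched: the probability of the least probable known
# proper consequence of T&K at t. All values invented for illustration.
proper_consequences = {      # Q in pc(K&T).t  ->  p(Q).t
    "Q1": 0.9,
    "Q2": 0.75,
    "Q3": 0.6,
}

min_TK = min(proper_consequences.values())
print(min_TK)   # 0.6: p(T|K).t lies in an interval around this value
                # and cannot exceed it (if T |= Q then p(T) <= p(Q))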

H. What is irrelevant to a theory and a prediction of it is also irrelevant
    if the theory and the prediction are true:
    (T)(P)(X)(t)( (T irr X).t & (P irr X).t |= (P irr X | T).t )

This postulate H enables one to test theories given predictions by enabling one to abstract from irrelevant circumstances, which always exist.

It involves a definition of irrelevance that extends the standard probabilistic definition of independence while relying on some of the above assumptions:

(A irr B).t =d (x)( (A |= p(B)=x).t IFF (~A |= p(B)=x).t ) &
                  (y)( (B |= p(A)=y).t IFF (~B |= p(A)=y).t )

This implies the standard properties of independence using postulates A and B above. Also used in the above postulate H is conditional irrelevance, which is defined similarly:

(A irr B | C).t =d  (x)( (C&A |= p(B)=x).t IFF (C&~A |= p(B)=x).t ) &
                        (y)( (C&B |= p(A)=y).t IFF (C&~B |= p(A)=y).t )  

The justification of postulate H is that it enables induction, that is, in effect, the testing of theories by their predictions, which cannot be done properly if the result of an experiment may depend on any fact that also happens to be true. It involves no more than the assumption that a theory should also entail whatever is relevant to its predictions.

The reasoning that postulate H enables is this, given that a prediction P from a theory T gets verified in a context where some event X also happens such that (T irr X).t and (P irr X).t:

p(T|P&X).t = p(P|T&X).t * p(T&X).t : p(P&X).t                by PT
           = p(P|T&X).t * (p(T).t*p(X).t) : (p(P).t*p(X).t)  by T irr X and P irr X
           = p(P|T&X).t * p(T).t : p(P).t                    by algebra
           = p(P|T).t * p(T).t : p(P).t                      by H
           = p(P&T).t : p(P).t                               by PT
           = p(T|P).t                                        by PT

And thus one can inductively confirm a theory and abstract from all manner of irrelevant circumstances. See: Problem of Induction.
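A numerical check in Python of the chain above (all probabilities invented, with X chosen irrelevant to T and to P):

# Check of the derivation: when (T irr X) and (P irr X), and by H
# (P irr X | T), the irrelevant X drops out of Bayes' Theorem.
p_T, p_X    = 0.2, 0.5      # T irr X gives p(T&X) = p(T)*p(X)
p_P         = 0.3           # P irr X gives p(P&X) = p(P)*p(X)
p_P_given_T = 0.9           # by H: p(P|T&X) = p(P|T)

lhs = (p_P_given_T * (p_T * p_X)) / (p_P * p_X)   # p(T|P&X) by PT
rhs = p_P_given_T * p_T / p_P                     # p(T|P)   by PT
print(lhs, rhs)             # both 0.6: X has dropped out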

 


See also: Abduction, Fallibilism, Hypothesis, Induction, Problem of Induction, Invariance, Knowledge, Science, Scientific knowledge, Scientific Realism, Theory, Rules of Reasoning, Basic Logic - Semantics.


 

Original: Mar 28, 2005                                                Last edited: 4 July 2013.