Koehler: Conditional Probability

From: HARNAD Stevan (harnad@coglit.soton.ac.uk)
Date: Sun May 31 1998 - 20:38:18 BST

On Sun, 31 May 1998 Eal195@aol.com wrote:

> Dear Stevan,
> I'm tying myself in knots over 'inverse fallacy', any chance of a kid-sib
> explanation? I can't seem to grasp the difference between (to use Koehler's
> example) an innocent person (not source) being a "match" for a DNA sample and
> its posterior probability/inverse. Isn't the probability of the sample
> matching an innocent person the same? Or does it mean matching a particular
> innocent person? Posterior probability is the probability of an event
> occurring given another event, isn't it?
> Thanks
> Liz

First let me review probability, conditional probability, prior
probability, posterior probability and Bayes' Rule. I'll use a slight
variant of the examples I used in class:

The probability of something happening can be interpreted as a kind of
percentage. If the probability that someone will be Tall (where "Tall"
means 5 feet 9 inches or taller) is 50%, that means out of every 10
people, 5, on average, should be Tall.

A conditional probability is a probability of something happening GIVEN
that something else happens. The probability of someone being Tall may
be 50% when they are chosen at random, but the probability of a Male
being Tall (the probability of being Tall GIVEN that someone is a Male)
may be, let's say, 80%.

Now this conditional probability that someone is Tall given that they
are a Male, P(T|M), should not be confused with the conditional
probability that someone is a Male given that they are Tall, P(M|T).
I'll show you how to calculate P(M|T) in a moment. For that you'll need
Bayes' Rule.

First we need the probability of being a Male. Let's say that's 40%
(amplifying the fact that there are slightly fewer males than females).

So we have the probability of being Tall:

P(T) = 50%

Then we have the probability of being Tall GIVEN that someone is a Male:

P(T|M) = 80%

Then we have the probability of being Male:

P(M) = 40%

Now we need Bayes' Rule. Bayes' Rule is based on the JOINT probability of
two events, that is, the probability that they will BOTH happen. The
probability that two events will happen together depends on whether or
not they are INDEPENDENT. If they are independent, the answer is easy,
it's just the probability of the one times the probability of the other.

As an example of independent events, there's no reason to believe that
the probability of being Dark differs for males and females: let's say
it's 50% in both cases. Then the probability that someone is Male is:

P(M) = 40%

The probability of being Dark is:

P(D) = 50%

Moreover, the CONDITIONAL probability of being Dark GIVEN you are Male
will be exactly the same as the probability of just being Dark, namely,
50%, because the two are independent; that's what it means to be
independent:

P(D|M) = P(D) = 50%
So the JOINT probability of being Male and Dark is:
P(M AND D) =
P(M) x P(D) =
40% of 50% = 20%.
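
The multiplication rule for independent events can be checked in a few
lines. This is only a sketch in Python, not part of the original
message; the variable names are illustrative, and exact fractions are
used so that the percentages come out without rounding.

```python
from fractions import Fraction

# Figures from the example: P(M) = 40%, P(D) = 50%.
p_male = Fraction(40, 100)
p_dark = Fraction(50, 100)

# Independence means the conditional probability equals the plain one.
p_dark_given_male = p_dark

# Joint probability of two independent events: multiply them.
p_male_and_dark = p_male * p_dark

print(f"P(M AND D) = {float(p_male_and_dark):.0%}")
```

Multiplying by p_dark_given_male instead of p_dark gives the same
result here, which is exactly what independence amounts to.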

But with Tall and Male this does not work: They are not independent.
The joint probability is NOT the product of P(M) and P(T), even though
they are again 40% and 50%. This is because the CONDITIONAL probability
of being Tall GIVEN that you are a male is not the same as the
probability of just being Tall, it's much higher:

P(T|M) = 80%

And THAT's the probability you need for figuring out the JOINT probability
of being both Tall and Male.

The joint probability can be calculated either of two ways, because
of a certain symmetry you will see in a moment:

It is always the probability of the first event times the CONDITIONAL
probability of the second event, GIVEN the first. So with Tall and
Male, it's:
P(M AND T) =
P(M) x P(T|M) =
40% x 80% = 32%
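
One way to see why this works is to count heads. The sketch below (in
Python, with a made-up crowd of 100 people split so as to match the
example's P(M) = 40%, P(T) = 50% and P(T|M) = 80%) reads the three
probabilities off the counts and checks that the joint probability is
the product of the other two. The crowd and the names are assumptions
for illustration only.

```python
from fractions import Fraction

# Hypothetical crowd of 100 people, chosen to match the example.
counts = {
    ("Male", "Tall"): 32,
    ("Male", "Short"): 8,
    ("Female", "Tall"): 18,
    ("Female", "Short"): 42,
}
total = sum(counts.values())

# Number of Males, Tall or Short.
males = sum(n for (sex, _), n in counts.items() if sex == "Male")
tall_males = counts[("Male", "Tall")]

p_male = Fraction(males, total)                   # 40 out of 100
p_tall_given_male = Fraction(tall_males, males)   # 32 out of 40
p_male_and_tall = Fraction(tall_males, total)     # 32 out of 100

# The joint probability equals P(M) x P(T|M).
assert p_male_and_tall == p_male * p_tall_given_male
print(f"P(M AND T) = {float(p_male_and_tall):.0%}")
```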

This means that Tall Males are more common than Dark Males. Exactly 20%
(less than 1/4) of the people you stop randomly on the street will be
both Dark and Male (the rest will be Female or not Dark, or both), but 32%
(more than 1/4) of the people you stop on the street will be both Tall
and Male (because there are fewer Tall Females).

The other way you can calculate the joint probability of being Male and
Tall is the following. It is identical to the previous formula, just
swapping the Ms and Ts:

P(M AND T) =
P(T) x P(M|T) =
50% x ? = 32%

We know the answer has to be the same, 32% (because P(M AND T) equals
itself), and we know that P(T) is 50%; but earlier I said that we don't
yet know what P(M|T) is: we don't know the CONDITIONAL probability that
someone is Male GIVEN that he is Tall, P(M|T); so far we only know the
conditional probability that someone is Tall given that he is Male,
P(T|M).
So Bayes' rule simply uses this symmetry, that

P(M AND T) = P(M) x P(T|M) = P(T) x P(M|T)
and just does a little algebra to rewrite it as:

P(M|T) = [P(M) x P(T|M)] / P(T)

Using that we can calculate P(M|T) = (40% x 80%) / 50%, which comes to
64% (this is easier to calculate with fractions than with percentages):

(4/10 x 8/10) / (5/10) =
(32/100) / (5/10) = 32/100 x 10/5 = 320/500 = 64%
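
The same arithmetic can be wrapped in a small function. This is a
minimal sketch in Python; the function name bayes and its argument
names are my own choice for illustration, not standard notation.

```python
from fractions import Fraction

def bayes(p_a, p_b_given_a, p_b):
    """Return P(A|B) = P(A) x P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b

p_male = Fraction(4, 10)             # P(M)
p_tall_given_male = Fraction(8, 10)  # P(T|M)
p_tall = Fraction(5, 10)             # P(T)

p_male_given_tall = bayes(p_male, p_tall_given_male, p_tall)
print(p_male_given_tall)                  # 16/25
print(f"{float(p_male_given_tall):.0%}")  # 64%
```

Note that all three inputs are needed: P(T|M) alone does not determine
P(M|T).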

Now I can tell you exactly what the "inverse fallacy" is:

It is to confuse P(T|M) and P(M|T). In this case, it would be to
confuse 80% with 64%. As you can see, if you are told that the
probability that someone is Tall given that he is Male is 80%, that
does not yet tell you what the probability is that someone is Male
given that they are Tall! To know that, you first need to know the
probability of being Male, P(M), 40%, and the probability of being
Tall, P(T), 50%, the base rates. And then you have to apply Bayes' Rule!

To translate this into the question of DNA evidence, the probability
that, GIVEN that someone is (in reality) innocent, the DNA finds them
guilty, P(G|I), is sometimes misdescribed to the jury, and
misunderstood by them, as the probability that, GIVEN that the DNA
implies they are Guilty, someone is in reality Innocent, P(I|G). These
are clearly very different, and one cannot be derived from the other
without knowing the base rates P(G) and P(I).

Suppose the probability that the DNA will call someone guilty when they
are really innocent is very low, and the jury misinterprets this as
meaning that the probability that they are really innocent when the DNA
calls them guilty is very low; so they vote guilty. But maybe there are
a lot of falsely accused people and a lot of DNA tests, so the probability
of being one of the (rare) false calls of the test is not so low!
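
To make this concrete, here is a sketch in Python with numbers that are
entirely invented (they are not Koehler's): a pool of 1,000,001 people
who could have left the sample, exactly one of whom is the real source,
a 1-in-100,000 chance that an innocent person matches, and the
assumption that the real source always matches. It also uses one step
not spelled out above: P(G) is obtained by adding the matches from
innocent people to the matches from the source.

```python
from fractions import Fraction

# ALL numbers here are hypothetical, for illustration only.
pool_size = 1_000_001                            # people who could be the source
p_innocent = Fraction(pool_size - 1, pool_size)  # base rate P(I)
p_source = 1 - p_innocent                        # exactly one real source
p_match_given_innocent = Fraction(1, 100_000)    # P(G|I), the false-match rate
p_match_given_source = Fraction(1)               # assumed: source always matches

# P(G): matches from innocent people plus matches from the real source.
p_match = (p_match_given_innocent * p_innocent
           + p_match_given_source * p_source)

# Bayes' Rule: P(I|G) = P(G|I) x P(I) / P(G)
p_innocent_given_match = p_match_given_innocent * p_innocent / p_match

print(f"P(G|I) = {float(p_match_given_innocent):.3%}")
print(f"P(I|G) = {float(p_innocent_given_match):.0%}")
```

With these made-up figures P(G|I) is tiny, yet P(I|G) comes out at
10/11, about 91%, because the innocent people in the pool so heavily
outnumber the one real source.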

P.S. Bayes' Rule is often used in statistics to UPDATE the probability of
hypotheses in the face of new evidence (data). So it is often written
in the form:

P(H|E) = P(E|H)P(H) / P(E)

This means the probability of the Hypothesis GIVEN the new Evidence
(otherwise known as the "posterior probability" or the new, recalculated
probability, once you've done the calculation) is:

     the probability of the Evidence GIVEN the Hypothesis
     times the probability of the Hypothesis,
     all divided by the probability of the evidence

The best way to get such data is by actually counting frequencies:
How often does E happen? How often does E happen given that H is true?
And how likely did we think H was before? P(H) is called the "prior"
probability, and this is what is being updated or adjusted on the basis
of the new evidence to give the new conditional probability, P(H|E),
the "posterior" probability.
The prior probability is also the base rate.
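
As a rough illustration of counting frequencies, the Python sketch
below takes 20 made-up observations, each recording whether H was true
and whether E was seen, estimates P(H), P(E) and P(E|H) from the
counts, and applies the formula. The observations are hypothetical and
the names are only for illustration.

```python
from fractions import Fraction

# Hypothetical records: (hypothesis_true, evidence_seen).
observations = (
    [(True, True)] * 3 + [(True, False)] * 1
    + [(False, True)] * 2 + [(False, False)] * 14
)

n = len(observations)
n_h = sum(1 for h, _ in observations if h)              # H true
n_e = sum(1 for _, e in observations if e)              # E seen
n_e_and_h = sum(1 for h, e in observations if h and e)  # both

prior = Fraction(n_h, n)                # P(H)
p_e = Fraction(n_e, n)                  # P(E)
p_e_given_h = Fraction(n_e_and_h, n_h)  # P(E|H)

posterior = p_e_given_h * prior / p_e   # P(H|E)

# Counting directly among the cases where E was seen gives the same answer.
assert posterior == Fraction(n_e_and_h, n_e)
print(f"prior {float(prior):.0%}, posterior {float(posterior):.0%}")
```

Here the evidence raises the probability of H from the prior of 20% to
a posterior of 60%.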

Cheers, Stevan

This archive was generated by hypermail 2b30 : Tue Feb 13 2001 - 16:23:22 GMT