Tuesday, April 14, 2009

A parable on induction: The Village Census

Bertrand Russell's Census Officer

A reader mentioned a parable by Bertrand Russell about "induction by simple enumeration". Through my infinite resources, it just so happens that I have access to the exact quote:
Induction by simple enumeration may be illustrated by a parable. There was once upon a time a census officer who had to record the names of all householders in a certain Welsh village. The first that he questioned was called William Williams; so were the second, third, fourth, . . . At last he said to himself: "This is tedious; evidently they are all called William Williams. I shall put them down so and take a holiday." But he was wrong; there was just one whose name was John Jones. This shows that we may go astray if we trust too implicitly in induction by simple enumeration.

- Bertrand Russell, A History of Western Philosophy, page 543
In my unrelenting quest to make everything more tediously mathematical, I wish to analyze this parable using Bayes' Theorem. But first, let us simplify it just a bit. Let us assume that there are only five villagers (all men), and the census officer has questioned four of them.

An aside on notation

I was originally planning to explain my point in plain English, but I found it highly cumbersome. Therefore, I will be using a little notation from probability theory. Firstly, it is useful to denote several claims using letters:
  • W: all five villagers are named William Williams.
  • J: only four of the villagers are named William Williams, and the last is not.
  • E: The four villagers that the census officer questioned are named William Williams.
We will use the letter P to denote the probability of a claim. For instance, P(W) is the probability that all five villagers are named William Williams. P(W) is just a number between 0 and 1 (where 1 means that W absolutely must be true). P(W) is called the "prior" probability of W. It's called the "prior" probability because it is prior to any evidence. P(W) is the probability that W is true before we learn about the census officer's new observations.

When the census officer questions the villagers, we find that all four people he questioned were named William Williams. In other words, we find that claim E is true. If we want to know the probability of W before the census officer questions the villagers, then we need P(W). If we want to know the probability of W after the census officer questions the villagers, then we would denote this probability as P(W|E). P(W|E) is called "the probability of W given E".

Back to the village

What we see in this parable is that a census officer has collected a piece of evidence: claim E is true. What should he conclude from this piece of evidence? In the parable, he concludes that all villagers are named William Williams, but is this conclusion reasonable? In other words, is P(W|E) high? We might compare the officer's conclusion with other conclusions he might have had. For instance, would it have been better to conclude that only four of the villagers were named William Williams? In other words, is P(J|E) high? Which is higher, P(W|E) or P(J|E)?

To compare these two probabilities, it might be best to take the ratio between the two. Let's find P(W|E)/P(J|E). If the value of P(W|E)/P(J|E) is greater than 1, that means that, given the census officer's observations, he was justified in concluding that all villagers are named William Williams. If the value is less than 1, then he was not justified in his conclusion. According to Bayes' Theorem, P(W|E)/P(J|E) = P(W)/P(J) * P(E|W)/P(E|J).
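For readers who want the missing step, here is a short sketch of where that ratio form comes from. It only uses the standard statement of Bayes' Theorem, written once for W and once for J; dividing one by the other cancels the common P(E). The typesetting below is LaTeX and is just a restatement of the formula above, not anything new.

```latex
\begin{align*}
P(W \mid E) &= \frac{P(E \mid W)\,P(W)}{P(E)}, \qquad
P(J \mid E) = \frac{P(E \mid J)\,P(J)}{P(E)} \\[4pt]
\frac{P(W \mid E)}{P(J \mid E)}
  &= \frac{P(W)}{P(J)} \cdot \frac{P(E \mid W)}{P(E \mid J)}
\end{align*}
```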

Basically, this means that the officer's conclusion depends on two factors. The first factor, P(W)/P(J), is the ratio of prior probabilities of W and J. This means that if we know that W is much more likely than J before learning about the results of the census, then W will still be much more likely after learning about the census. After all, the census is just one piece of evidence, which may not be enough to overcome our previous prejudices. What could be the source of these prejudices? Perhaps we know from the last census that J was true last year. Or perhaps they all adhere to some religion that requires them to all share a surname. But in absence of anything like that, we may assume that P(W)/P(J) is not too big or too small. It's about equal to 1.

The second factor, P(E|W)/P(E|J), is easy to calculate. P(E|W) is nothing other than the probability of E given W. If W is true, then E is necessarily true, so P(E|W) is simply 1. P(E|J) is the probability of E given J. If J is true, then it is only by lucky coincidence that the census officer only found villagers named William Williams. Assuming that the census officer questioned four random villagers, then P(E|J) is equal to 1/5. Therefore, the second factor, P(E|W)/P(E|J), is equal to 5.
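The 1/5 can be checked by brute force. The following is a minimal Python sketch, assuming (as above) five villagers and a uniformly random choice of which four get questioned. The names VILLAGERS, ODD_ONE and likelihood_e_given_j are made up for this illustration.

```python
from fractions import Fraction
from itertools import combinations

VILLAGERS = range(5)  # five villagers, as in the simplified parable
ODD_ONE = 0           # under J exactly one villager is not William Williams;
                      # which index we label him with is arbitrary (assumption)


def likelihood_e_given_j():
    """P(E|J): chance that a random sample of four misses the odd villager."""
    samples = list(combinations(VILLAGERS, 4))       # every way to pick four
    all_williams = [s for s in samples if ODD_ONE not in s]
    return Fraction(len(all_williams), len(samples))


p_e_given_w = Fraction(1)              # W implies E, so P(E|W) = 1
p_e_given_j = likelihood_e_given_j()
likelihood_ratio = p_e_given_w / p_e_given_j

print("P(E|J) =", p_e_given_j)
print("P(E|W)/P(E|J) =", likelihood_ratio)
```

There are five possible groups of four, and only one of them leaves out the odd villager, which is where the 1/5 comes from.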

Going back to Bayes' theorem, we know that P(W|E)/P(J|E) = P(W)/P(J) * P(E|W)/P(E|J). Based on my analysis, this is about equal to 5. Since it's greater than 1, that means that the officer is justified in concluding that all the villagers are named William Williams.

But hold on! Recall that I said that P(W)/P(J) is "about" 1. I'm a physicist, so when I say something is "about" 1, I really mean that it could be anywhere between 0.01 and 100, maybe even more or less. There is a lot of uncertainty which I completely ignored. With this in mind, the value of P(W|E)/P(J|E) can be anywhere between 0.05 and 500. The census officer's conclusion doesn't look so solid now!
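To see how much the answer swings, here is a small Python sketch that multiplies a few candidate prior ratios by the factor of 5 from the evidence. The function names are my own. The conversion from a ratio to a probability additionally assumes that W and J are the only two possibilities, which the parable does not actually guarantee, so treat that last column as illustrative only.

```python
from fractions import Fraction

LIKELIHOOD_RATIO = Fraction(5)  # P(E|W)/P(E|J) from the five-villager setup


def posterior_odds(prior_odds, likelihood_ratio=LIKELIHOOD_RATIO):
    """Odds form of Bayes' Theorem: posterior ratio = prior ratio * evidence ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """P(W|E), ASSUMING W and J are the only two options (a simplification)."""
    return odds / (1 + odds)


# Prior ratios P(W)/P(J) spanning the physicist's "about 1".
for prior in (Fraction(1, 100), Fraction(1), Fraction(100)):
    odds = posterior_odds(prior)
    print(f"prior ratio {float(prior):>6}: "
          f"posterior ratio {float(odds):>6}, "
          f"P(W|E) roughly {float(odds_to_probability(odds)):.3f}")
```

The same evidence leaves the officer anywhere from fairly sure he is wrong to nearly certain he is right, depending entirely on the prior ratio fed in.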

Another comparison

Previously, we compared the two claims W and J. But what happens if we introduce a third claim C?
  • C: The four villagers which the census officer questioned are named William Williams. The last one has a different name.
Claim C is distinct from J because it's more specific. Claim J only states that one of the villagers is not named William Williams. Claim C states that a specific villager, the one which the officer skipped, is not named William Williams.

Let us compare claim C to claim W. As before, we will calculate P(W|E)/P(C|E),* which is equal to P(W)/P(C) * P(E|W)/P(E|C).

*The advanced reader would note that this is also exactly equal to P(W|E)/P(J|E), our previous result.

Our first factor in the equation is P(W)/P(C). We're comparing the prior probabilities of two claims. W claims that the last villager is named William Williams. C claims that the last villager is not named William Williams. All things considered, C seems more likely than W, since there are a lot of names out there, and only one of them fits W. Nevertheless, in absence of anything which would cause major prejudice, I will say that P(W)/P(C) is "about" 1, same as I did before.

The second factor, P(E|W)/P(E|C), is easy to calculate. Both W and C necessarily imply E, therefore P(E|W) = P(E|C) = 1. The ratio is just 1.

Therefore, when we compare the claims C and W, we find that P(W|E)/P(C|E) is "about" 1, same as P(W)/P(C). The ratio of probabilities is the same before and after the results of the census. The evidence tells us nothing about the comparison of claim C to W. The census officer's conclusion is looking even weaker than before!
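Both comparisons can be run side by side in a toy model. The Python sketch below is only an illustration under loudly stated assumptions: W and J are given equal priors of 1/2 (so nothing else is possible), the odd villager under J is equally likely to be any of the five, and the officer skips one villager uniformly at random. None of these numbers come from the parable; they are picked to make the bookkeeping visible.

```python
from fractions import Fraction

N = 5  # villagers


def joint(p_w, p_j):
    """Yield (probability, odd_villager, skipped_villager) for every possible world.

    odd_villager is None when everyone is William Williams (claim W).
    """
    p_skip = Fraction(1, N)  # officer skips a uniformly random villager
    for skipped in range(N):
        yield p_w * p_skip, None, skipped
        for odd in range(N):
            yield p_j * Fraction(1, N) * p_skip, odd, skipped


def prob(worlds, predicate):
    """Total probability of the worlds where predicate holds."""
    return sum((p for p, odd, skipped in worlds if predicate(odd, skipped)),
               Fraction(0))


def is_w(odd, skipped):   # everyone is William Williams
    return odd is None


def is_j(odd, skipped):   # exactly one villager is not
    return odd is not None


def is_c(odd, skipped):   # the odd villager is the one who was skipped
    return odd is not None and odd == skipped


def is_e(odd, skipped):   # all four questioned are William Williams
    return odd is None or odd == skipped


worlds = list(joint(Fraction(1, 2), Fraction(1, 2)))  # ASSUMED equal priors


def both(a, b):
    return lambda odd, skipped: a(odd, skipped) and b(odd, skipped)


prior_w_over_j = prob(worlds, is_w) / prob(worlds, is_j)
prior_w_over_c = prob(worlds, is_w) / prob(worlds, is_c)
post_w_over_j = prob(worlds, both(is_w, is_e)) / prob(worlds, both(is_j, is_e))
post_w_over_c = prob(worlds, both(is_w, is_e)) / prob(worlds, both(is_c, is_e))

print("P(W)/P(J) =", prior_w_over_j, " P(W|E)/P(J|E) =", post_w_over_j)
print("P(W)/P(C) =", prior_w_over_c, " P(W|E)/P(C|E) =", post_w_over_c)
```

In this toy model the two posterior ratios agree, as the footnote says they must. What differs is where the factor of 5 lives: in the W versus J comparison it comes from the evidence, while in the W versus C comparison it is already sitting in the prior ratio and the evidence contributes nothing.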

Of course, the officer's argument was weak from the start. But before, the weakness was hidden in the word "about".

The moral of the story is that you have to be very careful with Bayesian analysis. Sometimes, the conclusion seems to depend on what angle you look at it from. There is always a degree of uncertainty in the prior probabilities that you can never eliminate. The best way to reduce it is with more evidence, evidence, evidence. For instance, if the officer had simply questioned the last villager, then his evidence would likely overwhelm any previous uncertainty.

3 comments:

Larry Hamelin said...

In the parable, he concludes that all villagers are named William Williams, but is this conclusion reasonable? In other words, is P(W|E) high?

Careful. The two underlying assertions are different:

1) All villagers are named William Williams

-- differs from --

2) The probability that all villagers are named William Williams is high

Consider a lottery with 1,000,000 tickets; exactly one ticket will win. The probability that ticket X will lose is very high; however, we cannot conclude that ticket X will in fact lose. If we could, we would then have to conclude that, because every ticket will probably lose, all tickets will in fact lose; but one ticket will in fact win.

Larry Hamelin said...

The problem of setting a priori probabilities in Bayes' theorem is well known. Bayes' theorem formally establishes the relationship between the prior and posterior probabilities; the outcome is only as good as the parameters. Proponents of the Fine Tuning argument make this error consistently: they typically assume the prior probability of God's existence is higher than the prior probability that the universe would have its parameters by chance. If the prior probability of the hypothesis is sufficiently larger than the probability of the evidence by chance, it is no surprise you'll get a very high posterior probability.

In short, as you note, using Bayes' theorem without an adequately justified prior probability is a pointless exercise in arithmetic.

Without an adequately justified prior probability, the only way to use Bayes' theorem is to say, "The evidence is sufficient to overcome an initial skepticism of X% to justify confidence of Y% in the hypothesis."

Larry Hamelin said...

(Note: Your proviso "is the conclusion reasonable" is accurate. I'm just adding more detail: that the probability of W is high means that the probability is high, it doesn't mean it's certain.)