A reader mentioned a parable by Bertrand Russell about "induction by simple enumeration". Through my infinite resources, it just so happens that I have access to the exact quote:
Induction by simple enumeration may be illustrated by a parable. There was once upon a time a census officer who had to record the names of all householders in a certain Welsh village. The first that he questioned was called William Williams; so were the second, third, fourth, . . . At last he said to himself: "This is tedious; evidently they are all called William Williams. I shall put them down so and take a holiday." But he was wrong; there was just one whose name was John Jones. This shows that we may go astray if we trust too implicitly in induction by simple enumeration.
- Bertrand Russell, The History of Western Philosophy, page 543
In my unrelenting quest to make everything more tediously mathematical, I wish to analyze this parable using Bayes' Theorem. But first, let us simplify it just a bit. Let us assume that there are only five villagers (all men), and the census officer has questioned four of them.
An aside on notation
I was originally planning to explain my point in plain English, but I found it highly cumbersome. Therefore, I will be using a little notation from probability theory. Firstly, it is useful to denote several claims using letters:
- W: All five villagers are named William Williams.
- J: Exactly four of the villagers are named William Williams, and the remaining one is not.
- E: The four villagers that the census officer questioned are named William Williams.
When the census officer questions the villagers, we find that all four people he questioned were named William Williams. In other words, we find that claim E is true. If we want to know the probability of W before the census officer questions the villagers, then we need P(W). If we want to know the probability of W after the census officer questions the villagers, then we would denote this probability as P(W|E). P(W|E) is called "the probability of W given E".
Back to the village
What we see in this parable is that a census officer has collected a piece of evidence: claim E is true. What should he conclude from this piece of evidence? In the parable, he concludes that all villagers are named William Williams, but is this conclusion reasonable? In other words, is P(W|E) high? We might compare the officer's conclusion with other conclusions he might have drawn. For instance, would it have been better to conclude that only four of the villagers were named William Williams? In other words, is P(J|E) high? Which is higher, P(W|E) or P(J|E)?
To compare these two probabilities, it might be best to take the ratio between the two. Let's find P(W|E)/P(J|E). If the value of P(W|E)/P(J|E) is greater than 1, that means that, given the census officer's observations, he was justified in concluding that all villagers are named William Williams. If the value is less than 1, then he was not justified in his conclusion. According to Bayes' Theorem, P(W|E)/P(J|E) = P(W)/P(J) * P(E|W)/P(E|J).
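For readers who want to see where that ratio form comes from, here is a short derivation. It only uses the standard statement of Bayes' Theorem applied once to W and once to J; the common factor P(E) cancels when we divide.

```latex
\begin{align*}
P(W \mid E) &= \frac{P(E \mid W)\,P(W)}{P(E)}, \\
P(J \mid E) &= \frac{P(E \mid J)\,P(J)}{P(E)}, \\
\frac{P(W \mid E)}{P(J \mid E)} &= \frac{P(W)}{P(J)} \cdot \frac{P(E \mid W)}{P(E \mid J)}.
\end{align*}
```

The convenient part is that we never need P(E) itself, only the two likelihoods and the prior ratio.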
Basically, this means that the officer's conclusion depends on two factors. The first factor, P(W)/P(J), is the ratio of prior probabilities of W and J. This means that if we know that W is much more likely than J before learning about the results of the census, then it will still be much more likely after learning about the census. After all, the census is just one piece of evidence, which may not be enough to overcome our previous prejudices. What could be the source of these prejudices? Perhaps we know from the last census that J was true last year. Or perhaps they all adhere to some religion that requires them to all share a surname. But in the absence of anything like that, we may assume that P(W)/P(J) is not too big or too small. It's about equal to 1.
The second factor, P(E|W)/P(E|J), is easy to calculate. P(E|W) is nothing other than the probability of E given W. If W is true, then E is necessarily true, so P(E|W) is simply 1. P(E|J) is the probability of E given J. If J is true, then it is only by lucky coincidence that the census officer only found villagers named William Williams. Assuming that the census officer questioned four random villagers, then P(E|J) is equal to 1/5. Therefore, the second factor, P(E|W)/P(E|J), is equal to 5.
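The 1/5 can be checked by brute force. The following is a minimal sketch, assuming (as above) that the officer questions a uniformly random set of four out of five villagers; the function and variable names are my own, chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

N_VILLAGERS = 5
N_QUESTIONED = 4

def likelihood_e_given_j():
    """P(E|J): chance that a random sample of four misses the one odd villager."""
    odd_one = 0  # by symmetry, it does not matter which villager is the exception
    samples = list(combinations(range(N_VILLAGERS), N_QUESTIONED))
    # E holds under J only when the exception is not among those questioned.
    hits = sum(1 for sample in samples if odd_one not in sample)
    return Fraction(hits, len(samples))

p_e_given_w = Fraction(1)            # W guarantees E
p_e_given_j = likelihood_e_given_j()
likelihood_ratio = p_e_given_w / p_e_given_j

print("P(E|J) =", p_e_given_j)
print("P(E|W)/P(E|J) =", likelihood_ratio)
```

There are five possible groups of four, and only one of them leaves out the exception, which is where the 1/5 comes from.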
Going back to Bayes' Theorem, we know that P(W|E)/P(J|E) = P(W)/P(J) * P(E|W)/P(E|J). Based on my analysis, this is about 1 * 5 = 5. Since it's greater than 1, that means that the officer is justified in concluding that all the villagers are named William Williams.
But hold on! Recall that I said that P(W)/P(J) is "about" 1. I'm a physicist, so when I say something is "about" 1, I really mean that it could be anywhere between 0.01 and 100, maybe even more or less. There is a lot of uncertainty which I completely ignored. With this in mind, the value of P(W|E)/P(J|E) can be anywhere between 0.05 and 500. The census officer's conclusion doesn't look so solid now!
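To make the sensitivity concrete, here is a small sketch that sweeps the prior ratio over the range I just mentioned and multiplies by the likelihood ratio of 5. The specific grid of prior values is an arbitrary choice for illustration, not something the parable dictates.

```python
from fractions import Fraction

LIKELIHOOD_RATIO = 5  # P(E|W)/P(E|J) for five villagers, four questioned

def posterior_odds(prior_odds, likelihood_ratio=LIKELIHOOD_RATIO):
    """Odds form of Bayes' Theorem: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# Illustrative grid spanning the "about 1" range of 0.01 to 100.
prior_odds_values = [Fraction(1, 100), Fraction(1, 10), Fraction(1), Fraction(10), Fraction(100)]

for prior in prior_odds_values:
    post = posterior_odds(prior)
    if post > 1:
        verdict = "favors W"
    elif post < 1:
        verdict = "favors J"
    else:
        verdict = "even"
    print(f"prior odds {float(prior):g} -> posterior odds {float(post):g} ({verdict})")

# The officer's conclusion flips once the prior odds drop below 1/5.
break_even_prior = Fraction(1, LIKELIHOOD_RATIO)
```

In other words, the evidence only rescues the officer if he did not already think J was more than five times as likely as W.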
Previously, we compared the two claims W and J. But what happens if we introduce a third claim C?
- C: The four villagers which the census officer questioned are named William Williams. The last one has a different name.
Let us compare claim C to claim W. As before, we will calculate P(W|E)/P(C|E),* which is equal to P(W)/P(C) * P(E|W)/P(E|C).
*The advanced reader would note that this is also exactly equal to P(W|E)/P(J|E), our previous result.
Our first factor in the equation is P(W)/P(C). We're comparing the prior probabilities of two claims. W claims that the last villager is named William Williams. C claims that the last villager is not named William Williams. All things considered, C seems more likely than W, since there are a lot of names out there, and only one of them fits W. Nevertheless, in the absence of anything that would cause major prejudice, I will say that P(W)/P(C) is "about" 1, same as I did before.
The second factor, P(E|W)/P(E|C), is easy to calculate. Both W and C necessarily imply E, therefore P(E|W) = P(E|C) = 1. The ratio is just 1.
Therefore, when we compare the claims C and W, we find that P(W|E)/P(C|E) is "about" 1, same as P(W)/P(C). The ratio of probabilities is the same before and after the results of the census. The evidence tells us nothing about the comparison of claim C to W. The census officer's conclusion is looking even weaker than before!
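The two comparisons can be put side by side in one explicit toy model. This is only a sketch under assumptions I am adding: if J is true, each villager is equally likely to be the exception; the officer skips one villager uniformly at random; and the two are independent. The prior weights of 1/2 for W and J are placeholders standing in for "about 1".

```python
from fractions import Fraction

N = 5  # villagers; the officer questions N - 1 of them

def joint(p_w, p_j):
    """Joint weights over (hypothesis, skipped villager).

    The hypothesis is "W" (everyone is William Williams) or an integer i,
    meaning villager i is the single exception (so J is true).
    Assumption: exception and skipped villager are uniform and independent.
    """
    table = {}
    for skipped in range(N):
        table[("W", skipped)] = p_w * Fraction(1, N)
        for odd in range(N):
            table[(odd, skipped)] = p_j * Fraction(1, N) * Fraction(1, N)
    return table

def prob(table, event):
    return sum(p for (hyp, skipped), p in table.items() if event(hyp, skipped))

def is_w(hyp, skipped): return hyp == "W"
def is_j(hyp, skipped): return hyp != "W"
def is_e(hyp, skipped): return hyp == "W" or hyp == skipped   # all four questioned are Williams
def is_c(hyp, skipped): return hyp != "W" and hyp == skipped  # the skipped villager is the exception

def both(a, b):
    return lambda hyp, skipped: a(hyp, skipped) and b(hyp, skipped)

table = joint(Fraction(1, 2), Fraction(1, 2))

prior_w_vs_j = prob(table, is_w) / prob(table, is_j)
prior_w_vs_c = prob(table, is_w) / prob(table, is_c)
# Ratios of conditionals on E equal ratios of joint probabilities with E.
post_w_vs_j = prob(table, both(is_w, is_e)) / prob(table, both(is_j, is_e))
post_w_vs_c = prob(table, both(is_w, is_e)) / prob(table, both(is_c, is_e))

print(prior_w_vs_j, prior_w_vs_c, post_w_vs_j, post_w_vs_c)
```

In this toy model the two posterior ratios agree, as the footnote says, and the W versus C ratio does not move at all when E is learned. It also shows that P(W)/P(C) comes out five times larger than P(W)/P(J), so the two cannot both be exactly 1; they can only both be "about" 1.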
Of course, the officer's argument was weak from the start. But before, the weakness was hidden in the word "about".
The moral of the story is that you have to be very careful with Bayesian analysis. Sometimes, the conclusion seems to depend on the angle from which you look at it. There is always a degree of uncertainty in the prior probabilities that you can never eliminate. The best way to reduce it is with more evidence, evidence, evidence. For instance, if the officer had simply questioned the last villager, then his evidence would likely overwhelm any previous uncertainty.