## Sunday, July 12, 2015

### Defining evidence

To define "evidence", a good starting point is the Bayesian definition:

B is evidence for A if P(A|B) > P(A)

In other words, evidence is "anything that increases your belief in a claim."  This definition is appealing for a number of reasons.  It's sufficiently precise that in principle everything either counts as evidence or it doesn't.  And we can apply Bayes' theorem to build an intuition about what counts as evidence.
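For example, applying Bayes' theorem to the definition gives an equivalent condition that is often easier to reason about. This assumes P(A) > 0 and P(B) > 0:

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\quad\Longrightarrow\quad
P(A \mid B) > P(A) \iff \frac{P(B \mid A)}{P(B)} > 1 \iff P(B \mid A) > P(B)
```

In words, B is evidence for A exactly when B is more likely to occur if A is true than it is overall.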

However, The Barefoot Bum points out a couple of deficiencies in this definition.  First, just because B is evidence for A does not mean that A is very likely.  Second, it is possible to exploit the definition.  Here, I only wish to address the first deficiency, but the best way to illustrate why it is a problem is to show how the definition can be exploited.

Suppose I play a single game of poker with my robot boyfriend.  I'm dealt a mediocre, ace-high hand.  After we reveal our hands and I lose, we have the following argument:

Robot boyfriend: This is evidence that you cheated!

Trivialknot: That's absurd!  This is a typical hand.

Robot boyfriend: But there is no typical poker hand.  There are millions of possible hands, and each is very unlikely.

Trivialknot: You're correct, but surely if I had cheated by choosing a hand for myself, this hand would be even more atypical.

Robot boyfriend: You're right.  Let's instead consider the hypothesis that you cheated and gave yourself this particular hand.

Trivialknot: Why would I do that?

Robot boyfriend: Who knows, but we can see the evidence for it right in front of us.  This is an atypical hand to be dealt randomly, but an extremely typical hand for someone who wants to cheat their way into a 3 of spades, 5 of hearts, 7 of diamonds, 8 of spades, and ace of hearts!

Trivialknot: And I would have gotten away with it too if it weren't for your Bayesian analysis!

[Note: this is not a realistic scenario, since we don't like poker.]

Essentially, it is possible to exploit the definition of evidence by carefully tailoring the hypothesis to fit the evidence.  In mathematical terms, we are choosing A such that P(A|B) >> P(A), at the cost of making P(A|B) << 1.  This is clearly exploitative: the whole point of evidence is to find out what is true, and yet here we are discussing a hypothesis that is extremely unlikely to be true even after accounting for the strong evidence in its favor.
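To put rough numbers on the poker exchange, here is a minimal Python sketch. The prior probability of cheating (`p_cheat`) and the probability that a cheater would pick this exact junk hand (`p_pick_given_cheat`) are purely hypothetical values chosen for illustration. The only real figure is the count of five-card hands, C(52, 5) = 2,598,960. The model also assumes that an honest deal is uniformly random over all hands.

```python
import math

# Number of distinct five-card poker hands: C(52, 5) = 2,598,960.
N_HANDS = math.comb(52, 5)


def tailored_hypothesis(p_cheat: float, p_pick_given_cheat: float) -> tuple[float, float]:
    """Return (prior, posterior) for the tailored hypothesis
    A = "I cheated and gave myself exactly this hand", given the observation
    B = "I was holding exactly this hand".

    Hypothetical model: with probability p_cheat I cheat, and a cheater picks
    this hand with probability p_pick_given_cheat; otherwise the deal is fair
    and uniformly random, so this hand appears with probability 1 / N_HANDS.
    """
    prior = p_cheat * p_pick_given_cheat                   # P(A)
    p_b_given_a = 1.0                                      # A guarantees B
    p_b = prior * p_b_given_a + (1 - p_cheat) / N_HANDS    # P(B), law of total probability
    posterior = prior * p_b_given_a / p_b                  # P(A|B), Bayes' theorem
    return prior, posterior


# Made-up inputs, for illustration only.
prior, posterior = tailored_hypothesis(p_cheat=0.01, p_pick_given_cheat=1e-9)
print(f"P(A)   = {prior:.1e}")
print(f"P(A|B) = {posterior:.1e}")
print(f"boost  = {posterior / prior:,.0f}x")
```

With these made-up inputs, the posterior comes out at roughly 2.6 × 10^-5. That is millions of times larger than the prior, so B is strong evidence for A in the technical sense, yet A remains very unlikely.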

Unfortunately, the distinction between exploitative and non-exploitative uses of the definition of evidence is not always clear cut.  Even in science we do want to tailor our hypotheses to fit the evidence.  Just not in that way, you know?

There are three kinds of resolutions to this problem:

1. Stick to the original definition of evidence.  Bite the bullet.
2. Formulate another precise definition of evidence.
3. Leave some of the definition up to subjective judgment.

I have, in the past, stuck with the standard definition of evidence.  It is useful, for example, to prove that absence of evidence is in fact evidence of absence.  That's always true in the technical sense.  And then if we want more nuance, we can discuss what makes evidence strong or weak.
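The absence-of-evidence result follows from the law of total probability. Assuming 0 < P(B) < 1, the prior P(A) is a weighted average of P(A|B) and P(A|¬B). If one of them lies above P(A), the other must lie below it:

```latex
P(A) = P(A \mid B)\,P(B) + P(A \mid \neg B)\,\bigl(1 - P(B)\bigr)
\\
P(A \mid B) > P(A) \;\Longrightarrow\;
P(A \mid \neg B) = \frac{P(A) - P(A \mid B)\,P(B)}{1 - P(B)}
< \frac{P(A) - P(A)\,P(B)}{1 - P(B)} = P(A)
```

So whenever observing B would be evidence for A, failing to observe B is evidence against A.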

On the other hand, the word "evidence" has some value attached to it.  Evidence is desirable, laudatory, and the way to truth.  If there is a way to use the technical definition of evidence to frame what is untrue as true, then maybe there is something wrong with the technical definition.

I think there is a lot of value in a more subjective definition of "evidence", and practically speaking, people are making subjective judgments anyway, so we might as well admit it.  But it's at least worth looking for a technical fix.  The Barefoot Bum suggests that we speak of evidence as something which favors one hypothesis relative to another mutually exclusive hypothesis:

C is evidence for hypothesis A relative to hypothesis B if P(A|C)/P(B|C) > P(A)/P(B)
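Applying Bayes' theorem to both hypotheses makes the P(C) terms cancel. This shows that the comparative definition is equivalent to asking whether C is more probable under A than under B. The derivation assumes that all the probabilities involved are positive:

```latex
\frac{P(A \mid C)}{P(B \mid C)}
= \frac{P(C \mid A)\,P(A) / P(C)}{P(C \mid B)\,P(B) / P(C)}
= \frac{P(C \mid A)}{P(C \mid B)} \cdot \frac{P(A)}{P(B)}
\\
\frac{P(A \mid C)}{P(B \mid C)} > \frac{P(A)}{P(B)}
\iff \frac{P(C \mid A)}{P(C \mid B)} > 1
```

The ratio P(C|A)/P(C|B) is usually called the likelihood ratio, or Bayes factor.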

This definition is still exploitable if we carefully tailor our hypotheses.  However, we can add the restriction that A and B are only comparable if they receive similar degrees of tailoring.  Thus we retain the ability to tailor hypotheses while removing the ability to do so in an exploitative way.
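Here is a minimal Python sketch of the comparative definition. It uses the likelihood-ratio form that the definition reduces to under Bayes' theorem, and it represents each hypothesis only by how probable it makes the observed hand. The helper names are illustrative. The rival hypothesis, that the robot boyfriend rigged the deck to deal me this exact hand, is a hypothetical example of an equally tailored alternative and is not part of the original argument.

```python
import math

N_HANDS = math.comb(52, 5)  # number of distinct five-card hands


def likelihood_ratio(p_c_given_a: float, p_c_given_b: float) -> float:
    """P(C|A) / P(C|B): by Bayes' theorem, the factor by which observing C
    multiplies the odds P(A)/P(B)."""
    return p_c_given_a / p_c_given_b


def is_evidence_relative(p_c_given_a: float, p_c_given_b: float) -> bool:
    """C is evidence for A relative to B exactly when the likelihood ratio exceeds 1."""
    return likelihood_ratio(p_c_given_a, p_c_given_b) > 1


# C = "I was holding exactly this hand".
p_c_if_fair_deal = 1 / N_HANDS     # untailored: every hand equally likely
p_c_if_cheated_for_it = 1.0        # tailored: the hypothesis names this exact hand
p_c_if_rigged_by_robot = 1.0       # hypothetical rival with the same degree of tailoring

# Tailored vs. untailored: the tailored hypothesis still wins, by a factor of about N_HANDS.
print(likelihood_ratio(p_c_if_cheated_for_it, p_c_if_fair_deal))
# Equally tailored rivals: the ratio is exactly 1, so C favors neither hypothesis.
print(is_evidence_relative(p_c_if_cheated_for_it, p_c_if_rigged_by_robot))
```

The first comparison shows why the comparative definition alone is still exploitable. The second shows that the exploit has no effect when both hypotheses are required to be tailored to the same degree.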