Monday, February 14, 2011

Is this data snooping?

Do you read oktrends?  It's a funny blog about statistics gathered from okcupid... How can you lose?

In their most recent post, they attempted to find the best casual questions to ask someone in order to learn something deeper about them.  They found that "Do you like the taste of beer?" was the best question to determine if a person would have sex on a first date, and that "Do you like horror movies?" was one of the questions that couples agree on most often.

That's all quite amusing, but it rang skeptical bells for me.  They tested over 50,000 questions, and tried to correlate them with several things.  With that many questions, I think some of them will show good correlations just by random chance.  This is called data snooping: you test so many different hypotheses that some of them are bound to turn up as false positives.  At the very least, I think picking the best of so many questions would exaggerate the results.
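To put a rough number on that worry, here is a minimal simulation sketch.  Everything in it is made up for illustration: the yes/no answers and the outcome are coin flips, the sizes (2,000 questions, 200 people) are chosen only so it runs quickly, and the function names are mine, not anything from oktrends.  Since there is no real relationship anywhere in the data, every question that passes the usual p < 0.05 cutoff is a false positive, and about 5% of them should pass.

```python
import random
from statistics import NormalDist


def phi_z(xs, ys):
    """Approximate z-score for the association between two yes/no lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = mean_x * (1 - mean_x)
    var_y = mean_y * (1 - mean_y)
    if var_x == 0 or var_y == 0:
        return 0.0  # everyone gave the same answer, nothing to correlate
    r = cov / (var_x * var_y) ** 0.5  # phi (correlation) coefficient
    return r * n ** 0.5  # roughly standard normal when there is no real link


def count_false_positives(n_questions=2000, n_people=200, alpha=0.05, seed=0):
    """Count pure-noise questions that pass a two-sided test at level alpha."""
    rng = random.Random(seed)
    # Hypothetical outcome (e.g. some yes/no trait), here just a coin flip.
    outcome = [rng.randint(0, 1) for _ in range(n_people)]
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    hits = 0
    for _ in range(n_questions):
        answers = [rng.randint(0, 1) for _ in range(n_people)]
        if abs(phi_z(answers, outcome)) > cutoff:
            hits += 1
    return hits


hits = count_false_positives()
print(f"{hits} of 2000 pure-noise questions look 'significant' at p < 0.05")
```

Scaled up to 50,000 questions, the same 5% rate would mean something like 2,500 spurious hits per outcome if nothing were done to correct for it.  I have no idea what cutoff, if any, oktrends actually used, so treat this as an illustration of the worry and not a critique of their method.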

On the other hand, okcupid has a pretty large data set, numbering in the millions.  So what do you think?  Data snooping, or no?
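One way to see why the size might matter: when there is no real relationship, the correlation you get by chance has a spread of roughly 1/sqrt(n) for n people.  The sketch below, again using coin-flip data and made-up sizes and function names, takes the largest chance correlation among 200 noise questions at two sample sizes.  It is only meant to show the trend, not to model the actual okcupid data.

```python
import random


def phi(xs, ys):
    """Phi (correlation) coefficient between two yes/no lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = mean_x * (1 - mean_x)
    var_y = mean_y * (1 - mean_y)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no correlation to report
    return cov / (var_x * var_y) ** 0.5


def largest_chance_correlation(n_people, n_questions=200, seed=1):
    """Largest |correlation| among noise questions for a given sample size."""
    rng = random.Random(seed)
    outcome = [rng.randint(0, 1) for _ in range(n_people)]
    best = 0.0
    for _ in range(n_questions):
        answers = [rng.randint(0, 1) for _ in range(n_people)]
        best = max(best, abs(phi(answers, outcome)))
    return best


small = largest_chance_correlation(n_people=100)
large = largest_chance_correlation(n_people=2500)
print(f"largest chance correlation, n=100:  {small:.3f}")
print(f"largest chance correlation, n=2500: {large:.3f}")
```

With n in the millions, 1/sqrt(n) is around 0.001, so a correlation that arises purely by chance would have to be tiny.  That doesn't make false positives go away, but it suggests that any strong correlation in a data set that large is unlikely to be noise alone.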

1 comment:

Larry Hamelin said...

Dunno. I suspect a multiple comparison test, like Tukey's or Scheffé's, would set too high a bar for significance.