Comments on Skeptic's Play: The science of closed boxes

miller (2009-12-21 10:31):

Scientists are searching for basically four different kinds of sources of gravitational waves. First are compact binary coalescences (i.e. two black holes or neutron stars falling into each other), which is what I was working on. Second are gravitational wave bursts, a catch-all category for short events. Third are continuous waves, which are long-lasting waves with a very constant frequency (e.g. from a spinning neutron star with an asymmetrical mass distribution). Fourth is the stochastic background from the Big Bang, which is very weak and appears like noise, but is constant over time.

From what I understand, Einstein@Home helps to search for continuous waves. The good thing about continuous waves is that even though they're weak, they last indefinitely. So we can look at a long period of time (say, a year) and let the noise average out. The bad thing is that for this to work, you have to guess the frequency exactly right, within a very small error. And since you need to account for the Doppler shift and other effects, you also need to guess the location in the sky exactly right. So we have to search through about 10^17 possibilities.

I think they have ways of reducing the number of possibilities, but they still need a lot of computing power. And that's where Einstein@Home comes in.
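[A toy sketch of the two points above, with made-up numbers and nothing like the real LIGO pipeline: a weak sinusoid buried in noise stands out when correlated against a template at the right frequency over a long stretch of time, but a template that is even slightly off in frequency gains nothing from the longer observation.]

```python
import math
import random

random.seed(0)

# Toy model: a weak sinusoidal "continuous wave" buried in Gaussian noise.
# All numbers here are invented for illustration.
f_signal = 100.0      # true signal frequency, Hz
amplitude = 0.05      # signal much weaker than the unit-variance noise
sample_rate = 1000.0  # samples per second

def matched_power(f_template, duration):
    """Correlate noisy data against a sinusoid at f_template for
    `duration` seconds. Noise averages toward zero as duration grows,
    while a correctly guessed frequency accumulates signal."""
    n = int(duration * sample_rate)
    acc = 0.0
    for i in range(n):
        t = i / sample_rate
        data = amplitude * math.sin(2 * math.pi * f_signal * t) + random.gauss(0, 1)
        acc += data * math.sin(2 * math.pi * f_template * t)
    return abs(acc) / n

# With the right frequency, a longer observation pulls the signal
# (which converges to amplitude/2) out of the noise.
short_right = matched_power(100.0, duration=1)
long_right = matched_power(100.0, duration=100)

# A template only 0.3 Hz off drifts out of phase with the signal,
# so the correlation averages away just like the noise does.
long_wrong = matched_power(100.3, duration=100)

print(f"1 s,   correct template: {short_right:.4f}")
print(f"100 s, correct template: {long_right:.4f}")
print(f"100 s, wrong template:   {long_wrong:.4f}")
```

This is why the frequency (and, through the Doppler shift, the sky location) has to be guessed almost exactly: each guess is a separate template, and the ~10^17 templates are what Einstein@Home's donated computing time churns through.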
(For those who don't know, Einstein@Home (http://www.einsteinathome.org/index.html) is a project which allows individuals to donate computing time just by installing a screen saver.)

Mark Erickson (2009-12-21 08:12):

Way cool that you worked on LIGO data. I took part in the Einstein@Home program, which used home computers to do something with the data. Can you explain to me how that works in general, and also how it applies to this issue specifically?

miller (2009-12-20 09:14):

Anonymous,
Incidentally, you've hit on precisely the sort of research I did on LIGO. I worked on machine learning. In fact, we used more than just two sets of data. There were layers and layers of data sets, in order to prevent overtraining and to make sure that all the techniques were unbiased.

Anonymous (2009-12-20 08:24):

When designing new computer pattern-recognition algorithms, researchers often split the data into "training data" and "test data". Say a computer program is meant to recognize human faces (or fingerprints, or handwriting, or whatever): it is "trained" on one set of data and tested on another. With neural networks, for example, more training on the training data enables the algorithm to recognize the patterns in the training data better and better, but past a certain point the algorithm actually gets worse on the test data. That is why it is important to keep the test data and the training data segregated, especially when new data is difficult to acquire (human faces are not a good example).

This also reminds me of stock market predictions made by computer. I have seen cases where huge amounts of historical data are fed into a computer, and the computer looks for patterns that would predict which way the stock market will go based on previously available financial data. With enough analysis of enough variables, the predictions can come out nearly perfect! But the problem is that the patterns don't hold for the future. Immediately after the equations are found, they stop working well. Then the program can be "improved" by including the new data, and made nearly perfect again. But then the improved program stops making good predictions, too.
The effect is worse than if fewer variables and less data are used, the prediction equations are kept simpler, and the computer is not made to match the historical data so perfectly. It's basically the "overtraining" I was describing above.
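[The overtraining effect described above can be reproduced in a few lines of Python. This is a toy sketch, not any real analysis: the true relationship is a noisy straight line, and a model that memorizes all eight training points exactly (here a Lagrange interpolating polynomial, standing in for an overtrained network or an over-tuned stock predictor) achieves zero training error yet predicts new data far worse than a simple two-parameter line fit.]

```python
import random

random.seed(1)

# Toy data: the true relationship is y = 2x plus noise.
def make_data(n):
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 0.1) for x in xs]
    return xs, ys

train_x, train_y = make_data(8)
test_x, test_y = make_data(50)

# Simple model: ordinary least-squares straight line (two parameters).
n = len(train_x)
mx = sum(train_x) / n
my = sum(train_y) / n
slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / \
        sum((x - mx) ** 2 for x in train_x)
intercept = my - slope * mx

def simple_model(x):
    return slope * x + intercept

# "Overtrained" model: a Lagrange interpolating polynomial that passes
# exactly through every training point, i.e. it memorizes the noise.
def overfit_model(x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        term = yi
        for j, xj in enumerate(train_x):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print("training error, overfit model:", mse(overfit_model, train_x, train_y))
print("test error, overfit model:    ", mse(overfit_model, test_x, test_y))
print("test error, simple model:     ", mse(simple_model, test_x, test_y))
```

The memorizing model's training error is essentially zero, but its test error is far worse than the simple line's, which is exactly the pattern described in the stock market example: perfect on the historical data, useless on the future.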