Finding useful information in oceans of data is an increasingly complex problem in many scientific areas. This is why researchers from Case Western Reserve University (CWRU) have created new statistical techniques to isolate useful signals buried in large datasets coming from particle physics experiments, such as the ones run in a particle collider. But their method could also be applied to a broad range of other problems, like discovering a new galaxy, monitoring transactions for fraud or identifying the carrier of a virulent disease among millions of people.
Here are some quotes from the researchers, who work in physics and statistics.
"As haystacks of information grow ever larger -- and the needles ever smaller -- the search for a signal becomes increasingly difficult to find using traditional approaches. There is a need for sophisticated new statistical methods," the researchers report.
"Methods used in high-energy particle physics problems traditionally have searched for any departure from a background model; that is, anything that is not a haystack," said Ramani Pilla, the project leader. "Our method efficiently incorporates information about the type of disorder expected, thereby enabling us to find the signal of interest more accurately."
Below are two images illustrating the problem of finding a single pattern inside a sea of data (Credit: Ramani S. Pilla and Heidi Cool, CWRU).
This image, as well as the other one below, was extracted from this article about geometric reasoning for signal discovery at Ramani Pilla's lab.
You'll also find detailed explanations of this new method on the site mentioned above. Here is a short excerpt, in the researchers' own words.
At the core of our method is the idea of posing the problem in terms of classical "hypothesis-based testing" paradigm to detect statistical disorder in the data. There are two challenges in making the method a practically useful one: defining efficient test statistics (i.e., a function of data), and determining the critical cut-off value that enables the researcher to make a decision, at a given false positive rate, to reject the null hypothesis of no signal present in the data.
Our method further exploits the flexibility behind the long-established geometric method pioneered independently by Harold Hotelling and Hermann Weyl in their 1939 seminal papers. This geometric method is extended to the current problem of detecting a signal; in particular, in creating an approximation to find the critical cut-off value. Our technique based on geometric reasoning significantly enhances the researchers' ability to distinguish a signal.
The team tested the method on high-energy particle physics and astrophysics problems, and here is what one of its members said about the results.
"Conducting experiments in a particle collider may cost tens of millions of dollars. Improving efficiency in the analysis of experimental results can lead to enormous cost savings. Furthermore, we can obtain the same results with much smaller experiments, or effectively find much smaller departures from the background model."
The research work has been published in 'Physical Review Letters' under the title "New Technique for Finding Needles in Haystacks: Geometric Approach to Distinguishing between a New Source and Random Fluctuations" (Volume 95, Article 230202, December 2, 2005). Here are two links to the abstract and to the full paper (PDF format, 4 pages, 254 KB).
Now, will this method be reserved for scientific research or will it be integrated one day into the search engines that we're all using daily? Time will tell.
Sources: Case Western Reserve University news release, via EurekAlert!, December 5, 2005; and various web sites