Big bang data: How 'citizen data scientists' will help astrophysicists look back to the dawn of time

Vast amounts of computing power are required to meet the 'barely possible' big data challenges faced by the Square Kilometre Array radio telescope.
Written by Danny Palmer, Senior Writer

The Square Kilometre Array will look back in time to the big bang.

Image: SKA Organisation/Swinburne Astronomy Productions

Designed by astronomers and engineers seeking to push the boundaries of modern technology, the Square Kilometre Array [SKA] will be the world's largest radio telescope, covering over one million square metres of collecting area.

Scheduled to begin construction in 2018, SKA is an international project which will consist of thousands of antennas spread across the world, with central cores of operation in South Africa and Western Australia. Its central computer alone will have the processing power of about one hundred million PCs.

At 50,000 times more sensitive than any other radio instrument currently in existence and powerful enough to detect very faint radio signals emitted by cosmic sources billions of light years away from Earth -- including those emitted shortly after the Big Bang, over 13 billion years ago -- SKA is set to help scientists answer fundamental questions about the universe and the laws of nature.

With the array so sensitive that it could pick up an airport radar on a planet ten light years away, the project will generate vast amounts of data, as Professor Danielle George, from the microwave and communications systems research group at the University of Manchester, explains.

"The amount of data the SKA will produce is enormous; it's estimated the dishes alone will generate ten times the global internet traffic. The SKA supercomputer will also need to perform 10^18 [one quintillion] operations per second in order to process all of the data from the telescopes," she said, speaking at Gartner's Business Intelligence and Analytics Summit in London.
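As a rough back-of-the-envelope check, the two figures quoted above (a central computer equivalent to about one hundred million PCs, and 10^18 operations per second) can be divided to see what they imply per machine. This is only an illustrative sketch: it assumes both figures describe the same system, which the article does not state, and the names used are hypothetical.

```python
# Figures quoted in the article; treating them as describing the same
# machine is an assumption made purely for illustration.
OPS_PER_SECOND = 10**18        # one quintillion operations per second
PC_EQUIVALENTS = 100_000_000   # "about one hundred million PCs"


def ops_per_pc(total_ops: int, pcs: int) -> float:
    """Return the operations per second implied for each PC-equivalent."""
    return total_ops / pcs


per_pc = ops_per_pc(OPS_PER_SECOND, PC_EQUIVALENTS)
print(f"{per_pc:,.0f} operations per second per PC-equivalent")
```

Under that assumption, each PC-equivalent would be handling ten billion operations every second.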

George, one of the leads for the UK arm of the SKA project, said the power required to process that data -- which she described as "barely possible" at the rate Moore's Law continues to evolve -- could push the boundaries of current computing capabilities.

"It'll enable us to make huge advances, not only in the field of radio astronomy, but hopefully in many other fields," she said.

The SKA project will require what George described as "huge innovations at the limit of current technology" and close collaborations with high-performance computing, advanced data networks, and more, thus relying on collaboration between industry, corporations, universities, and others.

But those are not the only groups this "huge engineering grand challenge" will rely on to crunch vast amounts of data: Professor George said she and the SKA team will require the aid of 'citizen data scientists' in order to analyse all the information generated by the array.

That's because so much data will be arriving so quickly that even the most powerful computer in the world in 2022 won't be able to meet that challenge alone.

"To keep up with the data flow from the SKA telescope, data processing must be designed and done in near real time, which requires that it must be automated," George said. An answer to this quandary could include releasing the information to the public to help analyse it, she explained.

That would mean data generated by SKA would be analysed in a similar way to the SETI@home project's search for extraterrestrial intelligence, which allows volunteers with an internet connection to aid the search by downloading the SETI@home programme and letting their personal computer analyse data when it's idle.
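The volunteer-computing idea described above can be sketched in a few lines: a large data set is cut into small work units, each unit is handed to a volunteer's machine, and the results are collected centrally. The sketch below is a toy illustration only. It does not reflect how SETI@home actually works or how SKA data would be released, and every function name and number in it is hypothetical.

```python
from typing import Dict, Iterator, List


def make_work_units(samples: List[float], unit_size: int) -> Iterator[List[float]]:
    """Cut the full data set into small chunks a home PC could handle."""
    for start in range(0, len(samples), unit_size):
        yield samples[start:start + unit_size]


def analyse_unit(unit: List[float]) -> float:
    """Stand-in for the analysis run on an idle PC: report the peak value."""
    return max(unit)


def run_volunteers(samples: List[float], unit_size: int,
                   n_volunteers: int) -> Dict[int, List[float]]:
    """Hand work units to volunteers in turn and collect what each reports."""
    results: Dict[int, List[float]] = {}
    for i, unit in enumerate(make_work_units(samples, unit_size)):
        volunteer = i % n_volunteers  # simple round-robin assignment
        results.setdefault(volunteer, []).append(analyse_unit(unit))
    return results


# Made-up signal strengths standing in for telescope data.
samples = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6]
results = run_volunteers(samples, unit_size=2, n_volunteers=3)

# The central project combines the volunteers' reports into one answer.
strongest = max(peak for peaks in results.values() for peak in peaks)
print(results, strongest)
```

A real system would also need to send units over the network, cope with volunteers who never report back, and cross-check results, none of which is attempted here.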

"We're going to have to continue doing this in order to understand and analyse all of this data," said George, describing how this type of data sharing with the public can lead to innovations and new findings.

"A lot of the astronomical discoveries we see now are made after the data is released to other groups, because these other groups are able to add value to this data by adding it to other data, new ideas and models that may not have been available to the survey astronomers originally," she said.

Therefore, in order to unleash the true potential of SKA and the supercomputing technology required to power it, George argued, "We all need to innovate and get the most out of these smart machines and this big data".

Only through physicists, technologists, engineers, and citizen data scientists working together, she explained, will the project be able to fully analyse the detectable remnants from the origin of our universe.

"Innovation is most likely to reside at the boundaries of all our disciplines and I think that's a pointer towards progress in this area; get experts together from different fields, give them challenging problems and build up a culture of sharing those outcomes with everybody," George concluded.
