The Australian Department of Treasury's project to address big data is a never-ending affair and will require constant development, according to the department's CIO Peter Alexander.
The Treasury's role is to complete economic forecasts by collecting data from a number of disparate sources, including the Australian Bureau of Statistics (ABS) and the Reserve Bank of Australia (RBA). It has been grappling with collecting large datasets, usually in Excel form, for some time, and by 2007 the shortcomings of that methodology were evident.
Alexander said that the Treasury has long been dealing with "big data in a micro sense," and earlier this year, the government department initiated the rollout of its bespoke data-analytics platform Odysseus in response to persistent data problems.
It was a two-year process, and it involved construction of a new data warehouse. Odysseus has allowed the Treasury to automatically update its databases as information is made available by external sources.
Speaking at the CeBIT Big Data conference in Sydney, Alexander described the project as a "10-year Greek tragedy," because the Treasury had been trying to find a solution to its data problems for a long time.
"It was a tragedy, because it took 10 years to get there and probably longer to get somewhere where we were mature," Alexander said.
Given that Odysseus is well on its way to a wider deployment across the department, is the Treasury's big-data Greek tragedy approaching its denouement?
"I don't think so — I don't think it's ever over," Alexander said. "If we said that [the project] is perfect and that it's done, then we would have failed.
"It's all about always learning, evolving, changing models, and developing."
While it is usually structured data, the amount of information that the Treasury has to collect is dramatically increasing. Currently, the department has a swathe of Microsoft data-extraction and analytics tools on its Odysseus platform to deal with that, but it is continually looking for more technology offerings to tackle big data.
"There are some really fantastic end-user technology coming that's quite nice," Alexander said. "We're looking at Microsoft PowerPivot and the like, but we do think, 'Jeez, we've build this fantastic data warehouse and things like that, but is it going to be redundant in time?'
"There are changes coming all the time."
The Australian Treasury is considering collecting data on social-media sentiment as a way to enrich its economic forecasts. But this will bring the added challenge of handling a torrent of unstructured data.
Alexander said that offerings such as NoSQL and Hadoop are available for dealing with unstructured data, and that the Treasury will carefully vet different options to ensure that it picks the right one for the department.
It has also created the new role of data custodian as a part of its new approach to information management. The data custodian is responsible for checking all of the new data that flows into the department. The Treasury is still piloting this position to determine whether it will make the data custodian a permanent role.