UNSW switches cloud-based data lakes for AI and ML capabilities

The cloud-based data lake has also helped democratise access to data at the university.

When the University of New South Wales (UNSW) was developing its data strategy at the end of 2018, it was clear that it needed to cut the time it took to get information into the hands of decision makers.

But to do that, the university had to set up a cloud-based data warehouse, which it opted to host in Microsoft Azure. The cloud-based warehouse now operates alongside the university's legacy data warehouse, which is currently hosted in Amazon Web Services' (AWS) EC2.

"Our legacy data warehouse has been around for 10 to 15 years. But we started looking at what platforms can let us do everything that we do now, but also allows us to move seamlessly into new things like machine learning and AI," UNSW chief data and insights officer and senior lecture at the School of Computer Science and Engineering, Kate Carruthers said, speaking to ZDNet.

"We did a proof-of-concept in AWS … and the business response to that was really positive … but we did a market survey and we realised we wanted to go with Microsoft. Part of that was because of the richness of their data landscape."

In spinning up its cloud-based data warehouse, UNSW has created two data lakes. One is focused on ingesting raw data and the second is for storing curated data.

Carruthers said organising the "new world" data warehouse around two data lakes has significantly improved the turnaround time to produce reports.

"In the old world, it would take three to four months to just get a prototype or a first cut of a new report. Here, the team were able to get recruitment reporting, and were able to get some data from them and prototype it within two days," she said.

Carruthers also said the shift has enabled the "democratisation of data safely", so that people across the university can access data from the data lake and build reports themselves, which was not previously possible.

"My team were report writers for the entire university. All we did in doing that was make people unhappy because we couldn't keep up with the demand. We had to change something, and so developing this model, we've become data engineers and data analysts are now writing their own reports," she said.

At the same time, the university is now able to incorporate external data into the data warehouse and make it available through the curated data lake.

Looking ahead, Carruthers said the university's data strategy includes plans to use the curated data to explore how machine learning and artificial intelligence can be used to build more capabilities. For instance, a machine learning proof-of-concept has been developed with Insight and Microsoft to examine how the university can identify contract cheating.

By next year, there are also plans to decommission the university's legacy data warehouse, Carruthers said.
