Machine learning might be high on the agenda for the data science team at Coles, but according to Richard Glew, Coles head of engineering and operations, they are currently limited by the existing on-premise environment.
"Even if we can do something, being able to do something quickly is another matter. We've got a lot of issues [like] where is our data, do we have the right hardware, how long does it take to get it … all the usual stuff with an on-prem environment," he said, speaking as part of the Databricks Data and AI APAC virtual conference.
To open the door to machine learning, advanced analytics, and data exchange, the company is developing an enterprise data platform (EDP) that will change the way it manages and stores data.
"Our EDP platform is designed to be a universal data repository for all the data we want to share internally or externally as an organisation, and we fully catalogue that," Glew said.
"We want it to be a real platform so teams can get access to data themselves, and we do that through a fully multi-tenanted, clean way so we don't become that bottleneck.
"We want to be really scalable and agile by improving the time to value for people who want to use data. We also want to be able to take advantage of the elastic nature of the cloud and make sure we're able to meet business demands when they arise and to scale back when we don't need things."
Coles has been deliberate about the design and development of the EDP, with Glew explaining that the company has adopted a blueprint that treats data as a "first-class asset".
"It means we don't treat it as an afterthought. We make sure that when we do an initiative or create a project, we're looking at publishing that data then and there into the platform, cataloguing it, and making it available for people," he said.
"It's also about speaking to the quality of the data. It's not about publishing rubbish, it's about being thoughtful about what we put out."
The EDP is built on a multi-tenant architecture. Coles is using Oracle GoldenGate to move data from its existing on-premise Oracle data warehouses into Microsoft Azure, and Databricks as the central processing technology to prepare, transform, and model the data as it is ingested.
Glew said that once the EDP project is complete, data will be easy to discover, able to be streamed and used in real time, and stored in one place.
"There's a lot of great things a platform like this enables us to do that we struggle with in our existing environment," he said.
Glew added that another aspect of the project has been improving data governance and compliance while making them less complex, an issue Coles has had to grapple with in the past.
"Our current environment, while secure, doesn't make it easy to do things with our data quickly," he said.
Glew said the investment in the new platform has already started to pay off: model training jobs that would typically take three days, for instance, now take three hours.
"Over time, as we get the speed of data faster and faster, the business will change fundamentally," he said.
Under the partnership, Coles is building an enterprise data platform in Azure to bolster its analytics capabilities and use artificial intelligence to drive innovation across its supply chain and physical stores.