Oracle launches Cloud Infrastructure Data Science Service

The new service aims to give data scientists a collaborative platform that guides data science and machine learning projects through the full lifecycle.
Written by Stephanie Condon, Senior Writer

Oracle on Wednesday announced the launch of the new Oracle Cloud Infrastructure Data Science Service, a native service on Oracle Cloud Infrastructure (OCI) that's designed to let teams of data scientists collaborate on the development, deployment and maintenance of machine learning models. 

As Oracle grows the footprint of its "second generation" cloud, the new service aims to leapfrog the data science offerings of other public cloud vendors and to address the problems that come with typical data science workflows.

"One of the traditional problems in data science, and I think it's still what you see typically in almost all organizations, is that [data scientists] are really working in silos, working in isolation," Greg Pavlik, Oracle's SVP for product development, Data and AI services, told ZDNet. "The focus of the service here is to really bring them together into teams, into a collaborative environment, allowing them to work together, and allowing organizations to track their work... in a way that doesn't create hurdles for the data scientists."

Providing customers with a complete data science platform is a "strategically critical" part of Oracle's cloud strategy, Pavlik said. Oracle already has one of the largest global SaaS businesses, he said, as well as significant momentum with database customers moving to the cloud.

"In both cases, leveraging that data to drive business decisions, either within the application or within the individual databases themselves is one of the first things that customers are trying to do," he said. 

Other public cloud vendors offer tools that aim to make data science more collaborative -- Google, for instance, introduced Kubeflow Pipelines and its AI Hub to maximize the impact of data science across an organization. 

Pavlik contended that Oracle's new service stands out for its focus on team execution. Chief Data Officers and CIOs, he said, "are struggling with, 'How do I get these teams to be effective? How do I have accountability with these teams, how do I make sure I understand the data scientists are building things that they can actually leverage?' And these are all part and parcel of the service here."

The new service's key capabilities fall into four components, Pavlik said. First, it offers a collaborative space for data exploration, machine learning experimentation and model training.

Next, it includes what Oracle calls an "accelerated data science toolkit." The native Python library offers several key capabilities, such as access to underlying cloud resources, as well as productivity tools like advanced visualization. It also includes Oracle's AutoML capabilities, which allow data scientists to largely automate model selection and model optimization.

The toolkit also includes model explainability capabilities, allowing users to explain which data sets and which inputs are driving a model's outputs. "It's very important just to be able to frankly explain what's driving these decisions to the line of business," Pavlik said, "but also in regulated businesses in areas where there's strict governance requirements, you have to be able to explain why decisions are being made."
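Oracle did not detail how its toolkit computes explanations. To make the idea of "what inputs are driving a model's outputs" concrete, the sketch below uses permutation importance, one common model-agnostic technique: shuffle one input column at a time and measure how much accuracy drops. It is a generic illustration, not Oracle's API or method; `toy_model`, `permutation_importance` and the synthetic data are all hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average accuracy drop when each feature column is shuffled.

    A larger drop means the model relies more heavily on that input.
    (Generic technique, shown as an assumption; not Oracle's implementation.)
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle feature j across rows, breaking its link to the labels.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row[:j]) + [value] + list(row[j + 1:])
                      for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    """Hypothetical classifier that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


# Synthetic data: two random features, labels produced by the toy model itself.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because the toy model ignores feature 1, shuffling that column leaves accuracy unchanged and its importance comes out as zero, while shuffling feature 0 drops accuracy sharply. That is the kind of answer a line-of-business owner or regulator would be asking for.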

The third component of the Data Science Service is a model catalog through which a data scientist can make models available to other users, including other data scientists, business analysts or application developers. "Anyone that's trying to use these models to drive either business decisions, or reporting or application logic, can very simply consume the model and integrate it into their own context without having to have specialized skills or knowledge around machine learning," Pavlik explained. 

Lastly, the service offers model deployment into a loosely-coupled service context. Users can monitor a model's effectiveness and update the model without disrupting the application or the consumers of the model. 

"This loosely-coupled paradigm is a kind of an emerging best practice of how to manage the full lifecycle of a machine learning model," Pavlik said.
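The article does not say how OCI implements this deployment model. The in-process sketch below only illustrates the loosely coupled idea: the application depends on a stable endpoint rather than on a specific model, so a retrained model can be swapped in while the endpoint tracks how well the current version performs. `ModelEndpoint`, its methods and the "churn-score" name are hypothetical, not part of any Oracle API; a real deployment would put the endpoint behind a network service.

```python
import threading
from typing import Callable, List

Model = Callable[[List[float]], float]


class ModelEndpoint:
    """Hypothetical stable entry point; the model behind it can be replaced."""

    def __init__(self, name: str, model: Model) -> None:
        self.name = name
        self._model = model
        self._version = 1
        self._errors: List[float] = []  # absolute errors for the current version
        self._lock = threading.Lock()

    def predict(self, features: List[float]) -> float:
        # Snapshot the current model under the lock, then run it outside
        # the lock so a slow prediction does not block updates.
        with self._lock:
            model = self._model
        return model(features)

    def record_outcome(self, predicted: float, actual: float) -> None:
        """Monitoring hook: log how far a prediction was from the real outcome."""
        with self._lock:
            self._errors.append(abs(predicted - actual))

    def mean_abs_error(self) -> float:
        with self._lock:
            if not self._errors:
                return 0.0
            return sum(self._errors) / len(self._errors)

    def update(self, new_model: Model) -> int:
        """Swap in a retrained model; callers keep using the same endpoint."""
        with self._lock:
            self._model = new_model
            self._version += 1
            self._errors = []  # start monitoring the new version from scratch
            return self._version

    @property
    def version(self) -> int:
        with self._lock:
            return self._version


def model_v1(features: List[float]) -> float:
    return 2.0 * features[0]


def model_v2(features: List[float]) -> float:
    return 2.0 * features[0] + 1.0


def application(endpoint: ModelEndpoint, x: float) -> float:
    # The consumer only knows about the endpoint, never the model version.
    return endpoint.predict([x])


endpoint = ModelEndpoint("churn-score", model_v1)
before = application(endpoint, 3.0)
endpoint.record_outcome(before, actual=7.0)
error_v1 = endpoint.mean_abs_error()
endpoint.update(model_v2)
after = application(endpoint, 3.0)
```

Here `application` does not change between the two calls. Only the model behind the endpoint does, which is the property Pavlik describes as updating a model "without disrupting the application or the consumers of the model."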
