AI startup Petuum aims to industrialize machine learning
Pittsburgh-based Petuum, backed by SoftBank, has developed novel tools for parallelizing machine learning operations across computers. The software could help break the bottlenecks IT encounters in scaling up AI across industries.
The past thirty years of machine learning breakthroughs are intimately entwined with a big idea in computing: parallel distributed processing, in which parts of a program run simultaneously on multiple processors to speed computation.
One AI researcher-turned-entrepreneur believes the field needs much more savvy about parallelism, enough to make parallelizing AI dead simple.
Eric Xing, a Carnegie Mellon professor of machine learning, founded Petuum three years ago. Based in Pittsburgh, the company has received $108 million in funding from Japanese conglomerate SoftBank, along with Advantech Capital, Chinese computing giant Tencent, Northern Light Venture Capital, and Oriza Ventures.
The company plans to ship the first version of its AI platform software next summer, an offering Xing hopes will "industrialize" machine learning, thereby making it more reliable and more broadly available.
Much of the challenge of AI is a systems engineering challenge, and at the heart of that is a problem of parallelizing the running of algorithms across all kinds of configurations of machines.
"When you deploy algorithms, you need to maintain it, you need to update it, change it," Xing told ZDNet.
"That is the very bottleneck of getting AI accessible," he says, "for companies that aren't Google or Microsoft, that don't have armies of engineers, for traditional IT teams.
"There is a shortage of talent, and there is little to no history of building AI teams within most companies."
"Other companies want Lego pieces, they want building blocks of machine learning solutions. AI needs to be industrialized, and there need to be standards - we want to be the front-runners of such a culture."
The platform, laid out extensively in a 2015 paper by Xing and colleagues published in IEEE Transactions on Big Data, automatically breaks programs apart in two different ways.
One is "data parallelism." That is, of course, a very popular approach already in AI. Training, and in some cases inference, in machine learning, is sped up by sending different pieces of data to different processors, either CPUs or, more commonly, GPUs. Each processor trains the neural network using its portion of the total data set, and the parameters of the network, the weights, are updated across all those slices of data.
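The arithmetic behind data parallelism can be sketched in a few lines. In this toy Python example (all names are hypothetical, not Petuum's API), each "worker" computes the gradient of a one-parameter model on its own shard of the data, and the averaged gradient drives a single shared update:

```python
# Illustrative sketch of data parallelism: each "worker" computes a
# gradient on its own shard of the data; the gradients are averaged
# and applied as one update to the shared weight.

def gradient(w, shard):
    # Gradient of mean squared error for the model y = w * x
    # over one shard of (x, y) pairs.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.05):
    # In a real cluster each call below would run on a separate machine;
    # here they run sequentially to show the arithmetic.
    grads = [gradient(w, shard) for shard in shards]
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

# Toy data generated from y = 3x, split across two workers.
shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # converges toward 3.0
```

In a real system the per-shard gradients would be computed on separate machines and synchronized over the network; the averaging step is where the weights get updated across all the slices of data.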
Another approach, less common and more difficult to engineer, is to split the network itself into pieces across processors, known as "model parallelism." These problems of parallelism have been a focus of computer science for decades. For machine learning programs written in Google's TensorFlow, or the popular Caffe framework, Petuum's software can automatically achieve either data or model parallelism, or a combination of both.
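Model parallelism, by contrast, splits the parameters themselves. A minimal sketch (again with made-up names, not Petuum's implementation) partitions one linear layer's weights row-wise across two "devices," each computing its own slice of the output:

```python
# Illustrative sketch of model parallelism: the weights of one linear
# layer are split across two "devices", each computing its slice of
# the output vector.

def matvec(rows, x):
    # Multiply a list of weight rows by an input vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

# A 4-output linear layer, partitioned so each device owns 2 rows.
device0_weights = [[1, 0], [0, 1]]
device1_weights = [[2, 2], [1, -1]]

def model_parallel_forward(x):
    # Each call below would run on its own device in a real system.
    part0 = matvec(device0_weights, x)
    part1 = matvec(device1_weights, x)
    return part0 + part1  # concatenate the output slices

print(model_parallel_forward([3, 5]))  # [3, 5, 16, -2]
```

The engineering difficulty the article alludes to lives in the seams: in a real network, later layers need all the output slices, so the devices must communicate at every partition boundary.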
The key insight of that work is that machine learning, unlike other kinds of software, is not "deterministic" but probabilistic. As such, it has three advantages in terms of parallelism: it can tolerate error in individual parts of the program's function to a greater degree; the dependencies between parts of the program are dynamic, changing as the program runs; and different parts "converge" on a solution to the given problem at different rates.
Petuum's software employs several tricks to exploit those strengths. For example, a "parameter server" runs a scheduling protocol that chooses which parameters of the neural network to update in parallel, based on which parameters are only "weakly" correlated with one another, and can therefore be updated independently.
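One way to picture that scheduling idea, as a rough sketch rather than Petuum's actual protocol: given an estimate of how correlated pairs of parameters are, greedily batch parameters so that no two strongly correlated ones land in the same parallel round. The correlation matrix and threshold below are invented for illustration:

```python
# Hypothetical sketch of dependency-aware scheduling: parameters are
# greedily grouped into batches such that every pair within a batch
# is only weakly correlated, so the batch can be updated in parallel.

def schedule(n_params, correlation, threshold=0.5):
    batches = []
    for p in range(n_params):
        for batch in batches:
            # Join a batch only if weakly correlated with all its members.
            if all(correlation[p][q] < threshold for q in batch):
                batch.append(p)
                break
        else:
            batches.append([p])  # no compatible batch; start a new one
    return batches

# Parameters 0 and 1 are strongly coupled; the rest are nearly independent.
corr = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]
print(schedule(4, corr))  # [[0, 2, 3], [1]]
```

Here parameters 0, 2, and 3 can be updated together in one parallel round, while parameter 1, tightly coupled to parameter 0, must wait for its own round.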
The results are a little reminiscent of the MapReduce big data framework, but Petuum argues its system has numerous advantages over MapReduce and other parallelizing infrastructure, such as Spark and GraphLab.
Xing had the epiphany that started the company while taking a sabbatical from Carnegie Mellon at Facebook in 2010.
"I was embarrassed at my own inability to deliver my models rapidly," he recalls. "I went back to CMU, and we started a research project on how to take a piece of existing machine learning code, and automatically make a parallel version for the data center."
Petuum is still developing how it will monetize the platform. Xing says it could include a licensing model that charges by the number of machines or users a client has working on a given AI system. But, in the meantime, Petuum is in the process of shipping some packaged software for vertical industries. The idea is to prove that "we are able to address non-trivial AI problems," he says. But it is also the beginning of what Xing hopes will be a marketplace of vertical solutions that can come from numerous parties - Lego bricks for industries.
One industry with early customers is healthcare. Hospitals are especially interesting to Xing because they likely lack a dedicated AI team, and even where one exists, the IT staff may be challenged by the need to deploy AI models on a range of hardware, from single laptops up to cloud infrastructure running numerous application containers.
"Where they have an IT team, they may sit in front of a UI and update the algorithms, but running on Petuum, they don't need to worry about how the data is distributed or run on different machines."
"This is not about classification," says Xing. "It is about summarizing knowledge into a one-pager, with a deeper understanding of medical information."
"You can increase diagnostic outcomes, you can speed up a doctor's work."
One outcome is the company's partnership, announced in September, with the Cleveland Clinic, to produce an Artificial Intelligence Diagnosis Engine (AIDE) that can "apply advanced machine learning algorithms to medical record data." The partnership is competing for the IBM Watson AI XPRIZE.