How the GPU became the heart of AI and machine learning

The GPU has evolved from just a graphics chip into a core component of deep learning and machine learning, says Paperspace CEO Dillon Erb.
Written by Colin Barker, Contributor

Paperspace offers products ranging from virtual desktops to high-end workstations for use across a host of areas, from animation studios to the infrastructure for machine learning and data science. ZDNet spoke to Paperspace CEO Dillon Erb to find out more.

ZDNet: What's your company's background and where do you position yourself in the market?

Erb: We've been around for about three years and I developed the product with my friend, Daniel Kobran.

The company kind of grew out of a shared interest in GPU compute. Now GPUs have had a relatively long history, starting from the video game world to today where they have become a general compute device.

We started off by thinking about how cloud computing was going to be transformed in a big way thanks to these parallel compute architectures, and how, as that happens, new kinds of workloads would emerge as well.

We started out two years ago and we went through a programme called Y Combinator [a model for funding start-ups] out on the West Coast and then put up a headquarters in New York City.

From there we built up a team -- which is now 14 people -- and started our first foreign venture with an office in Amsterdam.

We are engineers for the most part and what we are doing is building a platform which is mostly for making parallel compute more accessible.

The killer application, the big story over the last couple of years, has been the evolution of the GPU from purely the graphics pipeline to really being one of the three core components of deep learning and machine learning.

What that looks like today is that we are sort of in a golden age of amazing technology around deep learning and machine learning -- new, amazing frameworks that are coming out all the time, the big ones being TensorFlow and Torch and a handful of others.

And then you have an amazing amount of computational power that's coming out of these GPUs. Combine that with the unprecedented amounts of unstructured data around today and you really have something.

If you think about these things all coming together, we are in an age where deep learning and machine learning are incredibly powerful. But, that said, it is still, even today with all of the accessibility, hard for even very motivated people to actually get started with it.

What that means is that most of the cutting-edge research comes out of the big guys, the Googles, the Facebooks. But finding an easier way for core developers to come into this world is still extremely difficult.

So what we offer is a basic platform that lets you go from zero to building out a real machine learning model very, very quickly.

Now if you are a web developer today you have access to a myriad of tools, a very rich ecosystem. But if you want to do more modern AI, or machine learning, the tooling isn't there yet. A lot of that revolves around certain infrastructures and getting the GPUs to work right, along with combining all of the software and really new workflows that are emerging.

That's where we spend most of our time -- thinking and building out tools and so forth. The value here is that the technology is amazing and if we can open it up to more people I think that that will have a pretty powerful effect.

Thinking about the power that is available, can you think of any particular areas where you can use this power?

One of the interesting things about deep learning in particular, or modern machine learning, is that it's actually a really generalisable technique.

What that means is that if you have any access to, say, structured data -- e.g. x-ray scans, or you're collecting image data -- a deep-learning model can actually give you a pretty magical amount of predictive power.

Our take on this is that we think that this is a fundamentally disruptive technology that will really impact every industry. The reality today is that where you have the biggest wins using deep learning is in computer vision. Things like image detection or categorization -- things that have large amounts of image data. Those are the areas that work well today.

But really it is used for everything from anomaly detection to security systems. We work with a company doing industrial robotics using a technique called reinforcement learning -- basically teaching robots to learn more quickly -- and another doing fabrication.

One of the things that strikes me about AI is that while there is an awful lot of compute power around, people don't really know how to use it. What do you think?

Yes, but I think there are two vectors here. One is a kind of pure technology enablement. At one level, a lot more companies have the ability to do different types of machine learning but are not able to do it purely because they don't have the tools yet or the expertise.

I think that the other kind of vector is that you are definitely seeing a consolidation but I think it's problematic in a lot of ways. Companies like Google have access to almost limitless amounts of information but end up making predictions that are almost scary at some level. I don't want to get too worried about it but I think that we are seeing a trend where having access to a lot of data and expertise -- thanks to machine learning experts -- can really give you a competitive advantage. That's the way the big companies are doing this. Google has re-orientated itself entirely around AI.

But I think that this is a technology that by its own nature is not reserved for large companies. There is a big project which has to happen, opening it up to everybody and making it accessible.

How did you get into this area - what steered you towards it?


I actually came from the building and architecture world, but even within that world I was kind of in the technology area.

Three and a half years ago I was working for a structural architecture team that was doing structural simulation. And so, my job in this group was to build out structural simulation software for large-scale structures.

So I started to research other people's ways of getting better performance out of the system and that led me to GPUs -- and in particular the largest player, Nvidia.

Now they have a framework called CUDA and about four years ago, the majority of CUDA was used for HPC -- those structural simulations that I was working on. So then I was in analysis, which was another piece of it, and then I was in financial applications. And so then, after a year or so, I had shifted almost entirely over into deep learning.

Can you give me a practical example of its uses?

There are a few things that are interesting and I break them into two camps.

One of them is the kind of "moonshot" project. Things like driverless cars and other things like cancer detection. We're working with a handful of companies who are working in those spaces.

One company is taking image data from cars that are driving around a city and then pulling that into a system that can actually predict all sorts of interesting things like construction and real-estate value, and things like that.

Cancer detection is a big one. If you have access to a lot of images of malignant tumors and benign tumors you can start training models that can actually predict, very accurately, and show when there is something that needs investigation.

And then there are mundane but still interesting things like the robotic app I mentioned and there is a system that is basically teaching the robots to work better.

What are you currently focusing on?

Doing machine learning is a many-faceted operation and what you have to do is start with the data that you ingest. Then you train these models and decide whether you are going to put them into an iPhone app or a physician's office or whatever.

What you then end up looking at is a pipeline, and a pipeline runs everything from the data coming in to the prediction going out.

And the building up of these pipelines is probably one of the more complicated things about machine learning. It's one thing to do a one-off project but it is a much more complex issue to build a pipeline.
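To make the idea of a pipeline concrete, here is a minimal sketch in Python of the three stages Erb describes: data coming in, a model being trained, and a prediction going out. It is only an illustration under stated assumptions. The function names, the made-up sample rows and the toy nearest-centroid model are all hypothetical and are not taken from Paperspace's products.

```python
from statistics import mean


def ingest(raw_rows):
    """Parse raw 'value,label' strings into (float, str) pairs, skipping bad rows."""
    samples = []
    for row in raw_rows:
        try:
            value, label = row.split(",")
            samples.append((float(value), label.strip()))
        except ValueError:
            # Malformed row: wrong number of fields or a non-numeric value.
            continue
    return samples


def train(samples):
    """Fit a toy nearest-centroid model: one mean value per label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(model, value):
    """Return the label whose centroid is closest to the input value."""
    return min(model, key=lambda label: abs(model[label] - value))


def run_pipeline(raw_rows, new_values):
    """Chain the stages: data in, model trained, predictions out."""
    samples = ingest(raw_rows)
    model = train(samples)
    return [predict(model, v) for v in new_values]


# Made-up illustrative data, including one malformed row.
raw = ["1.0,low", "1.4,low", "oops", "4.8,high", "5.2,high"]
predictions = run_pipeline(raw, [1.1, 5.0])
print(predictions)
```

A real pipeline would swap each stage for something far heavier (a data store, a GPU training job, a serving endpoint), but the shape stays the same, which is why a one-off project is so much simpler than keeping every stage wired together.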

Where we focus today is on building these pipelines. There is a technique called continuous integration (CI), or continuous deployment (CD), which is pretty common in the web-development world.

What we hear is that every company, at some point, will be doing either CI or CD for machine learning and that's the type of platform that is really missing in the development world, and so that is where we are working a lot of the time.

The aim is to have a real prediction engine where the data is coming in and your model is predicting and giving you better predictions.
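One way to picture CI/CD applied to machine learning is an automated gate that scores every newly trained model against the one in production and promotes it only if it does better. The Python sketch below is a hedged illustration of that idea, not a description of Paperspace's platform. The names `accuracy` and `should_deploy`, the `min_gain` parameter, the two toy models and the holdout data are all assumptions made for the example.

```python
def accuracy(model, examples):
    """Fraction of (input, expected_label) pairs the model gets right."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)


def should_deploy(candidate, current, holdout, min_gain=0.0):
    """Gate a release: promote only if the candidate beats the current model.

    Returns (decision, candidate_accuracy, current_accuracy).
    """
    cand_acc = accuracy(candidate, holdout)
    curr_acc = accuracy(current, holdout)
    return cand_acc > curr_acc + min_gain, cand_acc, curr_acc


def current_model(x):
    """Hypothetical model already in production."""
    return int(x > 0.7)


def candidate_model(x):
    """Hypothetical newly trained model."""
    return int(x > 0.5)


# Made-up held-out examples: (input value, expected label).
holdout = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]

deploy, cand_acc, curr_acc = should_deploy(candidate_model, current_model, holdout)
print(deploy, cand_acc, curr_acc)
```

In a web project the equivalent gate is a test suite that must pass before a release. Here the "test" is a measured score on held-out data, so the gate also needs a data set that stays fixed between runs.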

The hardware is all GPU, and presumably more and more powerful GPUs?

You can use the same GPU for video games as you could use for training deep learning models. What's happened over the last year or so is that Nvidia came out with their first GPU that was designed for machine learning, their Volta architecture. And Google came out with an accelerator called the TPU which was a custom chip built entirely for machine learning.

As of today, 95 percent of our customers and users are using Nvidia, but that is changing. I think that the software ecosystem will become richer and the hardware ecosystem will become more heterogeneous. As that happens, you need a software layer to really bridge that gap and make it easier for developers to leverage all kinds of new technologies without digging into the underlying implementations.

Can you tell me about Gradient?

The company is Paperspace and our tool stack is called Gradient. This is a tool stack for developers that helps them, at a relatively high level, plug into a GPU infrastructure.

There are a few tools that enable that. One is a notebook integration. A Jupyter notebook is kind of the de facto standard for a machine-learning developer to start writing code. And then the interactive notebook works around those and we have pretty powerful versions of both of those.

That's with the Javelin architecture where effectively a developer can say, "Here's a little bit of code", and they send it to us using either a command line utility or we have a GUI to do it as well. And they send it to us and say, "Here's the code I want to run and the framework I want to use", and then we do all the plumbing and make sure that we can place that job on an accelerator and then return the results.
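The flow described here (submit some code plus a framework, have it placed on an accelerator, get the results back) can be sketched in a few lines of Python. This is a toy model of the idea only. The `Job` and `Accelerator` classes, the `place` and `submit` functions and the device names are hypothetical, and nothing below reflects Gradient's actual API or command line.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    """Hypothetical job: the code to run and the framework it needs."""
    name: str
    framework: str
    entrypoint: Callable[[], object]


@dataclass
class Accelerator:
    """Hypothetical device with the frameworks installed on it."""
    name: str
    frameworks: frozenset
    busy: bool = False


def place(job: Job, accelerators: List[Accelerator]) -> Optional[Accelerator]:
    """Pick the first idle accelerator that supports the job's framework."""
    for acc in accelerators:
        if not acc.busy and job.framework in acc.frameworks:
            return acc
    return None


def submit(job: Job, accelerators: List[Accelerator]) -> dict:
    """Do the 'plumbing': place the job, run it, free the device, return results."""
    acc = place(job, accelerators)
    if acc is None:
        raise RuntimeError(f"no free accelerator supports {job.framework!r}")
    acc.busy = True
    try:
        result = job.entrypoint()
    finally:
        acc.busy = False  # release the device even if the job fails
    return {"job": job.name, "accelerator": acc.name, "result": result}


pool = [
    Accelerator("gpu-0", frozenset({"torch"})),
    Accelerator("gpu-1", frozenset({"tensorflow", "torch"})),
]
job = Job("demo", "tensorflow", lambda: sum(range(10)))
outcome = submit(job, pool)
print(outcome)
```

The point of the sketch is the division of labour: the developer only states what to run and which framework it needs, while the placement logic stays hidden behind `submit`.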

I think that we bring all this together. But I think there is a criticism of deep learning which is that it is somewhat of a black box where you put data in, get out a prediction and you hope that it's right. I think a big part of what we want to do is to build usability into deep learning -- into what might be an inscrutable system.

