Intel advances oneAPI as the all-important 'next click down'

From data scientists working in Python to those toiling in assembly language, there's a need for an open, high-performing alternative to Nvidia's CUDA, says Intel's Joe Curley.
Written by Tiernan Ray, Senior Contributing Writer

The rise of "generative" artificial intelligence is all about scaling, the idea of adding more resources to a computer program to get better results. As OpenAI co-founder and chief scientist Ilya Sutskever has remarked, "I had a very strong belief that bigger is better" when he founded the company that would create ChatGPT.

That idea of bigger and bigger compute has led to a race to develop the most powerful chips for AI, including not only new GPUs from Nvidia, but also chips from Intel's Habana Labs, which has shown impressive results in benchmark tests; from Advanced Micro Devices; and from startups such as Cerebras Systems.

Also: Can generative AI solve computer science's greatest unsolved problem?

That rush to develop chips has created a very practical problem: How are developers supposed to write software for an expanding universe of chips, each with unique capabilities and a unique programming environment?

"We've got GPUs, we've got TPUs, we've got FPGAs -- all these things hit different, wonderful design points in the marketplace, but if you think about the developer experience side of it, it's, like, Oh, how do you program?" says Joe Curley, vice president and general manager of software products and ecosystem at Intel in an interview with ZDNET.

Chips always depend on the developer tools available to use them; no matter how great a chip is, it's a pile of silicon if there's no software with which to program it.

Intel's answer is oneAPI, an open-source programming specification, the source code of which is posted on GitHub, and which is meant to enable developers to achieve parallel processing across numerous kinds of chips without knowing all the details of every chip.

Also: Extending ChatGPT: Can AI chatbot plugins really change the game?

Last week was a big week for oneAPI, as it was announced that the specification is being reformulated as the Unified Acceleration Foundation, or UXL, which is hosted by the Linux Foundation's Joint Development Foundation. UXL counts as founding "steering members" other giants of the chip world: Arm Holdings, Qualcomm, Fujitsu, Samsung, and Google.

The Steering Committee lead for UXL, Rod Burns, called last week's unveiling of UXL a "pivotal moment for heterogeneous computing."


Intel vice president Joe Curley


"All oneAPI is, is a software programming model that attempts to create a common abstraction layer that allows you to program different brands of accelerator through common languages and common library interfaces," said Curley.

Intel has been shipping its own implementation of the open-source spec since December of 2020. Components of oneAPI include a cross-platform parallelizing language, called DPC++, which is an adaptation of an open-source programming language called SYCL, built on the C++ programming language, and managed by Khronos Group. 

Also: How does ChatGPT actually work?

In June of last year, Intel acquired startup Codeplay of Edinburgh, Scotland, a supplier of parallel compilers. That deal brought Intel expertise in cross-device compiling for SYCL. 

oneAPI also has a selection of libraries for different functions, such as oneDNN, for speeding up matrix multiply primitives. 
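oneDNN's real interface is a C++ API; purely as an illustration, here is a naive Python reference for the matrix-multiply primitive that libraries like oneDNN replace with hardware-tuned kernels (the function name and layout here are illustrative, not oneDNN's API):

```python
def matmul(a, b):
    """Naive reference matrix multiply: the primitive that accelerator
    libraries such as oneDNN implement with hardware-tuned kernels."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

Even this tiny loop nest shows why the primitive is worth centralizing in a library: the triple loop is exactly the kind of code an accelerator can run orders of magnitude faster than interpreted Python.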

More details on the specification for oneAPI can be seen on the technology's specs page.

The compiler technology and the libraries provide for different approaches to AI programming. One is to come from a data scientist's standpoint and simply work downward from popular AI frameworks such as PyTorch and TensorFlow, and use libraries to parallelize that code. 
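As a sketch of that framework-down approach — the registry and function names below are hypothetical, not PyTorch's or TensorFlow's actual plumbing — the key idea is that a user-facing call dispatches to whichever accelerated backend is registered underneath, so the data scientist's code never changes:

```python
# Hypothetical sketch of framework-level dispatch: the user calls one API,
# and the library routes the work to an available accelerated backend.

def _cpu_scale(xs, k):
    # Plain reference kernel; a real stack would register oneDNN- or
    # CUDA-backed kernels here instead.
    return [x * k for x in xs]

BACKENDS = {"cpu": _cpu_scale}

def scale(xs, k, device="cpu"):
    # The user-facing API hides the device-specific kernel entirely.
    return BACKENDS[device](xs, k)

print(scale([1, 2, 3], 10))  # [10, 20, 30]
```

The point of the sketch is only the shape of the abstraction: parallelization happens "under the hood," in the registered kernel, not in the user's script.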

Also: Nvidia sweeps AI benchmarks, but Intel brings meaningful competition

The point of oneAPI, said Curley, is to "target" a part of the continuum of parallel programming that has never been standardized.

Many times in parallel computing, said Curley, "you start with completely abstracted languages, Python, going through some library infrastructure that gets you access to acceleration," and everything is done under the hood for the data scientist. At the other end of the spectrum, the programmer in C++ or Fortran gives explicit hints to a compiler to parallelize the code for a GPU.

"Where oneAPI comes in, is, kind of the next click down, which is, I know I'm writing to an accelerator, I know I'm going to be using an accelerator, so I want to target that device, and I want to write optimum code, and I actually care about performance," said Curley. 

"That's really where you fall into the idea of something like oneAPI, and the trick is that, in that part of the continuum, there's never been a successful standardization effort," he said. There is Nvidia's CUDA, which dominates the building of frameworks for AI, and which is not open.

Also: Can AI code? In baby steps only

"CUDA is a language owned by a company, and they've got their interests, and I'm sure they do a fine job with it, but their interest is not about building a diverse community," observed Curley.

"OpenCL took a crack at it, had some successes, [but] a lot of developers found OpenCL to be cumbersome: a lot of header files, it wasn't terribly productive."

The approach of OpenCL, also maintained by Khronos Group, was divided, observed Curley. "You had a host source, and an accelerator source, and you had to think differently" about the two devices. 

"What SYCL does, is, it creates a much more intuitive single-source language that goes, hey, I'm writing code, I want you to run this C++ code on that device in this way," explained Curley, "instead of saying, I'm gonna have a host code and a device code and manage these two, and then link them at the bottom."
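SYCL itself is C++, but the single-source shape Curley describes — host logic and the kernel living in one file, with the kernel handed to a queue for a device to run — can be mimicked in a toy Python sketch (this Queue class is a hypothetical stand-in, not the SYCL API):

```python
from concurrent.futures import ThreadPoolExecutor

class Queue:
    """Toy stand-in for a SYCL-style device queue (illustrative only)."""
    def __init__(self):
        self._pool = ThreadPoolExecutor()  # pretend this is an accelerator

    def parallel_for(self, n, kernel):
        # Host code and "device" kernel share one source file; the kernel
        # is just a callable handed to the queue, echoing SYCL's
        # single-source model rather than OpenCL's split host/device code.
        list(self._pool.map(kernel, range(n)))

q = Queue()
result = [0] * 8
q.parallel_for(8, lambda i: result.__setitem__(i, i * i))
print(result)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

In real SYCL the kernel is a C++ lambda submitted to a `sycl::queue`, and the compiler extracts and compiles it for the target device; the sketch only conveys the one-file programming model.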

Also: How to use ChatGPT to write code

"So what it does, is, provide an open and productive answer to that part of the continuum, you know, before you get to the guy or gal that's out there writing in assembly and intrinsics and things like that."

Curley described SYCL as coming from "a more modern start" than previous parallel computing efforts, by "having learned a great deal from what people have learned from programming accelerators for the previous 13 years." 

The goals of oneAPI are consistent with what Wei Li, Intel's vice president and general manager of AI and analytics, told ZDNET last year: the company aims to use "software as a bridge to get to AI everywhere" by overcoming the technical hurdles.

The acquisition of Codeplay brought technology for a more push-button approach to cross-platform compiling. "On top of our compilation stack, they've provided a plugin that just goes right into our tool chain to allow you to generate code for an AMD or an Nvidia GPU," said Curley. 

Also: A new role emerges for software leaders: Overseeing generative AI

"So if you're starting from scratch today, we've got an environment where you don't have to write for one person and then move, you can write in the SYCL language and then port to whatever device you like."

That push-button aspect is not just about easing the programming burden, said Curley; it's also about the portability of code, which has its own appeal.

"That's really important for one class of customer in industrial systems, people that are putting this into aircraft or something that may have to live for 20 or 30 years where you don't want to be tied to a language or a single company for just practical maintenance reasons," observed Curley.

How will oneAPI fare against the decade-plus lead that CUDA has had in the market, and the tremendous installed base of Nvidia GPUs?

Also: AI is great at coding, but there are some massive caveats

"If you have the choice to build on an open tool chain, and the choice to build on a closed tool chain, generally speaking, open wins," is one way to look at it, said Curley. 

At the same time, these things take time. 

"It's new," he said of oneAPI, "and we're really at the beginning of our journey." In the case of "that other language," as he refers to CUDA, "for the first 8 or 9 years, it was very much a niche language, and, even today, it's still really only for professional practitioners in a certain space."

"The idea here is that, if we can make something that's a little more pervasive, a little more open, a little more C++-like, then, honestly, it creates a great alternative."
