Fourteen years ago, Raja Koduri helped usher in the age of GPU computing, a year before Nvidia chief Jensen Huang talked about the phenomenon. The idea was that GPUs can do more than smooth video game graphics; they can also crunch scientific problems.
The vision is today a reality, as GPUs are the workhorse of artificial intelligence and, increasingly, high-performance computing of all sorts.
Koduri, then working at Advanced Micro Devices, now lead architect at Intel, sat down with ZDNet to talk about how GPU computing is a revolution that's still unfolding.
"We are at an inflection point for high-performance computing," said Koduri. "Twenty years ago, HPC was dominated by vertically-integrated architectures," but then, "commodity X86 took over, with open source software, and an explosion of libraries, and all clusters, not just HPC, became x86 with a plethora of software."
That software movement, he said, is leading to a new era of compute that can be programmed to be as powerful as supercomputers.
"We see the next cycle with the combination of what's happening in the AI world and heterogeneous architectures driving the next non-linearity."
The occasion for Koduri's remarks was the announcement on Sunday by Intel of a new graphics processing unit designed especially for high-performance computing, code-named Ponte Vecchio. Though still a year or more from production, it is a sign of the times, a machine optimized for deep learning.
Perhaps more important than the new hardware is the software: Intel also said Sunday that it will make available the beta release of its toolkit for programming high-performance systems, called OneAPI, which is meant to simplify the programming of supercomputing-like tasks across many kinds of processors and systems.
Both announcements were made at the 31st annual International Conference for High-Performance Computing, Networking, Storage and Analysis, taking place this week in Denver, Colorado.
When it was pointed out that OneAPI is in some sense a continuation of work Koduri has done for many years, he concurred, adding, "Nothing teaches you better than failure." AMD, he said, had the first GPU hardware for general-purpose compute -- what now gets called a "GPGPU" -- two years before Nvidia did, but "we didn't start with software." That gave Nvidia's "CUDA" programming toolkit the opening it needed to sweep the industry.
At Intel, by contrast, "we took a completely different approach, we said, 'Let's start with software first.'"
"We have had 1,000-plus engineers working the past 18 months on a massive effort," is how he described OneAPI.
The challenge for Intel, of course, is that unlike Nvidia, it doesn't sell one basic product. It now sells a plethora of different processors, including Xeon, the Ponte Vecchio and future iterations of the GPU, the Mobileye chips, the Movidius chips, the Agilex family of FPGAs, and the Nervana neural network processors, to name just the most obvious items.
When Koduri was at AMD, he used to say that Intel had a "buffet" lunch offering when all people wanted was a burger and a milkshake, which is the simple choice Nvidia provided.
Koduri said the same challenge remains now: smoothing things for Intel customers. "Our customers don't want to deal with heterogeneity," said Koduri. "That's why OneAPI is meant to work at every layer of abstraction." The CPU is the only architecture of Intel's that has historically scaled to a massive general-purpose platform; OneAPI is intended to be a "bridge," he said, to make the heterogeneous future of processing scale in the same way.
Asked if he has confidence that customers will effectively make use of OneAPI and all the chips it abstracts, Koduri replied, "That's a great question."
"We will offer tools that analyze things and tell you, even before you port code over, whether a certain portion of code will run super-efficient on GPUs, for example — tools that make life easy for people to deploy hardware and get an idea if it will benefit their data center."
As for the Ponte Vecchio GPU, it will be made in Intel's 7-nanometer process technology, and it is "still a year or more away" from production, said Koduri.
Ponte Vecchio will be part of Aurora, a half-billion-dollar supercomputer expected to be installed at Argonne National Laboratory in Illinois, which is being built in conjunction with Cray and other vendors.
Koduri declined to go into details about how the GPU differs from other GPUs. All he would say is that "there are several modes of operation in that architecture that makes it much much more flexible than current GPUs with existing architectures." Said Koduri, one can "map many more workloads onto it," adding, "we have a new way to do the vector processing on the architecture."
"There are some details that we aren't disclosing at this point in time," he continued. "The short answer is that its transistors are much more optimized" for HPC. When asked if that would include doing away with some traditional aspects of GPUs, such as memory hierarchies and shaders, Koduri noted that Intel prioritized supporting existing software, to maintain the valuable installed base of GPU programs, but that it also had made choices to do away with some things not as necessary for HPC.