The United States possesses thousands of nuclear warheads in the Department of Defense nuclear weapons stockpile. The size of the stockpile has declined dramatically over the past half century, but maintaining the existing stockpile -- comprising warheads mostly produced in the 1950s and 1960s -- is a complicated job.
"With our stockpile, it's not getting any younger," Jim Lujan, the HPC platforms program director at Los Alamos National Lab (LANL), tells ZDNET. As the warheads get older, he says, LANL is responsible for assessing how aging may affect their safety or performance.
Of course, you can't exactly test nuclear warheads -- at least, not under the Comprehensive Nuclear Test Ban Treaty of 1996. To fulfill its mission, the Los Alamos lab uses modeling and 3D simulations. With the most cutting-edge high-performance computing tools, the lab and its partners can produce high-fidelity physics simulations and validate them against real and historical phenomena.
The government has been using advanced simulations and computing to accomplish this since the 1990s. The challenge, however, has been that "these problems become bigger and bigger," Lujan says, "and they take more time... Some of these physics simulations that we do, to go from beginning to end, can take upwards of six to eight months. If you're looking at a problem, and you're not going to have an answer for six to eight months, it makes it a little difficult to say, 'OK, oops, I didn't quite get it right here. I need to go adjust it.'"
Why are these problems getting bigger and taking longer? Part of the challenge is that computing capabilities have simply gotten really good -- to the point that CPUs can perform arithmetic faster than data can be moved in and out of them. Typically, computing systems rely on DDR memory, which sits entirely off-chip, to access those datasets, and that creates a bottleneck.
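To make the bottleneck concrete, here is a minimal sketch of the roofline model, a common way of reasoning about whether a computation is limited by arithmetic throughput or by memory bandwidth. The peak-compute and bandwidth figures below are hypothetical round numbers chosen for illustration, not specifications of any Intel processor or DDR configuration, and `attainable_gflops` and `ridge_point` are illustrative helper names.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic.

    intensity is arithmetic intensity in FLOPs per byte moved to/from memory.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOPs/byte) above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


# STREAM-triad-style update a[i] = b[i] + s * c[i] on 8-byte doubles:
# 2 FLOPs per 24 bytes moved (two loads, one store; write-allocate traffic ignored).
TRIAD_INTENSITY = 2 / 24

PEAK_GFLOPS = 3000.0  # hypothetical peak compute, for illustration only

for label, bandwidth in (("off-chip DDR (hypothetical)", 300.0),
                         ("in-package HBM (hypothetical)", 1000.0)):
    perf = attainable_gflops(PEAK_GFLOPS, bandwidth, TRIAD_INTENSITY)
    print(f"{label}: {perf:.1f} of {PEAK_GFLOPS:.0f} GFLOP/s")
```

With these assumed numbers, the triad-style kernel reaches only a small fraction of peak on either memory system, because its intensity sits far below the ridge point. Raising bandwidth raises its ceiling proportionally, which is the basic motivation for putting memory closer to the cores.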
High-fidelity simulations, such as the ones used to assess the state of the nuclear stockpile, use massive datasets. But trying to use a powerful CPU to run workloads that leverage massive datasets is a bit like using a sports car to run your errands.
"It's sort of like saying you have a car that can go zero to 100 in two seconds, but if it can't hold all the groceries, how effective is that car, right?" Lujan says. "You may have a great race engine, but if you can't deliver that speed effectively to a broad set of applications, it makes it challenging."
To address that problem, LANL is in the early stages of leveraging Intel's new Xeon CPU Max Series (code-named Sapphire Rapids HBM) -- the first x86-based processors with high-bandwidth memory (HBM) on the chip.
Intel this week is rolling out five SKUs of the chip, with core counts ranging from 32 to 56. With 64 GB of in-package high-bandwidth memory, the Xeon Max CPUs will provide enough memory capacity to fit most common HPC workloads without relying on DDR memory.
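As rough arithmetic on these figures, the sketch below splits the 64 GB of HBM evenly across the smallest and largest core counts and checks whether a given working set fits without spilling to DDR. The helper names, the even per-core split, and the example working-set sizes are assumptions for illustration; the capacity actually available to an application is likely to be somewhat lower once the operating system and runtime take their share.

```python
HBM_CAPACITY_GB = 64.0  # in-package HBM per Xeon Max CPU, as stated above


def hbm_per_core_gb(cores: int, hbm_gb: float = HBM_CAPACITY_GB) -> float:
    """Even share of in-package HBM per core (hypothetical even split)."""
    return hbm_gb / cores


def fits_in_hbm(working_set_gb: float, hbm_gb: float = HBM_CAPACITY_GB) -> bool:
    """True if the working set fits in HBM; ignores OS and runtime overhead."""
    return working_set_gb <= hbm_gb


for cores in (32, 56):  # smallest and largest core counts in the lineup
    print(f"{cores} cores: {hbm_per_core_gb(cores):.2f} GB of HBM per core")

for working_set in (48.0, 80.0):  # hypothetical working-set sizes in GB
    print(f"{working_set:.0f} GB working set fits in HBM: {fits_in_hbm(working_set)}")
```

At 32 cores each core's share is 2 GB, and at 56 cores it drops to about 1.14 GB, so a job that runs one process per core needs to stay within roughly that footprint per process to avoid touching DDR.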
Besides simulating the physics of nuclear warheads, the Max CPUs are well suited for a wide range of other HPC workloads that rely on huge datasets, such as drug discovery or genomics in the life sciences, or climate modeling. Meanwhile, a growing number of AI models, like ChatGPT, are beginning to leverage massive datasets.
"We're eager to have this increased memory bandwidth close to the processor because it is going to make a big difference," Lujan says. "We're not just chasing the speed. We're trying to get efficacy and problem resolution."
So far, Lujan says, LANL has seen roughly a 4x to 5x performance improvement with applications running on the Max CPU -- without having to make any modifications to the applications.
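Purely as arithmetic, and assuming hypothetically that a speedup in this range carried over uniformly to the six-to-eight-month simulations Lujan describes, the turnaround would shrink to roughly one to two months. The `scaled_runtime` helper is illustrative and simply divides runtime by speedup.

```python
def scaled_runtime(months: float, speedup: float) -> float:
    """Runtime after a uniform speedup (hypothetical: assumes the whole run speeds up)."""
    return months / speedup


# Six-to-eight-month runs from earlier, scaled by the reported 4x to 5x range.
for months in (6, 8):
    for speedup in (4, 5):
        print(f"{months} months at {speedup}x -> {scaled_runtime(months, speedup):.1f} months")
```

In practice only part of a long simulation may benefit from higher memory bandwidth, so a real reduction could be smaller than this uniform estimate.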
A big selling point of Intel's Max portfolio is the ability to leverage oneAPI -- a common, open, standards-based programming model.
"Developers can leverage all the codes that they have on Xeon today and bring them onto the Xeon Max with really no code changes," Intel VP Jeff McVeigh tells ZDNET.
To put oneAPI to the test, LANL took an existing application binary and ran it on the Xeon Max processor without any changes, seeing a modest performance improvement.
"So things are running faster, which is great," Lujan says. "But the level of effort with which to recognize that performance improvement is very minimal. We could go to other architectures that might give us more modest improvements in some respects. But if we have to rewrite hundreds of thousands of lines of code to achieve that performance, there's a cost associated to that."