AWS wants to reinvent the supercomputer, starting with the network

VP of AWS global infrastructure Peter DeSantis talked up the possibilities of running high performance computing workloads in the cloud, instead of on a 'costly' supercomputer with a pretty paint job.

aws-supercomputer.jpg

Image: Asha Barbaschow

Amazon Web Services wants to reinvent high performance computing (HPC) and according to VP of AWS global infrastructure Peter DeSantis, it all starts with the network.

Speaking at his Monday Night Live keynote, DeSantis said AWS has been working for the last decade to make supercomputing in the cloud a possibility.

"Over the past year we've seen this goal become reality," he said.

According to DeSantis, there's no precise definition of an HPC workload, but he said the one constant is that it is way too big to fit on a single server.

"What really differentiates HPC workloads is the need for high performance networking so those servers can work together to solve problems," he said, talking on the eve of AWS re:Invent about what the focus of the cloud giant's annual Las Vegas get together will be.

"Do I care about HPC? I hope so, because HPC impacts literally every aspect of our lives … the big, hard problems in science and engineering."

See also: How higher-ed researchers leverage supercomputers in the fight for funding (TechRepublic)

DeSantis explained that typically in supercomputing, each server works out a portion of the problem, and then all the servers share the results with each other.

"This information exchange allows the servers to continue doing their work," he said. "The need for tight coordination puts significant pressure on the network."

To scale these HPC workloads effectively, DeSantis said a high-performance, low-latency network is required.

"If you look really closely at a modern supercomputer, it really is a cluster of servers with a purpose-built, dedicated network …  network provides specialised capabilities to help run HPC applications efficiently," he said.

In touting the cloud as the best place to run HPC workloads, DeSantis said the "other" problem with physical supercomputers is they're custom built, which he said means they're expensive and take years to procure and stand up.

"One of the benefits of cloud computing is elasticity," he continued.

Another problem AWS wants to fix with supercomputing in the cloud is the democratisation element, with DeSantis saying one issue is getting access to a supercomputer.

"Usually only the high-value applications have access to the supercomputer," he said.

"With more access to low-cost supercomputing we could have safer cars … we could have more accurate forecasting, we could have better treatment for diseases, and we can unleash innovation by giving everybody [access].

"If we want to reinvent high performance computing, we have to reinvent supercomputers."

AWS also wants to reinvent machine learning infrastructure.

"Machine learning is quickly becoming an integral part of every application," DeSantis said.

However, the optimal infrastructure for the two components of machine learning -- training and inference -- are very different.

"A good machine learning dataset is big, and they're getting bigger … and training involves doing multiple passes through your training data," De Santis said.

"We're excited in investments we've made in HPC and it's helping us with machine learning."

Earlier on Monday, Formula 1 announced it had partnered with AWS to carry out simulations that it says has resulted in the car design for the 2021 racing season, touting the completion of a Computational Fluid Dynamics (CFD) project that simulates the aerodynamics of cars while racing

The CFD project used over 1,150 compute cores to run detailed simulations comprised of over 550 million data points that model the impact of one car's aerodynamic wake on another.   

Asha Barbaschow travelled to re:Invent as a guest of AWS.    

RECENT SUPERCOMPUTING NEWS