Intel's cascade of data centre riches

Customers will need to spend time matching Intel's new Cascade Lake processors to applications, and working out how best to deploy Optane DC memory, but there are real performance benefits to be had.

cascade-lake-intel-xeon-family-header.jpg

Image: Tim Herman/Intel Corporation

In one of its biggest ever announcements, Intel kicked off its second quarter by releasing a slew of new and enhanced products aimed at consolidating its already dominant position in the corporate data centre. Topping that list was an extended and tweaked family of 2nd-generation Xeon Scalable processors (Cascade Lake) together with the vendor's much-anticipated Optane DC persistent memory (Apache Pass).

But that was back at the beginning of April, prompting us to wonder whether any of these had actually made it into the data centre yet and how those deliverables measure up against the hype.

Work in processors

Let's start with processors, where it's important to stress that the architecture behind Intel's 2nd-generation Xeon Scalable line-up is far from the major leap forward some might expect. Indeed, the architecture is little changed from the first generation and the silicon itself is fabricated using the same 14nm process. That said, clock rates have been tweaked on some of the SKUs, others have a few more cores, and they all support DDR4 RAM up to 2933MHz. There's also a lot of extra tech built in, for example to take advantage of the new Optane DC persistent memory, handle security in hardware rather than software, and accelerate AI processing. Moreover, with the exception of the 9200 series (about which more later), the new Xeons and supporting Optane DC persistent memory can be plugged into the same motherboards as first-generation Xeons, with any new features unlocked by a BIOS update.

cascade-lake-4-new-xeons.jpg

These 2nd-generation Xeon Scalable processors fit into the same motherboard sockets as their predecessors.

Image: Alan Stevens/ZDNet

Some may be disappointed by this and the fact that Intel still hasn't managed to perfect its long-promised 10nm fabrication process. In the short term, however, it's good news as instead of having to wait for a slow trickle of new 2nd-gen servers, vendors can make them available almost immediately. Indeed, most are doing just that, including Dell EMC and HP, both of which have announced updated products using the new chips. Unfortunately we couldn't get hold of any of these servers, but we were able to drop into the labs at Boston Limited and get hands-on with a couple of Supermicro servers to get a much better feel for what customers can expect from the new silicon.

What happened next

Given that they can be slotted into the same sockets as first-generation Xeon Scalable processors, it came as no surprise to find Supermicro offering the 2nd-generation CPUs across almost its entire server range. Indeed, the servers we saw were ones we've reviewed before. One of these was a straightforward dual-socket 1U Supermicro Ultra, the other a 2U Supermicro Twin Pro that can accommodate four dual-socket servers mounted on plug-in sleds. Both feature motherboards with LGA 3647 sockets, enabling them to take either 1st- or 2nd-gen processors and requiring little in the way of modification for the latter, other than larger heatsinks to handle the higher TDP ratings.

cascade-lake-bigger-heatsinks.jpg

Bigger heatsinks may be needed with the 2nd-generation Xeon Scalable processors and, on some servers, may block adjacent DIMM slots, as shown in this photo, where the larger heatsink is balanced on top of the original.

Image: Alan Stevens/ZDNet

Unfortunately on space-constrained servers like the modular Twin Pro, larger heatsinks mean that adjacent DIMM slots can't be used, thereby limiting memory capacity. However, this is only an issue on fully populated systems and can be offset by support for Intel's new Optane DC Persistent Memory modules (DCPMM) which, like the new processors, can also be plugged into existing motherboards.

Let's look at what that's all about in a bit more detail.

Thanks for the memory

Much discussed and previewed under the codename Apache Pass, Optane DC (Data Centre) memory uses a technology called 3D XPoint which, like conventional NAND flash, can retain data when the power is turned off. Unlike conventional flash, however, Optane DC can deliver close to DRAM performance but at a much lower cost, making it possible to build servers with large amounts of memory without breaking the bank. To this end, we found 128GB Optane DC modules retailing for around $890: that's a lot more than an SSD, but less than half the cost of an equivalent DDR4 DIMM fitted with ECC DRAM.
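To put that pricing in per-gigabyte terms, here is a small back-of-the-envelope sketch. It uses only the approximate retail figure quoted above; the function and variable names are ours, and the DRAM figure is simply the floor implied by "less than half the cost", not a price we checked.

```python
def cost_per_gb(price_usd, capacity_gb):
    """Return the price per gigabyte for a memory module."""
    return price_usd / capacity_gb


OPTANE_PRICE_USD = 890    # approximate retail price quoted in the article
OPTANE_CAPACITY_GB = 128  # module size quoted in the article

optane_per_gb = cost_per_gb(OPTANE_PRICE_USD, OPTANE_CAPACITY_GB)

# "Less than half the cost" of the equivalent DRAM DIMM implies that the
# DRAM DIMM costs more than twice as much as the Optane DC module.
implied_dram_floor_usd = 2 * OPTANE_PRICE_USD

print(f"Optane DC: ${optane_per_gb:.2f}/GB")
print(f"Implied 128GB ECC DDR4 DIMM price: more than ${implied_dram_floor_usd}")
```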

cascade-lake-box-of-optane-dc.jpg

Optane DC Persistent Memory comes in the same DIMM format as DRAM and plugs into the same motherboard slots.

Image: Alan Stevens/ZDNet

Optane DC memory comes in standard DIMM format (which is why no new connectors or sockets are required), and can currently deliver up to 512GB per module. According to Intel that means up to 36TB of low-cost memory per server overall, which sounds great -- but there are caveats.  

To start with, Optane DC requires a new memory controller which is built into the new 2nd-gen processors, so you can't just plug new Optane DC modules into an existing server without also swapping processors. There's also a limit of six Optane modules per socket, added to which each socket requires at least one conventional DRAM DIMM. In default 'Memory Mode', this is used as a high-speed cache for data held on the Optane modules.
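The per-socket limit makes the capacity arithmetic easy to sketch. The module size and the six-module limit below come from the figures above; the eight-socket count and the 1.5TB of DRAM per socket used to reach 36TB are our assumptions about how Intel's headline number is built up, not something stated at the launch we attended.

```python
OPTANE_MODULE_GB = 512     # largest module size quoted in the article
MAX_OPTANE_PER_SOCKET = 6  # per-socket limit quoted in the article


def optane_capacity_tb(sockets, modules_per_socket=MAX_OPTANE_PER_SOCKET,
                       module_gb=OPTANE_MODULE_GB):
    """Total Optane DC capacity in TB for a server with `sockets` processors."""
    if not 1 <= modules_per_socket <= MAX_OPTANE_PER_SOCKET:
        raise ValueError("between one and six Optane modules per socket")
    return sockets * modules_per_socket * module_gb / 1024


dual_socket_tb = optane_capacity_tb(2)   # 6.0 TB in a typical 2-socket server
eight_socket_tb = optane_capacity_tb(8)  # 24.0 TB of Optane DC alone

# Assumption: the 36TB figure counts 1.5TB of DRAM per socket on top of
# the Optane DC modules in an eight-socket system.
ASSUMED_DRAM_TB_PER_SOCKET = 1.5
headline_tb = eight_socket_tb + 8 * ASSUMED_DRAM_TB_PER_SOCKET

print(dual_socket_tb, eight_socket_tb, headline_tb)
```

On that reading, a mainstream dual-socket server tops out at 6TB of Optane DC, well short of the headline figure.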

Another drawback is that the DRAM cache doesn't count towards overall memory capacity, but it does enable the Optane memory to deliver near-DRAM performance when supporting applications with predictable data usage patterns. Moreover, this can be achieved without the need for any changes either to the operating system or other software, which immediately benefits applications such as big data analytics and large-scale virtual server farms.

cascade-lake-optane-dc-in-situ.jpg

Optane DC Persistent Memory modules (shown here with prominent white labels) plug into standard DIMM slots alongside conventional DRAM which, in the default Memory Mode, is used for caching.

Image: Alan Stevens/ZDNet

For applications with less predictable patterns of usage, however, Optane DC used in Memory Mode can easily slow performance compared to DRAM. Moreover, persistency isn't available in Memory Mode so, just as with an all-DRAM setup, data will still be lost when the power is turned off, whether it's in the DRAM cache or on Optane DC modules.
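Why access patterns matter so much in Memory Mode can be illustrated with a toy simulation. This is only a sketch: it models the DRAM as a simple direct-mapped cache in front of a larger backing store, which is a simplification we have assumed for illustration, and the sizes and workloads are invented rather than measured.

```python
import random


def hit_rate(addresses, cache_lines):
    """Fraction of accesses served from a direct-mapped cache."""
    slots = [None] * cache_lines
    hits = 0
    for addr in addresses:
        index = addr % cache_lines  # each address maps to exactly one slot
        if slots[index] == addr:
            hits += 1               # served from the fast "DRAM" cache
        else:
            slots[index] = addr     # miss: fetch from the slower backing store
    return hits / len(addresses)


CACHE_LINES = 1_000    # stands in for the DRAM cache
BACKING_LINES = 8_000  # stands in for Optane DC, eight times larger
ACCESSES = 100_000

rng = random.Random(0)  # fixed seed so runs are repeatable

# Predictable workload: repeatedly sweeps a working set that fits in the cache.
predictable = [i % 800 for i in range(ACCESSES)]
# Unpredictable workload: touches the whole backing store at random.
scattered = [rng.randrange(BACKING_LINES) for _ in range(ACCESSES)]

predictable_rate = hit_rate(predictable, CACHE_LINES)
scattered_rate = hit_rate(scattered, CACHE_LINES)

print(f"predictable workload hit rate: {predictable_rate:.1%}")
print(f"scattered workload hit rate:   {scattered_rate:.1%}")
```

In this toy model the predictable workload is served almost entirely from the cache, while the scattered one misses most of the time and so runs largely at the speed of the backing store.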

Persistency pays

To take full advantage of its persistency capabilities and better support applications across the board, Optane DC memory needs to be used in what's called App Direct mode, enabling data to be directed to either DRAM or persistent memory as required. Also in this mode, Optane DC is byte-addressable like memory, and can alternatively be presented as block storage just like an SSD, delivering high-performance storage without the bottlenecks associated with conventional storage interfaces.

However, all this is dependent on the use of a hypervisor and/or an operating system that's able to distinguish between the different memory technologies. At the time of writing, that means Windows Server 2019 and/or the latest release of VMware vSphere (6.7). Applications may also need to be updated to maximise the benefits, typically using the open-source Persistent Memory Development Kit (PMDK).
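The PMDK itself is a set of C libraries, but the programming model it supports, where an application maps a region and then reads and writes it like ordinary memory, can be sketched with nothing more than Python's standard mmap module. The example below uses a temporary file on ordinary storage so that it runs anywhere; on a real system we assume the file would live on a DAX-mounted persistent memory filesystem, and PMDK would take care of flushing CPU caches correctly. The open_region helper is our own invention, not a PMDK call.

```python
import mmap
import os
import tempfile


def open_region(path, size):
    """Map `size` bytes of the file at `path` into memory (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)      # make sure the file is large enough
        return mmap.mmap(fd, size)  # mmap keeps its own handle on the file
    finally:
        os.close(fd)


# Stand-in for a file on a persistent memory filesystem.
path = os.path.join(tempfile.mkdtemp(), "pmem-demo.bin")

region = open_region(path, 4096)
region[0:5] = b"hello"  # byte-granular store, no read()/write() calls
region.flush()          # ask the OS to make the update durable
region.close()

# Simulate a restart: map the region again and read the data back.
region = open_region(path, 4096)
recovered = bytes(region[0:5])
region.close()

print(recovered)
```

The point is the shape of the code rather than the performance: updates are plain memory writes followed by an explicit flush, which is why applications generally need some modification to benefit.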

The big cloud vendors, SAP and others are also said to be working on mods to leverage Optane DC on their platforms.

Real-world numbers

At the launch, Intel hyped up the performance benefits, both of its new Xeons and Optane DC memory. Putting those claims to the test, however, is far from easy, and it's made harder still by the fact that the biggest gains will come from servers using both new products. That said, engineers at Boston have confirmed measurable performance improvements using the processors alone, chiefly as a result of Intel bumping up clock speeds plus, in a few cases, higher core counts compared to equivalent first-generation Xeons.

Availability issues meant that they couldn't test every processor, but the results for those they could measure can be seen here:

cascade-lake-linpack.jpg

LINPACK results comparing first- and second-generation Intel Xeon Scalable processors. (*Clock speed raised, **Core count increased, ***More cores & faster clock).

Source: Boston Limited

On some of the processors you can further tweak performance by using Intel's new Speed Select technology to optimise the base frequency. This is normally set at a low level that can be sustained even if all the cores are active, but if the workload pattern requires less than the full complement, Speed Select allows the base frequency to be raised to deliver higher performance overall.

These gains are all worth having, but are well below the numbers on some of the slides shown at the Intel launch. That's not surprising because if you drill down into the detail the most eye-catching numbers are centred around the Xeon Platinum 9200 which, despite being part of the same 2nd-generation Xeon Scalable family, is a different beast altogether.

Double jeopardy

Characterised as Advanced Performance or 'AP', what you get inside each 9200 'processor' is a pair of Xeon Scalable 8200 series dies coupled together in a single highly dense package. That means double the number of processor cores -- up from 28 to 56 on the high-end 9282 SKU -- plus double the associated memory, accessed using 12 DDR4 channels instead of just six.
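The effect of doubling the memory channels can be estimated with some simple arithmetic. This is a theoretical peak only, assuming the 2933MHz DDR4 speed mentioned earlier and a standard 64-bit (8-byte) data path per channel; real-world bandwidth will be lower, and the function name is ours.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per channel, ECC bits excluded


def peak_bandwidth_gbs(channels, mega_transfers_per_s=2933):
    """Theoretical peak memory bandwidth in GB/s for one processor package."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


bw_8200 = peak_bandwidth_gbs(6)   # single-die package, six channels
bw_9200 = peak_bandwidth_gbs(12)  # dual-die 9200 package, twelve channels

print(f"6 channels:  {bw_8200:.1f} GB/s, {bw_8200 / 28:.2f} GB/s per core (28 cores)")
print(f"12 channels: {bw_9200:.1f} GB/s, {bw_9200 / 56:.2f} GB/s per core (56 cores)")
```

On these assumptions the package-level bandwidth doubles, but the bandwidth available to each core stays the same, which fits the picture of two existing dies sharing one package.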

cascade-lake-platinum-9200-slide.jpg

The Xeon Platinum 9200 is really two processors in one, and can't be slotted into a standard motherboard socket.

Image: Intel

It's this doubling arrangement that enabled Intel to headline a 2X performance gain for the 9282 Xeon Scalable compared to the earlier 8180 in its launch literature. But then the 8180 has half the number of cores, so that figure is far from surprising.

At the time of writing, Platinum 9200 processors were only just starting to leave the factory and Boston hadn't received any samples to test, or any pricing information. Moreover, it's important to stress that, unlike other members of the 2nd-generation Xeon Scalable family, 9200s aren't designed to fit standard motherboard sockets. Instead, they come as a BGA package and will likely be delivered firmly attached to a host motherboard. The motherboard will also be from Intel and ZDNet understands that OEM vendors, like Supermicro, will be unable to offer Scalable 9200 processors on their own designs.

Boost for AI

All this doesn't mean that customers won't see big gains from other 2nd-generation Xeon Scalable processors, particularly if they also deploy Optane DC persistent memory. However, some care will be needed to get the combination right as buyers looking to save money by maximising Optane DC over DRAM, for example, could see performance fall compared to all-DRAM platforms. The performance differences will also vary depending on the type of application involved, with Intel having yet one more trick up its sleeve in the form of a new Deep Learning Boost capability aimed specifically at accelerating AI inferencing apps.

Unlike AI training, which makes huge demands on processing capabilities that are best met using GPU technology, inferencing applications (where AI algorithms are actually put to work) are far less draining. Indeed, inferencing data volumes can be significantly smaller and the processing done at relatively low precision -- a requirement facilitated by Intel's new Deep Learning Boost using the 512-bit vector units (AVX512) included in the Xeon Scalable architecture. Using these, the 2nd-generation processors can now simultaneously process 64 8-bit or 32 16-bit integers using a single hardware instruction which, together with support for fused operations such as FMA (Fused Multiply Add), can have a huge impact on inferencing performance.
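The lane counts quoted above fall straight out of the vector width, and the fused operation can be emulated in a few lines. The sketch below is modelled on our understanding of the AVX-512 VNNI 8-bit dot-product instruction, which we assume is what underlies Deep Learning Boost: groups of four 8-bit products are summed into each 32-bit accumulator. It is plain Python for illustration, ignores 32-bit overflow, and the function names are ours.

```python
VECTOR_BITS = 512  # width of the AVX-512 vector units


def lanes(element_bits):
    """Number of elements of a given width that fit in one vector."""
    return VECTOR_BITS // element_bits


def dot_accumulate_int8(acc, a_u8, b_s8):
    """Emulate one fused 8-bit multiply-accumulate over a full vector.

    `acc` holds 16 32-bit accumulators; `a_u8` and `b_s8` hold 64 unsigned
    and 64 signed 8-bit values. Each accumulator receives the sum of four
    adjacent products. Overflow of the 32-bit accumulators is not modelled.
    """
    if len(a_u8) != lanes(8) or len(b_s8) != lanes(8) or len(acc) != lanes(32):
        raise ValueError("expected 64 inputs per operand and 16 accumulators")
    result = []
    for lane in range(lanes(32)):
        base = lane * 4
        products = sum(a_u8[base + k] * b_s8[base + k] for k in range(4))
        result.append(acc[lane] + products)
    return result


activations = [1] * lanes(8)  # 64 unsigned 8-bit values
weights = [2] * lanes(8)      # 64 signed 8-bit values
accumulators = dot_accumulate_int8([0] * lanes(32), activations, weights)

print(lanes(8), lanes(16))  # 64 and 32 elements per vector
print(accumulators)         # each of the 16 accumulators holds 4 * (1 * 2) = 8
```

Doing the multiply and the accumulate in one step, rather than as separate instructions, is where the inferencing speed-up is expected to come from.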

Dell EMC, for example, has published test results using the ResNet-50 inference benchmark showing 2nd-generation Xeon Scalable processors with Deep Learning Boost more than tripling throughput in 8-bit integer precision:

cascade-lake-deep-learning-boost.jpg

AI inferencing performance can be significantly enhanced by Intel's Deep Learning Boost.

Image: Dell EMC

Again, real applications may not always mirror these figures, but there are definite benefits to be had and without the need for additional GPU hardware, further reinforcing Intel's data centre proposition.

The state of play

The bottom line here is that, while some of the figures quoted by Intel need to be taken with a pinch of salt, there are real performance benefits to be had. However, they're not automatic and customers will need to spend more time both matching the new processors to applications, and working out how best to deploy Optane DC memory. The good news is that, with the exception of the Platinum 9200, Intel has been surprisingly quick off the mark getting its new products to market and doing so alongside a raft of other technologies which together look set to rapidly reshape the data centre.
