Near-data processing: What it is and why you need it
Near-data processing (NDP) is a simple concept: Place the processing power near the data, rather than shipping the data to the processor. It's also inevitable. Here's why it's coming to your datacenter.
It was almost 20 years ago that the first gigabit Fibre Channel (FC) storage came to market (I was its product manager). FC was optimized for storage use: low latency, guaranteed delivery, long (optical) cable runs, and, at the time, higher bandwidth than Ethernet.
Switchable FC enabled the Storage Area Network, or SAN, that became the go-to infrastructure for enterprise IT systems. SANs made financial sense because servers were cheap and large storage arrays expensive. SANs made it feasible to share costly storage resources.
Applications cooperated. The big enterprise apps were mostly database-driven: small updates and modest capacities in the multi-gigabyte range.
In almost 20 years we've gone from 1Gb to 100Gb links. That's roughly a 25 percent annual growth rate. At the same time, hard drives have gone from about 9GB to 10TB, about a 40 percent growth rate.
At the leading edge, application data capacities have grown from ≈10GB in a database, to ≈100TB, about a 60 percent compound annual growth rate. Given the different growth rates, collision was inevitable.
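Those growth rates can be sanity-checked with a quick compound-annual-growth-rate calculation. This is a minimal sketch: the 20-year span and the endpoints (1Gb to 100Gb links, 9GB to 10TB drives, roughly 10GB to 100TB data sets) are the rounded figures from the paragraphs above.

```python
def cagr(start, end, years=20):
    """Compound annual growth rate over `years`, as a percentage."""
    return ((end / start) ** (1 / years) - 1) * 100

# Endpoints rounded from the article; units cancel in the ratio.
print(f"Network links:  {cagr(1, 100):.0f}%/yr")      # 1 Gb  -> 100 Gb
print(f"Drive capacity: {cagr(9, 10_000):.0f}%/yr")   # 9 GB  -> 10 TB
print(f"App data sets:  {cagr(10, 100_000):.0f}%/yr") # 10 GB -> 100 TB
```

The results come out near 26, 42, and 58 percent per year, matching the rounded 25/40/60 percent figures above and making the collision between link and data growth concrete.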
The internalization of storage
A crude form of NDP is moving the storage inside the server box and off the network. Bandwidth inside a server is cheap. Bandwidth across a network is expensive.
This fact motivated EMC's effort to sell itself to Dell, since EMC's external arrays couldn't compete with internal storage. It is also driving the advent of converged and hyper-converged infrastructure.
The Storage Bits take
Of course, taking full advantage of the bandwidth and low latency of NDP requires rethinking our software infrastructure as well, just as SSDs require NVMe to deliver their full performance.