Near-data processing: What it is and why you need it
Near-data processing (NDP) is a simple concept: Place the processing power near the data, rather than shipping the data to the processor. It's also inevitable. Here's why it's coming to your datacenter.
It was almost 20 years ago that the first gigabit Fibre Channel (FC) storage came to market (I was its product manager). FC was optimized for storage use: low latency, guaranteed delivery, long (optical) cable runs, and, at the time, higher bandwidth than Ethernet.
Switchable FC enabled the Storage Area Network, or SAN, that became the go-to infrastructure for enterprise IT systems. SANs made financial sense because servers were cheap and large storage arrays expensive. SANs made it feasible to share costly storage resources.
Applications cooperated. The big enterprise apps were mostly database-driven: small updates and modest capacities in the multi-gigabyte range.
In almost 20 years we've gone from 1Gb to 100Gb links. That's roughly a 25 percent annual growth rate. At the same time, hard drives have gone from about 9GB to 10TB, about a 40 percent growth rate.
At the leading edge, application data capacities have grown from ≈10GB in a database, to ≈100TB, about a 60 percent compound annual growth rate. Given the different growth rates, collision was inevitable.
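Those growth rates can be sanity-checked with a quick compound-annual-growth-rate calculation. This is a minimal sketch: the 20-year span and the endpoints (1Gb to 100Gb links, 9GB to 10TB drives, roughly 10GB to 100TB data sets) are the rounded figures from the paragraphs above.

```python
def cagr(start, end, years=20):
    """Compound annual growth rate over `years`, as a percentage."""
    return ((end / start) ** (1 / years) - 1) * 100

# Endpoints rounded from the article; units cancel in the ratio.
print(f"Network links:  {cagr(1, 100):.0f}%/yr")      # 1 Gb  -> 100 Gb
print(f"Drive capacity: {cagr(9, 10_000):.0f}%/yr")   # 9 GB  -> 10 TB
print(f"App data sets:  {cagr(10, 100_000):.0f}%/yr") # 10 GB -> 100 TB
```

The results come out near 26, 42, and 58 percent per year, matching the rounded 25/40/60 percent figures above and making the collision between link and data growth concrete.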
The internalization of storage
A crude form of NDP is moving the storage inside the server box and off the network. Bandwidth inside a server is cheap. Bandwidth across a network is expensive.
This fact motivated EMC's effort to sell itself to Dell, since EMC's external arrays couldn't compete with internal storage. It is also driving the advent of converged and hyper-converged infrastructure.
The Storage Bits take
Of course, taking full advantage of the bandwidth and low latency of NDP requires rethinking our software infrastructure as well, just as SSDs require NVMe to deliver their full performance.