Scale changes everything

"Scale changes everything" is Silicon Valley folk wisdom. So how does it affect us in storage and, surprisingly, elsewhere?

Data storage is evolving faster now than at any time in the 40 years that I've been watching it. Why? Scale - on several levels. Forty years ago, at the height of mainframe/minicomputer hegemony, buying mass storage meant buying individual disk or tape drives. Companies sold disk mirroring software for availability, and some caching options for performance, but at $50/MB disks were too expensive to waste.

Storage arrays scaled disks

With the publication of the seminal RAID paper in 1988, the strategy of aggregating disks for availability and performance electrified the industry. EMC, which until then had been a small, add-in memory supplier, bought RAID technology, dubbed it Symmetrix, and became the second highest performing stock of the 1990s, as well as a multi-billion dollar behemoth. Traditional storage suppliers, such as IBM and DEC, lost their captive markets and their storage dominance.

SSDs scaled performance

Storage arrays dominated add-on storage until about 2010, when the first high-performance SSDs - led by Fusion-io - scaled performance way beyond what multi-million dollar arrays could do. Suddenly, a few thousand dollars could provide unrivaled storage performance. And just as suddenly the software overhead of disk accesses and database I/Os went from less than a tenth of an I/O's total time to more than half, because the device had become so fast that the software path was now the slow part. That set off a scramble to re-engineer I/O stacks and interconnects, leading to PCIe/NVMe, with their very low latency, high bandwidth, and configuration flexibility.
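
To make that overhead shift concrete, here is a back-of-the-envelope sketch. The latency figures are illustrative assumptions of mine, not measurements from any particular disk, SSD, or I/O stack; the point is only that a fixed software cost becomes a much larger share of each I/O once the device itself gets fast.

```python
def software_share(software_us: float, device_us: float) -> float:
    """Return the fraction of total I/O time spent in the software stack."""
    return software_us / (software_us + device_us)


# Illustrative assumptions only (microseconds per I/O), not measured values.
SOFTWARE_US = 50.0   # assumed fixed cost of the OS/driver/database I/O path
DISK_US = 5000.0     # assumed seek + rotation time for a hard disk access
SSD_US = 40.0        # assumed access time for an early high-performance SSD

disk_share = software_share(SOFTWARE_US, DISK_US)
ssd_share = software_share(SOFTWARE_US, SSD_US)

print(f"Software share of a disk I/O: {disk_share:.1%}")
print(f"Software share of an SSD I/O: {ssd_share:.1%}")
```

With these assumed numbers the software share lands well under a tenth for the disk and above half for the SSD, even though the software itself did not change at all. That is why the fix had to come from shorter I/O paths rather than faster media.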

Cloud storage

At the same time that SSDs were exploding the performance of storage, cloud storage was blowing up the cost. Originally built using the cheapest possible hardware and glued together with software, cloud storage took consumer-priced storage into hyperscale applications. As hyperscale systems and storage scaled up, it became feasible for providers to design their own hardware, optimized for their architectures and workloads. Those optimizations increased cloud vendor margins, making it feasible for them to hire PhDs to fine-tune their systems. Even the largest enterprises couldn't justify hiring PhDs to do that, a fact reflected in enterprise IT's dismal 30+ percent utilization rates.

The future of scale in storage

The Next Big (High-Scale) Thing in storage is large-memory servers. Sporting terabytes of physical memory, these servers enable critical applications to run completely in memory. In one go they wipe out much of the demand for SSDs and enable massive multi-core CPUs to put their compute power to work on large data sets and parallelizable tasks - key attributes of image processing and machine learning.

The Storage Bits take

Storage isn't the only thing scale changes. A place with 5 people per square mile needs a lot less of everything - infrastructure, police - than a place with 50,000 people per square mile. It isn't a question of Small vs Big: it's simply a function of scale. The challenge is to help citizens understand that what works in rural Arizona - where I live - won't necessarily work in urban New Jersey. If you specify IT products, or produce them, understanding how scale affects the economics and technology you incorporate is increasingly critical to your success. Scale changes everything - including your job. Comments welcome!