In an IBM blog post titled "Keep Your Friends Close and Your Data Closer," Matthew Drahzal, a leader in IBM's Systems & Technology Group (STG), announced a new storage server that brings supercomputer-class data storage speed to Commercial Off-The-Shelf (COTS) General Parallel File System (GPFS) storage servers.
The trouble these new servers are meant to address is simple. As Drahzal wrote, "The technology industry has a problem. Disk drives--devices used for over 50 years to store and retrieve digital information--move data too slowly. Companies regularly use 3 terabyte disk drives--roughly equal to the capacity of about 100 iPads--but the drives can only move data at 50 to 100 megabytes per second. Many organizations need to analyze data at 100 gigabytes per second--a difference of a few orders of magnitude."
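The scale of that gap is easy to check with back-of-the-envelope arithmetic, using only the figures from Drahzal's quote (100 MB/s per drive on the optimistic end, a 100 GB/s analysis target):

```python
# Rough arithmetic on the throughput gap Drahzal describes.
# Figures are taken from the quote above; nothing here is IBM's own math.
drive_mb_per_s = 100           # optimistic single-drive throughput (MB/s)
target_gb_per_s = 100          # desired aggregate analysis rate (GB/s)

target_mb_per_s = target_gb_per_s * 1000
drives_needed = target_mb_per_s // drive_mb_per_s
print(f"Drives needed in parallel: {drives_needed}")  # prints 1000
```

In other words, even with drives at their best, you need on the order of a thousand of them working in parallel just to reach the analysis rates these organizations want.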
To address that problem, we've long spread our data across multiple drives to speed up access with redundant arrays of independent disks (RAID). RAID and the drives that use it, however, haven't kept up with our need for reliable speed. Yes, our drives are growing ever larger, but they've not become more reliable, and even the highest levels of data redundancy available, such as RAID 6 or the pairing of RAID 1 (mirrored disks) with RAID 5, aka RAID 51, aren't sufficient for data protection. Worse still, when a drive does fail, and they always do, rebuilding big data with RAID can take days.
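The basic idea behind parity-based RAID rebuilds can be sketched in a few lines. This is a simplified illustration of RAID 5-style XOR parity, not GPFS Native RAID's actual erasure coding, but it shows why a lost drive can be reconstructed from the survivors, and why doing so means reading every surviving drive in full:

```python
# Toy sketch of RAID 5-style parity rebuild. Real arrays (and GNR's
# declustered RAID) use far more sophisticated codes, but the core
# principle, rebuilding a lost drive from parity plus survivors, is the same.
def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data_drives = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = xor_blocks(data_drives)  # stored on the parity drive

# Simulate losing drive 1 and rebuilding it from parity + survivors.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_drives[1]
```

Scale those two-byte toy "drives" up to 3 TB each and the days-long rebuild times become obvious: every byte of every surviving drive has to be read to reconstruct the failed one.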
There's nothing new about this. Indeed, IBM Research developed GPFS to deal with this issue over a decade ago. For most of its life, GPFS was only available on dedicated hardware such as the IBM Scale Out Network Attached Storage (SONAS) and IBM Storwize V7000 Unified storage systems. That's fine for the biggest of big data enterprises or supercomputers, but it's not so affordable for smaller companies that still have big data needs.
Recently, IBM extended GPFS to develop GPFS Native RAID (GNR). Drahzal said, "This is a software layer beneath GPFS that interacts directly with the disk drives themselves. Modern servers have more than enough processing power to manage the disks directly, so GPFS Native RAID eliminates the need for expensive external RAID arrays. This essentially cuts out the middleman and makes data more quickly available for analysis. This capability has been available in the IBM Power 775 server for well over a year and is managing scores of petabytes of data worldwide."
Again, that's great for people who can afford high-end data networks and supercomputers, but not so affordable for non-Fortune 500 companies. Now, IBM has announced the IBM System x GPFS Storage Server, which makes GPFS and GNR available on far more affordable IBM System x servers. IBM claims that "The Storage Server can run in any datacenter and even in most office environments. It will be delivered as a complete, integrated storage solution consisting of servers, solid state drives (SSDs), disks and software for the IBM Intelligent Cluster."
COTS technology doesn't mean you need to give up performance or reliability. According to IBM, the Juelich Supercomputing Centre (JSC) will be using the IBM System x GPFS Storage Server, instead of a large storage array, for its IBM Blue Gene/Q-based, Linux-powered JUQUEEN supercomputer. JUQUEEN is one of the fastest supercomputers in the world.
On JUQUEEN, GNR must handle storage for 100,000-disk, 120-petabyte systems that will experience daily drive failures. How does it do it? In part, it manages by integrating SSDs to temporarily store small blocks of data and maintain system logs. They're not used as add-ons, but as an integral part of its design. As Drahzal observed, "Instead of exclusively writing data to disk, GPFS will help keep data right where it is needed. To play on the old saying, keep your friends close and your important data closer."
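The tiering idea Drahzal describes can be sketched simply: small writes and log records land on the fast SSD tier first, while bulk data goes straight to disk. The names and the 64 KB cutoff below are illustrative assumptions, not IBM's actual design:

```python
# Hedged sketch of SSD/disk tiering by write size. The 64 KB threshold
# and tier structures are hypothetical, chosen only to illustrate the
# "small blocks and logs go to SSD" idea from the article.
SMALL_BLOCK_LIMIT = 64 * 1024  # bytes; assumed cutoff, not IBM's figure

ssd_tier, disk_tier = [], []

def write_block(name, payload):
    """Route a write to SSD if it's small, otherwise to spinning disk."""
    if len(payload) < SMALL_BLOCK_LIMIT:
        ssd_tier.append((name, len(payload)))
        return "ssd"
    disk_tier.append((name, len(payload)))
    return "disk"

print(write_block("metadata-log", b"x" * 512))        # small write -> ssd
print(write_block("bulk-dataset", b"x" * 10_000_000)) # large write -> disk
```

The payoff of this design is latency: the many tiny, frequent writes (metadata, logs) never wait on slow mechanical disks, while the disks do what they're good at, streaming large sequential blocks.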
The IBM System x iDataPlex dx360 M4, a rack-mounted take on GNR, also offers the Intel Xeon Phi co-processor as an option to deliver better overall data performance for high-performance computing applications.
As our big data grows ever bigger, it's good to see hardware vendors, such as IBM, giving us the affordable tools we need to keep from drowning in all our data.