A ZDNet Multiplexer Blog

Releasing clouds of data

Data mining gets plenty of attention, yet most enterprises treat their data as 'write once, read never'. How do you make research data available across your organisation at a local level, and then move beyond that to a global level?

That kind of data sharing among people in locations around the world was demonstrated when Intel helped the San Diego Supercomputer Center (SDSC) create a massive storage cloud that makes research data of virtually any type accessible to others, easily and at low cost.

Traditionally, researchers have archived data on tape or other removable media, typically on isolated storage chosen with safekeeping in mind. This made access by colleagues difficult. The limitations of tape and other removable media become even more evident as increasingly mainstream big data capabilities are used for data mining and analysis on massive data sets, such as those created by large-scale simulations in scientific engineering.

To make scientific data sets readily available to researchers all over the world, engineers at the SDSC created a storage cloud with an initial raw capacity of 5.5 petabytes (PB), which is able to scale to tens of petabytes.

At the core are open-source solutions: CentOS and OpenStack are the software technologies of choice for this project, and nodes based on Intel Xeon processors deliver cost-effective performance that helps drive success.

A major advantage for the SDSC project was Intel's involvement in the OpenStack community, which has included significant contributions to OpenStack itself.

This relationship has helped ensure that the software is well optimised for the features and capabilities of Intel Xeon processors. Intel's broad industry and ecosystem involvement extends to collaboration with many equipment providers, such as Aberdeen, Arista Networks, and Dell.

Those relationships benefit the SDSC Cloud, the main hardware components of which include:

  • Dell PowerEdge R610 and R620 servers, rack-optimized 1U systems based on the Intel Xeon processor 5600 series and Intel Xeon processor E5-2600 product family, respectively, used as both proxy and storage nodes
  • Aberdeen x539 Storage Servers, rack-optimized 5U systems based on the Intel Xeon processor 5500 series and configured with 24 SATA hard drives, each with 2-terabyte (TB) capacity (48TB total per server)
  • Intel Ethernet Network Daughter Card X520-DA2 / I350-T2 (Dell 430-4935), with two 10-gigabit Ethernet SFP+ Direct Attach ports and two gigabit Ethernet 1000BASE-T ports
  • Redundant Arista Networks 7508 switches, each providing 384 10-gigabit Ethernet ports for more than 10 terabits per second of non-blocking, IP-based connectivity
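
As a rough sanity check on the storage figures above, the arithmetic can be sketched in a few lines of Python. The 24 drives, 2TB per drive and 5.5PB of initial raw capacity come from the article; the decimal conversion (1PB = 1,000TB), the function names, and the idea of reaching the full raw capacity with Aberdeen-style storage servers alone are assumptions made purely for illustration, since the Dell servers also act as storage nodes.

```python
import math

# Figures quoted in the article
DRIVES_PER_SERVER = 24   # SATA drives per Aberdeen x539 server
TB_PER_DRIVE = 2         # capacity of each drive, in terabytes
RAW_TARGET_PB = 5.5      # initial raw capacity of the SDSC Cloud

# Assumption: decimal storage units (1 PB = 1,000 TB)
TB_PER_PB = 1000


def raw_tb_per_server(drives=DRIVES_PER_SERVER, tb_per_drive=TB_PER_DRIVE):
    """Raw (unreplicated) capacity of one storage server, in TB."""
    return drives * tb_per_drive


def servers_for_capacity(target_pb, tb_per_server):
    """Smallest whole number of servers whose raw capacity reaches target_pb.

    Hypothetical: assumes every terabyte comes from identical servers.
    """
    return math.ceil(target_pb * TB_PER_PB / tb_per_server)


per_server = raw_tb_per_server()
print(per_server)                                       # 48
print(servers_for_capacity(RAW_TARGET_PB, per_server))  # 115
```

The first figure matches the 48TB per server quoted in the list. The second is only an upper bound under the stated assumption, not a description of the actual SDSC deployment.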

The open-source basis for the SDSC Cloud has also made it possible to add new features and capabilities when needed, on a shorter development cycle than would be possible with commercial software.

As Ron Hawkins, industry relations director at the San Diego Supercomputer Center, said: "The OpenStack object store offers extremely scalable capacity for any type of data, accessible by a wide range of APIs. It's also automatically optimised for the performance and security features of Intel Xeon processors."
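
To give a flavour of what API access to an object store looks like, here is a minimal sketch of how an object is addressed over HTTP. It assumes the OpenStack Object Storage (Swift) v1 path layout of account, container and object; the endpoint host, account name, container name and the helper function itself are hypothetical and are not taken from the SDSC deployment.

```python
from urllib.parse import quote


def swift_object_url(endpoint, account, container, obj):
    """Build the URL of one object using the Swift v1 path layout:

        <endpoint>/v1/<account>/<container>/<object>

    All argument values used below are illustrative placeholders.
    """
    parts = [
        quote(account, safe=""),    # account and container are single path segments
        quote(container, safe=""),
        quote(obj, safe="/"),       # object names may contain "/" as a pseudo-folder
    ]
    return endpoint.rstrip("/") + "/v1/" + "/".join(parts)


url = swift_object_url(
    "https://cloud.example.org",   # hypothetical endpoint
    "AUTH_demo",                   # hypothetical account
    "simulations",                 # hypothetical container
    "run 42/output.h5",            # hypothetical object name
)
print(url)
# https://cloud.example.org/v1/AUTH_demo/simulations/run%2042/output.h5
```

A client would then issue ordinary HTTP requests (such as GET or PUT) against that URL with a valid authentication token, which is part of what makes the data reachable from many languages and tools.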