Cray's Urika-GX aims at big data analytics

The Urika-GX aims to fuse supercomputer power with an open, enterprise format.


Cray Urika GX: Aims to combine the best of the GD and XA but in a smaller format.

Photo: Cray

Business analytics is a core feature of most business systems today, and to get the most from it, companies are allotting it more and more compute power.

Cray's new Urika-GX, the latest in its Urika line of analytics platforms, provides an open, enterprise framework aimed specifically at the analytics market.

The new machines are already being used by customers across the life sciences, healthcare, and cybersecurity industries, the company said. For example, the Broad Institute of MIT and Harvard, a research institute, is using the Cray Urika-GX system to analyze high-throughput genome sequencing data.

According to Dominik Ulmer, Cray's VP of business operations EMEA, this is not exactly new ground for Cray. The company has been in the analytics business for the last four years.

According to Ulmer, it started with a system based on graph analysis. The first system was Urika GD, the second Urika XA, and now it has launched the GX.

"It may sound like this is a merger of the two products," said Ulmer, "and there is an element of this."

The thinking behind the GX is that companies today make decisions on a data-driven basis. To gain a competitive advantage from those decisions, "you have to make them fast and at a high frequency and in as flexible a way as possible," he said.

That means being able to test different hypotheses concurrently and as quickly as possible, he said, an approach Cray calls "agile analytics".

"We have chosen features from our supercomputing stack," he said.

The Cray GX is aimed at data scientists who want to carry out high-quality analyses and perform data discovery, said Ulmer. "That means doing high-level data analysis with standard tools like Hadoop and Spark, along with something that we had on our Urika GD system - special, purpose-built hardware with graph analytics on top."

He believes that this will help researchers go deeper in order to discover unknown patterns and new dependencies and relationships.

The aim is to let users do real-time analytics that can be adapted with a number of models running side-by-side in real-time, leading to a quicker turnaround when testing hypotheses, he said.


Cray's Ulmer: "The aim is to fuse supercomputer power with an open, enterprise format."

Photo: Cray

How does the GX stack up on performance? Cray has run benchmarks of the GX against a system from "a major cloud provider". According to Ulmer, the GX was twice as fast on simple workloads, such as loading and partitioning, and four times faster on more complicated tasks such as PageRank [an algorithm used by Google search to rank web sites].
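For readers unfamiliar with the PageRank workload mentioned above, the following is a minimal sketch of the algorithm using plain power iteration on a toy four-page link graph. It is not Cray's benchmark code; the graph, the `pagerank` function name, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform ranking

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph, for illustration only.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Each pass touches every link in the graph, which is why the workload becomes demanding on graphs with billions of edges and is a common choice for analytics benchmarks.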

The system uses a standard Apache framework, along with Hadoop and Spark, that is pre-integrated. "This is something that you can deploy within days and is in production mode quickly to open software," he said.

"The GX is pre-integrated but it is not a closed box," he said. He believes that this is the best of both worlds since it is pre-integrated but at the same time, "can have all of the standard features and controls demanded by the IT department".

On the software side, the base of the stack is Linux, specifically CentOS, with modifications to make it "a very lightweight OS", according to Ulmer.

On top of that is a standard analytics environment that could be based on Java, Python, or whatever the user wants, along with Cray's own compiler as an option.

Sitting on top of that can be HDFS or a standard Cray option. Above that is an Apache Mesos layer, which abstracts the hardware resources and makes them available to the applications, and above Mesos there are two different workload managers: YARN for the analytics side, or Slurm, the open-source scheduler widely used on supercomputers.

Finally, on top of what is a very tall stack can be Hadoop, Spark, or the Cray graph engine.
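The layering described above can be summarized, bottom to top, in a short sketch. The layer labels (such as "storage" and "resource abstraction") are shorthand introduced here and are not Cray's own terminology; the options listed under each layer are the ones named in this article.

```python
# Bottom-to-top summary of the software stack as described in the article.
# Layer labels are illustrative shorthand, not official Cray names.
URIKA_GX_STACK = [
    ("operating system", ["Linux (CentOS, slimmed down)"]),
    ("analytics environment", ["Java", "Python", "Cray compiler (optional)"]),
    ("storage", ["HDFS", "Cray option"]),
    ("resource abstraction", ["Apache Mesos"]),
    ("workload managers", ["YARN", "Slurm"]),
    ("frameworks", ["Hadoop", "Spark", "Cray graph engine"]),
]

for level, (layer, options) in enumerate(URIKA_GX_STACK, start=1):
    print(f"{level}. {layer}: {', '.join(options)}")
```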

The new system will be available in three sizes: small, with 16 nodes; medium, with 32; and large, with 48. The biggest system will use 18-core Intel Broadwell processors in multiple configurations.

"In an L configuration you would have 1,728 cores," said Ulmer. That comes with up to 22TB of DRAM, 35TB of SSD storage, and up to 192TB of disk.
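The 1,728-core figure is consistent with the node and core counts quoted above if each node holds two processors. The two-socket layout is an assumption (the article does not state it), and the small and medium totals below further assume the same 18-core processor, which the article confirms only for the largest system.

```python
CORES_PER_PROCESSOR = 18   # from the article (18-core Broadwell)
PROCESSORS_PER_NODE = 2    # assumption: dual-socket nodes, not stated in the article

# Node counts per configuration, as given in the article.
NODES = {"small": 16, "medium": 32, "large": 48}


def total_cores(nodes, processors=PROCESSORS_PER_NODE, cores=CORES_PER_PROCESSOR):
    """Return the total core count for a system with the given number of nodes."""
    return nodes * processors * cores


for size, nodes in NODES.items():
    print(f"{size}: {nodes} nodes, {total_cores(nodes)} cores")
```

Under these assumptions the large configuration works out to 48 × 2 × 18 = 1,728 cores, matching the figure Ulmer quotes.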
