GPU databases are coming of age

GPUs are powering a new generation of databases. What is so special about them and can they come into their own?


GPUs are that obscure object of desire right now. Originally created to provide better performance for gamers, now everyone from crypto miners to deep-learning experts wants a piece of them.


Increased demand from cryptocurrency mining and competition for memory modules have created a perfect storm, sending GPU prices skyrocketing. That's bad news for users, but great for GPU manufacturers like Nvidia.

We have covered on ZDNet how Nvidia has gone from gamer's delight to AI powerhouse. A look at the agenda of GTC, Nvidia's GPU Technology Conference taking place this week in Silicon Valley, confirms this transition.

But why should you care about GPUs if you're not into AI, gaming, or crypto? Because GPUs can also accelerate your databases, and virtually every organization today uses one.

Massive parallelism through GPUs

GPUs greatly accelerate operations that can be parallelized. This approach has been in use in massively parallel architectures such as Hadoop or Spark for a while now. The idea is to combine an array of database instances, each on a separate server, and then to use a master node that delegates subqueries to each one.


The individual servers execute their subqueries in parallel and return their result sets to the master node, which combines them and sends a single result back to the client. GPUs make the same divide-and-conquer approach possible within individual servers, with CPUs taking the role of the master node.
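The scatter-gather pattern described above can be sketched in a few lines. This is an illustrative toy, not SQream's actual architecture or API: a "master" splits a query into subqueries, workers (stand-ins for per-server database instances, or for GPU cores within one server) run them in parallel, and the master merges the partial results.

```python
# Sketch of divide-and-conquer query execution: scatter shards to
# workers, run partial aggregates in parallel, gather and combine.
# All names here are hypothetical, for illustration only.
from concurrent.futures import ThreadPoolExecutor

ROWS = list(range(100_000))  # toy table: one integer column

def subquery(partition):
    # Each worker computes a partial aggregate over its shard.
    return sum(partition)

def master(rows, workers=4):
    # Scatter: split the table into roughly equal shards.
    size = len(rows) // workers
    shards = [rows[i * size:(i + 1) * size] for i in range(workers - 1)]
    shards.append(rows[(workers - 1) * size:])
    # Delegate the subqueries, then gather: merge partials into one result.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(subquery, shards))
    return sum(partials)

print(master(ROWS))  # same answer as the serial sum(ROWS)
```

The same shape applies whether the workers are separate servers behind a Hadoop-style master or thousands of GPU cores behind a CPU; only the cost of the scatter and gather steps changes.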

GPUs can greatly accelerate workloads that can be broken down into parts to be executed in parallel, working in tandem with CPUs.

(Image: SQream)

So, can you just add some GPUs to the server hosting your database and expect to see a massive improvement in performance? Not so fast. First of all, not all database operations are parallelizable, and for the ones that are not, adding GPUs won't make a difference.

But even for operations that can be parallelized, databases have to be designed and implemented in a way that enables them to take advantage of the GPU architecture. In other words, it takes a special type of database to be able to capitalize on GPUs.
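To make the parallelizable/non-parallelizable distinction concrete, here is a hedged, hypothetical contrast (not drawn from SQream): a per-row filter plus aggregate is embarrassingly parallel, because each shard can be processed independently, while a computation in which each row depends on the previous one must run sequentially, so extra GPU cores buy nothing.

```python
# Illustrative contrast; the function names and the cap-reset rule are
# invented for this example.

def parallel_friendly(shard):
    # Like SELECT SUM(x) WHERE x % 2 = 0 over one shard:
    # no cross-row dependency, so shards can be combined freely.
    return sum(x for x in shard if x % 2 == 0)

def sequential_only(rows, cap=100):
    # Running total that resets once it exceeds a cap: row i needs the
    # total from row i-1, so shards cannot be computed independently.
    total, out = 0, []
    for x in rows:
        total = 0 if total > cap else total
        total += x
        out.append(total)
    return out

rows = list(range(20))
shards = [rows[:10], rows[10:]]
# Partial results of the parallel-friendly query merge into the full answer.
assert sum(parallel_friendly(s) for s in shards) == parallel_friendly(rows)
```

A GPU database's job is largely to express as much of a query plan as possible in the first form, and to keep the unavoidable sequential parts on the CPU.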

This is the premise on which GPU databases such as SQream were built. SQream was born when entrepreneur and investor Ami Gal met programmer and algorithm builder Kostya Varakin in 2010. They joined forces and started working on implementing Varakin's ideas on how to work with GPUs in a database.

A few years, patents, and funding rounds later, SQream is run by Gal and VP of R&D Razi Shoshani. Varakin has set off for new adventures, but the company has raised $15 million to date, has more than 50 employees, and is expected to grow to around 75 during 2018.

What are the options in GPU databases?

Today, at GTC, SQream is announcing the latest in a series of partnerships, and we took the opportunity to connect and discuss. Before we delve into the specifics of the partnership, however, let's take a step back to quickly review this space.


Apart from SQream, other GPU database players include Blazegraph, BlazingDB, Brytlyt, Kinetica, MapD, and PG-Strom. SQream positions itself in the analytics database market and differentiates by focusing on big workloads. According to SQream CMO David Leichner:

"There are several vendors who develop and sell GPU databases. The ones you probably hear the most about are Kinetica and MapD. While SQream is often coupled with these two vendors in roundups about GPU databases, there is one main and several sub differentiators between the solutions.

First and foremost, Kinetica and MapD use in-memory storage. While this enables them to provide extremely fast analysis of up to say 5TB or 10TB of data, they are limited in scalability due to architecture as well as cost. You rarely see either of these talking about data stores of 20TB and up. SQream, on the other hand, is built for massive data stores.

SQream DB is the only GPU powered SQL analytics database that allows organizations to analyze 20 times the amount of data, from terabytes to petabytes, at up to 100 times faster at 10 percent of the cost and administration. A typical engagement would involve lift and shift from an existing MPP or distributed data lake to SQream DB, where SQream DB would do the heavy lifting data analytics."

The rise of GPUs has led to a new generation of GPU-powered databases.

(Image: Nvidia)

The partnership SQream is announcing today is with X-IO, integrating SQream with the X-IO Axellio Edge Micro-Datacenter platform. SQream says benchmarks were performed on hundreds of terabytes, resulting in very fast analysis and short query times.

Some of the reported results include 11.5TB of analytics processing per hour; ingesting and analyzing up to 1PB; queries running up to 2.5 times faster than on other flash-based hardware solutions; and consistent data rates of 3.2GB per GPU, more than double the peak performance measured with other solutions.

These results will be officially presented in a webinar on April 10, and as always, you should take them with a pinch of salt. What is perhaps more interesting, however, is the analysis on where the partnership fits in SQream's strategy, what's going on in GPU databases, and the interplay with cloud and machine learning workloads.

Are GPU databases a thing?

This comes as the latest in a series of partnerships for SQream: Nvidia, IBM, and Dell EMC as hardware and server partners; AWS, Microsoft Azure, and IBM Bluemix as cloud providers; Tata and Moyo as system integrators; and a number of storage and network partners. There are, however, two partnerships that stand out.


First, there is Tableau. Positioning as an analytics database without a visualization layer would be a tough sell, so SQream has partnered with Tableau to offer visual access to the fast analytics it promises.

Then, we also have Alibaba Cloud. SQream refers to Alibaba Cloud as a strategic partner, and this deserves some analysis. Leichner says the deal with Alibaba includes full integration with the Alibaba Cloud eco-system including installation, monitoring tools, training, support, marketing, and sales.

As a first step, Alibaba Cloud will be the distributor of SQream in China and will actively be marketing and selling SQream to the Chinese market. This is important for a number of reasons -- beyond the sheer size of the market this opens up for SQream.

SQream also has partnerships with AWS, Azure, and IBM Bluemix. However, unlike Alibaba, all of them are also database vendors, which puts SQream in a co-opetition position. As cloud vendors are looking at workloads such as analytics and machine learning strategically, partnering with a cloud vendor that does not have its own horse in the race makes sense.

Competition in the cloud goes through GPUs, too.

(Image: Getty Images/iStockphoto)

It also brings us to an important question: How much of a future do GPU databases have as independent offerings? Any traditional, CPU-oriented database vendor could do what GPU databases do by redesigning its architecture to leverage GPUs.

It won't be easy, of course, but if the incumbents went for it, it would be hard to stop them. It's actually quite likely this would happen via acquisition, as exemplified by Blazegraph, which is now AWS Neptune. AWS killed two birds with one stone there, adding a GPU-based graph database to its arsenal.

Leichner says SQream was built from scratch for the GPU and holds patents for much of that work. He concedes that while existing vendors could try to port their DBMS to GPUs or build their own from scratch, it would seemingly make more sense for them to look at existing vendors in the GPU space as potential acquisition targets.

He also adds that SQream is not on the market to be acquired at this point in time, but he leaves a window open for the future. At this point, the X-IO partnership brings SQream closer to analytics on the edge.

Edge analytics is becoming increasingly important, and Leichner sees a key role for GPU databases there: reliably handling exponentially growing data stores, and streamlining their storage and analysis to deliver the business intelligence organizations need to compete.

When asked to comment on use cases for edge analytics and the competition posed there by, for example, Hadoop vendors such as MapR, he replied that SQream sees them not as competitors but as complementary solutions that can be part of the same ecosystem.

Regardless of how this space evolves through the mergers and acquisitions we expect going forward, it will be interesting to watch the benefits of GPU parallelism become commoditized.

