But why should you care about GPUs if you're not into AI, gaming, or crypto? Because GPUs can also accelerate your databases, and there's not a single organization today that isn't using a database.
Massive parallelism through GPUs
GPUs greatly accelerate operations that can be parallelized. This approach has been in use for a while now in massively parallel architectures such as Hadoop or Spark. The idea is to combine an array of database instances, each on a separate server, and then to use a master node that delegates subqueries to each one.
The individual servers execute their subqueries in parallel and send their result sets back to the master node, which combines them and returns a single result set to the client. GPUs make the same divide-and-conquer approach possible within individual servers, with CPUs taking the role of the master node.
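To make the delegate-and-combine pattern concrete, here is a minimal sketch in Python. It is not how SQream or any other product works internally. The function names (`partition`, `subquery`, `master_avg`) and the round-robin sharding are assumptions made for illustration, and Python threads stand in for the worker servers or GPU cores only to show the structure, not to deliver real speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Split rows round-robin into n_workers chunks (hypothetical sharding)."""
    return [rows[i::n_workers] for i in range(n_workers)]


def subquery(chunk):
    """Worker side: partial aggregate for an AVG over one chunk.

    Returns (sum, count) rather than a local average, because local
    averages cannot be combined correctly when chunks differ in size.
    """
    return sum(chunk), len(chunk)


def master_avg(rows, n_workers=4):
    """Master side: delegate subqueries, then merge the partial results."""
    chunks = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(subquery, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    print(master_avg(list(range(1, 101))))
```

The point of the sketch is the shape of the computation: each worker only ever sees its own chunk, and the master only ever sees small partial results, which is what lets the work spread across many servers or many GPU cores.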
So, can you just add some GPUs to the server hosting your database and expect to see a massive improvement in performance? Not so fast. First of all, not all database operations are parallelizable, and for the ones that are not, adding GPUs won't make a difference.
But even for operations that can be parallelized, databases have to be designed and implemented in a way that enables them to take advantage of the GPU architecture. In other words, it takes a special type of database to be able to capitalize on GPUs.
A few years, patents, and funding rounds later, SQream is run by Gal and VP R&D Razi Shoshani. Varakin has set off on new adventures, but the company has raised $15 million to date, has more than 50 employees, and is expected to grow to around 75 during 2018.
What are the options in GPU databases?
Today, at GTC, SQream is announcing the latest in a series of partnerships, and we took the opportunity to connect and discuss. Before we delve into the specifics of the partnership, however, let's take a step back to quickly review this space.
Apart from SQream, other GPU database players include Blazegraph, BlazingDB, Brytlyt, Kinetica, MapD, and PG-Strom. SQream positions itself in the analytics database market and differentiates by focusing on big workloads. According to David Leichner, CMO SQream:
"There are several vendors who develop and sell GPU databases. The ones you probably hear the most about are Kinetica and MapD. While SQream is often coupled with these two vendors in roundups about GPU databases, there is one main and several sub differentiators between the solutions.
First and foremost, Kinetica and MapD use in-memory storage. While this enables them to provide extremely fast analysis of up to say 5TB or 10TB of data, they are limited in scalability due to architecture as well as cost. You rarely see either of these talking about data stores of 20TB and up. SQream, on the other hand, is built for massive data stores.
SQream DB is the only GPU powered SQL analytics database that allows organizations to analyze 20 times the amount of data, from terabytes to petabytes, at up to 100 times faster at 10 percent of the cost and administration. A typical engagement would involve lift and shift from an existing MPP or distributed data lake to SQream DB, where SQream DB would do the heavy lifting data analytics."
Some of the reported results include 11.5TB of analytics processing per hour, ingesting and analyzing up to 1PB, queries running up to 2.5 times faster than other flash-based hardware solutions and consistent data rates of 3.2GB per GPU, more than double the peak performance measured with other solutions.
These results will be officially presented in a webinar on April 10, and as always, you should take them with a pinch of salt. What is perhaps more interesting, however, is the analysis on where the partnership fits in SQream's strategy, what's going on in GPU databases, and the interplay with cloud and machine learning workloads.
Are GPU databases a thing?
This comes as the latest in a series of partnerships for SQream: Nvidia, IBM, and Dell EMC as hardware and server partners; AWS, Microsoft Azure, and IBM Bluemix as cloud providers; Tata and Moyo as system integrators; and a number of storage and network partners. There are, however, two partnerships that stand out.
First, there is Tableau. Trying to position itself as an analytics database without a visualization layer would be a tough sell. So, SQream has partnered with Tableau to offer visual access to the fast analytics it promises.
Then, we also have Alibaba Cloud. SQream refers to Alibaba Cloud as a strategic partner, and this deserves some analysis. Leichner says the deal with Alibaba includes full integration with the Alibaba Cloud eco-system including installation, monitoring tools, training, support, marketing, and sales.
As a first step, Alibaba Cloud will be the distributor of SQream in China and will actively be marketing and selling SQream to the Chinese market. This is important for a number of reasons -- beyond the sheer size of the market this opens up for SQream.
It also brings us to an important question: How much of a future do GPU databases have as independent offerings? Any traditional CPU-oriented database could do what GPU databases do, by redesigning their architecture to leverage GPUs.
It won't be easy, of course, but if the incumbents went for it, it would be hard to stop them. It's actually quite likely this would be done via acquisition, as exemplified by Blazegraph, which is now AWS Neptune. AWS has killed two birds with one stone there, adding a GPU-based graph database to its arsenal.
Leichner says SQream was built from scratch for the GPU processor and holds patents for much of that work. He concedes that while existing vendors can also try to port their DBMS to GPUs or create their own from scratch, it would seemingly make more sense to look at existing vendors in the GPU space as potential acquisition targets.
He also adds that SQream is not on the market to be acquired at this point in time, but he leaves a window open for the future. At this point, the X-IO partnership brings SQream closer to analytics on the edge.
Edge analytics are becoming increasingly important. One of the key challenges where Leichner sees a use for GPU databases is handling exponentially growing data stores reliably, streamlining storage and analytics so that organizations can garner the business intelligence they need to compete.
When asked to comment on use cases for edge analytics and the competition posed there, for example, by Hadoop vendors such as MapR, his reply was that they don't see them as competition but as a complementary solution, or a solution that can also be part of the same eco-system.
Regardless of how this space evolves, and whatever mergers and acquisitions we see going forward, it will be interesting to watch the benefits of parallelism via GPUs become commoditized.