Stamford, CT-based analyst and market research firm Gartner released its annual data warehouse Magic Quadrant report Monday. On the one hand, data warehousing (DW) and Big Data can be seen as different worlds. But there's an encroachment of SQL in the Hadoop world, and Massively Parallel Processing (MPP) data warehouse appliances can now take on serious Big Data workloads. Add to that the number of DW products that can integrate with Hadoop, and it's getting harder and harder to talk about DW without discussing Big Data as well. So, the release of the Gartner data warehouse report is germane to the Big Data scene overall and some analysis of it here seems sensible.
The horse race
First, allow me to answer the burning question: who "won?" Or put another way, which vendor had, in Gartner's inimitable vernacular, the greatest "ability to execute" and "completeness of vision?" The answer: Teradata. Simply put, the company's 3-decade history; the great number of industry verticals with which it has experience; the number and diversity of its customers (in terms of revenue and geography); and the contribution of the Aster Data acquisition to product diversity really impressed Gartner.
But Teradata came out on top last year as well, and its price points mean it's not the DW solution for everyone (in fact, Gartner mentions cost as a concern overall for Teradata). So it's important to consider what else the report had to say. I won't rehash the report itself, as you can click the link above and read it for yourself, but I will endeavor to point out some overall trends, both in the report itself and in the market it describes.
Logical data warehouse
If there is any megatrend in the DW Magic Quadrant (MQ) report, it's the emergence of the logical data warehouse. Essentially, this concept refers to the federation of various physical DW assets into one logical whole, but there are a few distinct vectors here. Logical data warehouse functionality can refer to load balancing, dispersed data (wherein different data is stored in disparate physical data warehouses and data marts, but is bundled into a logically unified virtual DW), and multiple workloads (where relational/structured, NoSQL/semi-structured and unstructured data are integrated logically).
This multiple workload vector is a Big Data integration point too, with 10 of the 14 vendors in the report offering Hadoop connectors for their DW products.
In-memory is hot
In-memory technology, be it column store-based, row store-based, or both, and whether used exclusively or in a hybrid configuration with disk-based storage, is prevalent in the DW space now. Gartner sees this as a competitive necessity, and gives IBM demerits for being behind the in-memory curve. On the other hand, it refers three times to the "hype" surrounding in-memory technology, and generally attributes the hype to SAP's marketing of HANA. Meanwhile, Gartner notes that HANA's customer base doubled from about 500 customers at the end of June 2012 to 1,000 at the end of the year.
Support for R
Support for the open source R programming language seems to be accelerating in mainstream DW acceptance and recognition. Support for the language, used for statistics and analytics applications, is provided by 2013 DW MQ vendors Exasol, Oracle and SAP. Oracle offers a data connector for R, whereas Exasol and SAP integrate R into their programming and query frameworks.
I think it's likely we'll see adoption of R gain even more momentum in 2013, in the DW, Business Intelligence and Hadoop arenas.
Several players with customer counts at 300 or less
Not everything in the Gartner DW MQ report focuses on big, mainstream forces. Alongside mega-vendors like IBM, Oracle, SAP and Microsoft, or veteran DW-focused vendors like Teradata, the report includes several vendors with relatively small customer counts. The report says that 1010Data has "over 250" customers and Infobright "claims to have 300 customers." And those numbers are on the high side of small, with Actian (formerly Ingres) weighing in at "over 65" customers, ParAccel claiming "over 60," Calpont at "about 50 named customers" and the report explaining that Exasol "reports 38 customers in production and expects to have 50 customers by January 2013."
I'm not saying this to be snarky; it's an important reality check. Many of us in the press/blogger/analyst community, myself included, sometimes assign big-vendor gravitas to companies that actually have very few customers. Sometimes the tail wags the dog in this startup-laden industry, and readers should be aware of this.
That said, while ParAccel only claims "over 60" customers, one of its investors is Amazon, which licensed ParAccel's technology for its new Redshift cloud-based data warehouse service.
Multiple "form factors"
Another trend pointed out by Gartner is the variety of deployment/procurement configurations (or -- to use Gartner's term -- "form factors") that DW products are available in. The options offered by vendors include straight software licenses, reference architectures, appliances, Platform as a Service (PaaS) cloud offerings, and full-blown managed services, where vendors provision, monitor and administer the DW infrastructure. And, in the case of non-cloud options, vendors may base their pricing on number of servers, processor cores or units of data (typically terabytes). Sometimes they even let customers decide which model works best.
Many vendors offer several of these form factor and licensing options, and Gartner implies that the more such options a vendor offers, the better. Those that offer only one option may disqualify themselves from consideration by customers. Those that offer several, and especially those that allow customers the agility to move between deployment and pricing models, tend to score higher in customer satisfaction.
Data models
Speaking of models, Gartner makes special mention that HP and Oracle offer industry-specific DW data models and that Microsoft, through certain partners, does as well. Gartner sees this as an important feature in vendors' data warehouse offerings. I would agree: data models can quickly convey best practices and serve, at the very least, as useful points of departure for accelerating DW implementations.
HCatalog for metadata management
HCatalog, originally introduced by Yahoo/Hortonworks and now an Apache incubator project in its own right, acts as a metadata repository designed to unify storage and data management for Hadoop stack components like Hive, Pig and the Hadoop MapReduce engine itself. On the DW side of the world, ParAccel and Teradata are each integrating with HCatalog as a way to integrate Hadoop data into the DW design, rather than merely connecting to and importing that data. This would seem to indicate good traction for HCatalog, and perhaps we will see such support spread more widely next year.
Microsoft on the upswing
I think it's important to point out Gartner's coverage of Microsoft in this year's DW MQ report. Microsoft was in the Leaders Quadrant last year, but at its very lower-left corner, whereas this year it's smack in the center of that quadrant. Last year, the Redmond-based software giant led with its Fast Track data warehouse, based on its SQL Server Enterprise product. Its MPP data warehouse appliance, SQL Server Parallel Data Warehouse (PDW), had little momentum and few customers.
I once served on Microsoft's Business Intelligence Partner Advisory Council, and was initially unimpressed with the PDW product. It struck me at the time as a product created to give Microsoft credibility in the Enterprise-grade database game and provide peace of mind for customers, and less as a product that was actually designed to generate significant unit sales.
But things have turned around. A year later, the product is up to its third "appliance update" (and much better aligned with non-PDW editions of SQL Server) and a bona fide version 2.0 of the product is due later this year. Gartner says PDW has been adopted by 100 new customers over the last 18 months, and that adoption is likely to accelerate further as Dell's PDW-based appliance gains momentum.
The next version of PDW will include the new PolyBase component, which integrates PDW's MPP engine with data nodes in the Hadoop Distributed File System (HDFS) to provide true parallelized, non-batch, SQL query capability over Hadoop data.
And the next major version of SQL Server Enterprise will include an in-memory transactional database engine, code-named Hekaton. Add to that the ability to license SQL Server outright, obtain DW reference architectures for it, buy various SQL Server-based appliances, and to use SQL Server in the Amazon and Microsoft clouds (in Infrastructure as a Service or PaaS configurations) and the product's trajectory would seem to be upward.
What's it all mean?
No matter what you may think of the merits of Gartner's influence in the technology market, there's no denying that influence exists. The DW MQ report is extremely important and seems especially methodical, well thought out, and insightful this year. Analysts Mark A. Beyer, Donald Feinberg, Roxane Edjlali and Merv Adrian have produced a report that everyone in the field should read.