Big data vs. traditional databases: Can you reproduce YouTube on Oracle's Exadata?

Want to know how disruptive so-called big data efforts can be to traditional database companies? Try replicating YouTube on Oracle hardware and software.
Written by Larry Dignan, Contributor

Increasing data requirements, especially for unstructured information such as video, are going to relegate relational databases to the enterprise scrap heap as an emerging breed of vendors chips away at traditional software powers.

That's the overview from Cowen & Co. analyst Peter Goldmacher. In a 75-page report, Goldmacher walks through the database landscape and concludes that the consensus view that the growth of data will boost traditional database vendors is dead wrong. Goldmacher said:

We believe the vast majority of data growth is coming in the form of data sets that are not well suited for traditional relational database vendors like Oracle. Not only is the data too unstructured and/or too voluminous for a traditional RDBMS, the software and hardware costs required to crunch through these new data sets using traditional RDBMS technology are prohibitive. To capitalize on the Big Data trend, a new breed of Big Data companies has emerged, leveraging commodity hardware, open source and proprietary technology to capture and analyze these new data sets. We believe the incumbent vendors are unlikely to be a major force in the Big Data trend primarily due to pricing issues and not a lack of technical know-how.

Oracle doesn't buy Goldmacher's take. On Oracle's most recent conference call, executives talked up big data and how it will benefit the company.

The crux of Goldmacher's argument that big data will crush traditional database companies revolves around cost. Emerging big data players can price better than large database players like Oracle that have margins to protect. In other words, Oracle would have to charge 9x more than the blended average of big data vendors to solve data conundrums.

Over time, this price differential, along with the growth of corporate unstructured data, will mean so-called big data players win. That means big fish like Oracle and IBM and middle-tier players---HP Vertica, EMC Greenplum and Teradata---will have to deal with the likes of Infobright, 1010 Data, Splunk and Cloudera.

To illustrate this point, Goldmacher did an interesting exercise in which he outlined how to replicate YouTube on proprietary enterprise systems. Here's what happens to costs when YouTube meets Oracle Exadata machines.

First, the assumptions: Goldmacher estimated that YouTube consumption---user uploads of 48 hours of video a minute and 3 billion videos a day along with roughly 45 petabytes of viewed videos a day---would require at least 9 full-rack Exadata machines at $1.5 million each. Handling spikes would take at least 18 Exadata machines. On top of those machines, the setup would add up to 14 Exalogic devices to serve data at $1.1 million per system. The software stack under Oracle would include WebLogic middleware, Oracle databases, Exadata optimized storage and Oracle as the operating system. The open source comparison included JBoss middleware, MySQL, Hadoop and Red Hat Enterprise Linux as the OS.
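To make the per-system prices concrete, here is a minimal Python sketch that totals only the two line items priced in the paragraph above. It assumes the 18 Exadata machines are the total count needed for spikes (not 18 on top of the first 9) and that all 14 Exalogic systems are bought; the function and constant names are illustrative. It leaves out storage, networking and software, so it is not the report's full hardware figure.

```python
# List prices quoted in the article, in millions of US dollars.
EXADATA_FULL_RACK_PRICE = 1.5
EXALOGIC_PRICE = 1.1


def list_price_total(exadata_racks: int, exalogic_systems: int) -> float:
    """Sum list prices for the given system counts, in $ millions.

    Covers only the Exadata and Exalogic line items named in the article.
    """
    total = (exadata_racks * EXADATA_FULL_RACK_PRICE
             + exalogic_systems * EXALOGIC_PRICE)
    return round(total, 1)


# Assumption: 9 racks is the baseline, 18 racks is the total sized for spikes.
baseline = list_price_total(exadata_racks=9, exalogic_systems=0)
peak = list_price_total(exadata_racks=18, exalogic_systems=14)

if __name__ == "__main__":
    print(f"Baseline Exadata racks: ${baseline}M")
    print(f"Spike-sized Exadata plus Exalogic: ${peak}M")
```

Under these assumptions the named systems come to a few tens of millions of dollars, which is only part of the hardware bill and a small slice of the overall total once licenses are included.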

The bottom line looks like this:

In a nutshell, the Oracle Exadata capital expenses for hardware and software total $589.4 million compared to an open source and commodity hardware cost of $104.2 million. Annual expenses (staff and support) are $99 million for Oracle Exadata and $15.1 million for an open source stack. The personnel costs are based on the nine-engineer staff of the original YouTube team.
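For a rough sense of how those figures compound, the sketch below adds the quoted annual expenses to the quoted capital expenses over a few time horizons. The horizons, the flat annual cost and the absence of discounting are assumptions made here for illustration; the article gives no time frame, and the function names are hypothetical.

```python
# Figures quoted in the article, in millions of US dollars.
ORACLE_CAPEX = 589.4
ORACLE_ANNUAL = 99.0
OPEN_SOURCE_CAPEX = 104.2
OPEN_SOURCE_ANNUAL = 15.1


def total_cost(capex: float, annual: float, years: int) -> float:
    """Upfront cost plus flat annual expenses over `years` (no discounting)."""
    return capex + annual * years


def cost_ratio(years: int) -> float:
    """How many times more the Oracle stack costs over the given horizon."""
    oracle = total_cost(ORACLE_CAPEX, ORACLE_ANNUAL, years)
    open_source = total_cost(OPEN_SOURCE_CAPEX, OPEN_SOURCE_ANNUAL, years)
    return oracle / open_source


if __name__ == "__main__":
    # Hypothetical horizons; 0 years means capital expenses only.
    for years in (0, 1, 3, 5):
        oracle = total_cost(ORACLE_CAPEX, ORACLE_ANNUAL, years)
        open_source = total_cost(OPEN_SOURCE_CAPEX, OPEN_SOURCE_ANNUAL, years)
        print(f"{years} yr: Oracle ${oracle:,.1f}M vs open source "
              f"${open_source:,.1f}M ({cost_ratio(years):.1f}x)")
```

On the quoted numbers, the Oracle stack costs about 5.7 times as much upfront, and the gap widens gradually as annual expenses accumulate.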

Here's a look at the hardware involved:

The open source hardware stack consists of HP server racks and storage, with Cisco Nexus switches.

But hardware is fairly simple. The beauty of Oracle's integrated hardware/software stacks---at least for the company---is the licensing and maintenance revenue stream.

Goldmacher noted:

At first glance, total core hardware costs of roughly $155M, just roughly 5% of Google’s current CapEx seem reasonable. This line of thinking lasts until Oracle presents the bill for its software: a not-insignificant $400M for database and Exadata storage licenses alone, bringing the total upfront investment to $570M.

Here's a look at the software costs:

And the open source side.

Now there are a few caveats. Goldmacher didn't create assumptions for in-memory databases like Membase because support pricing wasn't readily available. But overall, you get the picture. Big data may mean some large headaches for established relational database players looking to preserve chunky profit margins.

