Reproducing YouTube on Oracle Exadata

Increasing data requirements, especially unstructured information such as video, are going to relegate relational databases to the enterprise scrapheap, as an emerging breed of vendors chips away at traditional software powers.

That's the view of Cowen & Co analyst Peter Goldmacher. In a 75-page report, Goldmacher walks through the database landscape and concludes that the consensus view, that the growth of data will boost traditional database vendors, is dead wrong. Goldmacher said:

We believe the vast majority of data growth is coming in the form of data sets that are not well suited for traditional relational database vendors like Oracle. Not only is the data too unstructured and/or too voluminous for a traditional RDBMS, the software and hardware costs required to crunch through these new data sets using traditional RDBMS technology are prohibitive. To capitalise on the Big Data trend, a new breed of Big Data companies has emerged, leveraging commodity hardware, open source and proprietary technology to capture and analyse these new data sets. We believe the incumbent vendors are unlikely to be a major force in the Big Data trend primarily due to pricing issues, and not a lack of technical know-how.

Oracle doesn't buy Goldmacher's take. On Oracle's most recent conference call, executives talked up big data, and how it will benefit the company.

The crux of Goldmacher's argument that big data will crush traditional database companies is cost. Emerging big data players can undercut large database vendors like Oracle, which have margins to protect. By Goldmacher's reckoning, Oracle would have to charge nine times more than the blended average of big data vendors to solve the same data problems.

Over time, this price differential, combined with the growth of corporate unstructured data, will mean that so-called big data players win. Big fish such as Oracle and IBM, along with middle-tier players (HP Vertica, EMC Greenplum and Teradata), will have to contend with the likes of Infobright, 1010 Data, Splunk and Cloudera.

To illustrate the point, Goldmacher ran an interesting exercise in which he outlined what it would take to replicate YouTube on proprietary enterprise systems. Here's what happens to costs when YouTube meets Oracle Exadata.

First, the assumptions. Goldmacher estimated that YouTube's workload (48 hours of video uploaded per minute, 3 billion videos viewed per day and roughly 45 petabytes of video viewed per day) would require at least nine full-rack Exadata machines at US$1.5 million each, a figure that doubles to 18 machines to handle spikes. On top of those, the build would add up to 14 Exalogic systems, at US$1.1 million each, to serve data. The Oracle software stack would include WebLogic middleware, Oracle databases, Exadata-optimised storage and an Oracle operating system. The open source comparison used JBoss middleware, MySQL, Hadoop and Red Hat Enterprise Linux as the OS.
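
As a rough check, the two rack lines named above can be totalled directly. This minimal Python sketch uses only the unit prices and counts stated in the article; the function and constant names are hypothetical. It covers just these two line items, so it should not be read as a reconstruction of the report's roughly US$155 million core hardware figure quoted later.

```python
# Unit prices quoted in the article, in millions of US dollars.
EXADATA_UNIT_PRICE_M = 1.5   # full-rack Exadata machine
EXALOGIC_UNIT_PRICE_M = 1.1  # Exalogic system

def rack_hardware_cost(exadata_racks: int, exalogic_systems: int) -> float:
    """Hypothetical helper: combined price (US$ millions) of the two rack lines only.

    Other hardware in the report (storage, networking, etc.) is NOT included.
    """
    return (exadata_racks * EXADATA_UNIT_PRICE_M
            + exalogic_systems * EXALOGIC_UNIT_PRICE_M)

# Minimum Exadata count with no Exalogic, then spike capacity plus the maximum Exalogic count.
baseline = rack_hardware_cost(9, 0)
spike_ready = rack_hardware_cost(18, 14)
print(f"Baseline Exadata only: US${baseline:.1f} million")
print(f"Spike-ready rack lines: US${spike_ready:.1f} million")
```

The gap between this subtotal and the report's hardware total is a reminder that the rack prices are only part of the bill.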

The bottom line looks like this:

In a nutshell, capital expenses for hardware and software total US$589.4 million for Oracle Exadata, compared with US$104.2 million for open source software on commodity hardware. Annual expenses (staff and support) come to US$99 million for Oracle Exadata and US$15.1 million for the open source stack. The personnel costs are based on the nine-engineer staff of the original YouTube team.
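
To see how the upfront and annual figures compound, the sketch below adds a number of years of annual costs to each capital outlay. It is a minimal illustration assuming flat annual costs with no discounting, price changes or hardware refresh; those assumptions, and the helper name, are not from the report.

```python
# Cost figures from the article, in millions of US dollars.
COSTS_M = {
    "Oracle Exadata": {"capex": 589.4, "annual": 99.0},
    "Open source/commodity": {"capex": 104.2, "annual": 15.1},
}

def cumulative_cost(capex: float, annual: float, years: int) -> float:
    """Hypothetical helper: upfront spend plus `years` of flat annual staff
    and support costs. Assumes no discounting and no price changes."""
    return capex + annual * years

for years in (1, 3, 5):
    oracle = cumulative_cost(**COSTS_M["Oracle Exadata"], years=years)
    oss = cumulative_cost(**COSTS_M["Open source/commodity"], years=years)
    print(f"{years} yr: Oracle US${oracle:.1f}M vs open source US${oss:.1f}M "
          f"(ratio {oracle / oss:.1f}x)")
```

Because the annual cost ratio (about 6.6x) is higher than the upfront ratio (about 5.7x), the gap between the two stacks widens slightly the longer the system runs under these simplified assumptions.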

Here's a look at the hardware that was involved:

The open source hardware stack consists of HP server racks and storage, connected with Cisco Nexus switches.

But the hardware side is fairly simple. The beauty of Oracle's integrated hardware/software stacks, at least for the company, is the licensing and maintenance revenue stream.

Goldmacher noted:

At first glance, total core hardware costs of roughly US$155 million, just roughly 5 per cent of Google's current CapEx, seem reasonable. This line of thinking lasts until Oracle presents the bill for its software: a not insignificant US$400 million for database and Exadata storage licences alone, bringing the total upfront investment to US$570 million.

Here's a look at the software costs:

And the open source side:

Now there are a few caveats. Goldmacher didn't create assumptions for in-memory databases like Membase, because support pricing wasn't readily available, but, overall, you get the picture. Big data may mean some large headaches for established relational database players looking to preserve chunky profit margins.

Via ZDNet US