Guest Post: Curt Monash reports from the front lines of a database revolution.
Mike Stonebraker created much of the database management industry. Now he wants to blow it up. Via a project called H-Store [pdf file], he proposes to manage high-end OLTP databases entirely in RAM, with no disks, no redo logs, little multi-threading, and only optimistic locking. And by the way, he wants to replace SQL with Ruby or Python.
Here are answers to some basic questions about H-Store, to the best of my understanding.
What is H-Store?
H-Store is a research project, with an associated prototype DBMS, focused on high-volume online transaction processing (OLTP).
What's new about it?
H-Store is designed to run all in RAM, with no synchronous persistence. It further makes the assumption that most transactions are short-running. The researchers argue that 90-95% of the processing in conventional DBMS goes into concurrency, locking, and logging code that H-Store doesn't need.
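To make that argument concrete, here is a toy Python sketch of the general idea: all data in RAM, and short transactions run one at a time to completion, so there are no locks, no latches, and no redo log. This is not H-Store code; the names `SerialStore` and `transfer` are invented for illustration, and copying the whole dictionary per transaction is a simplification to get all-or-nothing behavior, not a claim about how the prototype handles aborts.

```python
from collections import deque


class SerialStore:
    """Toy in-memory store: one thread, one transaction at a time."""

    def __init__(self):
        self.data = {}        # all state lives in RAM
        self.queue = deque()  # pending transactions, in arrival order

    def submit(self, txn, *args):
        """Queue a transaction: a function taking the data dict plus args."""
        self.queue.append((txn, args))

    def run(self):
        """Run every queued transaction serially; return their outcomes."""
        results = []
        while self.queue:
            txn, args = self.queue.popleft()
            # Work on a shallow copy so an aborted transaction leaves no
            # partial writes behind (a simplification for this sketch).
            scratch = dict(self.data)
            try:
                result = txn(scratch, *args)
            except Exception as exc:
                results.append(("aborted", str(exc)))
                continue
            self.data = scratch  # commit: swap in the new state
            results.append(("committed", result))
        return results


def transfer(accounts, src, dst, amount):
    """A short transaction: move money between two accounts."""
    if accounts.get(src, 0) < amount:
        raise ValueError("insufficient funds")
    accounts[src] -= amount
    accounts[dst] = accounts.get(dst, 0) + amount
    return accounts[src]


store = SerialStore()
store.data = {"a": 100, "b": 0}
store.submit(transfer, "a", "b", 30)
store.submit(transfer, "a", "b", 500)  # will abort
outcomes = store.run()
```

Because nothing else touches the data while a transaction runs, the concurrency-control machinery simply has nothing to do. The bet only pays off if transactions really are short, since one long-running transaction would stall everything queued behind it.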
How can H-Store be ACID-compliant without disk?
By running lots of replicated RAM-based copies, geographically dispersed.
Are they really going to go without disk?
I don't think so. H-Store won't do synchronous disk writes, and recovery will come from other RAM copies, not from disk. But snapshot checkpointing fits well into the architecture, and it would be silly not to provide that.
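As a rough illustration of the last two answers, the Python sketch below keeps several in-memory replicas, rebuilds a failed one from a surviving peer rather than from disk, and produces a snapshot that could be written out off the critical path. Everything here is assumed for illustration: `Replica`, `ReplicatedStore`, and the site names are hypothetical, and the "apply every write everywhere" loop stands in for whatever replication protocol H-Store actually uses.

```python
import copy
import json


class Replica:
    """One RAM-only copy of the data (hypothetical; not H-Store's design)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class ReplicatedStore:
    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}

    def write(self, key, value):
        # Assumption: a write counts as done once every replica has applied it.
        for replica in self.replicas.values():
            replica.apply(key, value)

    def fail(self, name):
        # Simulate a crash: RAM contents at that site are lost.
        self.replicas[name].data = {}

    def recover(self, name):
        # Recovery copies state from a surviving peer, not from disk.
        peer = next(r for n, r in self.replicas.items() if n != name)
        self.replicas[name].data = copy.deepcopy(peer.data)

    def snapshot(self, name):
        # A checkpoint is just a serialized copy; writing it to disk can
        # happen later, without making any transaction wait for it.
        return json.dumps(self.replicas[name].data, sort_keys=True)


cluster = ReplicatedStore(["boston", "providence", "new_haven"])
cluster.write("x", 1)
cluster.fail("boston")
cluster.recover("boston")
checkpoint = cluster.snapshot("boston")
```

The point of the sketch is the division of labor: durability comes from having more than one live copy, while the snapshot is a backstop that never sits in the path of a transaction.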
Who's behind H-Store?
H-Store boasts pretty much the same researchers who developed C-Store, which was commercialized as columnar data warehouse DBMS Vertica. Team members I've spoken with include Michael Stonebraker (MIT), Samuel Madden (MIT), and Daniel Abadi (Yale). Brown University is also involved.
When can I have it?
H-Store lags C-Store by about three years, and Vertica started enjoying significant sales late last year, so a first approximation would suggest H-Store will be useful some time in 2010, with the first serious academic prototype being finished late this year. But let's not assume H-Store will succeed commercially as fast as C-Store did. It's one thing to adopt a complex-analytics product that will only ever have a handful of actual users, and quite another to bet a super-high-volume OLTP system on unproven technology.
Is H-Store going to be a complete replacement for Oracle?
No. Oracle does lots of things, and replacing it requires a variety of more specialized technologies. Stonebraker and I have been going back and forth about the exact list, but it's something like:
- High-end OLTP (Oracle, SQL Server, DB2 today – eventually H-Store)
- Mid-range OLTP (MySQL, PostgreSQL, EnterpriseDB, Progress)
- Row-based analytic (Teradata, Netezza, DATAllegro)
- Column-based analytic (Vertica, ParAccel, Infobright) – these also win for RDF
- Text and XML (Microsoft/FAST, Autonomy, Google, Coveo, Marklogic, Attivio)
- Embedded (SQL Anywhere, solidDB)
- Stream non-DBMS (Coral8, StreamBase, Apama)
- Big cloud sub-DBMS (MapReduce, Hadoop, SimpleDB)
Where can I learn more?
Academically, H-Store is described in a paper and slide presentation. I examined H-Store and its assumptions at some length over on DBMS2, where you'll also find an extensive analysis of other specialized database technologies. The H-Store team writes for the Database Column blog, and H-Store discussion is expected over there soon. Also related is a series of posts Stonebraker and I have done on general database diversity.