Even for big data enthusiasts, HTAP (hybrid transactional/analytical processing) is probably YAA: yet another acronym. Cryptic acronyms aside, though, the notion of having one database to process both transactional and analytical workloads has been a recurring one through the years. This is the road that MarkLogic, like a few others before it, is walking down.
The road to Jericho
MarkLogic started out as a pure-play XML database back in 2001. The founders had a background in search, so the notions of shared-nothing architecture and inverted indices came naturally, even though they were not widely replicated until recently. You could say that MarkLogic was somewhat of a maverick or maybe ahead of its time, and it therefore had a hard time hitting the mainstream. I have seen numerous executives either fail to grasp what it's all about or simply not see the use for it.
But, eventually, MarkLogic found its way to the enterprise, largely thanks to the publishing industry. It was around 2003 when Google started becoming a threat to paper publishers and MarkLogic started making its name. It did that by helping traditional publishers transition to a new era of content management, going from books and journals to apps delivering high-value content to the right people.
It was a good match, as publishers have a lot of unstructured information they could use a database for, and that's what MarkLogic touted itself as: a database for unstructured content, based on its schema-less data model. MarkLogic promotes its data model as a self-describing format that accommodates all kinds of data and is flexible enough that you no longer need to know your schema in advance.
Sound familiar? It does sound a lot like NoSQL, so when the moniker and ensuing hype appeared, MarkLogic found itself on the same bandwagon as relative newcomers like MongoDB or CouchDB. Then there was the big data frenzy, which seemed to bring an end to the mandate that one database does it all. While that gave MarkLogic some traction, it also added to MarkLogic's competition and made it diversify its offering.
Traditionally, MarkLogic has been competing with the likes of Microsoft FAST and Oracle. While it still sees Oracle as its arch-rival, noting the scope and context of this rivalry says something about the evolution of MarkLogic as well as the industry at large. As MarkLogic gained access to the enterprise, it started seeing its product displacing traditional databases in different scenarios.
For some, MarkLogic was their transactional database of choice because of the ability to support unstructured and dynamic content, full ACID compliance, and distributed transactions with failover/DR within/across clusters (the kind of enterprise-oriented features that most NoSQL offerings lack). Others started using it for analytical purposes because of the way MarkLogic's indexing works.
MarkLogic has its roots in XML, but at the end of the day, representation may not be all that important. XML is verbose, but it compresses well because it's repetitive, so MarkLogic optimized its data-model-aware compression and developed intelligent, search-engine-style indexes. That made it a good match for ad-hoc queries against ad-hoc data that cannot be handled effectively by traditional solutions like hypercubes.
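To give a feel for why search-engine-style indexes suit ad-hoc queries, here is a minimal sketch of an inverted index in Python. It is a generic illustration, not MarkLogic's implementation; the documents and the `build_index` and `search` helpers are made up for the example.

```python
from collections import defaultdict

# Hypothetical documents, invented for illustration.
DOCS = {
    "d1": "quarterly revenue report for publishing",
    "d2": "publishing contract and revenue terms",
    "d3": "engineering design report",
}


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, *terms):
    """Return the sorted ids of documents containing every given term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


INDEX = build_index(DOCS)
```

Because every term is indexed up front, a query such as `search(INDEX, "revenue", "publishing")` needs no schema or predefined dimensions; it simply intersects the posting sets for the two terms.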
As a result, MarkLogic has found itself taking on both the Exalytics and the Endecas of the world, in addition to RDBMSs and data warehouses. It sees IBM and Oracle -- and, to a lesser extent, Microsoft -- as having tried and failed to compete with MarkLogic and XQuery due to their design assumptions, and it claims superiority over document stores on the grounds of being able to do more than documents and web apps.
The king is dead, long live the king?
Lately, MarkLogic has added another array of features to its arsenal, including semantics and the "bitemporal" capability. Semantics can be used to model entities in the database and represent the interrelationships between them, e.g., customers buy products.
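The "customers buy products" idea can be sketched as subject-predicate-object triples. The Python below is a generic illustration of that model rather than MarkLogic's API; the facts and the `match` and `buyers_of` helpers are hypothetical.

```python
from collections import namedtuple

# A triple states one fact: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical facts, invented for illustration.
TRIPLES = [
    Triple("alice", "buys", "laptop"),
    Triple("bob", "buys", "laptop"),
    Triple("alice", "buys", "phone"),
    Triple("laptop", "is_a", "product"),
    Triple("phone", "is_a", "product"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def buyers_of(triples, product):
    """Return the sorted names of everyone who bought the given product."""
    return sorted(t.subject for t in match(triples, predicate="buys", obj=product))
```

Relationships are then queried by pattern: `buyers_of(TRIPLES, "laptop")` walks the "buys" edges that point at the laptop.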
Bitemporal is used to track two dimensions of time: when something happened and when it was recorded in the database. For example, a firm can record a trading decision together with the data it was based on at the time, and separately record that a counterparty cancelled the trade half an hour later.
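The two time axes can be sketched in a few lines of Python. This is a simplified illustration of the bitemporal idea, not MarkLogic's implementation: the `Version` class, the `as_of` helper, and the sample trade history are all assumptions made for the example, and real systems also track end times on both axes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    value: str
    valid_from: datetime   # when the fact became true in the real world
    recorded_at: datetime  # when the database learned about it


def as_of(history: List[Version], valid_time: datetime,
          system_time: datetime) -> Optional[str]:
    """What did the database believe, at system_time, was true at valid_time?"""
    known = [
        v for v in history
        if v.recorded_at <= system_time and v.valid_from <= valid_time
    ]
    if not known:
        return None
    # The latest valid_from wins; ties go to the most recently recorded version.
    return max(known, key=lambda v: (v.valid_from, v.recorded_at)).value


# Hypothetical history: a trade opened at 10:00, then cancelled at 10:30,
# with the cancellation only reaching the database at 10:35.
DAY = (2017, 1, 2)
HISTORY = [
    Version("open", datetime(*DAY, 10, 0), datetime(*DAY, 10, 0)),
    Version("cancelled", datetime(*DAY, 10, 30), datetime(*DAY, 10, 35)),
]
```

Asking about 10:45 as the database stood at 10:32 still reports the trade as open, because the cancellation had not been recorded yet; asking the same question as of 10:40 reports it as cancelled. That difference is what makes it possible to reconstruct what was known when a decision was taken.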
But the most interesting new addition is probably the Optic API, which lets users look at data through different lenses. The same underlying data can be viewed as a document, a graph, or in tabular form supporting SQL. A new type of index has been introduced along with a new query optimizer for distributing workload across the cluster efficiently.
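To illustrate the "different lenses" idea, here is a minimal Python sketch in which the same stored documents are projected into rows and into graph facts. It does not use the Optic API itself; the sample orders and the `as_rows`, `as_triples`, and `total_by_customer` helpers are hypothetical.

```python
# Hypothetical source documents, invented for illustration.
ORDERS = [
    {"id": "o1", "customer": "alice", "product": "laptop", "amount": 1200},
    {"id": "o2", "customer": "bob", "product": "laptop", "amount": 1150},
    {"id": "o3", "customer": "alice", "product": "phone", "amount": 600},
]


def as_rows(docs, columns):
    """Tabular lens: project each document onto a fixed list of columns."""
    return [tuple(doc.get(col) for col in columns) for doc in docs]


def as_triples(docs):
    """Graph lens: derive (subject, predicate, object) facts from each document."""
    return [(doc["customer"], "buys", doc["product"]) for doc in docs]


def total_by_customer(docs):
    """Analytical query over the tabular lens: sum of amount per customer."""
    totals = {}
    for customer, amount in as_rows(docs, ["customer", "amount"]):
        totals[customer] = totals.get(customer, 0) + amount
    return totals
```

The documents are stored once; `as_rows` feeds SQL-style aggregation such as `total_by_customer(ORDERS)`, while `as_triples` exposes the same orders as relationships, with no copy of the data moved into a separate system.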
Perhaps ironically, it looks like MarkLogic is trying to be what it once set out to dethrone: the one database to rule them all. The king is dead. Long live the king, then? There are some serious battles to be won before making any such claims, but MarkLogic has a good set of weapons at hand for its dual-wielding style.
This article incorporates information from interviews conducted by Tony Baer with Joe Pasqua, MarkLogic's EVP of Products.