Eric Frenkiel, CEO and Co-Founder of MemSQL, reached out to introduce himself and his company after reading a few of my recent comments on big data, distributed SQL database products, and memory-based database products.
Where's the bottleneck?
Most of the time, the performance of database products is limited by the performance of the underlying storage and network infrastructure. Traditional RDBMS products offered by suppliers such as Oracle, IBM, Microsoft and the like have done their best to address this bottleneck by pulling data into memory caches.
Suppliers of storage devices have developed flash-based storage appliances, in-system caches, storage server caches and flash-based storage devices to address the same bottleneck using a different approach.
Frenkiel believes that a better approach is to build a new generation of database software designed from the ground up to use in-system memory across a cluster of systems. In his view, this is the best approach for extreme transactional systems, analytics systems and even big data applications. So MemSQL's architecture uses system memory first and other storage mechanisms second.
MemSQL also has built-in support for tiered storage. Its tiered storage architecture pairs an in-memory row store with the ability to push columns of data out to flash- or disk-based stores. The software automatically moves data from memory to flash to disk as needed, based on policies.
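To make "moves data from memory to flash to disk as needed, based on policies" concrete, here is a minimal Python sketch of policy-driven tiering. It is not MemSQL's implementation: the class name TieredStore, the per-tier item-count capacities and the least-recently-used demotion policy are all hypothetical choices for illustration, since the article does not say which policies MemSQL applies.

```python
from collections import OrderedDict
from typing import Any, Optional


class TieredStore:
    """Hypothetical three-tier store (memory -> flash -> disk), not MemSQL's design.

    Assumed policy: each tier holds at most N items, and when a tier overflows,
    its least-recently-used items are demoted to the next tier down.
    """

    def __init__(self, capacities: dict[str, Optional[int]]) -> None:
        # Tier names in priority order, fastest first; None means unbounded.
        self.order = list(capacities)
        self.capacities = capacities
        self.tiers: dict[str, OrderedDict] = {name: OrderedDict() for name in self.order}

    def put(self, key: str, value: Any) -> None:
        # New or updated data always lands in the fastest tier.
        for tier in self.tiers.values():
            tier.pop(key, None)
        self.tiers[self.order[0]][key] = value
        self._rebalance()

    def get(self, key: str) -> Any:
        for tier in self.tiers.values():
            if key in tier:
                value = tier.pop(key)
                # Accessed data is promoted back to the fastest tier.
                self.tiers[self.order[0]][key] = value
                self._rebalance()
                return value
        raise KeyError(key)

    def tier_of(self, key: str) -> Optional[str]:
        for name, tier in self.tiers.items():
            if key in tier:
                return name
        return None

    def _rebalance(self) -> None:
        # Walk the tiers fastest-first, demoting overflow to the next tier down.
        for i, name in enumerate(self.order[:-1]):
            tier, cap = self.tiers[name], self.capacities[name]
            while cap is not None and len(tier) > cap:
                key, value = tier.popitem(last=False)  # least recently used
                self.tiers[self.order[i + 1]][key] = value
```

With capacities of {"memory": 2, "flash": 2, "disk": None}, writing five keys "a" through "e" leaves "d" and "e" in memory, "b" and "c" on flash and "a" on disk; reading "a" pulls it back into memory and pushes older data down. A real engine would size tiers in bytes and might use access frequency rather than recency.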
One of the things that interested me most was MemSQL's approach to parsing and executing SQL commands. Frenkiel said that MemSQL uses patented code generation technology to create query execution plans that eliminate the need for interpretation along hot code paths. That is, a repeatedly executed SQL statement is processed once and then runs as machine code thereafter.
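The general idea of "compile once, run many times" can be sketched in Python, with the caveat that this is not MemSQL's patented technique: Python bytecode stands in for machine code, and the PlanCache class, its single-predicate query shape and its cache key are hypothetical. The point it illustrates is that plans are keyed by query shape rather than parameter values, so the second query with a different parameter skips code generation entirely.

```python
from typing import Any, Callable

Row = dict[str, Any]
Plan = Callable[[list[Row], Any], list[Row]]


class PlanCache:
    """Hypothetical cache of generated plans for 'SELECT * ... WHERE column op ?'.

    Python bytecode stands in for the machine code a real engine would emit.
    """

    _OPS = {"=": "==", "<": "<", ">": ">"}

    def __init__(self) -> None:
        self._plans: dict[tuple[str, str], Plan] = {}
        self.compilations = 0  # how many plans have been generated so far

    def get_plan(self, column: str, op: str) -> Plan:
        # Keyed by query shape, not parameter value, so repeated queries
        # with new parameters reuse the already compiled code.
        key = (column, op)
        if key not in self._plans:
            self._plans[key] = self._generate(column, op)
        return self._plans[key]

    def _generate(self, column: str, op: str) -> Plan:
        if op not in self._OPS:
            raise ValueError(f"unsupported operator: {op}")
        # Emit specialized source with the column and operator baked in,
        # so the predicate is not re-interpreted for every row.
        source = (
            "def plan(rows, param):\n"
            f"    return [r for r in rows if r[{column!r}] {self._OPS[op]} param]\n"
        )
        namespace: dict[str, Any] = {}
        exec(compile(source, "<generated plan>", "exec"), namespace)
        self.compilations += 1
        return namespace["plan"]
```

For example, calling cache.get_plan("price", ">") with a parameter of 20 and then again with 30 generates code only on the first call; cache.compilations stays at 1.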
The database engine was also designed from the ground up to run in a highly distributed cluster environment. MemSQL's distributed query optimizer parses each query and then decomposes it into parts that execute in parallel. The engine also uses multiversion concurrency control (MVCC) and lock-free data structures to support highly concurrent data access without locking or sacrificing consistency.
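The core of multiversion concurrency control is that writers append new versions instead of overwriting data, so readers can work from a consistent snapshot without waiting on locks. The sketch below shows only that versioning idea under stated assumptions: the MVCCStore class and its integer commit timestamps are hypothetical, it is single-threaded, and it does not demonstrate the lock-free data structures the article mentions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    value: Any
    commit_ts: int


class MVCCStore:
    """Hypothetical single-process MVCC sketch, not MemSQL's engine.

    Every write appends a new version rather than updating in place, so a
    reader holding an older snapshot keeps seeing a consistent view.
    """

    def __init__(self) -> None:
        self._versions: dict[str, list[Version]] = {}
        self._last_commit_ts = 0

    def begin_snapshot(self) -> int:
        # A snapshot is simply the latest commit timestamp when the reader starts.
        return self._last_commit_ts

    def write(self, key: str, value: Any) -> int:
        # Allocate the next timestamp and append a new version. A real engine
        # would allocate timestamps atomically and use lock-free version chains.
        self._last_commit_ts += 1
        self._versions.setdefault(key, []).append(Version(value, self._last_commit_ts))
        return self._last_commit_ts

    def read(self, key: str, snapshot_ts: int) -> Optional[Any]:
        # Return the newest version committed at or before the snapshot.
        for version in reversed(self._versions.get(key, [])):
            if version.commit_ts <= snapshot_ts:
                return version.value
        return None
```

A reader that takes a snapshot after balance is set to 100 keeps reading 100 even after a later write sets it to 50, while a new snapshot sees 50. Old versions would eventually need garbage collection once no snapshot can see them.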
Most of the database companies I've spoken to recently have focused on taking some established open source database software, such as MySQL or PostgreSQL, and extending it with technology that allows it to effectively manage multiple systems. Although doing that can dramatically improve overall performance and/or scalability, the architecture is still based upon a philosophy that data should reside on disks somewhere. The best of these products hold data in memory or in flash-based caches part of the time, but their design still requires that data eventually end up on disk.
MemSQL looked at the same problem as everyone else, but thought about it differently: why not automatically place the same data items on several different systems? That way, consistency and reliability can be maintained while delivering even higher levels of performance.
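As a rough illustration of automatically placing the same data on several systems, the following Python sketch stores each item on a fixed number of nodes chosen by hashing the key. Everything here is an assumption for illustration: the ReplicatedStore class, hash-ring placement, the replication_factor parameter and the rule that writes require every replica to be up are hypothetical, and the article does not describe how MemSQL chooses or synchronizes replicas.

```python
import hashlib
from typing import Any


class ReplicatedStore:
    """Hypothetical sketch: every item is stored on several nodes automatically."""

    def __init__(self, nodes: list[str], replication_factor: int = 2) -> None:
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and len(nodes)")
        self.ring = list(nodes)
        self.rf = replication_factor
        self.data: dict[str, dict[str, Any]] = {n: {} for n in nodes}
        self.down: set[str] = set()

    def replicas_for(self, key: str) -> list[str]:
        # Deterministic placement: hash the key to a starting node, then take
        # the next rf - 1 nodes around the ring, so replicas are always distinct.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)] for i in range(self.rf)]

    def put(self, key: str, value: Any) -> None:
        # Synchronous replication: refuse the write unless every replica is up,
        # so all copies stay identical.
        replicas = self.replicas_for(key)
        if any(n in self.down for n in replicas):
            raise RuntimeError(f"replica unavailable for {key!r}")
        for node in replicas:
            self.data[node][key] = value

    def get(self, key: str) -> Any:
        # Any live replica can serve the read.
        for node in self.replicas_for(key):
            if node not in self.down and key in self.data[node]:
                return self.data[node][key]
        raise KeyError(key)

    def fail(self, node: str) -> None:
        # Simulate a node going offline.
        self.down.add(node)
```

With a replication factor of 2, a read still succeeds after one replica fails and only raises KeyError once both copies are gone. Refusing writes when a replica is down is one simple way to keep copies consistent; real systems typically promote a surviving replica instead.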
I recommend that companies needing to develop extremely high-performance transactional or analytics systems that rely on SQL databases consider MemSQL's approach.