
What features would an ideal transaction-oriented big data database require?

The IT industry has decades of experience with traditional database engines that were designed to support transactional applications. Now the industry is exploring non-traditional database engines, such as Hadoop, for web-scale management of huge amounts of rapidly changing data. Hugely scalable transactional systems need something new. What do those requirements look like?
Written by Dan Kusnetzky, Contributor

A database startup, NuoDB, reached out to me to talk about a project it was working on to address the need for extremely scalable, web-scale, transaction-oriented database engines. The conversation got me thinking about which features would really be needed to meet this emerging demand.

The current crop of NoSQL database products typically doesn't address the requirements of transaction processing. The current SQL-based transactional database products, for their part, can't handle the volume, velocity or variety of data involved, nor do they offer the scalability and performance that today's big data applications need.

Here are the features a product would need to address to successfully support extreme transactional applications.

SQL

Although Structured Query Language (SQL) has been around since the early 1970s, it is still a mainstay of transactional applications on systems ranging from handheld devices to PCs to midrange systems to mainframes.

To be accepted by the industry, a new web-scale transactional database engine must support standard SQL, along with enough of the extensions offered by suppliers such as IBM, Microsoft or Oracle to make application portability easy.

Nearly all of today's big data databases offer only limited support for SQL.

ACID Transactions

ACID is an acronym made up of the words atomicity, consistency, isolation and durability.

Atomicity means that the database engine makes sure every transaction is "all or nothing": if any part of the transaction fails, the whole transaction is rolled back.

Consistency means that the database engine keeps the data in a valid state at all times, so every transaction must satisfy the rules, such as constraints, defined for the data as it is moved into and out of the database.

Isolation means that the database engine allows concurrent access but makes sure that concurrent transactions don't interfere with one another.

Durability means that once a transaction has been committed, its data will be preserved even if systems crash, power is lost or other system errors occur.
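To make atomicity and consistency concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is a single-node engine chosen only because it ships with Python, not an example of a web-scale database. The accounts table, its CHECK rule and the transfer() helper are hypothetical. The transfer credits one account before debiting the other, so a failed debit shows the earlier credit being rolled back as well. Isolation and durability are not demonstrated: an in-memory database disappears when the process ends.

```python
import sqlite3

def make_bank() -> sqlite3.Connection:
    """Hypothetical example schema: the CHECK constraint is the consistency rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (name, balance) VALUES (?, ?)",
        [("alice", 100), ("bob", 50)],
    )
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit leaves a partial change to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

After a successful transfer(conn, "alice", "bob", 30), the balances are 70 and 80. An attempt to move 500 out of bob's account then fails, and alice's balance stays at 70 rather than keeping the 500 credit.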

Most big data database engines don't attempt to support ACID transactions at all.

Data and application independence

If organizations are going to consider a web-scale database engine, it must support common data types and application programming interfaces. That way, an organization could simply drop in the new database engine, move its data into it, point its applications at the new database, and expect everything to function as before.
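In Python, one existing example of such a common interface is the DB-API 2.0 specification (PEP 249), which most database drivers implement. The sketch below assumes that convention. The hypothetical load_customers() and customer_count() functions use only DB-API calls and standard SQL, so in principle only the connect() call would change when pointing the application at a different engine. In practice, drivers still differ in details such as the query placeholder style, which each driver reports in its module-level paramstyle attribute (sqlite3 uses "qmark", meaning ?). Gaps like that are what a true drop-in replacement would have to close.

```python
import sqlite3

def load_customers(conn, rows) -> None:
    """Hypothetical loader that relies only on DB-API calls and standard SQL."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(100))")
    # Placeholder syntax is driver-specific; "?" matches sqlite3.paramstyle == "qmark".
    cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
    conn.commit()

def customer_count(conn) -> int:
    """Application code that never mentions a specific database engine."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM customers")
    (count,) = cur.fetchone()
    return count

# The only engine-specific line: swap this connect() call to change databases.
conn = sqlite3.connect(":memory:")
load_customers(conn, [(1, "Acme"), (2, "Globex")])
print(customer_count(conn))
```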

Elasticity

A web-scale transactional database should let organizations scale up and down as needed, bringing new servers into service or releasing them dynamically as the workload changes.
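Adding or releasing servers is only practical if the data doesn't all have to be reshuffled each time. One widely used technique for this is consistent hashing, sketched below. It is offered as an illustration of the general idea, not as the approach NuoDB or any other specific product uses, and the HashRing class, server names and virtual-node count are all hypothetical. Each server is placed at many points on a hash ring, and each key belongs to the first server point clockwise from the key's own hash. When a server is added, the only keys that change owner are the ones whose nearest point is now one of the new server's points.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Stable 64-bit hash, so placement is identical across runs (unlike hash()).
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes per server."""

    def __init__(self, servers=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        """Bring a server into service by placing its virtual nodes on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        """Release a server; its keys fall through to the next points clockwise."""
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def server_for(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

With three servers, adding a fourth should move roughly a quarter of the keys, all of them onto the new server, while the rest stay where they were. Removing the fourth server again restores the original placement.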

Multi-tenancy

Multi-tenancy means that a system can be shared by multiple organizations without any of them knowing about the others or being able to interfere in any way with the others' use of the system. It also means that each organization's data is completely inaccessible to the others.
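A minimal sketch of one common way to share a single database among tenants is to tag every row with a tenant identifier and scope every query to it. The code below uses Python's sqlite3 module, and the TenantSession wrapper, invoices table and tenant names are hypothetical. Filtering in application code like this is the weakest form of isolation, because a single missing WHERE clause leaks data. Guaranteeing that one tenant's data is totally unavailable to another would require the engine itself to enforce the boundary, for example through separate databases or schemas or through features such as row-level security.

```python
import sqlite3

def make_shared_db() -> sqlite3.Connection:
    """Hypothetical shared table: every row carries the tenant that owns it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        " id INTEGER PRIMARY KEY,"
        " tenant_id TEXT NOT NULL,"
        " amount INTEGER NOT NULL)"
    )
    return conn

class TenantSession:
    """Hypothetical wrapper: every statement is scoped to one tenant_id."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self._conn = conn
        self._tenant = tenant_id

    def add_invoice(self, amount: int) -> None:
        with self._conn:  # commit the insert as its own transaction
            self._conn.execute(
                "INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
                (self._tenant, amount),
            )

    def invoices(self) -> list:
        rows = self._conn.execute(
            "SELECT amount FROM invoices WHERE tenant_id = ? ORDER BY id",
            (self._tenant,),
        )
        return [amount for (amount,) in rows]
```

Two sessions created with TenantSession(db, "acme") and TenantSession(db, "globex") over the same connection each see only their own invoices.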

Geographic distribution

Big data databases offer a significant capability that more traditional databases don't: they can scale by adding new systems, and those systems may be located in data centers far away from one another. Traditional databases may offer the ability to cluster servers to increase performance or scalability, but the clustered systems typically must be no more than a few kilometers apart.
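One reason distance matters, assuming the cluster confirms each commit synchronously with its peers, is simple physics. Light in optical fiber travels at roughly 200,000 km/s, about two thirds of its speed in a vacuum. Every synchronous round trip therefore costs time proportional to distance before any processing happens. The hypothetical helper below computes that lower bound. A few kilometers adds only hundredths of a millisecond, while 1,000 km adds at least 10 ms to every commit that must wait for a remote acknowledgment.

```python
# Approximate signal speed in optical fiber (about two thirds of c); an assumption.
FIBER_KM_PER_SECOND = 200_000

def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound for one round trip, ignoring switching, queuing and processing."""
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1000

for km in (5, 100, 1000, 5000):
    print(f"{km:>5} km: at least {min_round_trip_ms(km):.2f} ms per round trip")
```

Real networks add routing detours and equipment delays on top of this floor, so actual latencies are higher.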

What else?

If you were designing a new database that supported today's transactional applications but could also deal with the huge, rapidly changing world of big data, which other features would you consider "must-haves"?
