Google is today opening up Bigtable, the technology behind most of its flagship offerings, as a fully managed cloud NoSQL database service.
Set out by the search-to-cloud giant in an influential 2006 paper, Bigtable powers applications such as Gmail, Google Analytics and Google Search and is described by the company as designed for large ingestion, analytics and data-heavy serving workloads.
To date, Bigtable has not been directly available to the public, although it is the technology on which Google's schema-less NoSQL Cloud Datastore is built.
Now available in beta, Google Cloud Bigtable is accessed through the open-source Apache HBase API, making it natively integrated with much of the existing big-data and Hadoop ecosystem, the company said.
Cloud Bigtable integrates with other Google big-data products, such as messaging tool Pub/Sub, pipeline-builder Dataflow and analytics software BigQuery.
"It has a tremendously low, single-digit millisecond latency compared with other options out there and great price-performance, meaning the amount of data it can ingest, store and then write per dollar per month is extremely high," Cloud Bigtable product manager Cory O'Connor said.
Google says the new service offers twice the performance per dollar and half the total cost of ownership of its direct competitors.
"There's just this enormous amount of businesses that have these huge amounts of data and right now - we've talked to many of them - they're throwing away data or they're expiring it after a certain amount of time. They simply don't have the time horizon. They can't store enough data to be able to make these determinations," O'Connor said.
"All of this is being packaged in a product that you don't have to manage. Even if you had a piece of technology that could live up to these data sizes, managing has always been a challenge."
Creating or reconfiguring Cloud Bigtable is carried out through a simple user interface, with backing storage scaling automatically.
"When we say fully managed, this is not fully deployed or managed deployment. This is essentially an API that you provision with a guaranteed amount of server processor throughput behind it and unlimited flexible storage behind that as well," O'Connor said.
"What you have to do now, first off researching what database you want, getting licences for that database, getting support contracts for it, figuring out which VMs to use and prototyping the VM sizes and choosing memory - there are so many choices you have to make and so many numbers to research."
According to O'Connor, with storage, network, backup, and VMs to think about, conventional configurations are a complex business even without reckoning with deployment.
"That involves spinning up the VMs, deploying the software, configuring all the nodes - a ton of work going in there. For Bigtable you're literally going to pop into a website and the UI and you're going to say, 'I want a new cluster'," he said.
"It will ask for your name and essentially how much performance you want out of the cluster. You click the create button and within about two to three seconds, you've got the green little check box in the UI and you've got your cluster that's ready to do 100,000 reads and writes per second and scale the data to whatever you want immediately. Google has 10 years of history managing Bigtable. We know very well how to manage it."
O'Connor said the new service could, for example, be used by companies to move from an HBase or Cassandra cluster on premises or in the cloud.
"A lot of people in the Hadoop ecosystem say HBase is hard, even if they're experts at it. There are a lot of advantages that this has over running your own HBase or Cassandra cluster," O'Connor said.
"We see customers with multiple petabytes of data, reading and writing from the database a hundred thousand times a second and all sorts of various data, whether it's web data or sensor data. They have these instances, they have these databases and it's very hard to manage them."
The second area where Google expects Cloud Bigtable to find a role is in new projects in areas such as the internet of things, advertising, energy, financial services and telecoms.
Pricing is 65 cents per hour per Bigtable node, the unit in which performance is provisioned. Each node delivers up to 10,000 reads and writes per second, or about 10MB/s of throughput for scans, where there are no individual reads and writes.
Storage is billed on a pay-as-you-go basis at 17 cents per GB per month for SSD-based storage. There will soon be a lower cost alternative based on hard-disk storage at 2.6 cents per GB per month, the same price as Google Cloud Platform object storage.
"That's amazing because what you have is a very hot high-performance database running on a storage tier that's the same price as slower, colder, blob-based storage," O'Connor said.
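As a rough illustration of how the published figures combine, the sketch below estimates a monthly bill from a target request rate and a data volume. The prices and the 10,000 operations per second per node are the figures quoted above; the function names, the 730-hour month and the assumption that the node price is charged per hour are illustrative choices, not something Google has specified here.

```python
import math

# Figures quoted in the article.
NODE_PRICE_USD = 0.65        # price per node (assumed here to be per hour)
OPS_PER_NODE = 10_000        # upper bound on reads/writes per second per node
STORAGE_USD_PER_GB_MONTH = {
    "ssd": 0.17,             # available in the beta
    "hdd": 0.026,            # announced as coming later
}

# Assumption for illustration only: an average month of about 730 hours.
HOURS_PER_MONTH = 730


def nodes_needed(target_ops_per_sec):
    """Smallest whole number of nodes that covers the target rate."""
    return math.ceil(target_ops_per_sec / OPS_PER_NODE)


def monthly_cost(target_ops_per_sec, stored_gb, storage="ssd",
                 hours=HOURS_PER_MONTH):
    """Return (nodes, node_cost_usd, storage_cost_usd) for one month."""
    if storage not in STORAGE_USD_PER_GB_MONTH:
        raise ValueError(f"unknown storage type: {storage!r}")
    nodes = nodes_needed(target_ops_per_sec)
    node_cost = nodes * NODE_PRICE_USD * hours
    storage_cost = stored_gb * STORAGE_USD_PER_GB_MONTH[storage]
    return nodes, node_cost, storage_cost


if __name__ == "__main__":
    nodes, node_cost, storage_cost = monthly_cost(100_000, 1_000, "ssd")
    print(f"{nodes} nodes, ${node_cost:,.2f} nodes + ${storage_cost:,.2f} storage")
```

On these assumptions, the 100,000 reads and writes per second that O'Connor mentions corresponds to 10 nodes. Because each node delivers "up to" 10,000 operations per second, a real workload may need more.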
Data can be imported into the new service through an offline disk-based service or via an online transfer, where the data is scooped into an object store and from there into a Bigtable cluster.
O'Connor said the role of the HBase API in Cloud Bigtable will help reassure companies over potential fears about finding themselves locked into Google.
"Since this is delivered through the standard HBase open-source API and because we're providing easy services to import and export in standard formats, it makes it very easy for someone to say, 'You know what? I'll buy this. I know that if there's a reason why I don't like this, it's easy to get the data out into exactly the same system that was running it before'," he said.
"Many people have said that's one of the main reasons why they're ready to take petabytes of data and dump it into this - because they have that assurance. They feel good about that open-source nature of the interface."
For security, Google is providing replicated storage and encryption of all data in flight and at rest.
The company has worked with a number of partners to help firms build applications on Cloud Bigtable, including Sungard for financial data platforms, Pythian for monitoring, CCRi for real-time geospatial analysis and Telit Wireless Solutions for data ingestion.
The beta is available initially in Google's central US region, Europe and APAC, with other geographies to follow.
"We've got a long future of betas that we're going to be releasing - different types of storage, different features running management, replication, monitoring, alerting - we've got a ton of different features that we have internally in Bigtable that HBase has and that we want to go along with," O'Connor said.
"We will release for general availability before the end of the year. But even after general availability, this is something that Google believes is tremendously valuable and the features will not stop before GA."