With the aid of atomic clocks, GPS receivers and some of the most esteemed figures in computer science, Google has crafted a planet-spanning distributed database.
Google published information about the database, named Spanner, over the weekend in a wide-ranging research paper. The paper (PDF) describes Spanner as "the first system to distribute data at global scale and support externally-consistent distributed transactions".
In simple terms, Google has managed to design an information store that spans its fleet of datacentres around the world and lets applications read (and, to a lesser extent, write) data without being crushed by huge latencies. Software using the system can replicate data across countries and continents, while having extremely fast read times.
"Spanner is impressive work on one of the hardest distributed systems problems — a globally replicated database that supports externally consistent transactions within reasonable latency bounds," Andy Gross, principal architect at Basho, a company that makes the Riak distributed database, told ZDNet.
Applications that use Spanner, such as Google's 'F1' advertising backend, can specify which datacentres contain which bits of data so that frequently read data can be located near users to reduce read latency. They can even specify how many datacentres store the data, to add as many layers of redundancy as there are Google datacentres.
Though Spanner-stored data can straddle the globe, due to latency concerns Google notes typical use cases will see applications spread their data across three to five datacentres in one geographic region.
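To see why keeping replicas close together helps, consider a minimal sketch which assumes that a read can be served by the nearest replica, while a write must be acknowledged by a majority of replicas before it completes. The article does not describe Spanner's replication protocol at this level of detail, so treat the model as an assumption. The function names and round-trip figures are invented purely for illustration.

```python
def read_latency_ms(client_rtts_ms):
    """Assumed model: a read is served by whichever replica is nearest the client."""
    return min(client_rtts_ms.values())


def write_latency_ms(follower_rtts_ms):
    """Assumed model: a write completes once a majority of replicas acknowledge it.

    follower_rtts_ms maps each follower to its round-trip time from the
    replica coordinating the write, which acknowledges itself at 0 ms.
    """
    replica_count = len(follower_rtts_ms) + 1
    majority = replica_count // 2 + 1
    acks = sorted([0] + list(follower_rtts_ms.values()))
    # The write finishes when the slowest member of the fastest majority replies.
    return acks[majority - 1]


# Hypothetical round-trip times in milliseconds, not measurements.
regional_followers = {"dc-b": 2, "dc-c": 3}
global_followers = {"europe": 90, "asia": 150, "s-america": 120, "australia": 180}

regional_write = write_latency_ms(regional_followers)  # 2
global_write = write_latency_ms(global_followers)      # 120
nearby_read = read_latency_ms({"dc-a": 1, "dc-b": 2, "dc-c": 3})  # 1
```

Under these assumptions the read cost depends only on the closest copy, whereas the write cost grows with the distance between replicas, which is consistent with the advice to keep most data within a single region.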
Spanner can offer this degree of geographic redundancy because Google has developed a way to give applications precise timestamps, which lets them write, read and replicate data without making mistakes.
"Spanner is impressive work on one of the hardest distributed systems problems" — Andy Gross, Basho
Spanner's 'TrueTime' API depends upon GPS receivers and atomic clocks that have been installed in Google's datacentres to let applications get accurate time readings locally without having to sync globally.
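The idea can be sketched in a few lines. The Spanner paper describes TrueTime as returning an interval that is guaranteed to contain the true time, rather than a single instant. The toy class below imitates that using the local clock and a fixed, assumed error bound (epsilon). A writer picks a timestamp at the top of the interval and then waits until that timestamp has definitely passed before making the write visible. The class and function names are hypothetical, and this is a simulation for illustration, not Google's implementation.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # true time is assumed to be no earlier than this
    latest: float    # true time is assumed to be no later than this


class SimulatedTrueTime:
    """Toy stand-in for a TrueTime-style clock with a fixed uncertainty bound."""

    def __init__(self, epsilon=0.005, clock=time.time):
        self.epsilon = epsilon  # assumed worst-case clock error, in seconds
        self.clock = clock      # injectable so the sketch can be tested

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, timestamp):
        """True only once `timestamp` has definitely passed."""
        return self.now().earliest > timestamp


def commit(tt, sleep=time.sleep):
    """Choose a commit timestamp, then wait out the clock uncertainty."""
    commit_ts = tt.now().latest
    while not tt.after(commit_ts):
        sleep(tt.epsilon / 2)
    return commit_ts
```

Because every machine waits out its own uncertainty, two commits made in sequence receive timestamps in the same order without the machines exchanging messages. The price is a wait of roughly twice epsilon per write, which is why tight clock synchronisation matters.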
"The physical mechanism of closely synchronised clocks acts as a substitute for what would otherwise require some form of distributed co-ordination protocol," Peter Bailis, a graduate student in distributed systems at UC Berkeley, told ZDNet. "At large scale and over wide-area networks, minimising communication is particularly important."
Spanner has been in development for around five years, with details of the technology first leaking to the press in 2009. Google hopes the database community will follow its lead in adopting a TrueTime-style system.
"As a community, we should no longer depend on loosely synchronised clocks and weak time APIs in designing distributed algorithms," the researchers write.
The paper's authors include Jeffrey Dean and Sanjay Ghemawat, key figures in the development of some of Google's most advanced technologies, including MapReduce and GFS, the systems that inspired Hadoop.
"Like Google's MapReduce and BigTable papers, the publication of the Spanner paper provides the open source database community with a valuable peek inside the architecture and operations of the largest distributed systems in the world," Basho's Gross said.
Spanner is the successor to Google's Megastore system. Bailis says Spanner appears to significantly outperform its predecessor, though the best applications for it will be ones that read a lot of data but write relatively little.
There are still a few caveats that could prevent broad Spanner adoption, such as the latency cost of geographically distributed write operations.
"Whether the cost is worthwhile ultimately depends on the application. Spanner's read-only transactions have substantially lower overhead, so for read-mostly web workloads, the cost isn't so large," he says.
Also, not every company has the scale or resources to link GPS systems and atomic clocks to the servers in its datacentres.
So far, Google has moved much of the F1 advertising platform from MySQL to Spanner and plans to migrate other applications in the future, while working to reduce data-access latency. In the research paper it notes that Gmail, Picasa, Google Calendar, the Android Market and its AppEngine cloud all use Megastore, making them potential candidates for a Spanner upgrade.
"The distributed systems community often looks to Google for battle-tested systems designs," Bailis said. He thinks the publication of Spanner could fertilise the technology industry and cause "a resurgence of popular interest in massively scalable distributed transaction processing systems in the near future".