Google's 'leap smear' helps it deal with errant seconds

Google has developed a technique that lets its computers automatically compensate for leap seconds, increasing data reliability and saving its engineers from coding time awareness into their applications.

Google has devised a way of keeping time coherent across its IT infrastructure even in the case of leap seconds, increasing the reliability of its web services.

The fix, called a "leap smear", adds milliseconds to servers' clocks incrementally throughout the day via the Network Time Protocol (NTP), the standard internet timekeeping service. By the time the leap second occurs it has already been accounted for, and the computers do not go awry as they have done in the past, Google announced on Thursday.

"Very large-scale distributed systems, like ours, demand that time be well-synchronised and expect that time always moves forwards," Christopher Pascoe, a site reliability engineer for Google, wrote in a blog post on Thursday. "Computers traditionally accommodate leap seconds by setting their clock backwards by one second at the very end of the day. But this 'repeated' second can be a problem."
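The repeated second Pascoe describes can be illustrated with a toy model. The sketch below is not Google's code: the function name `stepped_clock`, the leap position at 10 seconds and the quarter-second sampling interval are all assumptions chosen for illustration. It models a clock that is set back by one second at the leap and checks whether successive readings ever move backwards.

```python
def stepped_clock(true_elapsed, leap_at):
    """Reading (in seconds) of a hypothetical clock that is set back
    by one second once `leap_at` seconds of real time have passed."""
    if true_elapsed >= leap_at:
        return true_elapsed - 1.0
    return true_elapsed


# Sample the clock every 0.25 s of real time around a leap at t = 10 s.
samples = [9.5 + 0.25 * i for i in range(8)]
readings = [stepped_clock(t, leap_at=10.0) for t in samples]

# True if any reading is earlier than the one taken just before it.
went_backwards = any(b < a for a, b in zip(readings, readings[1:]))

print(readings)
print("time moved backwards:", went_backwards)
```

In this model the readings jump from 9.75 back to 9.0 at the leap, so two events a quarter of a second apart appear to have happened in the wrong order, which is the kind of ambiguity a system that expects time to always move forwards cannot tolerate.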

Leap seconds are added every so often (there have been 24 since their introduction in 1972) to compensate for the disparity between global timekeeping systems based on physical phenomena, such as atomic clocks, and time determined by the Earth's position within the solar system, according to Markus Kuhn, a senior lecturer in computer science at Cambridge University.

"Leap seconds... reconcile astronomical time, which is based on the rotation of the Earth, and physical time, which can be measured with amazing accuracy using atomic clocks," Kuhn wrote. "Tidal friction within the Earth, caused by the gravitational pull of both the Moon and the Sun, continuously slows down the daily rotation of our planet."

Leap smear automates timekeeping

Google said it decided to automate the handling of leap seconds across its infrastructure to avoid the problems the time change can cause for computers that operate in tandem at scale.

Leap seconds can cause hiccups in write operations, and errors can emerge from the interlinked large-scale infrastructure that Google operates, Pascoe wrote. A repeated second raises the question of what happens to data that comes in during that second. By gradually nudging Google's network time by a few milliseconds over the period immediately before a leap second, no large-scale disparities are introduced and the whole system gently drifts into the correct time by the moment the leap second occurs.
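The gradual nudging described above can be sketched in a few lines. The article does not say what curve or window length Google uses, so everything here is an assumption for illustration: a simple linear ramp, a hypothetical 20-hour window (`WINDOW`), and the made-up function names `smear_offset` and `smeared_clock`. The point is only that spreading the extra second over a long window keeps reported time moving forwards while still absorbing the full second by the end.

```python
WINDOW = 72000.0  # hypothetical smear window: 20 hours of real time, in seconds


def smear_offset(seconds_into_window, window=WINDOW):
    """Portion of the leap second already absorbed, from 0.0 to 1.0.

    Assumes a linear ramp; the real curve is not given in the article.
    """
    if seconds_into_window <= 0:
        return 0.0
    if seconds_into_window >= window:
        return 1.0
    return seconds_into_window / window


def smeared_clock(true_elapsed, window=WINDOW):
    """Time reported to clients: real elapsed time minus the absorbed offset."""
    return true_elapsed - smear_offset(true_elapsed, window)


# Sample every 10 minutes across the window and check the clock never steps back.
readings = [smeared_clock(float(t)) for t in range(0, int(WINDOW) + 1, 600)]
always_forward = all(b > a for a, b in zip(readings, readings[1:]))

print("halfway offset:", smear_offset(WINDOW / 2))    # half the leap second
print("end of window:", smeared_clock(WINDOW))        # one second behind real time
print("always moves forward:", always_forward)
```

Under these assumptions each real second is reported as very slightly less than a second, so by the end of the window the clock has fallen exactly one second behind real elapsed time without ever repeating or skipping a value.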

"Our systems are engineered for data integrity, and some will refuse to work if their time is sufficiently 'wrong'," Pascoe wrote.

Google decided to fix the problem after a leap second in 2005 caused some of its clustered systems to stop accepting work during the errant second.

Because the fix involves the central timekeeping service, all of Google's other systems need no changes or adjustments. Google no longer has to sweep its codebase for anything that deals with time, and Google engineers no longer have to write code that takes leap seconds into account. 

Besides fixing the leap second problem, Pascoe said Google has been able to use the leap smear investigation to develop techniques for better consistency and synchronisation across its infrastructure.

