
Google shares more of its secret sauce: Protocol buffers

Written by Ed Burnette, Contributor

It's a common problem in computer science: how do you get data from one part of your program to another part? What if the two parts were written by different people, at different times, in different languages, on different machines? Search giant Google has to deal with this issue all the time, only at a bigger scale than most of us. This week they shared the solution they use, a home-grown technique called Protocol Buffers.

Protocol Buffers allow you to define simple data structures in a special definition language, then compile them to produce classes that represent those structures in the language of your choice. These classes come complete with heavily optimized code to parse and serialize your messages in an extremely compact format.
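To give a feel for what "extremely compact" means, here is a minimal hand-written sketch of the binary encoding in Python. It is not the code the protocol compiler generates. The `Point` message, its field numbers, and every function name are hypothetical, made up for illustration. The sketch only handles non-negative integers and strings.

```python
# Hand-rolled sketch of the protocol buffer wire format for a hypothetical
# message definition (an assumption for illustration, not from the article):
#
#   message Point {
#     required int32 x = 1;
#     required string label = 2;
#   }
#
# Real projects would use the classes generated by the protocol compiler.

WIRE_VARINT = 0  # wire type for integer fields
WIRE_BYTES = 2   # wire type for length-delimited fields (strings, bytes)


def encode_varint(value):
    """Encode a non-negative integer 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data, pos):
    """Return (value, next_position) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_point(x, label):
    """Serialize the hypothetical Point: each field is a tag, then a payload."""
    raw = label.encode("utf-8")
    return (
        encode_varint((1 << 3) | WIRE_VARINT) + encode_varint(x)
        + encode_varint((2 << 3) | WIRE_BYTES) + encode_varint(len(raw)) + raw
    )


def decode_point(data):
    """Parse the bytes back into a {field_number: value} dictionary."""
    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == WIRE_VARINT:
            value, pos = decode_varint(data, pos)
        elif wire_type == WIRE_BYTES:
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[field_number] = value
    return fields


message = encode_point(150, "hi")
print(message.hex())          # 08960112026869
print(len(message))           # 7
print(decode_point(message))  # {1: 150, 2: 'hi'}
```

The whole message takes seven bytes because field names never appear on the wire. Each field is identified only by its number, packed into a single tag byte together with its wire type.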

You can think of protocol buffers as a cross between XML and an IDL (Interface Definition Language). Compared to XML, they have several advantages for serializing structured data. From the documentation:

 Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically
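The size claim is easy to check on a small scale. The following Python sketch compares a whitespace-free XML rendering of a two-field record with a hand-built binary encoding of the same data. The record, the field numbers, and the `string_field` helper are hypothetical, and the helper takes a shortcut that only works for small field numbers and short strings.

```python
# Size comparison for a hypothetical record with two string fields.
# Field numbers 1 (name) and 2 (email) are assumptions for illustration.

name, email = "John Doe", "jdoe@example.com"

# The same data as XML with all optional whitespace removed.
xml = f"<person><name>{name}</name><email>{email}</email></person>"


def string_field(field_number, text):
    """Encode one string field as tag byte + length byte + UTF-8 payload.

    Shortcut for this sketch: field numbers below 16 and payloads under
    128 bytes, so the tag and the length each fit in a single byte.
    """
    raw = text.encode("utf-8")
    if field_number >= 16 or len(raw) >= 128:
        raise ValueError("outside the range this sketch supports")
    wire_type = 2  # length-delimited
    return bytes([(field_number << 3) | wire_type, len(raw)]) + raw


binary = string_field(1, name) + string_field(2, email)

xml_size = len(xml.encode("utf-8"))
binary_size = len(binary)
print(xml_size, binary_size)  # 69 28
```

Here the binary form is about 2.5 times smaller. This toy record is mostly text, which neither format can shrink, so the ratio for real messages will depend on how much of the data is numeric or structural.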

Protocol buffers were written by Kenton Varda, based on an original design by Sanjay Ghemawat, Jeff Dean, and others. The package is used widely inside Google:

Protocol buffers are now Google's lingua franca for data – at time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.

Version 2.0.0 Beta is available for download now. It supports C++, Java, and Python, but you can easily add support for other languages. All the source code is provided under the Apache license, and there are no restrictions on using this in free or commercial software.
