
Where 2.0: Raffi Krikorian explains Twitter's geo infrastructure


People want to talk about places.

Foursquare-like apps are the most popular applications today among nerds. Twitter realized this early on, once people started tweeting about restaurants and places they loved. They saw tons of pictures being uploaded to Flickr and tweeted out with very little geo-data. If you know where a picture was taken, you have more context.

Twitter's APIs are meant to be very simple, said Raffi Krikorian, who spoke about "geostreams" early this morning at the O'Reilly Where 2.0 conference in San Jose. Early on, Twitter added a location field on their user object. It's still around today, and it's just a free-form text string; it can be anything.

You've seen "Location: iPhone (37.4,-122.7) etc". Even though it's just a text string, it's "geo-encodable". Twitter is using that location data in a bunch of places, mainly search and trending topics.
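To show what "geo-encodable" could mean for a free-form string, here is a minimal Python sketch of my own. It is not Twitter's parser; the function name `parse_location` and the regular expression are assumptions made for illustration.

```python
import re

# Matches a "lat,lon" pair such as "37.4,-122.7" anywhere in the string.
_COORD_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)")


def parse_location(text):
    """Return (lat, lon) if the free-form string holds a plausible pair, else None.

    Hypothetical helper: the real location field can hold anything,
    so most strings will simply return None here.
    """
    match = _COORD_RE.search(text)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject numbers that cannot be coordinates.
    if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
        return lat, lon
    return None
```

A string like "iPhone (37.4,-122.7)" yields a coordinate pair, while "San Jose, CA" yields nothing and would need a real geocoder to resolve.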

Raffi from Twitter

As you know, Twitter is limited to 140 characters. There isn't much room for location data in a tweet. Enter the Geotagging API, announced in November 2009. Instead of putting location on the user object, it puts it on the status object. It's out-of-band data, pure metadata that doesn't count against the 140 characters. And it has native Twitter support, using GeoRSS and GeoJSON for encoding. The search API takes latitude, longitude, and radius now too.
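To make the encoding concrete, here is a small Python sketch of my own. The GeoJSON part follows the GeoJSON convention of longitude first. The query part assumes the search parameter is a single `geocode` value shaped like "lat,lon,radius", which is how I remember it; treat the parameter name and the helper names as assumptions and check the API documentation before relying on them.

```python
from urllib.parse import urlencode


def geojson_point(lat, lon):
    """Build a GeoJSON Point. GeoJSON positions are longitude first, then latitude."""
    return {"type": "Point", "coordinates": [lon, lat]}


def search_query(term, lat, lon, radius_km):
    """Build a query string for a geo-restricted search.

    Assumption: the parameter is named "geocode" and takes
    "lat,lon,radius" with a unit suffix.
    """
    return urlencode({"q": term, "geocode": f"{lat},{lon},{radius_km}km"})
```

Keeping the location as structured metadata next to the text, rather than inside it, is what lets a search filter on it without parsing the tweet body.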

Privacy is considered in Twitter's API too. If you disable geotagging, they will go through your stream and delete all of the geographic coordinates from each tweet.

Trendsmap took Twitter's geoAPI and built a huge aggregation of trending topics on a map. Google Maps accepts Twitter's endpoints as a search query and plots all tweets on the map.

Twitter's "geo-hose" allows you to track bounding box locations in the world. It will deliver you all the tweets in those borders in real time.
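The filtering idea behind a bounding-box stream can be sketched in a few lines of Python. This is my own illustration, not Twitter's implementation; the `BoundingBox` type and the `"coords"` key on each tweet dictionary are hypothetical.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude

    def contains(self, lat, lon):
        """True if the point lies inside the box, edges included."""
        if not (self.south <= lat <= self.north):
            return False
        if self.west <= self.east:
            return self.west <= lon <= self.east
        # The box crosses the antimeridian (for example 170 to -170).
        return lon >= self.west or lon <= self.east


def tweets_in_box(tweets, box):
    """Yield only the tweets whose (lat, lon) falls inside the box.

    Assumption: each tweet is a dict with a "coords" key holding
    a (lat, lon) tuple, or None when the tweet is not geotagged.
    """
    for tweet in tweets:
        coords = tweet.get("coords")
        if coords is not None and box.contains(*coords):
            yield tweet
```

A real stream would apply the same test to each tweet as it arrives instead of to a list in memory.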

The trending API is the most exciting to me. Until now, the analysis of "hot conversations" on search.twitter.com has had no context, because the trends aren't localized. You can now query a particular location and find the trending topics for that spot.

Finally, there is a "geo-place" API. You can attach a named place to a tweet instead of a bare latitude and longitude. Sounds like Twitter is trying to rewrite the Geonames project. A place gives more context than a pair of raw coordinates.

The difference is that it's human-tagged, so it's not fully automatic. The data might be wrong if a computer does the analyzing; there are neighborhoods, and different areas of context that only a human could provide. You can reverse geo-code a lat/lon coordinate and Twitter's API will return a detailed object with a ton of information, including a polygon representation of the neighborhood containing the coordinates.
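Reverse geocoding against neighborhood polygons boils down to a point-in-polygon test. Below is a minimal Python sketch using the standard ray-casting method. It is my own illustration, not how Twitter does it; the `places` structure and both function names are hypothetical, and polygon vertices are assumed to be (lon, lat) pairs.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray casting: count how many polygon edges a ray going east crosses.

    polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly joined back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def reverse_geocode(lat, lon, places):
    """Return the name of the first place whose polygon contains the point.

    Assumption: places is a list of (name, polygon) pairs.
    """
    for name, polygon in places:
        if point_in_polygon(lat, lon, polygon):
            return name
    return None
```

The hard part is not this test but the polygons themselves, which is where the human tagging comes in.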

Twitter currently uses a MySQL-based spatial query system, but is switching over to Cassandra. For mapping out places, they decided against geohashes and other space-filling curves and picked R-Trees instead.
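For anyone who hasn't met geohashes, here is a short Python sketch of decoding one. This is the standard public geohash scheme, written by me for illustration; the function name `geohash_decode` is my own. Each character carries five bits, and the bits alternate between halving the longitude range and halving the latitude range.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_decode(code):
    """Return the (lat, lon) at the center of the cell named by the geohash."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in code.lower():
        value = _BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
```

Decoding "ezs42" lands at roughly latitude 42.605, longitude -5.603. Longer hashes narrow the cell further, which is handy for prefix lookups but awkward for places with irregular shapes.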

Raffi decodes a Geohash

Using R-Trees, Twitter can map all of the United States on one of their servers.
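To give a feel for why R-Trees suit this, here is a minimal Python sketch of the lookup side only. It is my own toy, not Twitter's code: the tree is built by hand, insertion and node splitting are left out, and the `Node` class and rectangle layout (min_lon, min_lat, max_lon, max_lat) are assumptions for illustration.

```python
from functools import reduce


def union(a, b):
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersects(a, b):
    """True if rectangles a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, children, is_leaf):
        # Leaf nodes hold (rect, name) entries; inner nodes hold child Nodes.
        self.children = children
        self.is_leaf = is_leaf
        rects = [c[0] for c in children] if is_leaf else [c.rect for c in children]
        # Each node stores the bounding rectangle of everything below it.
        self.rect = reduce(union, rects)


def search(node, query):
    """Return names of all entries whose rectangle intersects the query."""
    if not intersects(node.rect, query):
        return []  # prune the whole subtree
    if node.is_leaf:
        return [name for rect, name in node.children if intersects(rect, query)]
    found = []
    for child in node.children:
        found.extend(search(child, query))
    return found
```

The pruning step is the point: a query around San Francisco never looks at a subtree whose bounding rectangle only covers the East Coast.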

Raffi explained project Rock Dove, which kind of went over my head. It's basically their backend system to process geodata at "tweetspeed". It reminds me of Google's horizontal backend servers that make the service so speedy. For example, Rock Dove is smart enough to not wait for Foursquare's lagging servers to process a status object for a tweet. Twitter uses GeoRuby on the front end.

Twitter's geolocation to-do list includes geo-filtering on location and a few other things. I am very interested in how their platform will evolve, but I think in the meantime, they are spending a lot of effort to scale their ever-growing infrastructure. Now the Failwhale can find his way throughout the seas.