Microsoft's Trinity: A graph database with web-scale potential

Microsoft's annual internal TechFest research showcase kicks off on March 6. So what better time to check out Trinity, a graph database research project, from Microsoft Research?

It's a good day when I finally find new information about a Microsoft codename I first heard a couple of years ago but could never learn more about.

One of my readers (thanks, Gregg Le Blanc) sent me a link to a Microsoft Research page on codename Trinity, which is a "graph database and computing platform."

Given that this week brings TechFest, Microsoft Research's internal showcase for Microsoft employees (with March 6 being the day that Microsoft allows select media and guests to tour some of the exhibits), it's a good time to talk about yet another Microsoft Research project.

Here's Microsoft's explanation of codename Trinity:

"Trinity is a graph database and graph computation platform over distributed memory cloud. At the heart of Trinity is a distributed RAM-based key-value store. As an all-in-memory key-value store, Trinity provides fast random data access. This feature naturally makes Trinity suitable for large graph processing. Trinity is a graph database from the perspective of data management. It is a parallel graph computation platform from the perspective of graph analytics. As a database, it provides features such as data indexing, concurrent query processing, concurrency control. As a computation platform, it provides vertex-based parallel graph computation on large scale graphs."

And here's the requisite architectural diagram:

Trinity is built on top of the distributed memory-storage layer called "memory cloud." Utility tools provided by Trinity include a "fast billion node graph generator," the Trinity Shell and various management tools.
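To make the idea of a graph sitting on top of a RAM-based key-value store a bit more concrete, here is a minimal Python sketch. It is not Trinity's API (the code isn't public); the class `ToyMemoryCloud`, the function `load_graph` and the modulo partitioning scheme are all hypothetical names and choices made for illustration. Each node ID is a key, its list of neighbors is the value, and keys are spread across a few in-memory partitions that stand in for machines.

```python
from collections import defaultdict


class ToyMemoryCloud:
    """Hypothetical stand-in for a partitioned, all-in-memory key-value store."""

    def __init__(self, num_partitions=4):
        # Each partition is a plain dict, standing in for one machine's RAM.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Assumption: integer keys are assigned to partitions by simple modulo.
        return self.partitions[key % len(self.partitions)]

    def put(self, key, value):
        self._partition_for(key)[key] = value

    def get(self, key, default=None):
        return self._partition_for(key).get(key, default)


def load_graph(store, edges):
    """Store an undirected graph as node id -> list of neighbor ids."""
    adjacency = defaultdict(list)
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    for node, neighbors in adjacency.items():
        store.put(node, neighbors)


store = ToyMemoryCloud(num_partitions=2)
load_graph(store, [(1, 2), (1, 3), (2, 4)])
print(store.get(1))  # neighbors of node 1
```

Looking up a node's neighbors is a single key lookup in one partition, which is the kind of fast random access the Microsoft description credits for making an in-memory store suitable for graph processing.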

According to the Trinity page, the Trinity code is available only via the Microsoft intranet at this time. So why is it interesting? One potential use of Trinity is people search within a network. The Trinity applications page offers as an example a search within a "Web-scale social network," like, say, Facebook. Microsoft's Bing search engine can check a user's Facebook network to see if there's anything relevant to pull, but doing so is a massive task that needs to be completed quickly.

In the researchers' demo, which used the example of someone with 130 Facebook friends, a two-hop query (one that reaches friends of friends) could be conducted in 10 milliseconds using Trinity. A three-hop query would take 100 ms, the researchers said.
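For a rough sense of why each extra hop costs so much more, here is a small Python sketch of a hop-limited search. It is an illustration only, not the researchers' code: the function names `within_hops` and `max_nodes_at_hop`, the toy friend graph, and the assumption that every person has exactly 130 friends with no overlap are all mine.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return every node reachable from start in at most max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)  # the starting person is not part of the result
    return seen


def max_nodes_at_hop(friends_per_person, hops):
    """Upper bound on people exactly `hops` away, assuming no shared friends."""
    return friends_per_person ** hops


# Hypothetical toy friend graph: ann - bob - dee - eli, plus ann - cy.
toy = {
    "ann": ["bob", "cy"],
    "bob": ["ann", "dee"],
    "cy": ["ann"],
    "dee": ["bob", "eli"],
    "eli": ["dee"],
}

print(sorted(within_hops(toy, "ann", 2)))
print(max_nodes_at_hop(130, 2), max_nodes_at_hop(130, 3))
```

Under that simplified assumption, 130 friends apiece gives at most 130 × 130 = 16,900 people at two hops and 130 × 130 × 130 = 2,197,000 at three, so each additional hop multiplies the number of random lookups the store has to serve.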

Another possible Trinity application is Probase, another Microsoft Research project designed to improve machine understanding of human communication. A first release of Probase was made available for download in May 2011. Trinity is the underlying infrastructure for the Probase knowledge base.

Version .06 of the Trinity manual is downloadable (as of January 2012). There's also a Hanselminutes podcast about Trinity dating from August 2011 which I never knew about until now.

Given Microsoft's increasing focus on big data and analytics, a project like Trinity seems as if it could be a natural fit for one of Microsoft's product groups....

Update: Here's a post I missed last year mentioning Probase and Trinity from ReadWriteWeb, which also mentions Microsoft's no-longer-active Dryad project.