Technology web site TechCrunch is one of those staples (like ZDNet, of course) to which we all turn for news and analysis on the companies shaping the Web. Their CrunchBase directory provides a wealth of information on the companies and people featured in their stories (and elsewhere, as it's editable by anyone), and they recently took the step of opening up an API to the data.
As he reports on his blog, Benji Nowack has created Semantic CrunchBase, an expression of the CrunchBase content as the 'Linked Data' about which Sir Tim Berners-Lee and others are currently so passionate. Remember,
“Linked Open Data is the Web done as it should be.”
Benji is continuing to add features to his demonstration, and will be covering some of them (including the intriguing-sounding 'Pimp my API') in future posts to his blog.
"Imagine [writes Nowack] you are looking for a job in California at a company that is at a specific funding stage. CrunchBase knows everything about companies, investments, and has structured location data. CrunchBoard on the other hand has job descriptions, but only a single field for City and State, and not the filter options to match our needs."
And then stop imagining, and just run the query.
"This is where Linked Data shines. If we find a way to link from CrunchBoard to CrunchBase, we can use Semantic Web technology to run queries that include both sources. And with SPARQLScript, we can construct and leverage these links. Below is a script that first loads the CrunchBoard feed of current job offers (only the last 15 entries, due to common RSS' limitations/practices, the use of e.g. hAtom could allow more data to be pulled in). In a second step, it uses the company name to establish a pattern join between CrunchBoard and CrunchBase, which then allows us to retrieve the list of matching jobs at stage-A companies with offices in California."
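Nowack's SPARQLScript is not reproduced here, but the idea behind the join he describes can be sketched in plain Python. Treat the following purely as an illustration under stated assumptions: the records are made up, the field names (`funding_stage`, `office_states`) and the `normalize` and `match_jobs` helpers are hypothetical, and none of it reflects the real CrunchBase or CrunchBoard schemas or the query Nowack actually ran.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    """A CrunchBase-style company record (hypothetical fields)."""
    name: str
    funding_stage: str      # assumed encoding, e.g. "a", "b", "seed"
    office_states: tuple    # state codes of known offices


@dataclass(frozen=True)
class Job:
    """A CrunchBoard-style job posting (hypothetical fields)."""
    title: str
    company_name: str
    location: str           # single free-text "City, State" field


def normalize(name: str) -> str:
    """Build a case- and whitespace-insensitive key for company names."""
    return " ".join(name.lower().split())


def match_jobs(jobs, companies, stage, state):
    """Join jobs to companies on company name, then filter by stage and state."""
    by_name = {normalize(c.name): c for c in companies}
    matches = []
    for job in jobs:
        company = by_name.get(normalize(job.company_name))
        if company is None:
            continue  # no link could be established for this posting
        if company.funding_stage == stage and state in company.office_states:
            matches.append((job.title, company.name))
    return matches


# Made-up sample records, for illustration only.
COMPANIES = [
    Company("Acme Widgets", "a", ("CA", "NY")),
    Company("Globex", "b", ("CA",)),
    Company("Initech", "a", ("TX",)),
]
JOBS = [
    Job("Backend Engineer", "acme  widgets", "San Francisco, CA"),
    Job("Designer", "Globex", "Palo Alto, CA"),
    Job("Sysadmin", "Initech", "Austin, TX"),
    Job("Data Analyst", "Unknown Startup", "San Jose, CA"),
]

if __name__ == "__main__":
    for title, company in match_jobs(JOBS, COMPANIES, stage="a", state="CA"):
        print(f"{title} at {company}")
```

The company name is the only thing the two sources share in this sketch, which is why it is normalized before matching. A job whose company cannot be found on the other side simply drops out of the results, and the state filter uses the company's structured office data rather than the job's free-text location field.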
For more information on Benji, listen to a podcast interview he did with my colleague Danny Ayers earlier this year.