
Mash-ups, XML, and visualizing Flickr

Written by Phil Windley, Contributor
Still at WWW2006, I also attended a panel moderated by Rohit Khare on Mashups, Web data, and APIs. Other participants were Frank Mantek, Jeff Barr, Dan Theurer, and Kevin Lawver. These guys represent Google, Amazon, Yahoo!, and AOL, respectively. Perhaps the most interesting thing in the whole panel was Barr's discussion of best practices for companies exposing Web APIs:
  • Have a program - Rather than just throwing an API out there, plan and resource its management and use.
  • Get the business model right - Are you doing this to enhance your business, or is the API itself your business? You need a pricing strategy and a license.
  • Get the technology right - support SOAP, REST, and JSON. Support versioning and be as backwards compatible as possible.
  • Support developers - This means documentation and sample code.
  • Build community - This requires evangelism, forums, outreach, blogs, and lots of random interaction (which implies travel).

This afternoon, I went to a session on XML that was quite good. If you've ever wondered why XML parsers are so slow, then XML Screamer is for you--or maybe not. IBM isn't going to release it. Even so, you can read the paper and discover the techniques they used and build your own. What'd they do?

They built an integrated parser that pre-compiles schemas and optimizes across layers, avoids intermediate forms, and avoids format conversion. The result is a parser that's as much as 12 times as fast for certain tasks. Not bad. The conclusion: just because you can describe parsing as a stack of tasks doesn't mean you should build it that way.
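To make the "compile the schema, skip the intermediate forms" idea concrete, here is a toy sketch in Python. It is not IBM's technique or code: `compile_schema` and the flat `item` record are names I made up, and the regex shortcut ignores attributes, entities, namespaces, and comments. It only illustrates the shape of the idea: do the schema work once up front, then go from text straight to typed values without building a tree in between.

```python
import re

def compile_schema(record, fields):
    """Build a parser specialized for one flat record type (hypothetical helper).

    `fields` is a list of (element name, converter) pairs in document order.
    The schema is turned into one regex ahead of time, so parsing,
    checking field order, and type conversion happen in a single pass.
    """
    pattern = re.compile(
        rf"<{record}>"
        + "".join(rf"\s*<{name}>([^<]*)</{name}>" for name, _ in fields)
        + rf"\s*</{record}>"
    )
    names = [name for name, _ in fields]
    converters = [conv for _, conv in fields]

    def parse(text):
        records = []
        for match in pattern.finditer(text):
            # Convert each captured string directly to its typed value.
            records.append(
                {n: c(v) for n, c, v in zip(names, converters, match.groups())}
            )
        return records

    return parse

# Assumed example schema: an <item> with an integer id and a float price.
parse_items = compile_schema("item", [("id", int), ("price", float)])
items = parse_items("<items><item><id>7</id><price>9.5</price></item></items>")
print(items)
```

A layered parser would tokenize, build a tree, validate it against the schema, and only then convert strings to numbers. The sketch folds those steps together for one known schema, which is the trade the talk described: less generality, fewer passes.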

I also enjoyed the talk on symmetric queries in XML by Shuohao Zhang. The problem with something like XPath is that the query implies directionality (parent, child, next, previous, etc.) and so small changes in the schema require that queries be rewritten. Zhang shows how adding a new function to XPath called closest allows symmetric queries and a query language that's more tolerant of schema changes. His approach is linear and functions as a pre-processor to a standard XPath query.
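Here is a rough sketch of what a direction-free lookup could look like, using Python's standard `xml.etree.ElementTree`. This is my own illustration, not Zhang's definition or algorithm: I'm assuming "closest" means "the elements with a given name at the smallest tree distance from a context node, counting steps up and down alike," and the two sample documents are invented.

```python
import xml.etree.ElementTree as ET
from collections import deque

def closest(root, context, tag):
    """Return the elements named `tag` nearest to `context` in the tree.

    Distance counts edges in either direction (parent or child), so the
    caller never says whether the target is above, below, or beside.
    """
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}
    seen = {context}
    frontier = deque([(context, 0)])
    best, hits = None, []
    while frontier:
        node, dist = frontier.popleft()
        if best is not None and dist > best:
            break  # everything left is farther than the nearest hits
        if node is not context and node.tag == tag:
            best = dist
            hits.append(node)
            continue
        neighbours = list(node)
        if node in parent:
            neighbours.append(parent[node])
        for n in neighbours:
            if n not in seen:
                seen.add(n)
                frontier.append((n, dist + 1))
    return hits

# Two invented layouts of the same data: author as sibling vs. as ancestor.
doc_a = ET.fromstring(
    "<lib><book><title>Dune</title><author>Herbert</author></book></lib>"
)
doc_b = ET.fromstring(
    "<lib><author name='Herbert'><book><title>Dune</title></book></author></lib>"
)
for doc in (doc_a, doc_b):
    title = next(doc.iter("title"))
    print([el.tag for el in closest(doc, title, "author")])
```

With plain XPath, the first layout needs a sibling step and the second needs an ancestor step. The same `closest` call finds the author in both, which is the kind of schema tolerance the talk was after.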

Visualizing Flickr Tags was an interesting talk by Yahoo!'s Andrew Tomkins. Some interesting statistics: there are 87 million tag objects in Flickr and 1.26 million of them are unique. They wanted to build a visualization of representative tags and associated pictures for varying time intervals. Representative is defined as relatively unique in a given interval. The problem is that the naive algorithm for doing that is just too slow for longer intervals (like one month or one year). The paper provides details on a more sophisticated algorithm.
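For a feel of why the naive approach bogs down, here is a minimal sketch in Python. The scoring formula is my assumption, not the one from the paper: a tag's count inside the interval divided by its overall count plus a smoothing constant. The function name and the sample events are invented too.

```python
from collections import Counter

def representative_tags(events, start, end, top=3, smoothing=5):
    """Rank tags that are unusually frequent in [start, end).

    `events` is a list of (timestamp, tag) pairs. Every call rescans the
    whole list, which is what makes this approach slow at scale.
    """
    inside = Counter()
    overall = Counter()
    for ts, tag in events:
        overall[tag] += 1
        if start <= ts < end:
            inside[tag] += 1
    # Assumed score: share of a tag's uses that fall in the interval,
    # smoothed so a tag used only once or twice doesn't win automatically.
    scores = {tag: n / (overall[tag] + smoothing) for tag, n in inside.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]

# Invented data: "cat" shows up all the time, "worldcup" only in days 10-19.
events = [
    (1, "cat"), (2, "cat"), (3, "cat"),
    (11, "cat"), (12, "cat"), (13, "cat"),
    (11, "worldcup"), (12, "worldcup"), (13, "worldcup"),
    (21, "cat"), (22, "cat"),
]
print(representative_tags(events, 10, 20, top=2))
```

Both tags appear three times in the interval, but "worldcup" ranks first because nearly all of its uses fall there. Redoing that full scan for every interval a user might pick, over tens of millions of tag objects, is the cost the paper's algorithm is meant to avoid.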
