Yahoo! Pipes and the mashup pipedream

Yahoo! Pipes chokes on common date formats in RSS feeds, which makes many of the simplest feed mashups impossible to achieve.
Written by Phil Wainewright, Contributor

Ever since I first started playing with RSS feeds, I've always dreamt that one day someone would come up with an online tool that makes it easy to aggregate multiple feeds. For a brief half-hour today, I thought maybe Yahoo! Pipes, which launched yesterday, would prove to be the answer to my prayers. But no. Instead, it illustrates just how elusive remains the dream of easy data mashups.

What I wanted Yahoo! Pipes to do was create a composite of several different RSS feeds on the topic of web services and SOA. I first set up a web page to publish this selection of feeds in 2002, but I finally gave up on updating the page roughly a year ago when one of the feeds switched to FeedBurner and somehow broke the feedreading program I'd written. In any case I'd always wanted to upgrade the page to present a composite 'river of news' with the freshest items at the top, instead of publishing each individual feed separately.

This is the sort of mashup that Yahoo! Pipes ought to excel at, but it fails at a very simple hurdle. Let me start off though by paying tribute to the designers of Pipes and make clear that they have made it exceptionally easy to link to feeds and mix them together. The trouble with making it so easy to get thus far is that it just gets you to the next obstacle that much more quickly. It took me no more than a few minutes, aided by a quick scan of this introductory tutorial, to fetch two feeds, filter one of them, splice them and then run a sort. But here's the problem I encountered when I looked at the sort output:

Examine the <pubDate> field in each of the three feed items you can see in this screenshot. The pubDate is in the very widely used RFC 822 format specified in Dave Winer's popular RSS 2.0 specification. But Yahoo! Pipes doesn't sort it as a date. The program treats the field as text and sorts it alphanumerically. So in descending order, instead of starting with the most recent date, it lists all the Wednesdays first, then all the Tuesdays, and so on until it finishes (not shown) in a flourish of Fridays. Of course the sort order bears no relation to any kind of calendar order. And that's just for feeds that all use pubDates. Many feeds put their item dates into a <dc:date> or a <published> field, using ISO 8601 format. Yahoo! Pipes provides no mechanism for normalizing these various date formats so that a composite feed can be sorted by publication date.
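
To make the problem concrete, here is a minimal Python sketch of both behaviors: sorting the raw date strings as text, and normalizing them to real timestamps before sorting. It is not how Pipes works internally. The sample items and the names parse_feed_date, sort_as_text and river_of_news are invented for illustration, and treating a date with no timezone as UTC is an assumption of the sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Turn an ISO 8601 or RFC 822 date string into a UTC datetime."""
    text = text.strip()
    try:
        # ISO 8601, as typically found in <dc:date> or <published>.
        # Older Python versions reject a trailing "Z", so spell it out.
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        # Otherwise assume RFC 822, as found in an RSS 2.0 <pubDate>.
        parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        # Assumption of this sketch: dates without a timezone are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def sort_as_text(items):
    """Descending sort on the raw date string, the behavior described above."""
    return sorted(items, key=lambda item: item["date"], reverse=True)


def river_of_news(items):
    """Descending sort on the parsed date, freshest item first."""
    return sorted(items, key=lambda item: parse_feed_date(item["date"]), reverse=True)


# Hypothetical items from three different feeds, mixing both date formats.
items = [
    {"title": "A", "date": "Wed, 07 Feb 2007 10:00:00 GMT"},
    {"title": "B", "date": "Fri, 09 Feb 2007 08:30:00 GMT"},
    {"title": "C", "date": "2007-02-08T14:30:00Z"},
    {"title": "D", "date": "Tue, 06 Feb 2007 23:15:00 -0500"},
]

print([item["title"] for item in sort_as_text(items)])   # ['A', 'D', 'B', 'C']
print([item["title"] for item in river_of_news(items)])  # ['B', 'C', 'A', 'D']
```

The text sort puts Wednesday ahead of Tuesday ahead of Friday, with the ISO 8601 item trailing at the end. The date sort puts Friday's item first. It also places item D after item A: although D is stamped Tuesday, its -0500 offset makes it early Wednesday in UTC, which is exactly the kind of detail a text sort can never get right.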

This of course is the perfect illustration of why data mashups are so darned difficult. At least there's a chance that Yahoo! Pipes will overcome these problems without getting too complex, and I hope the Brickhouse team who are apparently responsible for the Pipes project will prioritize finding some straightforward solutions to this really fundamental stumbling block. [UPDATE (added Feb 10th): Kevin Cheng from the Pipes design team has posted a TalkBack comment to say "Fixing date sorting (and normalizing common formats) is one of our top priorities." That's great news, Kevin, thanks.]

But the problem here is the same problem I wrote about last summer when I described Google Maps as the fool's gold of mashups. It's all very well to do demonstration mashups that use deceptively well-structured data, but in the real world data structures are a semantic minefield. If the relatively shared semantics of RSS date fields contains so many pitfalls, imagine how much more difficult it is to mash up business-critical data from many different enterprise sources.

Nevertheless, having said all that, Yahoo! Pipes is a great advance since it brings these issues into sharp relief, when previously they were masked from view. If it can iron out some of these remaining crinkles and really start to provide meaningful utility then it will provide a real spur for people to get to grips with all the other semantic dissonances and perhaps want to make an effort to structure their data using more easily shared formats and semantics — and that can only be a good thing.

This is not my last word on Yahoo! Pipes and the whole notion of mashing up and linking data and processes from around the Web. Next week, I want to look at some other approaches in addition to Yahoo!'s new experiment, as well as exploring the potential impact that such tools can have in really unleashing the creative power of the Web.
