A little over a year ago, when I started my company, I was able to find a small office in the Empire State Building. I'm on the 72nd floor facing south, so the view is amazing. I wish I had better Internet service options, though; I've realized it's just not that attractive to service providers to pull their cables to the top of such a tall, old building. Over time, however, I've decided that the building might be more tech-savvy than I realized. That's because, with only a little contrivance, I believe I can use the building to explain MapReduce, without using code.
One of the things I do in my work is follow market share figures for various smartphone platforms. I typically rely on the findings of the larger analyst firms to figure out what's what, but I dream of one day getting my own numbers instead. It struck me recently that if I had a little more pull at the ESB, I could just total up the different smartphone handsets, by platform, in the building. After all, the building has a good distribution of city and suburban dwellers, different income levels, and a large enough population to have its own 5-digit ZIP code.
As I continue this data-gathering daydream, I think through how I could go about counting all these cell phones. I certainly couldn't do it myself. Even if I had the patience and the speed, the inefficiencies in getting between floors would hurt my performance: the elevators can be slow, and no employee in the building is happy about people who get on and then off one floor later.
But then I have an idea. Since every floor has a fire warden whose job it is to count people, maybe I could use those folks as my agents on each floor. Each fire warden could go into each suite on his or her floor and write down, on a separate piece of paper for each major smartphone platform, the platform name and the total number of handsets in that suite. I could tell the fire wardens to create a separate sheet of paper, per suite, for iOS, Android, BlackBerry, Windows Phone, webOS and Symbian, and could also tell them to disregard other phones. Each fire warden would likely have multiple sheets per platform, of course, since each sheet's count would correspond to a particular suite on the floor. But that's just fine.
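None of this needs code, but if you think better with something on screen, here is a rough Python sketch of one fire warden's rounds. Everything in it is made up for illustration: the suite numbers, the handsets in each suite, and the function names `count_suite` and `fill_envelope` are my own placeholders, not anything official.

```python
from collections import Counter

# The six platforms the wardens are told to count; anything else is ignored.
PLATFORMS = {"iOS", "Android", "BlackBerry", "Windows Phone", "webOS", "Symbian"}

def count_suite(handsets):
    """One warden's visit to one suite: one sheet per platform seen there."""
    counts = Counter(h for h in handsets if h in PLATFORMS)
    # Each (platform name, handset count) pair stands for one sheet of paper.
    return sorted(counts.items())

def fill_envelope(floor):
    """Gather the sheets from every suite on one floor into one envelope."""
    envelope = []
    for handsets in floor.values():
        envelope.extend(count_suite(handsets))
    return envelope

# Hypothetical floor: suite number -> handsets spotted in that suite.
floor_72 = {
    "7201": ["iOS", "iOS", "Android", "flip phone"],
    "7202": ["Android", "BlackBerry", "iOS"],
}

envelope_72 = fill_envelope(floor_72)
print(envelope_72)
```

Notice that Android and iOS each show up on two sheets in the envelope, once per suite. The warden makes no attempt to combine them, which is exactly the "multiple sheets per platform" situation described above.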
When the fire wardens were done in all suites, they could put all their sheets in an envelope and drop it in the mail chute (in the hypothetical case that the chutes were still in use). I could be waiting in the lobby, and when I knew that all fire wardens had completed their work, I could go around to the mailboxes at each chute and collect the envelopes with the smartphone count sheets.
As a next step, I'd go sit at the security desk, open all the envelopes and sort the sheets, by smartphone platform, into six new piles, putting each pile in its own envelope. I'd have one intern bring two of the new envelopes up to the 10th floor, another intern bring two more to the 20th, and my third intern bring the last two to the 30th floor. The fire warden on each of those three floors would open an envelope, total up the counts on the individual sheets, and write down the platform name and that grand total on a new sheet of paper. He or she would then repeat the process for the other envelope, writing its platform name and handset total on the same sheet of paper as the first. Each of my three interns would then take these new sheets from the fire wardens up to my office on the 72nd floor, where an assistant would be waiting. He'd then put the data from all three sheets of paper into a single spreadsheet, with platform names in column A and handset counts in column B. And with that I'd have my smartphone stats for the building. With the help of the friendly fire wardens, I'd get my answer pretty quickly too.
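For anyone who wants to see the sorting and totaling on screen, here is a small Python sketch of the security-desk step and the wardens' totals. It is only an illustration under my own assumptions: the envelope contents are invented, the names `sort_into_piles` and `total_pile` are hypothetical, and the three wardens' work simply runs one pile after another rather than on three floors at once.

```python
from collections import defaultdict

# Made-up envelopes from the mail chutes: one per floor, each holding
# (platform, handset count) sheets, one sheet per platform per suite.
envelopes = [
    [("iOS", 2), ("Android", 1), ("Android", 1), ("BlackBerry", 1)],
    [("iOS", 3), ("Symbian", 1), ("Android", 4)],
    [("webOS", 1), ("Windows Phone", 2), ("iOS", 1)],
]

def sort_into_piles(envelopes):
    """The security-desk step: regroup every sheet by platform."""
    piles = defaultdict(list)
    for envelope in envelopes:
        for platform, count in envelope:
            piles[platform].append(count)
    return dict(piles)

def total_pile(platform, counts):
    """What a warden on the 10th, 20th or 30th floor does with one pile."""
    return platform, sum(counts)

piles = sort_into_piles(envelopes)

# The assistant's spreadsheet: platform in column A, total in column B.
spreadsheet = sorted(total_pile(p, c) for p, c in piles.items())
for platform, total in spreadsheet:
    print(f"{platform}\t{total}")
```

The point to notice is that `total_pile` only ever looks at one platform's pile, so it doesn't matter which warden gets which envelope, or how many wardens there are. That is what lets the three floors work at the same time.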
This example's not perfect, and I might update this post over time to make it more so. But if you can understand the process I just explained, then you can understand MapReduce. Just let this stuff sink in for a bit. In my next post, I'll introduce the vocabulary (jargon?) used in MapReduce-speak to explain what the building employees, suite numbers, smartphone platform names, handset counts, fire wardens, sheets of paper, and the final spreadsheet represent.