I have a problem - not that one :). This one involves that rarity for me: self-doubt about a technical recommendation, one that juggles system ownership issues against out-sourcing and customer co-location.
The whole thing started as a simple question about a hardware refresh for an application they've had running on a little AMD/Linux grid for about five years now. What the application does is take in client files measured in the hundreds of gigabytes (often terabytes) to produce a relatively small number of large images measuring in the four to eight gigabyte range.
Twenty years ago tapes came by truck; that became boxes of cassettes, then FedEx packs stuffed with optical platters and later DVDs, and now some customers want to do everything instantly over the internet.
Unfortunately the company is in Canada and network costs are well over American expectations: the two 100Mbps ports they currently maintain on their local metropolitan area network cost about $6,000 a month in total - and those ports are about 65% idle, with average monthly volume running just below 6TB between them. Note too that actual performance on links crossing peering boundaries on this piece of backbone often drops by an order of magnitude, so delivering a streamed 2GB image zip to a customer can take forty minutes to an hour.
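A quick back-of-envelope check on those link numbers, sketched in Python. The 30-day month, the decimal units (1GB = 8,000 megabits), and rounding "just below 6TB" up to 6 are my assumptions; the function names are made up for illustration.

```python
# Back-of-envelope arithmetic on the link figures quoted above.
PORTS = 2
PORT_MBPS = 100             # line rate per port, megabits per second
MONTHLY_COST_USD = 6_000
MONTHLY_VOLUME_TB = 6       # "just below 6TB", rounded up (assumption)
DAYS_PER_MONTH = 30         # assumption
IMAGE_GB = 2                # size of the streamed image zip


def transfer_minutes(size_gb, rate_mbps):
    """Minutes needed to move size_gb (decimal GB) at rate_mbps."""
    return size_gb * 8_000 / rate_mbps / 60


def effective_mbps(size_gb, minutes):
    """Throughput implied by moving size_gb in the given number of minutes."""
    return size_gb * 8_000 / (minutes * 60)


seconds_per_month = DAYS_PER_MONTH * 24 * 3600
# Mbps / 8 = MB/s; MB/s * seconds = MB; MB / 1,000,000 = TB
capacity_tb = PORTS * PORT_MBPS / 8 * seconds_per_month / 1_000_000
cost_per_tb = MONTHLY_COST_USD / MONTHLY_VOLUME_TB

print(f"theoretical monthly capacity: {capacity_tb:.1f} TB")
print(f"cost per TB actually moved:   ${cost_per_tb:,.0f}")
print(f"2GB at full line rate:        {transfer_minutes(IMAGE_GB, PORT_MBPS):.1f} min")
print(f"2GB at one tenth line rate:   {transfer_minutes(IMAGE_GB, PORT_MBPS / 10):.1f} min")
print(f"implied rate for 40-60 min:   {effective_mbps(IMAGE_GB, 60):.1f}"
      f" to {effective_mbps(IMAGE_GB, 40):.1f} Mbps")
```

Under those assumptions the two ports could move roughly 64.8TB a month, the company pays about $1,000 per terabyte actually moved, and a forty to sixty minute delivery works out to an effective 4.4 to 6.7Mbps - a bit worse than a flat order of magnitude below line rate. Measured by volume the ports look closer to 90% idle than 65%, so the 65% figure presumably reflects something else, such as busy hours; that's a guess on my part.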
I've suggested three scenarios to them:
- convert their software to Cell (specifically to Linux mini-supers made up from PlayStation boards), and license whole racks as appliances to their customers for operation by customers on customer premises.
- co-locate the grid with, and effectively outsource data center operations to, a high bandwidth data center operator in Omaha.
- do the traditional thing: upgrade the grid, upgrade to Gbit Ethernet ports, and charge customers who want internet turn-around a bandwidth surcharge.
I like the first option best: go appliance computing! Nothing wrong with this picture: it's positive for the company and its customers - and the software, written in C and F77 for BSD4.3 on VAX but targeted to an Elxsi 6400 and since converted first to SunOS and then to SuSE/MP, is easy to port (but hard to optimize) for Cell.
So what's wrong with this? Well, it's wholesale business change and even testing it properly requires putting other options on hold for at least six months, investing a bunch of expensive manpower, and risking the business relationship with a couple of important customers.
Option two, my second choice, is purely cost driven. Putting new machines into the Omaha center would cut processing time by an average factor of about four relative to the existing system, provide about eight times the burstable bandwidth for customers, and end up costing the company less than half what it pays out for data center operations now.
Equally importantly, the impact on customers is muted: internet customers don't typically care where the service is, and courier customers mostly wouldn't care because they tend to ship data from field offices, not head offices.
Unfortunately for my line of thinking here the business founder (and still majority owner) won't consider this - and the idea of moving processing without moving the company and its people strikes me as a plan to end the business badly: with some customer's confidential data falling into hands that aren't supposed to get it.
Option three is popular with some of the key players, but is a bet on the future being the past - and how often has that happened in anything IT related?
To complicate matters, the company has one competitor offering much the same service for roughly the same money (not how they see it, of course :) ), everybody's tight for cash, and volumes went through the floor last year when the boom died - so even casting internet transfers as an optional new, and premium, service has its risks.
What's needed for this decision is a nice, clear way to quantify the relative risks of each approach - and that's where I'm stumped. I can argue a strong case either for or against each of these three options - but risk numbers I haven't got; and risk numbers are what I need.
Ever see a cartoon in which the hero flies off a cliff or a building and floats serenely in the air before discovering the lack of support and plummeting down? That's me on this issue: serenely confident in the opinion that business change is the way to go, but entirely without a leg to stand on. Duh, anybody got any ideas? Real data on leaks from out-sourced versus hugged systems that take system complexity and change into account? How about real data on changing the business model to keep up with IT changes? How about just real data on the speed and reliability of large file exchange across the internet?