Solaris vs AIX: The people problem

The people problem at the heart of this whole business of trying to predict data center costs under alternative Sun and IBM scenarios is simple and disconcerting: it's the people in charge, not the technologies, that really make the difference.

Please be aware, as I noted last week, that experience in both Sun and IBM environments has left me less than objective on the issue - and I'm therefore relying on readers to catch any errors I may make.

That said, I think that it is only possible to fairly compare the five year costs for a data center under alternative Solaris/SPARC and AIX/Power scenarios if you have a real data center staffed by real people, and therefore a real decision, to work from. There are two levels to this: first, IBM's pricing secrecy means that you can't get real numbers without first convincing IBM's sales people that you're ready and able to write the check; and, more subtly, operational differences, particularly with respect to applications and technology management, dominate all other long term cost components.

Given that Sun's hardware and software are generally cheaper, faster, and more versatile than IBM's, you'd expect Sun data centers to be generally cheaper and more effective than the IBM-centric ones, but this isn't always true. In fact it's true only if that Sun center is run by people who understand what Solaris can do and how it should be used.

You don't often find that in larger data centers. What you find instead are decision makers who got their formative experience in data processing or Wintel and, correspondingly, often make counter-productive decisions about Solaris deployments - meaning, among other things, that the company they work for is often better served if senior management quietly accepts higher costs and reduced performance as the price of operational stability.

The big reason for this, I believe, is that someone who knows how to run Unix and does a good job at it becomes invisible to upper management. No crises means no high profile crisis meetings and correspondingly no visibility to top management - with the unhappy consequence that the better you are at the job, the less chance you'll have of building the visibility and budget management record needed to promote up and out.

There's essentially no public data on this - but management assumptions about how to run the data center are far more important to long term costs than their specific hardware or software choices.

Consider, for example, the single most obvious difference between the IBM manager's One Right Way to do things and the Unix approach: the IBM guys always want to break wholes into walled-off pieces, where the Unix people naturally do the opposite, ignoring technical boundaries while crossing organizational ones to unite processes, people, and technologies.

Most people think of this in terms of data processing's 1960s machine partitioning and virtualization solutions, but the bigger mirror for this is in the IT organization chart, where the IBM manager spends most of his time, and most of the company's IT budget, keeping people from overstepping formal boundaries while the Unix guys act to unify the IT organization by cross training everyone for almost everything.

Look at one specific example of this: the data processing manager's assumption that splitting up resources improves utilization. From the perspective of a Sun sales manager, the customer's willingness to spend five million bucks for the right to chop a 512GB, 72 processor E25K into a bunch of V490s makes the customer absolutely right to do so - but the bottom line for the company that customer works for is waste, inefficiency, and unnecessary application run time costs.

You can see one consequence of this in some TPC/H results. Look closely at two Sun reports on the E25K and you'll see a large SMP machine with Oracle named user licensing at $1.35 million suffering only a 6% decline in the QpPH score between the 3TB and 10TB tests - and a cluster of ten dual AMD based X4500s getting to one third of the E25K's 3TB score for less than 10% of the hardware cost.

That happens because the E25K is designed for very large problems - things like aircrew scheduling or overall optimization within a Fortune 5000 class ERP - and applying it to problems that can be easily segmented wastes most of the value its customers pay for. Look at the IBM side on this, however, and you see an opposite pricing miracle - one that reflects both IBM's expectations about customer behavior and this same reality that the benchmark has been stable while the machines grew more powerful.

Specifically, IBM's 64 Power5, 128 core, p595 result of $53 per QpPH on the 3TB tests includes only $480,000 in Oracle processor licensing - while another IBM TPC/H filing shows 32 Model p570 Power6 boxes clustered around a Catalyst switch achieving an astonishing 343,531 QpPH on the 10TB test at a list price of $21.8 million.
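For a rough sense of scale, the cluster's two quoted figures can be turned into a dollars per QpPH number. This is only a sketch: it divides the list price by the score, which is my simplification and not how the filing itself computes its price/performance figure, and because it comes from the 10TB test it shouldn't be lined up directly against the p595's $53 on the 3TB test.

```python
# Figures quoted above for the 32 x p570 cluster on the 10TB test.
cluster_list_price = 21_800_000   # dollars, list
cluster_qpph = 343_531            # reported score

# Assumption: a plain list-price / score ratio, used here only for scale.
cluster_dollars_per_qpph = cluster_list_price / cluster_qpph

print(f"p570 cluster: about ${cluster_dollars_per_qpph:.2f} per QpPH at list")
```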

What's going on is that the p595 runs up the highest hardware cost in the 3TB group despite not being designed as a large SMP machine - it's designed specifically to cope with mainframe workloads and built from up to eight eight-way machines in the sure and certain hope that the people who buy it will partition it into smaller machines before use. As a result it achieves maximum processor utilization eight processors at a time - and 8 x $40K x 2 x 0.75 = $480,000.
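The licensing arithmetic can be sketched in a few lines. The numbers are the ones given above; reading them as processors x list price x cores per processor x a 0.75 per-core factor is my assumption about what each term means, and the function name is purely illustrative. The second call applies the same formula to all 64 processors to show what licensing the whole machine would cost on that reading.

```python
def oracle_license_cost(processors, price_per_license, cores_per_processor, core_factor):
    """Licensing cost under the assumed reading of 8 x $40K x 2 x 0.75."""
    return processors * cores_per_processor * core_factor * price_per_license

# One eight-processor building block, as in the filing's $480,000 figure.
partition_cost = oracle_license_cost(8, 40_000, 2, 0.75)

# Hypothetical: the same formula applied to all 64 processors in the p595.
full_machine_cost = oracle_license_cost(64, 40_000, 2, 0.75)

print(f"eight processors:  ${partition_cost:,.0f}")
print(f"all 64 processors: ${full_machine_cost:,.0f}")
```

On these assumptions, covering the whole box costs eight times the figure in the benchmark filing.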

So what does all this tell us? That all applicable combinations of small/big Sun/IBM machines can be used to deliver QpPH work, that management methods premised on using big machines where small ones can do the job cost a lot more money for the same result, and that management methods based on using just the right machine for the job are cheaper only if the gear comes from Sun.

The lesson in all this is subtle: align people with technology and both the IBM and Sun approaches work, but put a Sun guy in charge of that p595 and he'll want his Oracle licenses to cover all processors; conversely, put an IBM guy in charge of that E25K and he'll want to break it into a bunch of virtual 490s - and, either way, the company they work for will get a lot less for a lot more.

And that's the people problem at the heart of this whole business of trying to predict data center costs under alternative Sun and IBM scenarios: it's the people in charge, not the technologies, that really make the difference - and because what they'll always do is what they know how to do, failure to align skill sets with technologies before you buy means your costs will be about the same no matter what technology you hand them.

See: Part 4