Fun with IBM's Z10 numbers
Summary: As far as I'm concerned, when an organization buys an IBM mainframe to run Linux somebody should be fired - but not the DP guy who bought it. It should be his boss: the non-technical guy who approved it without carefully reviewing the alternatives.
If you're into high performance computing, IBM has the machine for you: 64 customized Power processor cores on four 16-way/380GB SMP boards at 4.4GHz, each with up to 48GB/sec in bandwidth to the world outside.
It's a hot machine, but costs are high:
| z10 Costs | |
| Capital cost | $26 million |
| Maintenance cost | $1,706,000/yr |
| SuSe Linux | $768,000/yr |
Here's some of what IBM said about the z10 in the original February 2008 press release:
IBM's next-generation, 64-processor mainframe, which uses Quad-Core technology, is built from the start to be shared, offering greater performance over virtualized x86 servers to support hundreds to hundreds of millions of users. The z10 also supports a broad range of workloads. In addition to Linux, XML, Java, WebSphere and increased workloads from Service Oriented Architecture implementations, IBM is working with Sun Microsystems and Sine Nomine Associates to pilot the Open Solaris operating system on System z, demonstrating the openness and flexibility of the mainframe.
From a performance standpoint, the new z10 is designed to be up to 50% faster and offers up to 100% performance improvement for CPU intensive jobs compared to its predecessor, the z9, with up to 70% more capacity. The z10 also is the equivalent of nearly 1,500 x86 servers, with up to an 85% smaller footprint, and up to 85% lower energy costs. The new z10 can consolidate x86 software licenses at up to a 30-to-1 ratio.
If you want a lot of detail, be patient: there's a 170 page redbook (sg247515) describing the thing due for December 8/08 release.
Unfortunately IBM doesn't say what those "hundreds of millions" of users will be doing, doesn't define those 1,500 replaceable x86 servers, and doesn't allow anyone to publish authoritative benchmark results for the machine - so we can guess it would be insanely great as a Dungeons and Dragons host, but we really don't know how it would compare to something like IBM's own p595 or Sun's M8000.
We do know how some of the numbers compare to alternatives. For $26 million, for example, you could buy 21,131 low end Sun x86 servers - each with 2GB of memory and a dual core Opteron - or, if you prefer mid range x86, the same money would buy 4,985 Sun M2 X4200 servers, each with two dual core Opterons at 2.8GHz, 8GB of RAM, 292GB of disk, and Solaris.
If, however, you just bought 1,500 of these, you'd get to keep $18 million or so in change - enough, at $85K per FTE all in, to pay 21 additional IT staff for ten years.
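The arithmetic behind those figures can be checked with a short sketch. The per-server price is not quoted in the text; it is back-calculated here from the 4,985 servers that $26 million is said to buy, so treat it as an assumption rather than a list price.

```python
# Back-of-envelope check of the purchase comparison.
# Assumption: the unit price is implied by the quoted server count,
# not taken from any published price list.
Z10_CAPITAL = 26_000_000          # z10 capital cost in USD (from the cost table)
X4200_COUNT_FOR_BUDGET = 4_985    # mid range servers the same money buys (from the text)
SERVERS_BOUGHT = 1_500            # the count IBM equates to one z10
FTE_COST = 85_000                 # all in cost per FTE per year (from the text)
YEARS = 10

implied_unit_price = Z10_CAPITAL / X4200_COUNT_FOR_BUDGET   # roughly $5,200 each
cluster_cost = SERVERS_BOUGHT * implied_unit_price
change = Z10_CAPITAL - cluster_cost
staff_for_ten_years = change / (FTE_COST * YEARS)

print(f"implied unit price: ${implied_unit_price:,.0f}")
print(f"1,500 servers:      ${cluster_cost:,.0f}")
print(f"change left over:   ${change:,.0f}")
print(f"staff for {YEARS} years: {staff_for_ten_years:.1f}")
```

Rounding the last figure down gives the 21 additional staff mentioned above.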
You'd also get rather more processing resources:
| | IBM z10 | 1500 X4200s |
| CPU Cycles | 282GHz | 16,800GHz |
| Memory | 1,520GB | 12,000GB |
| Disk Storage | None | 438TB |
In fact, if you accept that each PPC cycle typically achieves about twice as much real work as an x86 cycle, and then cheerfully assume that the mainframe has essentially zero overheads on data transfer and switching, then you could more than match all of its throughput resources with only about 80 of these little x86 machines.
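For anyone who wants to see where the resource totals come from, here is a minimal sketch using only figures quoted in this article. The two-to-one weighting of PPC cycles against x86 cycles is the assumption stated in the text, not a measured benchmark, and the variable names are mine.

```python
# Aggregate resources behind the comparison, from figures quoted in the article.
Z10_CORES, Z10_GHZ = 64, 4.4            # 64 cores at 4.4GHz
Z10_BOARDS, Z10_GB_PER_BOARD = 4, 380   # four boards, 380GB each

N_SERVERS = 1_500
X86_CORES, X86_GHZ = 4, 2.8             # two dual core Opterons at 2.8GHz
X86_MEM_GB, X86_DISK_GB = 8, 292

z10_ghz = Z10_CORES * Z10_GHZ
z10_mem_gb = Z10_BOARDS * Z10_GB_PER_BOARD
server_ghz = X86_CORES * X86_GHZ

cluster_ghz = N_SERVERS * server_ghz
cluster_mem_gb = N_SERVERS * X86_MEM_GB
cluster_disk_tb = N_SERVERS * X86_DISK_GB / 1_000

# Assumption taken from the text: one PPC cycle is worth about two x86 cycles.
PPC_CYCLE_WEIGHT = 2.0
servers_to_match_cycles = z10_ghz * PPC_CYCLE_WEIGHT / server_ghz

print(f"z10:     {z10_ghz:,.0f}GHz, {z10_mem_gb:,}GB")
print(f"cluster: {cluster_ghz:,.0f}GHz, {cluster_mem_gb:,}GB, {cluster_disk_tb:,.0f}TB")
print(f"servers to match weighted z10 cycles: {servers_to_match_cycles:.0f}")
```

On cycles alone the weighted figure comes to about 50 servers, so the 80 quoted above leaves headroom; the text does not itemize what else went into that number.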
So why would anyone buy a z10 to run Linux? IBM's answer is that the z10 can virtualize 1,500 undefined x86 boxes or serve hundreds of millions of users - and there are circumstances in which both statements are perfectly true. People parsing sales brochures or attending data processing conventions can, for example, easily imagine 1,500 essentially idle servers or two hundred million users represented as records in a batch job.
The real reason people buy into this is worldview - but that's not the answer you get from people who make this kind of decision. Scratch one of them deeply enough to get past the personal attack that asking the question will generate, and you'll find rationalizations couched in terms of the space, power, and staffing savings they get from consolidation.
In reality this argument is absurd: if you bought twice as many four way x86 servers as you need to match the mainframe's throughput, and then hired 20 full time IT staff at an all in cost of $85K per FTE, you could keep the entire $26 million in z10 capital cost in your pockets while paying for your cluster (including staff) just from the monthly maintenance and Linux licensing you're not paying IBM and Novell.
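A rough sketch of that running-cost comparison follows. The staff and IBM fee figures come from this article; the 160-server count (twice the roughly 80 mentioned earlier) and the unit price implied by 4,985 servers for $26 million are my assumptions for illustration, not quoted prices.

```python
# Annual running costs, from the z10 cost table and the text.
Z10_MAINTENANCE = 1_706_000    # USD per year
Z10_LINUX = 768_000            # USD per year
STAFF, FTE_COST = 20, 85_000   # 20 staff at $85K all in

# Assumptions: 160 servers at a unit price implied by 4,985 servers
# for $26 million; neither number is a quoted price list.
SERVERS = 160
UNIT_PRICE = 26_000_000 / 4_985

ibm_annual_fees = Z10_MAINTENANCE + Z10_LINUX
staff_annual = STAFF * FTE_COST
left_after_staff = ibm_annual_fees - staff_annual
cluster_price = SERVERS * UNIT_PRICE
months_to_cover_cluster = 12 * cluster_price / left_after_staff

print(f"IBM and Novell fees per year: ${ibm_annual_fees:,}")
print(f"staff per year:               ${staff_annual:,}")
print(f"left after staff per year:    ${left_after_staff:,}")
print(f"cluster hardware:             ${cluster_price:,.0f}")
print(f"months of savings to cover it: {months_to_cover_cluster:.1f}")
```

Under these assumptions the fees not paid cover the staff outright, and what remains pays off the hardware in a little over a year.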
And yet the true believers will not only buy into this - but loudly tell other people they're right to do so. Why?
The answer, I think, is that data processing people still focus on system utilization as the primary measure for their own effectiveness - and, on that basis, a $30 million system that approaches 100% average utilization is infinitely preferable to a half million dollar system that does the same job at 20% utilization.
You may think, as I do, that this is absurd, but it's a matter of world view; because data processing originally reported to Finance, budgets are givens and the focus is inward: on managing a system in which users are treated as nuisances and system utilization is king.
To a Windows user, or a Unix manager, utilization rates are completely irrelevant: nobody cares if the machine is idle much of the time, we care that the resources be available when users need them.
Thus when I look at a large Sun Ray server installation running at an average 12% utilization during working hours, I see a success - a system in which users get the resources they need when they need them. The data processing guy looking at the same system, however, sees absolute proof of complete management incompetence: a machine that's 88% idle.
That's the bottom line difference between the data processing and Unix world views: they measure themselves against utilization because that made sense when their profession evolved, and we measure ourselves against user satisfaction, service quality, and response times because those are the measures our users care about.
All of which brings us to the real bottom line question: who's responsible when some organization chooses to spend insane amounts of money to limit user computing resources? My answer is that it's not the data processing guy - he's just doing what his predecessors did and what he's been trained to do: buying to maximize utilization.
So who is it? It's the guy in the executive suite: the guy who's so clueless about computing, and so desperate not to have anything to do with it, that he doesn't see that equating 1,500 complete x86 servers to only 64 PPC cores requires a profound belief in magic.
Talkback
I just don't believe the IBM numbers
RE: Fun with IBM's Z10 numbers
Have you actually talked to anyone who has consolidated their datacenter? What you've missed, flat out and catastrophically, is facility costs. By not taking those into account, you're assuming they're equal between the options. There's not a snowball's chance in your theoretical datacenter that you can fit 4,950 x86 boxes (plus the network infrastructure - which is not free) in the same space, with the same power, and the same cooling.
Might want to take those into consideration next time - particularly the capital costs involved in merely adding the differential in floorspace, not to mention the racks of switches and routers. It changes your equation significantly.
No it doesn't
That's four racks, including external storage and the network gear.
You can fit all four into the same space at just about the same facility cost as the z10.
(Don't forget that the z10 requires external storage and networking - things that together actually need more space than it does.)
2) Have you heard of project blackbox? How about SWaP? Try the calculation - the box blows out the z10 by a considerable margin.
re: No it doesn't
boxes" are nothing more than the current PC external
network use. Yes there are IBM external network boxes that
are large by comparison to the PC but those are fading fast.
All current IBM OSes have TCP/IP, and that talks to the same
network infrastructure that PCs or rack mounted PCs talk
to. There might be cable types you have to swap (I am not
sure here) but essentially any current network "box" can talk
with IBM's current MF's.
You'd think so, but no
1- most network stats IBM publishes for VM/Linux pertain to communication among Linux instances, not with the external world; and,
2 - storage connectivity is still mostly archaic.
Combine those two and you get bad news: less connectivity at lower densities.
Mainframe and I/O
RE: Fun with IBM's Z10 numbers
Ah, no?
Step outside Linux and zOS does very well on reliability - but it's not because of the hardware as much as it is because of the operating procedures it requires.
?
That's right; vm does not reboot, Linux does
it depends on OS
You can get cheaper - from IBM!
In other words, running 1000 Linux instances on a P-series is much cheaper than running those same instances under VM on a "mainframe" box - and the underlying hardware is the same . . .
Yes in general, no in particular
2) the iSeries (PPC since the late 80s), the pSeries, and the zSeries all share both designs and components. However..
2.a - the zSeries processors have additional instruction set processing capabilities. They'll run native 360 code today. (Amazing really.)
2.b the zSeries processors use the standard four CPU board, but have the 5th slot populated with one that manages 48MB (64 really) of local memory to provide a shared 48MB of level 3 cache to the 4 main (quad core) processors. It's very slick - and reasonably effective.
3) what really makes a z a mainframe, a p an AIX/Linux box, and an i an iSeries is firmware. These are different - CP/SP do things differently than ipc does etc.
Thus a p595 will take longer to load a Linux instance than a z9/10 will and take longer to run jobs requiring more than 4 SMP cores.
?
2. However ?
2a. Ascending compatibility is required, including at the CPU architecture level: from Z, to ESA, XA, 370, up to 360.
3. any figures ?
Not being a mainframe lover myself..
As for the sharing of components it's true. I've actually seen a z9 and a p595 standing beside each other and there are components that are shared. For example the power supply looks the same.
As for the claim that a job requiring more than 4 cores should run slower on a p595 than on a z10: I think, nahh.. I know you are wrong. The p595 is a beast.
// Jesper
Or buy on eBay...
Then obtain the inexpensive equipment and, most important, hire the 21+ staff members at a reasonable wage. Allow them to argue with each other about optimal utilization while someone humble sets up the system.
Murph, the difficulty of your scenario is that you require that only the right answer win. The best way to accomplish something is to let the right answer, the wrong answer, and the bewildering answer all come to fruition together.
Let your motto be, The greatest happiness for the greatest number.
Once upon a time
So for several years people had AT boxes sitting on their desks with DEC VT220 screens on them, and DEC keyboards in front of them.
As far as I know none of the consultants assigned to help us by the provincial data processing department ever wondered how those ATs could be running BSD4.3 - or why the "instrument room" had to be kept locked all the time.
Good solution.
RE: Fun with IBM's Z10 numbers
numbers for their machines *BEFORE* delivery. After
delivery there are at least 2 sources for hard numbers. One
of the issues for years has been trying to equate MIPS with CPU
speed. There are some people that try but any numbers
they come up with are (as they know) fundamentally
flawed as IBM does a *LOT* of processing in instruction
buffers and hardware assists for some instructions. IBM
used to have a manual that had numbers in it but again
the application mix really determined a lot of the numbers.
I am not one to defend IBM but, knowing a little of what
goes on inside, the overall system is almost pure magic
and it is wonderfully engineered. I have witnessed a large
IBM system brought to its knees by a single wire (called a
tri lead) going into the high speed buffer. Yes IBM
darkened the skies with specialists to fix the problem. Try
and get that with any other CPU manufacturer and I am
sure you get a response like "Call me next week".
Comparing performance numbers is an extremely difficult
task and it really takes an engineer to come up with real
numbers as it is not an easy task. Yes IBM even has
the software to let you measure (most) anything you would
ever want to measure. Is it perfect, no but probably good
enough for most engineers.
After about 6 months of availability there are (like I said)
two places you can get numbers (it's not free btw). But they
use numbers that are supplied by real life users.
There are quite a few highly trained experts that keep
IBM's toes to the coals on numbers and hardware/software
issues, you can bet there are a few people out there that
will argue (well I might add) that IBM is giving out good or
bad numbers. After a while you can pretty well zero in on
hot spots that you know that IBM is working on night and
day to get resolved. This goes for Hardware & Software
both.
Do not get me wrong IBM does a pretty good job on
keeping their products not on bleeding edge but maybe a
year behind (at least on the Mainframe side).
Numbers? No. Clouds of techs? sure