Fun with IBM's Z10 numbers

Summary: As far as I'm concerned, when an organization buys an IBM mainframe to run Linux somebody should be fired - but not the DP guy who bought it. It should be his boss: the non-technical guy who approved it without carefully reviewing the alternatives.

If you're into high performance computing, IBM has the machine for you: 64 customized Power processor cores on four 16-way/380GB SMP boards at 4.4GHz, each with up to 48GB/s in bandwidth to the world outside.

It's a hot machine, but costs are high:

z10 Costs
Capital cost        $26 million
Maintenance cost    $1,706,000/yr
SUSE Linux          $768,000/yr

Here's some of what IBM said about the z10 in the original February 2008 press release:

IBM's next-generation, 64-processor mainframe, which uses Quad-Core technology, is built from the start to be shared, offering greater performance over virtualized x86 servers to support hundreds to hundreds of millions of users.

The z10 also supports a broad range of workloads. In addition to Linux, XML, Java, WebSphere and increased workloads from Service Oriented Architecture implementations, IBM is working with Sun Microsystems and Sine Nomine Associates to pilot the Open Solaris operating system on System z, demonstrating the openness and flexibility of the mainframe.

From a performance standpoint, the new z10 is designed to be up to 50% faster and offers up to 100% performance improvement for CPU intensive jobs compared to its predecessor, the z9, with up to 70% more capacity. The z10 also is the equivalent of nearly 1,500 x86 servers, with up to an 85% smaller footprint, and up to 85% lower energy costs. The new z10 can consolidate x86 software licenses at up to a 30-to-1 ratio.

If you want a lot of detail, be patient: there's a 170 page redbook (sg247515) describing the thing due for December 8/08 release.

Unfortunately IBM doesn't say what those "hundreds of millions" of users will be doing, doesn't define those 1,500 replaceable x86 servers, and doesn't allow anyone to publish authoritative benchmark results for the machine - so we can guess it would be insanely great as a Dungeons and Dragons host, but we really don't know how it would compare to something like IBM's own p595 or Sun's M8000.

We do know how some of the numbers compare to alternatives. For $26 million, for example, you could buy 21,131 low end Sun x86 servers - each with 2GB of memory and a dual core Opteron - or, if you prefer mid range x86, the same money would buy 4,985 Sun Fire X4200 M2 servers, each with two dual core Opterons at 2.8GHz, 8GB of RAM, 292GB of disk, and Solaris.

If, however, you just bought 1,500 of these, you'd get to keep $18 million or so in change - enough, at $85K per FTE all in, to pay 21 additional IT staff for ten years.
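For anyone who wants to check that arithmetic, here is a minimal Python sketch. The per-server price is not quoted anywhere above; it is an assumption inferred by dividing the $26 million by the 4,985 X4200s that money is said to buy, and the variable names are purely illustrative.

```python
# Figures from the article; the per-server price is inferred, not quoted.
Z10_CAPITAL = 26_000_000      # z10 capital cost, USD
X4200_PER_BUDGET = 4_985      # X4200 servers the same money would buy
SERVERS_BOUGHT = 1_500        # the number IBM says a z10 replaces
FTE_COST = 85_000             # all-in cost of one staff year, USD
YEARS = 10

# Assumption: every X4200 costs the same, so price = budget / count.
unit_price = Z10_CAPITAL / X4200_PER_BUDGET
cluster_cost = SERVERS_BOUGHT * unit_price
change = Z10_CAPITAL - cluster_cost

# Whole staff positions the change would fund for the full ten years.
staff_for_decade = int(change // (FTE_COST * YEARS))

print(f"implied unit price: ${unit_price:,.0f}")
print(f"change after 1,500 servers: ${change:,.0f}")
print(f"staff fundable for {YEARS} years: {staff_for_decade}")
```

Under those assumptions the change comes out a little over $18 million and the staff count at 21, which matches the figures in the text.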

You'd also get rather more processing resources:

                IBM z10     1,500 X4200s
CPU cycles      282GHz      16,800GHz
Memory          1,520GB     12,000GB
Disk storage    None        438TB

In fact, if you accept that each PPC cycle typically achieves about twice as much real work as an x86 cycle, and then cheerfully assume that the mainframe has essentially zero overheads on data transfer and switching, then you could more than match all of its throughput resources with only about 80 of these little x86 machines.
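The resource totals and the CPU side of that equivalence can be reproduced with a short Python sketch. It multiplies out the per-box specifications given earlier, and it treats the "twice as much work per PPC cycle" figure as exactly the rough assumption it is. Only aggregate clock cycles are compared; memory, I/O, and virtualization overheads are not modeled.

```python
import math

# Hardware figures from the article.
Z10_CORES, Z10_GHZ = 64, 4.4
Z10_MEMORY_GB = 4 * 380                # four SMP boards at 380GB each

X4200_CORES, X4200_GHZ = 4, 2.8        # two dual core Opterons per server
X4200_MEMORY_GB, X4200_DISK_GB = 8, 292
SERVERS = 1_500

# Aggregate clock cycles, memory, and disk for each option.
z10_ghz = Z10_CORES * Z10_GHZ
x4200_ghz = X4200_CORES * X4200_GHZ
cluster_ghz = SERVERS * x4200_ghz
cluster_memory_gb = SERVERS * X4200_MEMORY_GB
cluster_disk_tb = SERVERS * X4200_DISK_GB / 1000

# Assumption from the text: one PPC cycle does about twice the work
# of one x86 cycle. This is a rule of thumb, not a benchmark result.
PPC_WORK_PER_CYCLE = 2.0
servers_to_match_cpu = math.ceil(z10_ghz * PPC_WORK_PER_CYCLE / x4200_ghz)

print(f"z10: {z10_ghz:.0f}GHz, {Z10_MEMORY_GB:,}GB")
print(f"cluster: {cluster_ghz:,.0f}GHz, {cluster_memory_gb:,}GB, {cluster_disk_tb:.0f}TB")
print(f"X4200s needed to match z10 cycles: {servers_to_match_cpu}")
```

On cycles alone the break-even point lands at roughly 50 servers, so the "about 80" figure leaves some headroom on the CPU side.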

So why would anyone buy a z10 to run Linux? IBM's answer is that the z10 can virtualize 1,500 undefined x86 boxes or serve hundreds of millions of users - and there are circumstances in which both statements are perfectly true. People parsing sales brochures or attending data processing conventions can, for example, easily imagine 1,500 essentially idle servers or two hundred million users represented as records in a batch job.

The real reason people buy into this is worldview - but that's not the answer you get from people who make this kind of decision. Scratch one of them deeply enough to get past the personal attack that asking the question will provoke, and you'll find rationalizations couched in terms of the space, power, and staffing savings they get from consolidation.

In reality this argument is absurd: if you bought twice as many four way x86 servers as you need to match the mainframe's throughput, and then hired 20 full time IT staff at an all in cost of $85K per FTE, you could keep the entire $26 million in z10 capital cost in your pockets while paying for your cluster (including staff) just from the monthly maintenance and Linux licensing you're not paying IBM and Novell.
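A rough Python sketch of that comparison follows. It assumes the "four way" servers are priced like the X4200s discussed earlier (a price inferred from the $26 million / 4,985 figure, not quoted), and it ignores power, floor space, and any support contract on the cluster itself.

```python
# Annual fees from the z10 cost table in the article.
MAINTENANCE_PER_YEAR = 1_706_000
LINUX_LICENSE_PER_YEAR = 768_000

STAFF = 20
FTE_COST = 85_000                  # all-in cost per staff year, USD
SERVERS = 160                      # twice the roughly 80 needed
UNIT_PRICE = 26_000_000 / 4_985    # assumption: inferred X4200 price

annual_fees_avoided = MAINTENANCE_PER_YEAR + LINUX_LICENSE_PER_YEAR
annual_staff_cost = STAFF * FTE_COST
annual_surplus = annual_fees_avoided - annual_staff_cost

# One-time hardware cost, paid down out of the yearly surplus.
cluster_cost = SERVERS * UNIT_PRICE
months_to_pay_off = cluster_cost / (annual_surplus / 12)

print(f"fees avoided per year: ${annual_fees_avoided:,}")
print(f"staff cost per year:   ${annual_staff_cost:,}")
print(f"surplus per year:      ${annual_surplus:,}")
print(f"cluster hardware:      ${cluster_cost:,.0f}")
print(f"months to pay off hardware: {months_to_pay_off:.1f}")
```

With these inputs the avoided fees cover the 20 staff with money left over each year, and that surplus pays for the hardware in a little over a year, all without touching the $26 million.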

And yet the true believers will not only buy into this - but loudly tell other people they're right to do so. Why?

The answer, I think, is that data processing people still focus on system utilization as the primary measure of their own effectiveness - and, on that basis, a $30 million system that approaches 100% average utilization is infinitely preferable to a half million dollar system that does the same job at 20% utilization.

You may think, as I do, that this is absurd, but it's a matter of world view; because data processing originally reported to Finance, budgets are givens and the focus is inward: on managing a system in which users are treated as nuisances and system utilization is king.

To a Windows user, or a Unix manager, utilization rates are completely irrelevant: nobody cares if the machine is idle much of the time; we care that the resources are available when users need them.

Thus when I look at a large Sun Ray server installation running at an average 12% utilization during working hours, I see a success - a system in which users get the resources they need when they need them. The data processing guy looking at the same system, however, sees absolute proof of complete management incompetence: a machine that's 88% idle.

That's the bottom line difference between the data processing and Unix world views: they measure themselves against utilization because that made sense when their profession evolved, and we measure ourselves against user satisfaction, service quality, and response times because those are the measures our users care about.

All of which brings us to the real bottom line question: who's responsible when some organization chooses to spend insane amounts of money to limit user computing resources? My answer is that it's not the data processing guy - he's just doing what his predecessors did and what he's been trained to do: buying to maximize utilization.

So who is it? It's the guy in the executive suite: the guy who's so clueless about computing, and so desperate not to have anything to do with it, that he doesn't see that equating 1,500 complete x86 servers to only 64 PPC cores requires a profound belief in magic.



Talkback

31 comments
  • I just don't believe the IBM numbers

    in terms of real-world workloads and comparing Linux-to-Linux implementations. Maybe if you are doing some kind of multi-dimensional modeling you might get some advantage from the z10, but for work-a-day web hosting?
    terry flores
  • RE: Fun with IBM's Z10 numbers

    So... we're looking at the highest energy costs ever, and you're saying utilization isn't a concern?

    Have you actually talked to anyone who has consolidated their datacenter? What you've missed, flat out and catastrophically, is facility costs. By not taking those into account, you're assuming they're equal between the options. There's not a snowball's chance in your theoretical datacenter that you can fit 4,950 x86 boxes (plus the network infrastructure - which is not free) in the same space, with the same power, and the same cooling.

    Might want to take those into consideration next time - particularly the capital costs involved in merely adding the differential in floorspace, not to mention the racks of switches and routers. It changes your equation significantly.
    br1anwarner
    • No it doesn't

      1) you only need about 80 of those 4 way PC servers to match the mainframe. In the blog I suggest getting twice the Opterons you actually need - 160 of them.

      That's four racks, including external storage and the network gear.

      You can fit all four into the same space at just about the same facility cost as the z10.

      (Don't forget that the z10 requires external storage and networking - things that together actually need more space than it does.)

      2) Have you heard of project blackbox? How about SWaP? Try the calculation - the box blows out the z10 by a considerable margin.
      murph_z
      • re: No it doesn't

        Care to compare apples to apples? The IBM "external network
        boxes" are nothing more than the current PC external
        network use. Yes there are IBM external network boxes that
        are large by comparison to the PC but those are fading fast.
        All current IBM OS's have TCPIP and that talks to the same
        network infrastructure that the PC or rack mounted PC's talk
        to. There might be cable types you have to swap (I am not
        sure here) but essentially any current network "box" can talk
        with IBM's current MF's.
        ps2os2
        • You'd think so, but no

          Two things you need to know:

          1 - most network stats IBM publishes for VM/Linux pertain to communication among Linux instances, not with the external world; and,

          2 - storage connectivity is still mostly archaic.

          Combine those two and you get bad news: less connectivity at lower densities.
          murph_z
          • Mainframe and I/O

            Then why are mainframes the core of a data center?
            Anatine
  • RE: Fun with IBM's Z10 numbers

    Too fun! Take all the parts, deeply integrate all these parts and optimize the whole to get 99.999% service availability, add all the support services you need to manage the complexity, which increases exponentially with the number of services/workloads, and make it usable and efficient: this is a Z10 system, a balanced system to manage enterprise business. It is the best tool to manage your whole enterprise distributed systems environment...
    Anatine
    • Ah, no?

      The RAS features don't really affect Linux - just because the reboot is automated doesn't mean the application didn't crash. Remember: what counts isn't whether the hardware works, what counts is whether the application does.

      Step outside Linux and zOS does very well on reliability - but it's not because of the hardware as much as it is because of the operating procedures it requires.
      murph_z
      • ?

        You need to investigate more... A Z10 never "reboots" :-) This is based on deep hardware and software integration.
        Anatine
        • That's right; vm does not reboot, Linux does

          Because most of the RAS features are not accessible from the ghosted OSes.
          murph_z
          • it depends on OS

            most OS RAS features do not depend on hardware...
            Anatine
  • You can get cheaper - from IBM!

    If you buy an equivalent 64 CPU Quad Core Power box, it's much cheaper (a few million?). IBM's stated goal is to migrate all i-series gear to use the PPC chips (I believe they still have a CISC processor). Some of their "mainframe" products already use PPC chips - and yes, they cost more.

    In other words, running 1000 Linux instances on a P-series is much cheaper than running those same instances under VM on a "mainframe" box - and the underlying hardware is the same . . .
    Roger Ramjet
    • Yes in general, no in particular

      1) Yes - you could just buy your 160 quad Xeons from IBM. Much cheaper...

      2) the iSeries (PPC since the late 80s), the pSeries, and the zSeries all share both designs and components. However..

      2.a - the zSeries processors have additional instruction set processing capabilities. They'll run native 360 code today. (Amazing really.)

      2.b the zSeries processors use the standard four CPU board, but have the 5th slot populated with one that manages 48MB (64 really) of local memory to provide a shared 48MB of level 3 cache to the 4 main (quad core) processors. It's very slick - and reasonably effective.

      3) what really makes a z a mainframe, a p an AIX/Linux box, and an i an iSeries is firmware. These are different - CP/SP do things differently than ipc does etc.

      Thus a p595 will take longer to load a Linux instance than a z9/10 will and take longer to run jobs requiring more than 4 SMP cores.
      murph_z
      • ?

        1. You also need a toolkit to build your system :)))

        2. However ?

        2a. Ascending compatibility is required, including at the CPU architecture level. from Z, to ESA, XA, 370, up to 360.

        3. any figures ?
        Anatine
      • Not being a mainframe lover myself..

        But you are wrong on a few things. First of all, the price of a mainframe that only runs Linux is not the same as one running the full zOS. It's much, much cheaper (or so say the mainframe people here, but I still expect that much cheaper is still expensive :)= )

        As for the sharing of components, it's true. I've actually seen a z9 and a p595 standing beside each other and there are components that are shared. For example, the power supply looks the same.

        As for the claim that a job requiring more than 4 cores should run slower on a p595 than on a z10: I think, nahh.. I know you are wrong. The p595 is a beast.

        // Jesper
        JesperFrimann
  • Or buy on eBay...

    ... one of the blinking-light boxes from an old science fiction movie and put an IBM logo on it and leave it where it can be seen easily by anyone wandering by.

    Then obtain the inexpensive equipment and, most important, hire the 21+ staff members at a reasonable wage. Allow them to argue with each other about optimal utilization while someone humble sets up the system.

    Murph, the difficulty of your scenario is that you require that only the right answer win. The best way to accomplish something is to let the right answer, the wrong answer, and the bewildering answer all come to fruition together.

    Let your motto be, The greatest happiness for the greatest number.
    Anton Philidor
    • Once upon a time

      In one job we had a range of gear, all of it running BSD Unix, when the edict came down: switch everything to the IBM PC/AT.

      So for several years people had AT boxes sitting on their desks with DEC VT220 screens on them, and DEC keyboards in front of them.

      As far as I know none of the consultants assigned to help us by the provincial data processing department ever wondered how those ATs could be running BSD4.3 - or why the "instrument room" had to be kept locked all the time.
      murph_z
      • Good solution.

        Everyone's happy and people can accomplish what they expect to accomplish. Now transfer that acumen to running Windows unrestricted in the midst of a glowering, choking cloud seeping in from IT, and you'll realistically portray many situations.
        Anton Philidor
  • RE: Fun with IBM's Z10 numbers

    I do not recall IBM *EVER* letting someone publish
    numbers for their machines *BEFORE* delivery. After
    delivery there are at least 2 sources for hard numbers. One
    of the issues for years has been trying to equate MIPS with
    CPU speed. There are some people that try but any numbers
    they come up with are (as they know) fundamentally
    flawed as IBM does a *LOT* of processing in instruction
    buffers and hardware assists for some instructions. IBM
    used to have a manual that had numbers in it but again
    the application mix really determined a lot of the numbers.
    I am not one to defend IBM but knowing a little of what
    goes on inside of the overall system is almost pure magic
    and it is wonderfully engineered. I have witnessed a large
    IBM system brought to its knees by a single wire (called a
    tri lead) going into the high speed buffer. Yes IBM
    darkened the skies with specialists to fix the problem. Try
    and get that with any other CPU manufacturer and I am
    sure you get a response like "Call me next week".
    Comparing performance numbers is an extremely difficult
    task and it really takes an engineer to come up with real
    numbers. Yes, IBM even has
    the software to let you measure (most) anything you would
    ever want to measure. Is it perfect? No, but probably good
    enough for most engineers.

    After about 6 months of availability there are (like I said)
    two places you can get numbers (it's not free btw). But they
    use numbers that are supplied by real life users.

    There are quite a few highly trained experts that keep
    IBM's toes to the coals on numbers and hardware/software
    issues. You can bet there are a few people out there that
    will argue (well, I might add) that IBM is giving out good or
    bad numbers. After a while you can pretty well zero in on
    hot spots that you know that IBM is working on night and
    day to get resolved. This goes for Hardware & Software
    both.
    Do not get me wrong, IBM does a pretty good job on
    keeping their products not on the bleeding edge but maybe a
    year behind (at least on the Mainframe side).
    ps2os2
    • Numbers? No. Clouds of techs? sure

      The only numbers IBM releases (and correct me if you can) relate the current generation to previous generations - but not to competitive architectures.
      murph_z