HP Superdome - dead as a dodo?
Summary: HP Itanium is under the cosh - here's why.
As HP Discover comes to a climax, HP has some uncomfortable questions it needs to answer. In this guest post, colleague John Appleby asks the question: is HP Superdome even relevant to the future of enterprise applications? If you take what John says as fact then the answer is a resounding 'no.' So why is HP pumping money into this dead tech rat hole?
John Appleby says:
I had a slightly uncomfortable conversation with one of my sales people this week, who told me that one of their customers had just bought a brand new HP Superdome2 and wanted to know if it would run SAP software. I had to explain to him that the SAP BusinessObjects portfolio no longer runs on that platform.
And in case you think they are being lazy, Oracle will not be developing for this platform any more. In case you think Larry Ellison is trying to screw HP, neither is Microsoft, either for its Windows OS (the last version is Windows Server 2008 R2) or for its SQL Server RDBMS. Nor is Linux vendor Red Hat.
In case you think there is a software vendor conspiracy, there are now only 5 vendors that sell Intel Itanium based systems: HP, Bull, NEC, Inspur and Huawei. And I hear that over 90% of the CPUs are bought for HP systems. So what’s wrong with it? Let’s see…
HP is paying Intel to keep it alive
When Oracle ceased development on the platform, HP went nuts and sued them for saying that Itanium was dead. It rather backfired when it turned out that HP was paying Intel $690m to keep it alive. Given HP’s precarious state right now, it would be a stretch to suggest that this is a winning strategy.
Pace of innovation
The current chip, codenamed Tukwila, was 2 years late to market, with 2-year-old features and performance. It has under half the performance per core of equivalent Intel x64 and IBM Power7 CPUs as well as 50% more power consumption. The top-end CPU is 185W and 4 cores compared to the Intel Westmere-EX which is 130W and 10 cores. Yes: Westmere-EX draws roughly 1/4 the power per core and delivers around 5x the performance per socket.
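The per-core and per-socket claims can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted above; the relative per-core performance of 0.5 is an assumption taken from the "under half" claim (so it is an upper bound for Itanium), not a measured value, and the variable names are purely illustrative.

```python
# Power and core counts as quoted in the text.
tukwila = {"watts": 185, "cores": 4}
westmere_ex = {"watts": 130, "cores": 10}

# Assumption: Itanium per-core performance is at most half that of
# Westmere-EX ("under half" in the text), so 0.5 is an upper bound.
ITANIUM_RELATIVE_CORE_PERF = 0.5


def watts_per_core(chip):
    """Socket power divided evenly across the cores."""
    return chip["watts"] / chip["cores"]


def relative_socket_perf(chip, per_core_perf):
    """Socket performance in units of one Westmere-EX core."""
    return chip["cores"] * per_core_perf


power_ratio = watts_per_core(westmere_ex) / watts_per_core(tukwila)
socket_ratio = relative_socket_perf(westmere_ex, 1.0) / relative_socket_perf(
    tukwila, ITANIUM_RELATIVE_CORE_PERF
)

print(f"Westmere-EX watts per core relative to Tukwila: {power_ratio:.2f}")
print(f"Westmere-EX performance per socket relative to Tukwila: {socket_ratio:.1f}x")
```

With these inputs the power ratio comes out a little above one quarter, and the per-socket ratio is 5x or more if Itanium's per-core performance is below the assumed 0.5.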
The next generation CPU, Poulson, was scheduled for 2009 and still hasn’t been delivered in 2012. I think you know where Intel is investing its R&D: the successor to the x64 Westmere-EX platform, called Ivy Bridge.
Reliability, Availability & Serviceability
This used to be the reason to buy Itanium. But unfortunately in many ways, the Intel Westmere-EX has better RAS features than Itanium. Westmere-EX can predict and exclude memory failure, recover from memory failures and mirror memory. Plus Westmere-EX can predict and re-route chip interconnect (QPI) failures and recover. It is literally bulletproof.
Itanium has 2-year-old technology in this respect, and the pace of innovation in this area is really important because of in-memory computing.
Size and Power
This part is scary. A typical HP Superdome 128-core system is 6'6" high. An equivalent IBM Westmere-EX 80-core system is 12" high. The HP unit will use 6kW for the CPUs alone and the IBM will use 1kW. Obviously add some more for memory and other stuff, but you get the idea. Itanium is 1/6 the power performance. And will take up large swathes of datacenter space. And kill a lot of trees.
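The CPU power figures follow from the socket counts. The sketch below is a rough check that assumes every socket draws the top-end power quoted earlier in the article (185W for a 4-core Itanium, 130W for a 10-core Westmere-EX); real systems will differ, and the function name is hypothetical.

```python
def cpu_power_kw(total_cores, cores_per_socket, watts_per_socket):
    """Estimate CPU-only power draw in kW for a fully populated system."""
    sockets = total_cores // cores_per_socket
    return sockets * watts_per_socket / 1000


# Assumption: all sockets run at the top-end power quoted in the article.
superdome_kw = cpu_power_kw(total_cores=128, cores_per_socket=4, watts_per_socket=185)
westmere_kw = cpu_power_kw(total_cores=80, cores_per_socket=10, watts_per_socket=130)

print(f"HP Superdome CPUs: {superdome_kw:.2f} kW")
print(f"Westmere-EX CPUs:  {westmere_kw:.2f} kW")
print(f"Ratio: {superdome_kw / westmere_kw:.1f}x")
```

This gives just under 6kW against just over 1kW, which is where the rounded "6kW versus 1kW" comparison comes from.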
Angry Larry
Oracle have gone heavily after HP here with their “Cash for Clunkers” programme. Now this is typical Oracle bully behaviour but it is hard to argue with their logic.
HP Superdome customers are facing costly “forklift upgrades” when upgrading from dead-end PA-RISC and Itanium processors and HP-UX.
Now you can trade in your legacy HP Superdome servers and receive a 50% discount on Oracle’s Sun SPARC Enterprise M8000 and M9000 servers—secure and highly available servers for running mission-critical, enterprise database and business applications.
And this has had a dramatic effect on revenue – HP Itanium sales are falling quarter on quarter and are below $400m per quarter – falling from over $800m in Q4 2010. HP is suing Oracle over this but the damage has been done.
Note that a blogger went after Oracle for this with “who’s the clunker?”, but it is an awful article. Notably, the SPARC platform has a 5-year roadmap. The closest thing I can find to this from HP is Project Odyssey, which looks suspiciously like a roadmap to migrate customers from HP-UX/Itanium to Linux/x86, or this one that is from 2009.
Features & Function Comparison
Someone wrote a comparison of HP and Oracle on this which was clearly biased so I thought I would lay down some facts! Let’s compare 3 roughly similarly powered systems (by SAP’s application benchmark). Please note that HP have not certified any systems so I had to estimate their SAPS rating based on data available for the SPEC benchmark.
| | HP Superdome2 | IBM POWER7 | Intel Westmere-EX |
| CPU | 32-CPU (128-core) | 8-CPU (64-core) | 8-CPU (80-core) |
| SAP SD 2-tier benchmark | 120k SAPS (940 SAPS/core) | 200k SAPS (3125 SAPS/core) | 120k SAPS (1500 SAPS/core) |
| Configuration & Cost | 512GB of memory with HP-UX and 3 years basic HW and SW support lists for $1,722,390 | 512GB of memory, AIX UNIX and 3 years basic HW and SW support lists for <$1,000,000 | Intel Westmere-EX with 512GB of memory, SUSE Linux and 3 years basic HW and SW support lists for <$100,000 |
| Size and Power Consumption | 36U / 9kW | 8U / 3.2kW | 8U / 4kW |
| Roadmap | 2 more generations of Itanium, the first of which is 3 years late to market. | There is a commitment to 2 more generations of IBM POWER and they have a detailed roadmap available here. | See the below image to see the focus on x64 roadmap! |
| Scalability (single-system) | 128-cores, 4TB RAM, 240k SAPS | 256-cores, 8TB RAM, 700k SAPS | 80-cores, 3TB RAM, 120k SAPS |
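The per-core and cost figures in the table can be reproduced as follows. This is a sketch that takes the table's SAPS ratings, core counts and list prices at face value; note that the HP rating is the author's estimate rather than a certified result, and that the IBM and Intel prices are only upper bounds ("<"), so their cost per SAPS is an upper bound too.

```python
# (SAPS rating, cores, list price in USD) as given in the table.
# The IBM and Intel prices are upper bounds, not exact figures.
systems = {
    "HP Superdome2": (120_000, 128, 1_722_390),
    "IBM POWER7": (200_000, 64, 1_000_000),
    "Intel Westmere-EX": (120_000, 80, 100_000),
}

saps_per_core = {name: saps / cores for name, (saps, cores, _) in systems.items()}
cost_per_saps = {name: price / saps for name, (saps, _, price) in systems.items()}

for name in systems:
    print(
        f"{name}: {saps_per_core[name]:.1f} SAPS/core, "
        f"up to ${cost_per_saps[name]:.2f} per SAPS"
    )
```

The HP figure works out to 937.5 SAPS per core, which the table rounds to 940.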
What will be the death knell?
This is interesting because 95% of Itanium systems were shipped by HP in 2008, according to Gartner. 90% of those that run Itanium for SAP run the HP-UX OS. I’d love to see the stats but from my SAP statistics vs the overall systems sold, I estimate that at least 30% of those are used to run SAP – I suspect this is the biggest single software vendor that runs Itanium.
And SAP hasn’t said so, but they will stop development on the Itanium platform. They have to because the only database that runs on that platform is Oracle 11g (or MSSQL on Windows 2008).
Disclosure: This is a republished post from John Appleby's original post.
Add to this SAP’s promotion program around its own Sybase ASE database and HP’s financial inability to prop up Itanium and perhaps you will agree that the Superdome will move from an endangered species to a dead duck.
Talkback
Shame about this
Conversely, we'd have an Intel monopoly
Why do you think Intel didn't try to optimize Itanic?
Oracle or not, the math of the system does not add up to more sales
That's why customers are migrating to IBM
So you are not the only one not taking Oracle up on their offer!
Skrew Angry Larry
True, it is IBM
Does Appleby sound credible with his statements?
One, I knew HP had not submitted an SAP benchmark for Superdome 2. So, it's a big question how Appleby got his SAPS numbers for Superdome 2.
Second, his claim that Xeon has already eclipsed Itanium for RAS. I don't think so. When one of several CPUs fails, which system will carry on without crashing the server? RISC and Itanium servers that both use UNIX will carry on. When a memory DIMM fails, without memory mirroring, which system will carry on without crashing the server? RISC and Itanium systems will.
Nearly half of SAP's UNIX base are on Itanium-based systems. The biggest SAP installations are on Itanium-based Superdomes. This is not trivial.
There are different reasons why customers might choose an x86 platform over an Itanium-based platform. An Intel executive is right in saying it comes down to the choice of OS capabilities. It's not just a hardware-driven choice. That is too simplistic. If customers want to consolidate and use hardware-based partitioning, they can't do that on Xeon with either Windows or Linux. If customers want logical partitioning with dedicated I/O, they can't do that on Xeon because Windows and Linux can't do that.
So, is Appleby so ignorant of these choices customers make? His discussion is not constructive to customers and sounds more like FUD material against the HP platform.
Given the issues above, is he credible?
Third, whose CPU roadmap has not experienced delays? Intel is on schedule with Poulson, the next Itanium, this year.
These things often get religious!
1) Estimate of HP Superdome2 SAPS rating
It is deeply worrying that HP haven't rated their own equipment, despite it being on the market for 2 years. Draw your own conclusion as to what that means! As I said in the article, I extrapolated from the SPECint_rate2006 benchmark, and here is the maths:
IBM (8 CPUs, 64 Cores): 2530 SPEC, 202180 SAPS (80 SAPS per SPEC)
Westmere-EX (8 CPUs, 80 Cores): 1920 SPEC, 124430 SAPS (65 SAPS per SPEC)
HP Superdome1 (32 CPUs, 64 Cores): 824 SPEC, 46380 SAPS (56 SAPS per SPEC)
HP Integrity Blade - current generation (8 CPUs, 32 Cores): 531 SPEC
So we take a 32 CPU Superdome and hope it will scale linearly from the 8 CPU blade (they have in the past), and that gives us 128 cores and 4 x 531 = 2124 SPEC. Multiply that by 56 and you get 118944 SAPS. It's not perfect but I bet it's not far wrong.
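The extrapolation can be written out step by step. This sketch uses only the numbers listed above and makes the same two assumptions as the estimate itself: that the 8 CPU blade result scales linearly to 32 CPUs, and that the Superdome1 ratio of SAPS per SPEC point (rounded to 56) carries over to the newer system.

```python
# (SPEC score, certified SAPS) for the systems that have both figures.
reference = {
    "IBM": (2530, 202180),
    "Westmere-EX": (1920, 124430),
    "HP Superdome1": (824, 46380),
}

# SAPS per SPEC point, rounded to whole numbers as in the list above.
saps_per_spec = {name: round(saps / spec) for name, (spec, saps) in reference.items()}

# Assumption: linear scaling from the 8 CPU Integrity blade to 32 CPUs.
blade_spec, blade_cpus, target_cpus = 531, 8, 32
scaled_spec = blade_spec * (target_cpus // blade_cpus)

# Assumption: the Superdome1 SAPS-per-SPEC ratio also holds for the new system.
estimated_saps = scaled_spec * saps_per_spec["HP Superdome1"]

print(saps_per_spec)
print(f"Scaled SPEC: {scaled_spec}, estimated SAPS: {estimated_saps}")
```

Using the unrounded Superdome1 ratio instead of 56 would shift the estimate by well under one percent, so the rounding is not what limits its accuracy; the linear scaling assumption is.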
2) RAS
The Westmere-EX has better predictive failure analysis, and can shut down DIMMs and recover from failures - RAS features that will not be available on Itanium until the Poulson CPU. The Superdome has better CPU failure handling, CPU replacement and DIMM replacement functionality. Which do you want more? For in-memory databases, you want the former, because it's easier to deploy warm nodes. For other applications, you want the latter.
And if you want the latter, the fair comparison for the Superdome is IBM's POWER7 platform, if we want to compare apples with apples, and there it compares poorly.
3) SAP UNIX customers
Sorry but your facts are wrong here. If we exclude Linux systems (and there are a lot of Linux systems that are mission critical now) from the UNIX category then there are 51% on IBM/AIX, 29% on HP/HP-UX and 20% on Oracle/Solaris. I have no doubt that a lot of them run mission-critical systems on HP-UX - that is the reason to buy into the platform after all - but the evidence of falling HP sales and increasing IBM sales suggests that they are migrating from HP onto IBM.
4) Customer choices
Let's go back to the facts for SAP customers: UNIX overall represents less than 50% of the install base in Large Enterprise. This is because Westmere-EX is eating UNIX's pie. As one memory engineer tells me:
"1. If you consider your usage of the computing platform to be mission critical, then you can't get enough RAS features. Any and all RAS features would probably be "must have".
2. OTOH, once you fall outside of the mission critical space, then it's just a matter of degree of how "mission critical" you consider your usage to be, versus the likelihood of failure, versus how much money you can save by forgoing that RAS feature."
And we can see the impact of this in the market - for example 91.4% of the TOP500 systems run Linux - mostly either on Opteron or Xeon. And we can see it in the sales figures. Here are the facts:
- Server market declined 2.4% year on year in Q1 2012
- Whilst HP lost market share in the UNIX market, it has gained it in the x86 market with 29.3% overall revenue market share
- IBM's UNIX revenue declined 3.7% year on year and it gained 6.3% market share
Reference: IDC http://www.idc.com/getdoc.jsp?containerId=prUS23513412
5) Delays
Yes, everyone's CPUs have been delayed at one time or another, but it is the delays to Tukwila that meant the Superdome2 was 3 years late to market and non-competitive. All the other vendors can, on some level, broadly compete.
I hope this clarifies some of your points!
Itanium and SAP?
failing CPU? Failing DIMM?
Seriously, is this an issue for you? I am not being snarky, I am asking a genuine question.
I work in an all Dell shop, Intel Pentium and/or Xeon, 32 and 64 bit depending on the unit, and in 11 years, I can only remember one server losing a CPU in a dual PIII CPU system, and one desktop losing a memory module, (which was probably bad from the start, but didn't access those bad bits often enough to come to my attention sooner.) This is a really rare thing from my perspective. If it does happen a lot, I am glad I do not need to use the hardware you do.
CPU vs DIMM
Itanium was a non-starter when it replaced Alpha.
If anything, I was extremely disappointed with HP dropping the proven (and still very scalable) PA-RISC, and later Alpha technologies for an unproven newcomer. I had a feeling this was going to happen (its eventual failure), and it did. HP just didn't let on about the failure until 2 or 3 years after it should have.
Very True
I'm not sure that HP has let on about the failure; they are blaming Oracle for it. My personal opinion is that whilst Oracle may or may not be acting reasonably in dropping support, they are only pointing out what Intel and HP already knew. But HP are still claiming it's a valid platform that customers should invest in.
I would have put all my R&D budget behind the Alpha - I always believed it was the best RISC platform.
A performance-focused view is irrelevant
To say Xeon has more RAS features than Itanium is simply wrong. In fact Xeon caught up with Itanium only very recently, with Westmere-EX. Technologies like Memory Patrol Scrubbing and Double Chip Kill have been in place on Itanium since Montvale. Also there is no such thing as excluding or shutting down a DIMM. In fact the Intel memory subsystem dictates that you need to have lockstep pairs available, and memory access is split over multiple channels and SMI lanes to make use of a setup called hemisphere mode. If you lose an entire DIMM in this mode (should the OS not support MCA-R, you crash as soon as you access the memory block), you actually lose (after the reboot) pairs or (depending on the implementation) quads of DIMMs.
What Xeon can do is set aside ranks or entire sets of memory for failover, but it lacks, for instance, hardened latches to cope with ECC errors in L1-L3 cache. QPI transaction retry is very simple compared to the interconnect fabric routing and protection that runs cell/blade-based systems like a Superdome 2. These systems can scale I/O and CPU independently; try that with x86.
You may be impressed by the Intel specs of the chip itself but in reality a lot of these features work only in conjunction with proper server firmware or OS implementations. Take MCA-based memory error recovery for instance. Very few vendors implement this and it's mostly a feature that exists on paper, since it's very hard to prove which kinds of memory errors can actually be handled and how the OS/hypervisor will react. RAS features like PCI Hotplug, CPU Hotadd and Memory Hotadd have been available on Integrity systems for a long time and have been supported by HP-UX since then, but only very recently did RHEL make a bold statement to support them. But that only means that the mechanisms are coded into the kernel, not that this is a tested and field-proven technology. However, you might take a guess at how well it works given how few customers are actually willing to rely on this instead of just adding another cheap box. If you would bet the uptime of your business on the proper implementation of features that were unknown 2 years ago, go for it - I wouldn't.
Xeon has greatly caught up on RAS features, true, which is potentially good news for all x86 customers out there, as soon as firmware and OS vendors do their homework. Take a look at the IBM x3850. It took IBM over 18 months and two revised Redbook versions to make them comfortable stating that the system runs properly with the MAX5 as a node controller. Node controllers have been implemented on Integrity with the Superdome 1.
In reality resiliency is more than recently introduced features on paper. It's about an entire ecosystem that is developed, tuned and maintained together, like Itanium, Integrity, HP-UX and add-on software like HP Serviceguard or MirrorDisk/UX, and that has a long history in the field. Comparing this to x86 is nonsense, since that is a conglomerate of products, firmware code, software, drivers and applications all from different vendors, working together on a very minimal subset of standards in the very fast-changing environment of x86 and Linux.
Itanium is not the best performing processor. Also true. There are solutions that are cheaper and more energy-efficient. However, again, it's the entire ecosystem of mission-critical platforms like Integrity or POWER that keeps your traffic control and aerospace surveillance system online, not the processor.
The costs you save on Xeon vs. Itanium/Integrity you will re-spend 3-4 times over developing and testing a real bulletproof solution (beginning with firmware and OS patch cycles, as well as error detection on fabrics and networks). The costs you save on POWER (fewer cores to do the work) vs. Itanium/Integrity you will likely spend on the expensive IBM support contracts and the following forklift upgrade. IBM's POWER systems are monolithic and cannot be integrated into a converged IT.
Project Odyssey, should you have understood it even a bit, is about integrating Integrity and ProLiant on a common mission-critical platform. They will take the SD2 architecture and make it run with Xeon processors. This way customers can use HP-UX as a real UNIX on Integrity SD2 blades, and enjoy the scalability and resilience of the SD2 hardware with all partitioning features and firmware capabilities on x86 SD2 blades. It's not about migrating, it's about choice and technology leadership. They call this converged infrastructure and it enables you to monitor, maintain, provision and control your Integrity systems in the exact same way as ProLiant systems. Even third-party. Even as a private or hybrid-cloud implementation (HP CloudSystem). Try to do that with IBM.
Also in this context HP will make contributions to the FOSS community to enhance Linux and make it more mission-critical. The only companies today I know of that have the budget, the manpower and the leverage to run Linux in business-critical environments are Google and Amazon. For all others, running Linux mission-critical is just too expensive.
As you can see, mission-critical is way more than a processor technology. Mission-critical is also not about performance. Spending on double or triple the number of CPUs on Integrity, or 3-4 times more on I/O on POWER, is the smallest part of the whole equation. Operating costs make up to 60% of the TCO of a server system; look at the IDC stats. This is even more true in such sensitive environments.
And by the way: SAP has made a strategic commitment to UNIX in general and HP-UX in particular, it was part of the whole Oracle Announcement thingy back last year. Go Google it. You can bet on them dropping Oracle as their primary database and going for Sybase as their new primary OLTP database, since it's their own now. They don't like Larry either. And yes, Sybase runs on HP-UX.
Some good points here
MCA-R works in runtime with Westmere-EX - see this YouTube video. http://www.youtube.com/watch?v=BDLn5oGBPok
I think there is a key point here, which is that for workloads like in-memory computing, the RAS features of Westmere-EX are preferable to those of Itanium. However, for true mission-critical systems the complete replacement potential of the RAS features of the IBM Power 795 (see here: http://www.redbooks.ibm.com/redpapers/pdfs/redp4640.pdf) is preferable to that of the HP Superdome. Where does HP fit into the equation?
As for Project Odyssey - yes, I read more detail into it too but my facetious reply remains. To an outsider it looks like a way to migrate from Integrity to ProLiant. Anyhow we are in agreement - the rules change for the fraction of mission-critical systems.
The problem is that I have not yet encountered a real mission critical SAP environment. Mostly SAP runs Finance, Operations, Supply Chain or Manufacturing processes - to simplify. None of these require the RAS features that make IBM z-Series or HP Superdome actually interesting.
And if you think that Sybase ASE is going to save the Superdome? It may be supported but only for SAP ERP 6.0 EhP5 customers running Unicode systems. Which represents less than 5% of the install base. By then, I suspect that they all will have gone elsewhere.
You make a good case... for IBM Power.
Nevertheless, everything you write about is available on IBM Power - AIX. Power 7 will blow Itanium out of the water on OLTP or any other core Unix type workload. Unix systems are about integrating the system from the silicon to the OS for maximum RAS and resource efficiency... which is why HP never should have outsourced the key component, CPU, of their Unix systems to Intel. If you want to race Formula 1, you have to design the engine.
"Try to do that with IBM."
See IBM PureSystems.
Bit of a red herring ....
It's true that Xeon is biting steadily into the RISC/Itanium market. HP clearly isn't oblivious to that fact, as the largest supplier of Xeon systems on the planet. That doesn't mean Itanium is dead quite yet, any more than mainframes or SPARC are. Power will hang in for a while, but in the Ivy Bridge timeframe, Xeon will quite possibly surpass Power in per-core performance. HP will be pre-positioned to leverage that with the SD2 chassis via Odyssey. But even that won't spell the immediate demise of Integrity. Here's why.
First off, comparing current software ecosystems for Xeon to AIX or HP-UX is comparing apples to oranges. There are value propositions to the UNIX variants that just don't yet exist on Xeon. Look at Oracle, for instance. Resource granularity for Oracle licensing is non-existent on Xeon unless you use Oracle's OS stack; regardless of how much performance you need, you license every core on a machine. On a ten-core Westmere, you're approaching a quarter million dollars per socket for Oracle Enterprise. That's $2 million on an 8-socket machine. I've built some extremely large application environments, and I have seen VERY few that need 80 cores of Westmere in a single database server. Those software costs can very rapidly offset any savings you get on buying Xeon over Integrity. HP has been selling a ProLiant called the DL980 for a couple of years. It's an 8-socket machine using SD2 node controllers, and it performs better on SPEC than either IBM's 8-socket Westmere box or the Sun Fire X4800. DL980 has displaced some Integrity servers, but the immaturity of the software ecosystem makes it unsuitable for many customers.
That's one reason why spinning Odyssey simply as a move to get customers to ProLiant is a bit simplistic. There's also the investment protection/evolutionary aspect. Odyssey won't just use the SD2 architecture, it will use the EXISTING SD2 chassis. Ultimate investment protection. As Xeon and Linux overtake Power, high-end Power systems will become very expensive doorstops. Customers will be able to use their SD2 chassis and much/all of the associated I/O subsystem for many years. That doesn't sound like a dead box to me. In fact, it sounds like the best deal on the planet for protecting a customer's investment in very expensive iron.
One final note; your assertion that SAP doesn't really require HA is misguided at best. I have architected a number of large-scale SAP environments, on a variety of operating systems and databases (including, but certainly not limited to, HP-UX). I have manufacturing customers that run SAP in mission critical environments. There's one that comes to mind immediately that has a single central instance for their worldwide SAP system. If it's down, it costs them several hundred thousand dollars per hour. I rearchitected their environment to achieve appropriate levels of HA after one costly outage. I don't think I'd have to look far to find other mission-critical SAP installations.
Roadmap?
On the Itanium side, I see two future generations listed, Poulson and Kittson.
My conclusion is that IBM is going to discontinue their research and development on Power after Power 8. I guess the demise of RISC will come much sooner to IBM.
IBM never releases full detail roadmaps