Hardware strategies and programming models: What's coming

Paul Murphy | September 13, 2005 1:59 PM PDT

Summary

Paul Murphy: The big players -- IBM, Sun, Microsoft, and Intel -- appear to have laid down four major hardware bets. Sun's is both the most announced and most misunderstood.

Right now the big players appear to have laid down four major hardware bets:

• Sun is betting on Solaris and chip level multi-threading;

• IBM is betting on Linux and Cell based processing;

• Microsoft is betting on becoming the home computing standard by building an entertainment complex around its "nexus" digital rights management technology and the IBM PowerPC G5 successor embedded in the X360 games console; and,

• Intel is betting on picking up the leftovers - playing follow the leader on power and multi-core while trying to maintain performance for Microsoft users and traditional x86 code.

All four look like hardware bets but are actually based on assumptions about software and how the market responds to software change.


Paul Murphy's Managing Linux blog covers a variety of platform issues relevant to enterprise-level decision making.

Thus Intel's announcements at its recent developer conference in San Francisco included:

• a strategic emphasis on getting more processing per watt - a follower response to initiatives by Sun, IBM, and AMD on improving power efficiency by design simplification, increased parallelism, and transmission cost reduction;

• expedited development and shipping of dual core products, including placeholder early models consisting of two cores in one package with no significant design or other change;

• addition of two AMD developed 64bit compatibility instructions to the common x86 instruction set;

• the dropping of hyperthreading from major market CPU designs; and,

• a new technology making cache allocation dynamic for true multi-core CPUs, aimed at making them more effective for uni-processor applications.

Look at the strategy behind these announcements and what you see is a major effort at playing catch-up ball, and a big bet on the continuation of demand for CPU engines that run single, in-line processes as quickly as possible.

The end of Intel's hyperthreading says it all: most people turned this off in the PC's BIOS because Microsoft simply never got fully behind the technology. Here the dynamic cache allocation technology is the second shoe: Intel doesn't expect a miraculous change favouring parallelism in the code its products generally run and wants to keep the customers who run single threaded processes as happy as possible.

Oddly, Sun's Java may be the only widely used language actually capable of automated multiple thread support on Intel, suggesting that it may be years before the PC industry catches up with the hardware and therefore that Intel's bet may be better than it looks.

While this dead-ends the company as a long term strategy, it works as a short term strategy, turning sunk costs into earnings value while attempting to do to AMD and Microsoft what AMD and Microsoft have done to Intel - twice.

The first time came when Intel introduced the Pentium Pro and genuinely 32bit computing to the Wintel market before that market was ready for it. That gave AMD the opportunity to steal market share by keeping 16bit instructions in their K-series CPUs, enabling older code to run better on AMD chips than on Intel's PII. More recently, of course, AMD pioneered 64bit x86, giving users the ability to run old 32bit code on new 64bit machines, pretty much driving Intel's Itanium 64bit chip out of the market, and forcing Intel into the catch-up position it's in today.

In both cases Microsoft, at least from Intel's perspective, was missing in action - essentially siding with AMD by not actively supporting Intel initiated change. This time, however, Intel is betting on maintaining backwards compatibility with the single threaded processing model embraced by the current x86 market while half-heartedly challenging Microsoft on the home entertainment front.

Intel's strategy, in other words, is to hope that programmer inertia keeps existing programming models in the mainstream long enough for Intel to think of something to do.

Notice that in saying this I'm adopting the Unix definition of a thread and not Microsoft's. Specifically, a thread is:

A flow of control within a single UNIX process address space. Solaris threads provide a light-weight form of concurrent task, allowing multiple threads of control in a common user-address space, with minimal scheduling and communication overhead. Threads share the same address space, file descriptors (when one thread opens a file, the other threads can read it), data structures, and operating system state. A thread has a program counter and a stack to keep track of local variables and return addresses. Threads interact through the use of shared data and thread synchronization operations.

A thread permanently assigned to a particular light weight process [LWP] is called a bound thread. Bound threads can be scheduled on a real-time basis in strict priority with respect to all other active threads in the system, not only within a process. An LWP is an entity that can be scheduled with the same default scheduling priority as any UNIX process.

Although Windows theoretically allows a broadly similar form of threading, called fibres, these don't seem to get much actual use. Instead nearly all Windows threads are closer in nature to what Unix would call a bound thread - a single flow of control that has to have a unique LWP assigned to it for context.
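To make the Unix definition quoted above concrete, here is a minimal sketch in Python of several threads of control interacting through shared data and a synchronization operation. The function name `run_shared_counter` and its parameters are illustrative assumptions, not anything from Solaris, and the standard CPython interpreter serializes bytecode execution, so this shows the shared-address-space model rather than real parallel speedup.

```python
import threading


def run_shared_counter(num_threads: int = 4, increments: int = 10_000) -> int:
    """Start several threads that all update one shared data structure."""
    shared = {"count": 0}    # one object, visible to every thread in the process
    lock = threading.Lock()  # the synchronization operation guarding it

    def worker() -> None:
        # Each thread has its own stack (its own loop variable) but
        # reads and writes the same dictionary as all the others.
        for _ in range(increments):
            with lock:  # serialize the read-modify-write on shared data
                shared["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


if __name__ == "__main__":
    print(run_shared_counter())
```

Nothing is copied or passed between the workers here: they cooperate purely through the shared dictionary and the lock, which is the point of the definition.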

The nexus is a digital rights management technology that's being embedded first in the X360 games console but has far wider implications. Most fundamentally, nexus represents a single point at which three controls must match for the operating system to load, the DVD to play, or the game to start. Thus a nexus equipped games console can be hooked to a home HDTV to provide a fully "rights protected" environment for a rented movie DVD, a network capable multi-user game, or home web access - and additional technologies, including licensed use of Microsoft Office under Microsoft Windows/XP, can be plugged in at will.

For Microsoft it's the DRM appeal to movie and other entertainment producers that's the ten billion dollar bet here, but it's the implementation that's interesting from a programming model perspective. The processor in the first X360 (like the original, an "all round" performer) is a three core, six thread, 3.2GHz PowerPC compatible derived mainly from the G5 processor. What's important about that is that this machine embeds the Unix thread model in hardware and therefore requires a fundamental change in programming model from what works for x86.

Although I don't have access to Microsoft's research on this, I'd bet a month's earnings that their numbers show Apple's Darwin and MacOS X shell running extremely well on this machine, and Windows/XP running stolidly: meaning without new failures, but slowly - probably at about the level of a 1.5GHz x86 machine. The reason for this is that code already optimized for the G4/G5 environment should compile very well for this machine, but code from the x86 world will work about as well as x86 games and applications recompiled for the Mac have always worked - slowly.

In particular the approach to object orientation embedded in Windows NT 3.51 continues in Windows/XP and implies a "waterfall" approach to passing control between objects that militates against multi-threading in the Unix sense of data-linked processes running in parallel - meaning that the current code base cannot reasonably be retrofitted to the PowerPC programming model.

On the other hand Microsoft's "big top" project is rumored to have many of the technical characteristics, including Unix style thread support and asynchronous control flows, touted as part of the "Longhorn" vision only a few years ago but now apparently abandoned in favor of another NT generation. If deliverable, such an OS could be a good fit for the multi-core, multi-threaded PowerPC in the X360 but applications written for x86 would obviously not run well. What we have here, in other words, is such a fundamental shift in design philosophy that you'd have to expect Microsoft to practice the technical equivalent of serial monogamy with respect to these sets of ideas.

Just how they plan to do that isn't obvious - but one possibility would be to use the existing Macintosh code base for Office to deliver that for X360 fairly early in the game, swap in the networked operating system when it becomes available, and then leverage the resulting home computing base back into the business office as an NT replacement technology right about the time x86 dies out of the market. That may sound weird, but look at the benefits if they pull it off: access to real multi-threading, an end to the use of unlicensed software, and an end to the backwards compatibility issues for x86 hardware that have turned a five million line VMS clone into a sixty million line monster.

IBM's strategy, in contrast, at least looks very clear: bet the software business on taking over Linux and use it to push the Cell architecture into the services markets for everything from desktop to supercomputers while partners Sony and Toshiba push it into entertainment and the Asian volume PC market.

In its present form the Cell processor is made up of building blocks that amount to eight way GRIDs on a chip. Thus there is one PowerPC G5 derived master processor and eight special purpose units that handle task execution in parallel. Key to the programming model for this thing is an access layer, or abstraction, that goes well beyond the old microcode idea to function as an operating system for the grid - and isn't limited to single assemblies.

A critical consequence of this is that the creation and management of larger grids made up of eight way blocks is to be handled by this machine level OS, not the Linux OS running on top of it. In some ways this looks like a nice simplification offering security, portability, and programming advantages. In others, however, it looks a lot like a kludge designed to get around the limitations of the older Mutex/Locks based architecture in the current Linux kernel. Had IBM decided, for example, to go with something like Dragonfly this abstraction layer would not have been necessary, with consequent simplifications in the programming, networking, and security models.

The instruction set and microcode for the master processor mean that standard Linux and other open source code ports easily to this environment but makes little use of the grid unless appropriately modified. When modified to fit the abstraction layer, however, code that runs on a single eight-way chip also runs unchanged on complexes made by linking Cells. As a result IBM could be poised to deliver technologies that will give its users a single code base from a laptop PC containing a partial Cell assembly to petaflop supercomputers containing hundreds of them.
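The claim that code written to the abstraction layer runs unchanged as more units are linked can be illustrated with a generic master/worker sketch. This is not IBM's layer or any Cell API: `run_on_grid` and `scale` are hypothetical names, and Python's standard `ThreadPoolExecutor` stands in for whatever actually dispatches work to the special purpose units.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(block):
    """Stand-in for a kernel that one worker unit would execute."""
    return [2 * x for x in block]


def run_on_grid(data, workers=8, block_size=4):
    """Split data into blocks and farm them out to `workers` units.

    The calling code is identical whether `workers` is 8 (one chip in
    this analogy) or 64 (several linked chips); only the pool changes.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scale, blocks)  # map returns results in block order
        return [x for block in results for x in block]


if __name__ == "__main__":
    data = list(range(10))
    print(run_on_grid(data, workers=8) == run_on_grid(data, workers=64))
```

The design point is that the application only describes blocks of work; how many units exist, and where they are, stays behind the dispatch layer.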

Making it happen won't be easy - in fact early uses of the Cell in Sony's PlayStation 3 rely on an external GPU despite the fact that the Cell is better at this kind of work than the GPU is, simply because the programming model becomes too difficult without it.

IBM has recently posted extensive Cell documentation including an overview by chief designer Peter Hofstee that includes this bit:

The most productive SPE memory-access model appears to be the one in which a list (such as a scatter-gather list) of DMA transfers is constructed in an SPE's local store so that the SPE's DMA controller can process the list asynchronously while the SPE operates on previously transferred data. In several cases, this new approach to accessing memory has led to application performance exceeding that of conventional processors by almost two orders of magnitude, significantly more than anyone would expect from the peak performance ratio (about 10x) between the Cell Broadband Engine and conventional PC processors.

Sounds delightful, doesn't it? But there's tremendous value there too: get really good at programming for the grid and you can get in the range of 100 times Intel's performance just on the eight way standard machine. Getting started on this is easy; unfortunately, getting good at it is very hard. Here's how Arnd Bergmann put it in the Linux programming model for Cell: "porting Linux to run on Cell's PowerPC core is a relatively easy task because of the similarities to existing platforms like IBM pSeries or Apple Power Macintosh, but this does not give access to the enormous computing power of the SPUs."
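The memory-access pattern Hofstee describes is essentially double buffering: fetch the next chunk while computing on the current one. The sketch below only imitates that pattern in Python under loud assumptions: a background thread plays the DMA controller, two Python lists play the local store buffers, and `process_stream` and `dma_fetch` are made-up names. Real SPE code would use the Cell toolchain's own DMA facilities, which are not shown here.

```python
import threading


def process_stream(main_memory, chunk=4):
    """Sum the squares of `main_memory` using two alternating buffers."""
    # The "transfer list": the chunks to be brought into local store, in order.
    transfers = [main_memory[i:i + chunk]
                 for i in range(0, len(main_memory), chunk)]
    if not transfers:
        return 0

    buffers = [None, None]  # two local-store slots

    def dma_fetch(slot, index):
        # Stand-in for an asynchronous DMA transfer into a buffer slot.
        buffers[slot] = list(transfers[index])

    total = 0
    dma_fetch(0, 0)  # prime the first buffer before computing
    for i in range(len(transfers)):
        current, upcoming = i % 2, (i + 1) % 2
        pending = None
        if i + 1 < len(transfers):
            # Start fetching the next chunk into the idle slot...
            pending = threading.Thread(target=dma_fetch, args=(upcoming, i + 1))
            pending.start()
        # ...while computing on data that has already arrived.
        total += sum(x * x for x in buffers[current])
        if pending is not None:
            pending.join()  # the next buffer must be complete before we swap
    return total


if __name__ == "__main__":
    print(process_stream(list(range(10))))  # 285
```

The hard part on real hardware is sizing the chunks and ordering the transfers so the compute step never waits, which is where the "easy to start, hard to get good at" gap shows up.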

Basically Intel's bet is that IBM's strategy will founder on the complexity of the Cell's programming model and that the resulting refusal to change by a massive majority of programmers will force Microsoft to rethink its plans too. In effect they're betting they can't go wrong by underestimating people - making me wonder if there isn't an element of self-mockery in their choice of "Dunnington" as the code name for the next generation Xeon.

Of the four, Sun's strategy is both the most announced and most misunderstood. Like IBM, Sun has been doing multi-core for a number of years, but their software strength has consistently been where IBM has been weakest: in Unix style symmetric multi-processing [SMP] and 64 bit binary compatibility from a single mid nineties UltraSPARC II at 200MHz to today's 144 core, 1.4GHz, SunFire 25K. At the software level Sun's strategy is to capitalize on this lead by pushing Solaris both inwards and outwards at the same time - inwards toward increased on chip functionality, and outwards to make network resources more and more easily available to local processes.

On the hardware side this strategy expresses itself in what Sun calls chip level multi-threading: a solution to the gap between memory access and CPU speeds based on automatically interleaving a number of processes (called threads if they share a namespace) on a processor while running the memory accesses needed to support this in parallel. Even the initial hardware, however, implements this at two levels. Thus the first Niagara CPUs will have eight cores on the chip assembly, each of which is capable of interleaving four threads. In effect, where IBM's Cell is a grid on a chip, Sun's multi-core systems represent multi-threaded SMP on a chip.
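A back-of-envelope model shows why interleaving threads helps with the memory gap. The sketch below is a deliberate simplification, not Sun's design data: it assumes every thread alternates a fixed number of compute cycles with a fixed memory stall, and the cycle counts in the example are invented for illustration. Only the eight cores and four threads per core come from the text.

```python
def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Total threads the chip can keep in flight at once."""
    return cores * threads_per_core


def core_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles a core does useful work under this simple model.

    One thread keeps the core busy for compute_cycles out of every
    (compute_cycles + stall_cycles); interleaving more threads fills the
    stall time until the core saturates at 1.0.
    """
    busy = threads * compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, busy)


if __name__ == "__main__":
    print(hardware_threads(8, 4))     # 32
    # Assumed workload: 1 cycle of compute per 3 cycles waiting on memory.
    print(core_utilization(1, 1, 3))  # 0.25
    print(core_utilization(4, 1, 3))  # 1.0
```

Under those assumed numbers a single threaded core idles three quarters of the time, while four interleaved threads keep it fully busy, which is the bet chip level multi-threading makes.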

At the most superficial level the biggest differences between Sun's strategy and those being pursued by Microsoft and IBM arise because Sun's new products are backwards compatible and theirs aren't. Thus a Sun customer who loads an existing SPARC binary on the new machine isn't likely to see a significant decrease in performance, while an IBM or Microsoft customer would first need to re-compile and then accept a tremendous performance hit - perhaps as high as 50% - if the code isn't also re-written to use the new programming model.

From a Sun marketing perspective the ability to run old binaries without significant penalties is likely to prove extremely valuable to general business customers. It won't, however, matter much in entertainment, scientific processing, financial analysis, and operations research because those markets are dominated by highly technical people willing to revise code to get better performance. That, therefore, is where IBM will focus its initial Cell marketing - and, incidentally, why Sun has been so close-mouthed about floating point performance on CMT systems.

In other words:

• Intel's strategy is to bet on people not making the transition to a new programming model.

• It's not clear where Microsoft is going - but the smart money is that they'll try to have it both ways: letting Windows/XP run out the x86 string, and focusing on entertainment functions for the X360 while working to deliver the distributed OS ideas behind the original Longhorn vision in the X360 network environment before leveraging that back into the office as an NT/x86 replacement.

• IBM will use Linux on Cell throughout its product line and is clearly committed to developing tools that make existing Linux applications work well, working on advanced parallelization software for the science markets, and letting Sony drive work on the visualization side of the tools and applications business.

• Sun's public strategy is to build applications support on its open source heritage, integrate cheaper, faster storage with their on chip SMP offerings, and push Solaris more and more in the direction of the Plan 9 second generation Unix ideas: really the core Unix ideas migrated from a machine focus to a network focus - delivering user services from anywhere to anywhere.

Notice that of the four big hardware bets in play, only Sun's represents a programming model advancing the art of Unix - the others reflect varying degrees of cynicism about the customer. Thus Intel is basically betting it can slow or even stop significant change; Microsoft is betting on DRM and home entertainment to drive a change from the uni-process x86 model to the standard Unix threading model; and IBM is fundamentally betting on taking over Linux to sell a dramatically more efficient way of implementing ideas only lawyers can readily distinguish from those underlying Sun's mid nineties open source GRID software.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.
