Hardware strategies and programming models: What's coming

Paul Murphy: The big players -- IBM, Sun, Microsoft, and Intel -- appear to have laid down four major hardware bets. Sun's is both the most announced and most misunderstood.
Written by Paul Murphy, Contributor

Right now the big players appear to have laid down four major hardware bets:

• Sun is betting on Solaris and chip level multi-threading;

• IBM is betting on Linux and Cell based processing;

• Microsoft is betting on becoming the home computing standard by building an entertainment complex around its "nexus" digital rights management technology and the IBM PowerG5 successor embedded in the X360 games console; and,

• Intel is betting on picking up the leftovers - playing follow-the-leader on power and multi-core while trying to maintain performance for Microsoft users and traditional x86 code.

All four look like hardware bets but are actually based on assumptions about software and how the market responds to software change.


Thus Intel's announcements at its recent developer conference in San Francisco included:

• a strategic emphasis on getting more processing per watt - a follower response to initiatives by Sun, IBM, and AMD on improving power efficiency by design simplification, increased parallelism, and transmission cost reduction;

• expedited development and shipping of dual-core products, including placeholder early models consisting of two cores in one package with no significant design or other change;

• the addition of two AMD-developed 64-bit compatibility instructions to the common x86 instruction set;

• the dropping of hyperthreading from major market CPU designs; and,

• a new technology making cache allocation dynamic for true multi-core CPUs, aimed at making them more effective for uni-processor applications.

Look at the strategy behind these announcements and what you see is a major effort at playing catch-up ball, and a big bet on the continuation of demand for CPU engines that run single, in-line processes as quickly as possible.

The end of Intel's hyperthreading says it all: most people turned this off in the PC's BIOS because Microsoft simply never got fully behind the technology. Here the dynamic cache allocation technology is the second shoe: Intel doesn't expect a miraculous change favouring parallelism in the code its products generally run and wants to keep the customers who run single threaded processes as happy as possible.

Oddly, Sun's Java may be the only widely used language actually capable of automated multiple thread support on Intel, suggesting that it may be years before the PC industry catches up with the hardware and therefore that Intel's bet may be better than it looks.

While this dead-ends the company as a long-term strategy, it works as a short-term strategy, turning sunk costs into earnings value while attempting to do to AMD and Microsoft what AMD and Microsoft have done to Intel - twice. The first time came when Intel introduced the Pentium Pro and genuinely 32-bit computing to the Wintel market before that market was ready for it. That gave AMD the opportunity to steal market share by keeping 16-bit instructions in their K-series CPUs, enabling older code to run better on AMD chips than on Intel's PII. More recently, of course, AMD pioneered 64-bit x86, giving users the ability to run old 32-bit code on new 64-bit machines, pretty much driving Intel's Itanium 64-bit chip out of the market and forcing Intel into the catch-up position it's in today.

In both cases Microsoft, at least from Intel's perspective, was missing in action - essentially siding with AMD by not actively supporting Intel-initiated change. This time, however, Intel is betting on maintaining backwards compatibility with the single-threaded processing model embraced by the current x86 market while half-heartedly challenging Microsoft on the home entertainment front.

Intel's strategy, in other words, is to hope that programmer inertia keeps existing programming models in the mainstream long enough for them to think of something to do.

Notice that, in saying this, I'm adopting the Unix definition of a thread and not Microsoft's. Specifically, a thread is:

A flow of control within a single UNIX process address space. Solaris threads provide a light-weight form of concurrent task, allowing multiple threads of control in a common user-address space, with minimal scheduling and communication overhead. Threads share the same address space, file descriptors (when one thread opens a file, the other threads can read it), data structures, and operating system state. A thread has a program counter and a stack to keep track of local variables and return addresses. Threads interact through the use of shared data and thread synchronization operations.

A thread permanently assigned to a particular light weight process [LWP] is called a bound thread. Bound threads can be scheduled on a real-time basis in strict priority with respect to all other active threads in the system, not only within a process. An LWP is an entity that can be scheduled with the same default scheduling priority as any UNIX process.

Although Windows theoretically allows a broadly similar form of threading, called fibers, these don't seem to get much actual use. Instead, nearly all Windows threads are closer in nature to what Unix would call an unbound thread - a single flow of control that has to have a unique LWP assigned to it for context.
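To make the distinction concrete, here's a minimal C sketch of the Unix model: several threads created in one process, sharing its address space and coordinating through a mutex. It assumes a POSIX system with pthreads; the worker() function and the shared counter are illustrative names, not anything from Sun or Microsoft, and the PTHREAD_SCOPE_SYSTEM attribute roughly corresponds to asking for a bound thread on Solaris.

/* Minimal pthreads sketch: threads share one address space and
 * synchronize through a mutex.  PTHREAD_SCOPE_SYSTEM requests
 * system-wide (kernel/LWP) scheduling - roughly a "bound" thread
 * on Solaris.  worker() and counter are illustrative names only. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                       /* shared data */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);             /* thread synchronization */
        counter++;                             /* same address space: every
                                                  thread sees this variable */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[4];
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    /* Ask for kernel-level scheduling contention scope ("bound"). */
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    for (int i = 0; i < 4; i++)
        pthread_create(&tid[i], &attr, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);

    printf("counter = %ld\n", counter);        /* 400000 */
    return 0;
}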

The nexus is a digital rights management technology that's being embedded first in the X360 games console but has far wider implications. Most fundamentally, nexus represents a single point at which three controls must match for the operating system to load, the DVD to play, or the game to start. Thus a nexus-equipped games console can be hooked to a home HDTV to provide a fully "rights protected" environment for a rented movie DVD, a network-capable multi-user game, or home web access - and additional technologies, including licensed use of Microsoft Office under Microsoft Windows/XP, can be plugged in at will.

For Microsoft it's the DRM appeal to movie and other entertainment producers that's the ten billion dollar bet here, but it's the implementation that's interesting from a programming model perspective. The processor in the first X360 (like the original, an "all-round" performer) is a three-core, six-thread, 3.2GHz PowerPC-compatible part derived mainly from the G5 processor. What's important about that is that this machine embeds the Unix thread model in hardware and therefore requires a fundamental change in programming model from what works for x86.

Although I don't have access to Microsoft's research on this, I'd bet a month's earnings that their numbers show Apple's Darwin and MacOS X shell running extremely well on this machine, and Windows/XP running stolidly: meaning without new failures, but slowly - probably at about the level of a 1.5GHz x86 machine. The reason for this is that code already optimized for the G4/G5 environment should compile very well for this machine, but code from the x86 world will work about as well as x86 games and applications recompiled for the Mac have always worked - slowly.

In particular, the approach to object orientation embedded in Windows NT 3.51 continues in Windows/XP and implies a "waterfall" approach to passing control between objects that militates against multi-threading in the Unix sense of data-linked processes running in parallel - meaning that the current code base cannot reasonably be retrofitted to the PowerPC programming model.

On the other hand Microsoft's "big top" project is rumored to have many of the technical characteristics, including Unix style thread support and asynchronous control flows, touted as part of the "Longhorn" vision only a few years ago but now apparently abandoned in favor of another NT generation. If deliverable, such an OS could be a good fit for the multi-core, multi-threaded PowerPC in the X360 but applications written for x86 would obviously not run well. What we have here, in other words, is such a fundamental shift in design philosophy that you'd have to expect Microsoft to practice the technical equivalent of serial monogamy with respect to these sets of ideas.

Just how they plan to do that isn't obvious - but one possibility would be to use the existing Macintosh code base for Office to deliver that for X360 fairly early in the game, swap in the networked operating system when it becomes available, and then leverage the resulting home computing base back into the business office as an NT replacement technology right about the time x86 dies out of the market. That may sound weird, but look at the benefits if they pull it off: access to real multi-threading, an end to the use of unlicensed software, and an end to the backwards compatibility issues for x86 hardware that have turned a five million line VMS clone into a sixty million line monster.

IBM's strategy, in contrast, at least looks very clear: bet the software business on taking over Linux and use it to push the Cell architecture into the services markets for everything from desktops to supercomputers while partners Sony and Toshiba push it into entertainment and the Asian volume PC market.

In its present form the Cell processor is made up of building blocks that amount to eight-way grids on a chip. Thus there is one PowerG5-derived master processor and eight special-purpose units that handle task execution in parallel. Key to the programming model for this thing is an access layer, or abstraction, that goes well beyond the old microcode idea to function as an operating system for the grid - and isn't limited to single assemblies.

A critical consequence of this is that the creation and management of larger grids made up of eight-way blocks is to be handled by this machine-level OS, not the Linux OS running on top of it. In some ways this looks like a nice simplification offering security, portability, and programming advantages. In others, however, it looks a lot like a kludge designed to get around the limitations of the older mutex/lock-based architecture in the current Linux kernel. Had IBM decided, for example, to go with something like DragonFly BSD, this abstraction layer would not have been necessary, with consequent simplifications in the programming, networking, and security models.

The instruction set and microcode for the master processor mean that standard Linux and other open source code ports easily to this environment but makes little use of the grid unless appropriately modified. When modified to fit the abstraction layer, however, code that runs on a single eight-way chip also runs unchanged on complexes made by linking Cells. As a result IBM could be poised to deliver technologies that will give its users a single code base from a laptop PC containing a partial Cell assembly to petaflop supercomputers containing hundreds of them.

Making it happen won't be easy - in fact early uses of the Cell in Sony's PlayStation 3 rely on an external GPU, despite the fact that the Cell is better at this kind of work than the GPU is, simply because the programming model becomes too difficult without it.

IBM has recently posted extensive Cell documentation, including an overview by chief designer Peter Hofstee containing this bit:

The most productive SPE memory-access model appears to be the one in which a list (such as a scatter-gather list) of DMA transfers is constructed in an SPE's local store so that the SPE's DMA controller can process the list asynchronously while the SPE operates on previously transferred data. In several cases, this new approach to accessing memory has led to application performance exceeding that of conventional processors by almost two orders of magnitude, significantly more than anyone would expect from the peak performance ratio (about 10x) between the Cell Broadband Engine and conventional PC processors.

Sounds delightful, doesn't it? But there's tremendous value there too: get really good at programming for the grid and you can get in the range of 100 times Intel's performance just on the standard eight-way machine. Getting started on this is easy; unfortunately, getting good at it is very hard. Here's how Arnd Bergmann put it in the Linux programming model for Cell: "porting Linux to run on Cell's PowerPC core is a relatively easy task because of the similarities to existing platforms like IBM pSeries or Apple Power Macintosh, but this does not give access to the enormous computing power of the SPUs."
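For the curious, here's a rough C sketch of the double-buffered, asynchronous access pattern Hofstee describes: start the next transfer into one local buffer while computing on the other. The dma_get_async() and dma_wait() helpers are hypothetical stand-ins for the Cell SDK's memory flow controller (MFC) intrinsics - stubbed here with memcpy so the sketch compiles and runs on any host - and process() is placeholder work.

/* Double-buffering sketch of the memory-access pattern Hofstee describes:
 * kick off the next transfer into one local buffer while computing on the
 * other.  dma_get_async() and dma_wait() are hypothetical stand-ins for
 * the Cell SDK's MFC DMA intrinsics, stubbed with memcpy here. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CHUNK 256

static void dma_get_async(uint8_t *local, const uint8_t *remote, size_t n, int tag)
{
    (void)tag;
    memcpy(local, remote, n);          /* stub: a real MFC get is asynchronous */
}

static void dma_wait(int tag) { (void)tag; }   /* stub: wait on a DMA tag group */

static uint64_t process(const uint8_t *buf, size_t n)
{
    uint64_t sum = 0;                  /* placeholder "work": sum the bytes */
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

int main(void)
{
    uint8_t src[8 * CHUNK];            /* stands in for main memory */
    uint8_t buf[2][CHUNK];             /* two "local store" buffers */
    uint64_t total = 0;
    int cur = 0;

    memset(src, 1, sizeof src);
    dma_get_async(buf[cur], src, CHUNK, cur);           /* prime first transfer */

    for (size_t off = CHUNK; off < sizeof src; off += CHUNK) {
        int nxt = cur ^ 1;
        dma_get_async(buf[nxt], src + off, CHUNK, nxt); /* start next fetch */
        dma_wait(cur);                                  /* current buffer ready */
        total += process(buf[cur], CHUNK);              /* overlap compute/fetch */
        cur = nxt;
    }
    dma_wait(cur);
    total += process(buf[cur], CHUNK);                  /* final chunk */

    printf("sum = %llu\n", (unsigned long long)total);
    return 0;
}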

Basically Intel's bet is that IBM's strategy will founder on the complexity of the Cell's programming model and that the resulting refusal to change by a massive majority of programmers will force Microsoft to rethink its plans too. In effect they're betting they can't go wrong by underestimating people - making me wonder if there isn't an element of self-mockery in their choice of "Dunnington" as the code name for the next-generation Xeon.

Of the four, Sun's strategy is both the most announced and the most misunderstood. Like IBM, Sun has been doing multi-core for a number of years, but their software strength has consistently been where IBM has been weakest: in Unix style symmetrical multi-processing [SMP] and 64-bit binary compatibility from a single mid-nineties UltraSPARC II at 200MHz to today's 144-core, 1.4GHz SunFire 25K. At the software level Sun's strategy is to capitalize on this lead by pushing Solaris both inwards and outwards at the same time - inwards toward increased on-chip functionality, and outwards to make network resources more and more easily available to local processes.

On the hardware side this strategy expresses itself in what Sun calls chip-level multi-threading: a solution to the gap between memory access and CPU speeds based on automatically interleaving a number of processes (called threads if they share a namespace) on a processor while running the memory accesses needed to support this in parallel. Even the initial hardware, however, implements this at two levels. Thus the first Niagara CPUs will have eight cores on the chip assembly, each of which is capable of interleaving four threads. In effect, where IBM's Cell is a grid on a chip, Sun's multi-core systems represent multi-threaded SMP on a chip.
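In practice that means an ordinary threaded Solaris (or Linux) program simply sees more hardware contexts to run on. A minimal sketch, assuming POSIX sysconf() and pthreads, with worker() as an illustrative placeholder: size the thread pool to the number of online strands - 32 on the first eight-core, four-thread Niagara - and let the scheduler interleave them.

/* Sketch: size a worker pool to the hardware contexts the OS reports.
 * On an eight-core, four-thread-per-core Niagara this would be 32; the
 * same binary sizes itself correctly on any other SPARC or x86 box.
 * worker() is an illustrative placeholder, not a real workload. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void *worker(void *arg)
{
    long id = (long)arg;
    /* Real work goes here; memory-bound work is what CMT hides best. */
    printf("worker %ld running\n", id);
    return NULL;
}

int main(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);    /* online hardware contexts */
    if (n < 1)
        n = 1;

    pthread_t *tid = malloc((size_t)n * sizeof *tid);
    for (long i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (long i = 0; i < n; i++)
        pthread_join(tid[i], NULL);

    free(tid);
    return 0;
}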

At the most superficial level the biggest differences between Sun's strategy and those being pursued by Microsoft and IBM arise because Sun's new products are backwards compatible and theirs aren't. Thus a Sun customer who loads an existing SPARC binary on the new machine isn't likely to see a significant decrease in performance while an IBM or Microsoft customer would first need to re-compile and then accept a tremendous performance hit - perhaps as high as 50%, if the code isn't also re-written to use the new programming model.

From a Sun marketing perspective the ability to run old binaries without significant penalties is likely to prove extremely valuable to general business customers. It won't, however, matter much in entertainment, scientific processing, financial analysis, and operations research because those markets are dominated by highly technical people willing to revise code to get better performance. That, therefore, is where IBM will focus its initial Cell marketing - and, incidentally, why Sun has been so close-mouthed about floating point performance on CMT systems.

In other words:

• Intel's strategy is to bet on people not making the transition to a new programming model.

• It's not clear where Microsoft is going - but the smart money is that they'll try to have it both ways: letting Windows/XP run out the x86 string, and focusing on entertainment functions for the X360 while working to deliver the distributed OS ideas behind the original Longhorn vision in the X360 network environment before leveraging that back into the office as an NT/x86 replacement.

• IBM will use Linux on Cell throughout its product line and is clearly committed to developing tools that make existing Linux applications work well, working on advanced parallelization software for the science markets, and letting Sony drive work on the visualization side of the tools and applications business.

• Sun's public strategy is to build applications support on its open source heritage, integrate cheaper, faster storage with its on-chip SMP offerings, and push Solaris more and more in the direction of the Plan 9 second-generation Unix ideas: really the core Unix ideas migrated from a machine focus to a network focus - delivering user services from anywhere to anywhere.

Notice that of the four big hardware bets in play, only Sun's represents a programming model advancing the art of Unix - the others reflect varying degrees of cynicism about the customer. Thus Intel is basically betting it can slow or even stop significant change; Microsoft is betting on DRM and home entertainment to drive a change from the uni-process x86 model to the standard Unix threading model; and IBM is fundamentally betting on taking over Linux to sell a dramatically more efficient way of implementing ideas only lawyers can readily distinguish from those underlying Sun's mid-nineties open source GRID software.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.
