Snow Leopard geared for multicore future

Mac OS X 10.6 begins a longer-term Apple attempt to get ahead by cracking a problem facing the entire computer industry: squeezing useful work out of modern processors.
Written by Stephen Shankland, Contributor
Apple began shipping Snow Leopard on Friday, but the true importance of the Mac OS X update likely will emerge well afterward.

That's because Mac OS X 10.6 begins a longer-term Apple attempt to get ahead by cracking a problem facing the entire computer industry: squeezing useful work out of modern processors. Instead of stuffing Snow Leopard with immediately obvious new features, Apple is trying to adjust to the new reality in which processors can do many jobs simultaneously rather than one job fast.

"We're trying to set a foundation for the future," said Wiley Hodges, director of Mac OS X marketing.

Apple shed some light on its project, called Grand Central Dispatch, at its Worldwide Developers Conference in June, but most real detail was shared only with programmers sworn to secrecy. Now the company has begun talking more publicly about it and other deeper projects to take advantage of graphics chips and Intel's 64-bit processors.

The moves align Apple better with changes in computing. For years, chipmakers such as Intel and Advanced Micro Devices had steadily increased the clock rate of their processors, and programmers got accustomed to a performance boost with each new generation. But earlier this decade, problems derailed the gigahertz train.

First, chips often ended up merely twiddling their thumbs more because slower memory couldn't keep the chip fed with data. Worse, the chips required extraordinary amounts of power and produced corresponding amounts of hard-to-handle waste heat.

And so began the mainstream multicore era, in which processors got multiple computing engines called cores that work in parallel. That's great for some tasks that can be easily broken down into independent pieces, but programmers were accustomed to a more linear way of thinking where tasks execute in a series of sequential steps.

Enter Grand Central Dispatch, or GCD. This Snow Leopard component is designed to minimize many of the difficulties of parallel programming. It's easy to modify existing software to use GCD, Apple said, and the operating system handles complicated administrative chores so programmers don't have to.

Overall, Illuminata analyst Gordon Haff believes, the computing industry really is only beginning now to tackle parallel programming in earnest. If building mature parallel programming tools is a 10-chapter book, the industry is only at chapter two right now, he said. But with no other alternative, the book will be written.

"It has to happen," Haff said. "If you look at history of information technology, things that have to happen really do happen."

Burdensome threads
One way programmers have dealt with the arrival of multicore processors--and with the multiprocessor machines that preceded them--is through a concept called threads. There are various types, but generally speaking, a thread is an independent computing operation. For programmers to take advantage of a multicore processor, they assign one thread to each core, and away they go, right?

Not so fast. Threads come with baggage. Each requires memory and time to start. Programs should be broken up into different numbers of threads depending on how many cores a processor offers. Programmers have to worry about "locking" issues, providing a mechanism to ensure one thread doesn't change data another thread is already using. And one threaded program might step on the toes of another running at the same time.
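
The locking burden can be illustrated with a minimal Python sketch. This is an analogy rather than anything from Apple's toolkits, and the names `deposit_many`, `balance`, and `balance_lock` are hypothetical. Several threads update one shared value, and the lock ensures only one of them performs the read-modify-write step at a time.

```python
import threading

balance = 0                      # shared data that every thread modifies
balance_lock = threading.Lock()  # guards every update to `balance`

def deposit_many(times):
    """Add 1 to the shared balance `times` times, locking each update."""
    global balance
    for _ in range(times):
        # Without the lock, two threads could read the same old value
        # and one of the increments could be lost.
        with balance_lock:
            balance += 1

def run(thread_count=4, times=10_000):
    """Start the threads, wait for them all, and return the final balance."""
    global balance
    balance = 0
    threads = [threading.Thread(target=deposit_many, args=(times,))
               for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance

if __name__ == "__main__":
    print(run())
```

Even in this toy case, the programmer has to pick the thread count, remember the lock on every update, and wait for each thread by hand.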

Some tools to ease the difficulties, such as Intel Threading Building Blocks, are available, but threads remain complicated.

"We looked at this and said it needs a fundamental rethink. We want to make developing applications for multicore easier," Hodges said. "We're moving responsibility for the management code into the operating system so application developers don't have to write and maintain it."

Blocking and tackling
The core mechanisms within GCD are blocks and queues. Programmers mark code chunks to convert them into blocks, then tell the application how to create the queue that governs how those blocks are actually run. Block execution can be tied to specific events--the arrival of network information, a change to a file, a mouse click.

Apple hopes programmers will like blocks' advantages: Older code can easily be retrofitted with blocks so programmers can try it without major re-engineering; they're lightweight and don't take up resources when they're not running; and they're flexible enough to encapsulate large or small parts of code.

"There's a lot of overhead around threading that means you want to break your program into as few pieces as possible. With Grand Central Dispatch, we say break your program into as many tiny pieces as you can conceive of," Hodges said.
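
The block-and-queue idea can be approximated in Python, though this is only an analogy and not GCD's actual interface. In the sketch below, each small function call plays the role of a block, and the standard library's `ThreadPoolExecutor` plays the role of a queue that decides when and where the work runs. The names `word_count` and `count_words_in_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(line):
    """One tiny unit of work: count the words in a single line."""
    return len(line.split())

def count_words_in_parallel(lines):
    """Hand every line to the pool as its own task and sum the results."""
    # The pool, not the caller, creates and reuses the worker threads.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(word_count, line) for line in lines]
        return sum(f.result() for f in futures)

if __name__ == "__main__":
    text = ["break your program", "into as many tiny pieces", "as you can"]
    print(count_words_in_parallel(text))
```

The caller never chooses a thread count or starts a thread. It only describes small pieces of work and lets the pool schedule them.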

Another difference with the Grand Central Dispatch approach is its centralization. The operating system worries about managing all applications' blocks rather than each application providing its own oversight. That central view means the operating system decides which tasks get which resources, Apple said, and that the system overall can become more responsive even when it's busy.

Other foundations
There's a second mechanism in Snow Leopard that gives programmers a new way to tap into hardware power: OpenCL, or Open Computing Language. It lets computers use graphics chips not just to accelerate graphics but also to perform some ordinary computations.

To use OpenCL, programmers write modules of code in a variation of the C programming language called OpenCL C. Snow Leopard translates that code on the fly into instructions the graphics chip can understand and transfers necessary data into the graphics system memory. Many tasks won't benefit, but OpenCL is good for videogame physics simulation or artificial intelligence algorithms, technical computing chores, and multimedia operations.

The three major makers of graphics chips--Intel, Nvidia, and AMD's ATI--have endorsed OpenCL, and the Khronos Group has made it a standard. That means programmers are likely to be able to reuse their OpenCL code with Windows applications, too.

Graphics processors employ parallel engines that suit them for running the same processing chore on many data elements. For computers without a graphics chip, though, OpenCL also can employ that parallel execution strategy on ordinary multicore processors.
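
That execution model, one small routine applied independently to each data element, can be sketched in plain Python. This is not OpenCL C and uses no OpenCL calls. `add_kernel` and `launch` are hypothetical names standing in for a kernel and for the runtime that invokes it once per element index.

```python
def add_kernel(index, a, b, out):
    """Kernel body: handles exactly one element, identified by its index."""
    out[index] = a[index] + b[index]

def launch(kernel, size, *buffers):
    """Invoke the kernel once per index.

    A graphics chip would run many of these invocations at the same time.
    This stand-in runs them one after another, which gives the same result
    because no invocation depends on another.
    """
    for index in range(size):
        kernel(index, *buffers)

if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    out = [0.0] * len(a)
    launch(add_kernel, len(a), a, b, out)
    print(out)
```

Because each invocation touches only its own element, the work can be spread across however many parallel engines the hardware offers without any locking.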

The 64-bit transition
Apple began its 64-bit transition years ago with the PowerPC processors it used before switching to Intel chips. With Snow Leopard, nearly the full suite of its software--Mail, Safari, Finder, iChat, iPhoto--becomes 64-bit programs.

Intel chips these days are 64-bit, but what does that get you over 32-bit chips? Briefly, it can let heavy-duty programs use more than 4GB of memory, improve performance by offering more chip memory slots called registers, and speed up some mathematical operations.

Moving to a 64-bit design doesn't guarantee instant speedup, though. In one developer document, Apple states: "Myth: My application will run much faster if it is a 'native' 64-bit application. Fact: Some 64-bit executables may run more slowly on 64-bit Intel and PowerPC architectures." One issue: the doubled length of references to memory addresses.
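
The arithmetic behind both the 4GB ceiling and the doubled address length is short. The sketch below is illustrative only: `addressable_bytes` and `pointer_table_bytes` are hypothetical helpers, and the one-million-entry table is an assumed example, not a figure from Apple.

```python
import struct

def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits

def pointer_table_bytes(entries, address_bits):
    """Memory taken by a table holding `entries` memory addresses."""
    return entries * (address_bits // 8)

if __name__ == "__main__":
    gib = 1024 ** 3
    # 32-bit addresses top out at 4 GiB, the limit mentioned above.
    print(addressable_bytes(32) // gib, "GiB")
    # The same table of addresses needs twice the memory with 64-bit pointers.
    print(pointer_table_bytes(1_000_000, 32), "bytes with 32-bit pointers")
    print(pointer_table_bytes(1_000_000, 64), "bytes with 64-bit pointers")
    # Pointer size, in bytes, of the Python interpreter running this script.
    print(struct.calcsize("P"), "bytes per pointer here")
```

Larger pointers mean fewer of them fit in a processor's cache, which is one plausible reason a 64-bit build can run more slowly than its 32-bit counterpart.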

Apple encourages programmers to test their software to see if the 64-bit incarnation is faster. All Apple's own applications that moved to 64-bit versions are faster, the company said.

The 32-bit kernel
However, the core component of Mac OS X, the kernel, is still 32-bit software by default on consumer machines such as MacBooks and iMacs. Apple has written it so that applications can handle more than 4GB of memory, though, and the kernel can manage it all.

In its developer document on 64-bit performance, Apple states: "Myth: The kernel needs to be 64-bit in order to be fully optimized for 64-bit processors. Fact: The kernel does not generally need to directly address more than 4 GB of RAM at once."

Apple's 32-bit kernel hits limits with very large amounts of memory, though. "Thus, beginning in Snow Leopard, the kernel is moving to a 64-bit executable on hardware that supports such large memory configurations," the company said, referring to its Xserve server line and Mac Pro workstations.

The tricky aspect of moving from a 32-bit kernel to a 64-bit kernel is that drivers--software that lets the operating system communicate with devices such as printers, video cards, and hard drives--must also be 64-bit. That's not so bad when it's a hardware device under Apple's control, but it's harder to move the full collection of third-party devices with their own drivers.

Apple argues it's not hard to make the jump, though. "As a driver developer, you must update your drivers with 64-bit binaries. Fortunately...many drivers 'just work' after changing the compile settings," the company said in a reference document.

This all may sound very low-level, but for programmers, Apple actually is working at a higher level than most. That could be an asset since many attempts to embrace parallel programming imposed more demands than most programmers were willing or able to handle.

And attracting programmers is key. Ultimately, Apple's deeper technology moves such as Grand Central Dispatch and OpenCL will be a success only if the company can get other developers to use them.

This article was originally posted on CNET News.
