80 isn't nearly enough

What an exciting week this has been. We unleashed the 'Era of Tera' by showcasing the world’s first programmable processor that can deliver Teraflops performance with remarkable energy efficiency.
Written by Justin Rattner, Contributor

What an exciting week this has been. We unleashed the ‘Era of Tera’ by showcasing the world’s first programmable processor that can deliver Teraflops performance with remarkable energy efficiency.

It’s rather extraordinary that after decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore’s Law scaling should easily let us hit the 80-core mark in a mainstream processors within the next ten years and quite possibly even less. It is therefore reasonable to ask the question: what are we going to do with this sudden abundance of processors?

The answer is somewhat obvious on the server side of things. More cores and more threads means more transactions per unit time, assuming that all those cores are given the necessary appropriate memory and I/O bandwidth. Other computationally intensive applications in scientific and engineering computing are also likely beneficiaries. I’m talking about seismic analysis, crash simulation, molecular modeling, genetic research, and fluid dynamics.

On the client end of the wire, things aren’t as obvious or straightforward, but they are no less interesting. The abundance of cores is likely to lead to a very different approach to resource allocation. For decades operating systems have been optimized for managing the very scarce processor resources, by cleverly multiplexing many tasks or threads across one or now two or four cores. As quality of service has become more important to users, we’ve all come to realize the limitations of this approach as frames get dropped from video streams or productivity applications pause while the video goes full tilt. A different approach, and one that probably hasn’t received enough attention from the research community, is to dedicate cores to providing particular functions. The allocations become more static than what we see today, but they can certainly be changed over longer periods of time ranging for seconds to hours or even days.

As an example, we could conceive of a multi-function computing appliance that contains a processor with perhaps three dozen cores: we might allocate four of those cores to running the core productivity and collaboration applications. Another cluster of cores, on the order of a dozen, might provide very high quality graphics and visualization. Media processing, beyond encode/decode which would best be handled by dedicated hardware, would be the responsibility of yet another cluster of, say six cores. Still other clusters might be do real-time data mining on various streams of data flowing in from the Internet. Various bots operating within this cluster might be assembling news, shopping, or investing. The key idea here is to let the abundant hardware resources replace a lot of very complex OS code. It’s replaced by cluster or partition management code, which doles out the resources, but stays out of the way until there’s a major shift in the workload.

TJGeezer suggested using Tera-Scale capability along with huge amounts of NAND in an iPOD size container for AI applications. He may be right. One can easily imagine clusters of cores supporting an advanced human interface with real-time speech and vision or language translation. A lot of algorithmic development would have to take place to make this feasible, but there is no doubt in my mind that we’ll have the hardware resources needed to host them. The statistical algorithms that will form the heart of these future recognition systems are highly parallel and thus a great fit for a high core count architecture.

An abundance of cores also enables new ways to deal with challenges associated with system operation in the face of device failures and cosmic radiation. Think of the collection of cores as a redundant array of computing engines (RACE). Two or more cores could be used in tandem to detect and correct faults. If a core becomes unreliable, it can simply be removed from service without significantly affecting overall system performance

As we pack more and more computing resources into smaller areas, managing power and heat in a very fine grain manner will be critical. If we have more cores than are needed to execute the desired set of workloads, we can swap threads between cores whenever one becomes too hot. It’s like the hot potato game – move the potato fast enough and you never get burned. We’ll need the ability to adjust supply voltages, operating frequencies, and sleep states of individual cores in matters of microseconds.

While the challenges are somewhat mind-boggling on both the hardware and software sides to develop and fully utilize these future Tera-Scale platforms, the benefits and opportunities from putting these computing capabilities into the hands of all users are equally incredible.

So how many cores could you use, and what would you use them for? ArsTechnica user dg65536 said it best in his post – “Now that I think about it...80 isn't nearly enough.”

Editorial standards