Chipmaker Intel has been investigating the issue of scaling the number of cores in chips through its Terascale Computing Research Program, which has so far yielded two experimental chips of 80 and 48 cores.
In November, Intel engineer Timothy Mattson caused a stir at the Supercomputer 2010 Conference when he told the audience that one of the Terascale chips — the 48-core Single-chip Cloud Computer (SCC) — could theoretically scale to 1,000 cores.
Mattson, who is a principal engineer at Intel's Microprocessor Technology Laboratory, talked to ZDNet UK about the reasoning behind his views and why, although a 1,000-core chip isn't on Intel's roadmap, the path to creating such a processor is now visible.
Q: What would it take to build a 1,000-core processor?
A: The challenge this presents to those of us in parallel computing at Intel is, if our fabs [fabrication department] could build a 1,000-core chip, do we have an architecture in hand that could scale that far? And if built, could that chip be effectively programmed?
The architecture used on the 48-core chip could indeed fit that bill. I say that since we don't have cache coherency overhead. Message-passing applications tend to scale at worst as the diameter of the network, which runs roughly as the square root of the number of nodes on the network. So I can say with confidence we could scale the architecture used on the SCC to 1,000 cores.
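To make the square-root claim concrete, here is a minimal sketch. It assumes a plain 2D mesh with one node per grid point and counts hops only, ignoring link bandwidth, routing policy and contention. The helper names `mesh_diameter` and `near_square_mesh` are hypothetical and are not taken from Intel's SCC software, and the real SCC layout may differ from the grids chosen here.

```python
import math


def mesh_diameter(rows: int, cols: int) -> int:
    """Hop count between opposite corners of a rows x cols 2D mesh."""
    return (rows - 1) + (cols - 1)


def near_square_mesh(nodes: int) -> tuple[int, int]:
    """Most nearly square (rows, cols) grid with rows * cols == nodes."""
    rows = math.isqrt(nodes)
    while nodes % rows != 0:
        rows -= 1
    return rows, nodes // rows


# Compare the exact mesh diameter with the 2 * sqrt(N) approximation.
for nodes in (48, 256, 1024):
    rows, cols = near_square_mesh(nodes)
    diameter = mesh_diameter(rows, cols)
    estimate = 2 * math.sqrt(nodes)
    print(f"{nodes:5d} nodes as {rows}x{cols}: "
          f"diameter {diameter} hops, 2*sqrt(N) = {estimate:.1f}")
```

Under these assumptions, going from 48 to 1,024 nodes multiplies the node count by roughly 21 but the worst-case hop count only by about five (12 hops versus 62), which is the kind of growth the answer describes.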
But could we program it? Well, yes: as a cluster on a chip using a message-passing API. Is that message-passing approach something the broader market could accept? We have shared memory that is not cache-coherent between cores. Can we use that together with the message passing to make programming the chip acceptable to the general-purpose programmer?
If the answers to these questions are yes, then we have a path to 1,000 cores. But that assumption leads to a larger and much more difficult series of questions. Chief among these is whether we have usage models and a set of applications that would demand that many cores. We have groups working on answers to that question.
As I see it, my job is to understand how to scale out as far as our fabs will allow and to build a programming environment that will make these devices effective. I leave it to others in our applications research groups and our product groups to decide what number and combination of cores makes the most sense. In a sense, my job is to stay ahead of the curve.
Is there a kind of threshold of cores beyond which it is too difficult to program to get maximum use of them? What is it — 100, 400?
There is no theoretical limit to the number of cores you can use. It's more complicated than that. It depends on, one, how much of the program can be parallelised and, two, how much overhead and load-imbalance your program incurs. We talk about this in terms of Amdahl's law.
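For readers who want numbers, the following is a small sketch of Amdahl's law in its usual textbook form, speedup = 1 / (S + (1 - S) / P), where S is the fraction of the run time that stays serial and P is the number of cores. This is the standard statement of the law rather than Mattson's own derivation, the function name `amdahl_speedup` is hypothetical, and the formula deliberately ignores the overhead and load-imbalance he mentions.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores for a given serial fraction.

    Textbook Amdahl's law: the serial part runs at the same speed,
    the parallel part is divided evenly across the cores.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# With 1% of the work serial, extra cores help less and less.
for cores in (48, 100, 400, 1000):
    print(f"{cores:5d} cores: speedup {amdahl_speedup(0.01, cores):.1f}x")
```

In this idealised model the speedup can never exceed 1 / S, so a program that is 1 percent serial tops out below 100x no matter how many cores are added.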
This law says that we can break down a program into a part that speeds up with cores (the parallel fraction) and a part that doesn't (the serial fraction). If S is the serial fraction, you can easily prove with just a bit of algebra that...