Chipmaker Intel has been investigating the issue of scaling the number of cores in chips through its Terascale Computing Research Program, which has so far yielded two experimental chips of 80 and 48 cores.
In November, Intel engineer Timothy Mattson caused a stir at the Supercomputing 2010 conference when he told the audience that one of the Terascale chips — the 48-core Single-chip Cloud Computer (SCC) — could theoretically scale to 1,000 cores.
Mattson, who is a principal engineer at Intel's Microprocessor Technology Laboratory, talked to ZDNet UK about the reasoning behind his views and why — while a 1,000-core chip isn't on Intel's roadmap — the path to creating such a processor is now visible.
Q: What would it take to build a 1,000-core processor?
A: The challenge this presents to those of us in parallel computing at Intel is, if our fabs [fabrication plants] could build a 1,000-core chip, do we have an architecture in hand that could scale that far? And if built, could that chip be effectively programmed?
The architecture used on the 48-core chip could indeed fit that bill. I say that because it has no cache-coherency overhead. Message-passing applications tend to scale at worst as the diameter of the network, which grows roughly as the square root of the number of nodes on the network. So I can say with confidence we could scale the architecture used on the SCC to 1,000 cores.
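That square-root relationship is easy to verify for a 2D mesh interconnect like the SCC's, which connects 24 tiles (two cores each) in a 6×4 grid. The sketch below, with an illustrative helper function, computes the worst-case hop count and shows how slowly it grows with node count:

```python
import math

def mesh_diameter(rows, cols):
    """Worst-case hop count (Manhattan distance) across a 2D mesh."""
    return (rows - 1) + (cols - 1)

# The SCC interconnect: a 6x4 mesh of 24 tiles, two cores per tile.
print(mesh_diameter(6, 4))  # 8 hops worst case

# For a roughly square mesh of n tiles, the diameter grows as
# ~2 * (sqrt(n) - 1): growing the mesh 20x in tile count only
# increases the worst-case hop count a few-fold.
for n in (24, 100, 500):
    side = math.isqrt(n)
    print(n, "tiles ->", mesh_diameter(side, side), "hops")
```

This sub-linear growth in communication latency is the reason a message-passing, non-cache-coherent design can keep scaling where a fully coherent one could not.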
But could we program it? Well, yes: as a cluster on a chip using a message-passing API. Is that message-passing approach something the broader market could accept? We have shared memory that is not cache-coherent between cores. Can we use that together with the message passing to make programming the chip acceptable to the general-purpose programmer?
If the answers to these questions are yes, then we have a path to 1,000 cores. But that assumption leads to a larger and much more difficult series of questions. Chief among these is whether we have usage models and a set of applications that would demand that many cores. We have groups working on answers to that question.
As I see it, my job is to understand how to scale out as far as our fabs will allow and to build a programming environment that will make these devices effective. I leave it to others in our applications research groups and our product groups to decide what number and combination of cores makes the most sense. In a sense, my job is to stay ahead of the curve.
Is there a threshold number of cores beyond which it is too difficult to program them to get maximum use? What is it — 100, 400?
There is no theoretical limit to the number of cores you can use. It's more complicated than that. It depends on, one, how much of the program can be parallelised and, two, how much overhead and load-imbalance your program incurs. We talk about this in terms of Amdahl's law.
This law says that we can break down a program into a part that speeds up with cores — the parallel fraction — and a part that doesn't — the serial fraction. If S is the serial fraction, you can easily prove with just a bit of algebra that the best speedup you can get, regardless of the number of cores, is 1/S. So the limit on how many cores I can use depends on the application and how much of it I can express in parallel.
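Amdahl's law can be checked numerically. A minimal sketch (the function name is illustrative) shows the speedup for N cores, 1/(S + (1−S)/N), approaching the 1/S ceiling:

```python
def amdahl_speedup(serial_fraction, cores):
    """Speedup predicted by Amdahl's law: 1 / (S + (1 - S) / N)."""
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / cores)

# With a 1% serial fraction (S = 0.01), the ceiling is 1/S = 100x,
# no matter how many cores you throw at the problem.
for n in (48, 1000, 1_000_000):
    print(n, "cores ->", round(amdahl_speedup(0.01, n), 1), "x speedup")
```

Note that even at 1,000 cores a 1% serial fraction already costs roughly 9% of the theoretical maximum, which is why driving S down matters so much at this scale.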
[Image: The 48-core SCC processor could theoretically scale to 1,000 cores, according to Mattson. Credit: Intel]
It turns out that getting S below one percent can be very hard. For algorithms with huge amounts of "embarrassingly parallel" operations, such as graphics, this can be straightforward. For more complex applications, it can be prohibitively difficult.
Would Intel ever want to scale up a 1,000-core processor?
That depends on finding applications that scale to 1,000 cores, usage modes that would demand them, and a market willing to buy them. We are looking very hard at a range of applications that may indeed require that many cores.
For example, if a computer takes input from natural language plus visual cues such as gestures, and presents results in a visual form synthesised from complex 3D models, we could easily consume 1,000 cores.
Speaking from a technical perspective, I can easily see us using 1,000 cores. The issue, however, is really one of product strategy and market demands. As I said earlier, in the research world where I work, my job is to stay ahead of the curve so our product groups can take the best products to the market, optimised for usage models demanded by consumers.
Would the process of fabricating 1,000 cores present problems in itself?
I came up with that 1,000 number by playing a Moore's Law doubling game. If the integration capacity doubles with each generation and a generation is nominally two years, then in four or five doublings from today's 48 cores, we're at 1,000. So this is really a question of how long we think our fabs can keep up with Moore's Law. If I've learned anything in my 17-plus years at Intel, it's never bet against our fabs.
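The doubling arithmetic above is easy to check directly:

```python
# Moore's Law doubling game: how many core-count doublings from
# the SCC's 48 cores until we pass 1,000?
cores = 48
doublings = 0
while cores < 1000:
    cores *= 2
    doublings += 1

# 48 -> 96 -> 192 -> 384 -> 768 -> 1536: five doublings clear 1,000,
# and four (48 * 2**4 = 768) come close, matching the "four or five"
# estimate. At ~2 years per generation, that is roughly a decade out.
print(doublings, "doublings ->", cores, "cores")
```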
Why is the 48-core processor not in Intel's product roadmap?
I need to be very clear about the role of the team creating this chip. Our job is to push the envelope and develop the answer to the question: "what is possible?". This is a full-time job. Our product roadmap takes our "what is possible?" output and figures out "what does the market demand?". That is also a full-time job.
Intel's product roadmap will reflect what our people in the product groups figure will be demanded by the market in the future. That may or may not look like the 48-core SCC processor. I have no idea.