IDF 2011: Intel makes the case for more cores

Summary: On the final day of the Intel Developer Forum, CTO Justin Rattner made the case for more powerful PCs and servers with tens or even hundreds of processing cores.

On the final day of the Intel Developer Forum this week, CTO Justin Rattner made the case for more powerful computers, or more specifically for many more cores. During his keynote, Intel demonstrated a range of interesting applications, both consumer and business, that harness the power of multi-core and many-core PCs and servers. And he sought to debunk the widespread belief that you need to be "some kind of freak to actually program these things."

(You can view Webcasts of the IDF keynotes here.)

Five years ago at IDF, Rattner introduced the Core microarchitecture and the shift to using more cores running at lower speeds, rather than one or two very fast cores, to improve performance and power efficiency. At the time, he said, no one could have imagined that a few years later we'd be talking about processors with tens or even hundreds of cores. This includes not only CPU cores (what Intel refers to as IA cores) but also specialized cores such as graphics processing units (GPUs) and accelerators, a concept known as heterogeneous computing.

Intel's main product in this emerging segment is the Knights family of processors based on the company's Many Integrated Cores (MIC) architecture. Some customers are already testing Knights Ferry, a development chip, and reporting that they can port existing multi-core applications to the MIC architecture and realize good speed-ups, according to Rattner. Based on these results, Intel will "soon" launch Knights Corner, a processor with more than 50 cores manufactured using the company's most advanced 22nm process. As part of a separate Tera-scale Computing Research Program, Intel Labs recently announced a prototype Single-Chip Cloud Computer (SCC) with 48 cores designed for scale-out cloud applications. Finally, Intel has been busy creating better tools to address the challenge of programming these many-core processors, he said.

Rattner showed results of a series of application tests on systems ranging from one to 64 cores. The tests were not limited to traditional High-Performance Computing (HPC) applications-which often lend themselves to systems with many cores and threads-but also included business and consumer applications such as home video editing. The results looked very good (though the devil is always in the details with benchmarks), in some cases showing speed-ups that were close to linear. In other words, doubling the cores nearly doubles performance. "This has given us a lot of confidence that people are going to be able to put this architecture to work," Rattner said.
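The near-linear results Rattner showed have a classical limit known as Amdahl's law: speedup is capped by whatever fraction of a program must run serially. A minimal sketch in Python (the 0.99 parallel fraction is an illustrative assumption, not a figure from the talk):

```python
# Amdahl's law: maximum speedup on n cores when a fraction p of a
# program's work can run in parallel.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# An almost fully parallel workload (p = 0.99, illustrative) scales
# nearly linearly across the core counts in Rattner's tests.
for n in (2, 4, 64):
    print(n, round(amdahl_speedup(0.99, n), 1))
```

The flip side is that a program that is only 50% parallelizable tops out at a 2x speedup no matter how many cores you add, which is why Intel's tooling work matters as much as the silicon.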

Andrzej Nowak from the CERN openlab, a collaboration with companies such as Intel to develop computing technology for the Large Hadron Collider, talked about the group's many-core efforts. The massive collider generates 40 million particle collisions per second, producing 15 to 25 petabytes of data per year (a petabyte is equal to 1,000 terabytes). To analyze all of this data, the openlab uses software that consists of millions of lines of code running on 250,000 Intel cores distributed across hundreds of data centers. Nowak said the fact that the same programming tools used for Xeon server processors also work on the MIC architecture makes it easier to port this software. Because the workload is "heavily-vectorized and highly-threaded," it scales almost linearly with the number of cores and threads. "We will take any amount of cores you can throw at us," Nowak said.

To prove that many-core can work on both servers and clients, Rattner highlighted a series of real-world applications. Noting that many Web applications were really collections of databases accessed by many users concurrently, Rattner said traditional servers were not designed for these sorts of workloads. He demonstrated how a different type of server, with a 48-core processor and an in-memory database, could address this problem by handling about 800,000 transactions per second. Similarly, on the client side, Brendan Eich, the CTO of Mozilla and inventor of JavaScript, said that when he created the scripting language "in 10 days in May 1995" it was not designed for parallel applications. Intel Labs announced Parallel Extensions for JavaScript, code-named River Trail, which leverages multi-core and many-core processors to speed up JavaScript applications. In the demo, a 3D N-body simulation in Firefox ran at 3 frames per second on a single core and at 45 frames per second using all of the cores. Intel said these extensions will enable a new class of browser-based apps in areas such as photo and video editing, physics simulation, and 3D gaming.
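An N-body simulation parallelizes well because each body's acceleration depends only on the previous positions of all the others, so the per-body work is an independent data-parallel "map" — the same pattern River Trail exposed to JavaScript. A rough sketch of that decomposition in Python (the two-dimensional layout, softening constant, and worker count are illustrative assumptions, not details of Intel's demo):

```python
from concurrent.futures import ThreadPoolExecutor

def nbody_accel(bodies, i):
    # Net inverse-square attraction on body i from all others
    # (unit masses; the 1e-9 softening term avoids divide-by-zero).
    xi, yi = bodies[i]
    ax = ay = 0.0
    for j, (xj, yj) in enumerate(bodies):
        if j == i:
            continue
        dx, dy = xj - xi, yj - yi
        r2 = dx * dx + dy * dy + 1e-9
        inv_r3 = r2 ** -1.5
        ax += dx * inv_r3
        ay += dy * inv_r3
    return ax, ay

def accelerations(bodies, workers=4):
    # Each body's acceleration is independent of the others' results,
    # so the loop maps cleanly onto a pool of parallel workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: nbody_accel(bodies, i),
                             range(len(bodies))))
```

In CPython, threads won't actually speed up this CPU-bound loop; the point is the structure — no worker writes state another worker reads, which is exactly what lets a runtime like River Trail fan the map out across cores.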

One of the more intriguing demos was an LTE wireless base station, developed as part of a project with China Mobile, which uses standard PC parts including a second-generation Core i7 processor. Rattner said Intel will be doing field trials with China Mobile and other partners next year, adding that it will try a similar approach with routers and switches. Communications and networking gear generally uses programmable logic devices or specialized ASICs, but Intel believes that it can match their performance with off-the-shelf multi-core CPUs. In the final demo, Intel showed how a PC can use facial recognition to decrypt and display only the correct images from a photo album on the fly. This demonstration used both the IA cores and the on-die graphics in Sandy Bridge.

"I hope at this point there is no question in your mind that the time is now-- if you haven't already started--to build multi-core or many-core applications and you don't need to be a ninja programmer to do it," Rattner said.

If this isn't ambitious enough, Intel has an even bigger goal in mind: an exascale computer by 2018. An exaflop is one quintillion (10^18) floating-point operations per second. To put that in perspective, Nvidia's Tesla C2070 GPU is capable of 515 gigaflops (billions of floating-point operations per second). The world's fastest supercomputer, the K computer at the RIKEN Advanced Institute for Computational Science in Kobe, Japan, is capable of 8 petaflops, or 8 quadrillion (10^15) floating-point operations per second.

The real challenge here, though, is power. Today's petascale supercomputers already use seven to 10 megawatts, so simply scaling them up isn't an option: an exascale computer built that way would require several nuclear power stations to supply its six gigawatts of power. The practical limit for a data center is around 20 megawatts, which means we will need a 300x reduction in total system power to build an exascale computer. Intel's Shekhar Borkar is leading the company's effort to develop a prototype system by 2018 as part of the DARPA-funded Ubiquitous High Performance Computing project. Three other organizations, Nvidia, MIT, and Sandia National Laboratories, are also developing prototype "ExtremeScale" supercomputers.
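The 300x figure follows from back-of-the-envelope arithmetic: multiply a petascale machine's power draw by 1,000 (for 1,000x the flops at the same energy per operation) and divide by the 20-megawatt ceiling. A quick check (the ~6 MW petaflop baseline is an assumption chosen to match the six-gigawatt figure above):

```python
# Naive exascale power: 1,000x the flops of a petascale machine at the
# same joules per flop (the ~6 MW baseline is an illustrative assumption).
petascale_watts = 6e6
naive_exascale_watts = petascale_watts * 1000  # -> 6 GW, several power stations
datacenter_budget_watts = 20e6                 # practical data-center ceiling
print(naive_exascale_watts / datacenter_budget_watts)  # -> 300.0
```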

One way to reduce system power is to make the CPU more efficient. To illustrate this, Rattner demonstrated an experimental Pentium-class chip, code-named Claremont, which is capable of operating close to the threshold voltage of its transistors (the minimum voltage at which a transistor switches on and off). CEO Paul Otellini had already given a quick preview of this chip running Windows earlier this week, but Rattner showed it running Linux and offered more details. Because Claremont operates within a couple hundred millivolts of the threshold voltage, it sips power and can run entirely from a solar cell about the size of a postage stamp. Intel got a 5x reduction in power using the older Pentium core, but it could achieve an 8x reduction using a newer core, Borkar said. Intel also showed Claremont's "wide dynamic range," meaning its ability to boost the frequency up to ten times to handle more intensive tasks, by running a Quake demo.
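Why does running near threshold save so much? Dynamic switching energy in CMOS scales roughly with the square of the supply voltage, so dropping the voltage pays off quadratically. A rough sketch under that rule of thumb (the ~1.0 V nominal and ~0.45 V near-threshold values are illustrative assumptions, not Claremont's actual specifications):

```python
def switching_energy(c, v):
    # Dynamic CMOS switching energy scales roughly as C * V^2
    # (normalized units; a rule of thumb, not a device model).
    return c * v * v

# Illustrative voltages: ~1.0 V nominal vs ~0.45 V near-threshold.
reduction = switching_energy(1.0, 1.0) / switching_energy(1.0, 0.45)
print(round(reduction, 1))  # -> 4.9
```

A roughly 5x drop in energy per operation under these assumed voltages is in the same ballpark as the 5x power reduction Borkar cites for the Pentium-class core.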

Rattner also talked about the Hybrid Memory Cube, a concept developed by Micron that consists of a stack of DRAM chips in a compact cube with an efficient, high-performance controller and interface. Intel said the HMC is capable of nearly 1 Tbps of throughput yet uses about one-seventh the power of today's DDR3 DRAM. Stacked memory is difficult to manufacture, and therefore still relatively expensive, but the HMC seems like a promising concept for networking equipment and servers.

"We're at a significant point in time where technology is no longer the limiting factor," Rattner concluded.

Topics: CXO, Hardware, Intel, Processors


Talkback

18 comments
  • Intel is late; IBM and Sony already promised that with the Cell architecture,

    ... and AMD has already been producing chips with like 48 cores for years.
    DDERSSS
    • I don't think so. Reference?

      @DeRSSS They have a 12 core I think. Actually Intel had a prototype with > 80 cores.
      DevGuy_z
  • RE: IDF 2011: Intel makes the case for more cores

    Now if only programs could leverage more cores, but I suppose when we get them on our desktops they will begin to be used/useful with time... chicken-and-egg thing :)
    rogerdpack2
    • I utilize multi-cores.

      @rogerdpack2 You would be surprised at how many applications leverage MC. Visual Studio will compile projects in parallel, for example. I had an 800K+ line codebase go from 20 minutes to 3 minutes on an i7. Parallel compilation was the reason: the single-core machine actually had a higher clock speed and yet was almost 7x slower, even though the disks had comparable speed.
      DevGuy_z
    • RE: IDF 2011: Intel makes the case for more cores

      @rogerdpack2 Take a look at what's happening on the GPU end of things :). GPUs already put today's CPUs to shame with massive parallelism.
      CobraA1
  • Great article

    I only am skeptical about the "soon" part of the equation. I've been using multicores for years, 50 would be just swell.
    lass23
  • RE: IDF 2011: Intel makes the case for more cores

    The problem for more cores is that so many applications will never be able to use multiple cores. Years ago Dave Kuck, who is one of the experts on parallelism, pointed out that if your program was 50% parallelizable, then even with infinite parallel hardware you could only double the speed. The kicker was that he pointed out that for most applications even 50% parallelizable was not achievable!
    oldsysprog
    • Many applications utilize multi-core.

      @oldsysprog But that's not all, you aren't limited to the behavior of a single application. My development tool compiles code in parallel. And I may have multiple things going on simultaneously. Compiling, doing a simultaneous checkout. When I look at the performance graphs I see all cores involved.
      DevGuy_z
      • Seriously, You have no clue

        @DevGuy_z Do you really have no clue of the difference between what is parallel compilation and what is parallel architecture?

        One means that the compiler will COMPILE multiple files in parallel processes, the other means that your application was DESIGNED and IMPLEMENTED to run multiple processes in multiple cores at the same time.

        There is a HUGE difference between a compiler using parallel compilation and an application supporting parallel processing. If the application wasn't WRITTEN to take advantage of multi-cores, the compiler WILL NOT magically add parallel processing to it.
        wackoae
  • RE: IDF 2011: Intel makes the case for more cores

    ARM computing for servers. Small, energy efficient, doubling processing power every year. You could fit thousands of next gen ARM processors in these servers and they could probably take less power than your Gaming PC.
    Bakabaka
    • ARM for servers?? I guess you have no idea of what a server does

      @Bakabaka ARM CPUs are relegated to simple devices for a reason, not just because they are less power intensive than other platforms.

      If they are not powerful enough to be in a simple, cheap desktop, what makes you believe that they are powerful enough to run on a server?
      wackoae
  • NONSENSE!

    If multiple small cores is the way to go ... then why have both Apple and M$ abandoned Flash in their tablets ...
    ... because multiple small cores aren't powerful enough for consumer computing.

    The architecture and/or programming just isn't there yet.

    This post is little better than an INTEL ad.
    jacksonjohn
    • RE: IDF 2011: Intel makes the case for more cores

      @johnfenjackson@... I don't know if you know this, but Flash (actually, the ActionScript language used by Flash) doesn't support the multithreading needed for multiple cores, but other languages do. Maybe they're abandoning Flash because it can't use multiple cores.
      CobraA1
    • RE: IDF 2011: Intel makes the case for more cores

      @johnfenjackson@...

      My response is that hardware has always led software, and it takes time for software to catch up. This will be no different.
      NoAxToGrind
  • Why is this article news?

    Intel IS late to the game here. Consider their white paper that was meant to challenge the concept that GPU compute is 100X faster (http://www.cs.utexas.edu/users/ckkim/papers/isca10_ckkim.pdf). They claim that GPGPU computing is only about 14x as fast and 2.5x on average. NVIDIA responded to the white paper (http://www.cs.utexas.edu/users/ckkim/papers/isca10_ckkim.pdf) with proofs that in certain applications GPUs are up to 300X faster.

    "...more cores running at lower speeds..." This sounds exactly like a GPU.
    mkpelletier@...
  • RE: IDF 2011: Intel makes the case for more cores

    Amazing stuff,
    lawrenceweeks
  • Swap sheer speed for thousands of cores

    Skin is a super slow, massively parallel computer.

    Much easier to ensure processes have enough power.

    Turn off cores completely when not required - saves more power.
    Patanjali