Larrabee, we hardly knew ye

Summary: Intel has announced that Larrabee, the multi-core chip positioned as an AMD and NVIDIA GPU killer, has been canceled. While Larrabee research lives on and we might see it re-emerge in some form in the future, it won't be any time soon.


Larrabee was appealing to developers because it used a standard x86 instruction set plus some additional vector processing units and a conventional memory architecture. In theory, this should be easier to program than CUDA or Cell because of its familiarity to programmers, generous memory sizes, and fast pathways to get data in and out of the chip.

Another project at Intel, called the Single-chip Cloud Computer (SCC), is similar to Larrabee in that it uses an array of simple x86 processors. However, SCC uses a mesh router network instead of Larrabee's ring, and appears to be targeted at running cluster applications (currently coded with MPI and run on commodity Linux servers) rather than SIMD applications (currently coded with threads or chip-specific APIs like CUDA).

Intel would like developers to use architecture-neutral languages like Ct in order to exploit the vectorization capabilities in current Intel chips and get ready for many-core hardware in the future. Ct is not yet in public beta, though there was a similar product available from RapidMind before Intel bought them and folded them into Ct. Ct takes an algorithm written in C++ templates, translates it to an intermediate format, and compiles that format at run time to best use whatever hardware you have. It works best, of course, on Intel hardware.

Another option for developers is OpenCL, a GPGPU framework from the Khronos Group, the same consortium that manages OpenGL. Compatible drivers are only just beginning to appear for ATI and NVIDIA hardware. It will run code on the GPU, the CPU, or a mixture of CPU and GPU, depending on your hardware.

Of all the options currently available for multi-core programming, the one that appeals to me the most right now is Grand Central Dispatch. It uses nice, clean extensions to the C and Objective-C languages plus some operating system support for system-wide lightweight thread scheduling. GCD ships with Mac OS X and has been ported to FreeBSD. Until it is available on Linux and Windows, however, it doesn't have any hope of becoming mainstream.


Ed Burnette

About Ed Burnette

Ed Burnette is a software industry veteran with more than 25 years of experience as a programmer, author, and speaker. He has written numerous technical articles and books, most recently "Hello, Android: Introducing Google's Mobile Development Platform" from the Pragmatic Programmers.

  • For anybody thinking this was "The Next Big Thing"(TM)

    And never thought it would be dead in the water, all I have to say is:

    "told you so!"


    Long live AMD!
    • If you.....

      "never thought it would be dead in the water", are you sure you really said that? ;-)
      • I did

        I always thought that this would be a big disaster. Intel doesn't get graphics - their existing integrated GPU's should be proof of that. For them to offer some kind of x86-compatible GPU is just silly, since x86 just doesn't have the FPU and streaming process optimizations necessary to make it a half-decent platform for graphics, and both AMD and NVIDIA are ROFL'ing at them, and for good reason.
        • Yes, but

          since Intel will probably not get permission to acquire NVIDIA (if it were for sale), what are Intel's options going forward, with the blurring of the CPU/GPU line and the integration of graphics functions on the CPU die? Will they not have to do something?
          • The only option for Intel

            is to rethink their strategy.

            They need to get back into talks with NVIDIA to get them back to manufacturing core logic chipsets with integrated GPU's for Intel's processors. That's the only hope for them.

            For now, AMD is in a MUCH better position to take advantage of the market with their cash infusion from NVIDIA's patent infringement settlement, if not for Intel's delays caused by waffling with NVIDIA.

            Intel's dispute with NVIDIA is causing both of them to slide. AMD will come out ahead in this game. They already have a better product in the mainstream and budget segments. The Core 2 chips with Intel GMA or the grossly under-available NVIDIA mainstream GeForce chipsets that are still floating about can't cut it against the value proposition of AMD using their own superior core logic chipsets with ATI GPU technology.

            ATI only needs to work on power efficiency. They already have DX11 cards for the enthusiast market. Bringing CPU's down to the power efficiency level of Core 2 CPU's is just another fab step away for them, and they aren't missing much by not having products with performance comparable to the Core iX line, since a full family of Core iX's isn't due until 2H10, and the enthusiast market (which the i5's and i7's occupy) is starting to dry up.
  • Care to elaborate....

    on the relevance of Grand Central Dispatch to current and near future GPU's from AMD/Nvidia? Will they be of any value in boosting performance?
    • GCD won't help

      GCD is just a new thread scheduler for the Mac, because Mach has a pretty weak auto-scheduler compared to Windows (which balances single-threaded applications very well between multiple cores).

      GCD isn't designed to work with GPU's. What it does is give the developer new API's on the Mac to schedule threads through different CPU cores. Nothing special actually.

      OpenCL is the library set that contains the GPU functions. So far, there is no connection between GCD and OpenCL. OpenCL also doesn't contain a lot of options for controlling data streams through the GPU. Sure, you can stream a lot of instructions in sequence very fast, but you can't easily break up the streams into logical threads like you can with CPU instructions. GPU controllers and drivers handle most of that work automatically (like Windows does with single-threaded apps). It's apples and oranges.

      FYI: When Apple announced "Grand Central" (original name), they had said it was going to be a unifying API that supports GPU's and CPU's in mixed varieties, allowing for "nearly unlimited processing bandwidth" in a computer system. They have since changed their tune. Ever since the name change, they have removed all mention of GPU support in GCD.
      • Thx * 2 (nt)

        • No problem!

          Some of the happy happy joy joy stuff that Apple is talking about in Snow Leopard comes off as new, miraculous technologies. The problem is, it's already been started. Intel is working with developers to maximize multi-threaded applications, while NVIDIA wants devs using CUDA directly, and exclusively. ATI, meanwhile, has their own API set called Stream, which doesn't have as much attraction, but that will mean less and less over time.

          The big benefit from OpenCL is that it will unify API coding under a single umbrella that's cross-compatible with different GPU native API's. If you know your history, take a look back at 3dfx. They developed their own hardware-dependent Glide API. When developers started off, they coded directly for Glide (often in DOS). When Direct3D started to mature, they focused on that, because they could code once and run anywhere (read: on any GPU hardware). The 3dfx Windows drivers just translated Direct3D calls into Glide calls, because the cards understood Glide a whole lot easier than Direct3D at the time. NVIDIA also optimized their early Riva and GeForce 1 and 2 cards towards OpenGL because they thought the cross-platform stuff would supersede DirectX. ATI was smarter. Although their original Radeon cards were pretty pathetic, they heavily optimized the GPU for DirectX calls, and it paid off. Now they have the fastest GPU's on the market, and keep NVIDIA on their toes.

          Anyway, OpenCL is like OpenGL. It's a cross-platform, hardware-independent API. Except that Microsoft has already talked up DirectX 11 Compute Shader, which basically does the same thing, but works better with Direct3D-compliant hardware (and Visual Studio makes it easier to code for). So aside from Macs and Linux, and maybe a few engineering apps on Windows that might use it, OpenCL will be as small a player as OpenGL.

          Now, remember I said about ATI having DX11 GPU's already? Look at it this way: they already have the best (and only) DX11 Compute Shader platform presently shipping, and they're using ATI Stream as the native API, meaning that developers can start making those hardware-independent GPGPU apps on Windows NOW. It's gotta feel pretty bad to be NVIDIA (or a CUDA-targeting developer) right now.

          How's that for a market position?
      • @ Joe_Raby

        Can you actually provide a benchmark of the two schedulers? How do you know it's weaker?

        As far as I understand Windows does not balance single threaded applications between cores at all. Direct your attention to the 4 benchmarks on the following

        The top 2 are single-threaded, performance is dependent on clock speed. The bottom 2 multi-threaded and performance is dependent on number of cores.

        I would also like to point out that Windows games are the only GPU accelerated apps that use DirectX every thing else uses OpenGL. OpenCL is a big deal.
        • Your understanding is wrong, actually

          "As far as I understand Windows does not balance single threaded applications between cores at all."

          Windows will balance single-threaded applications between cores automatically. Threads are devoted to the core that has the most idle time. Mach doesn't do that unless the application is coded specifically for it; otherwise it uses the first core. You can easily test this with a few single-threaded applications, and watch your CPU graphs. Microsoft's documentation is much more thorough, though.

          Also, your link is broken.

          "I would also like to point out that Windows games are the only GPU accelerated apps that use DirectX every thing else uses OpenGL."

          Wrong again. All of Autodesk's current versions of AutoCAD support Direct3D, and on NVIDIA hardware, the recommended option for most applications is to use Direct3D because it offers more options than OpenGL. In fact, modelling in AutoCAD 2010 requires a Direct3D workstation-capable graphics card, because OpenGL isn't even supported.

          Read the top 2 lines (second one is more important) here:

          • not very convincing

            I know mach doesn't split threads, thats why they made GCD. You still offer no evidence that the NT kernel can and I'm pretty sure it can't.

            You're right, I shouldn't have said "only". I left out AutoCAD because I thought Microsoft owned Autodesk. I may be wrong about that too, but they're very tightly coupled and have been for some time. Most of their competitors however use OpenGL.

            I know DirectX/3D is always expanding but I think it has more to do with Microsoft's deep pockets than merit. Unless you can at least provide a good reference link to convince me otherwise.

            Sorry about the broken link; copy/paste it and it'll work.
          • Not what I said

            "I know mach doesn't split threads, thats why they made GCD. You still offer no evidence that the NT kernel can and I'm pretty sure it can't."

            What I said is that NT *balances* threads. If you launch a single-threaded application, the NT kernel uses the core with the most idle time. Mach doesn't do that automatically, and it's stupid that it doesn't.

            GCD fixes that. It was a fix because Apple didn't have a proper thread scheduler for managing multiple threads (whether multiple single-threaded applications, or a single multi-threaded application). Microsoft doesn't need a new threading API because the functionality is already there.

            "Most of their competitors however use OpenGL."

            And how often do you hear about CAD competitors to Autodesk?
          • Ok the link doesn't work

            Here it is.


            Here's another one I stumbled upon looking for the previous link. The last benchmark is the only multi-threaded one.


            In these two the pattern becomes clear. Changes in architecture are the only things that cause deviation from the cores vs. clock theory.

            single threaded:


            multi threaded:


            Even if you're right, Windows doesn't do a very good job.
          • You don't get it

            Those programs don't split the workload up into equal sized threads. What happens in most multi-threaded video applications is that each thread is a different process altogether. One for video decoding of the source, one for video re-encoding, one for audio decoding, and one for audio re-encoding, sometimes not even that many processes (many video encoding apps don't do audio at the same time as video). Each one of those tasks will be using a different amount of processing power. Windows balances those threads by itself, whereas Mach doesn't unless the application is specifically coded to micro-manage threads with custom code. What Apple is doing, is finally bringing Windows coding simplicities to OS X using GCD API's.

            As of right now, there are no feasible ways of splitting up threads into "micro-threads" to have them be further balanced out to multiple cores. If you can do that with existing code, then it isn't very good code to begin with. Micro-dividing code bits into even smaller threads means additional overhead for managing threads.

            It's like this: If you have a video file that has a high bitrate, it means it's not that well compressed. That's great and all because you have high quality, but the stored bits are also hard to stream to a processor and output unless you have a fat pipe going to the processor; otherwise you end up with stuttered output. In comparison, if you have a highly-compressed video using H.264 compression and a decent enough bitrate so as not to lose any visual detail, the storage bits are smaller. Again, that's fine and all, but you need to have a fast processor to decode the video, otherwise you end up with stuttered output. Micro-managing threads is like that: when you categorize and segment everything down into very small details, what you end up with is more and more groups that are more difficult to manage. Video encoding applications usually don't segment each logical task down into further segments because today's processors don't have enough cores, and thread management technologies aren't at that level yet anyway.

            What I'm ultimately saying is that your example is bad because you don't seem to understand how programs are divided down into individual threads. It's very rarely 50:50 (for dual-core) or 25:25:25:25 (for quad-core).
          • Changing Your story Now?

            You said: "Mach has a pretty weak auto-scheduler compared to Windows (which balances single-threaded applications very well between multiple cores)."
            Which is why I took issue with it.

            FreeBSD's Mach (what OS X uses) has had multiprocessor support since the late '90s. You're just wrong.

            Regarding the last paragraph: don't argue with me, argue with Tom or Anand. Their benchmarks have shown this consistently. But I bet you never even looked.
          • What do the benchmarks show exactly?

            You list benchmarks that compare multithreading to singlethreading of video encoders, all on Windows. There is no comparison to any other OS, nor do they show balanced thread splits.

            Audio processing takes a lot less processing bandwidth than video. Video isn't broken up into multiple threads except in some programs that are using effect filters in some cases, which these weren't. These were simple transcoding jobs. The audio is being handled in a separate data stream in a different thread. It isn't logical to break up video streams into individual blocks unless you can designate frames into multiple data streams, and video cards are better at that (most do that semi-automatically anyway) than any CPU, so most software vendors won't bother unless it is a GPGPU app.
      • @ Joe_Raby

        Can you provide any benchmark between the 2 schedulers? As far as I know Windows does not balance single threaded applications among cores at all. Can you substantiate that claim?

        Direct your browser to the 4 benchmarks linked:

        The top two are single threaded and performance dependent on clock speed. The bottom two are multi threaded and performance dependent on number of cores.

        I would like to point out that Windows games are, for the most part, the only applications using DirectX; everything else uses OpenGL.

        You seem knowledgeable, but possibly just enough to spread some disinformation.
      • More than a thread scheduler

        GCD is not just a new thread scheduler for the Mac.

        It's more than a thread scheduler because it includes C language extensions (similar to Cilk) that make multi-core programming more approachable for the developer. The language extensions are supported by the Clang compiler, which is part of LLVM. However, I have hope they can be standardized.

        It's not just for the Mac because it's open source (Apache license) and can be ported to any system.

        And while the current GCD implementations don't support GPUs, the potential is there for either GPU, or more generally hybrid heterogeneous (big core/little core) computing in the future.
        Ed Burnette
        • Apple changed their tune when they changed the name

          Originally they had said it was going to be for GPU's, but when they changed the name from "Grand Central" to "Grand Central Dispatch" on the Snow Leopard pages, they also removed any reference to GPU support that was previously listed (the terminology was that it could run on any processor core in the system, from CPU to GPU). Why? Were they trying to bite off more than they could chew at that time?