Larrabee, we hardly knew ye
Summary: Intel has announced that Larrabee, the multi-core AMD and NVidia GPU killer, has been canceled. While Larrabee research lives on and we might see it re-emerge in some form in the future, it won't be any time soon.
Larrabee was appealing to developers because it used a standard x86 instruction set plus some additional vector processing units and a conventional memory architecture. In theory, this should be easier to program than CUDA or Cell because of its familiarity to programmers, generous memory sizes, and fast pathways to get data in and out of the chip.
Another project at Intel, called the Single-chip Cloud Computer (SCC), is similar to Larrabee in that it uses an array of simple x86 processors. However, SCC uses a mesh router network instead of Larrabee's ring, and appears to be targeted towards running cluster applications (currently coded with MPI and run on commodity Linux servers) instead of SIMD applications (currently coded with threads or chip-specific APIs like CUDA).
Intel would like developers to use architecture-neutral languages like Ct in order to exploit the vectorization capabilities in current Intel chips and get ready for many-core hardware in the future. Ct is not yet in public beta, though there was a similar product available from RapidMind before Intel bought them and folded them into Ct. Ct takes an algorithm written in C++ templates, translates it to an intermediate format, and compiles that format at run time to best use whatever hardware you have. It works best, of course, on Intel hardware.
Another option for developers is OpenCL, an open GPGPU standard from the Khronos Group, the same body that maintains OpenGL. Compatible drivers are only just beginning to appear for ATI and NVidia hardware. It will run code on the GPU, the CPU, or a mixture of CPU and GPU, depending on your hardware.
Of all the options currently available for multi-core programming, the one that appeals to me the most right now is Grand Central Dispatch. It uses nice, clean extensions to the C and Objective-C languages plus some operating system support for system-wide lightweight thread scheduling. GCD ships with Mac OS X and has been ported to FreeBSD. Until it is available on Linux and Windows, however, it doesn't have any hope of becoming mainstream.
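The queue-based model that GCD uses can be sketched without GCD itself. What follows is only an analogy in Python, not GCD's API: `concurrent.futures.ThreadPoolExecutor` stands in for a system-managed dispatch queue, and the names `dispatch_all` and `square` are made up for illustration. The point is the shape of the programming model: the developer hands small units of work to a queue and lets a shared pool decide which thread runs them.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    # Hypothetical unit of work, standing in for a GCD block.
    return n * n


def dispatch_all(work, items, max_workers=4):
    """Submit one task per item to a shared pool and collect results in order.

    Analogy only: the pool plays the role of a dispatch queue, and the
    caller never creates or manages threads directly.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order even if tasks finish out of order.
        return list(pool.map(work, items))


if __name__ == "__main__":
    print(dispatch_all(square, range(8)))
```

Note that standard CPython threads will not speed up CPU-bound work like this because of the global interpreter lock, so the sketch shows how the code is organized rather than any performance gain.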
Talkback
For anybody thinking this was "The Next Big Thing"(TM)
"told you so!"
:p
Long live AMD!
If you.....
I did
Yes, but
The only option for Intel
They need to get back into talks with NVIDIA to get them back to manufacturing core logic chipsets with integrated GPU's for Intel's processors. That's the only hope for them.
For now, AMD is in a MUCH better position to take advantage of the market with their cash infusion from NVIDIA's patent infringement settlement, if not for Intel's delays caused by waffling with NVIDIA.
Intel's dispute with NVIDIA is causing both of them to slide. AMD will come out ahead in this game. They already have a better product in the mainstream and budget segments. The Core 2 chips with Intel GMA or the grossly under-available NVIDIA mainstream GeForce chipsets that are still floating about can't cut it against the value proposition of AMD using their own superior core logic chipsets with ATI GPU technology.
ATI only needs to work on power efficiency. They already have DX11 cards for the enthusiast market. Bringing CPUs down to the power efficiency level of Core 2 CPUs is just another fab step away for them, and they aren't missing much by not having products with comparable performance to the Core iX line, since a full family of Core iX chips isn't due until 2H10, and the enthusiast market (which the i5s and i7s occupy) is starting to dry up.
Care to elaborate....
GCD won't help
GCD isn't designed to work with GPU's. What it does is give the developer new API's on the Mac to schedule threads through different CPU cores. Nothing special actually.
OpenCL is the library set that contains the GPU functions. So far, there is no connection between GCD and OpenCL. OpenCL also doesn't contain a lot of options for controlling data streams through the GPU. Sure, you can stream a lot of instructions in sequence very fast, but you can't easily break up the streams into logical threads like you can with CPU instructions. GPU controllers and drivers handle most of that work automatically (like Windows does with single-threaded apps). It's apples and oranges.
FYI: When Apple announced "Grand Central" (original name), they had said it was going to be a unifying API that supports GPU's and CPU's in mixed varieties, allowing for "nearly unlimited processing bandwidth" in a computer system. They have since changed their tune. Ever since the name change, they have removed all mention of GPU support in GCD.
Thx * 2 (nt)
No problem!
The big benefit from OpenCL is that it will unify API coding under a single umbrella that's cross-compatible with different GPU native APIs. If you know your history, take a look back at 3dfx. They developed their own hardware-dependent Glide API. When developers started off, they coded directly for Glide (often in DOS). When Direct3D started to mature, they focussed on that, because they could code once, and run anywhere (read: on any GPU hardware). The 3dfx Windows drivers just translated Direct3D calls into Glide calls, because the cards understood Glide a whole lot more easily than Direct3D at the time. NVIDIA also optimized their early Riva and GeForce 1 and 2 cards towards OpenGL because they thought the cross-platform stuff would supersede DirectX. ATI was smarter. Although their original Radeon cards were pretty pathetic, they heavily optimized the GPU for DirectX calls, and it paid off. Now they have the fastest GPUs on the market, and keep NVIDIA on their toes.
Anyway, OpenCL is like OpenGL. It's a cross-platform, hardware-independent API. Except that Microsoft has already talked up DirectX 11 Compute Shader, which basically does the same thing, but works better with Direct3D-compliant hardware (and Visual Studio makes it easier to code for). So aside from Macs and Linux, and maybe a few engineering apps on Windows that might use it, OpenCL will be as small a player as OpenGL.
Now, remember what I said about ATI having DX11 GPUs already? Look at it this way: they already have the best (and only) DX11 Compute Shader platform presently shipping, and they're using ATI Stream as the native API, meaning that developers can start making those hardware-independent GPGPU apps on Windows NOW. It's gotta feel pretty bad to be NVIDIA (or a CUDA-targeting developer) right now.
How's that for a market position?
@ Joe_Raby
schedulers? How do you know it's weaker?
As far as I understand Windows does not balance single threaded applications between cores at all. Direct your attention to the 4 benchmarks on the following page:
http://www.tomshardware.com/reviews/athlon-ii-
The top 2 are single-threaded, performance is dependent on clock speed. The bottom 2 multi-threaded and performance is dependent on number of cores.
I would also like to point out that Windows games are the only GPU accelerated apps that use DirectX every thing else uses OpenGL. OpenCL is a big deal.
Your understanding is wrong, actually
Windows will balance single-threaded applications between cores automatically. Threads are devoted to the core that has more idle time. Mach doesn't do that unless the application is coded specifically for that, otherwise it uses the first core. You can easily test this with a few single-threaded applications, and watch your CPU graphs. Microsoft's documentation is much more thorough though.
Also, your link is broken.
"I would also like to point out that Windows games are the only GPU accelerated apps that use DirectX every thing else uses OpenGL."
Wrong again. All of Autodesk's current versions of AutoCAD support Direct3D, and on NVIDIA hardware, the recommended option for most applications is to use Direct3D because it offers more options than OpenGL. In fact, modelling in AutoCAD 2010 requires a Direct3D workstation-capable graphics card, because OpenGL isn't even supported.
Read the top 2 lines (second one is more important) here:
http://usa.autodesk.com/adsk/servlet/index?siteID=123112&id=6711774&linkID=9240618#section26
not very convincing
You're right, I shouldn't have said "only". I left out AutoCAD because I thought Microsoft owned Autodesk. I may be wrong about that too, but they're very tightly coupled and have been for some time. Most of their competitors however use OpenGL.
I know DirectX/3D is always expanding but I think it has more to do with Microsoft's deep pockets than merit. Unless you can at least provide a good reference link to convince me otherwise.
Sorry about the broken link. Copy and paste it and it'll work.
Not what I said
What I said is that NT *balances* threads. If you launch a single-threaded application, the NT kernel uses the core with the most idle time. Mach doesn't do that automatically, and it's stupid that it doesn't.
GCD fixes that. It was a fix because Apple didn't have a proper thread scheduler for managing multiple threads (whether multiple single-threaded applications, or a single multi-threaded application). Microsoft doesn't need a new threading API because the functionality is already there.
"Most of their competitors however use OpenGL."
And how often do you hear about CAD competitors to Autodesk?
Ok the link doesn't work
http://www.tomshardware.com/reviews/athlon-ii-x3,2452-7.html
Here's another one I stumbled upon looking for the previous link. The last benchmark is the only multi-threaded one.
http://www.tomshardware.com/reviews/athlon-ii-propus,2414-9.html
In these two the pattern gets clear. Changes in architecture are the only things that cause deviation from the cores vs. clock theory.
single threaded:
http://www.tomshardware.com/charts/2009-desktop-cpu-charts/DivX-6.8.3,1382.html
multi threaded:
http://www.tomshardware.com/charts/2009-desktop-cpu-charts/Mainconcept-Reference-1.6.1,1385.html
Even if you're right, Windows doesn't do a very good job.
You don't get it
As of right now, there are no feasible ways of splitting up threads into "micro-threads" to have them be further balanced out to multiple cores. If you can do that with existing code, then it isn't very good code to begin with. Micro-dividing code bits into even smaller threads means additional overhead for managing threads.
It's like this: If you have a video file that has a high bitrate, it means it's not that well compressed. That's great and all because you have high quality, but the stored bits are also hard to stream to a processor and output unless you have a fat pipe going to the processor, otherwise you end up with stuttered output. In comparison, if you have a highly-compressed video using H.264 compression and a decent enough bandwidth so as not to lose any visual detail, the storage bits are smaller. Again, that's fine and all, but you need to have a fast processor to decode the video, otherwise you end up with stuttered output. Micro-managing threads is like that: when you categorize and segment everything down into very small details, what you end up with is more and more groups that are more difficult to manage. Video encoding applications usually don't segment each logical task down into further segments because today's processors don't have enough cores, and thread management technologies aren't at that level yet anyway.
What I'm ultimately saying is that your example is bad because you don't seem to understand how programs are divided down into individual threads. It's very rarely 50:50 (for dual-core) or 25:25:25:25 (for quad-core).
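To put rough numbers on that argument, here is a toy model in Python. It is a sketch under assumptions that are not in the post above: each thread's work is given as the time it would take on its own, threads run fully in parallel on separate cores, and every thread costs a fixed management overhead. The function name `estimated_speedup` and all the figures are hypothetical.

```python
def estimated_speedup(shares, overhead_per_thread=0.0):
    """Estimate speedup when work is split unevenly across threads.

    shares: time each thread would take on its own (any consistent unit).
    overhead_per_thread: assumed fixed cost of managing one thread.
    """
    if not shares:
        raise ValueError("need at least one thread")
    serial_time = sum(shares)
    # The slowest thread sets the finish time; overhead grows with thread count.
    parallel_time = max(shares) + overhead_per_thread * len(shares)
    return serial_time / parallel_time


if __name__ == "__main__":
    print(estimated_speedup([50, 50]))        # even dual-core split
    print(estimated_speedup([80, 20]))        # lopsided split
    print(estimated_speedup([1] * 100, 1.0))  # many tiny threads, costly to manage
```

In this model an even 50:50 split doubles throughput, an 80:20 split tops out at 1.25x, and cutting the same work into 100 tiny threads with one unit of overhead each ends up slightly slower than running it serially.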
Changing Your story Now?
Which is why I took issue with it.
FreeBSD's Mach (what OS X uses) has had multiprocessor support since the late 90's. You're just wrong.
Regarding the last paragraph: don't argue with me, argue with Tom or Anand. Their benchmarks have shown this consistently. But I bet you never even looked.
What do the benchmarks show exactly?
Audio processing takes a lot less processing bandwidth than video. Video isn't broken up into multiple threads except in some programs that are using effect filters in some cases, which these weren't. These were simple transcoding jobs. The audio is being handled in a separate data stream in a different thread. It isn't logical to break up video streams into individual blocks unless you can designate frames into multiple data streams, and video cards are better at that (most do that semi-automatically anyway) than any CPU, so most software vendors won't bother unless it is a GPGPU app.
@ Joe_Raby
schedulers? As far as I know Windows does not balance single threaded applications among cores at all. Can you substantiate that claim?
Direct your browser to the 4 benchmarks linked:
http://www.tomshardware.com/reviews/athlon-ii-
The top two are single threaded and performance dependent on clock speed. The bottom two are multi threaded and performance dependent on number of cores.
I would like to point out that Windows games are, for the most part the only applications using DirectX, everything else uses OpenGL.
You seem knowledgeable, but possibly just enough to spread some disinformation.
More than a thread scheduler
It's more than a thread scheduler because it includes C language extensions (similar to Cilk) that make multi-core programming more approachable for the developer. The language extensions are supported by the Clang compiler, which is part of LLVM. However, I have hope they can be standardized.
It's not just for the Mac because it's open source (Apache license) and can be ported to any system.
And while the current GCD implementations don't support GPUs, the potential is there for either GPU, or more generally hybrid heterogeneous (big core/little core) computing in the future.
Apple changed their tune when they changed the name