Cool Codes


Summary: I firmly believe that we are sitting on a plateau just waiting for the next order-of-magnitude leap in computer performance and capability to unleash a new age of application innovation.


Monday of last week was one of those “convergence” days. I’m sure you know the feeling. Besides being my 29th wedding anniversary, it was the first day of the Hot Chips conference at Stanford University. Before my wife and I drove out to Half Moon Bay to celebrate, I was on stage at Memorial Auditorium to give the conference’s opening keynote, a talk entitled “Cool Codes for Hot Chips,” and to announce a new multi-core applications initiative. I’ll come back to the latter in a moment.

The theme of my keynote was very much related to the question I raised in my last post – have we reached the end of applications or are we at the start of a new wave of innovation? Even though many of your comments had assumed I was in the opposite camp, I firmly believe that we are sitting on a plateau just waiting for the next order-of-magnitude leap in computer (and communication) performance and capability to unleash a new age of application innovation.

To get off this application plateau we need access to radically better hardware. Unfortunately, the hardware won’t happen unless the architects (and their bosses) believe there will be software to take advantage of it. To resolve this chicken-and-egg problem, we need to start building and testing working prototypes of these future applications. That’s what we’ve been doing at Intel for the last three years, and I took the opportunity at Hot Chips to call for a community-wide effort along the same lines.

A collection of future applications, ones that take today’s systems beyond their limits, would serve two purposes. First, it would help stimulate much more thinking about what can and should be done. More programmers would pick up the challenge and start thinking more expansively about the future. Second, it would give architects and engineers a set of working prototype applications against which to evaluate the efficiency and programmability of their new designs.

Let me share one of the demos that I used at Hot Chips as an example of what’s possible if one has the necessary processing power.

Here’s the basic recipe:

  1. Take input from four cameras located in the corners of a room (Fig. 1a)
  2. Analyze the video streams to extract the location and motion over time of the individual body parts (torso, arms, legs and head) based on a programmed skeletal model
  3. Animate a synthetic human figure with skin, rendered with ray tracing and global illumination, within a virtual scene based on the actual kinematics determined in step 2 (Fig. 1b)

Fig. 1a: input from the four corner cameras

Fig. 1b: body-tracing output

While motion capture for live-action movies usually sprinkles LEDs over actors wearing dark clothes and then just tracks the bright lights, the Intel system works without any special markers on the person. You literally walk into the camera-equipped room and it just works.
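The post does not describe how step 2 works internally, so the following is only an illustration of why marker-less tracking suits parallel hardware. It swaps in a common, much simpler technique, background subtraction: each camera’s frame is compared against a reference image of the empty room to produce a silhouette mask, and the four cameras are processed concurrently because their work is independent. The frame format, the threshold of 30, and the function names are all hypothetical, and this is not Intel’s algorithm.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical representation: 8-bit grayscale frames stored as lists of rows.
Frame = list[list[int]]
Mask = list[list[bool]]

THRESHOLD = 30  # assumed intensity change that counts as "person, not background"


def silhouette(background: Frame, frame: Frame, threshold: int = THRESHOLD) -> Mask:
    """Mark pixels that differ from the empty-room background by more than threshold."""
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


def silhouettes_for_all_cameras(backgrounds: list[Frame], frames: list[Frame]) -> list[Mask]:
    """Run silhouette extraction for every camera at once; cameras do not depend on each other."""
    with ThreadPoolExecutor(max_workers=len(frames)) as pool:
        return list(pool.map(silhouette, backgrounds, frames))


if __name__ == "__main__":
    empty_room = [[10, 10, 10], [10, 10, 10]]
    # Four cameras, each seeing the "person" as one bright pixel in a different spot.
    frames = [
        [[10, 200, 10], [10, 10, 10]],
        [[10, 10, 10], [200, 10, 10]],
        [[10, 10, 200], [10, 10, 10]],
        [[10, 10, 10], [10, 10, 200]],
    ]
    for camera, mask in enumerate(silhouettes_for_all_cameras([empty_room] * 4, frames)):
        print(camera, mask)
```

Threads keep the sketch short, but in CPython pure-Python number crunching does not run in parallel across threads; swapping in `ProcessPoolExecutor` (same `map` interface) is the usual way to get real multi-core speedup for CPU-bound work like this.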

The applications for this technology are wide open beyond the obvious ones in game play: you might compare your golf swing to Tiger Woods’ or see how you look walking or even dancing in a new outfit without ever putting it on. Given that the model has your physical information, you’d know if you need the next size up or if the color isn’t quite right for your skin tone.

This system is appealing to us not because Intel is planning to ship one of these applications, but because it points to a broad new class of algorithms that we refer to as “recognition, mining and synthesis” or RMS.

The recognition stage answers the question “what is it?” – modeling of the body in our prototype system. Mining answers the question “where is it?” – analyzing the video streams to find similar instances of the model. And synthesis answers the question “how is it?” – creating a new instance of the model in some virtual world.
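To make the three questions concrete, here is a deliberately tiny, hypothetical one-dimensional analogue of the recognition, mining and synthesis flow. It is not the Intel RMS code: recognition builds a model by averaging labeled examples into a template, mining slides that template along a longer signal and reports the best-matching positions (lowest sum of squared differences), and synthesis places a new, scaled instance of the model in an empty scene. Template averaging and squared-difference matching are stand-in techniques chosen only for brevity.

```python
def recognize(examples: list[list[float]]) -> list[float]:
    """Recognition ("what is it?"): build a model by averaging labeled examples."""
    n = len(examples)
    return [sum(values) / n for values in zip(*examples)]


def mine(model: list[float], signal: list[float], top_k: int = 1) -> list[int]:
    """Mining ("where is it?"): offsets in signal that best match the model (lowest SSD)."""
    width = len(model)
    scores = []
    for offset in range(len(signal) - width + 1):
        window = signal[offset:offset + width]
        ssd = sum((w - m) ** 2 for w, m in zip(window, model))
        scores.append((ssd, offset))
    return [offset for _, offset in sorted(scores)[:top_k]]


def synthesize(model: list[float], length: int, offset: int, gain: float = 1.0) -> list[float]:
    """Synthesis ("how is it?"): place a new, scaled instance of the model in an empty scene."""
    scene = [0.0] * length
    for i, value in enumerate(model):
        scene[offset + i] = gain * value
    return scene


if __name__ == "__main__":
    model = recognize([[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]])  # [0.0, 2.0, 0.0]
    signal = [0.0, 0.0, 0.1, 2.1, 0.0, 0.0, 0.0]
    print(mine(model, signal))             # [2]
    print(synthesize(model, 6, 2, 0.5))    # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

In a real RMS workload each stage would operate on video, images or financial series rather than a short list, but the data flow is the same: the model produced by recognition drives both the search in mining and the generation in synthesis.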

This flow between recognition, mining and synthesis applies beyond the entertainment and visual domains. It works equally well in domains as diverse as medicine, finance, and astrophysics.

Such emerging “killer apps” of the future have a few important attributes in common: they are highly parallel in nature, they are built from a common set of algorithms, and they have, by today’s standards, extreme computational and memory bandwidth requirements, often demanding teraFLOPS of computing power and terabytes per second of memory bandwidth. Unfortunately, the R&D community lacks a suite of these emerging, highly scalable workloads to guide the quantitative design of our future computing systems.

The Intel RMS suite I mentioned earlier is based on a mix of internally developed codes, such as the body-tracking and animation prototype, and partner-developed codes from some of the brightest minds in industry and academia. As researchers outside of Intel learned more about the suite, they started to ask if we could make it publicly available. Since it contains a mix of Intel and non-Intel code, we couldn’t just release it as open source. A conversation last spring about the suite with my good friend Professor Kai Li of Princeton gave rise to the idea of a new publicly available suite, and my Hot Chips keynote gave me the opportunity to engage the technical community in its development.

At the end of the keynote I announced the creation of a publicly available suite of killer codes for future multi-core architecture research. I also announced that Intel would contribute some of our internally developed codes in body tracking and real-time ray tracing to launch the effort. I was also pleased to announce that Professor Ron Fedkiw at Stanford will contribute his physics codes, the University of Pittsburgh Medical Center will add their medical image analysis codes, Professor David Patterson at UC Berkeley will provide codes for the “Seven+ Dwarfs of Parallel Computing,” and Professors Li and JP Singh at Princeton will make additional network- and I/O-intensive contributions, including content-based multimedia search, network traffic processing, and databases.

Professors Li and Singh have graciously offered to manage contributions to the suite and host the repository. A workshop is being arranged for early next year to establish guidelines for contributions. I’ll provide more information here as the date gets closer.

And that brings me back to the question: when will we have the computational capability to break free from today’s rather quaint applications? Sooner than most people think, if we come together to create the future.




  • Check Robt Cringely's column this week

    I had a professor once who talked about synchronicity, where a new idea suddenly emerges from many seemingly unrelated directions simultaneously. Robert X Cringely wrote this week about a St. Louis, Missouri-based company named Appistry that seems to do exactly what you're talking about. Appistry generates "application fabrics" that spread an application linearly across multiple processors. Cringely says an intelligence application that took five days to run on a fast processor now runs in 48 minutes, spread across a 150-processor fabric. It's the linear scaling that's interesting: if Appistry is all that Cringely reports, wouldn't that pretty much take us immediately into the territory you're describing in your column?
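The figures quoted in this comment (as relayed from Cringely, not independently verified) do work out to exactly linear scaling: five days is 7,200 minutes, and 7,200 / 48 = 150, a 150x speedup on 150 processors. The sketch below computes that, and for contrast adds Amdahl's law, a standard model that the comment does not mention, to show why perfectly linear scaling is unusual: any serial portion of the work caps the achievable speedup. The 1% serial fraction is an arbitrary illustrative value.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Observed speedup: how many times faster the parallel run finished."""
    return serial_time / parallel_time


def efficiency(observed_speedup: float, processors: int) -> float:
    """Parallel efficiency: 1.0 means perfectly linear scaling."""
    return observed_speedup / processors


def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's law: best-case speedup when a fixed fraction of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    five_days_in_minutes = 5 * 24 * 60  # 7200
    s = speedup(five_days_in_minutes, 48)
    print(s, efficiency(s, 150))             # 150.0 1.0
    # Assumed 1% serial portion: 150 processors would then give only about 60x.
    print(round(amdahl_speedup(0.01, 150), 1))  # 60.2
```

Linear scaling of this kind is typically only possible when the job splits into fully independent pieces with negligible coordination, which is exactly the property the post attributes to emerging RMS workloads.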
    • parallel processing

      You can track the hills and valleys of advancement in the power of a single processor (including both technical and economic difficulty) by looking at how much work is being put into parallel processing. And vice versa.

      I'd look for a significant downward trend in the time constant of exponential increase in your favorite measure of processor power before anticipating a really major reallocation of resources in favor of processor dedication and parallelization over single-processor power.

      I think Rattner is not so much talking about the hardware as the nature of the software. Given the power we now have, the general philosophy behind the new software we choose to develop is starting to feel a little short in the cuffs. He's proposing that a major change in this philosophy is soon to be upon us, but not without a bit more power and some initiative.
  • Great Leap Forward

    We are just about at a dead end in the future of automation, other than refinements in digital operations. All we can do is shrink the switch and think of new ways to combine and package it. The digital switch is BINARY; to really have a computing/communication revolution, we need to develop a true multi-state switch, one with 'n' states. Think of the capacity of programming and operations if one could magically replace all binary computing switches (i.e., transistors and gates) with something that had, say, eight states. So instead of YES-NO, ON-OFF, TRUE-FALSE computing, you would have six additional states of being that also have meaning.
    This is where the next true revolution will occur.

    • Great Leap Forward?

      I'm curious about your subject line there. Am I missing a metaphor?
    • I'm not so sure...

      I've thought about that before, and I'm not so sure that it would truly be an advancement. In fact, I suppose the idea in general leans towards the analog vs. digital debate. All analog values can be represented digitally (the finer the detail required, the more bits used), while a bit is really a timed snapshot of an analog wave value, or a charged particle with an analog value above or below a certain threshold, which is translated into a one or a zero (or some similar convention, depending on the technology employed). I remember when studying physics that, as one focused on smaller and smaller scales of the matter and energy of the universe around us, there seemed to be an alternating pattern of discovery that goes something like this: (analog) waves are made of discrete (digital) particles, which in turn seem to be made of elements that have wave-like properties, and so on. In fact, I don't think we've reached the bottom building block yet (we thought it was the atom, but then we found what atoms were made of...), so whether the essence of the world is digital or analog, who knows? But I digress.

      The thing is, the functionality of an eight-state switch can be created with a combination of multiple binary switches, so the exercise of realizing the advanced possibility of using "n-state" switches is really a conceptual one; the only advantage on the hardware side of things would come if one could create an eight-way switch that is cheaper and/or faster than the same functionality provided by current binary switches. Also, the binary switch is decision making at its most basic root level: yes or no, on or off. It may turn out that starting at any level of possibilities above that introduces vast other inefficiencies into the system; I don't know for sure, but my intuitive speculation leans in that direction.
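The reply's point can be checked with a little arithmetic: k switches with n states each can represent n^k distinct configurations, and since 8 = 2^3, one eight-state switch carries exactly the information of three binary switches (8^k = 2^(3k)). The sketch below, with illustrative helper names, maps a sequence of eight-state "switch settings" onto binary switches and back without losing anything.

```python
import math


def bits_per_switch(states: int) -> float:
    """Information carried by one switch with the given number of states, in bits."""
    return math.log2(states)


def octal_to_binary(settings: list[int]) -> list[int]:
    """Replace each eight-state setting (0-7) with three binary switches, most significant first."""
    bits = []
    for s in settings:
        if not 0 <= s < 8:
            raise ValueError(f"eight-state switch setting out of range: {s}")
        bits.extend([(s >> 2) & 1, (s >> 1) & 1, s & 1])
    return bits


def binary_to_octal(bits: list[int]) -> list[int]:
    """Regroup binary switches in threes to recover the eight-state settings."""
    if len(bits) % 3:
        raise ValueError("bit count must be a multiple of 3")
    return [(bits[i] << 2) | (bits[i + 1] << 1) | bits[i + 2] for i in range(0, len(bits), 3)]


if __name__ == "__main__":
    print(bits_per_switch(8))                   # 3.0
    settings = [5, 0, 7]
    bits = octal_to_binary(settings)
    print(bits)                                 # [1, 0, 1, 0, 0, 0, 1, 1, 1]
    print(binary_to_octal(bits) == settings)    # True
```

Because the round trip is lossless, the eight-state switch adds no new expressive power; as the reply says, any gain would have to come from the physical device being denser, cheaper, or faster than three binary switches.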

      So, while I bet there is probably a way to advance the computer by re-inventing it at its most foundational levels, I'm not yet convinced at a hardware level that an "n-state" switch or some sort of "analog computer" would really be the leap forward that we want.
  • What about the little guys?

    Mr. Rattner -

    As I have stated in previous TalkBacks, as well as in my own blog response to your previous article, the small developers are being left behind. What are Intel and its partners doing to address this? More and more, the applications that users touch are small applications written in managed code environments such as Sun's Java and Microsoft's .Net. Multi-core, multi-CPU, and HyperThreading only complicate the lives of developers; they do not make them any easier. While the hardware's ability to handle parallel code has increased dramatically over the last few years, the tools to actually write this code have not improved much. With the exception of a few developers working on big projects, most of us do not have much experience with intensely multithreaded applications.

    So I am repeating my challenge to you and to Intel: help developers who are not using C++ write applications that take advantage of your new CPUs!

    Justin James