Versus Intel, has AMD's day of reckoning arrived?

TOPICS: Processors

On several occasions, over the past couple of years, I've heard Intel executives predict that any advantages that AMD has on the desktop would evaporate. Although I can't tell if it's the case here, one explanation Intel offered as part of its prediction was AMD's choice to marry its memory controllers to its processors in the silicon.  It wasn't necessarily a bad design choice.  In fact, from a performance perspective, the choice worked, contributing to the performance advantages that AMD has been enjoying on several fronts against Intel.  Intel could have made that choice too and in fact has with other processors such as the infamous i960 (once used for network co-processing but now found in embedded applications). 

But, as Intel explained it to me, due to certain design, manufacturing, and motherboard constraints, once you integrate the processor with the memory controller, you have to stick with a specific engineering design (from a speeds and feeds perspective) for much longer periods of time than you do if you take the modular approach that Intel takes.  The result (again, as it was explained to me) is that Moore's Law begins to outpace the long-term design choice and, instead of breaking open the performance bottleneck the way an integrated design does when it's first introduced, that design starts to become the bottleneck.

The general idea is reflected in this story about AMD's Athlon 64 X2 4800+ published by last August:

Since the memory controller is built into the processor, the major drawback for AMD's Athlon 64 X2 right now is its support for current memory standards. As we all know it, prices of DDR2 memory modules have reached a very comfortable level today and we have arrived at an inflection point where prices of DDR2 memory are beginning to cost less than DDR1. If this carries on, it will soon be too costly to build AMD-based systems since they only support the older DDR standard. Think of it this way, is it worth to invest in an old memory technology? We can already hear skeptics chanting away that DDR2 operates at much higher latencies, but that's not quite true any longer. Lower latency DDR2 memory is fast replacing the initial high latency ones and as mentioned, the prices are coming down. Although AMD has plans to roll out new processors supporting DDR2, it is not without a price. Firstly, current motherboard manufacturers have to redesign their boards for DDR2 memory, including introducing a new socket (dubbed M2) to support these new DDR2 processors. Secondly, AMD actually planned to introduce these new processors only in the second half of 2006 - which, in our opinion, is way too late. If you purchase an Athlon 64 X2 today, a year later, your memory modules, processor and motherboard will go obsolete once the new Socket M2 hits the market.

Intel has told me that the net result is a game of leapfrog: the earlier processors in any new generation of integrated designs like AMD's would probably outperform Intel's processors on certain benchmarks, but as time passed and Moore's Law took its toll, Intel's modular design would allow it to introduce certain improvements more frequently, thereby causing Intel to regain the lead, perhaps until AMD launched its next generation of integrated design. Again, I don't know if that's the case here and, while the explanation seemed plausible, Intel may have been blowing smoke at me.  But what is the case is that Intel's prediction of the results appears to be coming true from a benchmark perspective.  According to George Ou:

So last Friday when I saw the first set of independent benchmark results pitting a mid-end Intel E6600 "Conroe" 2.4 GHz CPU (due next month) against the just released flagship extreme edition AMD FX-62 CPU, I started wondering if AMD worst nightmare was coming true.  Intel's ~$250 E6600 CPU annihilated AMD's ~$1000 Extreme Edition AM2 based FX-62!  This effectively means that AMD's flagship desktop performance CPU will be obsolete by the end of next month when Intel [releases] the CPUs codenamed Conroe. The 2.4 GHz Conroe E6600 CPU is a 65 watt part while Intel's Extreme Edition Conroe CPU will operate at 2.93 GHz and still be 40 watts lower than AMD's FX-62 which runs at 120 watt TPD.  AMD's power advantage over Intel's current Pentium 4 NetBurst architecture just vanished in to thin air with the introduction of Intel's Core 2 architecture next month.

Obviously, there's much more to benchmark performance than just the speed of the path from the processor to the memory controller.  In-Stat's Kevin Krewell explains:

Meanwhile, in Austin, Texas (the Real Home of AMD)...It looks remarkably like AMD was caught flat-footed by the improvements Intel made in the Core microarchitecture. AMD was probably expecting performance parity from Intel’s new microarchitecture, with a slight Intel advantage on (lower) power. But Intel’s Core microarchitecture looks a lot more capable, with a wider-issue core, faster SSE hardware, wider internal buses, more prefetchers, 4MB of L2 cache, and a deeper pipeline. It now looks as if AMD will go from being king of the hill in desktop and volume server processors to being just competitive—at best....AMD will be introducing a new processor socket design to support DDR2 memory, and that should give AMD a few percentage points of improvement in performance, but Intel was showing Conroe systems at IDF with 20% or better performance over AMD’s fastest processor on CPU-challenging computer games. Even with DDR2 memory, AMD will very likely lose its position as top dog in the gaming and enthusiast markets that it enjoys today.

But regardless of what's going on under the hood, it's the results that count.  AMD has called into question the testbed that yielded those results (and in fact cajoled attendees about it during a briefing last week). However, in seeing more than just a leapfrog, Ou thinks it will be Intel that gets the last laugh for now:

The problem here is that this new Intel lead is not the usual leapfrogging where one competitor edges out the other, it's a massive lead across the board!  AMD will be shifting to a 65 nm process by the end of the year and adding 128 bit floating point processors by the middle of next year though it's not certain if they can make a massive performance gain while making a massive reduction in power consumption ..... Intel on the other hand told me that they won't be standing still and they don't ever intend to make the same mistake of allowing the NetBurst architecture to stay around for more than 4 years again.

Not only that, Intel may be looking to end the game of leapfrog by once again (as it did with the hybrid 32/64 architecture) copying AMD's approach.  In what could very likely become a hybrid approach that offers the best of both worlds -- the short-term gains of memory integration and the long-term gains of modularity -- it appears as though Intel will be offering desktop, mobile, and server processors that take the integrated approach by 2009.  According to a piece by The Register that quotes In-Stat and asks "Intel to build DRAM units into desktop, mobile CPUs?":

Intel is to follow AMD's lead an integrate memory controllers into its microprocessors, market watcher In-Stat has forecast. By 2009, it reckons, 70 per cent of all x86 processors shipping will have their own memory controller, it said.

Such a move by itself might not be able to neutralize AMD's ability to play leapfrog. After all, going toe-to-toe with AMD by copying what it does assumes that AMD doesn't have more technological innovation up its sleeve, and history has proven otherwise.  But what it may do is refocus the competition between the two companies on a front where Intel has traditionally done well -- manufacturing and fab capacity.  If, for example, Intel can use its manufacturing prowess to somehow yield shorter integrated design cycles than AMD, then AMD's reputation for netting performance breakthroughs by way of innovation moves front and center.  And while AMD had to basically go it alone to get this far (from an innovative point of view), the fact that it is very good friends with Sun these days (another company that's doing some very innovative stuff in processor design) should not be lost in any big-picture analysis.

  • Intel controlled tests show Conroe slower than P4

    In multi-tasking,

    "We found that it was faster than the current flagship Pentium Extreme Edition 965 processor in nearly every single-threaded scenario, but there were times where Conroe fell behind in multi-tasking scenarios. One area that completely blew us out of the water was the sheer prowess when it comes to gaming. This is an area where Intel has been traditionally weak.

    This initial preview serves as an indication as to Conroe's performance - but again, we iterate that we will draw our final conclusions as to its performance when we have the complete, shipping hardware in our hands - something that no website or company has yet, save Intel. "

    I have pointed out that Conroe has good single-threaded performance (that is, when only one core is needed) because of its massive 4MB cache. In a multitasking scenario, each core gets less cache, and performance drops immediately.
    • Sadly, what does this tell you about how games are written?

      Can't games actually be written to take better advantage of multitasking and multithreading?

      • Two things

        There is no evidence that AMD CPUs are better at multitasking and multithreading.

        Second, the lead programmer at id Software tells us that using multiple CPUs in gaming is very difficult. They've made some improvements in this area, but the results don't scale perfectly. You don't get a doubling in speed when there are 2 cores.
      • What's the point?

        [i]Can't games actually be written to take better advantage of multitasking and multithreading?[/i]

        That seems like a very, very bad idea. Keep in mind that the main bottlenecks in the twitch games are:
        1) Rendering. Monster cycle hog.
        2) Physics. Varies.

        The trouble is, they're pretty much the only processes that matter. The physics section figures out where everything is (including possible collisions) and hands off to the renderer which does all that [i]shiny![/i] stuff. It's essentially linear.

        Adding the overhead of task- and thread-switching, synchronization, etc. would just increase the cycle cost and slow things down. The only way doing that would help is when there are multiple physical processors, in which case (for instance) one could handle physics and the other rendering.

        Trouble is, that would speed up the multi-CPU performance at the cost of slowing down the single-CPU performance. The games would have to be rearchitected pretty much from the ground up to get there, and some of the customers would actually lose performance.

        How would you make that proposal to Management?
        Yagotta B. Kidding
        • I wonder if a different architecture might help

          for example, a set of 16 bit gpus each work on the physics in their part of the screen, and another set of 16 bit gpus work on the rendering on their part of the screen. Sort of like a mini beowulf to get memory-bandwidth up.
          (obviously physics gpus pass data to rendering gpus).
          I have no idea about the challenges faced by low level graphics operations, so if this is a stupid comment, don't flame me too much.
    • Not in the recent independent tests

      You're referring to the AnandTech numbers. Even though Intel set the test bed up, AnandTech got to make requests to "even the playing field" which didn't do anything for AMD. Now independent results are confirming this.
      • cache size makes a difference

        George, the smallest Conroe will have twice the cache of the largest AMD processor. There is very little doubt about why Conroe is so amazing. Previously, AMD could maintain a lead on Intel's old-school Pentium processors by using its integrated memory controller to make up for smaller cache sizes -- but the cache size advantage that the Pentiums enjoyed was only 1.5-2x that of AMD's chips. AMD will now have to invest in larger caches to nullify this advantage again. I don't think AMD will bother until it transitions to 65nm itself.

        Now as for the cache itself, Intel invested in a shared cache, which means that any single core can have access to the whole cache for itself if the other core is shut down. So for a 4MB Conroe, if both cores were operating, then each would have 2MB of cache for its own use, but if one is shut down, then the other core would have full access to the 4MB. So Conroe would show decent multitasking performance, but amazing single-tasking performance.
        • Why only decent multitasking?

          Even if you chopped the cache in half, it's still as much as AMD's chip.
          • Cache and multitasking

            [i]Even if you chopped the cache in half, it's still as much as AMD's chip.[/i]

            Yes, but thanks to (much) lower memory latency, the cost of a cache miss is less for AMD.

            The easy way to think of it is that going to main memory is so expensive that it dominates the throughput. In that case, your execution time is the product of your cache miss rate and your memory latency. It's obviously more complicated than that, but that gives you the idea.
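As a rough illustration of the simplified model above (memory stall time as the product of miss count and miss latency), here is a minimal Python sketch. The function name and every number in it are hypothetical, chosen only to show how a lower miss latency can offset a smaller cache; they are not measurements of any AMD or Intel part.

```python
def memory_stall_ns(accesses, miss_rate, miss_latency_ns):
    """Simplified model: time lost to main-memory accesses, in nanoseconds.

    Ignores cache hit time, overlap between misses, and prefetching.
    """
    return accesses * miss_rate * miss_latency_ns


# Hypothetical chip A: bigger cache (fewer misses) but a slower path to memory.
chip_a = memory_stall_ns(accesses=1_000_000, miss_rate=0.02, miss_latency_ns=100)

# Hypothetical chip B: smaller cache (more misses) but lower memory latency.
chip_b = memory_stall_ns(accesses=1_000_000, miss_rate=0.03, miss_latency_ns=60)

print(f"chip A stalls: {chip_a / 1e6:.2f} ms")
print(f"chip B stalls: {chip_b / 1e6:.2f} ms")
```

With these made-up inputs, chip B stalls less even though it misses more often. When the working set fits in cache, the miss rate approaches zero for both chips and the latency advantage stops mattering.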

            That's also why games are a lousy benchmark for most applications. They actually have extremely small working sets most of the time, so they can run almost entirely in cache. Putting a CPU with great game performance to work as a webserver is likely to be very disappointing.
            Yagotta B. Kidding
          • Not just games

            Games, audio and video encoding, 3D rendering is what the desktop cares about. For servers, the integer performance and floating point advantage on Woodcrest is huge.
          • Didn't say just games

            [i]Games, audio and video encoding, 3D rendering is what the desktop cares about.[/i]

            Which is why using a game as a benchmark isn't necessarily a good plan. Audio and video processing, for instance, hit main memory a lot more than games and rendering do.

            Bigger caches help with them, but not as much as they do with a smaller working set (a larger working set means a lower cache hit rate).

            [i]For servers, the integer performance and floating point advantage on Woodcrest is huge.[/i]

            Servers are the classic example of "more memory, even if the processor is slower." Intel's presentations leading up to the FBDIMM standard are quite illustrative on that point -- basically, sheer quantity of memory is so important that they were afraid that server performance would hit the wall this year [1].

            According to Intel's own performance modeling, server performance isn't going to be materially aided by the current crop of chips. The big boosts to server performance will come from:
            (1) FBDIMM, which allows much more attached RAM, and
            (2) Attached memory channel similar to AMD.

            Since the second will take them several years, they're pretty much betting the server farm on FBDIMM.

            Don't take my word for it -- ask Intel to show you the material that they were presenting to the memory industry in 2003-2004 leading up to the JEDEC FB-DIMM standard push.

            [1] For a memory channel, there's a point where the number of devices on the bus is inversely proportional to the speed of the bus. We hit that point a few years ago. After that, increases in memory speed mean decreases in memory size -- and people won't pay more for faster CPUs that reduce system performance.
            Yagotta B. Kidding
          • Isn't AMD going to FBDIMMS next year?

            AMD likes to say that the FBDIMMS eat up an extra 6 watts TPD per DIMM (Intel says it's 4). This is assuming that all the DIMMs are being used to their maximum power levels.
          • FBDIMM

            [i]AMD likes to say that the FBDIMMS eat up an extra 6 watts TPD per DIMM (Intel says it's 4). This is assuming that all the DIMMs are being used to their maximum power levels.[/i]

            We're back to the difference between Intel and AMD power rating systems. AMD rates based on how much power the hardware [b]can[/b] generate, Intel based on how much power they [b]budget[/b] to the hardware in an active power management scheme.

            If you want to, you can throttle back an FB-DIMM to use less than a watt (burst mode -- you shut down the memory bus in between bursts.) I don't advise doing so unless you're doing something like RAM-based disk emulation because the performance utterly sucks.

            In general, anything you hear from Intel regarding FB-DIMM is pretty much generated by their marketing department. They [i]need[/i] FB-DIMM to be low power, so they have dictated to the industry that it will be. In the same vein, they dictated to the industry that the price difference between an FB-DIMM and a registered DIMM should be no more than $20 to OEMs.

            Ask someone who understands the FB-DIMM spec how realistic that latter figure is. Intel hasn't budged on it, though, because a higher price is going to hurt them (badly) in server sales. They're pretty much betting the server farm on FB-DIMM.

            My own read on it, and the read of perhaps a majority of JC-45, is that their chances are about as good as those of their other bet-the-server-farm venture, the Itanic.

            Sometime when I'm less afraid of hearing personally from Intel Legal, I may write a blog on how FB-DIMM is a good example of why JEDEC standards take longer to reach the market than "standards" dictated by Intel or Microsoft. In short, it's because they have enough engineering in them to actually [b]work.[/b]
            Yagotta B. Kidding
          • FBDIMMS pricing

            FBDIMMS will most likely be expensive at first, has a little more latency, but better throughput. Sounds like RAMBUS a few years ago.
          • FBDIMM Spin Cycle

            [i]FBDIMMS will most likely be expensive at first, has a little more latency, but better throughput. Sounds like RAMBUS a few years ago.[/i]

            George, if you want to take this offline ping me.

            Basically, the FBDIMM performance is a matter of physics and the economics is a straight inequality.

            The latency is dictated by the spec, and it's not "a little more." It's the buffer delay for each DIMM in the chain (twice), plus the controller buffer delay. If you don't have at least three FBDIMMs, there's no point in using them -- plain DIMMs are cheaper and faster. The added latency is not trivial. Unlike RDRAM, it can't ever get better [i]because FBDIMMs use the same DRAM as plain DIMMs.[/i]
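The latency description above can be written as a small formula. This is a sketch under one reading of "(twice)": each DIMM buffer in the chain is crossed once on the way out and once on the way back. The function name and the delay values are hypothetical placeholders, not figures from the FB-DIMM specification.

```python
def fbdimm_added_latency_ns(dimms_in_chain, dimm_buffer_ns, controller_buffer_ns):
    """Extra latency over a plain DIMM, under the assumptions stated above."""
    if dimms_in_chain < 1:
        raise ValueError("need at least one DIMM in the chain")
    # Each DIMM buffer is counted twice (assumed: outbound and return trip),
    # then the controller's own buffer delay is added once.
    return 2 * dimms_in_chain * dimm_buffer_ns + controller_buffer_ns


# Hypothetical delays: 3 ns per DIMM buffer crossing, 5 ns in the controller.
for n in (1, 2, 4, 8):
    print(n, "DIMMs:", fbdimm_added_latency_ns(n, 3, 5), "ns added")
```

Whatever the real per-buffer figures are, the added latency in this model grows linearly with the length of the chain, and faster DRAM does nothing to shrink it because the DRAM itself is unchanged.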

            As for the cost, the same inequality applies. FBDIMMs require a minimum two extra PCB layers, many more vias, an expensive controller chip, and additional cooling [i]in addition to the same bill of materials as a plain DIMM.[/i] The plain DIMM will [i]always[/i] be cheaper.

            What's more, economies of scale will always favor the plain DIMM because there are too many applications (and more each day) where the plain DIMM will do the job. That pushes the FBDIMM out to the low-volume end of the curve, which means that economies of scale make it even [u]more[/u] expensive -- the vicious cycle.

            Buffering a DIMM is a good thing in a lot of ways, but direct-attach RAM is always going to be cheaper and faster for the low end of the pyramid. Don't ask Intel for the roadmap on this one, ask the memory vendors. Off the record, Mian will tell you straight, for instance.
            Yagotta B. Kidding
          • Email me and we'll take it offline

            If you can email me, or send me your phone or skype contact info, I'll give you a call.
          • Conroe has 2-4 times the cache

            [i]Even if you chopped the cache in half, it's still as much as AMD's chip.[/i]

            George that's exactly my point, Intel's cache is at least twice that of any of the AMD chips. Even with Conroe's lowest end chip, the 4MB, just one of its cores has as much cache as both of the cores in an AMD processor put together. The high-end Conroe, with 8MB, just one of its cores has twice as much cache as both cores on AMD processors. Is there any wonder why it's doing so well?
    • You can't fix stupid

      I haven't read so much biased BS in my entire life. Does Sharikou work for AMD's marketing BS department? Sharikou deserves the De De Dee from Carlos Mencia Show.

      For your info, Sharikou references are very weak.

      Here is something more concrete:

      If you are a hardcore gamer or designer, you don't take sides with a chip company. After August things will get very interesting with AMD because the only thing they can honestly (very rare for AMD) tout is the Opteron dual-core.
      • Agree!

        No need to waste time reading Sharikou's blog. Refer to the link below for my summaries of his blog.
  • This could hurt Intel profits?

    I'm no accounting guru, but it seems to me that if Intel's new $250 processor outperforms Intel's current flagship processor that sells for over $1000, I would think Intel's profits are about to fall off a cliff.
    On the other hand, Intel may price these new processors higher than George Ou thinks they will.

    Time will tell.