AMD: Is closing the quad-core deficit enough?

AMD kicked off the "premiere" of its latest microprocessor offering with a launch party at the Herbst International Exhibit Hall Monday night.  Partners like VMware, HP, Dell, Sun, Oracle, IBM, Microsoft and others were on hand, in person or by video link, to celebrate the launch of AMD's single-die quad-core milestone processor.  Barcelona is critical for AMD, which has been battered by Intel's ten-month head start in quad-core processors built with a cheaper-to-manufacture dual-die process.

AMD argues that its single-die "native quad-core" design, with its lower latency, is architecturally superior to Intel's dual-die approach.  But the challenge of manufacturing a massive 283 mm² die with high yields and high clock speeds is daunting, and the fact that Barcelona is six months late and 600 MHz short makes this painfully clear.  AMD’s executive VP Mario Rivas admitted back in March that if he could do it all over again, he wished AMD had “immediately done a MCM - two dual cores and call it a quad-core.”  Intel takes the easier manufacturing route of combining two 143 mm² dies, which allows Intel to mix and match the best combinations.  Intel's soon-to-launch 45nm chip takes the die size down to an even more manageable 107 mm².
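To see why die size matters so much for yield, here is a minimal sketch using the textbook Poisson yield model, Y = exp(-A·D0), where A is die area and D0 is defect density. The value `D0 = 0.5` defects/cm² is a hypothetical number chosen purely for illustration, not AMD or Intel data, and the model ignores edge loss, partial-good salvage, and clock-speed binning. It compares good quad-core products per cm² of die area for one 283 mm² die versus two separately tested 143 mm² dies in one package.

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under the simple Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 = 1 cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

def good_quads_per_cm2(die_area_mm2, dies_per_quad, defects_per_cm2):
    """Good quad-core products per cm^2 of die area.

    Assumes each die is tested before packaging, so an MCM only ever
    pairs known-good dies (the "mix and match" advantage).
    """
    area_cm2 = die_area_mm2 / 100.0
    good_dies_per_cm2 = poisson_yield(die_area_mm2, defects_per_cm2) / area_cm2
    return good_dies_per_cm2 / dies_per_quad

D0 = 0.5  # hypothetical defect density in defects per cm^2 (illustrative only)
native = good_quads_per_cm2(283, 1, D0)  # one 283 mm^2 native quad-core die
mcm = good_quads_per_cm2(143, 2, D0)     # two 143 mm^2 dual-core dies per package
print(f"native: {native:.3f}/cm^2  MCM: {mcm:.3f}/cm^2  ratio: {mcm / native:.2f}")
```

Under this assumed defect density the two-die package yields roughly twice as many sellable quad-cores per unit of silicon; a lower real-world D0 would shrink that gap, a higher one would widen it.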

Earlier this year, AMD told several news organizations, such as ZDNet and TGDaily, that Barcelona would outperform Intel's 2.66 GHz Clovertown quad-core processor by margins of 20 and 50 percent on integer and floating point, respectively.  It was initially implied that AMD was comparing a 2.6 GHz Barcelona processor, but that wasn't confirmed until AMD distributed 2.6 GHz benchmarks in July with similar performance claims.  Barcelona actually launched at 2.0 GHz, and it fell short of AMD's original claims by a significant margin.  The actual benchmarks, which were leaked to me last Friday and are now confirmed by published SPEC.org results, show a 24% deficit on SPECint_rate2006 and a 15.4% lead on SPECfp_rate2006 versus Intel's best two-socket processors.  That is a far cry from the 20 and 50 percent leads AMD claimed earlier in the year.

SPEC CPU 2006: Intel Clovertown, Tigerton and AMD Barcelona

The charts below were compiled from official SPEC.org results published as of September 12, 2007, with the lone exception of the IBM results.  They are virtually identical to the Barcelona results I extrapolated from the leaked slides sent to me last Friday.  AMD has implied that its 2350 score on SPECint_rate2006 may now be a few tenths of a point higher, but I don't know exactly how much, and it's such a small delta that I'll wait until the official results are published.

Pros and cons of AMD and Intel architecture

While it's true that AMD's single-die process and superior memory subsystem allow AMD to scale along a near-perfect trajectory as clock speed and socket count increase, AMD sacrificed raw execution speed, which puts it at a lower starting point than Intel (see note below).  This is primarily due to the 4-issue execution engine in Intel’s Core microarchitecture versus AMD’s 3-issue execution engine.  Intel, on the other hand, sacrificed the integrated memory controller but implemented a faster execution engine and a massive 8 MB of level 2 cache that mitigates the effects of slower memory.

Note: From IBM’s latest data, Intel still holds roughly a 31.2% advantage on SPECint_2006 and roughly an 18.8% advantage on SPECfp_2006 in single-threaded performance at 1.9 GHz when comparing Intel Clovertown to AMD Barcelona.  This is calculated from the 1.9 GHz Barcelona scores of 11.3 on SPECint_2006 and 11.2 on SPECfp_2006 versus an Intel E5335 at 2.0 GHz scoring 15.6 and 14, with the Intel scores scaled down by a ratio of 1.9/2.  That is a conservative estimate of Intel's performance at a theoretical 1.9 GHz.
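The note's arithmetic can be reproduced in a few lines. This is a sketch of the calculation as described (scale Intel's 2.0 GHz score linearly by 1.9/2, then compare with Barcelona's 1.9 GHz score); the function name is illustrative. Linear down-scaling assumes performance drops in proportion to clock, which real workloads rarely do, so it tends to understate Intel's score at the lower clock.

```python
def clock_normalized_advantage(intel_score, intel_ghz, amd_score, target_ghz):
    """Percent lead of Intel over AMD after scaling Intel's score linearly to target_ghz."""
    scaled_intel = intel_score * (target_ghz / intel_ghz)
    return (scaled_intel / amd_score - 1.0) * 100.0

# Scores quoted in the note: E5335 at 2.0 GHz vs. Barcelona at 1.9 GHz
int_adv = clock_normalized_advantage(15.6, 2.0, 11.3, 1.9)  # SPECint_2006
fp_adv = clock_normalized_advantage(14.0, 2.0, 11.2, 1.9)   # SPECfp_2006
print(f"SPECint_2006: {int_adv:.2f}%  SPECfp_2006: {fp_adv:.2f}%")
```

With the quoted scores, the integer lead works out to about 31.2% and the floating-point lead to about 18.75%.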

Single-threaded performance plays an essential role in current applications that aren't well multithreaded, and it will still matter for tasks that fundamentally don't thread well.  On servers, strong single-threaded performance lets a busy thread borrow memory bandwidth from idle threads whenever the system isn't fully loaded.

Simply put, Intel starts fast but scales more slowly, while AMD starts slow and scales faster as core count and clock speed go up.  This is why a 2.0 GHz Barcelona loses to a 2.0 GHz Clovertown or a 1.86 GHz Tigerton on SPECint_rate2006, but once you get to ~2.5 GHz, the clock-for-clock performance advantage on SPECint_rate2006 swings over to AMD.  So to beat Intel's 3 GHz Clovertown on general-purpose benchmarks like SPECint_rate2006, AMD needs to reach the high-2 GHz range.  Of course, Intel isn't going to sit idly by and watch its lead evaporate.

[Update 9/30/2007: Numbers fixed and clarified using SPEC results as of 9/30/2007.  To quantify the scaling of AMD and Intel CPUs: in single-threaded performance, Intel starts with a huge 31.2% SPEC CPU 2006 integer advantage over Barcelona at 1.9 GHz, and an even larger one over the Opteron K8.  However, Intel scales from 1 to 8 cores at only 64.7% efficiency at 2 GHz and 53.6% efficiency at 3 GHz.  AMD's Opteron K8 and Barcelona scale from 1 to 8 cores at 87% efficiency at 2 GHz and 87.4% at 3 GHz.  In a two-socket 8-core platform, Intel Clovertown scales the clock from 2 to 2.33 GHz at 58.7% efficiency, dropping to 52% efficiency by the time you scale from 2 GHz to 3 GHz.  The dual-core AMD Opteron K8, on the other hand, scales the clock from 2 to 3 GHz at 77.3% efficiency in an 8-core configuration.

Barcelona, however, seems to scale poorly on SPEC CPU 2006 floating point from 2 GHz to 2.5 GHz, with an efficiency of 46.6%.  That appears to be because AMD can't use a fractional multiplier for the memory clock the way it can for the CPU clock, so the 2.5 GHz part is forced to run its memory at 312.5 MHz instead of 333 MHz.  If AMD can switch to DDR2-800, scaling will probably be a lot better, but most of the announced Barcelona-class servers only go up to DDR2-667; Sun is the exception and supports DDR2-800.]
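The 312.5 MHz figure falls out of simple divider arithmetic. The sketch below assumes, as the update describes, that the memory clock is the core clock divided by a whole number, using the smallest divisor that keeps the memory clock at or below the DIMM's rated clock (333.33 MHz for DDR2-667, 400 MHz for DDR2-800). The function name and that rounding rule are illustrative assumptions, not AMD documentation. Exact fractions avoid floating-point rounding at the 2.0 GHz boundary, where 2000/6 lands exactly on the DDR2-667 clock.

```python
import math
from fractions import Fraction

# Hypothetical model: memory clock = core clock / N, where N must be a whole number.
def memory_clock_mhz(core_mhz, rated_mhz):
    """Fastest integer-divided memory clock that does not exceed the rated clock."""
    core = Fraction(core_mhz)
    divisor = math.ceil(core / Fraction(rated_mhz))  # smallest N with core / N <= rated
    return core / divisor

DDR2_667 = Fraction(1000, 3)  # 333.33 MHz memory clock
DDR2_800 = Fraction(400)      # 400 MHz memory clock

for core in (2000, 2500):
    for name, rated in (("DDR2-667", DDR2_667), ("DDR2-800", DDR2_800)):
        clock = float(memory_clock_mhz(core, rated))
        print(f"{core} MHz core, {name}: {clock:.1f} MHz memory clock")
```

Under this model a 2.0 GHz part divides evenly into the full DDR2-667 clock, while a 2.5 GHz part must fall back to a divisor of 8 (312.5 MHz). With DDR2-800 the same 2.5 GHz part would land at roughly 357 MHz, which fits the expectation that DDR2-800 would help scaling.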

My definition of scaling efficiency: if a processor gets a 50% increase in clock speed but realizes only a 40% increase in performance, I call that 80% clock-scaling efficiency.  If a processor or a computer goes from 1 socket to 4 sockets but sees only a threefold increase in speed, I call that 75% core-scaling efficiency.

I really doubt I'm the first to think of this method of expressing scaling efficiency, but I haven't seen anyone explain it this way so I'm calling it my definition for now until I know otherwise.
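The two definitions translate directly into code. In this minimal sketch, the function names and the scores (100, 140, 300) are made up purely to reproduce the two worked examples in the definition; plug in published SPEC scores to compute efficiencies like those in the update above.

```python
def clock_scaling_efficiency(f_old, score_old, f_new, score_new):
    """Fractional performance gain divided by fractional clock-speed gain."""
    clock_gain = f_new / f_old - 1.0
    perf_gain = score_new / score_old - 1.0
    return perf_gain / clock_gain

def core_scaling_efficiency(units_old, score_old, units_new, score_new):
    """Speedup divided by the increase in cores or sockets."""
    speedup = score_new / score_old
    return speedup / (units_new / units_old)

# +50% clock (2 -> 3 GHz) yielding +40% performance
clock_eff = clock_scaling_efficiency(2.0, 100.0, 3.0, 140.0)
# 1 -> 4 sockets yielding a threefold speedup
core_eff = core_scaling_efficiency(1, 100.0, 4, 300.0)
print(f"clock-scaling: {clock_eff:.0%}  core-scaling: {core_eff:.0%}")
```

Expressing both as ratios of gains makes them comparable across platforms regardless of the absolute scores.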

Scaling inefficiencies can hit Intel even harder on applications that demand more memory bandwidth, or be more forgiving on benchmarks like SPECjbb2005, where Intel's lead is even larger than on SPECint_rate2006.  SPECfp_rate2006, for example, is a floating point benchmark that represents HPC (High Performance Computing) scientific and engineering workloads, and it requires massive amounts of memory bandwidth.  The memory requirements of HPC applications play so well to AMD's architecture that even a 2 GHz Barcelona beats a 3 GHz Clovertown on SPECfp_rate2006 by 15.4%.

Unfortunately for AMD, benchmarks that are important to the IT world, like SPECjbb2005, SPECweb2005, TPC-C, and SAP, were conspicuously missing at the Barcelona launch.  Intel, by contrast, showcased a plethora of record-breaking results covering all of the above metrics at last week's Tigerton launch.

Both Intel and AMD are well aware of their own architectural shortcomings.  Neither camp is eager to advertise them from a marketing standpoint, but their roadmaps tell the story.  In late 2008, Intel will move to a memory architecture called CSI (QuickPath) that is similar to AMD's, and AMD announced at its July analyst meeting that its next-generation platform, called "Bulldozer" and coming in late 2009, will feature an improved execution engine that addresses single-threaded performance.

AMD versus Intel on price, performance, and power efficiency

There was some good news for AMD at last night's Barcelona launch: the company upgraded its projection for year-end Barcelona parts from 2.3 to 2.5 GHz.  A 2.5 GHz quad-core processor will make AMD a lot more competitive in the four-socket server segment, and it will help in the mainstream two-socket segment.  While 2.5 GHz may not deliver the performance crown, it will be a huge improvement over AMD's current situation, and if AMD can deliver it this year, it will indicate a faster ramp than previously expected.

AMD's 2 GHz Barcelona is already priced very competitively, and there is little question it will sell well.  The problem is that Intel will very likely batter AMD's average selling price in the 2 GHz value segment.  For example, the 2 GHz Barcelona Opteron 8350 is priced below Intel's 2.13 GHz half-cache Tigerton CPU, yet it offers better SPECint_rate2006 performance.  While that's a good deal for the customer, it probably isn't such a great deal for AMD's margins.  Once Barcelona gets to 2.5 GHz, AMD should be able to sell those chips for double the price, remain price-competitive against Intel, and improve its margins.

Intel's ten-month lead on price, performance, and performance/watt (when comparing Intel's quad-core to AMD's dual-core Opteron servers) battered AMD, but Barcelona closes the quad-core deficit.  While the current 2.0 GHz Barcelona launch part won't reclaim general-purpose performance crowns like SPECint_rate2006, it will allow AMD to reclaim performance/watt leadership on many workloads, primarily because of Intel's FBDIMM (Fully Buffered DIMM) power-consumption liability.  Taking the performance/watt lead would not have been possible so long as Intel retained an exclusive quad-core advantage.

Scott Wasson of TechReport ran a series of detailed tests that showed a dual 95W TDP (Thermal Design Power) AMD Opteron 2350 8-core server beating a dual 50W TDP Intel L5335 server.  Based on the CPU differences alone, you'd expect a ~80 watt advantage for Intel (processors don't actually hit TDP in real-world applications, which is why you shouldn't expect the full 90W delta).  But Intel's off-chip memory controller probably adds an extra 25W, and the eight FBDIMMs probably cost the Intel server another 60W (AnandTech measured a 60W difference on 8 FBDIMMs).  The difference in the memory subsystem explains how an 80W advantage on the CPUs can turn into a 5W deficit for Intel.  If we use the more reasonably priced 80W E5335 parts, we're probably looking at a ~55W deficit for the Intel server.
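The power estimate above is a simple budget, sketched below. The 25W memory-controller penalty and 60W FBDIMM penalty are the article's estimates, and the function name is illustrative. For the E5335 case, the ~55W figure is consistent with using the full 2 x (95W - 80W) = 30W TDP gap as the CPU-side advantage; that reading is an inference, not something the article states.

```python
def intel_net_power_advantage(cpu_advantage_w, memory_controller_w=25.0, fbdimm_w=60.0):
    """Intel server's estimated system-level power advantage in watts (negative = deficit).

    cpu_advantage_w: expected CPU-side power savings for the Intel server
    memory_controller_w: estimated extra draw of Intel's off-chip memory controller
    fbdimm_w: estimated extra draw of eight FBDIMMs versus DDR2
    """
    return cpu_advantage_w - memory_controller_w - fbdimm_w

# L5335 case: ~80W expected CPU advantage (below the 2 x 45W = 90W TDP gap)
print(intel_net_power_advantage(80.0))               # -5.0, a 5W deficit
# E5335 case: 2 x (95W - 80W) = 30W TDP gap
print(intel_net_power_advantage(2 * (95.0 - 80.0)))  # -55.0, a 55W deficit
```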

Note on SPECjbb2005: While Scott Wasson does a lot of good work, his unofficial SPECjbb2005 results in the same article should be disregarded.  Wasson's result for the Intel X5365 is off by 61% from the record-breaking published SPECjbb2005 result.  SPECjbb2005 isn't memory-bandwidth heavy, so it favors Intel's architecture.  In Wasson's defense, AMD sent him and other reviewers these parts on the Friday before launch, and he probably didn't sleep much over the weekend getting that massive review ready for Monday morning, so I don't want to be too hard on him for this.  So far, there are no published SPECjbb2005 scores for any AMD Barcelona processor, and Barcelona wouldn't take the crown even if it doubled the score of a two-socket Opteron 2222SE server.

AMD's newfound edge on performance/watt may be short-lived in the two-socket space because of Intel's jump to 45nm Penryn this November.  According to leaked OEM roadmaps, Intel will be able to launch a mainstream 3 GHz part within an 80 watt TDP envelope.  Penryn, coupled with FSB1600, 50% more cache with 24-way associativity, and other improvements, will probably take Intel into uncharted territory on performance and performance/watt, even with the FBDIMM power liability.  In the low-power segment, Intel's new "San Clemente" DDR2 chipset (which Intel will not comment on), coupled with 2.66 GHz 50-watt low-voltage Penryn parts, will undoubtedly be interesting given DDR2's known power advantages over the FBDIMM architecture Intel currently relies on.  AMD will face a little less pressure in the four-socket market if it can quickly ramp up to 2.5 GHz, because Intel's 45nm "Dunnington", the successor to Tigerton, probably won't arrive until the second half of 2008.

Topics: Intel, Hardware, Processors

Talkback

  • Are you talking "on-Paper" or "In Reality" ?

    From what I've read, the two are somewhat different.
    BitTwiddler
    • All numbers I cite are published on spec.org

      All numbers I cite are published on spec.org. They're not "on-paper"; they're the official industry standard benchmarks approved by the SPEC council.
      georgeou
      • Spec.org only useful for marketing

        Their benchies don't follow computing needs in the real world.
        Uber Dweeb
  • George on SPECjbb2005

    George says "...like SPECjbb2005 where Intel's lead is even larger than SPECint_rate2006."

    but later admits:

    "So far, there are no published SPECjbb2005 scores for any AMD Barcelona processors yet ...."

    which begs the question: which is it? Are there benches to compare, or not?

    I don't think it can be both ways.
    halbhh2
    • Never claimed both ways

      I'm talking about SPECjbb2005 in general terms and how it has a larger lead over known AMD K8 scores than the lead on SPECint_rate2006. A 2 GHz Barcelona isn't going to double the score of a 3 GHz Opteron 2222 and that's a fairly reasonable statement.
      georgeou
    • He also asks us to disregard existing benchmarks

      He'll probably do the same for the results of AnandTech's SPECjbb comparison (http://www.anandtech.com/IT/showdoc.aspx?i=3091&p=5) since they also don't match his estimate.

      Those runs are "unofficial" and he's waiting for vendor-provided numbers on SPEC.org. I wonder when he'll be back at calling those "certified".

      AnandTech's article mentions the problem with those numbers, which I pointed out for SPECCPU several times and George continues to ignore: The publishing rules only mandate the disclosure of the exact circumstances and configuration of the benchmark that lead to the submitted results. They do not enforce comparable results and it's difficult (in case of SPECjbb2005 often impossible) to draw conclusions from two separate scores, even if you carefully take the configuration into account.

      I'll say it again: The raw numbers on spec.org (and if it suits the vendor, the "best score vs. best score" comparison) are marketing numbers. If George wants to continue to focus on who's the best benchmarketeer in the arena, so be it.

      Oh, and regarding the "conspicuous" absence of those benchmarks from the launch slides: Nobody keeps Intel from publishing Barcelona numbers on www.spec.org. They did it for AMD's Turion, so why not for the Opteron?
      CFKane
      • Neither AnandTech or TechReport are claiming official SPEC results

        Neither AnandTech or TechReport are claiming official SPEC results. I'm going with the official published results on SPEC.org.
        georgeou
        • They're still claiming comparativeness

          "Neither AnandTech or TechReport are claiming official SPEC results. I'm going with the official published results on SPEC.org."

          How convenient, since there are no numbers for AMD's Barcelona, you can write whatever you wish. SPEC does not claim that their "official" numbers are directly comparable, yet you choose to prefer them over the "unofficial" ones that do.
          CFKane
          • When published spec scores are available, that's what people go with

            When published spec scores are available, that's what people go with. Surely you're not suggesting we use TechReport and AnandTech over published spec scores right? Oh wait, you're the guy who said this so I guess you would.
            http://talkback.zdnet.com/5208-10533-0.html?forumID=1&threadID=38675&messageID=708792&start=0

            Too bad for you normal people don't think that way.
            georgeou
          • You're contradicting yourself

            Where's your high claim of revealing the TRUTH in that? Is it in reiterating what people will go with anyway? Creating a picture that people will go with is the sole purpose of marketing. So whoever has the better marketing also has the better product? I didn't think the situation in the US was that bad...

            And thanks for linking to my other post, because it perfectly fits in here. I expect AMD to do whatever is best for their marketing. That doesn't imply that I'm siding with them. I expect the same of Intel and I find it very understandable that they're tuning their own compiler to give the best SPEC results possible for their CPUs - after all they're marketing both products over the same scores - a double strike. I do not understand the same behavior for a blogger, though. What's in it for you?
            CFKane
          • Same reply to you as always

            http://talkback.zdnet.com/5208-10533-0.html?forumID=1&threadID=38675&messageID=708813

            You say it's ok for AMD to be deceptive in the name of marketing and you say I'm being deceptive for reporting it. There's no way to reason with someone who engages in this sort of backwards logic.
            georgeou
          • And now you're ridiculing yourself

            "You say it's ok for AMD to be deceptive in the name of marketing and you say I'm being deceptive for reporting it."

            That's not what I said and you know it.

            "There's no way to reason with someone who engages in this sort of backwards logic."

            There's no way to reason with someone who refuses to understand what he's reading. Backwards logic is to ignore solid facts based on an insinuated bias.
            CFKane
  • Give your readers a break

    You really overdid it with the numbers this time. I'm used to reading technical documents, but still skipped entire paragraphs in this post. If you can't provide graphs to present your numbers, leave them out altogether. The casual reader doesn't want to be slain with hosts of numbers he has to consider back and forth to form a picture in his mind.
    CFKane
    • Don't worry about me

      You're free to speak for yourself and you're free to skip paragraphs or the entire article, but you don't need to worry about me.
      georgeou
      • George, your reply is strange.

        He wasn't worrying about you. He said that you overdid it with the numbers. That's all.
        nizuse
    • give George a break

      When George doesn't publish numbers, people complain that he needs to be more scientific and less subjective. When he puts in numbers, you don't like them because they're a pain to read.

      I don't intend to defend George just for fun, but I think this time he did a very good job. And I find it sad that no matter how hard he tries to do good research, some people will always complain.
      patibulo
      • why are some of you guys such assholes

        seriously.. it takes a lot of work to put an article like this together.

        Thanks for the info. These chip wars make things interesting and entertaining as well.
        pcguy777
    • His Crayon Broke.

      Last time I checked George wasn't capable of actually making graphs that related to source figures. As can be seen in the following example.

      http://blogs.zdnet.com/Ou/images/amd-price-drops.png

      George's inability to do high school graphs can be clearly demonstrated by their absence in this post when there is source data available. Therefore I am under the impression that having to produce a graph with a y-axis based on actual source data is beyond the scope of George's ability.

      Or more likely, he broke his graph crayon on his last post.
      Bozzer
      • why a graph?

        Why on Earth do you want a graph here? There is nothing to graph, except what has been graphed by AMD and others. Graphs here would just make the post even longer to read. Really, if a few numbers there bother you, then you are reading the wrong news.

        You are right: the graph from George you cite is pretty sad. But that has nothing to do with the fact that a graph here would not bring much, except cluttering and confusion.
        patibulo
        • As an example

          Look at the (updated) scaling paragraph on page 3. That would have worked much better in a graph, since you would understand at a glance what is being said and could easily derive additional information, which you need to calculate from the numbers given now.
          CFKane