Pentium 4: New chip, old problems

Expecting the Pentium 4 to be everything you want it to be? Prepare for a few disappointments

With the Pentium 4, Intel has made the biggest update to its IA-32 processor line in many years. In some ways, this is good -- the PIII design is rapidly running out of steam -- but Intel's slavish devotion to ever-increasing clock speeds has resulted in some trade-offs that will disappoint the raw speed addicts.

Much of the design work has gone into the pipeline, the part of the chip that takes each instruction and guides it through the business of being executed. Early processors had no pipeline: an instruction was fetched from memory, then executed, then another was fetched, then executed, and so on. The simplest pipeline arranges things so that the second instruction is fetched at the same time as the first one is executed; while each instruction takes the same time to run, the processor is in effect dealing with two at once. The P5 architecture has a five-stage pipeline, the P6 -- as used in the Pentium III -- has ten stages and the Pentium 4 has twenty.

The longer the pipeline, the faster it can be clocked, as each stage is less complex and can be driven faster than the stages of a shorter pipeline. Imagine an ocean-going yacht capable of carrying twenty-five people at once: those same twenty-five people in twenty-five speedboats will go faster. So while the first cut of the Pentium 4 at 1.4GHz isn't that much of a step up from the top PIII, it has a lot more room for improvement.

But long pipelines have lots of problems. One is dependency -- if instruction number one depends on a result from instruction number twenty before it can finish, then nothing's going to happen until twenty's done and you might as well not have a pipeline at all. Another is branching: if you load in twenty instructions but a calculation in instruction number one means that the program wants to run a completely different set of instructions, you have to throw everything away and start again.
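The trade-off can be put into rough numbers with a toy model. The sketch below is not how any real chip is measured: it assumes a fixed amount of work per instruction (the hypothetical `LOGIC_DELAY`), assumes that work divides evenly across the pipeline stages, and assumes that every flush costs a full refill. The function names are made up for illustration.

```python
LOGIC_DELAY = 20.0  # assumed total work per instruction, in arbitrary time units


def cycle_time(depth):
    # Assumption: the work splits evenly, so more stages means a shorter clock tick.
    return LOGIC_DELAY / depth


def serial_time(n, depth):
    # Total dependency: each instruction waits for the previous one to clear every stage.
    return n * depth * cycle_time(depth)


def pipelined_time(n, depth, flushes=0):
    # Fill the pipeline once, then finish one instruction per tick.
    # Each flush throws the contents away and costs (depth - 1) ticks to refill.
    if n == 0:
        return 0.0
    ticks = depth + (n - 1) + flushes * (depth - 1)
    return ticks * cycle_time(depth)


print(pipelined_time(100, 10))              # 218.0
print(pipelined_time(100, 20))              # 119.0
print(serial_time(100, 10))                 # 2000.0
print(serial_time(100, 20))                 # 2000.0
print(pipelined_time(100, 10, flushes=50))  # 1118.0
print(pipelined_time(100, 20, flushes=50))  # 1069.0
```

Under these assumptions the twenty-stage design gets through 100 instructions in a little over half the time of the ten-stage one. With total dependency, depth makes no difference at all, and with a flush every other instruction the deeper pipeline's lead shrinks to a few per cent.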
These are worst-case examples, but lesser versions of them occur a lot -- and a longer pipeline is more vulnerable than a short one.

Intel has gone a long way to circumvent these problems. Old-style x86 instructions aren't that easy to pipeline, but the pipeline never touches them: by the time they get there, they've been translated into the Pentium 4's internal instruction format. The chip also caches the instructions in this internal format, and does so along the path it thinks the program's going to take -- when it gets it right, most of the work's been done well before the instructions have to be executed. The execution engine runs at twice the speed of the pipeline and there are two basic logic units instead of one, all of which means that the number of times that anything has to wait for something else is much reduced. Also, to keep everything fed, the memory bus that couples this ravenous code-eating monster to the outside world has been upped to 400MHz from the old 100/133MHz speed.

Which is fine, but pretty helpless in the face of the main problem: latency. While the new RDRAM memory controller can keep up with the faster speeds as long as it's running in a straight line, it has exactly the same delays as before when the processor asks for data that's not in sequence. As long as you keep accessing memory in order, things fly -- as soon as you have to skip to a different part of memory, everything crashes to a halt while the memory gets around to it. And the way that Windows is designed, this sort of request is very common indeed -- any application that makes a lot of system calls will make a lot of jumps to other parts of memory.

At this point, most of the improvements in the Pentium 4 are useless. Caches have to be emptied and refilled, the pipeline starts again from scratch, predictive buffers predict the wrong thing. The limiting factor is memory latency -- how long it takes to get raw data out of a new place and into the processor.
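A back-of-the-envelope model shows why a faster bus does so little for scattered access. Every number below is an illustrative assumption, not a measurement of any real memory system: the bandwidths of 1 and 4 bytes per nanosecond stand in for the old and new buses, the 100ns latency is hypothetical, and `fetch_time_ns` is a made-up helper.

```python
def fetch_time_ns(n_bytes, n_jumps, bytes_per_ns, latency_ns=100.0):
    # Time to stream the data, plus a fixed delay for every jump to a new place in memory.
    return n_bytes / bytes_per_ns + n_jumps * latency_ns


BLOCK = 65536  # 64KB of data to fetch

# Reading in a straight line: one jump to the start, then pure streaming.
print(fetch_time_ns(BLOCK, 1, bytes_per_ns=1.0))     # 65636.0
print(fetch_time_ns(BLOCK, 1, bytes_per_ns=4.0))     # 16484.0

# Skipping about: a jump for every 64 bytes read.
print(fetch_time_ns(BLOCK, 1024, bytes_per_ns=1.0))  # 167936.0
print(fetch_time_ns(BLOCK, 1024, bytes_per_ns=4.0))  # 118784.0
```

With these assumed figures, quadrupling the bus speed makes the straight-line read nearly four times faster, but the scattered read only about 1.4 times faster, because the latency term does not shrink.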
With RDRAM, memory latency also limited the top-end PIII systems, and the result is that on Windows software, a 1.4GHz Pentium 4 won't be that much faster than a 1GHz PIII. If you have other software, if you recompile to use the Pentium 4's new SIMD integer and floating-point maths/multimedia instructions, if you make executables that are sympathetic to the Pentium 4's architecture, or if you use non-RDRAM memory that is both fast and low-latency, then the Pentium 4 system will knock the socks off its predecessors. Quite possibly including the IA-64 range, which is an intriguing possibility. But the first crop of Windows-running, Rambus-bound Pentium 4s will satisfy none of these requirements.

Thanks to Sophie Wilson of Element 14 for help in research.

Intel's new Pentium 4 "Willamette" processor (Willy for short) will become public news: it's really not worth buying. At a clock speed of 1.5GHz, Guy Kewney says, it's barely faster than a Pentium 3 at 1GHz. Intel, in short, has a little Willy!