For most people, Intel's flagship Itanium processor remains a mystery. In the five years since the architecture was launched, what was promised as a general-purpose server (even workstation) chip has retreated steadily into a high-end niche. It does well there, though arguably not well enough to justify Intel's research and development budget, and it is absent from most people's daily experience. The core question of how well it actually works is buried beneath salesmanship, positioning and industry FUD, not all of it Intel's.
But now some light has been shed on how the chip actually works. In "Itanium — A System Implementor's Tale", a paper to be presented at the Usenix 05 conference next week, four researchers from the University of New South Wales and one from HP's Palo Alto labs describe their experience of making Itanium fly. They report favourably on some well-known features of Itanium's design, most notably that it excels at floating-point number crunching, but then explore the reasons why it doesn't do nearly so well on bread-and-butter computing.
One of the major problems they highlight is the poor quality of compiler code generation. Because of Itanium's EPIC (Explicitly Parallel Instruction Computing) design, the chip relies heavily on the code it runs being efficiently arranged. This means that the compiler, rather than the hardware, has to work out the most effective way to order the instructions it spits out, so that they flow freely through the chip without competing for internal resources or waiting for results from each other. Furthermore, instructions have to be bundled together in groups that are given to the processor in one go, and the relationship between instructions within a group, and with those in subsequent groups, is critical for efficient work.
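To give a feel for the grouping problem, here is a minimal sketch in Python. It is a toy model and not Itanium's real rules: the `Instr` type, the register names and the `split_into_groups` function are all invented for illustration, and the only constraint modelled is that an instruction may not share a group with an earlier instruction whose result register it reads or overwrites.

```python
from typing import List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record (not a real Itanium encoding)."""
    name: str
    dest: str                # register this instruction writes
    srcs: Tuple[str, ...]    # registers this instruction reads


def split_into_groups(program: List[Instr]) -> List[List[str]]:
    """Greedy, in-order split into groups of mutually independent instructions.

    A new group starts whenever an instruction reads or overwrites a
    register written earlier in the current group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    written: Set[str] = set()   # registers written so far in the current group
    for ins in program:
        conflict = ins.dest in written or any(s in written for s in ins.srcs)
        if current and conflict:
            groups.append(current)
            current, written = [], set()
        current.append(ins.name)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


# Illustrative program: two loads feed an add, whose result feeds a multiply.
program = [
    Instr("load_a", "r1", ("r10",)),
    Instr("load_b", "r2", ("r11",)),
    Instr("add",    "r3", ("r1", "r2")),
    Instr("load_c", "r4", ("r12",)),
    Instr("mul",    "r5", ("r3", "r4")),
]
groups = split_into_groups(program)
print(groups)
```

Even in this simplified picture, the chain of dependencies forces the five instructions into three groups. A real compiler has to do the same job while also respecting the chip's limited execution units and bundle formats, which this sketch ignores.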
None of this makes it easy to generate good code, and some of Itanium's features make it harder still. One major issue is instruction latency: the number of clock cycles that must separate two instructions when one produces a result and the other consumes it. If the code asks for the result too soon, the chip stalls, holding up processing for the current and following instruction groups, which can slow things down dramatically.
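The cost of ignoring latency can be illustrated with a small simulation. This is a sketch under loud assumptions: it models a simple in-order machine that issues one instruction per cycle, and the latencies, register names and the `count_stall_cycles` function are made up for the example rather than taken from Itanium documentation or the paper.

```python
from typing import Dict, List, Tuple

# (name, destination register, source registers, latency in cycles).
# All values are hypothetical, chosen only to illustrate the effect.
Instr = Tuple[str, str, Tuple[str, ...], int]


def count_stall_cycles(program: List[Instr]) -> int:
    """Count cycles lost waiting for results on a toy in-order machine."""
    ready_at: Dict[str, int] = {}   # cycle at which each register's value is ready
    cycle = 0
    stalls = 0
    for _name, dest, srcs, latency in program:
        # Registers never written in this program are treated as ready at cycle 0.
        earliest = max([ready_at.get(s, 0) for s in srcs], default=0)
        if earliest > cycle:
            stalls += earliest - cycle   # the machine waits for the result
            cycle = earliest
        ready_at[dest] = cycle + latency
        cycle += 1
    return stalls


# The add consumes the load's result immediately.
naive = [
    ("load", "r1", ("r10",), 3),
    ("add",  "r2", ("r1", "r1"), 1),
    ("sub",  "r3", ("r11", "r12"), 1),
    ("xor",  "r4", ("r13", "r14"), 1),
]

# Same work, with independent instructions moved between the load and the add.
scheduled = [naive[0], naive[2], naive[3], naive[1]]

naive_stalls = count_stall_cycles(naive)
scheduled_stalls = count_stall_cycles(scheduled)
print(naive_stalls, scheduled_stalls)
```

In this toy case the naive ordering loses two cycles and the reordered one loses none, even though both do identical work. Finding such reorderings automatically, across a whole program, is the job that falls to the compiler.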