Mozilla engineer Rob Sayre set about benchmarking the Firefox 4.0 beta against a selection of other browsers and found that IE9 was about ten times faster than the others on one particular test, SunSpider's math-cordic test: IE9 completed it in around 1ms, while Chrome and Opera took around 10ms.
Curiosity piqued, Sayre did some further investigating:
One last issue that can crop up has to do with over-specialization for a specific test. While I was running the SunSpider tests above, I noticed that IE9 got a score that was at least 10x faster than every other browser on SunSpider's math-cordic test. That would be an impressive result, but it doesn't seem to hold up in the presence of minor variations. I made a few variations on the test: one with an extra "true;" statement (diff), and one with a "return;" statement (diff). You can run those two tests along with the original math-cordic.js file here.
All three tests should return approximately the same timing results, so a result like the one pictured above would indicate a problem of some sort.
This effect shows up nicely in the raw results of the benchmark tests I carried out last week. Notice how consistent the math-cordic results for IE9 are.
So what could be behind this? Three possibilities spring to mind:
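To make the kind of variation Sayre describes concrete, here is a rough sketch. The function bodies are a simplified stand-in, not the actual math-cordic source, and the names (`cordicOriginal`, `cordicWithTrue`, `cordicWithReturn`, `timeIt`), the iteration count, and the exact placement of the extra statements are all assumptions made for illustration.

```javascript
// Simplified stand-in for the math-cordic kernel; NOT the real SunSpider source.
// Only local variables are updated and nothing is returned, so the body has
// no observable effect.
function cordicOriginal() {
  var x = 39797, y = 0;                 // arbitrary fixed-point start values
  for (var step = 0; step < 12; step++) {
    var newX = x - (y >> step);
    y = (x >> step) + y;
    x = newX;
  }
}

// Variant 1: same body plus a no-op expression statement.
function cordicWithTrue() {
  var x = 39797, y = 0;
  for (var step = 0; step < 12; step++) {
    var newX = x - (y >> step);
    y = (x >> step) + y;
    x = newX;
  }
  true;                                 // assumed placement of the extra statement
}

// Variant 2: same body plus an explicit bare return.
function cordicWithReturn() {
  var x = 39797, y = 0;
  for (var step = 0; step < 12; step++) {
    var newX = x - (y >> step);
    y = (x >> step) + y;
    x = newX;
  }
  return;                               // assumed placement of the extra statement
}

// Call fn `runs` times and return the elapsed wall-clock time in milliseconds.
function timeIt(fn, runs) {
  var start = Date.now();
  for (var i = 0; i < runs; i++) {
    fn();
  }
  return Date.now() - start;
}

var RUNS = 25000;                       // hypothetical iteration count
var timings = {
  original: timeIt(cordicOriginal, RUNS),
  withTrue: timeIt(cordicWithTrue, RUNS),
  withReturn: timeIt(cordicWithReturn, RUNS)
};
console.log(timings);
```

All three functions do the same work and return `undefined`, so you'd expect the three timings to land in the same ballpark on any given engine. A large gap between the first and the other two is the oddity being discussed.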
- Deliberate optimization for the SunSpider test
- Accidental optimization for the SunSpider test
Can we put this down to cheating, as suggested by Digitizor (which was later picked up on by Slashdot)? Well, we don't have access to the code, and without it it's impossible to be sure. The effect of this one aberration is quite small: changing the math-cordic value from 1ms to 10ms in the tests I ran only raises the SunSpider total to 403.7ms per run, up from 394.7ms. But this is just one result out of many. It depends on whether there are other, more subtle, optimizations in there.
I'm not ready to call this a cheat yet, but it's certainly fishy. But even if there is some degree of optimization, I'm more likely to believe that it's accidental rather than deliberate. The consistency of the result on IE9 is odd in that across multiple machines I get a consistent score of 1ms, which is not something I'd expect to see. Combine that with the fact that the changes Sayre made to the benchmark code should "functionally" make no difference, and the wildly different results are again very odd and not something I'd expect to see.
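The arithmetic behind that adjustment is simple enough to spell out. This is just a worked version of the figures quoted above; the variable names are made up for illustration.

```javascript
// Figures taken from the text; variable names are illustrative only.
var reportedTotalMs = 394.7;   // IE9 SunSpider total as measured
var observedCordicMs = 1;      // IE9's math-cordic time
var typicalCordicMs = 10;      // roughly what Chrome and Opera took

// Swap the 1ms result for a 10ms one and see what happens to the total.
var adjustedTotalMs = reportedTotalMs - observedCordicMs + typicalCordicMs;
var increasePercent = (adjustedTotalMs - reportedTotalMs) / reportedTotalMs * 100;

console.log(adjustedTotalMs.toFixed(1) + "ms");   // prints "403.7ms"
console.log(increasePercent.toFixed(1) + "%");    // prints "2.3%"
```

In other words, the anomaly is worth about 9ms, or a little over 2% of the total score.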
Sayre has submitted this as a "bug" to Microsoft.
The takeaway: Benchmarks are odd, fickle things. Put too much faith in the numbers and you lose sight of the wood for the trees.
[UPDATE: It seems some commentators find it hard to see the wood for the trees and wonder why anyone would be suspicious of this test result. I understand the need for nerd-rage diplomacy when dealing with anything Microsoft (pro or anti) but this is about the facts at hand.
Let me sum them up for you:
- Result is consistent across multiple runs (1ms, no variation)
- Result is consistent across multiple platforms - I've run the test on several systems and get a 1ms +/- 0.0% with each and every run.
So it could be a bug, or could be a feature. Either way it's an inconsistency in the code that needs attention just in case it has implications elsewhere. There are many people who make serious business decisions based on benchmark results.]
Ed note: Changed headline of post.
Microsoft had the following to say about the issue:
The company said it will follow up with more on its IE blog.
[UPDATE: Microsoft has now attributed this anomaly to dead code elimination, but this explanation still doesn't account for the fact that the three functions tested by Sayre (and I've run them myself) all include the same amount of dead code ... the addition of true and return statements to the function doesn't in any way change the amount of dead code the JScript engine has to process.
Why the JS engine gives different results for what is functionally the same code with the same amount of dead code is still very interesting and worthy of discussion. The fact that the piece linked to on /. made wild unsupported accusations doesn't change the fact that there's something interesting going on here. Like I said, it's highly unlikely to be cheating but if dead code elimination can achieve such good results for the code used for the math-cordic test, this performance should be translated to the variants using the true and return statements too.]
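For anyone unsure what "dead code" means in this context, here is a minimal sketch. It is a generic illustration of the concept, not a description of how the JScript engine actually decides what to eliminate, and the function names are hypothetical.

```javascript
// Does real work, but the result is never returned or stored anywhere.
// Nothing outside the function can tell whether the loop ran, so an
// optimizer is allowed to treat the whole body as dead code.
function busyButDead() {
  var acc = 0;
  for (var i = 0; i < 1000; i++) {
    acc = (acc + (i >> 1)) | 0;
  }
}

// Observably identical to busyButDead(): same return value, no side effects.
function empty() {}

// Same loop, but the result escapes through the return value, so the
// work cannot simply be thrown away if the caller uses it.
function busyAndLive() {
  var acc = 0;
  for (var i = 0; i < 1000; i++) {
    acc = (acc + (i >> 1)) | 0;
  }
  return acc;
}

console.log(busyButDead() === empty());   // both calls return undefined
console.log(busyAndLive());               // the computed value is observable
```

Adding a bare `true;` or `return;` to the end of `busyButDead` would leave it exactly as dead as before, which is why the differing timings for Sayre's variants are puzzling.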
[UPDATE 2: I've been experimenting with a SunSpider deadcode fork (https://github.com/cheald/SunSpider-deadcode) and the results show that IE9 does indeed carry out dead code elimination, but that it only seems to kick in under certain circumstances.
There's a lot of good discussion here on different code thrown at IE9 being handled differently ... http://apps.ycombinator.com/item?id=1913368]