Processors are general-purpose devices. They have to be able to handle hundreds of different instructions, and that means that each time they execute one they have to identify what it is and make many decisions about which bits of themselves to use in servicing it. That's slow and tricky to speed up - and has exercised processor designers for decades.
A report from the University of California at Riverside describes a rather intriguing development. When a processor spots a set of instructions being repeatedly executed, it creates a temporary hardware design specifically to perform the same task. This design is loaded into a reconfigurable set of logic gates, which run at full hardware speed - no need to execute the software instruction by instruction.
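How does a chip decide which instruction sequences deserve their own hardware? The report doesn't spell out the mechanism here, but a common runtime-profiling trick is to count taken backward branches: each one marks the end of a loop iteration, so its target address identifies a loop head. A minimal sketch, with the threshold and trace format entirely my own invention:

```python
# Hypothetical sketch of hot-loop detection via backward-branch counting.
# Not the paper's actual mechanism - just one well-known way a runtime
# profiler can spot loops worth offloading to reconfigurable logic.

from collections import Counter

HOT_THRESHOLD = 1000  # iterations before a loop is deemed worth synthesising

def find_hot_loops(branch_trace, threshold=HOT_THRESHOLD):
    """branch_trace: iterable of (branch_pc, target_pc) for taken branches.
    Returns the set of loop-head addresses reached at least `threshold`
    times - candidates for translation into hardware."""
    counts = Counter()
    for branch_pc, target_pc in branch_trace:
        if target_pc < branch_pc:          # backward branch => loop iteration
            counts[target_pc] += 1
    return {head for head, n in counts.items() if n >= threshold}

# Toy trace: a loop at 0x100 runs 5000 times, one at 0x200 only 10 times.
trace = [(0x140, 0x100)] * 5000 + [(0x230, 0x200)] * 10
print(sorted(hex(h) for h in find_hot_loops(trace)))  # ['0x100']
```

The appeal of this scheme is that it costs almost nothing at run time - a small table of counters - which is why variants of it turn up in trace caches and JIT compilers too.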
The reported speed-ups are remarkable - ten, a hundred, even a thousand times improvement - and the press release predicts great things for this. The inventor, Professor Frank Vahid, says that this will be applicable to all processors.
Now, this is interesting - but don't get too excited just yet. Using FPGAs (Field Programmable Gate Arrays - great seas of logic that can be reconfigured into different hardware designs on the fly) is a well-respected and effective method of speeding up computing, and integrating this with the heart of a processor raises a number of intriguing possibilities. Lots of issues about bandwidth, memory coherence and bus design are either solved or simplified.
But FPGA computing has one big drawback: it is hellishly difficult to extend to the general case. The whole field is being held back not by the cost of the hardware (which is large), nor by any lack of willingness to make it work, but by evolution's thoughtless failure to create enough extremely brainy people. If you have a particular supercomputing task, a year or so, and some of the few computational FPGA wizards in captivity, you can create a machine that will outperform anything on the planet. Otherwise, you're stuffed - the business of designing hardware to implement algorithms more efficiently than a general-purpose processor is very hard. There are compilers that can do the donkey work, but the results are frequently disappointing - and you'll still need a megabrain-year to fine-tune what you get.
Now, try taking that process and automating it - not for particular tasks, but for the general task of spotting loops and reconstituting them in an FPGA. That's much, much harder. Many processor designs have incorporated similar ideas, recoding instruction streams for more efficient execution, and this can work - but it tends to be a simple-minded translation. Inline, on-the-fly code analysis is an order of magnitude harder, and that's before you get to the bit where you have to invent the logic that can replicate it, only faster.
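To see why simple-minded translation gains so little, consider the core question the analysis has to answer: can the loop's iterations run side by side as parallel logic, or does each one feed the next? A toy illustration (my own example, not from the article), with loop bodies reduced to (destination, sources) register tuples:

```python
# Illustrative sketch: a crude loop-carried-dependence check. If the
# body reads a register that a *previous* iteration wrote (other than
# the loop counter), the iterations are chained and can't simply be
# replicated as parallel hardware. Real analysis must also handle
# memory aliasing, control flow, etc. - this is the easy part.

def parallelisable(body, induction_var):
    """body: list of (dest, sources) register tuples for one iteration."""
    dests = {d for d, _ in body}
    written = set()                        # written so far this iteration
    for dest, sources in body:
        for src in sources:
            if src in dests and src not in written and src != induction_var:
                return False               # value comes from a prior iteration
        written.add(dest)
    return True

# a[i] = b[i] * c[i]  -> independent iterations, maps to wide parallel logic
vec_mul = [("t", ("b", "c")), ("a", ("t",))]
# s = s + b[i]        -> accumulator chains every iteration to the last
reduce_sum = [("s", ("s", "b"))]

print(parallelisable(vec_mul, "i"))     # True
print(parallelisable(reduce_sum, "i"))  # False
```

The vector multiply can become a wide array of multipliers; the accumulation, done naively, stays a serial chain - and the big FPGA wins come from recognising when a chained loop can be restructured (say, into a reduction tree), which is exactly the sort of insight that's hard to automate.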
Prof Vahid is no slouch: he's got all the big chip companies looking to license his ideas, and he's presented a paper (behind a paywall) about how it works that got mentioned in dispatches from a hardware/software design synthesis conference. But I'll be surprised if the benefits prove significant for existing software, if the technique doesn't end up needing compilers and other tools that can generate instruction sequences that are easy to recognise - or that carry explicit hints to the hardware - and if testing and verifying the generated designs doesn't turn grown men into mewling kittens.
Worth keeping an eye on, though.