APL and parallelism

Quite a number of otherwise respectable people seem to believe that Sun's throughput computing initiative will never produce a CPU appropriate to workstation or laptop uses. Their basic argument is that the first generation Niagara CPUs are weak on floating point and that all of them trade off relatively low megahertz for the ability to do lots of stuff in parallel.

The Niagara CPU, for example, is expected to run 32 concurrent threads on eight cores at about 1.4GHz - while using less than 64 watts at maximum sustained throughput on all cores.

This would, of course, make it a great choice for laptops and workstations - if the software made sufficiently effective use of the multi-thread execution capabilities that 1.4 x 32 (or 1.4 x 16 for "rejects" used in laptops) machines would yield a considerable improvement over the x86 and PowerPC competition.

Think about this a bit and you'll see that it boils down to one issue: are there really relatively common workstation tasks for which no significant parallelization can reasonably be achieved?

I've come to believe, in looking at this, that the answer is no: there aren't any of significance. In other words, 8 cores running at 1.4GHz should be equal to or greater than 1 core running at 11GHz - whether or not further multi-threading is applicable.
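The arithmetic behind that comparison can be sketched as follows. This is an illustration only: the function names are made up for this example, and the second function applies Amdahl's law, which is not part of the argument above but is the standard way to ask how much of that aggregate figure survives when only a fraction of a task can be run in parallel. The parallel fractions in the loop are arbitrary assumptions, not measurements.

```python
def aggregate_ghz(units, clock_ghz):
    """Naive aggregate clock: number of execution units times per-unit clock."""
    return units * clock_ghz


def amdahl_speedup(parallel_fraction, units):
    """Amdahl's law: speedup over a single unit when only
    `parallel_fraction` of the work can be spread across `units`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


if __name__ == "__main__":
    print("8 cores   x 1.4GHz:", aggregate_ghz(8, 1.4))
    print("32 threads x 1.4GHz:", aggregate_ghz(32, 1.4))
    # Assumed parallel fractions, chosen only to show the shape of the curve.
    for fraction in (1.0, 0.95, 0.5):
        effective = 1.4 * amdahl_speedup(fraction, 8)
        print(f"parallel fraction {fraction}: ~{effective:.1f}GHz equivalent")
```

With a parallel fraction of 1.0 the eight cores behave like the 11GHz single core; the claim that no common task falls significantly short of that is exactly what is at issue here.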

The obvious counterexample, of course, is compilation with its total insistence on sequence - but I think that's more a joint artifact of how today's compilers are developed and how people think about programming than anything intrinsic to the problem.

Consider A Programming Language (APL) as a counter-example. APL was developed by Ken Iverson for use as a computer-interpretable mathematical symbology. Learn to think in APL and you'll find yourself naturally using a lot of non-scalar arrays - something most people will instantly realize lends itself nicely to parallel execution.
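To make the array point concrete, here is a minimal sketch in Python (not APL) of a whole-array operation, the equivalent of APL's A + B, split into one chunk per worker. The helper names chunk_bounds and parallel_add are hypothetical, and the default of eight workers is an assumption that simply mirrors the eight cores discussed above. Note that in standard CPython, threads will not speed up pure-Python arithmetic like this; the sketch shows only how the work decomposes, not a performance result.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(length, parts):
    """Split range(length) into at most `parts` contiguous (start, stop) slices."""
    size = max(1, -(-length // parts))  # ceiling division, never zero
    return [(start, min(start + size, length)) for start in range(0, length, size)]


def parallel_add(a, b, workers=8):
    """Element-wise a + b, computed one independent chunk per task."""
    if len(a) != len(b):
        raise ValueError("length mismatch")

    def add_slice(bounds):
        start, stop = bounds
        return [x + y for x, y in zip(a[start:stop], b[start:stop])]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the pieces rejoin in order.
        pieces = pool.map(add_slice, chunk_bounds(len(a), workers))
        return [value for piece in pieces for value in piece]


if __name__ == "__main__":
    print(parallel_add([1, 2, 3, 4], [10, 20, 30, 40], workers=2))
```

No chunk depends on any other, which is why nothing about the programmer's expression of the problem has to change for the work to be spread across cores.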

More subtly, however, an APL interpreter doesn't actually have to do interpretation in the literal sense of compile-on-the-fly, execute, and return. Instead it can be implemented as a shared memory application functioning like any other Unix daemon to accept input, process it, and return the result. Look at it in that light and it should be immediately obvious that even non-looping APL programs with scalar or character arguments could make very effective use of Niagara-style on-chip SMP - effectively making an eight-way, 32-thread machine feel like a 44GHz UltraSPARC II to a user.
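One way to picture the daemon model is a resident pool of workers that stays alive between requests and pulls work from a shared queue. The sketch below is a toy under stated assumptions, not a description of any real APL implementation: the class name ExpressionDaemon, the helper evaluate_all, and the choice of four workers are all invented for illustration, and Python threads stand in for hardware threads.

```python
import queue
import threading


class ExpressionDaemon:
    """Hypothetical resident service: workers stay alive and serve queued requests."""

    def __init__(self, workers=4):
        self._requests = queue.Queue()
        self._threads = [
            threading.Thread(target=self._serve, daemon=True) for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def _serve(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            func, args, reply = item
            try:
                reply.put(("ok", func(*args)))
            except Exception as exc:  # hand the failure back to the caller
                reply.put(("error", exc))

    def submit(self, func, *args):
        """Queue one request; returns a one-slot queue that will hold the reply."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, args, reply))
        return reply

    def shutdown(self):
        for _ in self._threads:
            self._requests.put(None)
        for thread in self._threads:
            thread.join()


def evaluate_all(daemon, requests):
    """Submit every request before waiting, so independent ones can overlap."""
    replies = [daemon.submit(func, *args) for func, args in requests]
    results = []
    for reply in replies:
        status, value = reply.get()
        if status == "error":
            raise value
        results.append(value)
    return results


if __name__ == "__main__":
    daemon = ExpressionDaemon(workers=4)
    print(evaluate_all(daemon, [(max, (1, 2)), (len, ("apl",)), (str.upper, ("apl",))]))
    daemon.shutdown()
```

The point of the design is that the caller never starts or stops anything: independent pieces of a program, even scalar ones, are handed to whichever worker is free and the results are collected in order.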

If that's right, and I'm pretty sure it is, then there's no fundamental reason to believe that a C pre-processor couldn't be developed to convert traditional C and C++ code to a chunked format appropriate to an APL-like "C-Daemon" - thereby demonstrating that 32 x 8 x 1.4 really is always a lot more than ten times 3-point-something and incidentally establishing a whole new programming paradigm.