Transactional memory and Solaris

When Sun gets its "Rock" CMT/SMP machines out the door, Solaris developers will be able to greatly reduce lock use in the kernel - producing systems that scale like grids but retain SMP efficiencies.

Sun's forthcoming "Rock" chip has hardware support for two features not found in other widely used CPUs: thread scouting and transactional memory.

The hardware scout component isn't a technology I understand - but its consequences are simple and obvious: it eliminates the effect of most wait states to significantly speed single threads.

As far as I know, hardware scout has relatively few direct implications for Solaris beyond compiler optimization opportunities, because the system is already highly responsive to cache changes and capable of dealing with fairly arbitrary amounts of memory.

Transactional memory is similar in that the chip-level technology is more than a bit opaque to non-specialists, but quite different in the complexity and obviousness of its consequences.

Here's how Rock's primary designers, Marc Tremblay and Shailender Chaudhry, described transactional memory at the recent ISSCC:

Transactional memory is the ability to perform a set of instructions atomically (a transaction). New instructions and microarchitectural structures enable the execution of transactions without expensive atomic instructions. This enables multiple threads to enter a critical section simultaneously and allows programmers to write applications based on transactions as opposed to complex locks.

Both Linux and Solaris use a lot of locks - Solaris, in particular, relies on very fine grained locking to achieve nearly linear performance scaling on the SPARC processors it's designed for. (In fact, Solaris locks were a critical design consideration for the original UltraSPARCs, with three instructions added to the V9 specification just to minimize lock entry overheads.)

To see how important locking is on Solaris 10 try something as simple as:

rt % lockstat -A echo 'hello world' |& grep even

Which, on an idle Sun 150 produces:

Adaptive mutex hold: 36920 events in 0.727 seconds (50807 events/sec)
Spin lock hold: 1413 events in 0.727 seconds (1944 events/sec)
R/W writer hold: 852 events in 0.727 seconds (1172 events/sec)
R/W reader hold: 2408 events in 0.727 seconds (3314 events/sec)

Look at the totals on a busy SMP system - say, an overloaded Sun 6900 walking an Oracle transaction database with too many users - and you'll see lock numbers easily hitting millions per second. Right now each one of those is handled by a C routine queuing up the lock access, an assembler routine querying lock status, and a C handler if that query fails (i.e. the resource is in use).

The most obvious implication here is that Rock and its successors will let Solaris kernel developers make most of these lock operations go away. For applications that will initially mean simple re-compiles to take advantage of new libraries, but in the longer term it should spark new designs eliminating many of the cycle-absorbing complexities of present day multi-threading.

In my opinion, however, three longer term implications are far more interesting. First, it will probably resurrect the C/Fortran pair as widely, but oppositely, preferred programming languages.

Second, Rock should mean that the next Solaris generation will scale the way grids do - but without the computational inefficiencies, power use, space requirements, and software management overheads that go with grids.

And, third, x86 is at least five years behind Sun in this stuff - so x86 OSes, including Windows, the BSDs, Solaris/x86, and Linux, will fall ever further behind on most cost/performance measures.