Thinking about multicore and software

Summary: Some comments by Donald Knuth are more balanced than others - and his views on programming issues associated with multi-core hardware exemplify this.

A fascinating Andrew Binstock interview with Donald Knuth ran recently, touching on some of the same topics we discuss here - programming methods, the value of open source, and the links between programming and hardware change.

I want to get back to some of his ideas about programming languages next week, but today I want to look at his views on how problems and programmers adapt to the emerging multi-core world.

Here's the key exchange on this:

Andrew: Vendors of multicore processors have expressed frustration at the difficulty of moving developers to this model. As a former professor, what thoughts do you have on this transition and how to make it happen? Is it a question of proper tools, such as better native support for concurrency in languages, or of execution frameworks? Or are there other solutions?

Donald: I don't want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they're trying to pass the blame for the future demise of Moore's Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won't be surprised at all if the whole multithreading idea turns out to be a flop, worse than the "Itanium" approach that was supposed to be so terrific - until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I've written well over a thousand programs, many of which have substantial size. I can't think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.[1]

How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I'm wrong.

I know that important applications for parallelism exist - rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

Even if I knew enough about such methods to write about them in TAOCP, my time would be largely wasted, because soon there would be little reason for anybody to read those parts. (Similarly, when I prepare the third edition of Volume 3 I plan to rip out much of the material about how to sort on magnetic tapes. That stuff was once one of the hottest topics in the whole software field, but now it largely wastes paper when the book is printed.)

The machine I use today has dual processors. I get to use them both only when I'm running two independent jobs at the same time; that's nice, but it happens only a few minutes every week. If I had four processors, or eight, or more, I still wouldn't be any better off, considering the kind of work I do - even though I'm using my computer almost every day during most of the day. So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to make multicores speed up my kind of work; I think it's a pipe dream. (No - that's the wrong metaphor! "Pipelines" actually work for me, but threads don't. Maybe the word I want is "bubble.")

From the opposite point of view, I do grant that web browsing probably will get better with multicores. I've been talking about my technical work, however, not recreation. I also admit that I haven't got many bright ideas about what I wish hardware designers would provide instead of multicores, now that they've begun to hit a wall with respect to sequential computation. (But my MMIX design contains several ideas that would substantially improve the current performance of the kinds of programs that concern me most - at the cost of incompatibility with legacy x86 programs.)

Notice that he's talking about personal use here - he does most of his work on a dual-core x86 laptop running Linux (Ubuntu) - and in that context he's obviously right: he'd be better off if it were possible to trade that second core for even 25% fewer wait states on the other one.

What's important to note, however, is that his comments only partially apply to Sun's CMT architecture and don't apply at all to environments in which many users share the same machine for broadly similar tasks.

The issue with personal tasks is simple: if the user has to wait for the computer, then the system is too slow - and because there are many tasks which we think of as inherently sequential and for which we don't know how to write effective parallel processing code, many of today's multi-core computers effectively deliver only single core performance and are therefore slower than what we're paying for.

In Knuth's case a TeX compile for a long chapter uses one processor, and that processor is almost completely memory bound for most of the time - in fact a 2.4GHz Intel Centrino dual core running Linux will do the job on one core that spends upwards of 80% of its time waiting for memory.

In contrast, a 2GHz Sun "Rock" chip with hardware scout enabled and not much else to do will complete the same task at very nearly clock speed - meaning in about one fourth the Centrino's elapsed time and entirely without changing a line of code.
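
As a rough check on that "one fourth" figure, here is a back-of-envelope sketch in Python. It assumes, purely for illustration, that both chips need the same number of useful cycles for the job, that the Centrino core stalls 80% of the time, and that the Rock core stalls not at all; the function name and the zero-stall figure are assumptions made for this sketch, not measurements.

```python
def elapsed_ratio(clock_a_ghz, stall_a, clock_b_ghz, stall_b):
    """Return (elapsed time on A) / (elapsed time on B).

    Assumes both machines need the same number of useful cycles,
    so elapsed time is inversely proportional to the useful
    cycles delivered per second: clock * (1 - stall fraction).
    """
    useful_a = clock_a_ghz * (1.0 - stall_a)
    useful_b = clock_b_ghz * (1.0 - stall_b)
    return useful_b / useful_a


# Assumed inputs: 2.4GHz core stalled 80% of the time versus
# a 2GHz core that (hypothetically) never stalls.
ratio = elapsed_ratio(2.4, 0.80, 2.0, 0.0)
print(f"Centrino elapsed time is about {ratio:.2f}x the Rock's")
```

Under those assumptions the ratio comes out a little over four, which is where "about one fourth the elapsed time" comes from; any stalls on the second machine, or differences in instruction counts between the two architectures, would move the number.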

The current T2 CPUs do not have hardware scout and the attempt to build equivalent software into the compilers has not delivered a working product; but the T2 will run the code, absent competing work load, at the maximum rate allowed by memory limitations - and because it has greater aggregate bandwidth but the same read time bottleneck, I'd expect it to take just about as long as the Centrino.

(Obviously this depends on the task sequence and scale. The Centrino will be faster for very small jobs, the T2 faster for extremely large ones - and both will do better the second time a job is run in quick succession than the first time. I don't have TeX here but you can, for example, cheat the clock a bit on a mid size text processing job by pre-loading the document:

% time nroff -me f > f1
7.0u 0.0s 0:07 95% 0+0k 0+0io 0pf+0w
% sleep 300;
% ps -e | wc
324 1391 9956
% wc f &; !t
wc f ; time nroff -me f > f1
53991 416489 2512805 f
6.0u 0.0s 0:06 93% 0+0k 0+0io 0pf+0w

Not, mind you, that this kind of thing is useful in the real world.)

Notice, however, that this comparison assumes that the code hasn't been modified for the T2. In reality, people who bought T2 workstations would probably modify the code to suit - more Knuth:

Andrew Binstock: You are one of the fathers of the open-source revolution, even if you aren't widely heralded as such. You previously have stated that you released TeX as open source because of the problem of proprietary implementations at the time, and to invite corrections to the code - both of which are key drivers for open-source projects today. Have you been surprised by the success of open source since that time?

Donald Knuth: The success of open source code is perhaps the only thing in the computer field that hasn't surprised me during the past several decades. But it still hasn't reached its full potential; I believe that open-source programs will begin to be completely dominant as the economy moves more and more from products towards services, and as more and more volunteers arise to improve the code.

For example, open-source code can produce thousands of binaries, tuned perfectly to the configurations of individual users, whereas commercial software usually will exist in only a few versions. A generic binary executable file must include things like inefficient "sync" instructions that are totally inappropriate for many installations; such wastage goes away when the source code is highly configurable. This should be a huge win for open source.

In other words, a key benefit of open source is better hardware adaptability as people delete the overhead associated with making the code run on the generic target in favor of customization for their specific machines. In the CMT case, for example, the standard SPARC/Solaris binary would work, but you would normally reduce the executable in both size and run-time by compiling it specifically for the UltraSPARC T2 - and then go on from there to add whatever specific optimizations make sense for your workload.

So here's a bet: for the kind of coding, compiling, and text processing workload Knuth ascribes to himself, a T2 based workstation would provide broadly similar performance to what he has if he makes no code changes, and outperform the Centrino by up to five times on key jobs if he does.

The multi-user case is simpler. Tim Bray's "Widefinder" project uses a simple sysadmin job - finding the top ten downloads according to an Apache log file - to illustrate the limitations of CMT style multi-threading, arguing, incorrectly I believe, that the apparently serial nature of jobs like this means something in terms of the difficulty of switching to a heavily multi-processing, multi-threading environment like Solaris/CMT.

The reason I don't think it does has nothing to do with the task itself - which makes his point nicely - but everything to do with how CMT hardware is used. Thus in his illustration having 64 concurrent threads available doesn't much affect job execution time - but in the real world it does because the CMT/coolthreads architecture allows a form of load once, run many times approach to sysadmin work just like this that other approaches to multi-core performance enhancement can't match.

Specifically, what happens in the real world of CMT usage is that the first such report provided to a key user spawns requests for many more - and when the count gets past two or three, the CMT/Perl strategy of loading the data once and then forking off many report threads to run in parallel gets the job package done significantly sooner than the traditional multicore strategies of serial or parallel execution. The more custom reports users demand from the same data, the bigger the CMT advantage gets.

(Notice that CMT/Solaris is much more tightly integrated for multi-threading than a Linux/Multi-core x86 machine: Knuth could load Bray's records into memory once, and then fire off two processing routines, but he'd still pay a memory access penalty on the second process, still face cache contention, and, if his is an Intel paired core package instead of an AMD dual core, face additional wait states for memory access and processor switching.)
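
The load once, run many pattern can be sketched in a few lines. This is a minimal illustration in Python rather than the Perl mentioned above; the sample log lines, the report functions, and names like load_records and run_reports are all hypothetical. Note also that in CPython the global interpreter lock means these threads share the loaded data without truly running CPU work in parallel, so the sketch shows the structure of the approach, not the CMT speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample in Apache common log format; a real job
# would read a (much larger) access log from disk instead.
SAMPLE_LOG = """\
10.0.0.1 - - [01/May/2008:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 512
10.0.0.2 - - [01/May/2008:10:00:01 +0000] "GET /b.html HTTP/1.1" 200 2048
10.0.0.1 - - [01/May/2008:10:00:02 +0000] "GET /a.html HTTP/1.1" 200 512
10.0.0.3 - - [01/May/2008:10:00:03 +0000] "GET /c.html HTTP/1.1" 404 0
"""


def load_records(text):
    """Parse the log once into (client, path, status, size) tuples."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10:
            continue  # skip lines that do not look like log entries
        size = int(parts[9]) if parts[9].isdigit() else 0
        records.append((parts[0], parts[6], int(parts[8]), size))
    return records


def top_paths(records, n=10):
    """The Widefinder-style report: most requested paths."""
    return Counter(path for _, path, _, _ in records).most_common(n)


def hits_by_client(records):
    """A second report over the same in-memory data."""
    return dict(Counter(client for client, _, _, _ in records))


def error_paths(records):
    """A third report: paths that returned a 4xx or 5xx status."""
    return sorted({path for _, path, status, _ in records if status >= 400})


REPORTS = {
    "top_paths": top_paths,
    "hits_by_client": hits_by_client,
    "errors": error_paths,
}


def run_reports(records, reports):
    """Run every report concurrently against one shared copy of the data."""
    with ThreadPoolExecutor(max_workers=len(reports)) as pool:
        futures = {name: pool.submit(fn, records) for name, fn in reports.items()}
        return {name: future.result() for name, future in futures.items()}


records = load_records(SAMPLE_LOG)      # the expensive step, done once
results = run_reports(records, REPORTS)  # each extra report reuses it
```

The point of the design is that adding a fourth or fortieth report only adds an entry to REPORTS; the parse cost is paid once no matter how many reports are requested.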

So what's the bottom line? I think Knuth is perfectly correct with respect to the case he describes - because you can't buy a T2 laptop and his work doesn't fit the x86 multi-core model very well; that his conclusions cannot be generalized to other CPU architectures and other workloads; and, that his comment about open source allowing people to create efficient, custom, binaries probably makes sense, has tremendous appeal, and ultimately contradicts the limitation of his comments about multi-core to the x86 world.

Topics: IT Employment, Hardware, Open Source, Software

  • Independent Jobs

    In the Enterprise, dual core makes sense for handling anti-virus, encryption, and other annoying security overhead functions. When you start going beyond two cores, I just don't get it.
  • RE: Thinking about multicore and software


    Provided a link to this blog on , which is a website for resources related to multicore processors.
    • Thanks! (NT)


    The MULTICS system was great because the OS and the hardware were tuned to work together. Sun's approach to virtualization - using a single instance of the OS in multiple "containers" vs. using multiple (separate) instances (everyone else), works well with the CMT type architecture. So the T2 is the continuation of Sun's philosophy - one of becoming less general use and more of "specialty" - where hardware margins can perpetuate a proprietary platform.

    The disastrous forking of UNIX has spilled into the forking of programming languages. Come out with new hardware, and "POOF" you need a new PL to take advantage of it (or a new compiler design like the Itanic). As with any fork, you take a critical mass of talent and split it into more sub-critical masses - so attaining that great *BLAST* of computing advancement becomes vanishingly small.

    What is happening here? Since it is very hard to lock people into proprietary hardware - how do you make those outrageous margins? You make proprietary PLs! M$'s impending "D" release is but another straw on the IT camel's back. Where do you find your talent pool? I can see it now - 1 day after "D"'s release, companies will be posting want ads for someone with 2 years of experience in it.

    This is why I founded PAPPL - People AGAINST the Proliferation of Programming Languages. Having a thousand ways to do something and then adding number 1001 - is not helping anybody. The PL scene today looks like the deregulated airline industry of the 80's - everyone and their brother started an airline. But in the real world, companies go bankrupt and disappear - unlike the PL world. Am I just tilting at windmills or do you also see a problem there?
    Roger Ramjet
    • People AGAINST the Proliferation of Programming Languages

      I have to at least chuckle at this, today there are fewer programming languages and variants than there were 25 years ago! It is the nature of things to try to do it better, with yet another programming language.

      There is a problem with many of the new programming languages, namely they do not learn from the experiences and papers of the older research. If I had to choose one classic example of this it would be exception handling: we still see big arguments on whether exceptions should be restartable, even though the definitive paper that showed that restartable exceptions do not work in the face of optimization was presented 30 years ago, and never has been successfully contradicted.
      • Fewer programming languages???

        Sorry chum, but that's like saying there are fewer words in the English language today than there were twenty years ago. It's just not true. Languages rarely die. 25 years ago you could count a couple dozen programming languages. Today there are almost that many that have been ".Net"ified and hundreds that haven't. When you start off your point on a clearly losing argument, the rest becomes moot.
        • You're right - and wrong too

          1) yes there are more.

          2) what he should have said (I think) is something like this:

          fewer languages are used for the bulk of programming today than 20 years ago..

          i.e. decision standardization has set in.
          • Actually, there were more

            First I stated programming languages and variants.

            There were a heck of a lot of one company programming languages back 25 years ago. DEC used Bliss which came in 3 variants, Bliss/Vax, Bliss/11, and Bliss/10. DG had DGL (Data General Language), and no less than 3 variants of Pascal for programming system stuff. Just about every company out there had its own system programming language. And that does not count the extensions for common languages like Fortran, Cobol, etc.

            Then you have the various research languages that got used in many cases for commercial work. These included Sail, Cedar, Mesa, Clu, ...

            This does not even count things like N variants of Modula, extension for object orientation for C (C++ and Objective C were only two of the many).

            Working in the industry from 1980 to 1987 I know I looked at code from over 40 different programming languages (or major variants) while doing my work. I doubt that most folks can come close these days.
          • Denying the numbers, agreeing about the experience

            I think there are several hundred programming environments, languages, or dialects available for any Unix (except MacOS X) now. It would be interesting to get real numbers - want to volunteer?

            Today, however, 99% of programmers never do serious work with more than two or three - so, sure, back then we had to adapt to many, and now most people don't.
    • Windmills

      There really aren't that many languages in heavy commercial use...let's see...

      Visual Basic

      ...once you get past these you are slowly strolling down the long tail. Based on buzz you'd think Ruby or Python or Scala or whatever were going to be the next huge thing, and maybe they will, but they aren't yet.

      Everything else is a testing ground for the features that may make it up into a mainstream language, and every once in a while the source of a successor. I think Java will probably be replaced, and C# will evolve (or possibly already has evolved) so far beyond its original conception that it isn't really C# anymore. Java could follow the C# route, but I don't think it will and hope it doesn't. A new language is better.
      Erik Engbrecht
  • Evidence

    "So here's a bet: for the kind of coding, compiling, and text processing workload Knuth ascribes to himself, a T2 based workstation would provide broadly similar performance to what he has if he makes no code changes"

    Do you have any evidence to prove this? All of the evidence I've seen says this is completely wrong, and quite frankly based on the design of the T2 processor it should be wrong.

    "...and outperform the Centrino by up to five times on key jobs if he does."

    I have doubts there, those changes most likely represent a significant rewrite, and how much does a T2 cost relative to a Centrino?
    Erik Engbrecht
    • Answers

      1) Sun has pages of benchmarks; if you won't believe them - try the machine for yourself. They have a try and buy your employer might find attractive.

      But read my lips: there is a disadvantage relative to intel on small jobs, an advantage on big jobs, and non linear trade-offs in between.

      2) The actual cost of a T2 CPU, by itself, is about the same as the x86 top end from Intel. The computers embedding these things cost more - duh: higher standards, more bandwidth, smaller markets, etc etc.
  • Open source continues to be marginalized...

    ... on its way to being returned to hobbyists and a few technical uses. The economics leading to the rejection of open source are the same as those defining successful and unsuccessful hardware.

    Quoting the quoted article:

    "The success of open source code is perhaps the only thing in the computer field that hasn't surprised me during the past several decades. But it still hasn't reached its full potential; I believe that open-source programs will begin to be completely dominant as the economy moves more and more from products towards services, and as more and more volunteers arise to improve the code."

    Of course, software is now a large industry, one of the most important in the US and worldwide, so capitalist impulses and consolidation made open source an inevitable failure. Academics have their eccentricities and blind spots.

    But how does software work? Proprietary code, inevitably, from a huge manufacturer. Then adapted to industries. Then adapted to specific organizations. Along with many companies issuing their protected code to supply niche functionality unavailable from the more broadly sold products.

    So his piece of the truth is that software does indeed work as a service industry supplied by a series of large companies. With small companies bought out for their good ideas where necessary. Even those founded as open source operations.

    Why? Standardization and tools. Easiest and cheapest to buy someone else's work. That that might not be true, or at least not as true as non-technical staff would prefer it to be, has very little relevance to IT buyers and sellers.

    Organizations devising their own software are probably decreasing net as the number outsourcing or replacing exceeds the number deciding to train and maintain a large staff.

    How is hardware relevant to this discussion? Ease and standardization.

    How many buyers are most concerned with obtaining the last measure of performance, especially considering price and cost of maintenance?

    If it comes from a reputable source (reputable source as distinguished from open source), is at an acceptable price, and requires little new investment, it has an edge. Improvements are a sales point, not a determinant of whether a sale will be made.

    The software companies can ignore the new hardware features or make use of them. Their business is as certain as that of the hardware makers.

    So I think that a hardware change will be used only when and as it makes software more saleable. If you want to know when any advantage from dual core will be used, check the projected investment in IT. Just as Apple had maintained an Intel project quietly for years, so dual core will be developed in the background to be called upon as needed when money is to be made.

    It's not about the hardware.
    Anton Philidor
    • Anton continues to shoot from the lip

      >Of course, software is now a large industry, one of the most important in the US and worldwide, so capitalist impulses and consolidation made open source an inevitable failure

      I think your pronouncement of the failure of open source is a little premature.

      How does all this proprietary code get adapted? Since only the company that owns it can modify it, the best anyone can do is ask for these adaptations. And if the market is too small, capitalism demands that you ignore these requests because it will cost more than you will ever recover from it.

      >So his piece of the truth is that software does indeed work as a service industry supplied by a series of large companies. With small companies bought out for their good ideas where necessary. Even those founded as open source operations.

      Yes, let's get rid of those nasty little companies that provide most of the innovation and jobs. While we're at it, let's lay off most of the people that were in the company.

      >If it comes from a reputable source (reputable source as distinguished from open source)

      Now you're just being nasty. Personally I wouldn't give a dime for Microsoft's reputation.

      >How is hardware relevant to this discussion?

      It's not. It's just another anti open source rant from Anton.
      Hemlock Stones
      • Worse his economics are all wrong

        Anton, as he usually does, has to throw in the incorrect notion that free markets and capitalism automatically equal a proprietary model, specifically Microsoft's model.

        Neither is correct.

        This would deny that open source players like Red Hat, Ubuntu, Mandriva, and Novell are capitalist and free market responses to both real and imagined shortcomings of Microsoft and other closed source suppliers to the market.

        His model of one size fits all reconfigured for smaller or specialty users of the software fails too on many fronts simply because it doesn't.

        He then ignores the fact that FLOSS people can and do take existing code which generalizes a particular task, whatever that may be, and rewrite the parts necessary for specialist and small market use without the need of paying Microsoft or some of its MVP partners their extremely high rates of pay to kludge something together.

        On one hand Anton accuses open source programmers of being amateurs and then accuses open source of devaluing the pay of "real" professional programmers. He can't demonstrate any of this, of course, except as it relates to his peculiar world view.

        His final shot is, as usual, the assertion that open source's quality is so low that it can't be trusted in mission critical situations.

        This is false in so many ways. The Internet runs on open source software and no amount of pointing to gamed and overblown estimates of IIS usage at the end points is about to change that. Mars missions are running on open source. Microsoft is whining that NASA is ignoring its wonderful Windows platform and we'll see how long a device based on Windows actually runs if NASA can get something built on Windows that will actually be worth launching.

        Strikes me that in many ways open source is considered reputable, particularly Linux and BSD, on a par with closed source, Windows, and in many cases the preferred and more reputable source.

        Hardware is relevant to the discussion even though, in Anton's world view, it can't be till Microsoft starts to control it. Until then it simply cannot exist, according to Anton.

        Reality is that both capitalism and free markets are speaking, and loudly, in the choices made to deploy open source widely even to the point of choosing it over, say, Microsoft.

        What Anton misses is that monopoly does not equal capitalism though it does resemble mercantilism in far too many ways.

        Monopoly is a capitalist and free market aberration and failure. Temporary though it always is.

        It's just another ignorant anti open source rant from Anton, to be precise.


  • Multi-core Processors

    I have a Q6600 processor with 4 gig of ram in a one year old computer, and occasionally see all 4 cores running near 100%. Very frequently, the sum of all the core activity is 100% or more. When you have things like AVG anti virus running along with Diskeeper and the other general house keeping chores that Windows Vista performs, you can be thankful for the extra cores. I have had V8 cars where I only needed the extra power that they provided on an intermittent basis, but was sure glad that it was around when needed. I will be ready to purchase an 8 or 16 core processor when it arrives.
  • Links?

    All of the Sun benchmarks I've seen deal with multiuser server-style computing. I've also seen some scientific computing ones as well.

    In both cases they were already-parallel code. I don't doubt the T2 rocks at execution of fairly concurrent software.

    I think it sucks at CPU-bound serial-execution software. That's my experience with it.

    This is also the sweet spot of the x86.

    You need parallel execution in order to make the T2 sing, and most common programming paradigms are abysmal for doing parallel execution.
    Erik Engbrecht
  • Multi-core and DBMS

    I'm a ME student at UT in Austin, and pretty green in the multi-core field, but I'm doing some research in the advantages of multi-core processing in the control system world. Isn't DBMS another important application for parallelism?
    • Database and multi-cores

      Actually, database operations do not work very well with multi-cores because of data synchronization problems when multiple operations are occurring on the same data. So they tend to fall into the highly data interdependent class of operations.
      Hemlock Stones
      • it depends...

        The problem isn't many operations on the same data, the problem is write operations. Algorithmically many query operations can be parallelized, although in practice you can hit problems with them being IO bound.

        Databases tend to have a lot of potential synchronization issues but not that many actual ones, so conservative approaches lead to locks everywhere, but those locks aren't in practice needed.

        So in practice you use techniques like MVCC at the database level and optimistic locking at the application level, which alleviate the issues significantly.
        Erik Engbrecht