Why C wins: the cold realities of abstraction

Summary: The sad reality underlying many of the handy abstractions we rely on every day is that they only work most of the time - and what that means is that the abstractions we choose to use define limits to the quality of our work.

TOPICS: Networking

Joel Spolsky's "The Law of Leaky Abstractions" is so well written that I can't effectively abstract it - so here's the whole of his introduction:

There's a key piece of magic in the engineering of the Internet which you rely on every single day. It happens in the TCP protocol, one of the fundamental building blocks of the Internet.

TCP is a way to transmit data that is reliable. By this I mean: if you send a message over a network using TCP, it will arrive, and it won't be garbled or corrupted.

We use TCP for many things like fetching web pages and sending email. The reliability of TCP is why every exciting email from embezzling East Africans arrives in letter-perfect condition. O joy.

By comparison, there is another method of transmitting data called IP which is unreliable. Nobody promises that your data will arrive, and it might get messed up before it arrives. If you send a bunch of messages with IP, don't be surprised if only half of them arrive, and some of those are in a different order than the order in which they were sent, and some of them have been replaced by alternate messages, perhaps containing pictures of adorable baby orangutans, or more likely just a lot of unreadable garbage that looks like the subject line of Taiwanese spam.

Here's the magic part: TCP is built on top of IP. In other words, TCP is obliged to somehow send data reliably using only an unreliable tool.

To illustrate why this is magic, consider the following morally equivalent, though somewhat ludicrous, scenario from the real world.

Imagine that we had a way of sending actors from Broadway to Hollywood that involved putting them in cars and driving them across the country. Some of these cars crashed, killing the poor actors. Sometimes the actors got drunk on the way and shaved their heads or got nasal tattoos, thus becoming too ugly to work in Hollywood, and frequently the actors arrived in a different order than they had set out, because they all took different routes. Now imagine a new service called Hollywood Express, which delivered actors to Hollywood, guaranteeing that they would (a) arrive (b) in order (c) in perfect condition. The magic part is that Hollywood Express doesn't have any method of delivering the actors, other than the unreliable method of putting them in cars and driving them across the country. Hollywood Express works by checking that each actor arrives in perfect condition, and, if he doesn't, calling up the home office and requesting that the actor's identical twin be sent instead. If the actors arrive in the wrong order Hollywood Express rearranges them. If a large UFO on its way to Area 51 crashes on the highway in Nevada, rendering it impassable, all the actors that went that way are rerouted via Arizona and Hollywood Express doesn't even tell the movie directors in California what happened. To them, it just looks like the actors are arriving a little bit more slowly than usual, and they never even hear about the UFO crash.

That is, approximately, the magic of TCP. It is what computer scientists like to call an abstraction: a simplification of something much more complicated that is going on under the covers. As it turns out, a lot of computer programming consists of building abstractions. What is a string library? It's a way to pretend that computers can manipulate strings just as easily as they can manipulate numbers. What is a file system? It's a way to pretend that a hard drive isn't really a bunch of spinning magnetic platters that can store bits at certain locations, but rather a hierarchical system of folders-within-folders containing individual files that in turn consist of one or more strings of bytes.

Back to TCP. Earlier for the sake of simplicity I told a little fib, and some of you have steam coming out of your ears by now because this fib is driving you crazy. I said that TCP guarantees that your message will arrive. It doesn't, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can't do anything about it and your message doesn't arrive. If you were curt with the system administrators in your company and they punished you by plugging you into an overloaded hub, only some of your IP packets will get through, and TCP will work, but everything will be really slow.

This is what I call a leaky abstraction. TCP attempts to provide a complete abstraction of an underlying unreliable network, but sometimes, the network leaks through the abstraction and you feel the things that the abstraction can't quite protect you from. This is but one example of what I've dubbed the Law of Leaky Abstractions:

All non-trivial abstractions, to some degree, are leaky.

Abstractions fail. Sometimes a little, sometimes a lot. There's leakage. Things go wrong. It happens all over the place when you have abstractions.

Back in 1954 a guy named Tom Godwin wrote a story under the title "The Cold Equations." The basic plot holds that a courier pilot on a one-way mission to deliver desperately needed medical supplies has exactly enough fuel to land the vehicle's planned mass, discovers that the eighteen-year-old sister of one of the people he's been sent to save has stowed away on board, and then has to eject her through the airlock to complete his mission and save, among others, her brother.

My copy of the story is in a collection edited by David Hartwell and Kathryn Cramer - here's part of their introduction to it:

[The Cold Equations] is one of the most popular and controversial hard sf stories of the last fifty years, a story that stacks the deck and then plays with the reader's emotions with carefully juxtaposed cliches that imply a deus ex machina - then frustrates that false expectation.

... Godwin's story angered many readers when it appeared in the fifties, nearly all of whom wanted the problem solved by violating some scientific principle or law. ...

The point of the story, of course, is that scientific law cannot be violated under any circumstances, and ignorance of scientific law can kill you, no matter how sincere you are.

Godwin's staging doesn't look reasonable today, but his science does and the basic lesson in the story is as clear and applicable now as it was then: the rules by which the Universe works are real, are ours to discover, and the only moral good lies in aligning our actions with the absolutes governing the universe - not with wishful thinking, social theorizing, or some imagined standard of compassion, but with the reality of what works, and what doesn't.

Magic doesn't - and the lesson in the story is that neither the author nor the audience can change that.

In Spolsky's context Godwin's equations are non-leaky, non-trivial abstractions - meaning that Spolsky's law should be amended to read something like this:

The probability that an abstraction leaks varies directly with the number of people involved in creating it, and exponentially with the number of abstractions it subsumes.

In other words, the simpler and closer to reality the constructed abstraction is, the less likely it is to leak - or, more succinctly: genius programs in K&R C.

  • Interesting.

    Many years ago, I built my own simple computers and programmed them in machine code, then moved on to assembly language. One day, I heard something about this language called C, and ordered K&R. I read that book in one go, got a C compiler, and never looked back. Though I don't program in C much now, I still think C is a vastly important language, and I appreciate your POV.

    However, all (Turing-complete) programming languages are abstract. Turing equivalence implies all these languages have the power of a Turing machine - nothing more, nothing less - and that any one can be used to emulate/implement any other, i.e. it's possible to "compile" C to any other Turing-complete language. Machine languages are different in only one minor respect - we have built machines that are able to execute them directly. C's advantage is that it is closer to these machines than many other languages, but it has no inherent functional advantage because anything I write in C can be converted to any other language (ok, library issues aside).

    The problem with being closer to the hardware is that you are further away from many practical problems we wish to solve. If you are building an OS or system software, your problem domain is also not so far away, and hence C is the dominant language here.

    If you are building, for example, business applications, I would say that C represents an extremely leaky abstraction over that domain, and other languages are often a better choice.

    So, I agree with your general idea, but not your conclusion. I would say the correct conclusion is that you should pick the best language for the job, and that isn't always C.
    • Agreed (NT)

    • This has nothing to do with C, & every CS student should already know this

      An abstraction allows you to write code in a more human-understandable and less verbose way. Abstractions are not limited to C++ or OOP or any other language or paradigm. In fact, any good C code is going to be filled with abstractions, and it will likely even have objects (which are not the only way to have abstractions). Of course, objects in C are not as nice as objects in C++; they are usually structures and arrays that have a host of functions which perform operations on them.
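As a minimal sketch of what such an "object" in C might look like, the block below pairs a structure with a small set of functions that operate on it. The `IntStack` type and the `intstack_*` function names are hypothetical, chosen only for illustration; they do not come from the comment or from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "object": a growable stack of ints. The struct holds the
   state; the intstack_* functions below play the role of its methods. */
typedef struct {
    int *items;   /* heap storage, NULL while nothing is allocated */
    size_t len;   /* number of values currently stored */
    size_t cap;   /* number of values the storage can hold */
} IntStack;

/* "Constructor": put the struct into a known empty state. */
void intstack_init(IntStack *s) {
    assert(s != NULL);
    s->items = NULL;
    s->len = 0;
    s->cap = 0;
}

/* Push a value, growing the storage when it is full.
   Returns 0 on success, -1 if memory could not be allocated. */
int intstack_push(IntStack *s, int value) {
    assert(s != NULL);
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 4;
        int *grown = realloc(s->items, new_cap * sizeof *grown);
        if (grown == NULL) {
            return -1;
        }
        s->items = grown;
        s->cap = new_cap;
    }
    s->items[s->len++] = value;
    return 0;
}

/* Pop the most recent value into *out.
   Returns 0 on success, -1 if the stack is empty. */
int intstack_pop(IntStack *s, int *out) {
    assert(s != NULL && out != NULL);
    if (s->len == 0) {
        return -1;
    }
    *out = s->items[--s->len];
    return 0;
}

/* "Destructor": release the storage and reset to the empty state. */
void intstack_free(IntStack *s) {
    assert(s != NULL);
    free(s->items);
    intstack_init(s);
}
```

Callers that stick to the four functions never touch `items`, `len`, or `cap` directly, which is the abstraction; nothing in C enforces that discipline, which is roughly the "not as nice as C++" point.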

      Regardless, every CS student should know about the 90/10 rule of code optimization.

      Abstractions are designed mostly for the 90% of the code which will be doing 10% of the work. When you optimize code you are supposed to profile it, find the bottlenecks, and optimize the 10% that does 90% of the work.

      Because of the 90/10 rule of code optimization, you do not need all of your abstractions to be optimal performers. However, the few that do prove to be bottlenecks should have methods available to bypass the abstraction. It is often a good idea to leave a back way to get as close to the bits and bytes as needed to achieve your performance goals.
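One way to picture the "back way" is an abstraction that offers both a checked accessor for ordinary code and a raw view for the rare hot loop. The sketch below assumes a hypothetical `Samples` type; the names `samples_get`, `samples_raw`, `sum_checked`, and `sum_raw` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical abstraction: a read-only view over a block of samples. */
typedef struct {
    const double *data;
    size_t len;
} Samples;

/* Safe accessor for the bulk of the code: bounds-checked on every call.
   Returns 0 and stores the value in *out, or -1 if i is out of range. */
int samples_get(const Samples *s, size_t i, double *out) {
    assert(s != NULL && out != NULL);
    if (i >= s->len) {
        return -1;
    }
    *out = s->data[i];
    return 0;
}

/* Back door for a profiled hot spot: hand out the raw pointer and length
   so the caller can loop without a function call and check per element. */
const double *samples_raw(const Samples *s, size_t *len) {
    assert(s != NULL && len != NULL);
    *len = s->len;
    return s->data;
}

/* Sum through the abstraction; stops when the accessor reports the end. */
double sum_checked(const Samples *s) {
    double total = 0.0;
    double value;
    for (size_t i = 0; samples_get(s, i, &value) == 0; i++) {
        total += value;
    }
    return total;
}

/* Same sum through the back door. */
double sum_raw(const Samples *s) {
    size_t n;
    const double *p = samples_raw(s, &n);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];
    }
    return total;
}
```

Both functions compute the same result. Whether `sum_raw` is measurably faster depends on the compiler and the workload (an optimizer may inline the checked accessor), so the back door only earns its place once a profiler has pointed at that loop.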

      Abstractions are not the enemy; in fact, abstractions do a wonderful job of helping the developer (given that they understand the problem domain) think logically about a problem and see its full scope. When programmers step back and look at the process as a whole, they are more likely to see opportunities to improve the algorithm, which have far more performance benefit than a micro-optimization will. As a result, some OOP code will actually perform better than the procedural version.

      Regardless, when the concern is performance, you should not throw out all of the benefits of abstractions for all of your code when the bottleneck is only in 10% of your code.

      Abstractions allow problems to be solved on a higher level of thinking, they help enforce code patterns and rules (private methods, public methods..), and they establish relationships between parts of the code that will tell programmers more about how a system works than trying to trace execution paths will.
      • This is truth....

        And the performance difference between C and C++ is nonexistent with a good compiler.
        • Furthermore...

          The performance difference between C/C++ and JIT'd Java or .Net is negligible in much real-world code.

          For many (I would say most) IT projects, delivering a timely, correct and reliable solution should be a far more important priority than performance.
      • About code maintenance as well

        Abstraction helps to enforce interaction paths and therefore minimise possible fault vectors.

        Minimising fault vectors helps make code more maintainable because:

        a. less likely to create unwanted side effects

        b. code reviewing for troubleshooting can be more focussed.
    • Correct.

      The trouble many people have here is that they don't realize that there are multiple abstractions going on simultaneously. You're abstracting the hardware, sure, but (especially in applications programming) you're abstracting the business model. The language that "wins" is the one that provides the best compromise between the various abstractions. That's why we no longer code applications in machine code for specific hardware (which is the logical conclusion if we consistently apply the reasoning in the OP).

      Many languages offer the means to produce quality craftsmanship: Genius is not a one-trick pony.
    • Three things, in order, closest first ...


      1) The human abstraction.
      2) The language abstraction.
      3) The operational abstraction.

      3 is the one that counts. 1 & 2 have not delivered if 3 is a crock.

      Something that always used to be overlooked in 3 was the "state" of what was currently happening, ie, if Firefox dies on me right now, restarting it with "restore previous session" will have these words in these form fields. That's a positive example.

      I argue that web apps, stateless except for a session identifier, operate in a context that naturally makes that better and easier. We're finally moving away from the original clunky world of GUIs (version 0.01).
      • Web apps are clunky.

        Web apps? You're joking, right?

        They're a solution looking for a problem.

        You have to learn several languages to make it all work. A markup language, a scripting language, and usually a relational database language. None of which were ever designed for use in a real application.

        The markup language was originally designed to put hyperlinks into documents and make them look pretty, and not much more.

        The scripting language was originally a toy to make the markup language a bit more "dynamic" in some unspecified way.

        The relational database language was designed for business use, and was never really meant for consumer applications.

        If you wanna have fun, toss in a styling language to make the markup language look even prettier or an OOP language that decided that "purity" was more important than usability.

        It's all a complex mess of many languages, to be honest. None of which were originally intended to be used in the way they are used today. This is more clunky than any language I've used to create GUIs with.

        The only reason we tolerate it is because it works on so many platforms and doesn't require the user to download and install extra software.
        • Clunky web apps are currently being built, yes.

          Javascript is getting easier and better. DOM libraries abound these days.

          Data has to be stored somewhere. On a DBMS or on a filesystem are usually pretty good choices.

          The way it's generally going is interaction via messages to the backend, and this in turn improves the underlying message based protocols as they mature (eg render HTML or JSON, backend or frontend heavy?). This is handy as we move towards something other than x86 computing.

          The DOM stuff, well it's nice to have a relatively free hand on the design and look and feel. Don't you think that web apps that closely emulate desktop apps are plain boring?
          • re: Clunky web apps are currently being built, yes.

            "Javascript is getting easier and better."

            JavaScript was never really designed for large applications. Getting "easier and better" won't fix fundamental flaws with the language.

            "DOM libraries abound these days."

            DOM is great for pages, but not so much applications.

            "This is handy as we move towards something other than x86 computing."

            . . . except we're not moving towards something other than x86 computing, and languages like C++ and Java have been designed to work on a variety of architectures for quite some time now.

            "The DOM stuff, well it's nice to have a relatively free hand on the design and look and feel."

            So why use a markup language which was never really designed for that purpose?

            "Don't you think that web apps that closely emulate desktop apps are plain boring?"

            Well, for some odd reason people seem to think that web apps that closely emulate desktop apps are the future.

            Yes, I think they are boring and we shouldn't be pushing to put everything in "the cloud" just because we can. There's no good reason to put everything in "the cloud" and start getting rid of desktop apps.
      • Correction...

        We've long since moved away from that model, you're just catching up ;)
  • Using the same logic

    The more we abstract away from the real-life models we are trying to mimic in the computer, the more chance there will be for leaking. Going back to C is not the solution because we create a massive chasm between the subject matter and the tools to model it in the computer. It is this leakiness that has brought us down the road to abstraction on the computer. While I don't disagree that there is a price to pay for this abstraction, there certainly is a cost in time to market and relevance when we do not abstract on the computer. From the article, though, it seems that most of the problems are related to systems not fully committing to one abstraction mechanism, which in turn puts us in the situation of having to work with the various implementations of the data.

    PS: Going back to C would be as sensible as going back to assembly code because assembly is less abstracted from the hardware.
    • Points of inflection

      My guess is that if we had a valid means of quantifying "abstractiveness" versus effectiveness we'd find a curve with clear points of inflection - and C on one of those.
  • What about maintainability??

    C is a good, powerful language. But a good C-based program is only good for as long as the original developer is working on it.

    Where C lags behind is in maintainability and reusability. Even code written by the best C developers ends up as spaghetti code in the final stage. With a lot of C&P and quick hacks, the final product usually becomes a maintenance nightmare.

    And let's not talk about code count .... C programs usually have 5 to 10 times more lines of code than the same program, with the exact same functionality, written in C++ or other OO languages.

    Don't get me wrong ... the quality of the code is always directly proportional to the quality of the developer. Bad spaghetti code can be written in any language or script (can you say Perl). But the problem is more predominant in procedural languages like C.
    • Maintainability is more a programmer skill than a language feature

      I've seen some very maintainable software written in C and some very unmaintainable code written in C++ and Java. Modules and libraries can be written in C that implement abstractions that are as useful as those written in OO languages. Overuse (or misuse) of abstraction often makes code difficult to understand and debug. Some abstraction layers over databases I've seen are incomprehensible even to someone who's familiar with SQL. Abstractions need to be intuitive to their users in order to be maintainable.
      • "Intuitive" is a myth

        There is no "intuitive"; everything is learned. What would make more sense is "understood by their users in order..."
      • Intuitiveness and relevance.

        That's what can make a whole project good or bad.

        And there's no point having the right idea at the wrong time.

        Programmers should communicate socially more and technically less.
      • Abstraction depends upon architects and designers

        Proper abstraction is in the realm of the architect and chief designers designing the object hierarchy (regardless of the languages used).

        A poorly built and implemented hierarchy will result in more redevelopment because more has to be changed to accommodate design changes. How many have had to keep putting bandaid fixes in place because marketing sold the prototypes as product rather than as a sample that had to be re-engineered for maintainability, scalability and change (most of which will happen during development)?
  • POSIX itself (surprised it was not named SOSIX) is an abstraction. <nt>