Apple's 'goto fail' tells us nothing good about Cupertino's software delivery process

Summary: The fact that Apple's infamous SSL validation bug actually got out into the real world is pretty terrifying.

There's been a lot of press about Apple's "goto fail." To catch you up, this was a bug that caused the validation of SSL certificates on OS X and iOS to fail.

The result was that it was easier to trick iOS and Mac devices into accepting invalid certificates and presenting them as valid to the user.

Most of the analysis focused on the use of C's goto statement, particularly on whether using it at all constituted bad practice and, by extension, whether the bug was somehow linked to the usage of goto.

The fact that this code made it into production at all is a shocking indictment of Apple's engineering team. Forget goto -- who cares about how the developers put their code together? The issue here is that the problem was not found, and resolved, before the code got out of the door.

Practice

Here's the actual code in question:

[Image: code listing, captioned "SSL validation setup failure"]

If you're not used to reading code, what's happening here is that this code starts with a certificate that needs checking. The line that does the actual magic is sslRawVerify -- this is the function that actually checks the certificate. The part with the error in it is just setting things up to make that call.

The problem with the code is that the erroneous second goto stops sslRawVerify from ever being called. The function jumps straight to its exit path and, because no error has been recorded at that point, returns an optimistic "everything's OK!" back to the caller.
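
For readers who want to see the shape of the mistake, here is a minimal sketch of the pattern. It is not Apple's source: hash_update, raw_verify and check_signature_buggy are hypothetical stand-ins I've made up for the real hashing and verification steps, and the flag verify_was_called exists only so the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real hashing and verification steps. */
static bool verify_was_called = false;

static int hash_update(int simulated_error)
{
    return simulated_error;          /* 0 means the step succeeded */
}

static int raw_verify(bool signature_is_valid)
{
    verify_was_called = true;
    return signature_is_valid ? 0 : -1;
}

/* Returns 0 when the signature is accepted, non-zero otherwise. */
int check_signature_buggy(bool signature_is_valid)
{
    int err;

    if ((err = hash_update(0)) != 0)
        goto fail;
    if ((err = hash_update(0)) != 0)
        goto fail;
        goto fail;   /* duplicated line: not governed by the if, always runs */
    if ((err = raw_verify(signature_is_valid)) != 0)   /* never reached */
        goto fail;

fail:
    return err;      /* err is still 0 from the last successful step */
}

/* Demonstrates the failure: an invalid signature is accepted. */
void demonstrate_bug(void)
{
    verify_was_called = false;
    assert(check_signature_buggy(false) == 0);   /* accepted anyway */
    assert(!verify_was_called);                  /* the real check never ran */
}
```

Despite its indentation, the second goto is an ordinary statement that follows the if, so it runs every time and the function reports success without ever reaching the verification call.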

The goto statement is quite old-fashioned, and hasn't been regarded as a "smart" thing for developers to use for a good long while. The problem is that it allows developers to create "flow" in code that is unnatural. Most programming constructs are clear to read -- they tend to go in one direction, branching when decisions need to be made, or looping around when sets of data need to be worked on. Working in defined structures in that way allows developers to glance at code and understand its meaning.

Goto allows the program to fly around all over the place, resulting in missed subtleties. This makes code hard to read.

Reviewing

There is nothing unclear about the code that contains the bug. There are no surprises, and it's easy to read. Specifically, the developer's intention comes through loud and clear. goto may be old-fashioned, but it's not inappropriate in this setting.
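
To illustrate why goto can be reasonable in this kind of code, here is a rough sketch of the common C idiom of jumping to a single cleanup label. The function process_message and its buffers are hypothetical, invented purely for illustration. Note one design choice that is my assumption rather than anything taken from the code in question: the error variable starts out as "failed" and is only set to success at the very end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: copy a message via two heap buffers.
   Returns 0 on success, -1 on failure. */
int process_message(const char *msg, char *out, size_t out_size)
{
    int err = -1;            /* assume failure until every step succeeds */
    char *scratch = NULL;
    char *copy = NULL;
    size_t len = 0;

    if (msg == NULL || out == NULL)
        goto cleanup;
    len = strlen(msg);
    if (len + 1 > out_size)
        goto cleanup;

    scratch = malloc(len + 1);
    if (scratch == NULL)
        goto cleanup;
    copy = malloc(len + 1);
    if (copy == NULL)
        goto cleanup;

    memcpy(scratch, msg, len + 1);
    memcpy(copy, scratch, len + 1);
    memcpy(out, copy, len + 1);
    err = 0;                 /* only reached when nothing went wrong */

cleanup:
    free(copy);              /* free(NULL) is a no-op */
    free(scratch);
    return err;
}

/* Usage sketch. */
void process_message_example(void)
{
    char buf[8];
    assert(process_message("ok", buf, sizeof buf) == 0);
    assert(process_message("far too long for buf", buf, sizeof buf) == -1);
}
```

Every exit goes through one place, so the buffers are always released. With the failure-by-default return value, a stray goto cleanup in the middle of this function would make it report an error rather than a false success.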

The second goto fail obviously should not be there. Either the line was duplicated by accident, there was an if check above it that was deleted while the goto was left in by accident, or there was a problem merging the code written by two developers.

(Merging is a little complicated for non-programmers to understand. From time to time, two developers will work on the same code in individual copies of the master code. When two copies of the same code are sent back to that master code, the source control system that manages everything has to make decisions about how to combine the two developers' work. This process can go wrong.)

Developers always make mistakes. They always create bugs. Any software organisation understands this and knows it's to be expected. You can judge software organisations by a) how good they are at writing code that's easy to maintain, and b) how good they are at stopping bugs before they go into production.

My ZDNet colleague Stilgherrian flagged one part of the process that failed here, namely that of code reviews. In code reviews, a second developer looks at the code to try and find issues with it. An issue can be an error or just an area in the code that needs refining. 

Code reviews are tricky to call in this situation. They can't cover 100% of the code base, so it's understandable that no one developer managed to spot this bug created by another developer. 

Also, this code is open source. Apparently, no one outside Cupertino using it spotted the issue either. So maybe we shouldn't give the Apple developers a hard time for that. We can, however, give them a hard time for something much worse.

Automation

The real fail in all this comes from the fact that the bug was not detected by any automated test suites.

Bugs in software are found by testing. Testing happens in two ways -- someone sits down and tries the software looking for faults (manual testing), or separate software is produced that tests the software (automated testing).

Competence at automated testing is an indicator of sophistication of the software development team and its sponsors -- the better you are at developing software, the more likely you are to use automated testing.

However, automated testing is a complex proposition -- it's expensive and difficult. For many, automated testing is optional, with good enough results being achieved from manual testing.

But there are certain classes and types of software where "good enough" cannot be achieved with manual testing. Validation of SSL certificates falls into both camps. It's a core part of security infrastructure, and it's baked into the operating system.

This software needed automated testing because it would be too difficult to test manually. Since testing it means checking that invalid certificates are rejected, you need a bunch of them. Practically the only way to do that is to use automated testing where invalid certificates are included with the suite.
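
As a rough idea of what such a suite might look like, here is a minimal table-driven sketch. Everything in it is hypothetical: struct cert is a drastically simplified stand-in for a real X.509 certificate, verify_cert is an invented verifier, and the fixtures are made up. A real suite would ship actual malformed, expired and mis-signed certificates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified certificate. */
struct cert {
    const char *host;       /* name the certificate was issued for */
    long not_after;         /* expiry time, seconds since the epoch */
    bool signature_ok;      /* stand-in for a real cryptographic check */
};

/* Hypothetical verifier under test. Returns 0 to accept, -1 to reject. */
int verify_cert(const struct cert *c, const char *expected_host, long now)
{
    if (c == NULL || c->host == NULL || expected_host == NULL)
        return -1;
    if (strcmp(c->host, expected_host) != 0)
        return -1;
    if (now > c->not_after)
        return -1;
    if (!c->signature_ok)
        return -1;
    return 0;
}

struct test_case {
    const char *name;
    struct cert cert;
    int expected;           /* result the verifier must return */
};

/* Runs every fixture and returns the number of cases that misbehaved. */
int run_cert_tests(void)
{
    const long now = 1000;
    const struct test_case cases[] = {
        { "valid certificate", { "example.com",  2000, true  },  0 },
        { "bad signature",     { "example.com",  2000, false }, -1 },
        { "expired",           { "example.com",   500, true  }, -1 },
        { "wrong host",        { "evil.example", 2000, true  }, -1 },
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int got = verify_cert(&cases[i].cert, "example.com", now);
        if (got != cases[i].expected) {
            fprintf(stderr, "FAIL: %s\n", cases[i].name);
            failures++;
        }
    }
    return failures;
}
```

A build would be blocked whenever run_cert_tests returns anything other than zero. The point is the "bad signature" row: a verifier that skipped its signature check would return 0 for that fixture, and the suite would flag it.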

Conclusion

The problem here is not whether goto was used or not. The problem isn't even whether the code was reviewed by another developer. The problem is that somehow Apple was able to put that code into production. Why on earth was that not caught by an automated test suite?

And, as my Twitter friend Explainoit said:

[Image: screenshot of the tweet]
Indeed.

One way to read that, however, is that the lack of literal s*** falling on Cupertino implies that no one uses the Apple kit for anything important.

This bug ended up just being embarrassing. Can you imagine if this bug had been found in Windows, and server rooms and enterprises all over the land had to be emergency patched with an SSL fix?

What do you think? Post a comment, or talk to me on Twitter: @mbrit.

Talkback

98 comments
  • Yet more ZDnet Anti-Apple Rhetoric....

    Much has been made of the SSL/TLS bug which occurred in OS X 10.9.1

    Although it was of significant concern at the time, and also affected iOS, both platforms have been fixed. Mavericks took a little longer as the fix was incorporated into the OS X 10.9.2 update, which made perfect sense.

    What is overlooked is the SSL/TLS threat was not just present on Apple platforms as Red Hat reported the same on GNU/Linux.

    However the GNU/Linux threat was the subject of just one article and not flogged to death as in the case of Apple.

    So the question that needs to be asked is: are ZDnet being paid an incentive to give Apple negative publicity, and can they be relied upon for impartiality?
    5735guy
    • you should wake up, iFan of iToys

      "
      Most Mobile Banking iPhone Apps Are Full of Security Flaws.
      iPhone is most vulnerable, least secure smartphone in the market, security firm finds.
      iPhone Security Flaw Can Let Apps Act as Keyloggers = Everyone knows what you type.
      Apple iOS Apps Leak More Personal Info Than Android.
      40% of iOS popular apps invade your privacy without any permission.
      "
      anywherehome
      • An Apple a day keeps security away

        credits to Dhiraj M.
        Uralbas
      • Apple

        Large, celebrity companies enjoy the benefits of publicity. Don't forget that publicity is a double-edged sword. It comes with good and bad.
        jsargent
    • GnuTLS is not even close to the same problem

      Apple fanboys like to think Linux had the same problem but it's a different issue with GnuTLS.
      GnuTLS flaw requires a valid version 1 cert to even have a chance at cracking it with the error condition. It's like saying you need access from inside in order to pick a doorlock.
      Apple's goto fail is like having a doorlock which can be turned from the outside with ANY key or even a screwdriver. If any crook tried that doorlock in the last 18 months, they would have got thru with ease.
      warboat
    • Are we a little sensitive?

      The author is exactly right. If this had happened to Microsoft, it would have been front page on every tech journal. Apple gets a nearly free pass.
      larsonjs
      • What?

        It was Apple, and it was on the front page everywhere.
        DannyO_0x98
      • If it had?

        My my... It has ALREADY happened!!! Nobody remembers the CN validation bug? The MS CryptoAPI had a nasty bug which remained unnoticed for YEARS. It was more obscure but it still allowed attackers to do nasty stuff. Getting a cert for say *\0.mydomain.com was all you needed to pose as any site. Because the buggy MS code would validate this as * instead of noting the CN length mismatch, easy to notice thanks to ASN.1 but noooooo, they never checked that...
        danixdefcon5
      • MS has BIG Karma

        Real BBBBBIIIIGGGGG debts to pay. It has good things but a BBBIIIIIIIGGER Karma to pay.
        orendon
    • Um, no Red Hat and co. did not report the same thing

      The flaws were quite different, and make no mistake - the SecureTransport bug in the Apple code is MUCH much more serious.

      I do hope Apple is reviewing their process. For one, they should insist that all if calls be done with curly braces - that would have made one unreachable line of code a minor style problem, rather than a colossal screw up. If I were the dev head, I'd be rapping knuckles of anyone who chooses two characters of terseness over the safety of braces.

      The second thing I'd do is modify their inhouse GCC to spit out "unreachable code" warnings, and then ensure the build system does not emit code with build warnings unless it has been personally certified that it can go out with warnings by me.

      This was a big mess, however that happens in this business, there's no avoiding it from time to time. BUT! You had better learn from it when it does. Here is hoping Apple did.
      Mac_PC_FenceSitter
      • GCC

        I think they have gone completely for clang and llvm.

        Activating the unreachable code warning seems good, but in practicality with an os code base that goes back to the late 80s, it could be that some bugs were patched around by making the bad code unreachable. Maybe not, I wouldn't presume.

        I also think there's a significant probability that goto fail was there by an act of malice. It looks like a merge error, but why does that happen in such a critical point?
        DannyO_0x98
    • Rhetoric?

      Seriously?
      Microsoft compilers from 10 years ago would have detected this unreachable code.
      Apple security culture and science is weak. Fortunately, like the article mentions indirectly, their OS is not a significant part of corporate infrastructures.
      TheCyberKnight
      • Compiler checks

        Not just compilers Microsoft uses, but any ANSI standard compiler would have thrown an error about unreachable code.
        Jim__J
    • Please, you're just a caricature of the "Pro-Apple Enthusiast"

      but that's just my opinion.

      Of course you "see" this as no big deal.
      William.Farrel
    • No, he's right

      The double goto is something that should have been caught easily. Someone at Apple should be very embarrassed.
      John L. Ries
      • But the open source community gets a

        pass?
        baggins_z
        • Is there an incident you care to cite?

          Nobody should get a pass. Corporate execs are responsible for what their employees do and don't do on the job. Leaders of open source projects are responsible for the code they accept. And everyone is responsible for the code he writes or maintains.
          John L. Ries
    • Classic misdirection

      Rather than acknowledging the gravity of the offense, simply attack the source. Conspiracy theory alert! ZDNet has it in for Apple. Now shall I refer you to any of 12gazillion ZDNet articles regarding Android or Windows in which the comments overwhelmingly complain that ZDNet is too pro-Apple or too anti-Android.

      Nice try though.
      dougpierson@...
    • GNU/Linux

      GNU/Linux = Free whereas Apple = Paid. If this had been Windows then the same attention would have been given to it because they are paid. Still, who knows who else has similar code? I'm just glad that Apple doesn't make code for aircraft systems. Although they might end up in your car in a little while.
      jsargent
  • Unbelievable ...

    that someone, even a fool, is dumb enough to even try to defend Apple on this one.

    Still, back to the subject at hand. No I don't think the implication (behind the tweet) is that no-one is using Apple for anything important. Surely, even a regular person checking their bank account balance is doing something very important to them?

    No, I think the implication is that blinkered fanbois (see 5735guy above) get so weak at the knees in the face of Apple's logo, that the 'Halo Effect' means they won't see a massive fail (pun intended) even when it's staring them in the face.

    The OP is right. Had that been Google, or Microsoft, or whoever else - there literally would have been public executions about it. But because it's Apple the fanbois will label it "Anti-Apple Rhetoric" and pretend it's no big deal -.-
    5hagg1