Last week's mini-Y2K: What went wrong?

On 7 January, two events shook the Net. They were the latest Y2K-type glitches, and they demonstrated just how fragile the Internet still is.

The Y2K Bug was so-called, of course, because it triggered on a date. What made it so dangerous was that it was in very old code -- written when the trigger date was so far in the future that it couldn't possibly be a risk.

Last week saw two other date-sensitive events that seemed to cause widespread disruption. Among the problems: users of a Microsoft Navision Axapta ERP system saw response times soar; online banks in Singapore reportedly went offline, or at least refused to do any banking; some Java applications crashed; Norton AntiVirus had a breakdown; and at least one person thought he had been fired by email.

The problems stemmed from VeriSign's certificate business. Certificates are arguably as crucial to the sound working of the Internet as are notions of date and time. Without certificates we would have no way of knowing that the site we are using is secure, or even what its true identity is.

That little yellow padlock that appears at the bottom of your browser every time you access a secure part of a Web site hides a wealth of information, and in particular the certificate path. Indeed, the only way we can trust the certificate itself is by knowing its genealogy. Who issued it? How do we know that the issuer is trustworthy? Who issued the certificate that says the issuer itself can be trusted?

These are important questions, and we rely on the answers for certainty and peace of mind that the applications and Web sites we are using can be trusted. Suppose, for instance, you use antivirus software which every so often downloads a new batch of virus signatures. You want to be absolutely confident that these virus signatures really do come from the antivirus software publishing company which they purport to come from, and just as sure that they have not been tampered with on their way to you. Certificates hold the answer.

Last week we saw what can happen when that chain of trust, from the certificate issuing authority, through to the software publisher (or Web site) and over the Internet to our servers and PCs, breaks down.

On Wednesday morning, when Symantec's Norton AntiVirus product -- installed on thousands if not millions of PCs -- trundled off across the Internet to pick up the latest load of virus signatures, it came back behaving in a distinctly odd manner. Users reported instances of their PCs locking up or slowing down so much as to be unusable; Symantec itself said that Microsoft Word and Excel were refusing to start.

Elsewhere on the Internet at about the same time, users began noticing other strange behaviour.

"I had my Outlook crash for no good reason," wrote one ZDNet UK reader. "There was a cryptic message saying that my credentials were no longer valid, which is pretty scary in a corporate environment. You don't know if you have a virus or are being laid off!" To add insult to injury, this correspondent found himself locked out of his company's Microsoft Navision ERP system, and unable to log in to his online bank. Other correspondents found their Java applications simply failed, and yet more found themselves presented with odd error messages when trying to access secure areas of their own corporate Web sites or other e-commerce sites.

There seem to have been two distinct problems, both of which were date-related, both of which occurred on 7 January 2004, and both of which should have been flagged up a whole lot more prominently than they were.

VeriSign, which according to its own figures issues some 25 percent of digital certificates in Europe and indeed a good number worldwide, holds the key to both problems.

The Norton AntiVirus problem was caused, says VeriSign, by the expiration of a certificate revocation list (CRL) called Class3SoftwarePublishers.crl, on 7 January. As applications -- including Norton AntiVirus -- attempted to check the list so they could verify that the certificates they were checking were still valid, they got little help, and so tried again. The effect of all those copies of antivirus software -- and other applications that we didn't hear about -- repeatedly checking whether the certificates were valid, was to increase traffic to VeriSign's CRL server one-hundredfold. In effect, VeriSign suffered from a self-inflicted denial of service attack.
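The failure mode described above can be sketched in a few lines of Python. This is an illustration only: the Crl class, crl_is_current and retry_delays are hypothetical names, the dates in the usage note are assumed for the example, and nothing here reflects how Norton AntiVirus or VeriSign's servers were actually implemented. It shows two things: a revocation list carries its own expiry date, and a client that backs off between retries puts far less load on the server than one that retries immediately.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Crl:
    """Hypothetical, simplified stand-in for a certificate revocation list."""
    this_update: datetime        # when the issuer published this list
    next_update: datetime        # when the list stops being considered current
    revoked_serials: frozenset   # serial numbers of revoked certificates


def crl_is_current(crl: Crl, now: datetime) -> bool:
    """A list is usable only between its publication and its expiry."""
    return crl.this_update <= now < crl.next_update


def retry_delays(attempts: int, base_seconds: float = 1.0,
                 cap_seconds: float = 3600.0) -> list:
    """Exponential backoff: wait twice as long after each failed attempt,
    up to a ceiling, rather than hammering the server straight away."""
    return [min(cap_seconds, base_seconds * 2 ** n) for n in range(attempts)]
```

With a list whose next_update is set to 7 January 2004 (time of day assumed), crl_is_current returns True the day before and False from that date on. A client following retry_delays(4) would wait 1, 2, 4 and then 8 seconds between attempts instead of retrying in a tight loop.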

It is arguable that had Norton AntiVirus been designed better, this denial of service attack on VeriSign's servers would not then have backfired and stalled PCs across the world, but there is not space to get into that debate here.

So, on to the second problem, which was again caused by an expiration -- this time of one of VeriSign's own root certificates (also known as root certificate authorities). Root certificates are the parents of those certificates used to sign secure Web sites and other code. Normally this all works fine, but if a root certificate expires then, in effect, so do all its children. The result is that any code requiring a certificate to run or to prove its authenticity cannot do so if the parent of the code's certificate has lapsed.
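The parent-and-child rule can be made concrete with a short Python sketch. It is a simplification under stated assumptions: the Cert class and chain_is_valid are hypothetical names, real validation also checks signatures and revocation, and the dates in the usage note are invented for illustration. The point it makes is only that a certificate whose own dates are fine still fails once any ancestor has expired.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Cert:
    """Hypothetical, simplified certificate: a name, a validity window,
    and a link to the certificate that issued it."""
    name: str
    not_before: datetime
    not_after: datetime
    issuer: Optional["Cert"] = None  # None marks a self-issued root


def chain_is_valid(cert: Cert, now: datetime) -> bool:
    """Walk from the certificate up to its root; every link in the
    chain must be inside its validity window at the time of checking."""
    current = cert
    while current is not None:
        if not (current.not_before <= now <= current.not_after):
            return False
        current = current.issuer
    return True
```

Take a code-signing certificate bought in October 2003 and valid for a year, issued under a root that lapses on 7 January 2004. It passes the check on 6 January and fails on 8 January, even though its own expiry date is months away. Re-parent it to a newer root with a later expiry and it passes again, which is in effect what installing the replacement root certificate achieves.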

At first glance, the idea of a VeriSign root certificate expiring sounds laughable -- comparable even to Microsoft letting its .com domain name lapse -- but in this case VeriSign knew that the root certificate was to be pensioned off. The company says that all global server IDs issued since December 2001 had a new root certificate, and that it has been providing instructions on how to install it manually. Obviously it is in VeriSign's best interest to ensure that its customers are using the latest (valid) certificate authority, just as it is in its interest to ensure that the CRL server is accessible at all times, whatever the level of traffic. However, emails we have received suggest that the company didn't do enough.

"I purchased my Java Code Signed certificates from Verisign in October 2003," wrote one correspondent. "There were no warnings I received indicating any action on my part was necessary. Additionally, it was more than error messages for users trying to access secure areas, JAVA applications that relied on these Verisign Code Signed Certificates simply failed."

Another wrote: "Two of three certificates we purchased in 2003 had this problem. Neither my network admin nor I were ever notified by [VeriSign], and since the SSL information is sent via email, they obviously had our email addresses."

VeriSign says that the expiration of the Certificate Revocation List was unrelated to the expiration of the root certificate. There is no reason to doubt that. However, the company could and should have done more to warn its customers and the Internet community at large of both issues. On Friday, VeriSign was notably reticent about the issue, only posting an advisory to address the Norton AntiVirus issue late in the day. Indeed the only forewarning I'm aware of came via Cryptonomicon, who noticed an incidental entry on Jupiter Research's Microsoftmonitor Weblog by senior analyst Joe Wilcox.

In his blog, Wilcox indicated that this may not be the first time we've seen such problems in recent history, pointing to the problems experienced by Microsoft SharePoint customers back in November 2003. According to Microsoft, the problem that affected installations of SharePoint Services on 24 November was due to "code that verifies the signatures of the dynamic-link libraries (DLL) that are installed with Windows SharePoint Services." At the time Microsoft said this was due to an error in the verification algorithm that did not permit the signatures of the DLLs to be verified, but as Wilcox noted after some poking around, certificates issued to Microsoft for the purpose of code signing expired on 24 November, exactly the same time as SharePoint Services decided it no longer wanted to be installed.

Obviously, Microsoft, Symantec, Java writers, online banks, and managers of ERP systems all need to be more proactive about certificates. I'd like to imagine that both Microsoft and Symantec have learnt their lessons. But the onus must really fall on those organisations who want to be the most trusted of the trusted. Remember, nobody issues certificates to VeriSign -- it issues its own, indicating that we should trust it implicitly. Trust is something that has to be earned, and the only way to do that is to be open. Next time, VeriSign needs to be even more proactive, and work with its partners and customers ahead of time. Just like we all did with Y2K.