No one considers the 20-year-old SHA-1 hash function secure, and browser makers are well on the way to phasing it out. But until yesterday's revelation by researchers at Google and CWI Amsterdam, there was no known reliable way of causing a SHA-1 collision.
The SHA-1 algorithm produces a 160-bit mathematical representation, or hash value, that should be unique for a given file. It's been used to ensure the integrity of everything from digital certificates for HTTPS websites, to managing commits in code repositories, and protecting users against forged documents.
Now the researchers, using a technique called SHAttered, have demonstrated that two PDFs with different content can have the same hash, which should never happen.
"We hope that our practical attack against SHA-1 will finally convince the industry that it is urgent to move to safer alternatives such as SHA-256," the researchers said.
The two-year effort was led by CWI Amsterdam researcher Marc Stevens and Google head of anti-abuse research Elie Bursztein, and relied on some serious computing power from Google.
The attack required nine quintillion (9,223,372,036,854,775,808) SHA-1 computations and took the equivalent of 6,500 years of single-CPU computations to complete phase one of the collision, and 110 years of single-GPU to finish phase two. Although that process sounds long, it's 100,000 times faster than a brute-force attack on SHA-1.
The researchers note in a paper that the estimated cost of a collision attack has fallen significantly in the past few years, which is why it's being phased out for signing HTTPS certificates.
Microsoft's Edge and Internet Explorer will warn users if a site is using a SHA-1 certificate by mid-2017, while Google's Chrome made the move in January. Firefox will do so in early 2017.
Based on a SHA-1 attack developed by Stevens, cryptographer Bruce Schneier in 2012 estimated a SHA-1 collision attack would have a processing cost of around $700,000 in 2015 using spot prices for Amazon EC2 instances, which would fall to $173,000 in 2018.
As noted in the SHAttered paper, the second and more expensive phase of the attack, which led to a full SHA-1 collision, was far less costly than the 2012 estimate. This phase relied on a cluster of GPUs hosted by Google. Again, using Amazon Web Services pricing as a yardstick, they write:
"Using a p2.16xlarge instance, featuring 16 K80 GPUs and nominally costing $14.4 per hour would cost $560,000 for the necessary 71 device years. It would be more economical for a patient attacker to wait for low spot prices of the smaller g2.8xlarge instances, which feature four K520 GPUs, roughly equivalent to a K40 or a GTX 970. Assuming thusly an effort of 100 device years, and a typical spot price of $0.5 per hour, the overall cost would be $110,000."
Potentially affected applications include SHA-1 signed digital certificates, email PGP/GPG signatures, and GIT.
Google has released a tool to test if a file is part of a collision attack, and has added protections in Gmail and G Suite to detect its PDF collision technique.
The researchers highlight that Linus Torvald's code version-control system Git "strongly relies on SHA-1" for checking the integrity of file objects and commits.
"It is essentially possible to create two Git repositories with the same head commit hash and different contents, say, a benign source code and a backdoored one," they note.
However, Torvalds said on a mailing list yesterday that he's not concerned since "Git doesn't actually just hash the data, it does prepend a type/length field to it", making it harder to attack than a PDF.
"Put another way: I doubt the sky is falling for Git as a source control management tool. Do we want to migrate to another hash? Yes. Is it game over for SHA-1 like people want to say? Probably not," wrote Torvalds.
"I haven't seen the attack details, but I bet: (a) the fact that we have a separate size encoding makes it much harder to do on Git objects in the first place (b) we can probably easily add some extra sanity checks to the opaque data we do have, to make it much harder to do the hiding of random data that these attacks pretty much always depend on."