As documented in DRAM error rates: Nightmare on DIMM street, DRAM error rates are hundreds to thousands of times higher than thought, with a mean of 3,751 correctable errors per DIMM per year. That assumes your DIMM has error-correcting code (ECC) to correct those errors. If not:
Everything is fine until the data corruption means a missed memory reference, an incorrect value, or a flipped bit in a file being written to disk. What you see is a "file not found" or "file not readable" message or, worse yet, silent data corruption, or even a system crash. And nothing that says "memory error."
In Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors, researchers Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Konrad Lai, and Onur Mutlu, all of CMU, and Chris Wilkerson of Intel Labs found that commodity DRAM chips are vulnerable to disturbance errors. Moore's Law has shrunk DRAM cells and made them more susceptible to interference from adjacent current flows.
By reading from the same address in DRAM, we show that it is possible to corrupt data in nearby addresses. More specifically, activating the same row in DRAM corrupts data in nearby rows. We demonstrate this phenomenon on Intel and AMD systems using a malicious program that generates many DRAM accesses. We induce errors in most DRAM modules (110 out of 129) from three major DRAM manufacturers. From this we conclude that many deployed systems are likely to be at risk.
The root cause of the errors: rapid voltage fluctuations on the wordline of a row of memory cells. The wordline voltage is raised in order to read bits in the row of cells.
A program that issues as few as 139,000 reads to a specific wordline can induce an error. As many as 1 in every 1700 cells is susceptible to such errors.
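To make the activation threshold concrete, here is a toy Python model. It is a sketch under loud assumptions, not a physical model: each activation of a row is treated as restoring that row and adding one unit of disturbance to the rows on either side, and a victim row is marked as flipped once it absorbs 139,000 disturbances between refreshes. Real cells vary widely in how many activations they tolerate. The names `ToyDram` and `HAMMER_THRESHOLD` are hypothetical, invented for illustration. The usage alternates between two rows because on real hardware repeated reads of one row would be served from the already-open row buffer; the paper's test program alternates reads between two addresses in different rows and flushes them from the cache so each read reopens a row.

```python
# Toy model of DRAM disturbance ("row hammer") errors. NOT a circuit model.
# Assumption: a victim row flips a bit once its neighbours have been activated
# HAMMER_THRESHOLD times since the victim was last refreshed.

HAMMER_THRESHOLD = 139_000  # smallest activation count reported in the paper

class ToyDram:
    def __init__(self, num_rows, threshold=HAMMER_THRESHOLD):
        self.threshold = threshold
        # disturbance[r] = neighbour activations absorbed by row r since its last refresh
        self.disturbance = [0] * num_rows
        self.flipped = set()  # rows that have suffered at least one bit flip

    def activate(self, row):
        """Open and close `row`: its own cells are restored, its neighbours disturbed."""
        self.disturbance[row] = 0
        for victim in (row - 1, row + 1):
            if 0 <= victim < len(self.disturbance):
                self.disturbance[victim] += 1
                if self.disturbance[victim] >= self.threshold:
                    self.flipped.add(victim)

    def refresh_all(self):
        """Periodic refresh of every row clears accumulated disturbance."""
        self.disturbance = [0] * len(self.disturbance)

# Usage: alternate two aggressor rows so every access reopens a row.
dram = ToyDram(num_rows=64)
for _ in range(HAMMER_THRESHOLD):
    dram.activate(10)  # aggressor X
    dram.activate(20)  # aggressor Y, a different row
print(sorted(dram.flipped))  # [9, 11, 19, 21]
```

Only the rows adjacent to the aggressors are damaged, which matches the paper's observation that activating one row corrupts data in nearby rows. Calling `refresh_all()` often enough, before any neighbour reaches the threshold, prevents the flips in this model.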
That may not seem like a lot, but in an 8Gbit chip that means more than 4 million bits are susceptible to malicious bit flipping. Since most systems don't use ECC, they have no way of knowing they are being attacked.
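The "more than 4 million" figure is simple division: the chip's bit count over 1,700. Whether "8Gbit" is read as 8 × 10^9 or 8 × 2^30 bits changes the result slightly, but both readings land above 4 million. Note that 1 in 1,700 is the worst case the paper reports, not a typical rate. The helper name `susceptible_cells` is just for illustration.

```python
# Back-of-the-envelope check: susceptible cells in an 8 Gbit DRAM chip,
# assuming the worst-case rate of 1 susceptible cell per 1,700 cells.
CELLS_PER_SUSCEPTIBLE = 1700

def susceptible_cells(bits):
    """Whole number of susceptible cells at the worst-case rate."""
    return bits // CELLS_PER_SUSCEPTIBLE

chip_sizes = {
    "8 Gbit as 8e9 bits": 8 * 10**9,
    "8 Gbit as 8 x 2^30 bits": 8 * 2**30,
}
for label, bits in chip_sizes.items():
    print(f"{label}: ~{susceptible_cells(bits):,} susceptible cells")
# 8 Gbit as 8e9 bits: ~4,705,882 susceptible cells
# 8 Gbit as 8 x 2^30 bits: ~5,052,902 susceptible cells
```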
The authors suggest an inexpensive solution they call probabilistic adjacent row activation (PARA):
The key idea of PARA is simple: every time a row is opened and closed, one of its adjacent rows is also opened (i.e., refreshed) with some low probability. If one particular row happens to be opened and closed repeatedly, then it is statistically certain that the row's adjacent rows will eventually be opened as well. The main advantage of PARA is that it is stateless. PARA does not require expensive hardware data-structures to count the number of times that rows have been opened or to store the addresses of the aggressor/victim rows.
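The PARA idea fits in a few lines. Below is a minimal Python sketch, assuming a memory controller that calls `para_close` each time a row is closed and picks either neighbour with equal probability when the coin comes up heads. `para_close`, `worst_exposure`, and `THRESHOLD` are hypothetical names, and the simulation is a toy: it ignores the chip's regular periodic refresh and simply measures the longest run of aggressor activations a neighbouring row absorbs without being refreshed.

```python
import random

THRESHOLD = 139_000  # activations between refreshes that can flip a bit (paper's minimum)

def para_close(row, num_rows, p, rng):
    """PARA's decision when `row` is closed: return an adjacent row to refresh, or None.

    Stateless: no per-row counters or aggressor/victim address tables are used,
    just a biased coin with probability p and a fair pick between the two neighbours.
    """
    if rng.random() < p:
        neighbour = row + rng.choice((-1, 1))
        if 0 <= neighbour < num_rows:
            return neighbour
    return None

def worst_exposure(aggressor, activations, num_rows, p, seed=0):
    """Toy model: hammer one row and return the longest run of aggressor
    activations either neighbour absorbed without being refreshed.
    Regular periodic refresh is deliberately ignored."""
    rng = random.Random(seed)
    since_refresh = {aggressor - 1: 0, aggressor + 1: 0}
    worst = 0
    for _ in range(activations):
        for victim in since_refresh:
            since_refresh[victim] += 1
            worst = max(worst, since_refresh[victim])
        refreshed = para_close(aggressor, num_rows, p, rng)
        if refreshed is not None:
            since_refresh[refreshed] = 0
    return worst

without_para = worst_exposure(10, 500_000, num_rows=64, p=0.0)
with_para = worst_exposure(10, 500_000, num_rows=64, p=0.001)
print(without_para >= THRESHOLD, with_para >= THRESHOLD)
```

Without PARA, each neighbour absorbs all 500,000 activations. With p = 0.001 the occasional extra refresh keeps the longest unrefreshed run far below 139,000, and the controller never had to remember which rows were being hammered.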
PARA won't eliminate adjacent bit flips - it is probabilistic, not deterministic - but it can be tuned to make them rare. It may even help with the much-higher-than-specified error rates seen in the field.
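Why is PARA tunable? A rough estimate, assuming each closing of the aggressor refreshes a particular neighbour with probability p/2 (either neighbour chosen with equal chance): the chance that the neighbour goes N consecutive activations with no PARA refresh is (1 - p/2)^N. This simplified sketch ignores regular refresh and attacks that hammer both sides of a victim; `escape_probability` is a hypothetical helper name.

```python
def escape_probability(p, n_threshold):
    """Chance that a given victim row receives no PARA refresh during
    n_threshold consecutive activations of one adjacent aggressor row,
    assuming each activation refreshes that particular neighbour with probability p / 2."""
    return (1 - p / 2) ** n_threshold

for p in (0.0001, 0.001, 0.01):
    print(f"p = {p}: {escape_probability(p, 139_000):.1e}")
```

Because p sits inside an exponent, a tenfold increase in p drives the escape probability down by many orders of magnitude, while the cost, extra row activations, grows only linearly with p. That is the trade-off a vendor would tune.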
Intel has been working on this problem for years, but today DRAM from all three major vendors is prone to this attack. The researchers found that over 80% of the DRAM modules they tested had disturbable cells.
Yet again we see the storage technologies that underpin our digital civilization are much less robust than we expect or need. Industry needs to get serious about digital storage quality.
The good news: the solution proposed in the paper seems to combine low cost with good effectiveness. Let's hope vendors are listening!
Comments welcome, as always. Any bets on how long before we see a DRAM disturbance attack?