A 1TB DIMM is coming


In-memory computing is all the rage. But there are problems with large memory servers: DIMMs use a lot of power and board space, and DIMMs are expensive.

When it comes to low latency and high bandwidth, the memory bus is hard to beat. But a terabyte? Diablo Technologies has announced one.

The secret is using flash instead of DRAM. Flash is much denser, much cheaper, and uses much less power.

As a result, you can put 1 terabyte of flash on a standard-size DIMM. This moves in-memory computing from expensive technology for critical apps to a much more affordable technology for many applications.

How did they do it?

The TeraDIMM looks to the system just like a regular DIMM. The form factor and power are the same. There are no changes to the motherboard or applications, only a driver — available for Windows, Linux, and VMware — that makes the device look like either storage or system memory.

Of course, flash memory lacks some very important characteristics of DRAM. It wears out; it takes much longer to write; and it needs specialized controllers to manage all of its issues.

This is the secret sauce of the new product. An ASIC manages the flash and makes it look like either storage or extended main memory.

  • Endurance: The product is designed to handle 10 full capacity writes — 10TB for a 1TB DIMM — every day for five years.

  • Performance: 3-5µs write latencies.

  • Capacity: Multiple modules can be pooled by the driver. The driver can also broadcast writes to multiple modules for availability.
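
To put the endurance figure in perspective, here is a rough back-of-the-envelope sketch of what "10 full capacity writes every day for five years" adds up to. It assumes decimal terabytes, 365-day years, perfectly even wear, and no write amplification; none of those assumptions come from Diablo, and the function names are purely illustrative.

```python
# Back-of-the-envelope endurance arithmetic for the quoted spec.
# Assumptions (not from the vendor): decimal TB, 365-day years,
# perfectly even wear leveling, no write amplification.

DAYS_PER_YEAR = 365


def lifetime_writes_tb(capacity_tb, full_writes_per_day, years):
    """Total data written over the rated life, in terabytes."""
    return capacity_tb * full_writes_per_day * years * DAYS_PER_YEAR


def cycles_per_cell(full_writes_per_day, years):
    """Write cycles each cell would see if wear were spread perfectly evenly."""
    return full_writes_per_day * years * DAYS_PER_YEAR


total_tb = lifetime_writes_tb(capacity_tb=1, full_writes_per_day=10, years=5)
print(f"Lifetime writes: {total_tb:,} TB ({total_tb / 1000} PB)")
print(f"Cycles per cell (ideal): {cycles_per_cell(10, 5):,}")
```

Under these assumptions a 1TB module would absorb 18,250 TB (about 18 PB) of writes over its rated life, with every cell rewritten roughly 18,000 times. Real wear depends on how well the controller spreads writes, so treat this as a ceiling on what the spec implies rather than a measurement.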

Use cases:

  • Virtual machines: There is lots of read-only traffic in VMware. Placing virtual machines in main memory is significantly faster.

  • High-frequency messaging: Low constant latency, even better than PCIe devices.

  • Memcache: Uses main memory as a much bigger cache. Popular in hyper-scale clusters.

The Storage Bits take

There have been non-volatile DIMMs on the market before this. But this is the first one I've seen that has all the elements needed for success: unique technology; economic advantage; top-tier VC and OEM support; and use cases in a rapidly growing part of the market.

No, you probably will not be buying these for your gaming rig. This is intended as an enterprise server class technology, and it will be priced as such.

But this brings new life to the concept of in-memory computing. Imagine 8TB of main memory to begin to see the possibilities.

Comments welcome, as always. How would you use a terabyte — or more — of main memory?

Topics: Storage, Cloud, Hardware, Servers

  • how about price?

    Without knowing the price tag, any opinion is useless. This stuff sounds expensive.

    Also, for what CPUs is it designed? Certainly, Intel's CPUs can't support 1TB DIMM capacities. Not today.

    I believe something like this makes more sense if attached to the PCIe (or DMI, or HyperTransport, or what have you) bus.
    • Just Guessing...

      But from information in specs for old mainframe technologies that used the same addressing concepts but on a MUCH smaller scale, the drivers may implement some "windowing" form of virtual addressing. For example, the driver (which can send physical addresses as data to the chip's controller via the I/O bus) might allocate 1GB pages out of the total as "quarters" of the machine memory for different processors, or for a processor in a specific (VM or "region") address space. The CPU could issue privileged I/O instructions to set the "gigapage" for subsequent accesses, or set different gigapages for execute, read-from and write-to addresses for interpage moves.

      Again, not knowing specifics, something along those lines might work. The applications or virtual machines may THINK their pages are being swapped out and back in, while they are actually just in different parts of the big chip all the time.
  • Fill the ARC

    I would build a ZFS based storage appliance (Nexenta / iXSystems) and use 1TB DIMMs for the ARC. Hello performance!
    • Re: Fill the ARC

      Good idea, but the ARC is by definition volatile. Work is in progress, and almost complete, to make the L2ARC persistent, and with sizes like this, that would be great.
  • 10 Writes per day.

    The problem is that you're still going to have areas of the memory that are used more often than others, negating that 10 per day statistic. Some areas may be written to only once a day, and others 100 or 1,000 times. Those high-use areas will break down first, no matter how little the rest is written to. That is, and has always been, the danger of flash memory.

    So, they're going to have to either design intelligence on the DIMM to make sure writes are spread out across the DIMM, or the motherboard will have to do this. Basically the same self preservation logic that is in SSDs.

    While the possibilities are great, so are the chances for chaos as a server DIMM commits seppuku in the middle of a large transaction.
    • this is unlikely...

      With how this works, I'm pretty confident that they already have write balancing across the cells like SSDs, so I doubt this is an issue. This stuff doesn't work like normal memory.
  • This sounds more like a storage solution.

    As memory, it makes little sense, since it's slower, with a limited life span. Memory is written and read far more frequently than storage. Ten full writes per day would be severely limiting.

    As fast storage, these would be great. You read storage far more often than you write it. Depending on the cost, I could see these making their way into replacement drives. Current SSDs are just too small to use when you handle a large amount of data. A drop-in replacement hard drive that contains 4x 1TB DIMMs would rock. (Again, depending on price.)
  • Can't wait for the day when...

    ...we no longer differentiate between memory and storage. At some point it will all be storage. This looks to be a first step towards that (though I don't see any reason why we couldn't combine flash and RAM into a single address space today).
    • We've been there

      Yes, IIRC the old IBM System 38 featured a flat memory where the disk was an extension of main memory. Made memory management easy - there wasn't any - and it was a popular mid-range system. But we've moved on.

      R Harris
      • 25 Years of Planned Anti-obsolescence...

        (http://www-03.ibm.com/systems/i/) 25 Years of Planned Anti-obsolescence... I am sure that this is the wrong forum for this reply, but what the hay-ho...

        The "old" IBM System 38 became the IBM AS/400, which became the IBM iSeries, which became the "new" IBM i (when IBM can't figure out how to market a product, they change its name). This midrange, highly scalable multi-user application and database server still uses a "flat memory (model) where the disk (is) an extension of main (storage)." It was ingenious back then and still is today!
  • Presumably You Mean "Tebibyte" (TiB) Rather Than "Terabyte" (TB)

    Solid-state memory is normally sized by powers of 2, not powers of 10.
  • Interesting for sure.

    The question is what this memory is worth compared with the next closest memory tier, PCIe-based flash. From a bus-access perspective, an 8-byte access to a DIMM takes ~10 ns, while an 8-byte access over PCIe takes ~250 ns. The read access time of a flash chip is ~25 microseconds, or 25,000 ns. That works out to a less than 1% difference in read latency between the flash DIMM and the PCIe flash device. Since flash chips are page-based devices, accessing flash memory like a DIMM is sub-optimal. Best to use flash as it was designed, as a page-based device. Assuming a page-based device, what is the advantage? Assuming a 2K page, the PCIe bus transfer time of 1 microsecond is still short relative to the 25-microsecond flash access time.

    What use case makes this work better than a PCIe-based flash device? I'm missing the low-level details that make the case for this class of memory being the breakthrough. My guess is that the implementation uses a cache front end to buffer the flash chips, which, even if it is perfect, can only improve latency by 1 microsecond, or 4% at best.

    ~25 microseconds read, ~250 microseconds writes for MLC-1: http://cseweb.ucsd.edu/users/swanson/papers/FAST2012BleakFlash.pdf
    10 ns access time for a DIMM: http://www.tomshardware.com/forum/280837-30-latencies-ddr3
    PCIe bus latency: http://stackoverflow.com/questions/12223395/transaction-size-and-latency-between-cpu-and-ram-ram-and-pcie2-0-16x-device
    40 microseconds latency for Fusion IO: http://www.storagesearch.com/ssd-29.html

    Phase-change memory might be the breakthrough, but for flash as a DIMM I'd like to see more data on which use case is substantially better, and why.
  • Right socket/bus but wrong tech...

    I think Diablo has the socket/bus right, but I think Flash and DRAM are on their last legs and the folks at Crossbar www.crossbar-inc.com are probably working with the future.
  • Someday mechanical storage must die.

    Solid state is just better. As this Pretty Amazing New Stuff (PANS) comes out it brings down the cost of legacy mechanical stuff for us home users. I can fit 32TB of usable RAID storage in my desktop PC now, or buy a Backblaze box with 180TB in it. I am not sure what I would do with all that. More likely to drop $700 on one of those new 1TB Samsung SSDs this year.

    But down the road this sort of thing will just be built into all consumer gear by default as Enterprise PANS will have moved on to something else. Cool beans.
  • Something's off here

    Flash access is 25 microseconds read, according to kjbatx, and 10 times that for write - yet the specs for write latency you state are 3-5 microseconds. Even at that, and assuming an 8 byte wide data bus, that's 200 to 300 kilotransfers per second. Do you mean 3-5 microseconds random access delay - what we normally associate with seek and rotational latency in a spinning disk - or cyclical transfer rate? If the latter, then even at 300 kilotransfers per second, that's 2.4 megabytes per second - horrendously slow. Do you mean 25 nanoseconds? More and clearer specifications are needed before one can evaluate the application utility of this device.
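
The latency arithmetic that several commenters work through above can be checked with a short sketch. The figures are the commenters' own estimates (~10 ns for a DIMM bus access, ~250 ns for PCIe, ~25,000 ns for a raw flash read, 3-5 microseconds quoted write latency, 8 bytes per transfer), not vendor specifications. The model is deliberately naive: it assumes one serialized transfer at a time, with no caching, pipelining, or larger block transfers, and the function names are illustrative.

```python
# Naive latency model using the commenters' estimates (not vendor specs).
# Assumes strictly serialized accesses: no caching, pipelining, or batching.

FLASH_READ_NS = 25_000   # raw flash read, per the comment thread
DIMM_BUS_NS = 10         # 8-byte access over the memory bus
PCIE_BUS_NS = 250        # 8-byte access over PCIe


def read_latency_ns(bus_ns, flash_ns=FLASH_READ_NS):
    """End-to-end read latency: bus access plus raw flash read."""
    return bus_ns + flash_ns


def relative_gap(faster_ns, slower_ns):
    """Fraction by which the slower path exceeds the faster one."""
    return (slower_ns - faster_ns) / slower_ns


def serial_throughput_mb_s(bytes_per_transfer, latency_us):
    """Throughput if each transfer must finish before the next starts.

    Bytes per microsecond is numerically equal to (decimal) MB per second.
    """
    return bytes_per_transfer / latency_us


dimm = read_latency_ns(DIMM_BUS_NS)
pcie = read_latency_ns(PCIE_BUS_NS)
print(f"DIMM path: {dimm} ns, PCIe path: {pcie} ns")
print(f"Gap: {relative_gap(dimm, pcie):.2%}")

for latency_us in (3, 5):
    mb_s = serial_throughput_mb_s(8, latency_us)
    print(f"{latency_us} us per 8-byte transfer: {mb_s:.2f} MB/s")
```

With these inputs the bus choice changes uncached read latency by under 1%, and fully serialized 8-byte writes land in the low single-digit MB/s range, which matches the commenters' numbers. Both results depend heavily on the serialization assumption, which is why the thread asks for clearer specifications.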