SAP has made headlines with its 100 percent in-memory database, HANA. In fact, as reported by ZDNet's Rachel King only yesterday, the company announced that it will offer HANA as a managed cloud service. HANA has made a big impression, to be sure.
But HANA requires the entire database to be in memory, and while memory prices have declined dramatically, memory still commands a huge premium over disk, terabyte for terabyte. Plus, a single server can accommodate only so much memory in the first place, so petabyte-scale data work on HANA can require huge hardware expenditures.
Teradata sees the value of the in-memory database movement, but the company likely can't ask its customers to take the gold-plated approach of using memory exclusively, with disk storage present only as a fail-safe. Further, the company has studied the probabilities, and found that after a certain percentage of your data is stored in memory instead of disk, the returns quickly diminish. In fact, Teradata found that 43 percent of all IO (input/output) hits only 1 percent of all disk cylinders, and that 94 percent of IO hits only 20 percent of all cylinders. Talk about your 80-20 rules!
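To see how quickly those returns diminish, here is a small back-of-the-envelope sketch in Python. It uses only the two figures quoted above plus the trivial endpoints (0 percent of cylinders serves 0 percent of IO; 100 percent serves all of it). Treating a share of disk cylinders as a stand-in for a share of data held in memory is my simplifying assumption, not something Teradata has stated.

```python
# (percent of cylinders, cumulative percent of IO served).
# The two middle points are the figures Teradata quoted; the endpoints are trivial.
# Assumption: share of cylinders is a rough proxy for share of data kept in memory.
points = [(0.0, 0.0), (1.0, 43.0), (20.0, 94.0), (100.0, 100.0)]


def marginal_gains(points):
    """Percentage points of IO gained per extra percentage point of storage, per segment."""
    gains = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        gains.append((y1 - y0) / (x1 - x0))
    return gains


gains = marginal_gains(points)
for (x0, _), (x1, _), gain in zip(points, points[1:], gains):
    print(f"{x0:g}% -> {x1:g}% of cylinders: {gain:.3f} points of IO per point of storage")
```

The first 1 percent of cylinders earns 43 points of IO per point of storage; the next 19 percent earn roughly 2.7 each; the final 80 percent earn well under a tenth of a point each.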
With that in mind, the company is announcing the addition of "Intelligent Memory" functionality to its venerable database appliances. Intelligent Memory will see to it that the most frequently used (the "hottest") data in a database stays resident in a special extended memory region and infrequently queried ("cold") data stays on disk. The determination of which data is hot and which is cold is updated dynamically, and Teradata moves data in and out of RAM accordingly, at opportune times when processing activity on the cluster is low.
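The general idea of hot/cold placement can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Teradata's implementation: the class name `HotColdTier`, the notion of fixed "memory slots", and the `cluster_is_idle` flag are all hypothetical, invented here to illustrate counting accesses and deferring data movement until the system is quiet.

```python
from collections import Counter


class HotColdTier:
    """Toy model (hypothetical): keep the most frequently accessed blocks in memory."""

    def __init__(self, memory_slots):
        self.memory_slots = memory_slots   # how many blocks fit in the memory tier
        self.access_counts = Counter()     # running access frequency per block
        self.in_memory = set()             # blocks currently placed in memory

    def record_access(self, block_id):
        """Track usage continuously; this never moves data by itself."""
        self.access_counts[block_id] += 1

    def rebalance(self, cluster_is_idle):
        """Recompute placement, but only when the cluster is idle (assumed policy)."""
        if not cluster_is_idle:
            return self.in_memory
        hottest = self.access_counts.most_common(self.memory_slots)
        self.in_memory = {block_id for block_id, _ in hottest}
        return self.in_memory


tier = HotColdTier(memory_slots=2)
for block in ["a", "b", "a", "c", "a", "b", "d"]:
    tier.record_access(block)

busy_placement = set(tier.rebalance(cluster_is_idle=False))  # nothing moves while busy
hot = tier.rebalance(cluster_is_idle=True)                   # "a" and "b" are promoted
```

Blocks "a" and "b" are accessed most often, so they end up in the memory tier, while "c" and "d" stay on disk.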
Combination of approaches
While this may at first sound like a caching scheme, that's not the case. A cache typically moves recently used data into RAM, rather than frequently used data. That's a simpler approach, but it's also valuable. Teradata employs a caching approach as well, in what it calls the FSG (File Segment) cache. And Teradata is smart enough to make sure that no data kept in the FSG cache will be moved into Intelligent Memory, and vice versa.
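The difference between "recent" and "frequent" is easiest to see with a one-off scan. The sketch below compares a textbook least-recently-used (LRU) cache with a simple most-frequently-used placement. Both functions are generic illustrations I wrote for this purpose; neither is meant to represent how the FSG cache or Intelligent Memory actually works internally.

```python
from collections import Counter, OrderedDict


def lru_contents(accesses, capacity):
    """Final contents of a textbook LRU cache after replaying the accesses."""
    cache = OrderedDict()
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[key] = True
    return set(cache)


def most_frequent(accesses, capacity):
    """The `capacity` keys with the highest access counts."""
    counts = Counter(accesses)
    return {key for key, _ in counts.most_common(capacity)}


# Two blocks are hit repeatedly, then a one-off scan touches two cold blocks.
accesses = ["hot1", "hot2"] * 5 + ["scan1", "scan2"]

lru_result = lru_contents(accesses, capacity=2)    # the scan pushes out the hot blocks
freq_result = most_frequent(accesses, capacity=2)  # the hot blocks stay put
```

With room for only two blocks, the recency-based cache ends up holding the two scanned blocks, while the frequency-based placement still holds the two genuinely hot ones. That is why the two mechanisms complement rather than duplicate each other.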
The 14.0 release of Teradata added columnar storage technology (whereby all the values for a column/field are stored contiguously, rather than all the values for a row/record/item being stored together). This allows for high rates of compression, as column values are often close in value, and certainly in order of magnitude. In the upcoming Teradata 14.10 release, Intelligent Memory will recognize columnar storage and maintain it. This means that high rates of compression will be maintained as well, allowing more data to fit into Intelligent Memory. It also means that if only certain columns' data is "hot", and that data is in columnar storage, then only those columns' data will get moved into Intelligent Memory, yielding further efficiencies.
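For readers who haven't worked with columnar storage, here is a minimal Python sketch of why it compresses well and why it allows column-level placement. The sample rows are made up, and run-length encoding is used purely as a simple, well-known illustration; I am not claiming it is the compression scheme Teradata uses.

```python
from itertools import groupby

# Hypothetical sample data in row-oriented form: one dict per record.
rows = [
    {"region": "EU", "status": "open", "qty": 10},
    {"region": "EU", "status": "open", "qty": 12},
    {"region": "EU", "status": "closed", "qty": 11},
    {"region": "US", "status": "closed", "qty": 11},
]


def to_columns(rows):
    """Pivot rows into columns so each column's values sit contiguously."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded_region = run_length_encode(columns["region"])  # [("EU", 3), ("US", 1)]

# If only "qty" were hot, only that column would need to be held in memory.
hot_columns = {name: columns[name] for name in ("qty",)}
```

Four "region" values shrink to two (value, count) pairs because similar values sit next to each other, and the hot "qty" column can be handled on its own without dragging the rest of each row along.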
You get in-memory, and you get in-memory, and you get in-memory
Interestingly, while the FSG cache is only available on certain Teradata editions, Intelligent Memory is available on all of them. So all customers will benefit, and since new, high-memory cluster nodes (or a cloud service) are not required, even smaller customers can look forward to better performance.
The combination of columnar compression, caching, and in-memory placement of hot data means that Teradata's approach to the in-memory craze is reasoned and has customers' interests in mind. I might prefer to think of the feature as "Reasonable Memory", but I guess that doesn't have quite the same ring to it.