Storage is a touchy subject among datacentre managers. Users can't get enough of it, yet it's expensive. Never mind the plummeting cost of raw disk space: when it comes to enterprise-level storage kit, prices go through the roof, as vendors add features that promise to save money both by managing storage with fewer people and by making do with less raw storage.
The main way they do this is via thin provisioning. Yet there's a dark side to thin provisioning that few if any vendors want to talk about. Maybe it's because they don't think it's a problem. Or maybe it's because they don't have an answer.
Thin provisioning is a way of allowing a storage system to report to an operating system that more storage exists than is physically installed. It saves buying more storage than you actually need right now; the alternative is to buy all the storage you might conceivably want for the lifetime of the system, which might be three to five years.
Annual data growth is said by analysts to be in the 50 percent range, so buying all you might need now means buying capacity that's going to remain unused for most of the time. Thin provisioning promises to cut that cost by employing an algorithm that works somewhat like commercial aviation overbooking.
Assume that most people will turn up but a small proportion won't, and the airline will book, say, 105 percent of the seats. The answer to what happens if all the storage passengers turn up is obvious: buy more storage.
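The overbooking idea can be sketched in a few lines of code. This is a hypothetical illustration, not any vendor's implementation: the class names, the 105 percent factor, and the block-level model are all assumptions made for the example. The pool advertises more blocks than it physically has and only backs a block with real space on first write.

```python
# Hypothetical sketch of thin-provisioning "overbooking".
# The pool advertises more capacity than physically exists and only
# consumes real space when a block is first written.

class ThinPool:
    def __init__(self, physical_blocks, overbook_factor=1.05):
        self.physical_blocks = physical_blocks
        # Advertised capacity exceeds physical, like booking 105% of seats.
        self.advertised_blocks = int(physical_blocks * overbook_factor)
        self.backed = set()  # virtual block numbers given real space

    def reported_capacity(self):
        # What the operating system is told it has.
        return self.advertised_blocks

    def write(self, virtual_block):
        if virtual_block >= self.advertised_blocks:
            raise ValueError("write beyond advertised capacity")
        if virtual_block not in self.backed:
            if len(self.backed) >= self.physical_blocks:
                # All the "passengers" turned up at once.
                raise RuntimeError("physical pool exhausted - buy more disk")
            self.backed.add(virtual_block)

pool = ThinPool(physical_blocks=100)
print(pool.reported_capacity())  # 105: more than physically exists
```

As long as hosts write less than the physical total, nobody notices; the day they don't, the pool throws the storage equivalent of a boarding dispute.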
But what if some of the passengers get off? I can't torture this metaphor too much more, but that's effectively what happens when users delete files from their network storage. For example, someone might dump the contents of their hard disk onto the network as a backup before changing laptops. Afterwards they delete that 250GB disk dump. Or what about Microsoft Exchange? It creates logfiles by the dozen and deletes them afterwards.
But what happens to the deleted data? To the OS, the file pointers have been erased, so from Windows' point of view the data is gone; but Windows doesn't know it's running on a SAN. The storage system, which operates at block level, has no way of knowing that the data is no longer in use. So a system that looks 90 percent full and requires expensive expansion may in fact contain a significant proportion of deleted data.
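The mismatch between the two layers can be made concrete with a toy model. Everything here is illustrative (hypothetical class and file names, a 250-block "disk dump" standing in for the 250GB example above): the filesystem tracks files, the array tracks blocks, and deleting a file removes only the filesystem's pointer, so the array's utilisation never goes down.

```python
# Illustrative sketch: deletion at filesystem level is invisible to
# the block-level array beneath it. All names are hypothetical.

class BlockArray:
    """The SAN's view: which blocks have ever been written."""
    def __init__(self):
        self.in_use = set()

    def write(self, block):
        self.in_use.add(block)  # array marks the block as allocated

    def utilisation(self):
        return len(self.in_use)

class Filesystem:
    """The OS's view: files mapped to lists of blocks."""
    def __init__(self, array):
        self.array = array
        self.files = {}  # file name -> blocks it occupies

    def create(self, name, blocks):
        for b in blocks:
            self.array.write(b)
        self.files[name] = list(blocks)

    def delete(self, name):
        # Only the pointer goes; the array is never informed.
        del self.files[name]

array = BlockArray()
fs = Filesystem(array)
fs.create("disk_dump.img", blocks=range(250))
fs.delete("disk_dump.img")
print(len(fs.files))        # 0   - the OS thinks the space is free
print(array.utilisation())  # 250 - the array still thinks it's in use
```

Nothing in the `delete` path touches the array, which is exactly the gap the article describes: no message travels down the stack to say "these blocks are free again".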
The problem is not new, but then solutions aren't that thick on the ground either. So next time your storage vendor sidles up to you and mutters about the joys of thin provisioning, ask what happens to deleted data. Not zeroed-out data -- few OSes zero data on deletion -- but deleted data.