
Why do you never delete anything?

Written by Manek Dubash, Contributor

When data was stored on paper (remember those days? It seems so long ago), information was stored for as long as we needed it, then discarded. This was because new filing cabinets took up more floor space, which was a limited resource.

Now we never delete anything. I was talking to someone recently who boasted that his emails went back 20 years, while the oldest I think I can locate is only 15 years old. Why on earth am I keeping this stuff? And even if you're a business, what relevance will an email or document that's more than 10 years old still retain?

There are compliance reasons for keeping some data, and IT vendors like to use those reasons to plug home the assumption that deleting anything is close to heresy, not to mention illegal. But so often, it isn't.

Unless you're in the professional embarrassment business, most old emails and documents simply aren't worth keeping most of the time, as the circumstances under which they were generated have changed. Either the individuals who wrote them are no longer in the same position or company, or the technology, legal environment or market has moved on, or the recipient is no longer interested in that data.

This doesn't apply to every document, of course. But it does apply to a huge number, and the problem we have is deciding which is which. So we don't do that. Instead, we've become very clever at minimising the volumes of data we store from both electronic and physical perspectives.

As storage gets denser with each technological generation, each document effectively takes up less floor space. Add the capabilities of technologies such as compression, thin provisioning and deduplication, and the floor space per megabit shrinks further.
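To make that concrete, here is a minimal sketch of how deduplication and compression combine to shrink what actually lands on disk. It assumes whole-document deduplication keyed on a SHA-256 hash and zlib compression; the function name `stored_size` and the sample mailbox are hypothetical, and real storage systems typically work on blocks rather than whole documents.

```python
import hashlib
import zlib


def stored_size(documents):
    """Bytes needed to hold the documents after dedup and compression."""
    unique = {}
    for doc in documents:
        # Identical content hashes to the same key, so it is kept only once.
        digest = hashlib.sha256(doc).hexdigest()
        unique.setdefault(digest, doc)
    # Compress each surviving copy and total the result.
    return sum(len(zlib.compress(doc)) for doc in unique.values())


# Hypothetical mailbox: one attachment sent three times, plus a short reply.
attachment = b"Quarterly report. " * 500
mailbox = [attachment, attachment, attachment, b"Thanks, got it."]

raw = sum(len(doc) for doc in mailbox)
print(f"raw bytes: {raw}, stored bytes: {stored_size(mailbox)}")
```

Note that none of this decides whether the attachment was worth keeping in the first place; it only makes keeping it cheaper.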

But will there come a point where even these technologies and methodologies reach breaking point? We assume not. We proceed on the basis that, despite the tsunami of data now flooding company networks and storage systems, somehow that data will be accommodated. Right now, that works because Moore's Law still holds when it comes to storage, because it's cheaper to buy more space than to insist that employees spend time deleting unnecessary emails, and because the 'never delete anything just in case' mentality is a shibboleth that seems unchallengeable.

Yet almost every time, you will never look at that data again. As a document makes its way down the tiers from hot to archive, residing eventually as a sequence of cold bits on an inert tape somewhere in a vault, chances are that it will never be disturbed. You can institute data retention policies and bellyache to CIOs and upwards all you like, but this is the fate of almost all end user data.

Partly it's because everyone colludes in this fallacy that all data is useful. CEOs believe that nothing should be deleted just in case the lawyers demand it, CIOs believe it because they know how the CEO feels about it, IT managers because the volumes of data under their command are burned onto their CV -- the more the better -- and end users won't delete stuff because there's nothing in it for them.

If there were a financial imperative to reduce the volumes stored, and if it could easily be determined which data is worth storing and which isn't, that would help enormously.

It would mean that future generations wouldn't spend years sifting through petabytes of data to find a nugget or two that may or may not be in there somewhere. It would discourage the squirrel mentality from which too many of us suffer in the wider world too, and ultimately it would save money because you wouldn't have to buy so much storage. So what needs to happen is that providers of archive stores need to increase their prices, CIOs need to insist that richer metadata -- data about data -- is attached to every document, and we need to think more about data deletion as a discipline.

Oh yes, and we need to stop using Microsoft Outlook, which stores emails in vast monolithic PST files that are not only vulnerable if a single bit goes bad but, as they grow, increasingly contain data not worth the disk space they occupy.
