Why do you never delete anything?

TOPICS: Networking

When data was stored on paper (remember those days? It seems so long ago), information was stored for as long as we needed it, then discarded. This was because new filing cabinets took up more floor space, which was a limited resource.

Now we never delete anything. I was talking to someone recently who boasted that his emails went back 20 years, while the oldest I think I can locate is only 15 years old. Why on earth am I keeping this stuff? And even if you're a business, what relevance will an email or document that's more than 10 years old still retain?

There are compliance reasons for keeping some data, and IT vendors like to use those reasons to plug home the assumption that deleting anything is close to heresy, not to mention illegal. But so often, it isn't.

Unless you're in the professional embarrassment business, most old emails and documents simply aren't worth keeping, as the circumstances under which they were generated have changed. Either the individuals who wrote them aren't in the same positions or company, or the technology, legal environment, or market has moved on, or the recipient is no longer interested in that data.

This doesn't apply to every document, of course. But it does apply to a huge number, and the problem we have is deciding which is which. So we don't do that. Instead, we've become very clever at minimising the volumes of data we store from both electronic and physical perspectives.

As storage gets denser with each technological generation, each document effectively takes up less floor space. Add the capabilities of technologies such as compression, thin provisioning and deduplication, and the floor space per megabit shrinks further.
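To make the effect concrete, here is a minimal sketch of deduplication followed by compression. It is an illustration under stated assumptions, not how any particular storage product works: the `stored_size` function and the sample mailbox are hypothetical, and real systems typically deduplicate at block level rather than whole-document level.

```python
import hashlib
import zlib


def stored_size(documents):
    """Return (raw_bytes, stored_bytes) for a list of byte strings.

    Identical documents are kept once (deduplication keyed on their
    SHA-256 digest), and each unique document is compressed with zlib.
    """
    raw_bytes = sum(len(doc) for doc in documents)
    unique = {}
    for doc in documents:
        digest = hashlib.sha256(doc).hexdigest()
        if digest not in unique:
            unique[digest] = zlib.compress(doc)
    stored_bytes = sum(len(blob) for blob in unique.values())
    return raw_bytes, stored_bytes


# Hypothetical mailbox: one attachment forwarded to 50 people, plus one short note.
attachment = b"Quarterly sales report. " * 400
mailbox = [attachment] * 50 + [b"Lunch on Friday?"]
raw, stored = stored_size(mailbox)
print(f"raw: {raw} bytes, stored: {stored} bytes")
```

The 50 copies collapse into one, and that one copy compresses well because it is repetitive. None of this tells you whether the attachment was worth keeping in the first place.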

But will there come a point where even these technologies and methodologies reach breaking point? We assume not. We proceed on the basis that, despite the tsunami of data now flooding company networks and storage systems, somehow that data will be accommodated. Right now, that works because Moore's Law still holds when it comes to storage, because it's cheaper to buy more space than to insist that employees spend time deleting unnecessary emails, and because the 'never delete anything just in case' mentality is a shibboleth that seems unchallengeable.

Yet you will almost never look at that data again. As a document makes its way down the tiers from hot to archive, residing eventually as a sequence of cold bits on an inert tape somewhere in a vault, chances are that it will never be disturbed. You can institute data retention policies and bellyache to CIOs and upwards all you like, but this is the fate of almost all end user data.

Partly it's because everyone colludes in this fallacy that all data is useful. CEOs believe that nothing should be deleted just in case the lawyers demand it, CIOs believe it because they know how the CEO feels about it, IT managers because the volumes of data under their command are burned onto their CV -- the more the better -- and end users won't delete stuff because there's nothing in it for them.

If there were a financial imperative to reduce the volumes stored, and if it could easily be determined which data is worth storing and which isn't, that would help enormously.

It would mean that future generations wouldn't spend years sifting through petabytes of data to find a nugget or two that may or may not be in there somewhere. It would discourage the squirrel mentality from which too many of us suffer in the wider world too, and ultimately it would save money because you wouldn't have to buy so much storage. So what needs to happen is that providers of archive stores need to increase their prices, CIOs need to insist that richer metadata -- data about data -- is attached to every document, and we need to think more about data deletion as a discipline.
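As a rough sketch of what richer metadata might buy you, the example below attaches a retention date and a legal-hold flag to each document and lists the ones that could be deleted. Everything here is an assumption for illustration: the `DocumentMeta` fields, the `deletion_candidates` function, and the 10-year default (borrowed from the rule of thumb earlier in this post) are hypothetical, and a real policy would be set by your compliance people, not by a blog post.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DocumentMeta:
    name: str
    created: date
    retain_until: Optional[date] = None  # None: no explicit retention period
    legal_hold: bool = False             # True: never delete automatically


def age_in_years(created: date, today: date) -> int:
    """Whole years elapsed between created and today."""
    before_anniversary = (today.month, today.day) < (created.month, created.day)
    return today.year - created.year - int(before_anniversary)


def deletion_candidates(docs: List[DocumentMeta], today: date,
                        max_age_years: int = 10) -> List[DocumentMeta]:
    """Return documents that are safe to propose for deletion."""
    candidates = []
    for doc in docs:
        if doc.legal_hold:
            continue  # lawyers want it: keep regardless of age
        if doc.retain_until is not None:
            if today > doc.retain_until:
                candidates.append(doc)  # explicit retention period has expired
        elif age_in_years(doc.created, today) >= max_age_years:
            candidates.append(doc)  # no explicit period: fall back to age
    return candidates


# Hypothetical archive contents.
archive = [
    DocumentMeta("old-memo", date(2000, 1, 1)),
    DocumentMeta("contract", date(2000, 1, 1), retain_until=date(2030, 1, 1)),
    DocumentMeta("disputed-invoice", date(1999, 1, 1), legal_hold=True),
    DocumentMeta("recent-report", date(2012, 6, 1)),
]
to_delete = deletion_candidates(archive, today=date(2013, 1, 1))
print([doc.name for doc in to_delete])
```

The point of the sketch is that the decision becomes mechanical once the metadata exists. The hard part, as ever, is getting it attached to every document in the first place.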

Oh yes, and we need to stop using Microsoft Outlook, which stores emails in vast monolithic PST files that are not only vulnerable if a single bit goes bad but, as they grow, increasingly contain data not worth the disk space they occupy.

Manek Dubash

About Manek Dubash

Editor, journalist, analyst, presenter and blogger.

As well as blogging and writing news & features here on ZDNet, I work as a cloud analyst with STL Partners, and write for a number of other news and feature sites.

I also provide research and analysis services, video and audio production, white papers, event photography, voiceovers, event moderation, you name it...

Back story
An IT journalist for 25+ years, I worked for Ziff-Davis UK for almost 10 years on PC Magazine, reaching editor-in-chief. Before that, I worked for a number of other business & technology publications and was published in national and international titles.

  • Completely agree with this, but from the end user's perspective (especially at home where you really don't need all that stuff) another point worth mentioning is the sheer size of bundled storage.

    In the late '90s my family had one PC with 2x750MB hard drives. Wow, we really made sure no useless old stuff was stored. Then we upgraded to one with a 20GB hard drive. Much less of a problem, but it needed to be managed. At uni I had a laptop with 30GB - enormous! But, by that point I had downloaded video files and large game installations. My current laptop is three years old with a total of 500GB storage - I have no idea what's on there but it doesn't come close to being full. So why manage what's on there?
  • I think this might be a fair strategy for a single end user - although after a while some management is likely to be needed - but for corporations, it just doesn't work, as so much of the innovation in storage tech goes into managing stuff that could/should have been deleted. This adds up to a lot of cash outlay: right now it might cost less than actually managing the data rather than the storage, but it'll all come to a sticky end, you mark my words!
    Manek Dubash