Your social data is doomed, and don't count on Facebook to save you

Your status updates, your uploaded photos, your videos, all of it is going to be inaccessible sometime in the future. Not just by you, but by your descendants as well.


Facebook, Twitter, WhatsApp, Instagram, Pinterest. All of these and nameless others have become tied into our digital lives more than we can possibly imagine.

Despite their importance to us on a day-to-day basis -- and our spending countless hours per year sharing and reading shared materials on our devices -- we have given little thought to the permanency of those materials.

Virtually everything related to our daily lives that our civilization produces as a record of our existence is now stored in digital form.

The content we consume as audio, video and printed word, as well as all of our essential communications -- the bulk of it is dispersed across huge storage arrays owned by public cloud service providers.

Sure, there is some data that is kept privately -- on local device storage on consumer PCs, phones and tablets. Some people maintain devices specifically for backup. Enterprises have significant storage assets because there are regulatory controls that require them to keep certain kinds of data for a certain length of time.

They also have a vested interest in business continuity.

But that privately held data, while significant, pales in comparison to the amount of data that is generated and stored in the public cloud.

I have already written about the real potential of our not being able to retrieve and re-assemble data structures fifty or a hundred years from now, and the risk that we may inadvertently lose significant cultural works created in digital formats.

But what about individual records of our existence?

As we rely more and more on cloud storage at public service providers, succumbing to the convenience and the utility of having access to one's "lifestream" from any endpoint device, we also put the collective records of our existence in the hands of third parties that do not necessarily have long-term data preservation as a core priority.

Our data at public service providers like Facebook and Google has a single purpose -- to be monetized in exchange for our being able to share that data with others. That is the contract, and it is well understood.

The data has significance to the provider only if it can be monetized in some way. So status updates, tagging, photographs, videos and the like will only be stored long term if they have value to the provider.

It's easy to understand how something that is as much as a year old, or perhaps five years old, might still be of interest to a provider like Facebook. But ten-year-old data? Twenty? Difficult to say.

Unless providers have an implicit SLA that is defined according to levels of paid service -- which doesn't really exist today because Facebook, like many others, is strictly a free service to end-users -- there is no guarantee of data retention at all, even on a relatively short-term basis that is measured in a single decade.

And as devices become more and more capable of recording higher density and more complex data formats such as high-definition or even 4K video, and public clouds start making those formats shareable, public cloud providers will have to invest more heavily in their storage infrastructure.

This is going to cause a significant change in how public cloud providers classify data, and it is also going to drive heavy commoditization of the storage industry itself. Hyperscale providers will look for more and more ways to store that bulk data cheaply instead of using proprietary technology such as SANs.

But even with heavily commoditized technology such as JBODs driving costs down, the exponential growth of publicly stored data is going to consign a lot of material to the bit bucket. There's just no way we can store all of it forever.

Snapchat at least operates on the principle that it can't save anything permanently. Everything it does is considered disposable.

So let's face it. Your social data is doomed. Your status updates, your uploaded photos, your videos, all of it is going to be inaccessible sometime in the future. Not just by you, but by your descendants as well.

If you really want to preserve this stuff, then you are going to have to take steps to maintain it yourself.

Part of the problem is going to be deciding what data formats to use, and just how much of that data you care about being accurately reconstructed.

A folder full of JPEGs or even MPEG-4 videos may be easy to move between storage providers over the course of multiple decades, as long as you diligently keep track of this stuff and you maintain multiple copies.
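One way to keep track of such a folder is to record a checksum for every file, then re-check each copy against that record whenever the files move to new storage. What follows is only a minimal sketch in Python using the standard library; the function names and the shape of the manifest are assumptions made for illustration, not any provider's tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copy(manifest: dict[str, str], copy_root: Path) -> dict[str, list[str]]:
    """Report files that are missing or altered in another copy."""
    copy = build_manifest(copy_root)
    return {
        "missing": sorted(set(manifest) - set(copy)),
        "changed": sorted(
            name for name in manifest
            if name in copy and copy[name] != manifest[name]
        ),
    }
```

Building the manifest once from the master folder and running `compare_copy` against each backup would flag a file that silently went missing or was corrupted in transit. It does nothing about formats becoming unreadable, which is a separate problem.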

But reconstructing your Facebook or Twitter stream? Assuming you always have the capability to export that data over the lifetime of those services, how would you reconstruct it anyway?

Let's assume you have the full XML of ten years' worth of Facebook data that you yourself generated. There are no good tools that exist today that would allow you to view that data offline or at a different provider in the exact same context as using Facebook itself.

And Facebook, like many public cloud social networking services, constantly changes the way its data structures are stored as it adds new features. Facebook data isn't like Microsoft Office documents, PDFs, or even OpenDocument Format files, where the source data is expected to persist and remain restorable and viewable over long periods of time as long as copies are kept.

Facebook also doesn't have to publicly document the way its internal data is stored because it isn't required to share its data with anyone. It only shares its data when it sees a benefit to doing so.

If you took snapshots of your Facebook stream at one-year intervals, chances are that the XML definitions are going to look completely different each time.

You would need tools capable of reconstructing different versions of that feed. And Facebook is just one of the public cloud services people use every single day.
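To give a rough idea of what such tools would have to do, here is a minimal Python sketch that picks a parser according to a version marker in each snapshot and normalizes the result into one common record. The two XML layouts, the `version` attribute, and every element name are invented for this example; Facebook documents no such schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Post:
    """Common shape that every snapshot version is normalized into."""
    timestamp: str
    text: str


def read_v1(root: ET.Element) -> list[Post]:
    # Hypothetical older layout: <status time="...">text</status>
    return [Post(s.get("time", ""), s.text or "") for s in root.iter("status")]


def read_v2(root: ET.Element) -> list[Post]:
    # Hypothetical newer layout: <post><created>..</created><body>..</body></post>
    return [
        Post(p.findtext("created", default=""), p.findtext("body", default=""))
        for p in root.iter("post")
    ]


# One reader per known snapshot format; both keys are made up for this sketch.
READERS = {"1": read_v1, "2": read_v2}


def load_snapshot(xml_text: str) -> list[Post]:
    """Parse a snapshot with the reader that matches its declared version."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    reader = READERS.get(version)
    if reader is None:
        raise ValueError(f"no reader for snapshot version {version!r}")
    return reader(root)
```

Every time the export format changed, someone would have to write and maintain another reader, and a snapshot with no version marker at all would have to be identified by guesswork.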

So digital forensics experts might be able to re-assemble culturally pertinent records, assuming they can get copies of things. But your children and grandchildren? I wouldn't count on it.

What are you doing to preserve records of your "lifestream" for you and your family? Talk Back and Let Me Know.

