I don't know where these ideas come from, and I'm probably not the first to think of this one, but I'm contemplating how big data will affect me at home, or at least how it could. "Big Data," as it's known today, refers to data sets that are too large to process by standard or traditional means. Using that definition, what makes me, or anyone else, think that big data will ever find its way to my house? That's the perfect question to ask. The answer, once you know it, won't surprise you at all.
It's certainly no surprise to you that data grows at an exponential rate. Not only does the amount of data grow, but so does the size of each piece of it. Remember 1.44MB floppy disks? Fifteen years ago I could copy an entire operating system onto a single diskette (DOS, XENIX, Minix, etc.). Ten years ago, I ran Linux from two such floppy disks (with X). At that time, Windows came on a single CD. A CD-ROM could hold ~700MB, which I thought was huge. Then came DVDs with their enormous capacities (4.7GB/8.5GB).
I suppose that storage has to expand ahead of data because we keep using more of it. Windows now requires one of those large DVDs for delivery. A single photograph from my DSLR camera weighs in at four to five megabytes. Even simple, single Microsoft Word documents are now too large for those old diskettes.
My first thumb drive was 64MB and I thought that was so cool. Now I need a 32GB one just for stuff.
I think you get the picture.
Data is bigger. There's more of it. And data grows exponentially with time.
So the thought of big data at home isn't so surprising.
Think about all of the data points that you now have on paper, in your head, or stored in various locations on the web, on disks, on thumb drives, on floppies (kidding), or in the cloud. I personally store 50GB of data on Dropbox. It's mostly pictures, but there are a lot of documents in there too. That's a lot of data for one guy.
Think of all of the music you own. Add up all of your bills, transactions, phone calls, TV shows, banking, medical records, maintenance records for all of your various electronics, home, appliances, cars, and so on.
That's a lot of data to sift through. When you start considering all of the data you've collected, it probably runs into the low terabyte range, doesn't it? That's big. It's even bigger when you consider that the amount of data grows exponentially. It gets out of hand in a short amount of time.
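To put rough numbers on "out of hand," here is a minimal back-of-the-envelope sketch. The starting size (1TB) and the doubling period (two years) are assumptions I picked for illustration, not measurements, and the function names are made up for this example.

```python
import math


def projected_size_tb(start_tb: float, doubling_years: float, years: float) -> float:
    """Size in TB after `years`, assuming the data doubles every `doubling_years`."""
    return start_tb * 2 ** (years / doubling_years)


def years_until(start_tb: float, doubling_years: float, target_tb: float) -> float:
    """Years until the data reaches `target_tb` under the same doubling assumption."""
    return doubling_years * math.log2(target_tb / start_tb)


if __name__ == "__main__":
    # Hypothetical household: 1TB today, doubling every two years.
    for year in (2, 4, 6, 8, 10):
        print(f"year {year:2d}: {projected_size_tb(1.0, 2.0, year):.0f} TB")
```

Under those assumptions, a single terabyte today becomes 32TB in ten years. Change the doubling period to match your own habits and the curve moves, but the shape stays the same.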
If you digitized all of the data in your life, where would you store it?
Exactly. There aren't a lot of choices.
How would you store it?
Exactly. That's the problem facing technology.
We need a big data solution for personal data. It has to be secure, flexible, inexpensive, expandable, accessible from anywhere, and easy to manage.
The cloud is the most likely candidate for data storage. But where will that cloud exist? At your home? In a data center? In multiple data centers?
The answer is all of the above.
It will have to exist everywhere to fulfill all of the requirements that I listed above. Since data centers alone probably won't have the capacity to store all of that data, some of the responsibility will fall on us. To that end, we'll have to "donate" capacity to the collective cloud in the form of storage, CPU, and memory.
There is a company, the name escapes me (sorry), that has a setup like this for storage. When you join their cloud, you donate an amount of your own storage to it, and that shared pool holds both your data and other people's. Some of their data gets striped to your disk, some of yours is stored elsewhere too, and the whole thing is encrypted and very failsafe.
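For the curious, striping is easy to picture in code. This is only a toy sketch of the general idea, not how that company (or any real service) does it: the peer names, the tiny chunk size, and both function names are hypothetical. It also leaves out the encryption and redundancy, which a real system would apply before any chunk left your machine.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[int, bytes]  # (position in the original data, chunk contents)


def stripe(data: bytes, peers: List[str], chunk_size: int = 4) -> Dict[str, List[Chunk]]:
    """Split `data` into chunks and hand them out to peers round-robin."""
    placement: Dict[str, List[Chunk]] = {peer: [] for peer in peers}
    for index, start in enumerate(range(0, len(data), chunk_size)):
        peer = peers[index % len(peers)]
        placement[peer].append((index, data[start:start + chunk_size]))
    return placement


def reassemble(placement: Dict[str, List[Chunk]]) -> bytes:
    """Collect every chunk from every peer and join them in original order."""
    chunks = [chunk for held in placement.values() for chunk in held]
    return b"".join(body for _, body in sorted(chunks, key=lambda c: c[0]))


if __name__ == "__main__":
    data = b"family photos and tax records"
    placement = stripe(data, ["alice", "bob", "carol"])  # hypothetical peers
    for peer, held in placement.items():
        print(peer, "holds", len(held), "chunks")
    print(reassemble(placement) == data)
```

No single peer holds the whole file, which is part of the appeal. The catch in this toy version is that losing any one peer loses data, which is why real systems add redundancy on top.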
Think about the SETI@home project when you think of personal big data clouds. SETI@home is the project to search for extraterrestrial intelligence that uses a bit of your computing power to assist in that quest. There are other worthwhile projects that use the BOINC client too. I'll leave that research up to you.
My point here is that big data is about to invade your home in a big way. If I were in Silicon Valley and could tap into the minds of smart, young entrepreneurs, I'd bring the solution to the world. But I'll have to leave it to those who are.
So for those of you interested in this quest, there are really two problems to solve:
One is the digitization of one's data and the other is the storage of it. I've described how the data can be stored and how one can enter the personal big data cloud. Digitization of your data is the one I'll have to think on for a while, but it will have to involve a combination of manual entry, scanning, scraping from other sources, and automated entry from transactions.
Have I just invented "Big Brother"? Or have I simply enabled people to free themselves from the burdens of bits of paper, multiple records, and multiple formats for data, and to prevent data loss in the case of disaster?
You tell me.
What do you think the impetus will be to bring big data home? Do you think there's a need? How long do you think it will take to implement my plan? Talk back and let me know.