Every time I post something on "Big Data," I get quite a bit of email with readers' thoughts on a good definition. A reader calling himself/herself "Mikey" sent a very short response that went to the heart of the topic. Here's a segment of what "Mikey" had to say:
Think three Vs.
- Volume - The sheer amount of data, whether from a webscale user base (Twitter, Facebook) or a huge amount of machine/sensor data (clickstreams, power grid monitors etc.)
- Variety - Data is more than validated strings in fields - it's text, images, video, and all sorts of machine data formats
- Velocity - Wherever and whoever it's coming from, you have to capture tens or hundreds of thousands of writes per second, maybe even millions. You need distributed systems, usually, because if you just try to throw performance and hardware at it you'll eventually always lose.
I would also add the extreme amounts of retail point-of-sale data to the reader's "Volume" list. Other than that, "Mikey" has the use case nailed.
The technology that supports Big Data, on the other hand, is much too complex to describe in a few short bullets.