In our debate, Andrew Brust and I argued over the need for data scientists. Neither of us was comfortable with the term, but there is no doubt that analyzing big data today requires a unique skill set.
The big issue is that big data operates at a scale a couple of orders of magnitude greater than anything we've dealt with before. Add to that the fact that we are dealing with streaming data, data that is coming in real time, that we intend to act on. This is not your father's data mining application.
Big data analysis is often about what is trending. Whether it is flu or the latest on Taylor Swift, streaming data tells us where we are going, not where we've been.
It is the predictive aspect of big data that calls for actual science, the making and testing of hypotheses, so we can understand which trends are meaningful and which are spurious. If that nut can be cracked, then data scientists will have earned their titles and their pay.
Check out the debate for more from both Andrew and me.
The Storage Bits take
Scaling always breaks something. Maybe not right away, but scale is consistently one of the toughest problems in computer science, as well as in life.
The advent of massive storage — driven by the vastness of the internet's content — has taken our ability to store and manipulate data beyond our current ability to analyze it for actionable information. We have a lot to learn and much to gain if we can master the information that our technology now allows us to gather.
Comments welcome, of course. My guess is that big data is about where computers were in 1960. Agree?