Data scientists: Hype or help?

Summary: Scale matters. Massive data — especially streaming data — requires its own ecosystem. It's not just small data made bigger.

TOPICS: Big Data, Storage

In our ZDNet Great Debate, Andrew Brust and I debated the need for data scientists. Neither of us was comfortable with the term, but there is no doubt that today, analyzing big data requires a unique skill set.

The big issue is that big data is a couple of orders of magnitude greater in scale than anything we've dealt with before. Add to that the fact that we are dealing with streaming data (data that is coming in real time) that we intend to act on. This is not your father's data mining application.

Big data is often looking at what is trending. Whether it is flu or the latest on Taylor Swift, streaming data tells us where we are going, not where we've been.
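To make "what is trending" concrete, here is a minimal sketch of trend detection over a stream. It assumes each event arrives as a single term (a hashtag or search keyword, say) and flags a term when its share of the most recent events is at least `ratio` times its share of a longer baseline window. The class name `TrendDetector` and all window sizes and thresholds are hypothetical illustrations, not a method from the debate.

```python
from collections import Counter, deque


class TrendDetector:
    """Hypothetical sketch: flag terms that are over-represented in a short
    recent window compared with a longer baseline window of the same stream."""

    def __init__(self, recent_size=100, baseline_size=1000, ratio=3.0, min_count=5):
        if baseline_size < recent_size:
            raise ValueError("baseline window must be at least as long as the recent window")
        self.recent_size = recent_size
        self.baseline_size = baseline_size
        self.ratio = ratio          # how much more frequent a term must be recently
        self.min_count = min_count  # ignore terms seen only a handful of times
        self.recent, self.recent_counts = deque(), Counter()
        self.baseline, self.baseline_counts = deque(), Counter()

    @staticmethod
    def _push(window, counts, term, size):
        # Append the new event and evict the oldest once the window is full.
        window.append(term)
        counts[term] += 1
        if len(window) > size:
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]

    def observe(self, term):
        """Record one event from the stream."""
        self._push(self.recent, self.recent_counts, term, self.recent_size)
        self._push(self.baseline, self.baseline_counts, term, self.baseline_size)

    def trending(self):
        """Return terms whose recent share is >= ratio x their baseline share."""
        hits = []
        for term, n in self.recent_counts.items():
            if n < self.min_count:
                continue
            recent_share = n / len(self.recent)
            # The baseline window contains the recent one, so this is never zero.
            baseline_share = self.baseline_counts[term] / len(self.baseline)
            if recent_share >= self.ratio * baseline_share:
                hits.append(term)
        return sorted(hits)


detector = TrendDetector()
for _ in range(900):
    detector.observe("weather")
for _ in range(100):
    detector.observe("flu")
print(detector.trending())  # ['flu']
```

Feed each incoming term to `observe()` and poll `trending()` as often as needed. Count-based windows keep memory bounded no matter how long the stream runs; a real deployment would more likely use time-based windows and approximate counters.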

It is the predictive aspect of big data that calls for actual science (the making and testing of hypotheses) so we can understand which trends are meaningful and which are spurious. If that nut can be cracked, data scientists will have earned their titles and their pay.
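One common way to test whether a trend is meaningful or spurious is a permutation test. The sketch below assumes a single numeric time series (daily counts of flu-related searches, for example), measures its trend as the least-squares slope, and asks how often randomly reordered copies of the same values show a slope at least as steep. The choice of test and the function names `ols_slope` and `trend_p_value` are assumptions for illustration, not the approach described in the debate.

```python
import random


def ols_slope(ys):
    """Least-squares slope of ys against time steps 0, 1, ..., n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def trend_p_value(ys, n_permutations=1000, seed=0):
    """Two-sided permutation test for a linear trend.

    Null hypothesis: the time order carries no information, so every
    reordering of the same values is equally likely. The p-value is the
    share of shuffled series whose |slope| is at least the observed |slope|.
    """
    rng = random.Random(seed)
    observed = abs(ols_slope(ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(ols_slope(shuffled)) >= observed:
            extreme += 1
    # The +1 terms count the observed series itself and keep p above zero.
    return (extreme + 1) / (n_permutations + 1)


rising = [float(day) for day in range(20)]  # a steady climb
flat = [5.0] * 20                           # no trend at all
print(trend_p_value(rising))  # close to 0.001
print(trend_p_value(flat))    # 1.0
```

A small p-value suggests the ordering of the data carries real signal; a large one means the apparent trend could easily arise by chance. With very large data sets even tiny slopes can come out "significant", so the size of the trend deserves as much attention as the p-value.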

Check out the debate for more from both Andrew and me.

The Storage Bits take

Scaling always breaks something. Maybe not right away, but scale is consistently one of the toughest problems in computer science, as well as in life.

The advent of massive storage — driven by the vastness of the internet's content — has taken our ability to store and manipulate data beyond our current ability to analyze it for actionable information. We have a lot to learn and much to gain if we can master the information that our technology now allows us to gather.

Comments welcome, of course. My guess is that big data is about where computers were in 1960. Agree?

  • humm

    "Add to that fact that we are dealing with streaming data – data that is coming in real time – that we intend to act on."

    Doesn't sound prudent. Short-term decisions only help in the short term.

    "It is the predictive aspect of Big Data that calls for actual science"

    Assuming it's accurately predictive, which isn't guaranteed.

    This all seems theoretical at this point, not something that is proven to be true.
  • My guess is that Big Data is about where computers were in 1960.

    Yes, I think that is a fair assessment of where big data is on the curve. I suspect the first thing most people associate with big data is social media. But consider all of the weather data for the planet: much of it arrives in real time, and even more sits in archives.

    Crunching those numbers backward and forward for prediction is quite a daunting task. As time progresses and the weather archives grow even larger, the predictive opportunities increase, but the predictions become more difficult to calculate.
  • 1960 NO WAY!

    More like the mid-1800s.

    Also, I take umbrage at "...data tells us where we are going, not where we've been."

    The tail may be shorter when analysing such vast amounts of data, but it is still in the past. Never the future! And you say as much yourself: "Big data is often looking at what is trending."

    Theoretically the nut has been cracked, we just need computational power and reliable programmers.
  • Big Data and statistics

    Big Data is too often statistics and nothing else (machine learning, etc.). Other than integrating it with NLP, structure is ignored. The semantic web provides a type of structure, but it is ignored too. In that respect, it is a rehash of the heavy reliance on statistics in the past.
    • Future statistics are hard to come by

      So building models with past data is really the best you can do.
      John L. Ries
      • Re

        How else would you build models? You always need some past data.
        Javiar Sandra
  • Someone needs to know what to do with the numbers

    I completely agree that we cannot rely entirely on data scientists. Given that, the number of people able to work in this field will need to grow. However, as big data becomes more mainstream, more skill sets will be able to handle the data coming in.