Big data: we have the technology, but do we have the people?

Summary: Addressing big data solutions such as Hadoop 'isn’t rocket science, people can learn it.'


Organizations are awash in big data, opening up huge opportunities to understand and predict customer preferences and market growth. In a hyper-competitive global economy, having the right information means competitive advantage.

There is a catch to all this, however. To get to information nirvana, companies need people with the right skills to get them there. People who know how to manage data, build analytics systems, and help make sense of the data.

A recent survey of data scientists by EMC bears this out. A total of 83% felt that new technology would increase the demand for data scientists, and 64% believed that demand would outpace the supply of available talent. In fact, a McKinsey Global Institute study predicts that within the next six years, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

TechTarget’s Beth Stackpole also pointed out that today’s professional workforce is trained to manage traditional, structured data environments, but is not ready to handle big data environments and open-source platforms such as Hadoop and MapReduce. “While data management teams typically have a well-defined set of expertise around managing and organizing highly structured data and modeling and creating reports in SQL, those conventional skill sets don’t translate well to the unstructured, flat-file part of the big data world, where command lines and NoSQL database technologies are the core building blocks of most of the emerging platforms.”

Hadoop, an Apache open-source project, is a collection of open-source components designed to store massive amounts of data across multiple nodes and make it accessible through the Hadoop Distributed File System (HDFS). MapReduce, often used in conjunction with Hadoop, is a programming model for building an analytical capability on top of the data. NoSQL (“not only SQL”) databases typically handle unstructured data, including Web logs, documents, text, PDFs, video and audio.
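To make the MapReduce idea concrete, here is a minimal single-machine sketch of the pattern in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample log lines are illustrative assumptions. A real cluster would run the map and reduce steps in parallel across many nodes, but the shape of the computation is the same: emit key/value pairs, group them by key, then collapse each group.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every whitespace-separated word in one record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values collected for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over an iterable of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical Web log lines, invented for this example.
    logs = ["GET /home", "GET /about", "POST /home"]
    print(word_count(logs))
```

Anyone comfortable with a scripting language can read this in a few minutes, which is roughly the point several of the people quoted in this article make about the learning curve.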

At the same time, companies shouldn’t have to look too far to find the talent they need to manage big data challenges and opportunities. As part of a series of Webcasts, co-sponsored by Informatica and Cloudera, I had the opportunity to speak with executives and consultants at the front line of the big data explosion.

For example, Binh Tran, CTO and co-founder of Klout, pointed out that skillsets are the “number one” challenge the social networking rating service is wrestling with. “When we first started, it was a matter of digging into it, getting into the online documentation. Finding people with production experience on a large scale is basically difficult. We had to hire people out of the Yahoo and Facebook world.” Tran reports seeing more colleges, at least in the Silicon Valley region, offering Hadoop and MapReduce as part of their curricula.

David Menninger, analyst with Ventana Research, pointed to recent survey results in which more than three-fourths of 169 executives say staffing and training issues are the greatest obstacles to making the most of big data.

Skills are short, but the situation is not hopeless, Cloudera’s Omer Trajman points out. The ability to address big data solutions such as Hadoop “isn’t rocket science, people can learn it,” he states. Just a few years ago, there were “only two people who knew Hadoop”; now those numbers are expanding. “We encourage organizations [to] look at skillsets they have internally and train people. There are a lot of folks who have the right background and can learn to use Hadoop. It’s more than just finding individuals who already learned and hiring them… there are individuals within your organizations who can really grow into these roles… there’s a lot of folks who can learn Hadoop.”

Here are the positions that will play a role in big data:

System administrators: Responsible for the day-to-day operation of the cluster. “They may manage the hardware components either directly and indirectly, identify the need for additional hardware and bring it on-board.” Responsibilities also include monitoring and configuration, Trajman adds. “They’re also responsible for integration of Hadoop with other systems.”

Developers: Build the platform and analytics apps. “They have familiarity of the tools or algorithms, they might be writing or packaging or optimizing or deploying different MapReduce jobs. They’ll source and maintain different libraries,” Trajman adds. “Their role is similar to the DBA’s role in the database world.”

Data analysts/data scientists: Data analysts and data scientists fall into the same category, Trajman says. These professionals apply algorithms to analytic problems, and do data mining. “Their ability to tell a story with the data is what defines them.” In addition, Trajman says, “they may have domain expertise. They’ll help create data products, create data solutions that drive the business.”

Data stewards: Ultimately responsible for the collection of quality data. “Data stewards curate and catalog all the incoming data. There’s a lot of data floating around organizations, and Hadoop can get it centralized. So identifying the upstream data models, having a background in ETL [extract, transform, load] and data modeling are all typical skillsets and backgrounds.”

“All of these skillsets actually exist today in organizations,” says Trajman.

(This article was cross-posted at SmartPlanet Business Brains.)

  • RE: Big data: we have the technology, but do we have the people?

    The skills shortage isn't a problem for companies that invest in their staff by enabling them to take the training and seminars that are available. Companies that don't provide for employee development don't deserve to succeed.
    • RE: Big data: we have the technology, but do we have the people?

      @Reality-based Couldn't agree more. I think most companies are too short-sighted when it comes to training -- there's far too little investment being made in employee training & development, and thus skills shortages come back to bite them.
      • They, corporations etc

        Have been in that mindset since the wild-west days of the .com boom. Since then, you're (stereotypically) supposed to work out what a company does etc and become fantastic/competent on your own dime/time. This mindset isn't just an IT thing, it's pretty much endemic. People are the new widgets.
    • Yes, it IS rocket science

      "Training and seminars" might be the answer for learning how to use Hadoop, but that's not going to substitute for a Masters or Ph.D. in math or statistics, which is what you'll want in the people who are doing the modeling. There's a lot of "don't know what they don't know" in people transitioning from a structured data environment to one based on probabilities and statistics.
      Robert Hahn
      • Agree 100%

        @Robert Hahn

        But I suspect that the mathematicians and statisticians will want to stay with the model that has a sound mathematical basis - relational in other words.

        The "Big Data" technologies will be gone quite soon, so learning about them is not very important or relevant.
      • RE: Big data: we have the technology, but do we have the people?

        @Robert Hahn @jorwell Big Data technologies are here to stay. You don't need to have a Ph.D in Math to enter the Big Data field. Yes you need to have a strong background in algorithms and statistics. Well, you can learn R plus Machine Learning algorithms. Can you pls let me know from which research or study you concluded that Big Data will vanish soon? :) You can read the McKinsey study and several scientific researches about the future of Big Data before giving any biased conclusion.

        Thank you.
      • RE: Big data: we have the technology, but do we have the people?

        @Robert Hahn

        Modeling is but one, and a minor one at that, of the use cases people are working on with big data. In all of the projects we've kicked around at the CHUG, I think one was based on model building. Most are much more practical, focused on how to bring value from data directly back to the end user.

        In the end, Hadoop is just a tool. As is R.
      • RE: Yes, it IS rocket science

        @Robert Hahn Data science is multi-disciplinary. What you're referring to is data mining (sometimes called data modeling) and is a combination of statistics and machine learning. Additional analytical techniques include data visualization and online analytical processing (OLAP). OLAP is important because it enables data aggregation and the subsequent drill-down into detail or the drill-across to other, potentially related data sources. OLAP also supports the creation of forecasting models.

        Additional skills include identifying data sources resident in an enterprise/organization, assessing the redundancy and quality of the data sources and extraction, and, very often, transformation of data from the data sources into a data analysis environment.

        And, of course, domain experts familiar with both the data and subject(s) at hand.
        Rabid Howler Monkey
    • Or you could just take responsibility for yourself, invest in yourself

      get the skills for yourself, and be in high demand for good paying jobs instead of whining about how unfair it is that your company doesn't wipe your nose and give you a bath, milk and cookies and sing you to sleep.
      • Amen to that!!!

      • RE: Big data: we have the technology, but do we have the people?

        @baggins_z Sounds like your company sucks to work at, given that they obviously have no training budget, and the turnover rate is probably on a par with McDonalds, as is the quality of work.
  • RE: Big data: we have the technology, but do we have the people?

    Can't agree more on the increase in demand for Big Data roles. Businesses nowadays want to build data-driven organizations to gain a long term competitive advantage. This was driven by the success of several leaders in the industry like LinkedIn, Google, Facebook, Amazon, and Twitter.

    I came from a Business Intelligence background and now I'm entering the world of Big Data. I recently got a certificate in Hadoop Fundamentals (Free course) from IBM Big Data University. I recommend it to anyone who wants to start a career in Big Data.

    Thanks for this great article!
  • RE: Big data: we have the technology, but do we have the people?

    The real problem here is on the "Analyst" and "DW" side, they are trained in a certain set of tools and methods for better or for worse.

    Anyone with a rudimentary knowledge of the command line, SQL and a modern scripting or programming language can learn the Hadoop basics in a day. Amazon's Elastic Map Reduce is a great place to start, they have AMIs and procedures ready to go and better yet a trove of Data to play with at S3.

    I'd argue that what Big Data is really opening up, some call it "Data Science" is much more of an earth shaker than just being able to deal with larger and larger data sets that break the relational database paradigm. In a short time we've gone from a scarcity to an abundance of data.

    Here we see the bypassing of the traditional "Data Analysis" process and the pushing of data right back to the end user. There aren't enough analyst hours on the planet to sort through what we are now collecting, and the info would likely be stale by the time we'd be able to use it. We need automation and instant feedback. This is what excites me.

    FB, LI, Google, Mint, etc have all excelled here.

    How can we take this newly acquired abundance of data (behavioral, analytical, etc) and use it to make search run better, or show me what my friends are gravitating towards, or how much someone spends on average at the Cheesecake Factory on Michigan Avenue in Chicago (bring back!), etc.
  • RE: Big data: we have the technology, but do we have the people?

    "Skills are short, but the situation is not hopeless, Cloudera’s Omer Trajman points out."

    Well, if they're willing to teach it, I'll do my best to learn it.

    The biggest problem is that you're unlikely to see colleges teach it any time soon.

    There's an ever-widening gap between business needs and what they teach at college - I've learned that myself the hard way X(.

    . . . although I will point out that the first step in being successful is creating a great product, and that all the analytics in the world won't help if your product sucks. I do wonder if it's really worth the cost to do this stuff, or if companies are simply being desperate to make a profit on bad products.

    This is an awful lot of resources going into what used to be done via spreadsheets. Do we really need to crunch such a vast amount of data to see what's going on?

    But hey, maybe I'm wrong. And I'm willing to join if they'll hire me. Computer Science degree, probably the closest thing they'll get to what they want coming out of colleges right now. Analyzing large data sets was one of the things we were taught.
  • RE: Big data: we have the technology, but do we have the people?

    Nice article Joe! The role definitions are really good and I hope this becomes an Industry standard for Big Data.
  • Big Data is not new

    @AliRebai

    "Can you pls let me know from which research or study you concluded that Big Data will vanish soon"

    Well things like Hadoop don't look very up to date to me. As far as I can see it lacks a query language which means you are forced down to using low-level procedural languages like Java rather than more modern declarative approaches.

    There seems to be no provision at all for constraints, which means there are considerable risks that the data will be inconsistent. This means that no matter how clever a model you build you will get wrong answers out of it. I would see this as being a showstopper.

    You clearly don't need a PhD in Mathematics or Statistics to use any kind of DBMS, however to build the kind of models that would produce reliable results on very complex data you probably do, but the question here is with the complexity of the data - and I question how adequate the "big data" approaches are at representing complex data models. The complexity of the model is a far greater challenge than the size of the data.
    • RE: Big data: we have the technology, but do we have the people?


      You are very much uninformed, google Hive (or Pig). Or how about using Ruby or Python on top?

      "You clearly don't need a PhD in Mathematics or Statistics to use any kind of DBMS, however to build the kind of models that would produce reliable results on very complex data you probably do,"

      This is old school analysis, useful in some cases but slow and limited in application. This is not why the engineers at Google designed MapReduce, they did it to make search scale better. Same goes for FB, LI, etc ... the others who are hammering the hell out of data sets that would make any SQL server cringe.

      Please use those PHD skills and do some research before you post.
      • Big Data is overrated

        There's so much to do when you have a large set of data that often times you can only do a selective portion well while giving up on the other. All these big data ideas are tuned to favor searching data while giving up on other basic routines such as sorting data.

        When Google was competing with MSFT for an email server solution they lost out b/c they could not even sort and the customers were shocked to learn it. Google's hilarious explanation was that customers should not need sorting ability while the real reason was Google's own infrastructure was so tilted toward search it could not sort well.

        Big Data only solves one part of the equation. It really is not so much of a panacea for the future data need.
      • Old style analysis?

        @mobile_manny

        What exactly is the new style analysis? On what principles is it based?

        Two other important points:

        1. Changing the physical representation need not mean throwing away all of the advantages of the relational model (flexible query language, logical inference, constraint checking). The whole point of the relational model is that it is a logical model that is completely independent of how you represent the data physically. It always amuses me when people talk about "relational storage" as this phrase makes absolutely no sense whatsoever. You can't store anything in a relation, it's a mathematical abstraction.

        2. SQL <> Relational. SQL violates some key elements of the relational model. A new alternative language would be much simpler to use and would in all probability give better performance too. However even SQL supports logical constructs like quantification natively while most other popular languages force you down into machine level constructs like loops (Python and Ruby too). In SQL you get a bit nearer to doing your logic directly in logic (which is how it should be, surely?)

        Finally I think new methods of physical storage for relational DBMSs is a very valuable area of research that will lead to big steps forward in performance. I see no need whatsoever to throw away the relational baby with the index sequential bathwater.

        The RDBMS is one of the very best tools we have. It would be crazy to throw it away.

        I don't have a PhD in mathematics by the way, I'm just your common or garden database designer.
  • Were you born with SQL already in your brain? No.

    You had to learn it. So what's different about big data or Hadoop? Employees will learn how to use those, too. Beth Stackpole is worrying for no good reason.
    It stands to reason that someone isn't going to be brilliant with them on Day 1. But that doesn't mean there's a problem. Have a little patience.