In fall 2012, the MIT Sloan School of Management issued a report discussing the differences between big data and traditional data, and the differences in the skills that the two demand.
Organizations that utilize big data differ from those with traditional data practices in that they:
- Pay attention to flows as opposed to stocks;
- Rely on data scientists and product and process developers as opposed to data analysts; and
- Move analytics from IT into core business and operational functions.
How does this play out in business? Companies want to measure customer sentiment or respond to breaks in train tracks in time to effect preemptive change — and they need to analyze the data coming in from remote points as it flows in, not after it has been 'stocked' in a master database or migrated to a data warehouse. They also need the heuristics and statistical analysis skills to know which questions to ask of this data, and how to ask questions to arrive at new processes and even new products that the business sees commercial potential in. To get there, companies must somehow combine these technical skills with people possessing strong business acumen and understanding.
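The difference between analyzing a flow and analyzing a stock can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the function name `rolling_alerts`, the window size, the threshold, and the sample sensor readings are all assumptions made up for the example. The point it shows is that each reading is evaluated as it arrives, so a warning can be raised before the data ever reaches a warehouse.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(
    readings: Iterable[float], window: int = 3, threshold: float = 5.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling_mean) whenever the mean of the last
    `window` readings exceeds `threshold`."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield i, mean


# Hypothetical vibration readings arriving from a remote track sensor.
stream = [1.0, 1.2, 0.9, 6.0, 7.5, 8.1, 1.1]
alerts = list(rolling_alerts(stream))

for index, mean in alerts:
    print(f"reading {index}: rolling mean {mean:.2f} exceeds threshold")
```

Because `rolling_alerts` is a generator, it would work the same way on an unbounded feed as on this short list; only the last few readings are ever held in memory.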
This 'skills bill' isn't easy to fill in businesses today, which is exactly why we keep hearing about competition intensifying for big data professionals. At the same time, however, big data in businesses doesn't run well without contributions from traditional data competencies.
For instance, 59 percent of companies responding to a 2012 survey conducted by analyst firm Information Difference said that their big data projects were 'highly linked' to their master data repositories. In many cases, master data (customer data, product data, and so on) was being used as 'vectors' into big data queries that began the process of probing piles of unstructured and semi-structured big data for clues on how customers were reacting to certain offers or how products were being accepted in certain markets.
In these cases, it was traditional master data that actually formed the core of what big data queries were constructed from — and so it was no surprise that 67 percent of respondents in the same survey also said that master data was driving big data, rather than the other way around.
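One way to picture master data acting as a 'vector' into big data is a query that takes product names from a master record and uses them to probe free-form text. The sketch below is hypothetical throughout: the product IDs, product names, sample posts, and the helper `count_mentions` are invented for illustration and do not come from the survey.

```python
import re
from collections import Counter

# Hypothetical master data: product ID -> product name.
product_master = {
    "P100": "TrailRunner",
    "P200": "CityWalker",
}

# Hypothetical unstructured text, e.g. social media posts.
posts = [
    "Loving my new TrailRunner shoes!",
    "The citywalker fell apart after a week.",
    "TrailRunner sizing runs small.",
    "Nothing about shoes here.",
]


def count_mentions(master, texts):
    """Count how many texts mention each master-data product name
    (case-insensitive substring match)."""
    counts = Counter({pid: 0 for pid in master})
    for pid, name in master.items():
        pattern = re.compile(re.escape(name), re.IGNORECASE)
        counts[pid] += sum(1 for text in texts if pattern.search(text))
    return counts


mentions = count_mentions(product_master, posts)
print(dict(mentions))
```

The master record decides which questions get asked of the unstructured pile, which is the sense in which master data drives big data in this example.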
How companies are actually exploiting their big data, coupled with their continued reliance on enterprise master data, suggests that a 'mixed' skill set might be in order for big data workers.
Big data skills vs traditional skills
First, let's get into some specifics about how big data skills differ from traditional data skills.
Big data demands new programming and analytics skills that today's data analysts typically lack. Most of these skills fall under the heading of 'data science'. They include a strong background in mathematics and statistical analysis, familiarity with newer statistical programming languages like R, a knowledge of analytics modeling techniques, knowledge of data subject matter, and the ability to experiment with data without fear in order to get results. In the past, most data science work has been performed in academic environments, so if the problem was extremely complex, it was understood that the answers might not come right away.
Big data also demands a new set of technical skills that aren't readily found today in many enterprise data centers. Among these 'hard skills' are data architecture, including the build-out of databases that span terabytes of data; the administration of software frameworks like Hadoop; and expertise in databases like Cassandra or HBase, or in analytics programming languages and facilities like R or Pig.
But if these are some of the hard skills areas, big data also demands a set of soft skills that enterprise IT has customarily been short on. These include the ability of people to think broadly across the organization, to understand the bottom-line needs of the business, to know which analytics questions to pose to get to those bottom lines, and to measure and communicate results.
In the 1980s, a Los Angeles aerospace company grew frustrated with technically oriented IT personnel who couldn't speak the language of the business. The company conducted a study and learned that music majors and other liberal-arts graduates could be employed and trained as systems analysts, and that they could bridge the gap in understanding so the IT-business connection could be made. It worked.
This kind of thinking resurfaced when a Booz Allen Hamilton management consulting vice president said companies were experiencing success by adding physicists and music majors to their data science teams, because of their ability to look at big data problems in new ways.
A mix of new and established skills
The moral of the story is that enterprises need a new set of skills for big data, and colleges and universities, in partnership with technology solution providers, are hard at work to supply them. At the same time, however, the new 'data science' teams evolving in enterprises have room for those well versed in business knowledge and analytics query development, and for those with skills in well-established query products that have extended analytics capabilities and proven track records, such as SAS or Cognos.
One characteristic of enterprise big data environments that isn't necessarily present in academic settings is that time-to-results is paramount. SETI, for example, spent 50 years probing radio emissions for signs of extraterrestrial life, enlisting volunteers to analyze this unstructured data for signal patterns that might indicate communications from another civilization. It never identified a pattern that could establish extraterrestrial life.
Enterprises, on the other hand, have sales forecasts, product launches and business risks to manage, not to mention customer satisfaction and fulfillment. If something goes wrong, or is predicted by analytics to go wrong, the organization will marshal every skill set it has on board to solve that problem, whether it's possessed by a recent university graduate who is a statistician-programmer or by a business-savvy analyst with SAS expertise.