Big data: we have the technology, but do we have the people?
Summary: Working with big data solutions such as Hadoop 'isn’t rocket science, people can learn it.'
Organizations are awash in big data, opening up huge opportunities to understand and predict customer preferences and market growth. In a hyper-competitive global economy, having the right information means competitive advantage.
There is a catch to all this, however. To get to information nirvana, companies need people with the right skills to get them there. People who know how to manage data, build analytics systems, and help make sense of the data.
A recent survey of data scientists by EMC bears this out. A total of 83% felt that new technology would increase the demand for data scientists, and 64% believed that this demand would outpace the supply of available talent. In fact, a McKinsey Global Institute study predicts that within the next six years, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
Hadoop, an Apache open-source project, is a collection of open-source components designed to store massive amounts of data across multiple nodes and keep it accessible through the Hadoop Distributed File System (HDFS). MapReduce, often used in conjunction with Hadoop, is a programming model for building an analytical capability on top of the data. NoSQL (“not only SQL”) databases typically handle unstructured data, including Weblogs, documents, text, PDF, video and audio.
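To make the MapReduce idea concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and does not use any Hadoop API. The sample log lines and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical, chosen only for illustration. It counts requests per path in a handful of Weblog lines, following the usual three steps: map each record to key/value pairs, group the pairs by key, then reduce each group to one result.

```python
from collections import defaultdict

# Hypothetical sample input: each string stands in for one Weblog line.
LOG_LINES = [
    "GET /home 200",
    "GET /products 200",
    "GET /home 404",
    "POST /cart 200",
]

def map_phase(line):
    """Emit (key, 1) pairs; here the key is the requested path."""
    method, path, status = line.split()
    yield path, 1

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

In a real cluster the map and reduce steps would run in parallel on many nodes against data stored in HDFS; this sketch only shows the shape of the computation a developer writes.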
At the same time, companies shouldn’t have to look too far to find the talent they need to manage big data challenges and opportunities. As part of a series of Webcasts, co-sponsored by Informatica and Cloudera, I had the opportunity to speak with executives and consultants at the front line of the big data explosion.
For example, Binh Tran, CTO and co-founder of Klout, pointed out that skillsets are the “number one” challenge the social networking rating service is wrestling with. “When we first started, it was a matter of digging into it, getting into the online documentation. Finding people with production experience on a large scale is basically difficult. We had to hire people out of the Yahoo and Facebook world.” Tran reports seeing more colleges, at least in the Silicon Valley region, offering Hadoop and MapReduce as part of their curricula.
David Menninger, analyst with Ventana Research, pointed to recent survey results in which more than three-fourths of 169 executives say staffing and training issues are the greatest obstacles to making the most of big data.
Skills are short, but the situation is not hopeless, Cloudera’s Omer Trajman points out. The ability to address big data solutions such as Hadoop “isn’t rocket science, people can learn it,” he states. Just a few years ago, there were “only two people who knew Hadoop”; now those numbers are expanding. “We encourage organizations [to] look at skillsets they have internally and train people. There are a lot of folks who have the right background and can learn to use Hadoop. It’s more than just finding individuals who already learned and hiring them… there are individuals within your organizations who can really grow into these roles… there’s a lot of folks who can learn Hadoop.”
Here are the positions that will play a role in big data:
System administrators: Responsible for the day-to-day operation of the cluster. “They may manage the hardware components either directly or indirectly, identify the need for additional hardware and bring it on-board.” Responsibilities also include monitoring and configuration, Trajman adds. “They’re also responsible for integration of Hadoop with other systems.”
Developers: Build the platform and analytics apps. “They have familiarity of the tools or algorithms, they might be writing or packaging or optimizing or deploying different MapReduce jobs. They’ll source and maintain different libraries,” Trajman adds. “Their role is similar to the DBA’s role in the database world.”
Data analysts/data scientists: Data analysts and data scientists fall into the same category, Trajman says. These professionals apply algorithms to analytic problems, and do data mining. “Their ability to tell a story with the data is what defines them.” In addition, Trajman says, “they may have domain expertise. They’ll help create data products, create data solutions that drive the business.”
Data stewards: Ultimately responsible for the collection of quality data. “Data stewards curate and catalog all the incoming data. There’s a lot of data floating around organizations, and Hadoop can get it centralized. So identifying the upstream data models, having a background in ETL [extract, transform, load] and data modeling are all typical skillsets and backgrounds.”
“All of these skillsets actually exist today in organizations,” says Trajman.
(This article was cross-posted at SmartPlanet Business Brains.)
Talkback
RE: Big data: we have the technology, but do we have the people?
They, corporations etc
Yes, it IS rocket science
Agree 100%
But I suspect that the mathematicians and statisticians will want to stay with the model that has a sound mathematical basis - relational in other words.
The "Big Data" technologies will be gone quite soon, so learning about them is not very important or relevant.
RE: Big data: we have the technology, but do we have the people?
Thank you.
RE: Big data: we have the technology, but do we have the people?
RE: Yes, it IS rocket science
Or you could just take responsibility for yourself, invest in yourself
Amen to that!!!
RE: Big data: we have the technology, but do we have the people?
Well, if they're willing to teach it, I'll do my best to learn it.
The biggest problem is that you're unlikely to see colleges teach it any time soon.
There's an ever-widening gap between business needs and what they teach at college - I've learned that myself the hard way X(.
. . . although I will point out that the first step in being successful is creating a great product, and that all the analytics in the world won't help if your product sucks. I do wonder if it's really worth the cost to do this stuff, or if companies are simply being desperate to make a profit on bad products.
This is an awful lot of resources going into what used to be done via spreadsheets. Do we really need to crunch such a vast amount of data to see what's going on?
But hey, maybe I'm wrong. And I'm willing to join if they'll hire me. Computer Science degree, probably the closest thing they'll get to what they want coming out of colleges right now. Analyzing large data sets was one of the things we were taught.
RE: Big data: we have the technology, but do we have the people?
-Trendwise
Big Data is not new
RE: Big data: we have the technology, but do we have the people?
You are very much uninformed, google Hive (or Pig). Or how about using Ruby or Python on top?
"You clearly don't need a PhD in Mathematics or Statistics to use any kind of DBMS, however to build the kind of models that would produce reliable results on very complex data you probably do,"
This is old school analysis, useful in some cases but slow and limited in application. This is not why the engineers at Google designed MapReduce, they did it to make search scale better. Same goes for FB, LI, etc ... the others who are hammering the hell out of data sets that would make any SQL server cringe.
Please use those PhD skills and do some research before you post.
Big Data is overrated
When Google was competing with MSFT for an email server solution they lost out b/c they could not even sort and the customers were shocked to learn it. Google's hilarious explanation was that customers should not need sorting ability while the real reason was Google's own infrastructure was so tilted toward search it could not sort well.
Big Data only solves one part of the equation. It really is not so much of a panacea for future data needs.
Old style analysis?
Were you born with SQL already in your brain? No.
It stands to reason that someone isn't going to be brilliant with them on Day 1. But that doesn't mean there's a problem. Have a little patience.