Apache Hadoop, the popular big data framework, is difficult to use. Indeed, Datanami, an important big data publication, recently found that "the Hadoop dream of unifying data and compute in a distributed manner has all but failed in a smoking heap of cost and complexity". One reason? "It's just a very complicated stack to build on."
Hard to use or not, Hadoop is very popular. Facebook, for one, keeps over 100 petabytes of data on Hadoop. By 451 Research's count, the Hadoop market is growing at a 38 percent compound annual growth rate (CAGR) through 2020, when it will reach $4.4 billion in revenue.
Demand for Hadoop experts is rising as well. According to Foote Partners' IT Skills and Certifications Pay Index, "the need for big data skills also continues to lead to pay increases -- about 8 percent over the last year."
To meet this demand, The Linux Foundation and ODPi, a non-profit organization committed to improving the big data ecosystem, are offering a new online course, Introduction to Apache Hadoop. Like other Linux Foundation classes before it, the course will be offered through edX, the non-profit online learning platform from Harvard University and Massachusetts Institute of Technology (MIT). This free course will begin in early June.
"As innovation across the Hadoop landscape continues to skyrocket, we're thrilled to provide accessible, vendor-neutral education for the big data community," said ODPi's Director, John Mertic. "ODPi is committed to reducing ecosystem complexity and, with Roman Shaposhnik [a Hadoop committer and ODPi VP of Technology] leading this 'Introduction to Apache Hadoop' edX course, we look forward to sharing insights that make Hadoop manageable for organizations of all sizes."
Students will learn:
- The origins of Apache Hadoop and its big data ecosystem.
- Deploying Hadoop in the clustered environment of a modern-day enterprise IT department.
- Building data lake management architectures around Apache Hadoop.
- Leveraging the YARN framework to enable heterogeneous analytical workloads on Hadoop clusters.
- Leveraging Apache Hive for an SQL-centric view into the enterprise data lake.
- Managing key Hadoop components (HDFS, YARN, and Hive) from the command line.
- Securing and scaling your data lakes in multi-tenant enterprise environments.
The course includes six chapters, each with a short graded quiz at the end. A final exam is required. Students may take the complete course at no cost, or add a verified certificate of completion for $99.
Considering how difficult Hadoop is to master and how strong the demand is for Hadoop programmers, springing for a certificate that shows you have a Hadoop clue looks to be a smart career move.