Hadoop 2.0 makes MapReduce less compulsory and the distributed file system more reliable.
Hadoop Streaming allows developers to use virtually any programming language to create MapReduce jobs, but it’s a bit of a kludge. The MapReduce programming environment needs to be pluggable.
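To make "virtually any language" concrete, here is a minimal, hypothetical sketch of a Streaming word-count job in Python: the mapper and reducer are ordinary scripts that read stdin and write tab-separated key/value lines to stdout, and Hadoop wires them into the map and reduce slots. The script name and any paths shown are illustrative assumptions, not taken from the posts themselves.

```python
#!/usr/bin/env python
# wordcount_streaming.py -- a minimal, hypothetical Hadoop Streaming job.
# The same script runs as the mapper or the reducer depending on its
# first argument; Streaming pipes input splits to the mapper and sorted
# key/value lines to the reducer over stdin/stdout.
import sys

def mapper():
    # Map phase: emit "word<TAB>1" for every word in the input split.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t%d" % (word.lower(), 1))

def reducer():
    # Reduce phase: input arrives sorted by key, so all counts for a
    # given word are contiguous and can be summed with one pass.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = word, 0
        total += int(count)
    if current is not None:
        print("%s\t%d" % (current, total))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

A job like this would be submitted through the hadoop-streaming jar (its exact path varies by distribution), roughly: hadoop jar hadoop-streaming.jar -input /data/in -output /data/out -mapper "wordcount_streaming.py map" -reducer "wordcount_streaming.py reduce" -file wordcount_streaming.py. The kludge is that the "pluggability" lives entirely in this stdin/stdout contract rather than in the programming model itself.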
As innovative as Hadoop is in toto, its components can benefit from optimization, perhaps significantly. One vendor that’s been in the database business for three decades isn’t just talking about those optimizations. It’s building products around them.
Microsoft has a reputation for modifying external technology when adopting it. But in the case of Hadoop, Microsoft is so far staying true to the core technology, providing optional integration with its own stack, and making it easier for people to work with Hadoop and get excited about it.
Big Data is in a golden age of horizontal opportunity, keeping the prerequisite of vertical market expertise at bay. This provides some early opportunities for tech services firms to gain industry specialist expertise. Big Data is a Big Equalizer.
The Hadoop Distributed File System (HDFS) is a pillar of Hadoop. But its single-point-of-failure topology and its write-once file semantics leave the enterprise wanting more. Some vendors are trying to answer the call.
Our last post presented an analogy for MapReduce. In this post, we layer real MapReduce vocabulary over the example to help decode the jargon that sometimes blocks understanding of Big Data.
Can a skyscraper completed in 1931 be used to explain a parallel processing algorithm introduced in 2004? In this post, I use the analogy of counting smartphones in the Empire State Building to explain MapReduce...without using code.
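For readers who do want a peek at code, here is a toy, single-process Python simulation of that analogy, using made-up survey data: each floor is "mapped" independently to (phone model, 1) pairs, a shuffle groups the pairs by model, and a reduce step totals each model for the whole building. It mimics the concept only; it is not Hadoop code.

```python
# A toy, single-process simulation of the Empire State Building analogy:
# map each floor's survey independently, shuffle by key, then reduce.
from collections import defaultdict

# Hypothetical survey data: floor number -> smartphone models spotted there.
floors = {
    1: ["iPhone", "Android", "iPhone"],
    2: ["Android", "BlackBerry"],
    3: ["iPhone", "Android", "Android"],
}

def map_floor(floor, phones):
    # "Map": one surveyor per floor emits a (model, 1) pair per phone seen.
    return [(model, 1) for model in phones]

def shuffle(mapped_pairs):
    # "Shuffle": group every emitted count under its model (the key).
    grouped = defaultdict(list)
    for model, count in mapped_pairs:
        grouped[model].append(count)
    return grouped

def reduce_counts(model, counts):
    # "Reduce": total up one model's counts for the whole building.
    return model, sum(counts)

mapped = [pair for floor, phones in floors.items()
          for pair in map_floor(floor, phones)]
totals = dict(reduce_counts(m, c) for m, c in shuffle(mapped).items())
print(totals)  # e.g. {'iPhone': 3, 'Android': 4, 'BlackBerry': 1}
```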
Big Data infrastructure and competency can seem distant from the workaday world of retail planning, strategy and analysis. Bringing the two worlds together would be quite useful though. At least one vendor is trying, through acquisition, integration and leadership experienced in both.
Complex Event Processing (CEP) is the category of technology focused on handling large, continuous streams of data that must be processed in real time. CEP is distinct from Big Data in the eyes of some, and yet inextricably tied to it as well.
Microsoft's SQL Server 2012 has been released to manufacturing. This release of the 20+ year-old database has tie-ins to Hadoop and Big Data analytics in general.
To many, Big Data goes hand-in-hand with Hadoop + MapReduce. But MPP (Massively Parallel Processing) and data warehouse appliances are Big Data technologies too. The MapReduce and MPP worlds have been pretty separate, but are now starting to collide. And that’s a good thing.
Big Data is all the rage these days, as are its constituent technologies like Hadoop, NoSQL, and the mystical discipline of data science. But it turns out that understanding of, and a consensus definition for, Big Data are rather elusive. This blog is here to address that.