Hadoop 2.0 makes MapReduce less compulsory and the distributed file system more reliable.
Big on Data
Veteran data geek Andrew Brust covers Big Data technologies including Hadoop, NoSQL, Data Warehousing, BI and Predictive Analytics.
Andrew J. Brust has worked in the software industry for 25 years as a developer, consultant, entrepreneur and CTO, specializing in application development, databases and business intelligence technology.
Hadoop Streaming allows developers to use virtually any programming language to create MapReduce jobs, but it’s a bit of a kludge. The MapReduce programming environment needs to be pluggable.
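The Streaming contract is simple: the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value pairs on stdout, with Hadoop sorting the mapper's output by key before the reducer sees it. As a rough sketch (my own illustration, not code from the post), here is that pipeline simulated locally in Python for a word count:

```python
# Sketch of the Hadoop Streaming contract, simulated in-process:
# map -> shuffle/sort -> reduce, with no Hadoop cluster involved.
import itertools

def mapper(lines):
    # A streaming mapper would print "word\t1" per word; we yield pairs.
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Pairs arrive sorted by key, so equal keys are adjacent and can be summed.
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

def run(lines):
    # sorted() stands in for Hadoop's shuffle/sort phase.
    return dict(reducer(sorted(mapper(lines))))

print(run(["to be or not to be"]))  # {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

Because the contract is just stdin/stdout text, the same mapper and reducer could be written in Perl, Ruby, or even shell — which is exactly what makes Streaming both flexible and a kludge.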
As innovative as Hadoop is in toto, its components can benefit from optimization, perhaps significantly. One vendor that’s been in the database business for three decades isn’t just talking about those optimizations. It’s building products around them.
Microsoft has a reputation for modifying external technology when adopting it. But in the case of Hadoop, Microsoft is so far staying true to the core technology, providing optional integration with its own stack, and making it easier for people to work with Hadoop and get excited about it.
Big Data is in a golden age of horizontal opportunity, keeping the prerequisite of vertical market expertise at bay. This provides some early opportunities for tech services firms to gain industry specialist expertise. Big Data is a Big Equalizer.
The Hadoop Distributed File System (HDFS) is a pillar of Hadoop. But its single-point-of-failure topology and its write-once file model leave the enterprise wanting more. Some vendors are trying to answer the call.
Our last post presented an analogy for MapReduce. In this post, we layer real MapReduce vocabulary over the example to help decode the jargon that sometimes blocks understanding of Big Data.
Can a skyscraper completed in 1931 be used to explain a parallel processing algorithm introduced in 2004? In this post, I use the analogy of counting smartphones in the Empire State Building to explain MapReduce...without using code.
Big Data infrastructure and competency can seem distant from the workaday world of retail planning, strategy and analysis. Bringing the two worlds together would be quite useful though. At least one vendor is trying, through acquisition, integration and leadership experienced in both.
Complex Event Processing (CEP) is the category of technology focused on handling large, continuous streams of data that must be processed in real time. CEP is distinct from Big Data in the eyes of some, and yet inextricably tied to it as well.