The promise of Big Data
Big Data solutions promise to help organizations move decision-making from a seat-of-the-pants exercise to a systematic, repeatable process. They can also uncover hidden trends and reduce the chance that an organization will be blindsided by rapidly moving events. Another promise of Big Data is that organizations can learn more about their customers: their requirements and how they make purchasing decisions. Taken together, these promises mean that organizations can reduce costs by better choosing which products and services to bring to market and which to abandon because customers are not interested.
The challenge of Big Data
Several open source communities, made up of the people and organizations that need to gather, analyze and report on Big Data repositories, have developed projects that make working with Big Data much simpler.
Many Big Data implementations are based upon tools from Apache Software Foundation including the following:
- Hadoop — a distributed processing framework designed to harness the power of many computers, each with its own processing and storage, to quickly process large, distributed data sets.
- Hadoop Distributed File System (HDFS) — a distributed file system designed to support very large data sets made up of structured and unstructured data.
- HBase — a distributed database that makes it possible to work with HDFS data as if it were a structured set of very large tables.
- Cassandra — a multi-master database designed for high availability.
- Other tools, including Chukwa, Hive, Mahout, Pig and ZooKeeper.
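The core idea behind Hadoop is the MapReduce model: a map phase emits key-value pairs, the framework shuffles them by key, and a reduce phase aggregates each group. The sketch below is a minimal, in-process illustration of that flow, not real Hadoop code; on a cluster, the map and reduce phases would run on many machines over data stored in HDFS. The word-count task and all function names here are illustrative.

```python
# A framework-free sketch of the MapReduce model that Hadoop implements
# at scale. The flow is: map -> shuffle (group by key) -> reduce.
from collections import defaultdict


def map_phase(lines):
    """Emit (word, 1) pairs, much as a word-count mapper would."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle(pairs):
    """Group values by key -- Hadoop performs this step automatically
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts for each word, as a word-count reducer would."""
    return {word: sum(counts) for word, counts in grouped.items()}


if __name__ == "__main__":
    lines = ["big data big promise", "big challenge"]
    counts = reduce_phase(shuffle(map_phase(lines)))
    print(counts)  # -> {'big': 3, 'data': 1, 'promise': 1, 'challenge': 1}
```

Because each phase only ever sees independent keys or independent input splits, the framework can distribute the work across a cluster; that is what lets Hadoop process data sets far larger than any single machine's storage.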
It is clear that a Hadoop solution has many moving parts, each of which must be properly installed, configured and tuned for the organization's application. This complexity is beyond the capabilities of some organizations that wish to use Hadoop.