What do you do when you have terabytes of data or more and you want to work with it in real time? Well, one solution is to turn to Apache Storm.
Storm is an open-source, high-performance, distributed real-time computation framework for processing large, fast-moving streams of data. It can be used for real-time analytics, online machine learning, continuous computation, and other Big Data jobs.
Storm is also very fast. The Apache Software Foundation (ASF) claims that Storm is capable of processing more than a million tuples per second per node. Storm does this by streaming data in parallel across a cluster, unlike MapReduce, which processes data in batch jobs.
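To make the streaming-versus-batch distinction concrete, here is a minimal single-process sketch in plain Python. It does not use Storm's API and does no clustering or parallelism. The function names (`sentence_source`, `split_words`, `running_counts`, `batch_counts`) and the sample sentences are hypothetical, chosen only for illustration. The first three functions loosely mirror what Storm calls a spout (a source of tuples) and bolts (processing steps), while `batch_counts` shows the batch approach of reading everything first.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_source() -> Iterator[str]:
    """Hypothetical stand-in for a spout: yields one record at a time."""
    yield from ["storm is fast", "storm is distributed", "hadoop is batch"]


def split_words(sentences: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for a bolt: splits each sentence into words."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def running_counts(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Second bolt stand-in: emits an updated count as each word arrives."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def batch_counts(sentences: Iterable[str]) -> Counter:
    """Batch style: the result exists only after the whole input is read."""
    return Counter(word for s in sentences for word in s.split())


if __name__ == "__main__":
    # Streaming: a result is available after every incoming word.
    for word, count in running_counts(split_words(sentence_source())):
        print(word, count)
    # Batch: one result at the end.
    print(batch_counts(sentence_source()))
```

Both approaches end with the same totals. The difference is that the streaming pipeline has an up-to-date answer after every record, which is the property Storm is built around, whereas the batch version has nothing to report until the whole input has been read.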
If you've been waiting for Storm to become an Apache Top-Level Project (TLP) before using it, you don't have that excuse any more. Storm became a TLP on September 29.
Officially blessed or not, Storm, which began at marketing intelligence company BackType before being acquired by Twitter, is already being used by many top companies looking for the fastest speeds for their Big Data projects. These include Alibaba, Twitter, Yahoo, and Groupon.
Storm is typically used in conjunction with Hadoop, but it's not limited to that. Microsoft, for example, appears to be on the verge of .
As Andrew Feng, a distinguished architect at Yahoo, said in a statement, "Today's announcement marks a major milestone in the continued evolution of Storm. We are proud of our continued contributions to Storm that have led to the hardening of security, multi-tenancy support, and increased scalability. Today, Apache Storm is widely adopted at Yahoo for real-time data processing needs including content personalization, advertising, and mobile development. It's thrilling to see the Hadoop ecosystem and community expand with the continued adoption of Storm."
So, if you're looking for an ideal answer for your real-time data processing workloads, you should check out Storm.