Twitter AnomalyDetection tool goes open source

Twitter has released to the open-source community a tool that detects unusual activity in Big Data.
Written by Charlie Osborne, Contributing Writer
Twitter has opened up its suspicious-activity tracker, AnomalyDetection, to developers.

The social media giant said on Tuesday that its team uses the tool, dubbed AnomalyDetection, to detect unusual traffic events such as spikes and surges, as well as the presence of spam bots. At Big Data scale, such spikes on a company's network can degrade service by saturating bandwidth, causing denial-of-service problems and website crashes. Spam that is not kept under control also irritates individual users.

AnomalyDetection, an open-source R package that automatically detects anomalies, is used by Twitter to scan for traffic spikes. Traffic surges can be legitimate, driven by dates such as Christmas Eve and New Year's Day or by breaking news and viral stories, but the tool can also find bots, spammers and problems in system metrics.


"We're open-sourcing AnomalyDetection because we'd like the public community to evolve the package and learn from it as we have," Twitter says.

According to the company, although AnomalyDetection may sound similar to BreakoutDetection, another R package Twitter recently open-sourced, the two tools look for different kinds of events. BreakoutDetection focuses on breakouts, which Twitter characterizes as a shift in activity marked by "two steady states and an intermediate transition period."

The two main kinds of breakout that BreakoutDetection monitors are mean shifts and ramp ups. A mean shift is a sudden jump in the value of a time series, such as CPU utilization jumping from 40 percent to 60 percent, whereas a ramp up is a gradual increase in a metric's value from one steady state to another.
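
To make the distinction concrete, here is a minimal Python sketch, assuming two synthetic CPU-utilization series: one with a mean shift from 40 to 60 percent and one with a gradual ramp between the same levels. It finds the single split point that best fits each series as two flat segments. This is a plain least-squares change-point search chosen for illustration, not the method BreakoutDetection itself uses; the function name `best_split` and the data are hypothetical.

```python
def best_split(series):
    """Fit the series as two constant-mean segments and return
    (split_index, left_mean, right_mean, squared_error) for the best split.
    Hypothetical helper: an illustrative least-squares search, not
    BreakoutDetection's algorithm."""
    best = None
    for k in range(1, len(series)):
        left, right = series[:k], series[k:]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Total squared error if each side is modeled by its own mean.
        sse = (sum((x - left_mean) ** 2 for x in left)
               + sum((x - right_mean) ** 2 for x in right))
        if best is None or sse < best[3]:
            best = (k, left_mean, right_mean, sse)
    return best


# Hypothetical data: a mean shift (abrupt jump from 40% to 60% at index 50) ...
mean_shift = [40.0] * 50 + [60.0] * 50
# ... and a ramp up (40% steady state, 20-sample climb to 60%, 60% steady state).
ramp_up = [40.0] * 40 + [40.0 + i for i in range(1, 21)] + [60.0] * 40

print(best_split(mean_shift))  # (50, 40.0, 60.0, 0.0)
print(best_split(ramp_up))
```

For the mean shift, the best split lands exactly on the jump with zero residual error. For the ramp, the split falls inside the 20-sample transition and a residual error remains, because no single step can describe a gradual change; that leftover error is one simple signal for telling the two kinds of breakout apart.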

By comparison, Twitter defines an anomaly as a single anomalous data point at a point in time, rather than a change in state.
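
A point anomaly can be illustrated with a far simpler rule than the one in Twitter's package, which builds on the generalized ESD (extreme studentized deviate) test. The minimal Python sketch below, assuming a short hypothetical series of requests per minute, flags any observation whose modified z-score (its distance from the median, scaled by the median absolute deviation, or MAD) exceeds 3.5, a commonly used cutoff. The function name and data are illustrative only.

```python
import statistics


def point_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Hypothetical helper. Uses the median and MAD instead of the mean and
    standard deviation, so a single extreme value cannot inflate the
    baseline it is compared against.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # More than half the points equal the median; treat any deviation as anomalous.
        return [i for i, x in enumerate(values) if x != med]
    # 0.6745 rescales MAD so scores are comparable to ordinary z-scores for normal data.
    return [i for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical requests per minute with one spike at index 5.
requests = [100, 102, 98, 101, 99, 350, 100, 97, 103, 100]
print(point_anomalies(requests))  # [5]
```

The 350 reading scores above 100, while the next most extreme readings (97 and 103) score about 1.3, so only index 5 is reported.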

Detecting anomalies across social media networks can be challenging because traffic varies by time of day, season and location, and because trends and viral media produce legitimate surges. In addition, anomalies are contextual in nature, so techniques developed to track anomalies in one domain cannot necessarily be applied effectively in another.
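
The effect of seasonality can be shown with a minimal Python sketch, assuming a known period and a hypothetical traffic series with four readings per day over five days. It removes the seasonal pattern by subtracting each phase's median (a crude stand-in for the time series decomposition AnomalyDetection performs) and then applies a modified z-score to the residuals. The function name, the period and the data are all illustrative.

```python
import statistics


def seasonal_anomalies(values, period, threshold=3.5):
    """Flag points that are unusual for their position in the seasonal cycle.

    Hypothetical helper. Subtracts the median of each phase (values[p::period])
    to remove the seasonal pattern, then applies a modified z-score to the
    residuals. With period=1 no seasonal adjustment is made.
    """
    phase_median = [statistics.median(values[p::period]) for p in range(period)]
    residuals = [x - phase_median[i % period] for i, x in enumerate(values)]

    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    if mad == 0:
        return [i for i, r in enumerate(residuals) if r != med]
    return [i for i, r in enumerate(residuals)
            if abs(0.6745 * (r - med) / mad) > threshold]


# Hypothetical traffic: 4 readings per day (night, morning, evening peak, late) for 5 days.
traffic = [
    10, 50, 90, 50,
    12, 48, 91, 51,
    60, 52, 89, 49,   # index 8: 60 at night, ordinary overall but not for that hour
    11, 49, 92, 50,
    9, 51, 88, 52,
]

print(seasonal_anomalies(traffic, period=4))  # [8]
print(seasonal_anomalies(traffic, period=1))  # [0, 2, 4, 6, 10, 12, 14, 16, 18]
```

With the seasonal adjustment, only index 8 is reported. With period=1, the same detector flags every ordinary night-time trough and evening peak while missing index 8, a small-scale version of the context problem that makes anomaly detection hard to transfer between domains.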

The AnomalyDetection tool is available on GitHub.