It is almost two years since IBM first outlined its PureData System for Hadoop strategy, but now the company is shifting up a gear with the announcement of its BigInsights for Apache Hadoop analytics tools.
While more companies are using the open source tool for collecting and storing very large sets of variable data, IBM argues companies are struggling "to realize its full potential in every part of their business".
As examples, it cites the business analyst who needs to find relevant information quickly and the data scientist who needs to make sense of the data with statistical modeling. Both come with the corollary that the highly complex environments that are created need to be easy for IT to manage and deploy for everyone in the organization. No small task.
According to IBM, BigInsights for Apache Hadoop includes "a broad data science toolset to query data, visualise, explore and conduct distributed machine learning at scale".
IBM has announced three new modules:
- IBM BigInsights Analyst: This will include IBM's SQL engine and spreadsheet as well as visualisations to find data "quickly and easily", IBM says. The company says the number of SQL queries already being run is "in the billions" and that BigInsights Analyst can improve efficiency by approximately 2x to 4x on Apache Hadoop, "depending on the shuffle size".
- IBM BigInsights Data Scientist: This is a new machine-learning engine, IBM says, that automatically tunes its performance over large-scale data to find interesting patterns. Along with this, it also has "over a dozen industry-specific algorithms" including Decision Trees, PageRank and Clustering. It will also provide native support for open source R statistical computing so that existing R algorithms can be reused, the company said.
- IBM BigInsights Enterprise Management: This consists of some new management tools aimed at speeding up results, IBM said.
IBM also announced the IBM Open Platform with Apache Hadoop, which provides the "necessary data access controls and authentication for an enterprise", the company said. IBM is also adding support for Apache Spark.