Apache Spark-based ClearStory ramps up its analytics software

Improvements to the Spark-powered analytics platform from ClearStory Data are designed to increase the speed of data analysis and reduce data-preparation times.

Silicon Valley startup ClearStory Data says the new release of its Apache Spark-based analytics software significantly speeds up complex analyses based on multiple sources.

The Menlo Park, California-headquartered company is also citing a reduction in the time needed to prepare data for analysis and a simpler approach to blending data on the fly. The new release features the integration of Spark 1.2 in-memory technology into its data-processing engine.

ClearStory, which has been funded by Andreessen Horowitz, DAG Ventures, Google Ventures, Khosla Ventures and Kleiner Perkins Caufield & Byers, has improved the user interface with a new guided model.

The firm's co-founder and chief architect Vaibhav Nivargi said in a statement that the software now gives users control and visibility over how disparate datasets should be harmonised, reducing the time and complexity involved in preparing data for analysis.

"This release strikes a new balance between the power that intelligent data harmonisation brings to business users and the level of precision and control that more data-savvy users typically prefer," he said.

"These new capabilities guide users to the best data to blend together to ensure that the resulting harmonised data can deliver fast, accurate and meaningful insights."

There is also the ability to collect extra statistics, as well as the addition of intelligent semantics to measure how individual attributes overlap across a number of datasets.
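ClearStory has not published how it measures attribute overlap, so the following is only an illustrative sketch of the general idea, not the company's method. It assumes overlap is scored as the Jaccard similarity of the distinct values in two attributes; the function names `attribute_overlap` and `overlap_report` and the sample `sales` and `weather` datasets are hypothetical.

```python
from itertools import combinations


def attribute_overlap(a, b):
    """Jaccard overlap between the distinct values of two attributes."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty attributes share nothing
    return len(set_a & set_b) / len(union)


def overlap_report(datasets):
    """Score every pair of attributes drawn from two different datasets.

    `datasets` maps a dataset name to {attribute name: list of values}.
    Returns (score, left, right) tuples, highest overlap first.
    """
    scores = []
    for (name_a, cols_a), (name_b, cols_b) in combinations(datasets.items(), 2):
        for col_a, vals_a in cols_a.items():
            for col_b, vals_b in cols_b.items():
                score = attribute_overlap(vals_a, vals_b)
                scores.append((score, f"{name_a}.{col_a}", f"{name_b}.{col_b}"))
    return sorted(scores, reverse=True)


# Hypothetical sample data, invented for illustration only.
sales = {"store_id": [1, 2, 3, 4], "region": ["west", "east", "west", "south"]}
weather = {"station": [3, 4, 5, 6], "area": ["west", "east", "north", "south"]}

report = overlap_report({"sales": sales, "weather": weather})
for score, left, right in report:
    print(f"{left} ~ {right}: {score:.2f}")
```

In this toy case the highest-scoring pair is `sales.region` and `weather.area`, which is the kind of signal a tool could use to suggest which attributes to blend on.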

The company said its software now offers users improvements in the way it traces and identifies the origins of data, including parent datasets, sources, and data structures and shapes.
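To make the idea of tracing data origins concrete, here is a minimal sketch that assumes each derived dataset simply keeps references to its parents. It is not ClearStory's implementation; the `Dataset` class, the `trace_origins` function and the sample dataset names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    source: str = ""      # where the data came from; empty for derived sets
    columns: tuple = ()   # a crude stand-in for structure and shape
    parents: list = field(default_factory=list)


def trace_origins(dataset):
    """Return the root sources that a dataset ultimately derives from."""
    if not dataset.parents:
        return {dataset.source}
    origins = set()
    for parent in dataset.parents:
        origins |= trace_origins(parent)
    return origins


# Hypothetical datasets, invented for illustration only.
crm = Dataset("crm_accounts", source="cloud CRM export", columns=("account", "region"))
pos = Dataset("pos_sales", source="internal warehouse", columns=("account", "revenue"))
blended = Dataset("revenue_by_region", columns=("region", "revenue"), parents=[crm, pos])

print(sorted(trace_origins(blended)))
```

Walking the parent links from the blended dataset recovers both original sources, which is the basic question a lineage feature has to answer.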

Spark started in 2009 as a UC Berkeley AMPLab research project to create a cluster computing framework for workloads poorly served by Hadoop. It went open source in 2010 and has experienced a surge of interest over the past 18 months. Last year Spark was the most active Apache Software Foundation project, with more than 450 contributors.

ClearStory Data offers a back-end system based on Spark and a front-end application that sits on top of a number of internal and external data sources, including cloud applications. Last October, it launched its Collaborative StoryBoards cloud service.

The back-end engine conducts data inference and profiling to spot relationships between data sources. The blended and harmonised data is then presented to the user through the front-end application, enabling a number of employees to explore the same data simultaneously and add data without additional modelling.
