Silicon Valley startup ClearStory Data says the new release of its Apache Spark-based analytics software significantly speeds up complex analyses based on multiple sources.
The Menlo Park, California-headquartered company is also citing a reduction in the time needed to prepare data for analysis and a simpler approach to blending data on the fly. The new release features the integration of Spark 1.2 in-memory technology into its data-processing engine.
ClearStory, which has been funded by Andreessen Horowitz, DAG Ventures, Google Ventures, Khosla Ventures and Kleiner Perkins Caufield & Byers, has improved the user interface with a new guided model.
The firm's co-founder and chief architect Vaibhav Nivargi said in a statement that its software now gives users control over, and visibility into, how disparate datasets should be harmonised, reducing the time and complexity involved in preparing data for analysis.
"This release strikes a new balance between the power that intelligent data harmonisation brings to business users and the level of precision and control that more data-savvy users typically prefer," he said.
"These new capabilities guide users to the best data to blend together to ensure that the resulting harmonised data can deliver fast, accurate and meaningful insights."
The release also adds the ability to collect extra statistics, along with intelligent semantics that measure how individual attributes overlap across a number of datasets.
The company said its software now offers users improvements in the way it traces and identifies the origins of data, including parent datasets, sources, and data structures and shapes.
Spark started in 2009 as a UC Berkeley AMPLab research project to create a cluster computing framework aimed at workloads poorly served by Hadoop. It went open source in 2010 and has experienced a surge of interest over the past 18 months. Last year Spark was the most active Apache Software Foundation project, with more than 450 contributors.
ClearStory Data offers a back-end system based on Spark and a front-end application that sits on top of a number of internal and external sources of data, including cloud applications. Last October, it launched its Collaborative StoryBoards cloud service.
The back-end engine conducts data inference and profiling to spot relationships between data sources. The blended and harmonised data is then presented to the user through the front-end application, enabling a number of employees to explore the same data simultaneously and add data without additional modelling.
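To make the idea of profiling columns to spot relationships between sources more concrete, here is a minimal Python sketch. It is not ClearStory's implementation, which the company has not detailed here. It assumes a simple approach: infer a coarse type for each column, then score pairs of same-typed columns by the Jaccard overlap of their distinct values. The function names, the sample `sales` and `weather` data and the 0.5 threshold are all hypothetical.

```python
from itertools import product


def profile_column(values):
    """Build a small profile: coarse inferred type, distinct values, null count."""
    non_null = [v for v in values if v is not None and v != ""]

    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    inferred = "numeric" if non_null and all(is_number(v) for v in non_null) else "text"
    return {
        "type": inferred,
        # Normalise case and whitespace so "Austin" and "austin " compare equal.
        "distinct": {str(v).strip().lower() for v in non_null},
        "nulls": len(values) - len(non_null),
    }


def jaccard(a, b):
    """Share of distinct values that two columns have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_joins(left, right, threshold=0.5):
    """Suggest column pairs whose values overlap enough to be worth blending.

    `left` and `right` map column names to lists of values (hypothetical layout).
    """
    left_profiles = {name: profile_column(vals) for name, vals in left.items()}
    right_profiles = {name: profile_column(vals) for name, vals in right.items()}
    found = []
    for (lname, lp), (rname, rp) in product(left_profiles.items(), right_profiles.items()):
        if lp["type"] != rp["type"]:
            continue  # only compare columns of the same inferred type
        score = jaccard(lp["distinct"], rp["distinct"])
        if score >= threshold:
            found.append((lname, rname, score))
    return sorted(found, key=lambda m: m[2], reverse=True)


# Illustrative data only: an internal sales table and an external weather feed.
sales = {"store_city": ["Austin", "Boston", "Denver", "Austin"], "revenue": [120, 95, 80, 130]}
weather = {"city": ["austin", "boston", "denver", "miami"], "temp_f": [91, 72, 68, 88]}

matches = candidate_joins(sales, weather)
print(matches)
```

In this toy case the only suggested pairing is `store_city` with `city`, since three of the four distinct city names appear in both sources. A production engine would presumably use richer statistics and semantics than a single overlap score.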