Cleversafe launches Hadoop without HDFS; Jaspersoft brings disconnected report editing

Summary: Clever data storage for Hadoop and offline data storage for reports mark new releases today from Cleversafe and Jaspersoft.

TOPICS: Big Data

In the Big Data world, we're used to accepting compromises and arbitrary limitations. For example, accepting the benefits of Hadoop means working with the Hadoop Distributed File System (HDFS): its immutable files, its practice of keeping three copies of everything by default, and the availability and reliability issues of the cluster's namenode. As a further illustration of fixed limitations, iterative report design, even in a self-service scenario, almost always requires an open, persistent connection to the data source(s).

We accept these limitations out of an intuitive sense that we have to give something to get something, but that doesn't mean we have to like them, or meekly accept them. What if we could transcend these limitations and still get our work done? Today, in separate product launches, Cleversafe and Jaspersoft each seek to provide such a breakthrough.

Cleversafe swaps out HDFS
Assuming it works as advertised, Cleversafe's Hadoop architecture lives up to the company's name. While other HDFS alternatives exist for Hadoop (for example, MapR's Hadoop distribution, which can mount HDFS-compatible NFS volumes), Cleversafe's Slicestor appliance nodes retain HDFS's distributed nature and maintain fault tolerance too. Cleversafe does this with "information dispersal": data is cut into slices that are spread across different nodes in the cluster using erasure coding, a scheme that allows the data to be reconstructed from a subset of the storage nodes. That eliminates single points of failure without the overhead of HDFS's complete replication.
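To make the erasure-coding idea concrete, here is a deliberately simplified Python sketch. It uses a single XOR parity slice, the simplest possible erasure code, which can survive the loss of any one slice. Production dispersal systems use Reed-Solomon-style k-of-n codes that tolerate several simultaneous failures; this is not Cleversafe's algorithm, and the function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal data slices and append one XOR parity slice."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\x00")  # zero-pad so all slices match
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)  # parity = slice_0 ^ slice_1 ^ ... ^ slice_(k-1)
    return slices + [parity]


def decode(slices: list[bytes | None], original_len: int) -> bytes:
    """Rebuild the data when at most one of the k+1 slices is missing (None)."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can recover only one lost slice")
    if missing:
        # The XOR of all surviving slices (data and parity) equals the lost one.
        survivors = [s for s in slices if s is not None]
        slices = list(slices)
        slices[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(slices[:-1])[:original_len]  # drop parity, then padding


message = b"Hadoop without HDFS"
pieces = encode(message, k=4)
pieces[2] = None  # simulate a failed storage node
assert decode(pieces, len(message)) == message
```

Each of the four data slices here is a quarter of the (padded) data, so the five slices together occupy 1.25 times its size, versus 3 times for HDFS's default triple replication.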

Meanwhile, the data is also stored in conventional format on the nodes where it is expected to be used for computation. The conventional copy keeps MapReduce operations fast, and the striped storage provides fault tolerance, without the need to keep multiple full copies of the data (and the network traffic and management overhead that come with them).
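The capacity trade-off in that hybrid layout is simple arithmetic. The sketch below compares HDFS-style triple replication with one conventional local copy plus a k-of-n dispersed copy. The 10-of-16 parameters and the 100 TB data size are hypothetical choices for illustration, not figures from Cleversafe.

```python
def replicated_raw_tb(logical_tb: float, copies: int = 3) -> float:
    """Raw capacity used by keeping `copies` full replicas (HDFS defaults to 3)."""
    return logical_tb * copies


def hybrid_raw_tb(logical_tb: float, k: int, n: int, local_copies: int = 1) -> float:
    """Raw capacity for `local_copies` conventional copies plus a k-of-n dispersed copy.

    A k-of-n code writes n slices, each 1/k the size of the data, so the
    dispersed copy costs n/k times the logical size and survives n - k lost slices.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return logical_tb * (local_copies + n / k)


logical = 100.0  # TB of data (illustrative)
print(f"3x replication: {replicated_raw_tb(logical):.0f} TB raw")
print(f"1 local copy + 10-of-16 dispersal: {hybrid_raw_tb(logical, k=10, n=16):.0f} TB raw")
```

Under those assumed parameters, 100 TB of data needs about 260 TB of raw capacity instead of 300 TB, and the dispersed copy can still tolerate six lost slices.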

Namenode issues disappear as well, since a Cleversafe cluster's accesser nodes federate and cover for each other, and the metadata is split up along with the data itself. Although various high-availability namenode technologies are now appearing in the major Hadoop distributions, they still rely on a single active namenode at any given time. Keeping a warm spare around is not the same thing as sharing metadata and directory-service responsibilities among a collection of active nodes.
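One general way to spread metadata ownership across a set of active nodes, with no central directory server, is rendezvous (highest-random-weight) hashing. The Python sketch below is a swapped-in illustration of that idea, not Cleversafe's documented mechanism; the node names, the file path and the two-owner replication factor are all hypothetical.

```python
from __future__ import annotations

import hashlib


def weight(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def metadata_owners(path: str, live_nodes: list[str], replicas: int = 2) -> list[str]:
    """Return the `replicas` live nodes with the highest weight for this path.

    Every node can compute this independently, so no single server has to
    act as the directory of record.
    """
    ranked = sorted(live_nodes, key=lambda node: weight(node, path), reverse=True)
    return ranked[:replicas]


nodes = ["accesser-1", "accesser-2", "accesser-3", "accesser-4"]
path = "/logs/2012/06/part-00000"
before = metadata_owners(path, nodes)
# If the primary owner fails, the surviving nodes keep their relative order,
# so the former secondary becomes the new primary without any election step.
after = metadata_owners(path, [n for n in nodes if n != before[0]])
assert after[0] == before[1]
```

The design choice worth noting is that ownership is computed, not looked up: any surviving node reaches the same answer, which is the property that lets active nodes cover for one another.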

Although Cleversafe clusters are appliance-based, the appliances use commodity processors and storage. The added value comes from tuning and optimization, and from the unique storage software subsystem. Cleversafe storage runs about $500 per terabyte, and can cost less depending on total storage size. On the MapReduce side, Cleversafe uses Cloudera's Distribution Including Apache Hadoop (CDH).

Jaspersoft: we don't need no stinkin' connections
While Cleversafe seeks to liberate data specialists from the tyranny of the HDFS namenode, Jaspersoft seeks to do likewise for end users with respect to original data sources. With its new 4.7 release, Jaspersoft has focused squarely on the reporting scenario and has taken the position that modifying the design of a report shouldn't require going back to the server if the report already has the data it needs.

Jaspersoft reports now carry with them a full offline snapshot containing the data set, the original query and the formatting information. From there, users can work with Jaspersoft's browser-based report tooling as if they were connected; the only difference is that they'll be querying the offline cache.
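Conceptually, such a snapshot bundles the query, the result set and the layout, so filtering, sorting and regrouping can run against the cached rows. The Python sketch below illustrates that idea under stated assumptions: the ReportSnapshot class, its fields and its methods are hypothetical and do not reflect Jaspersoft's actual file format or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class ReportSnapshot:
    """Hypothetical self-contained report: original query, cached rows and layout."""

    query: str  # kept so the report can be refreshed once a connection is available
    rows: List[Row]  # result set captured when the snapshot was taken
    formatting: Dict[str, Any] = field(default_factory=dict)

    def view(self, where: Optional[Callable[[Row], bool]] = None,
             sort_by: Optional[str] = None, descending: bool = False) -> List[Row]:
        """Filter and sort the cached rows locally; the data source is never contacted."""
        result = [r for r in self.rows if where is None or where(r)]
        if sort_by is not None:
            result.sort(key=lambda r: r[sort_by], reverse=descending)
        return result

    def totals(self, group_by: str, measure: str) -> Dict[Any, float]:
        """Sum a measure per group from the cache, e.g. to regroup a chart offline."""
        out: Dict[Any, float] = {}
        for r in self.rows:
            out[r[group_by]] = out.get(r[group_by], 0.0) + r[measure]
        return out


snap = ReportSnapshot(
    query="SELECT region, product, revenue FROM sales",
    rows=[
        {"region": "East", "product": "A", "revenue": 120.0},
        {"region": "West", "product": "A", "revenue": 80.0},
        {"region": "East", "product": "B", "revenue": 50.0},
    ],
    formatting={"chart": "bar"},
)
print(snap.totals("region", "revenue"))
print(snap.view(where=lambda r: r["revenue"] > 60, sort_by="revenue"))
```

Because the sample query string is only stored, never executed, the example runs without any database, which is the point of a disconnected snapshot.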

What's especially interesting here is that this disconnected, cache-based interactivity will be included in Jaspersoft's free, open source Community Edition. That opens up interesting, royalty-free embedding opportunities for developers. And given that the Community Edition, according to Jaspersoft, is often used to build reports on transactional databases, the offline snapshot cache will give end users a datamart of sorts, easing the load on the production database.

Jaspersoft 4.7 also includes improved, intelligent querying of MongoDB, the popular NoSQL document database, which Jaspersoft has found to be increasingly popular with its customers (MongoDB has broken out to top Jaspersoft's Big Data Index). Users can now filter MongoDB data in a more optimized fashion, as this version of Jaspersoft is much better attuned to working with JSON (JavaScript Object Notation).
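MongoDB filters are themselves JSON-style documents, built from operators such as $in and $gte. The sketch below evaluates a tiny subset of that syntax in plain Python, purely to show what such a filter expresses; in a real deployment the filter document would be sent to the server (for example through PyMongo's Collection.find) so that only matching documents cross the network. The OPS table, matches and filter_docs are hypothetical helpers that ignore MongoDB details such as array and nested-field matching, and they say nothing about how Jaspersoft implements its connector.

```python
from typing import Any, Callable, Dict, Iterable, List

# A small subset of MongoDB's comparison operators.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$in": lambda value, arg: value in arg,
}


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """True if a flat document satisfies every field condition in the filter."""
    for name, cond in flt.items():
        if name not in doc:
            return False  # simplification: missing fields never match here
        value = doc[name]
        if isinstance(cond, dict):
            # All operators on one field must hold, e.g. {"$gte": 10, "$lt": 20}.
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:  # a bare value means implicit equality
            return False
    return True


def filter_docs(docs: Iterable[Dict[str, Any]], flt: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the documents that satisfy the filter."""
    return [d for d in docs if matches(d, flt)]


orders = [
    {"status": "shipped", "total": 250},
    {"status": "pending", "total": 40},
    {"status": "shipped", "total": 15},
]
big_shipped = {"status": {"$in": ["shipped", "delivered"]}, "total": {"$gte": 100}}
print(filter_docs(orders, big_shipped))
```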

On the mobile side, Jaspersoft is introducing a native Android smartphone app, which joins the native iPhone app Jaspersoft already offered. For the tablet form factor, Jaspersoft is sticking with the browser and HTML5.

Incremental Maturity
As we saw with the many announcements around last month's Hadoop Summit, Big Data companies are working hard to bring Hadoop up to enterprise quality expectations, and the NoSQL and open source BI companies are working hard to make their layers stack up as well. As these enterprise efforts have progressed, so many point solutions have emerged that there is now some risk of fragmentation in the platform.

But I think the likely scenario is one of evolution: from among the many permutations proffered, the best new approaches to storage, high availability and batch/online moderation will be widely adopted, and the less popular approaches may fade away.

This is a normal part of software maturity and an overall good sign for Big Data.

Andrew Brust

About Andrew Brust

Andrew J. Brust has worked in the software industry for 25 years as a developer, consultant, entrepreneur and CTO, specializing in application development, databases and business intelligence technology.

Talkback

  • Hadoop on OpenStack Object Storage

    I think we're starting to see a number of viable storage options as Hadoop evolves. HDFS has not stood up as a desirable storage platform in the enterprise. The team at Big Data Craft has recently provided modifications to Hadoop to run natively against OpenStack Storage (http://bigdatacraft.com/archives/349) without the NameNode dependency, and both the Hadoop modifications and OpenStack are open source.

    This is significant because OpenStack Storage can be deployed privately in the enterprise and is already the storage platform used by Rackspace, HP, and SoftLayer (among others). A Hadoop initiative that an organization starts at small scale in a public cloud could easily be moved in house and expanded to other data on their own storage cluster. Similarly, an in house initiative could be extended into public cloud environments for added compute resources.
    Adam Bane
  • Data Solution

    Andrew, I agree we are seeing more efforts from Big Data companies trying to make Hadoop a mature and complete solution. It is worth mentioning HPCC Systems, an open source offering that is a complete, enterprise-ready solution. As an alternative to Hadoop, the HPCC Systems platform designed by data scientists comprises a single architecture, a consistent data-centric programming language (ECL), and two processing clusters: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster, which provides real-time data analysis and delivery. Both the Thor and the Roxie clusters use commodity servers and local storage. There is no network attached storage or storage area network in the HPCC Systems architecture. Both clusters use commodity networking and run on standard Linux distributions. For more info visit: hpccsystems.com
    H-M
  • Cleversafe utilizes Dispersed Storage® Technology to deliver limitless scale

    More information on complete government and enterprise ready data storage systems: http://www.cleversafe.com/overview/how-cleversafe-works
    Derek Abbring