Disclosure: Readers should take careful note that SiSense is a client of my company, Blue Badge Insights. This article is not commissioned or compensated by SiSense, and I intend it as an objective report on the company's new product announcement. However, my point of view is certainly subjective, and readers should bear that in mind.
SiSense, a company on which I have reported before, has introduced a new version of its Prism product, dubbed Prism 10X, which delivers major enhancements to its underlying ElastiCube database engine. The company says the new version provides 100 times the data capacity and 10 times the speed of competing in-memory analysis solutions running on the same hardware.
Think global, act local
A lot of available Big Data solutions work on a "scale-out" approach to processing data. Typically this means adding more commodity servers to a cluster, so that more data can be processed in parallel, allowing processing times to remain reasonable even as data volumes increase.
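To make the scale-out idea concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's implementation: worker threads stand in for the servers of a cluster, and the names `partition`, `partial_sum` and `scale_out_sum` are hypothetical. Each "node" aggregates its own slice of the data, and the partial results are then combined.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, node_count):
    """Split rows into node_count interleaved slices, one per hypothetical node."""
    return [rows[i::node_count] for i in range(node_count)]


def partial_sum(rows):
    """Work done independently on one node: a local aggregate over its slice."""
    return sum(rows)


def scale_out_sum(rows, node_count):
    """Fan the slices out to workers, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(partial_sum, partition(rows, node_count)))
    return sum(partials)


# Assumed sample data: the integers 1 through 1000.
sales = list(range(1, 1001))
total = scale_out_sum(sales, node_count=4)
print(total)
```

In a real cluster the slices would live on separate machines and the combine step would travel over the network, which is where much of the complexity of scale-out systems comes from.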
Such an architecture is very powerful, but it can take focus away from the parallel optimizations that can be had on a single machine. These in-machine optimizations are exactly where SiSense's engineers have concentrated their efforts, employing a combination of cache-awareness, columnar compression, predictive pre-fetching and vectorization.
SiSense's ElastiCube engine focuses not just on processing data in memory, but also on processing it within a Central Processing Unit's (CPU's) on-board cache. Moving data in and out of cache is much faster than doing so with Random Access Memory (RAM), and while many data engines use cache only incidentally, Prism targets in-cache data manipulation explicitly.
Cache is much smaller than memory, so the ElastiCube engine employs columnar compression, not just for the storage of data on disk, but also for its in-cache persistence. The engine also factors out queries into sub-queries (which SiSense calls "instructions" and says tend to repeat) and pre-fetches results for the sub-queries that its heuristics tell it users will want. Interestingly, these heuristics improve as the engine's workload increases, so greater load on the system can actually lead to better performance.
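As a rough illustration of why columnar compression helps data fit in a small, fast space, the sketch below uses run-length encoding, one common columnar scheme. SiSense has not said which encoding ElastiCube uses, so treat the choice of technique, the sample data and the function names (`rle_encode`, `rle_decode`, `count_where`) as assumptions. The point it shows is that some queries can be answered directly on the compressed form, without expanding the column first.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of repeated values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


def count_where(runs, target):
    """Count matching rows by adding up run lengths, without decompressing."""
    return sum(count for value, count in runs if value == target)


# Assumed sample column: nine rows of a "region" attribute.
region = ["EU"] * 4 + ["US"] * 3 + ["EU"] * 2
runs = rle_encode(region)

print(runs)
print(count_where(runs, "EU"))
```

Columns compress well this way because values of a single attribute, stored together, tend to repeat far more than the mixed values of a row would.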
Prism not only targets cache, but makes use of newer CPUs' "single instruction, multiple data" (SIMD) instructions, which process several data values at once, rather than one at a time. This facilitates parallel processing within a machine, rather than between nodes (servers) in a cluster. This technique is sometimes referred to as vectorization.
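The "several data values at once" idea can be imitated in plain software. Python has no direct access to a CPU's SIMD instructions, so the sketch below uses a software analogy sometimes called SWAR (SIMD within a register): four 8-bit values are packed into one 32-bit integer and added lane by lane with a single arithmetic expression. The lane count, lane width and function names (`pack`, `unpack`, `add_lanes`) are assumptions chosen for illustration, and this is not how Prism itself is implemented.

```python
LANES = 4               # assumed: four values per packed word
LOW7 = 0x7F7F7F7F       # low 7 bits of every 8-bit lane
HIGH1 = 0x80808080      # top bit of every 8-bit lane


def pack(values):
    """Pack four integers in the range 0..255 into one 32-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def add_lanes(a, b):
    """Add all four lanes at once, each modulo 256.

    The low 7 bits of each lane are added normally; that sum cannot
    carry into the neighbouring lane. The top bit of each lane is then
    fixed up with XOR.
    """
    return ((a & LOW7) + (b & LOW7)) ^ ((a ^ b) & HIGH1)


x = pack([1, 2, 3, 200])
y = pack([10, 20, 30, 100])
print(unpack(add_lanes(x, y)))
```

Real SIMD hardware does the same kind of lane-wise work in dedicated wide registers, typically with more lanes and without the masking tricks needed here.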
Start your engine...and then keep going
As obsessive as the SiSense engineering team is about crafting a super-efficient query kernel, Prism is more a competitor to data discovery and visualization tools like Tableau, QlikView and TIBCO Spotfire than it is to data warehouse or online analytical processing (OLAP) products. Prism includes scatter charts, wind-roses, funnels, and scatter and area maps, among others, and SiSense says that "thousands of combinations are available."
Competing data discovery tools have their own engines too, and use a combination of columnar and in-memory techniques to attain high performance. But they don't seem to exploit cache and SIMD operations (what SiSense calls "in-chip analytics").
Room to grow
SiSense is far from perfect. As good as its single-node optimizations are, its lack of cluster-based deployment capability will be a turn-off to some who are looking for petabyte-scale solutions. But for data discovery work, terabyte-scale is where many (if not most) enterprise customers are right now.
Clustering capabilities may come in a future release. But for now, SiSense is focusing on a high-speed, integrated solution for single-box data discovery work, and its reported 520% year-on-year growth is making the company feel its approach is quite well-validated.