MemSQL is an in-memory database that supports both transactional and analytical workloads. It has not been long since we covered version 6.5, and MemSQL's people are not short on ambition: their vision is to make their product the best database offering out there.
Today, MemSQL is releasing version 6.7, bringing a range of capabilities focused on usability, simplicity, performance, and compatibility with legacy systems and third-party tools. But what will probably get the most attention is the free tier and the performance enhancements.
Free tiers are nothing new, in databases and beyond. They are a good way to invite users to try a product and see if it works for them. The thinking is that if people like the offering and start using it, they will eventually outgrow the free tier and convert to paying users.
Free tiers usually come with some strings attached, however. They have a limited duration, or stripped-down capabilities, or both. In databases, restrictions usually involve things such as limited functionality or storage capacity. MemSQL claims to have the best freebie around, as it offers a full version of MemSQL 6.7, with restrictions applying only to memory size and number of nodes -- up to 128GB and 4 nodes.
Nikita Shamgunov, MemSQL CEO, said that anyone using or wanting to use MemSQL can build any application they want for free:
"As long as they don't need support or to scale beyond 128GB or four nodes, MemSQL will remain free for them. And if they grow beyond the free tier, it's a simple request to get an Enterprise license and keep running their workload as normal.
Legacy databases such as Oracle and SQL Server charge a premium for high end features such as Oracle Data Guard and SQL Server mirroring that allow their customers to run mission critical workloads. The MemSQL free tier comes with similar capabilities out of the box."
Disk storage is unlimited, although for an in-memory database like MemSQL this is a secondary concern. Query processing does not happen on disk, but the absence of a disk limit means you won't run out of space for secondary storage.
The real question is: How much can you get done with 128GB of memory? This may sound like a lot for the average user, but can any meaningful real-world applications be built with this?
Shamgunov noted that users running on up to 128GB of memory make up less than 20 percent of MemSQL's user base and account for less than 5 percent of its revenue. In other words, it's small fish for MemSQL, but not an insignificant category. If you want to give MemSQL a spin, 128GB may go a long way.
MemSQL comes with an annual license. So, if a current customer is under the 128GB and four node limit, they will get MemSQL for free. In theory, they can stop paying MemSQL if they don't need any support.
However, if they need more capacity or support, then they will need an Enterprise license. The free tier of MemSQL can be used on-premises or in the cloud, but it does not apply to MemSQL's managed version.
"When deciding the threshold for the free tier, we wanted to make sure customers would get value with the offering, we weren't concerned if some current paying customers fell into the free category. We wanted to offer an environment that was robust enough to create production-level applications," said Shamgunov.
Shamgunov added that once customers get started with MemSQL, the company sees them expand their contracts and find other places within their organizations to use the technology:
"We see customers starting at sub-$100,000 contracts renew the next year with over million-dollar deals. So, we expect that trend to continue once more people are able to test and put MemSQL into production."
Beyond the free tier, this release is all about star join performance. Because star schemas are typical in data warehousing, existing applications that use them can get far faster response times and higher concurrent throughput. MemSQL highlights that this allows interactive analytics on large, rapidly changing data sets without resorting to the complexity of pre-aggregating data.
To achieve this, MemSQL has added new proprietary, patent-pending algorithms for star joins that make use of vectorization and SIMD. These algorithms operate directly on MemSQL's compressed column store data formats, called encoded data.
Shamgunov explained that -- instead of doing a hash join in the traditional way, where each row of the "probe-side" table is used to do a function call to search into a hash table created from the "build-side" table -- MemSQL now has a special implementation of hash join that doesn't do function calls in the inner loop.
Instead, it uses generated, templatized code to process part of the work for multiple probes at once in a single SIMD instruction, operating directly on encoded data. To demonstrate this, MemSQL used a basic star schema data set.
The following is a basic star join query that forces processing of every single row from the fact table, does a join, and groups by columns from a dimension:
Select d_daynuminweek, d_dayofweek, count(*) as c from media_view_fact f, date_dim d where f_datekey = d_datekey group by 1, 2 order by 1 asc;
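To get a feel for the shape of this query, here is a minimal sketch that runs it on a toy data set using Python's built-in sqlite3 module rather than MemSQL. The table layouts are assumptions: the article only shows the column names the query references, so the schema below contains just those columns, and the sample rows are made up for illustration. It says nothing about MemSQL's performance.

```python
import sqlite3

# The article's star join query, reformatted across lines for readability.
STAR_JOIN = """
SELECT d_daynuminweek, d_dayofweek, count(*) AS c
FROM media_view_fact f, date_dim d
WHERE f_datekey = d_datekey
GROUP BY 1, 2
ORDER BY 1 ASC;
"""


def run_star_join():
    """Run the query on a tiny in-memory data set and return the result rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema: only the columns the query references.
    conn.executescript("""
        CREATE TABLE date_dim (
            d_datekey INTEGER PRIMARY KEY,
            d_daynuminweek INTEGER,
            d_dayofweek TEXT
        );
        CREATE TABLE media_view_fact (f_datekey INTEGER);
    """)
    # Made-up dimension rows: three dates, two of them Thursdays.
    conn.executemany(
        "INSERT INTO date_dim VALUES (?, ?, ?)",
        [
            (20181101, 5, "Thursday"),
            (20181102, 6, "Friday"),
            (20181108, 5, "Thursday"),
        ],
    )
    # Made-up fact rows: each row is one media view on a given date.
    conn.executemany(
        "INSERT INTO media_view_fact VALUES (?)",
        [(20181101,), (20181101,), (20181102,), (20181108,)],
    )
    rows = conn.execute(STAR_JOIN).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_star_join():
        print(row)
```

Every fact row is joined to its date and counted, so the output has one row per day of the week present in the dimension table. On a real fact table with many millions of rows, that full scan is what makes the query a useful stress test for the join.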
According to MemSQL, running the query with and without the new encoded join capability enabled shows a 101-fold speedup. MemSQL said the data is not pre-aggregated, and the speedup is due to operating directly on encoded data, using SIMD and the enhanced join algorithm.
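The general idea of working on encoded data can be illustrated with a small sketch. This is not MemSQL's algorithm, which is proprietary and relies on SIMD and generated code that plain Python cannot express. It is a simplified stand-in, assuming a dictionary-encoded join column, that shows how hash probes can be moved out of the per-row loop. All function names and sample values are hypothetical.

```python
from collections import Counter


def row_at_a_time_join(fact_keys, dim):
    """Traditional shape: one hash-table probe per fact (probe-side) row."""
    counts = Counter()
    for key in fact_keys:
        group = dim.get(key)  # probe the build-side hash table
        if group is not None:
            counts[group] += 1
    return dict(counts)


def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def encoded_join(dictionary, codes, dim):
    """Probe once per distinct key, then count over the integer codes."""
    # One probe per dictionary entry instead of one per fact row.
    group_of_code = [dim.get(value) for value in dictionary]
    # Inner loop touches only small integers; there is no hash probe in it.
    code_counts = [0] * len(dictionary)
    for code in codes:
        code_counts[code] += 1
    # Fold the per-code counts into per-group counts, skipping unmatched keys.
    counts = Counter()
    for code, n in enumerate(code_counts):
        group = group_of_code[code]
        if group is not None:
            counts[group] += n
    return dict(counts)


if __name__ == "__main__":
    # Build side: date key -> day of week (made-up values).
    dim = {20181101: "Thursday", 20181102: "Friday", 20181108: "Thursday"}
    # Probe side: fact-table join keys; the last one has no match.
    fact_keys = [20181101, 20181101, 20181102, 20181108, 20181231]
    dictionary, codes = dictionary_encode(fact_keys)
    print(row_at_a_time_join(fact_keys, dim))
    print(encoded_join(dictionary, codes, dim))
```

Both functions produce the same counts. The second one performs as many hash probes as there are distinct keys rather than rows, and its inner loop over integer codes is the kind of loop that a real engine could process several values at a time with SIMD instructions.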
MemSQL 6.7 also introduces MemSQL Studio, a UI for MemSQL clusters that lets users run SQL commands, visualize query plans, profile queries and workloads, and gain insight into cluster state and performance. This is somewhat overshadowed by the free tier and the performance enhancements, but there's no doubt it will also be a welcome addition for MemSQL users.
All in all, this may be a minor version release for MemSQL, but it moves the product forward in a not-so-minor way. The free tier will get the attention, and it's not just a toy either, but it's the performance that MemSQL is really betting on to keep users hooked.