Some 20 months after version 7.3, the latest iteration of the open-source MySQL Cluster database is now generally available, with a promise of new management features and improved performance.
According to Oracle, which acquired MySQL when it bought Sun Microsystems for $7.4bn in 2010, MySQL Cluster 7.4 has faster in-memory processing and can run analytics workloads more efficiently.
As well as enhanced geographic redundancy features for faster maintenance, the latest version of the ACID-compliant transactional database also provides better reporting on distributed memory use and database operations, along with new performance-tuning options, Oracle said.
"Feature-wise, I wouldn't say there's really anything new. It's just an improvement of current capabilities across the board," Oracle vice president of MySQL engineering Tomas Ulin said.
"It's faster if you have more cores on the system, so it can run on bigger machines with about a 50 percent improvement. It's also more scalable and you can run with better performance on more data nodes. We haven't increased the number of nodes you can use. But at the higher node counts with more cores, it performs better."
Ulin said version 7.4 of the technology, which provides shared-nothing clustering and auto-sharding for the MySQL database, showed the fruits of work in improving the speed of table scans.
"Cluster has always been a great database for fairly simplistic queries with extreme performance and latency requirements - so simple NoSQL-type operations, key-value type operations," he said.
"We've always had the capability of doing complex joins. In the last release we have the ability to do parallel queries as well, so you can actually distribute the load of more complex queries on several nodes and then merge the result in the end. That enables us to scale better on complex queries.
"What we've done with this release, when you get into these kinds of queries, table scans become quite important. So we've done some great improvements in table scans to improve the performance overall of more complex queries, which sort of broadens the use case for us a bit - so we'll see where that takes us."
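The parallel-query pattern Ulin describes is a classic scatter-gather: each data node scans its own shard locally, and the results are merged at the end. A minimal sketch of that idea, using hypothetical in-memory shards and names (the real product distributes scans across data nodes over the network, not threads):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "data nodes": each list is one shard (fragment) of a table.
SHARDS = [
    [{"id": 1, "amount": 40}, {"id": 4, "amount": 15}],
    [{"id": 2, "amount": 55}, {"id": 5, "amount": 30}],
    [{"id": 3, "amount": 25}, {"id": 6, "amount": 70}],
]

def scan_shard(shard, predicate):
    """Table scan executed locally against one shard."""
    return [row for row in shard if predicate(row)]

def parallel_query(predicate):
    """Scatter the scan to all shards in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: scan_shard(s, predicate), SHARDS)
    # Merge step: flatten the per-shard partial results into one answer.
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: r["id"])

rows = parallel_query(lambda r: r["amount"] > 25)
# Matching rows come back from three shards and are merged into one result.
```

The point of the sketch is that the expensive part, the scan, runs once per shard concurrently, so adding data nodes spreads the cost of a complex query rather than serialising it on one machine.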
On the management side, Ulin said the ability to add nodes on the fly, added in an earlier release, brings with it the need to redistribute the data to run evenly across the system and on all the additional machines.
"What we've added with this release, which is a top request from customers, is to be able to see the distribution - to see how the data is distributed on the different machines - and how much data is being used. Also, when you start going in and deleting data, you get gaps. [You need to see] what that can mean and when you can start reclaiming memory," he said.
"It becomes very important - also in these kinds of in-memory databases - because memory is expensive and needs to be utilised in the best way. We've added a lot of extra information that you can access, not only about the data distribution itself but also the usage pattern."
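The kind of report Ulin is describing boils down to aggregating, per data node, how much memory each fragment has allocated and how much of it has been emptied by deletes and could eventually be reclaimed. A small illustrative sketch, with made-up fragment statistics and a made-up page size (not the product's actual reporting interface):

```python
# Hypothetical per-fragment stats: pages allocated to each fragment on a
# given node, and pages freed up by deletes ("gaps") within them.
PAGE_BYTES = 32 * 1024  # assumed page size for the example

fragments = [
    {"node": 1, "allocated_pages": 100, "free_pages": 10},
    {"node": 1, "allocated_pages": 80,  "free_pages": 40},
    {"node": 2, "allocated_pages": 120, "free_pages": 5},
]

def usage_by_node(frags):
    """Aggregate allocated and reclaimable memory per data node."""
    report = {}
    for f in frags:
        node = report.setdefault(f["node"], {"allocated": 0, "reclaimable": 0})
        node["allocated"] += f["allocated_pages"] * PAGE_BYTES
        # Pages emptied by deletes could be reclaimed once compacted.
        node["reclaimable"] += f["free_pages"] * PAGE_BYTES
    return report

report = usage_by_node(fragments)
```

A view like this makes the two questions in the quote answerable at a glance: is the data spread evenly across machines, and how much memory is tied up in gaps left by deletions.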
Being able to see whether specific data in the system is being utilised heavily and causing the overall system to perform badly is also important.
"You have very typical cases. This is not a use case for Cluster in itself but it exemplifies what the issue is - you can have the Justin Bieber effect, where you get very hot data for some reason because there's just an excess of people watching whatever he's writing or he's putting up a selfie or whatever. You can get similar usage issues either from hot data or from a faulty part of the system," Ulin said.
"It doesn't have to be the actual database but it can be for some reason that the other system that's accessing the database is going there and doing a lot of pinging of some hot data or wrongly designed, so you create some hotspots in the data. In this release you can now get a lot more information if there are particular fragments in the data which are being heavily accessed and therefore getting an uneven distribution of the load."
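Detecting the hotspots Ulin describes amounts to comparing per-fragment access rates against the cluster-wide average and flagging the outliers. A toy sketch of that check, with invented counters and an invented threshold (the real feature surfaces this through the database's own monitoring, not through code like this):

```python
from statistics import mean

# Hypothetical per-fragment access counters, as a monitoring sample might show them.
access_counts = {"frag-0": 1200, "frag-1": 950, "frag-2": 24000, "frag-3": 1100}

def find_hotspots(counts, factor=3.0):
    """Flag fragments whose access rate exceeds `factor` times the mean.

    An uneven distribution like this can come from genuinely hot data
    (the 'Justin Bieber effect') or from a misbehaving client hammering
    one part of the keyspace.
    """
    avg = mean(counts.values())
    return sorted(f for f, c in counts.items() if c > factor * avg)

hot = find_hotspots(access_counts)
```

Once a heavily accessed fragment is identified, the operator can decide whether the fix belongs in the data model, the sharding scheme, or the client that is generating the skewed load.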
Other significant aspects of Cluster 7.4 are improvements to geographic redundancy capabilities and online maintenance, which Ulin said is now five times faster.
"For mobile operators, for example, maintenance windows are shrinking. A lot of these databases serve a much greater geographic span. If you were just running France, then you could put your maintenance window for Sunday at 1am and have four or five hours to run your maintenance. But now you don't find those longer windows," he said.
"So being able to do this maintenance in a much shorter time span [is important]. Also, at the same time the systems become bigger and bigger, so it takes longer to restart a node that's 256GB versus one that's 32GB. For that reason you need to shrink these things. That's one thing that we've put a lot of effort into to make it faster."