I had the opportunity to speak with GemStone Systems' Richard Lamb, President, and Raj Kulkarni, COO, about GemFire Enterprise. This product appears to apply lessons from the grid computing world to increase performance and scalability dramatically. They appeared to deal pretty well with my bad puns, and I learned a great deal.
How GemStone describes GemFire Enterprise
GemFire is an Enterprise Data Fabric (EDF) solution from GemStone Systems, Inc. It is a high-performance, distributed, memory-oriented operational data management infrastructure that offers very low latency, high resiliency, scalability, and high-throughput data sharing and event distribution for high-performance computing applications that need access to real-time data.
What does it do?
Here's how the company describes its product:
- Continuous Querying: Applications register complex queries (queries with complex predicates, joins, etc.) that, unlike regular queries, are not executed just once. They become resident in the distributed cache, so the query appears to be running continuously. As the data changes, the continuous query engine calculates how the result set has changed, pushes the "delta" to the application, and merges it with a cached result set on the application node.
- Native Object Interoperability across C++, Java and .NET: Applications can share data objects through the distributed cache in real time. The built-in on-the-wire data format for objects is language-neutral and doesn't require any intermediate formats.
- Memory Pooling across a Data Grid or Cluster: GemFire EDF has built-in features that pool memory across nodes at runtime to optimize data distribution and help scale the application.
- Data Replication: Out of the box, GemFire EDF supports replication of data in memory for high availability and resiliency. GemFire EDF supports multiple topologies to replicate data based upon the cluster/grid size.
- Data Partitioning: GemFire EDF partitions data across nodes, allowing it to scale to thousands of application nodes.
- Dynamic Scaling: GemFire EDF allows data nodes to be added (scale-out) or removed (scale-in) gracefully at runtime.
- Active Data Management: GemFire EDF offers active data management where interested applications can express interest through query expressions and get instantaneous data delivery notifications when the underlying data changes.
- Reliable Publish-Subscribe Semantics: With GemFire EDF, applications perform create/read/update/delete (CRUD) operations on a local cache, and the corresponding event is routed to nodes that subscribe to the data. Data objects can be pushed to subscribing applications either synchronously or asynchronously. Subscribers receive events containing the new, changed or deleted object(s). The data fabric is intelligent enough to propagate only changes to data objects or their relationships, keeping network traffic to a minimum.
- Data Source Abstraction: GemFire EDF offers synchronous or asynchronous options to synchronize data in memory with the underlying data store. This feature provides a data abstraction layer on top of the underlying data store.
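The continuous querying idea is easier to see in code. What follows is a minimal Python sketch, not GemFire's API: the class names (`ContinuousQuery`, `DataStore`, `ClientResultSet`) are hypothetical, and everything runs in one process rather than across a network. It illustrates the point made in the company's description: once a query is registered, each change is checked only against that query's predicate, and only the difference in the result set is pushed to the application, which merges it into its own copy.

```python
from typing import Any, Callable, Dict, List, Optional, Set

# Hypothetical listener signature: one delta = (added, updated, removed)
Listener = Callable[[Dict[str, Any], Dict[str, Any], Set[str]], None]


class ContinuousQuery:
    """Hypothetical: a predicate that stays registered and reports result-set deltas."""

    def __init__(self, predicate: Callable[[Any], bool], listener: Listener) -> None:
        self.predicate = predicate
        self.listener = listener
        self.results: Dict[str, Any] = {}  # current result set, kept on the "server"

    def on_change(self, key: str, new_value: Optional[Any]) -> None:
        """Compare the entry's old and new membership instead of re-running the query."""
        was_in = key in self.results
        now_in = new_value is not None and self.predicate(new_value)
        if now_in:
            self.results[key] = new_value
            if was_in:
                self.listener({}, {key: new_value}, set())  # still matches, value changed
            else:
                self.listener({key: new_value}, {}, set())  # newly matches
        elif was_in:
            del self.results[key]
            self.listener({}, {}, {key})  # no longer matches
        # Entries that neither matched before nor match now generate no traffic


class DataStore:
    """Hypothetical key/value store that feeds every change to registered queries."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.queries: List[ContinuousQuery] = []

    def register(self, cq: ContinuousQuery) -> Dict[str, Any]:
        # Run the query once to seed the result set, then keep it resident
        cq.results = {k: v for k, v in self.data.items() if cq.predicate(v)}
        self.queries.append(cq)
        return dict(cq.results)

    def put(self, key: str, value: Any) -> None:
        self.data[key] = value
        for cq in self.queries:
            cq.on_change(key, value)

    def destroy(self, key: str) -> None:
        self.data.pop(key, None)
        for cq in self.queries:
            cq.on_change(key, None)


class ClientResultSet:
    """Application-side copy of the result set, kept current by merging deltas."""

    def __init__(self) -> None:
        self.rows: Dict[str, Any] = {}

    def merge(self, added: Dict[str, Any], updated: Dict[str, Any], removed: Set[str]) -> None:
        self.rows.update(added)
        self.rows.update(updated)
        for key in removed:
            self.rows.pop(key, None)


# Usage: track every price above 100
prices = DataStore()
prices.put("IBM", 120.0)
client = ClientResultSet()
cq = ContinuousQuery(lambda price: price > 100, client.merge)
client.rows.update(prices.register(cq))  # initial result: {'IBM': 120.0}
prices.put("MSFT", 105.0)  # delivered as "added"
prices.put("MSFT", 110.0)  # delivered as "updated"
prices.put("IBM", 95.0)    # delivered as "removed"
print(client.rows)         # {'MSFT': 110.0}
```

The application never re-issues the query; its copy stays current because each delta carries only the entries whose membership or value changed.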
Many analytical or high-volume transactional applications push the limits of system and storage capabilities. So developers of data management software took to storing more and more live data in a system's memory rather than saving it to disk each and every time it was updated. This approach, called caching, was designed to improve application performance.
While this approach works well when a single database server supports a workload, it eventually creates a bottleneck when the volume or complexity of the work requires more processing than is available in a single computer. Keeping the cache consistent and accurate across many systems without creating another bottleneck is no mean feat.
Another point is that these high-volume, or "extreme transaction processing," workloads change data so rapidly that it really doesn't make sense to save the data from each transaction to disk only to read it back immediately for the next transaction. So companies such as GemStone looked at the problem and took a different approach to the performance and scalability problems that typical data management solutions create.
The products create an in-memory cache of the data for these extreme applications and make that data available to applications running on many independent blade computers or other types of systems. This effectively creates a massive symmetric multiprocessing system, even though each system in this data grid runs its own operating system and its own copy of the application. Since transactions across all of the application processing engines update a single virtual database, the atomicity of each transaction can be assured without reducing or limiting the performance or capability of this parallel processing environment.
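Keeping data in memory and writing it to disk later is usually called write-behind (asynchronous) caching, as opposed to write-through (synchronous) caching; these roughly correspond to the synchronous and asynchronous options in the Data Source Abstraction bullet. Below is a minimal Python sketch of the write-behind idea under simple assumptions: `BackingStore` stands in for a disk-based database, the class names are hypothetical rather than GemFire's, and the cache flushes when a batch fills or on an explicit call rather than on a background timer.

```python
from collections import OrderedDict
from typing import Any, Dict


class BackingStore:
    """Hypothetical stand-in for a disk-based database; counts writes that reach it."""

    def __init__(self) -> None:
        self.rows: Dict[str, Any] = {}
        self.write_count = 0

    def write_batch(self, batch: Dict[str, Any]) -> None:
        self.rows.update(batch)
        self.write_count += len(batch)


class WriteBehindCache:
    """Hypothetical write-behind cache: memory is authoritative, disk catches up later."""

    def __init__(self, store: BackingStore, batch_size: int = 100) -> None:
        self.store = store
        self.batch_size = batch_size
        self.memory: Dict[str, Any] = {}
        # Dirty keys in first-modified order; re-writing a key replaces its pending value
        self.pending: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Any:
        # Reads are served from memory, never from the backing store
        return self.memory.get(key)

    def put(self, key: str, value: Any) -> None:
        self.memory[key] = value
        self.pending[key] = value  # coalesces repeated updates to the same key
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Push all pending changes to the backing store in one batch
        if self.pending:
            self.store.write_batch(dict(self.pending))
            self.pending.clear()


# Usage: 1,000 updates to one hot key
db = BackingStore()
cache = WriteBehindCache(db, batch_size=100)
for i in range(1000):
    cache.put("account-42", i)
cache.flush()
print(cache.get("account-42"), db.write_count)  # 999 1
```

Because repeated updates to the same key are coalesced, 1,000 in-memory updates reach the backing store as a single write. The general trade-off is that anything still pending is lost if the node fails before a flush, which is one reason in-memory replication matters for this kind of design.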
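To make the "many independent systems, one virtual database" idea concrete, here is a minimal Python sketch of hash partitioning with one redundant copy of each entry. It is an illustration under stated assumptions, not how GemFire itself places data: the `Node` and `PartitionedGrid` classes, the 113-bucket count, and the "next node holds the backup" rule are all hypothetical choices, the "nodes" are plain objects in a single process, and transaction handling is left out entirely.

```python
import zlib
from typing import Any, Dict, List, Optional


class Node:
    """Hypothetical member of the grid, holding entries in its own memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data: Dict[str, Any] = {}


class PartitionedGrid:
    """Hypothetical hash partitioning with one redundant copy per entry."""

    def __init__(self, nodes: List[Node], num_buckets: int = 113) -> None:
        if len(nodes) < 2:
            raise ValueError("need at least two nodes to keep a redundant copy")
        self.nodes = nodes
        self.num_buckets = num_buckets

    def bucket(self, key: str) -> int:
        # crc32 is stable across processes, unlike Python's salted str hash()
        return zlib.crc32(key.encode("utf-8")) % self.num_buckets

    def owners(self, key: str) -> List[Node]:
        # Primary owner plus a backup on the next node, so the two never coincide
        primary = self.bucket(key) % len(self.nodes)
        backup = (primary + 1) % len(self.nodes)
        return [self.nodes[primary], self.nodes[backup]]

    def put(self, key: str, value: Any) -> None:
        # Synchronous replication: every live copy is written before put returns
        for node in self.owners(key):
            if node.alive:
                node.data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Read from the primary; fall back to the backup if the primary has failed
        for node in self.owners(key):
            if node.alive:
                return node.data.get(key)
        return None


# Usage: four "blades" share 1,000 entries
nodes = [Node(f"blade-{i}") for i in range(4)]
grid = PartitionedGrid(nodes)
for i in range(1000):
    grid.put(f"order-{i}", {"qty": i})
print([len(n.data) for n in nodes])  # each node holds only a share of the entries

grid.owners("order-7")[0].alive = False  # simulate losing the primary
print(grid.get("order-7"))               # {'qty': 7}, served by the backup
```

Hashing keys to a fixed set of buckets, rather than directly to nodes, is a common design because scaling out or in can then move whole buckets. This sketch maps buckets to nodes with a simple modulo, so adding a node would reassign most buckets; a production design would keep an explicit bucket-to-node table and move only some of them.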
It would seem to me that this technology, largely seen in analytic financial applications, could apply equally well to eCommerce, engineering, media creation or other applications that are examples of extreme transaction processing.
What other types of applications would benefit if this approach were used?