The economical storage and compute resources of the cloud have changed the notion of what's affordable when it comes to keeping data available. While on-premises databases typically had backups and disaster recovery sites limited to a single target, the economies of scale in the cloud have multiplied that number. It's not unusual to have a half-dozen or more copies of the data replicated across data centers, and in some cases, regions, as standard built-in features of cloud-managed databases. The same goes for snapshots, which, for many cloud-managed database services, cover the most recent 30 to 35 days. So even if a cloud data center goes down, the database should keep on going after a pause of a few seconds.
But backups are another story. Replication and snapshots won't satisfy requirements for long-term data retention. The good news, once more, is that the economics of cloud storage also make backups more affordable. The difference is that, unlike replication and snapshots, backups are typically not bundled in as part of the core database service and often must be ordered separately.
Naturally, there is a huge legacy market for data center backup and recovery solutions that could extend to covering data stored in the cloud, such as Commvault, Veritas, Cohesity, Veeam, and others. There is also a plethora of individual and department-level tools geared toward backing up on-premises data to the cloud.
But enterprise backup and recovery delivered as SaaS is still emerging. AWS and Microsoft Azure offer their own automated backup services. There are also a handful of third-party SaaS offerings emerging that use the cloud not just as the storage target, but also as the control plane.
Druva was one of the earliest entrants, providing a SaaS offering that leverages the autoscaling of EC2 instances, uses DynamoDB for storing metadata (which ensures de-duplication), and stores snapshots in S3 for the first 90 days before automatically tiering them to Glacier as they age. There's also Rubrik, which offers backup, instant recovery, archival, search, analytics, compliance, and copy data management services. For auditing access, Rubrik leverages AWS CloudWatch for monitoring activity and CloudTrail for drilling down to event logs of actual access.
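To make the age-based tiering concrete, here is a minimal Python sketch of the rule described above: snapshots younger than 90 days sit in S3, older ones move to Glacier. The function name, the tier labels, and the exact handling of the 90-day boundary are assumptions for illustration only; this is not Druva's code, and in practice such a policy would more likely be expressed as a storage lifecycle rule than as application logic.

```python
from datetime import date

# From the article: snapshots stay in S3 for the first 90 days.
HOT_RETENTION_DAYS = 90


def storage_tier(snapshot_date: date, today: date) -> str:
    """Return the tier a snapshot would sit in, based only on its age.

    Hypothetical helper: the boundary (day 90 itself counts as "aged")
    is an assumption, not something the article specifies.
    """
    age_days = (today - snapshot_date).days
    if age_days < 0:
        raise ValueError("snapshot_date is in the future")
    return "S3" if age_days < HOT_RETENTION_DAYS else "GLACIER"
```

For example, a snapshot taken 10 days ago would map to `"S3"`, while one taken 120 days ago would map to `"GLACIER"`.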
Now Clumio, a company that emerged from stealth a few months ago, has joined the fray. Like Druva, Clumio is built natively on AWS, where the storage control plane is deployed, and takes advantage of recent innovations in containerization and microservices. Like Druva, it uses DynamoDB as the fast lookup for metadata and de-duplication; it also uses an RDS PostgreSQL database for storing the status and configuration of the backup. For handling the data pipeline for ingest and de-duplication, Clumio uses either Lambda functions for simple, short-lived processes that finish in minutes, or containers for larger, more complex jobs spanning hours. It minimizes overhead by dividing backup tasks into small microservices that process in the background.
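The split between short-lived functions and long-running containers can be pictured as a simple dispatch rule. The sketch below is an assumption-laden illustration, not Clumio's scheduler: `choose_runtime` and its inputs are hypothetical, and the only external fact it leans on is that AWS Lambda caps a single invocation at 15 minutes at the time of writing, which makes it a natural cutoff.

```python
# AWS Lambda's maximum timeout per invocation (15 minutes at the time of writing).
LAMBDA_MAX_SECONDS = 15 * 60


def choose_runtime(estimated_seconds: float) -> str:
    """Pick where a backup task would run, given a duration estimate.

    Hypothetical rule: tasks that fit within the Lambda timeout go to a
    function; anything longer goes to a container. How Clumio actually
    decides is not described in the article.
    """
    if estimated_seconds <= 0:
        raise ValueError("estimated_seconds must be positive")
    return "lambda" if estimated_seconds <= LAMBDA_MAX_SECONDS else "container"
```

Under this rule a two-minute de-duplication pass would be routed to `"lambda"`, while a three-hour ingest job would be routed to `"container"`.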
Unlike Druva, Clumio does not rely on snapshots for indexing or retrieving data. Instead, it divides the data into small 16-Kbyte chunks that can be packed into cloud storage. The benefit is that, in place of the time-based indexes or text searches that would come from snapshots, Clumio can index down to the file level.
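A rough Python sketch shows how fixed-size chunking, de-duplication, and a file-level index can fit together. Everything here beyond the 16-Kbyte chunk size is an assumption: the `ChunkStore` class is hypothetical, SHA-256 content hashing is a common de-duplication technique rather than Clumio's confirmed method, and plain dictionaries stand in for the metadata lookup and object storage.

```python
import hashlib

CHUNK_SIZE = 16 * 1024  # 16-Kbyte chunks, as described in the article


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut a byte string into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Toy de-duplicating store with a per-file index (illustrative only)."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes (stand-in for object storage)
        self.catalog = {}  # file path -> ordered list of chunk digests

    def backup_file(self, path: str, data: bytes) -> None:
        recipe = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            # Identical chunks are stored once, whichever file they came from.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.catalog[path] = recipe

    def restore_file(self, path: str) -> bytes:
        # A single file is rebuilt from its own recipe, with no snapshot needed.
        return b"".join(self.chunks[d] for d in self.catalog[path])
```

Because the catalog is keyed by file path, a single file can be located and restored on its own, and two files that share content contribute only one stored copy of each shared chunk.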
For now, Clumio's backup service requires that data be virtualized into pools of storage, which could be data sitting in cloud storage services such as Amazon Elastic Block Store (EBS) or in VMware environments. Therefore, it does not currently compete with or match the database-specific backup services that are bundled in or offered as options with DBaaS offerings such as Amazon RDS, Microsoft Azure SQL Database, or Google Cloud SQL. Clumio has not ruled out developing a database backup service in the future.
On the horizon, Clumio could also eventually develop services that scan for PII or for anomalies, such as sudden changes that might be caused by malware. It could conceivably extend the footprint of its solution on-premises through hybrid cloud offerings like AWS Outposts. Leveraging the native services in AWS, from its databases to cloud monitoring tools to serverless functions or container orchestration, makes this extensibility possible.
Like those of its rivals, Clumio's control plane sits in one cloud. But we expect that Clumio will soon start orchestrating the backup and storage of data sitting in other clouds. Admittedly, that would involve the overhead of communication between the control plane in AWS and data flowing into the other cloud(s). So, the service could become more multi-cloud, but it would not have the efficiency of a local control plane.
For now, making cloud backup SaaS services cloud-native has come at the cost of cloud independence, and a truly multi-cloud or cloud-agnostic solution has yet to emerge.