On-site, cloud, or what? Where data should reside in hybrid environments

Advice on where data goes as the enterprise cloudifies.

As we've seen in recent years, the all-on-premises system is going the way of the telephone booth: such systems will be few and far between. At the same time, not everything is rushing to the cloud all at once, either. Rather, there will be a great in-between space for a long time to come, with most enterprises somewhere along a spectrum between on-premises and cloud.

Photo: Joe McKendrick

That's all good, but where should the data land? Presumably, some of it will go to the cloud, and some of it will remain on-premises. This means some hard decisions need to be made, along with rethinking the way we handle data. It also opens up more opportunities to extend and enrich applications beyond what was possible before.

As Marcio Moura and Christine Ouyang, both with IBM, explain in a white paper published by the Cloud Standards Customer Council, "a hybrid cloud allows different personas to work with data and analytics capabilities where it makes the most sense and helps to define the requirements where the data and analytics capabilities should be placed in the hybrid cloud environment. As a result, analytics workloads can run more efficiently wherever the data is stored."

The data challenge associated with hybrid cloud is top of mind for Dave Nielsen, head of ecosystem programs for Redis Labs. He says the growing reliance on cloud-based services means designing new types of data management systems, and that where you keep your data along the on-premises-to-cloud spectrum is important. I recently caught up with Nielsen, who states that "one of the challenges in the use of cloud services is designing the data management system. As organizations begin to implement cloud solutions alongside their existing ones, designing an effective data management system across cloud environments is especially important. Data must be maintained and distributed in any format, anywhere - on-premises, in the cloud, multi-cloud, and hybrid-cloud, in an autonomous manner."

How do managers determine what applications and data may still be best suited for on-premises database systems versus cloud? "Data can be complicated. Some data can be exposed publicly, and some data cannot risk even a hint of exposure," Nielsen states. For highly sensitive data, "an air-gapped, on-premises system may provide better security." Sensitive data, "which may be collected from multiple sources, can often be stored on-premises for safe keeping especially when the data is used by only a few internal users, for use cases like research."

Speed of access is another factor in where data should reside. Local in-memory systems provide almost instantaneous access. There may simply be too much data on-premises to attempt a cloud move, Nielsen adds. "Large collections of data can take hours, even days, to move from one data center to the cloud. Applications working with large data on-premises should be located near the data where it can better interact with the data. This is especially relevant to modern applications that rely on real-time user data interaction, high-speed analytics, personalization, IoT or recommendations."
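To put the "hours, even days" point in rough numbers, here is a back-of-the-envelope sketch in Python. The function name, the data sizes, the link speeds and the 80% link-efficiency figure are all illustrative assumptions, not numbers from Nielsen or Redis Labs. Real transfers also depend on protocol overhead, contention and storage throughput.

```python
def transfer_hours(data_terabytes: float, link_gbps: float,
                   efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move a dataset over a network link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable; 0.8 is a placeholder, not a measured value.
    """
    if data_terabytes < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")
    total_bits = data_terabytes * 1e12 * 8          # decimal terabytes to bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return total_bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical scenarios: (dataset size in TB, link speed in Gbps)
    for size_tb, gbps in [(10, 10), (100, 1)]:
        hours = transfer_hours(size_tb, gbps)
        print(f"{size_tb} TB over {gbps} Gbps: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Under these assumptions, 10 TB over a 10 Gbps link takes under three hours, while 100 TB over a 1 Gbps link takes more than eleven days, which is why large on-premises collections tend to stay where they are.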

Often, "data gravity is driving decisions for where data is stored," Nielsen explains. "Wherever data begins to collect, more data will naturally begin to collect around that data, especially when the combination of data makes it more valuable." Cloud storage may be most practical "where data is naturally exposed to multiple users across regions, such as collaboration tools," he continues.
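The factors Nielsen describes (sensitivity, data volume, and how widely data is shared) can be sketched as a simple rule of thumb. The following Python is only an illustration of that reasoning: the `Dataset` fields, the rule ordering and the 50 TB threshold are hypothetical choices made for this example, not a method proposed by Nielsen.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    highly_sensitive: bool      # cannot risk even a hint of exposure
    size_terabytes: float
    multi_region_users: bool    # shared by many users across regions
    current_location: str       # "on-premises" or "cloud"


def suggest_placement(ds: Dataset, large_tb: float = 50.0) -> str:
    """Return a rough placement suggestion; large_tb is an assumed threshold."""
    if ds.highly_sensitive:
        return "on-premises"            # security outweighs other factors
    if ds.size_terabytes >= large_tb:
        return ds.current_location      # data gravity: too big to move cheaply
    if ds.multi_region_users:
        return "cloud"                  # naturally exposed to many users
    return ds.current_location          # no strong reason to move


if __name__ == "__main__":
    research = Dataset("research-records", True, 2.0, False, "on-premises")
    collab = Dataset("shared-docs", False, 1.0, True, "on-premises")
    print(research.name, "->", suggest_placement(research))
    print(collab.name, "->", suggest_placement(collab))
```

A real decision would weigh cost, compliance and latency requirements as well; the point is only that sensitivity and sheer volume tend to override convenience.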

In addition to data security, the primary challenge with supporting hybrid data environments is platform consistency, Nielsen cautions. "Data located in the cloud is often stored in databases that run on platforms that are different than ones found on-premises. Maintaining these heterogeneous platforms can be difficult. Integration can also be complex, not to mention hybrid security models. Platform consistency can help. Perhaps Kubernetes will become the Linux-of-the-cloud-platforms, but it seems we are a ways from that. Choosing a database system that was built to run both on-premises and in the cloud can help reduce platform consistency and integration friction."