Storage 2022: Active archiving, ML-enabled volumes on the rise

Unmistakable trends: Tape as cloud, active archiving, and sustainability metrics will play major roles in the data storage of the future.
Written by Chris Preimesberger, Contributor

Add another "certainty in life" to the conventional death and taxes: the continued growth of structured and unstructured data in clouds, data centers, and personal devices. 

With the streams of files and data emanating from sensors, cameras, connected machines, and people — and with the world's population continuing to increase at an average of 81 million individuals per year — the storage business will never lack for customers. 

But storage prices are no longer falling with volume discounts the way they did during the early cloud era a decade or so ago.

So people are seeking alternatives to the rising cost of cloud storage and its associated fees for egress and data protection. One of them is active archiving, in which data initially stored in one or more clouds is automatically shipped to cold-storage tape archives after a set period of time. Digital tape storage is far less expensive than any cloud system, and because cartridges can be kept offline, its security is mostly airtight. New and more efficient connections between cloud and digital tape mark a significant trend for the coming year.
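The hand-off at the heart of active archiving is an age-based placement rule. The Python sketch below is a minimal illustration, assuming a hypothetical 90-day window, two tiers named "cloud" and "tape", and made-up names (StoredObject, apply_archive_policy). A real active archive would also run the copy job and keep each object indexed and retrievable after it moves.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention window: objects older than this move to tape.
ARCHIVE_AFTER = timedelta(days=90)


@dataclass
class StoredObject:
    key: str
    created: datetime
    tier: str = "cloud"  # hypothetical tier label


def apply_archive_policy(objects, now, archive_after=ARCHIVE_AFTER):
    """Mark cloud-tier objects older than archive_after as moved to tape.

    Returns the keys moved in this pass. Only the placement decision is
    modeled; the actual data copy is out of scope for this sketch.
    """
    moved = []
    for obj in objects:
        if obj.tier == "cloud" and now - obj.created >= archive_after:
            obj.tier = "tape"
            moved.append(obj.key)
    return moved


now = datetime(2022, 1, 1)
objs = [
    StoredObject("sensor-log-a", now - timedelta(days=120)),
    StoredObject("camera-clip-b", now - timedelta(days=10)),
]
print(apply_archive_policy(objs, now))  # ['sensor-log-a']
```

Running the policy on a schedule (say, nightly) gives the "after a set period of time" behavior; a second pass over the same objects moves nothing, because already-archived objects are skipped.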

With this in mind, here are storage industry leaders' predictions about the state of storage in 2022:

Why on-prem storage will become more strategic as data volumes multiply

As data grows in both size and importance, on-premises storage use will expand in parallel, becoming indispensable infrastructure for reasons including security, performance, regulation, cost, and latency. On-premises storage will serve all these critical needs, while cold and warm data move to the cloud.

We will see continuous progress and innovation in on-prem computing and storage, as well as at the edge, all driven by the demands of 5G base stations and autonomous driving, and by the associated costs. It will be impossible to store all this data in the cloud. – Dr. Hao Zhong, Co-founder and CEO, ScaleFlux

Artificial intelligence will be a game-changer for unstructured data storage and disposition

Neither data crawler technology nor AI is new. But both are getting extremely fast and, equally if not more importantly, accurate. They can enable organizations to identify optimal storage classes within hours of a creation event, rather than days or weeks.

The integration of information governance (InfoGov) with storage media will become apparent as the classification of storage changes: what can be determined after data is created, and what must be determined before it is created. This may not reshape how InfoGov principles are articulated, but it can immediately change the Information Governance Maturity Model (IGMM), as AI alters how storage and disposition are handled. Plus, with 45TB of storage space (LTO-9) and 3.5TB/hour transfer speeds, large, cyber-nervous storage users may increasingly opt to put active archive data on tape rather than in the cloud. – Brendan Sullivan, Founder and CEO, SullivanStrickler
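One way to put those figures in perspective is to work out how long a single cartridge takes to fill and how many cartridges a dataset needs. The sketch below simply reuses the quoted numbers; note that 45TB is LTO-9's compressed capacity (18TB native), so real results depend on how well the data compresses and on the drive sustaining full streaming speed. The function names and the 1PB example are illustrative.

```python
import math

# Figures as quoted above; both assume compressible data.
CARTRIDGE_TB = 45.0
RATE_TB_PER_HOUR = 3.5


def hours_to_fill(capacity_tb: float = CARTRIDGE_TB,
                  rate_tb_per_hour: float = RATE_TB_PER_HOUR) -> float:
    """Time to write one full cartridge at a sustained transfer rate."""
    return capacity_tb / rate_tb_per_hour


def cartridges_needed(dataset_tb: float, capacity_tb: float = CARTRIDGE_TB) -> int:
    """Whole cartridges required to hold a dataset (no partial cartridges)."""
    return math.ceil(dataset_tb / capacity_tb)


print(f"{hours_to_fill():.1f} hours per cartridge")  # 12.9 hours per cartridge
print(cartridges_needed(1000.0))                     # 23 (for 1PB)
```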

Enterprises will use a new class of cloud storage: Tape as Object Storage, or Tape as Cloud

In 2022, more organizations will roll out a new cloud class known as "Tape as Object Storage" or "Tape as Cloud." Tape as Cloud allows organizations to archive data to a remote cloud storage provider via a cloud API protocol, such as S3. Data is written to tape remotely, and that media is periodically removed and stored offline as a last-resort disaster recovery copy.

Tape as Cloud is very economical and can be used as part of a multi-cloud solution, in which organizations send data to two or more cloud storage providers, or as part of a hybrid cloud solution, in which an on-premises cloud storage solution is used with remote Tape as Cloud. – Dave Thomson, Senior VP Sales and Marketing, QStar Technologies

Sustainability metrics will drive new requirements in data storage

In 2020 and 2021, we saw CPU utilization and network bandwidth in the data center rise in response to a largely remote workforce. This increased consumption comes at a cost: data centers contribute hundreds of millions of tons of CO2e emissions globally each year, and the IT industry is under scrutiny for this contribution.

In 2022, organizations that provide and use data center services will be challenged to show measurable progress on sustainability. Renewable energy production and usage is just a start. Organizations will also be looking at how products support sustainability outcomes. Tape will lead here, providing data storage at scale that reduces CO2e emissions by up to 90% compared with hard disks and flash. – Kiyoshi Urabe, Manager, IBM Tape Product Management

AI/MLOps will become standard in enterprise and mid-range storage 

For several years, the data storage industry has recognized a need for increased automation in storage systems management. This need is amplified by data growth and by predicted shortages in skilled human resources needed to manage these mountains of data. Industry reports have predicted that storage administrators will have to manage 50 times more data in the next decade – but with only a 1.5X increase in the number of skilled personnel. 

AI/MLOps (artificial intelligence and machine-learning operations) will increasingly be integrated into large-scale data storage offerings to help administrators offload and automate processes, and to find and reduce waste to increase overall storage management efficiency. MLOps can monitor and provide predictive analytics on areas that are commonly managed by hand, such as capacity utilization, pending component failures, and storage inefficiencies. These innovations wouldn't be possible without ML techniques and their ability to consume and train on extremely granular system logs and event data during real-time operations. – Paul Speciale, CMO, Scality
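Capacity forecasting is the simplest of the predictive tasks mentioned above. As a minimal sketch, assuming one used-capacity sample per day, the code below fits a least-squares linear trend and extrapolates to the day the volume fills. It is a stand-in for illustration, not the ML models storage vendors actually ship, and the function name is hypothetical.

```python
def days_until_full(usage_tb, capacity_tb):
    """Fit usage = intercept + slope * day by least squares, then extrapolate.

    usage_tb: used-capacity samples, one per day, oldest first.
    Returns days after the last sample until the fitted line reaches
    capacity_tb, or None if the trend is flat or shrinking.
    """
    n = len(usage_tb)
    if n < 2:
        raise ValueError("need at least two samples")
    days = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(usage_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, usage_tb))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no growth, so no projected fill date
    intercept = mean_y - slope * mean_x
    day_full = (capacity_tb - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Five days of samples growing 2TB/day on a 200TB volume.
print(days_until_full([100, 102, 104, 106, 108], capacity_tb=200))  # 46.0
```

A production system would feed far richer signals (the granular logs and events described above) into its models; a straight-line fit only captures steady growth.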

Cold storage heats up

As IT budgets lag data growth rates, pressure builds for creative ways to cost-effectively store, manage, and extract value from more of this data. Emerging architectures and services blur the lines between cold and warm data, with high-performance access and simpler cost models allowing for more effective use of cold data sets. 

Solutions will be deployed within an organization's own data center or colocation facility to maintain data within in-house security parameters and to meet data residency requirements. New erasure coding algorithms optimized specifically for cold storage will enhance data protection and durability for long-term retention, while reducing storage costs significantly versus multi-copy and cloud-based solutions. – Tim Sherbak, Enterprise Products and Solutions Manager, Quantum
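The savings of erasure coding over multi-copy storage come from raw-capacity overhead: k data shards plus m parity shards need (k + m)/k times the usable capacity and survive the loss of any m shards, while n full copies need n times the usable capacity and survive n - 1 losses. The sketch below compares the two using a hypothetical 10+4 layout and three-way replication. It models generic Reed-Solomon-style coding, not the cold-storage-optimized algorithms described in the quote.

```python
def replication_raw_tb(usable_tb: float, copies: int) -> float:
    """Raw capacity for n full copies; tolerates copies - 1 losses."""
    return usable_tb * copies


def erasure_coded_raw_tb(usable_tb: float, data_shards: int, parity_shards: int) -> float:
    """Raw capacity for a k+m erasure-coded layout; tolerates m lost shards."""
    return usable_tb * (data_shards + parity_shards) / data_shards


usable = 1000.0  # 1PB of archive data to protect (illustrative)
print(replication_raw_tb(usable, copies=3))                           # 3000.0
print(erasure_coded_raw_tb(usable, data_shards=10, parity_shards=4))  # 1400.0
```

In this example, the erasure-coded layout stores the same data in less than half the raw capacity while tolerating more simultaneous failures (four shards versus two copies).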

Data tape libraries won't be going away anytime soon

Organizations with large volumes of unstructured data will continue to find that a tape-based active archive is more cost-effective than a public cloud service. Data tape libraries offer a low TCO (total cost of ownership) because of the low cost per TB of the cartridges themselves and the low power requirements of the systems, and they remain an important element, alongside disk and management software, of high-capacity active archive systems.

For users who need remote access to on-premises active archives, solutions that offer an object storage interface will gain traction because they allow the archive to be securely shared by remote users and other facilities. Unlike most public cloud services, tape-based active archives avoid unpredictable and costly egress fees. – Philip Storey, CEO, XenData

Predictive ML comes into the database 

A big theme of 2022 in database management will be enterprises finding new machine learning-powered paths to optimization. It's an important trend that should help enterprises burst through traditional limitations set by inflexible data design and data usage trends humans can't foresee. Database admins, once saddled with the unenviable task of producing optimized and performant queries based on imperfect knowledge, will get welcome relief from ML solutions that can intuit where data resides using reliably predictive models. 

I also expect this capability will go further, with ML creating entirely optimized data indexes and automatically handling reindexing and storage management. Whereas AIOps (similarly ML-powered solutions for operations and predictive maintenance) shows some signs of sputtering as a much-anticipated technology, predictive database management should have a brighter future as a crucial component of any database operations strategy once its training sets are appropriately refined. – Anil Inamdar, VP & Global Head of Data, Instaclustr

Increased use of active archives to balance the cost of storage and speed of access

Organizations face extraordinary data growth that creates a need to balance the cost of storage against the speed of access: that is, which data should be stored on which storage medium, and when. Cloud is changing not only how organizations store their data but how they use it.

The question isn't whether to use the cloud, but which data needs to be in the cloud, on-prem, or both, and when. Workflows are getting more complex, and applications need to integrate seamlessly regardless of location. Active archive solutions address this by using a more cost-effective storage tier, keeping data available and searchable, and combining cloud and on-prem resources in a unified platform. – Betsy Doughty, VP of Marketing, Spectra Logic

Health-care knowledge and reliance will expand in active archives

Health-care knowledge and reliance will expand in the active archive space as more organizations look to mine data to improve patient safety, treatments, and outcomes. Data on COVID treatments and diagnoses held in health organizations' active archives will become extremely valuable for insurance and CMS audits of pandemic-related payments. – Dr. Kel Pults, Chief Clinical Officer, MediQuant

Archive revolution will continue with greater technology choice than ever before

As predicted, 2021 saw the introduction of innovative HDD technologies that continue to push the boundaries of capacity, performance, and reliability. We expect the roughly 30% CAGR in data generated to continue in 2022, fueling the ongoing need for long-term retention and for the protection of valuable unstructured data assets.
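To see what a sustained 30% CAGR implies, compound it: volume after t years is V0 × (1 + r)^t, and the doubling time is ln 2 / ln(1 + r). The sketch below applies those standard formulas; the 30% rate comes from the quote, while the normalized starting volume of 1.0 and the five-year horizon are illustrative.

```python
import math


def project_volume(current: float, cagr: float, years: float) -> float:
    """Data volume after `years` of compound growth at rate `cagr`."""
    return current * (1 + cagr) ** years


def doubling_time_years(cagr: float) -> float:
    """Years for the volume to double at a constant compound rate."""
    return math.log(2) / math.log(1 + cagr)


print(f"{project_volume(1.0, 0.30, 5):.2f}x after 5 years")  # 3.71x after 5 years
print(f"doubles every {doubling_time_years(0.30):.2f} years")  # doubles every 2.64 years
```

In other words, at that rate an archive sized for today's data is outgrown roughly every two and a half years.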

Resource disaggregation and composability will continue to proliferate, along with new standards and methods to make better use of CPU, GPU, memory, storage, and networking resources. With these technology trends in play, large, long-term active archive solutions will take advantage of the low TCO of consolidated disk resources, as well as of tape, given its long-standing role in this space. – Mark Pastor, Director, Platform Product Management, Western Digital
