Part of a ZDNet Special Feature: The Evolution of Enterprise Storage

Best practices for corporate data retention

Setting a clear policy for electronic records management is critical for day-to-day operations, as well as for complying with record-keeping laws in the event of litigation.

The ease and convenience of electronic communication have substantially increased the number of documents produced in day-to-day work that must be retained for normal business needs. More regulations -- which are applied differently across industries and jurisdictions -- result in more data that must be retained for legal compliance.

These are some best practices for electronic data retention to observe across the board, as well as key points that highlight the requirements for specific industries.

Determining the information lifecycle for your organization

The first consideration in determining the lifecycle of information generated in your organization is the minimum legal requirement for retaining that information. Information that may be considered old or irrelevant for internal use may need to be retained to comply with the laws in your jurisdiction or industry.
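As a minimal sketch of how a retention schedule might be encoded, the following computes the earliest date on which a record could be considered for disposal. The record types and retention periods here are hypothetical placeholders, not legal requirements; the real values depend on your jurisdiction and industry.

```python
from datetime import date

# Hypothetical minimum retention periods, in years. These numbers are
# placeholders for illustration only, not legal guidance.
RETENTION_YEARS = {
    "invoice": 7,
    "employment_record": 4,
    "email": 3,
}


def earliest_disposal_date(created: date, record_type: str) -> date:
    """Return the first date on which the record's retention period has elapsed.

    An unknown record type raises KeyError on purpose: a record with no
    defined policy should be retained, not silently disposed of.
    """
    years = RETENTION_YEARS[record_type]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # The record was created on Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)


def may_dispose(created: date, record_type: str, today: date) -> bool:
    """True once the minimum retention period has passed."""
    return today >= earliest_disposal_date(created, record_type)


print(earliest_disposal_date(date(2015, 3, 1), "invoice"))  # 2022-03-01
```

A check like this only covers the minimum legal period; records needed for internal use or subject to a litigation hold would need to be kept longer.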

In the US, navigating these laws is a rather labyrinthine process. Federal laws vary widely by industry and by the type of information involved -- there is no universal rule that can be applied to everything. For instance, in the medical field, retention requirements for medical research data differ depending on whether a study received public funding, whereas HIPAA's patient privacy rules set no minimum retention requirement; HIPAA defers to the requirements set forth in each state.

In Europe, MoReq2010 (Modular Requirements for Records Systems) is generally held to be the standard for electronic records management, although it does not have any legal force behind it. However, following MoReq2010 would very likely be considered a defensible strategy in the event of an inquiry into record-keeping practices.

Around the world, financial institutions are subject to the data retention and reporting requirements of the Foreign Account Tax Compliance Act (FATCA), a US law that requires financial institutions in other countries to disclose "financial accounts held by U.S. taxpayers or foreign entities in which U.S. taxpayers hold a substantial ownership interest." FATCA has been roundly criticized by The Economist, and various organizations such as American Citizens Abroad have called for a repeal of the law, although document retention and production are still required for any business conducted with US citizens.

Software solutions for information lifecycle management

With the exception of industries that have additional retention responsibilities (such as the medical field), the rate at which data is created in small businesses is likely low enough that lifecycle management can be handled by in-house IT without a particular need for a dedicated external software package. Redundancies and backups should be in place, as is the case with any piece of important data.

For midsize and larger organizations that produce greater quantities of data, the use of qualified information lifecycle management software is the safest bet for regulatory compliance.

  • IBM's System Storage Archive Manager is a popular software package for information lifecycle management. Among other features, it has a deduplication function to reduce storage capacity requirements, and multi-tiered storage management.
  • Oracle's Information Lifecycle Management suite is worth consideration for organizations already in the Oracle ecosystem. It also has extensive automated compression utilities and storage tier creation and management.

Considering the costs of compliance

Practically speaking, the cost of data storage is lower than the cost of non-compliance. Google's new Cloud Nearline storage product is priced at $0.01 per GB per month. Although retrieval is slower than with standard storage, the availability time is generally a few seconds, which is much faster than the comparably priced Amazon Glacier storage service, where availability time is measured in hours.
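To put that price in perspective, here is a rough back-of-the-envelope calculation. The $0.01 per GB per month figure is the one quoted above; the 10 TB archive size is a hypothetical example, and the sketch ignores any retrieval, network, or operation fees a provider may charge.

```python
from decimal import Decimal

# Price quoted in the article for Google Cloud Nearline.
NEARLINE_PRICE_PER_GB_MONTH = Decimal("0.01")


def storage_cost(gigabytes: int, months: int = 1,
                 price_per_gb_month: Decimal = NEARLINE_PRICE_PER_GB_MONTH) -> Decimal:
    """Storage-only cost in dollars; retrieval and other fees are not modeled."""
    return Decimal(gigabytes) * price_per_gb_month * months


archive_gb = 10_000  # hypothetical archive of 10 TB (decimal units)

print(storage_cost(archive_gb))      # 100.00 dollars per month
print(storage_cost(archive_gb, 12))  # 1200.00 dollars per year
```

Decimal is used rather than float so that the currency arithmetic stays exact.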

For long-term storage, compression is also of interest as a way to reduce capacity requirements. LZMA compression, as found in the popular 7-Zip program (and also widely supported by other utilities and available in the LZMA SDK), can reduce the file size of text-based records such as databases by up to 85 percent in certain cases, though a reduction of 60 to 70 percent is typical. Although the decompression time for LZMA is minimal, compression can take hours depending on the size of the data and the strength of the hardware used. Presently, bzip2 has the best balance of compression ratio and processing time, although the resulting files will be somewhat larger than with LZMA.
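One way to check these trade-offs against your own records is to measure them directly. The sketch below uses Python's standard lzma and bz2 modules with their default settings; the sample data is synthetic, CSV-like text invented for illustration, so the ratios and timings it produces will not match real archives and should be re-measured on representative files.

```python
import bz2
import lzma
import time


def compare_compressors(data: bytes) -> dict:
    """Compress data with LZMA and bzip2, reporting size reduction and timing."""
    results = {}
    for name, module in (("lzma", lzma), ("bz2", bz2)):
        start = time.perf_counter()
        compressed = module.compress(data)
        elapsed = time.perf_counter() - start
        # Confirm the data survives a round trip before trusting the numbers.
        if module.decompress(compressed) != data:
            raise ValueError(f"{name} round trip failed")
        results[name] = {
            "compressed_bytes": len(compressed),
            "reduction_percent": 100 * (1 - len(compressed) / len(data)),
            "seconds": elapsed,
        }
    return results


# Synthetic, repetitive text records (hypothetical data for illustration).
sample = "\n".join(
    f"{i},customer-{i % 500},invoice,{(i * 37) % 10000 / 100:.2f}"
    for i in range(20000)
).encode("utf-8")

for name, stats in compare_compressors(sample).items():
    print(f"{name}: {stats['reduction_percent']:.1f}% smaller "
          f"in {stats['seconds']:.3f} s")
```

Running the same comparison on a sample of actual records gives a more reliable basis for choosing a format than general figures.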

Summary

Ultimately, for circumstances where non-compliance can result in fines and civil liabilities, it is safer and more cost-effective to retain information than to delete it. The continually plummeting prices for disk and cloud storage make retention a much less burdensome task than managing a mountain of file cabinets.
