Deduplication will exist everywhere


Most customers were just starting to get their arms around all the different deduplication approaches available in disk appliances and VTLs from vendors when backup software vendors and even non-storage related vendors began announcing deduplication capabilities.

We all know the appliance and VTL vendors offering dedupe, including COPAN Systems, Data Domain, EMC, Exagrid, FalconStor, HP, IBM (Diligent), NEC, NetApp, Quantum, Sepaton, Sun StorageTek, and others.

And there were existing backup software vendors, including EMC Avamar, Symantec NetBackup PureDisk, and many online backup software vendors, like Asigra. Now add CommVault Simpana 8.0 and IBM Tivoli Storage Manager (TSM) V6.

But just because deduplication is performed in software doesn't automatically make it source deduplication. With source deduplication, the work is done on the client (the server or desktop/laptop you want to back up) before the data is transmitted over the LAN. Even though they perform deduplication in software, IBM TSM and CommVault Simpana provide target deduplication: the deduplication is not performed on the client, it's performed on the media server, which stores the data in deduplicated form on whatever disk target you have. So source vs. target is not about what does the deduplication but rather where deduplication is performed. You can expect other backup software vendors to add deduplication capabilities in the future.
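To make the source-side distinction concrete, here is a toy sketch of client-side chunk deduplication. All names (`DedupeStore`, `source_dedupe_backup`) are hypothetical, and it uses fixed-size chunking for simplicity; real products typically use variable-size chunking and a more elaborate client/server protocol. The point is that the client hashes its chunks first and only transmits chunks the backup target hasn't already seen:

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks for illustration only


def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeStore:
    """Toy chunk store standing in for the backup target (hypothetical)."""

    def __init__(self):
        self.chunks = {}  # hash -> chunk bytes

    def known(self, digest: str) -> bool:
        return digest in self.chunks

    def put(self, digest: str, data: bytes) -> None:
        self.chunks[digest] = data


def source_dedupe_backup(data: bytes, store: DedupeStore):
    """Client-side dedupe: hash each chunk, send only chunks the store lacks.

    Returns the backup 'recipe' (ordered chunk hashes) and the number of
    chunk bytes that actually had to cross the LAN.
    """
    recipe, sent = [], 0
    for c in chunk(data):
        digest = hashlib.sha256(c).hexdigest()
        if not store.known(digest):  # only unseen chunks are transmitted
            store.put(digest, c)
            sent += len(c)
        recipe.append(digest)
    return recipe, sent
```

Running the same backup twice sends chunk data only the first time; the second pass transmits nothing but metadata, which is why source dedupe is attractive over constrained links.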

Any software vendor that manages content will build in dedupe; Ocarina is a good example. So will software vendors that manage storage capacity, particularly those whose offerings have a built-in volume manager and filesystem. VMware is a good example here: the company introduced deduplication capabilities in vSphere (v4 of Virtual Infrastructure).

Plus, vendors such as NetApp offer deduplication in their production storage systems. NetApp customers have seen good dedupe ratios in virtual environments (server and desktop) and on file shares. EMC has introduced file-level deduplication capabilities in its Celerra offering. Neither vendor charges for dedupe. You can bet more storage vendors will add file-level and eventually block-level deduplication functionality.

And there are completely new entrants like Riverbed. Surprised? It makes sense when you realize that WAN optimization vendors like Riverbed have been deduplicating data all along; that's partly how they're able to reduce bandwidth requirements for workloads like remote backup and replication. They simply "rehydrate" the data before it's written to disk. Now imagine a WAN optimization appliance as a gateway to a NAS system, deduplicating data inline and storing the data in deduplicated form rather than rehydrating it.
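The difference between the two behaviors can be sketched in a few lines. The names (`inline_dedupe_write`, `rehydrate`) are hypothetical, and fixed-size chunking is used for simplicity. A WAN optimizer in its traditional role rehydrates on arrival; the gateway scenario described above simply skips that last step and keeps only the unique chunks plus a recipe:

```python
import hashlib


def inline_dedupe_write(data: bytes, chunk_store: dict, size: int = 4096):
    """Gateway-style inline dedupe: store each unique chunk once,
    return a 'recipe' of chunk hashes describing the original stream."""
    recipe = []
    for i in range(0, len(data), size):
        c = data[i:i + size]
        digest = hashlib.sha256(c).hexdigest()
        chunk_store.setdefault(digest, c)  # unique chunks stored exactly once
        recipe.append(digest)
    return recipe


def rehydrate(recipe, chunk_store: dict) -> bytes:
    """What a WAN optimizer normally does on arrival: rebuild the
    original byte stream from the recipe before writing it to disk."""
    return b"".join(chunk_store[digest] for digest in recipe)
```

Storing the recipe and unique chunks (instead of the rehydrated stream) is exactly what turns a bandwidth-reduction technique into a capacity-reduction one.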

IT professionals need to know that deduplication will be available everywhere in the environment, in software and hardware, and in production environments, not just backup and archiving. There will be pros and cons to each approach and it's likely that you will leverage multiple approaches in your environment.

I also expect that since deduplication exists everywhere and is quickly becoming a standard feature of software and disk systems, vendors soon won't be able to charge a significant premium for it, if any premium at all.

I'm in the process of finishing up a report on the state of deduplication and I'd welcome any comments on the subject. Are you still struggling to decide between approaches? If you've already deployed dedupe, is it living up to the hype? Are you using dedupe to reduce bandwidth requirements between data centers and remote offices?


Talkback

3 comments
  • Global Dedupe

    Please make sure to factor in Global Dedupe as
    well in your report. Without it, you don't have
    a scalable or, in some cases, a redundant
    system.
    unredeemed
  • Datacastle PC Backup & Data Protection

    Datacastle offers five best of breed components in one
    enterprise PC data privacy solution, integrating
    policy-driven PC backup, AES encryption, automated PKI
    key management, data deduplication, proactive data
    deletion and device tracing @ www.datacastlecorp.com
    BartPestarino
    • Data who?

      Stop spamming on a product that is never on any
      analyst's radar nor reviewed in large trade
      publications.

      Best of breed backup has historically been the
      likes of Symantec, EMC, IBM, and CommVault.
      Dedupe is right now run by DataDomain as the
      market leader, but there are some strong
      vendors like NetApp, HP, and Symantec that have
      other value adds.

      Data Castle is still on VC funding! Best of
      breed nothing IMO.
      unredeemed