If you’re thinking about data protection solutions, you’ve undoubtedly considered the performance optimization benefits offered by deduplication. But are you somewhat baffled by the different approaches on offer? Do you wonder how to compare one vendor’s dedupe technology with another’s effectively?
In general terms, deduplication is the process of identifying duplicate data in an organization’s backup data sets and then removing it or omitting it from storage. This matters whether you’re using on-premise backup software, Cloud-based online backup, or a hybrid backup appliance that combines on-premise and Cloud storage. In each case, deduplication reduces the overall size of the backup data (the backup data “footprint”) that must be transmitted over the LAN or WAN and stored on a secondary disk target, either on-premise or in the Cloud. Duplicate data that is removed or omitted is typically replaced by a pointer to the single remaining copy of the file or data block.
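The pointer mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor’s implementation: it assumes fixed 4 KiB blocks and SHA-256 fingerprints, and the `DedupeStore` class and its methods are hypothetical. Each unique block is stored once, and every backup is recorded as an ordered list of fingerprints that point at those stored blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size in bytes (hypothetical choice)


class DedupeStore:
    """Hypothetical block-level deduplication store (illustrative sketch only)."""

    def __init__(self):
        self.blocks = {}   # SHA-256 hex digest -> the single stored copy of a block
        self.recipes = {}  # backup name -> ordered list of digests (the "pointers")

    def backup(self, name, data):
        """Record `data` as a list of pointers, storing only blocks not seen before.

        Returns the number of new bytes, i.e. the data that would actually
        have to be transmitted and stored for this backup."""
        recipe = []
        new_bytes = 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # first occurrence: keep the data
                new_bytes += len(block)
            recipe.append(digest)  # every occurrence: keep only a pointer
        self.recipes[name] = recipe
        return new_bytes

    def restore(self, name):
        """Rebuild ("rehydrate") a backup by following its pointers."""
        return b"".join(self.blocks[d] for d in self.recipes[name])

    def stored_bytes(self):
        """Total bytes held in the store after deduplication."""
        return sum(len(b) for b in self.blocks.values())


# Usage: two nightly backups of the same 16 KiB data set (four distinct
# 4 KiB blocks); on the second night only the last block has changed.
store = DedupeStore()
monday = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
tuesday = monday[:3 * BLOCK_SIZE] + b"x" * BLOCK_SIZE

print(store.backup("monday", monday))    # 16384
print(store.backup("tuesday", tuesday))  # 4096
print(store.stored_bytes())              # 20480 (instead of 32768 without dedupe)
print(store.restore("tuesday") == tuesday)  # True
```

The value returned by `backup()` approximates how much data actually has to move and be kept; in a source-side design, only those new blocks would need to cross the LAN or WAN.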
However, when comparing different vendor implementations, it’s critical to drill down and evaluate the specific characteristics of each deduplication process. For instance, consider:
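- The type of data deduplication used in the backup process. The most common types are time-based (temporal) deduplication, which removes data that is unchanged between successive backups of the same system, and horizontal deduplication, which removes data that is duplicated across the different systems being protected.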
- Where the deduplication occurs in the backup process. Deduplication typically occurs at the front-end source (via backup software on the protected server or an inline appliance that deduplicates data as it arrives), at the back-end target (post-process deduplication, performed after the data has been written), or in a combination of both (mixed source-based and target-based deduplication).
- The size of data processed for deduplication. Deduplication can occur at the file level, the block level, or the byte level. Finer granularity generally finds more duplicate data but requires more fingerprints to be indexed and compared, so the level chosen can significantly affect the overall optimization and efficiency of the solution.
- The storage location of deduplicated data. Deduplicated data is typically stored in one of three places: on a local disk target, on a remote disk target (either on-premise or in the Cloud), or on tape.
- The impact of deduplication on performance. Watch for performance bottlenecks and for a lack of optimization for LAN or WAN deployments. Some systems are over-dependent on local deduplication hardware. Others slow performance because every backup run must repeatedly check and verify local nodes to confirm that previously deduplicated blocks still exist.
- The cost to deploy and grow the deduplication environment. As your data environment grows, so will your need for additional nodes or controllers. Will your investment in the underlying licensed hardware support your future deduplication needs?
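To illustrate why the size of the deduplicated unit matters, the following Python sketch compares file-level and fixed-size block-level deduplication on a 1 MiB file in which a single byte changes between backups. The function names, the 4 KiB block size, and the synthetic data are assumptions made for illustration; real products typically use more elaborate chunking and indexing.

```python
import hashlib


def sha(data):
    """SHA-256 fingerprint of a byte string."""
    return hashlib.sha256(data).hexdigest()


def file_level_new_bytes(seen, files):
    """Hypothetical file-level dedupe: a file is skipped only if an
    identical file has been seen before; otherwise all of it is stored."""
    new = 0
    for data in files:
        digest = sha(data)
        if digest not in seen:
            seen.add(digest)
            new += len(data)
    return new


def block_level_new_bytes(seen, files, block_size=4096):
    """Hypothetical fixed-size block-level dedupe: each block is
    fingerprinted and checked independently."""
    new = 0
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = sha(block)
            if digest not in seen:
                seen.add(digest)
                new += len(block)
    return new


# A 1 MiB file made of 256 distinct 4 KiB blocks.
original = b"".join(i.to_bytes(4, "big") * 1024 for i in range(256))
changed = bytearray(original)
changed[5000] ^= 0xFF  # flip one byte inside the second 4 KiB block
changed = bytes(changed)

file_seen, block_seen = set(), set()
file_level_new_bytes(file_seen, [original])    # first full backup
block_level_new_bytes(block_seen, [original])  # first full backup

print(file_level_new_bytes(file_seen, [changed]))    # 1048576: whole file again
print(block_level_new_bytes(block_seen, [changed]))  # 4096: one block only
print(len(file_seen), len(block_seen))               # 2 257: index entries kept
```

The trade-off shows up in the last line: block-level deduplication saves far more space here, but it has to track 257 fingerprints instead of 2. Fixed-size blocks also lose effectiveness when bytes are inserted rather than overwritten, because every later block boundary shifts; variable-size (content-defined) chunking is a common way to address that.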
Of course, when you’re mulling over backup and recovery solutions, deduplication is only one piece of the puzzle. You’ll also have to consider how data compression and encryption fit into your overall strategy, so stay tuned. We’ll talk more about those optimization technologies in upcoming posts.
For more tips about choosing the backup and recovery system that best fits your needs, check out our new white paper, “Deduplication and Beyond: Optimizing Performance for Backup and Recovery,” available here.
