Last week, in her blog post titled “Software Deduplication: When is Free Not Free?”, ESG analyst Lauren Whitehouse made several good points about the hidden costs of “no cost” dedupe in backup applications. Over the last year or two, we’ve seen several backup vendors jump in with their own built-in dedupe features. Whitehouse is spot on when she notes that for existing users, adding dedupe often comes at a cost.
At i365, we’ve offered our own built-in dedupe feature for more than a decade, for both on-premises software and Cloud backup customers. It has always been a standard, core feature of our software and service, with no extra or hidden charge to use it. So in our case free really means free.
For buyers looking for a backup software solution with built-in deduplication, it’s important that you ask lots of questions and know what you’re getting. Our customers ask a lot of great questions, and one way we help them understand is to first explain the two main types of deduplication they are likely to see for disk-based backup and recovery:
- Source-side deduplication: This method compares today’s planned backup data against the last backup job’s data at the source. Since much data remains the same from backup to backup, source-side deduplication finds, transmits and stores only the new blocks or differences (deltas) found since the last backup.
- Target-side deduplication: This typically happens on the backend (on the media server or “vault”). It compares data within a defined group, either file against file or block against block. If two or more duplicate files or blocks are found, the solution retains a single copy and keeps two (or more) pointers that reference it.
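To make the two approaches concrete, here is a minimal Python sketch. It is an illustration only, not a description of how i365 or any other vendor implements dedupe: it assumes fixed-size blocks, SHA-256 fingerprints and a tiny block size, and every name in it (`source_side_delta`, `Vault`, `BLOCK_SIZE`) is hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: unrealistically small, chosen so the demo is readable


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Identify a block by its SHA-256 digest."""
    return hashlib.sha256(block).hexdigest()


def source_side_delta(previous: bytes, current: bytes) -> dict[int, bytes]:
    """Source side: return only the blocks that are new or changed since
    the last backup, keyed by block position. Only these would be sent."""
    prev_hashes = [fingerprint(b) for b in split_blocks(previous)]
    delta = {}
    for i, block in enumerate(split_blocks(current)):
        if i >= len(prev_hashes) or fingerprint(block) != prev_hashes[i]:
            delta[i] = block
    return delta


class Vault:
    """Target side: store each unique block once; files are lists of pointers."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}      # fingerprint -> block, stored once
        self.files: dict[str, list[str]] = {}   # file name -> ordered fingerprints

    def store(self, name: str, data: bytes) -> None:
        pointers = []
        for block in split_blocks(data):
            fp = fingerprint(block)
            self.blocks.setdefault(fp, block)   # keep only the first copy
            pointers.append(fp)
        self.files[name] = pointers

    def restore(self, name: str) -> bytes:
        """Rebuild a file by following its pointers back to the stored blocks."""
        return b"".join(self.blocks[fp] for fp in self.files[name])


# Source side: two of four blocks differ from the last backup.
delta = source_side_delta(b"AAAABBBBCCCC", b"AAAAXXXXCCCCDDDD")

# Target side: two files share the same two blocks in a different order.
vault = Vault()
vault.store("a.txt", b"AAAABBBB")
vault.store("b.txt", b"BBBBAAAA")
```

In this toy run the source side would transmit only blocks 1 and 3, and the vault holds two unique blocks for four blocks’ worth of file data. The restore step also shows why recovery time is worth asking about: every restore has to follow the pointers and reassemble the data.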
As a general guide, if you’re trying to evaluate the merits of deduplication technology for backup and recovery, you should look at how well the solution can successfully balance the following three goals:
- Significantly decrease the storage footprint or capacity required for backup data
- Minimize utilization/impact on your network and source systems
- Keep the investment cost reasonable for the benefits gained now and in the future
The following set of questions may help focus further inquiry into built-in deduplication:
- When and where is deduplication applied in the backup process?
- If my data is growing but my WAN bandwidth is limited, how will the deduplication process help me get my data offsite most cost-effectively?
- How will deduplication impact my recovery time objectives?
- How will deduplication affect my backup window or any need to perform “hot backups” of key applications like Microsoft Exchange?
- What will the deduplication components and any additional required hardware cost, and what is my expected ROI?
- If the solution is designed to offload deduplicated backup data to physical tape, what are the steps necessary to restore and how long could it take?
- If I wanted to perform backup and deduplication at various remote offices, do I need specialized hardware or media servers at each location?
- How does the unit of deduplication (files, bytes or blocks) impact backups or, more often, restores?
- What amount of processing power (memory, CPU, etc.) is needed on the source and/or target to perform the deduplication process?
So when looking to reduce your backup data, don’t get duped. Make sure you find a vendor who won’t increase your costs to dedupe…
Posted by Brandon Farris
