With massive amounts of data flowing through organizations and taking up valuable storage space, IT departments often have two choices in dealing with the influx: purchase more storage arrays or eliminate unnecessary data. Adding capacity through investments in storage arrays can quickly become cost-prohibitive. At the same time, deleting data en masse is usually not an option for myriad reasons. The answer? Data deduplication.
What is Data Deduplication?
Data deduplication is an automated process that searches for redundant data, particularly large sequences of it. As data is processed, data deduplication splits it into segments and uniquely identifies each one. Those segments are compared to the data already on the storage arrays. Unique segments are stored. Once a duplicate is found, compared, and confirmed, it is eliminated, which frees up the space previously occupied by an unnecessary, redundant copy. In place of each duplicate sequence, a reference is written that points to the first uniquely stored version of the data. Future matches are not stored either; they too are eliminated and replaced with references.
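The process described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular storage array implements it: the `DedupStore` class, its method names, the fixed 4096-byte segment size, and the use of SHA-256 as the segment identifier are all assumptions made for the example. Real products may use variable-size segments and on-disk indexes.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one copy of each unique segment."""

    def __init__(self, segment_size=4096):
        self.segment_size = segment_size  # assumed fixed-size segmentation
        self.segments = {}  # fingerprint -> segment bytes (stored once)
        self.files = {}     # name -> ordered list of fingerprints (references)

    def write(self, name, data):
        refs = []
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            # Uniquely identify the segment by its content.
            fingerprint = hashlib.sha256(segment).hexdigest()
            existing = self.segments.get(fingerprint)
            if existing is None:
                # Unique segment: store it.
                self.segments[fingerprint] = segment
            elif existing != segment:
                # Confirm the match byte for byte before discarding the copy.
                raise ValueError("fingerprint collision")
            # Either way, the file only holds a reference to the stored copy.
            refs.append(fingerprint)
        self.files[name] = refs

    def read(self, name):
        # Follow the references to rebuild the original data.
        return b"".join(self.segments[fp] for fp in self.files[name])

    def stored_bytes(self):
        return sum(len(segment) for segment in self.segments.values())


store = DedupStore(segment_size=4)
store.write("a", b"AAAABBBBAAAA")  # the second "AAAA" becomes a reference
store.write("b", b"BBBBCCCC")      # "BBBB" is already stored
print(store.read("a"), store.stored_bytes())
```

In this sketch, 20 bytes are written across the two files but only the three unique 4-byte segments are kept, and reading a file back returns the original bytes because the references resolve to the stored copies.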
Common Uses for Data Deduplication
Many operations, such as backups, are highly redundant, making them excellent candidates for data deduplication. For example, imagine a data set being backed up daily for a one-month period. Each day, the backup software makes a copy of the data set. By the end of the month, that's 30 copies of the data set. Though some of the data will have changed since the first day, much of it will be identical from one copy to the next. Using data deduplication to eliminate all those identical copies can free up a great deal of storage capacity.
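To get a feel for the monthly backup example, here is a rough simulation. Every number in it is a made-up assumption chosen for illustration: a data set of 1,000 segments of 1,024 bytes each, 20 segments rewritten per day, and a hypothetical `simulate_backups` function that counts unique segments by SHA-256 hash. Actual change rates vary widely between workloads.

```python
import hashlib


def simulate_backups(days=30, segments_per_backup=1000,
                     changed_per_day=20, segment_size=1024):
    """Return (logical_bytes, stored_bytes) for a run of daily full backups."""
    # Day-one data set: every segment has distinct content.
    dataset = [
        f"segment-{i}-v0".encode().ljust(segment_size, b".")
        for i in range(segments_per_backup)
    ]
    unique = set()   # fingerprints of segments that must actually be stored
    logical = 0      # bytes the backup software writes without deduplication

    for day in range(days):
        if day > 0:
            # Assumed change pattern: a different slice is rewritten each day.
            start = (day * changed_per_day) % segments_per_backup
            for i in range(start, start + changed_per_day):
                idx = i % segments_per_backup
                dataset[idx] = f"segment-{idx}-v{day}".encode().ljust(
                    segment_size, b".")
        # Take the daily full backup.
        for segment in dataset:
            logical += len(segment)
            unique.add(hashlib.sha256(segment).digest())

    stored = len(unique) * segment_size
    return logical, stored


logical, stored = simulate_backups()
print(f"without deduplication: {logical:,} bytes")
print(f"with deduplication:    {stored:,} bytes")
```

Under these assumptions the 30 backups contain 30,000 segments in total, but only 1,580 of them are unique (the original 1,000 plus 20 new ones on each of the 29 later days), so the deduplicated store is a small fraction of the raw total.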
The Benefits of Data Deduplication
Data deduplication can dramatically shrink an organization's storage needs as well as improve bandwidth efficiency. It can also significantly reduce the organization's storage costs. For example, instead of requiring dozens of terabytes of capacity to hold a large backup set, you might need just one terabyte of storage space thanks to data deduplication. Since there's less data to store, there are fewer hard disk drives required to store it all.
Once you do the math and calculate the dollar value of the storage arrays that you won't need, you'll soon discover the economic value of data deduplication. The economic benefits extend to lower cooling costs and reduced physical space requirements, since there are fewer storage arrays to cool and house.
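The math itself is simple. The sketch below uses a hypothetical `dedup_savings` helper and purely illustrative inputs: 24 terabytes of backup data reduced to 1 terabyte, and a placeholder price of 100 per terabyte. Substitute your own capacity figures and hardware pricing.

```python
def dedup_savings(logical_tb, stored_tb, cost_per_tb):
    """Summarize what deduplication saves for the given capacities."""
    saved_tb = logical_tb - stored_tb
    return {
        # How many times smaller the stored data is than the original.
        "ratio": logical_tb / stored_tb,
        # Capacity that no longer has to be purchased.
        "saved_tb": saved_tb,
        "percent_saved": 100 * saved_tb / logical_tb,
        # Hardware spend avoided at the assumed price per terabyte.
        "cost_avoided": saved_tb * cost_per_tb,
    }


# Illustrative numbers only: 24 TB reduced to 1 TB at 100 per TB.
result = dedup_savings(logical_tb=24, stored_tb=1, cost_per_tb=100)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

With these inputs the reduction is 24 to 1, which avoids buying 23 terabytes of capacity. The figure leaves out cooling and floor space, which would add to the savings.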
Data deduplication can also speed up disaster recovery because there is far less data to restore.
Data deduplication makes sense on almost all levels. It frees up storage space without requiring you to delete data you may actually need. The only data it eliminates is redundant and unnecessary. The references placed during data deduplication ensure that the data you need is located and returned when you call it up. Meanwhile, your organization's storage requirements will fall, as will its storage-related costs. Best of all, data deduplication takes place automatically and in the background. End users will not be disrupted by it. In fact, they may even notice an improvement in network operations.
Stephanie is an author and expert in data storage technology. She advises her readers on what data deduplication is and how it benefits an organization, drawing on resources gathered from Tegile (Source: http://www.tegile.com/blog/tegile-zebi-storage-arrays-deduplication-and-performance-for-primary-data-a-balanced-approach-lab-validation-report ). In her free time, she enjoys spending time with her family and traveling around the world.