Data accumulation
Moore's law, named after Intel co-founder Gordon Moore, seems to apply not only to processors but also to the amount of data those processors produce. We try to keep up with this accumulation of data by extending storage capacity. Unfortunately, we don't always succeed, either because storage systems are not limitless or because data grows abruptly, for example after a merger or acquisition.
'Backup windows'
Storing data is only one aspect of data accumulation. Backing up an ever-growing volume of data is even more challenging, not only because of the amount of data but even more so because of the time it takes. A backup is meant to be a snapshot image of all available data, and that data must remain unaltered while the snapshot is taken. So for as long as it takes to write the backup, no work can be done on the data. We call this period of inactivity the 'backup window'. Keeping the backup window as short as possible is a real challenge.
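To make the relation between data volume and backup window concrete, here is a rough back-of-the-envelope sketch. The function name `backup_window_hours`, the 400 MB/s throughput and the data volumes are illustrative assumptions, not measurements; a real backup window also depends on factors such as network load and the backup software used.

```python
def backup_window_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Estimate the hours needed to copy data_tb terabytes
    at a sustained rate of throughput_mb_s megabytes per second."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = data_mb / throughput_mb_s
    return seconds / 3600


# Hypothetical volumes at an assumed sustained throughput of 400 MB/s.
for tb in (10, 20, 40):
    hours = backup_window_hours(tb, 400)
    print(f"{tb} TB at 400 MB/s: about {hours:.1f} hours")
```

At a fixed throughput the window grows in step with the data: doubling the volume doubles the time during which no work can be done.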
De-duplication
Once backups already run at maximum speed, the traditional backup process can only be accelerated further by reducing the amount of data that needs to be backed up. This could be achieved by excluding data from the backup schedule, but that is often not an option. De-duplication offers an efficient alternative.
During de-duplication, algorithms detect repeated data blocks and make sure each one is saved only once. When a block repeats, they store a pointer to the original instead of storing the block again. Depending on the type of data involved, reductions of up to 95% have been reported. Less data to write means a shorter backup window. For de-duplication to be this effective, it needs to be implemented within an appliance: dedicated hardware provides the memory and processor speed needed for the large volume of calculations involved. Quantum has been a pioneer in applying this technology, and its DXi appliances have been shown to achieve very large data reductions on a daily basis.
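The idea of storing each repeated block once and keeping pointers to it can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular appliance works: the fixed 4096-byte block size, the use of SHA-256 digests as pointers, and the names `deduplicate` and `restore` are all chosen for the example. Commercial products may use variable-size chunks and other fingerprinting schemes.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen for illustration only


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks, store each unique block once,
    and record an ordered list of pointers (digests) to the blocks."""
    store = {}     # digest -> block contents, one entry per unique block
    pointers = []  # one digest per original block, in order
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block  # first occurrence: keep the data
        pointers.append(digest)    # every occurrence: keep only a pointer
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[digest] for digest in pointers)


# Hypothetical sample: 19 identical blocks followed by 1 different block.
sample = (b"A" * BLOCK_SIZE) * 19 + (b"B" * BLOCK_SIZE)
store, pointers = deduplicate(sample)
stored_bytes = sum(len(block) for block in store.values())

# The reduction figure ignores the space taken by the pointers themselves.
print(f"{len(pointers)} blocks, {len(store)} unique")
print(f"block data reduced by {1 - stored_bytes / len(sample):.0%}")
```

How much is saved depends entirely on how repetitive the data is; the highly repetitive sample above is constructed to show the effect, not to predict real-world ratios.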
Some recent generations of storage systems ship with firmware that has de-duplication built in. For older systems, separate external de-duplication appliances are preferred.
Moving
At times it may be necessary to de-duplicate only the data being transported or moved. Traffic is then stripped of duplicate data 'on the fly': data the receiving side already holds is replaced by reference pointers, so only new data and pointers cross the link. This technique is especially efficient for office applications, databases and web traffic. Leading vendors offering this technique are Riverbed and Blue Coat.
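The following sketch illustrates the principle of de-duplicating traffic in transit. It is a simplified model under assumptions made for the example, not a description of any vendor's product: the `Sender` and `Receiver` classes, the 1024-byte chunk size, the SHA-256 references and the `wire_size` helper are all hypothetical. Both ends remember the chunks they have already exchanged, so a repeated chunk is sent as a short reference instead of the full data.

```python
import hashlib

CHUNK = 1024  # assumed chunk size, for illustration only


class Sender:
    """Replaces chunks the receiver has already seen with references."""

    def __init__(self):
        self.seen = set()  # digests of chunks sent earlier

    def encode(self, payload: bytes):
        messages = []
        for i in range(0, len(payload), CHUNK):
            chunk = payload[i:i + CHUNK]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.seen:
                messages.append(("ref", digest))  # 32-byte pointer only
            else:
                self.seen.add(digest)
                messages.append(("raw", chunk))   # new data, sent in full
        return messages


class Receiver:
    """Rebuilds the payload from raw chunks and references."""

    def __init__(self):
        self.cache = {}  # digest -> chunk received earlier

    def decode(self, messages) -> bytes:
        parts = []
        for kind, value in messages:
            if kind == "raw":
                self.cache[hashlib.sha256(value).digest()] = value
                parts.append(value)
            else:
                parts.append(self.cache[value])  # look up the referenced chunk
        return b"".join(parts)


def wire_size(messages) -> int:
    """Bytes of chunk data and references sent (message framing ignored)."""
    return sum(len(value) for _, value in messages)


# Hypothetical document of 8 distinct chunks, transferred twice.
document = b"".join(bytes([i]) * CHUNK for i in range(8))
sender, receiver = Sender(), Receiver()

first = sender.encode(document)
second = sender.encode(document)
first_copy = receiver.decode(first)
second_copy = receiver.decode(second)

print("first transfer: ", wire_size(first), "bytes")
print("second transfer:", wire_size(second), "bytes")
```

In this model the second transfer of the same document consists of references only, which is why the approach pays off for traffic that repeats itself, such as a document sent back and forth with small changes.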