Client-Side Deduplication

Deduplication is a technique that stores each unique part of the data only once and reuses it wherever the same data appears again.

The new backup format uses client-side deduplication. This approach brings the following benefits:

  • Client-side deduplication is much faster than server-side deduplication.

  • Deduplication is not affected by internet connection issues.

  • Internet traffic is reduced.

  • A server-side deduplication database grows constantly, which can significantly increase costs. Client-side deduplication uses local resources only.

How It Works

Regardless of the backup type, the first backup is always a full backup. Backups usually run on a regular basis and the source data changes between runs, so subsequent backup jobs are usually incremental and depend on the full backup as well as on the previous incremental backups.

The backup format is designed to keep backup plans fully independent, so each backup plan has its own deduplication database. Moreover, each backup plan generation also has its own deduplication database.
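The isolation described above can be pictured with a minimal sketch. The `DedupRegistry` class, the plan names, and the digest strings are hypothetical and serve only as an illustration; the actual on-disk layout of the deduplication databases is not described here.

```python
from collections import defaultdict


class DedupRegistry:
    """Keeps one independent deduplication database per (plan, generation) pair."""

    def __init__(self):
        # Each key is a (plan_id, generation) tuple; each value is a separate
        # mapping from a block digest to a block ID (hypothetical layout).
        self._databases = defaultdict(dict)

    def database_for(self, plan_id, generation):
        return self._databases[(plan_id, generation)]


registry = DedupRegistry()
registry.database_for("plan-a", 1)["digest-1"] = 0

# A block recorded for one plan and generation is unknown everywhere else.
assert "digest-1" in registry.database_for("plan-a", 1)
assert "digest-1" not in registry.database_for("plan-b", 1)
assert "digest-1" not in registry.database_for("plan-a", 2)
```

Because no database is shared, deleting or rebuilding one plan's records does not affect any other plan or generation.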

When a backup plan runs, the application reads backup data in batches that are multiples of the block size. Each block that is read is compared with the deduplication database records. If the block is not found, it is uploaded to storage and assigned a block ID, which becomes a new deduplication database record. Block scanning then continues, and if a block matches any of the deduplication database records, that block is excluded from the upload.
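The block scan described above can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: blocks are compared by SHA-256 digest, the block ID is simply the position of the block in an in-memory list that stands in for storage, and the 4-byte block size is chosen only to keep the example readable.

```python
import hashlib
import io

BLOCK_SIZE = 4  # hypothetical, tiny block size used for demonstration only


def backup_stream(stream, dedup_db, storage, block_size=BLOCK_SIZE):
    """Read the stream block by block and store only blocks not seen before.

    dedup_db maps a block digest to a block ID (assumed comparison method).
    Returns the ordered list of block IDs that make up the stream.
    """
    layout = []
    while True:
        block = stream.read(block_size)
        if not block:
            break
        digest = hashlib.sha256(block).hexdigest()
        block_id = dedup_db.get(digest)
        if block_id is None:
            # Unknown block: deliver it to storage and record its new ID.
            block_id = len(storage)
            storage.append(block)
            dedup_db[digest] = block_id
        # Known block: nothing is uploaded, only the existing ID is recorded.
        layout.append(block_id)
    return layout


dedup_db, storage = {}, []
layout = backup_stream(io.BytesIO(b"AAAABBBBAAAACCCC"), dedup_db, storage)
print(layout)   # [0, 1, 0, 2]
print(storage)  # [b'AAAA', b'BBBB', b'CCCC']
```

In this run the third block repeats the first one, so four source blocks result in only three stored blocks.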

This approach significantly reduces the backup size, especially in virtual environments that contain a large number of identical blocks.

If a deduplication database is deleted or corrupted, a full backup is forced.
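A minimal sketch of this fallback is shown below. It assumes, purely for illustration, that the deduplication database is a JSON file holding a digest-to-ID mapping; the `load_dedup_db` function and the file format are hypothetical.

```python
import json


def load_dedup_db(path):
    """Return (database, force_full) for a hypothetical JSON-based database.

    A missing, unreadable, or malformed file yields an empty database
    and force_full=True, so the next run starts with a full backup.
    """
    try:
        with open(path, "r", encoding="utf-8") as handle:
            database = json.load(handle)
        if not isinstance(database, dict):
            raise ValueError("unexpected database layout")
        return database, False
    except (OSError, ValueError):
        # OSError covers a deleted or unreadable file;
        # ValueError covers invalid JSON and an unexpected layout.
        return {}, True
```

Starting from an empty database means every block is treated as new, which is exactly what a full backup does.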

Deduplication does not work for some types of files. Archives, some media files, and database files that are treated as changed are not deduplicated.

For the image-based backup type, the approach is slightly different. Instead of reading clusters, the application reads the Master File Table (MFT) and then checks which files have been modified. This significantly reduces the amount of source data that has to be read.
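The idea of consulting file metadata before reading any data can be sketched as follows. The sketch does not parse a real MFT: `FileRecord` is a hypothetical stand-in for an MFT entry, and using the modification timestamp as the change criterion is an assumption, since the exact check is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for a single MFT entry."""
    path: str
    modified_at: float  # seconds since the epoch


def files_to_read(records, last_backup_at):
    """Return paths of files modified after the previous backup."""
    return [r.path for r in records if r.modified_at > last_backup_at]


records = [
    FileRecord("C:/data/report.docx", 1_700_000_500.0),
    FileRecord("C:/data/archive.zip", 1_600_000_000.0),
    FileRecord("C:/data/notes.txt", 1_700_000_900.0),
]
print(files_to_read(records, last_backup_at=1_700_000_000.0))
# ['C:/data/report.docx', 'C:/data/notes.txt']
```

Only the files that pass the metadata check need to be read from disk, which is where the reduction in source reads comes from.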
