About variable-length deduplication on NetBackup clients
Currently, NetBackup deduplication uses a fixed-length method: data streams are chunked into fixed-length segments (128 KB) and then processed for deduplication. Fixed-length deduplication is fast and consumes fewer computing resources, and it handles most kinds of data streams efficiently. However, in some cases fixed-length deduplication results in low deduplication ratios.
If your data was modified in a shifting mode, that is, if data was inserted in the middle of a file, then variable-length deduplication enables you to get higher deduplication ratios when you back up the data. Variable-length deduplication reduces backup storage, improves backup performance, and lowers the overall cost of data protection.
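The shifting problem can be illustrated with a small, self-contained sketch. It is not NetBackup code: the helper names (`fixed_segments`, `reused_fraction`) and the test data are assumptions made for illustration, and only the 128 KB segment size comes from the text above. It compares how many fixed-length segments of a second backup are already stored when 10 bytes are overwritten in place versus when 10 bytes are inserted in the middle of the file.

```python
import hashlib
import random

SEGMENT_SIZE = 128 * 1024  # 128 KB fixed segment size, as described above


def fixed_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Cut data into consecutive fixed-length segments (last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def reused_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the new backup's segments that the old backup already stored."""
    stored = {hashlib.sha256(s).digest() for s in fixed_segments(old)}
    new_segments = fixed_segments(new)
    hits = sum(1 for s in new_segments if hashlib.sha256(s).digest() in stored)
    return hits / len(new_segments)


# Illustrative data: 2 MB of pseudo-random bytes (hypothetical file content).
size = 2 * 1024 * 1024
original = random.Random(0).getrandbits(8 * size).to_bytes(size, "big")
middle = size // 2

# Same length: 10 bytes replaced in place, so only one segment changes.
overwritten = original[:middle] + b"X" * 10 + original[middle + 10:]
# Shifting change: 10 bytes inserted, so every later segment boundary moves.
inserted = original[:middle] + b"X" * 10 + original[middle:]

print(f"overwrite: {reused_fraction(original, overwritten):.2%} of segments reused")
print(f"insert:    {reused_fraction(original, inserted):.2%} of segments reused")
```

With an in-place overwrite, only the segment that contains the change is new. With an insert, every segment after the insertion point has different content even though the underlying data is almost identical, which is the case where fixed-length deduplication performs poorly.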
Note:
Use variable-length deduplication for data types that do not show a good deduplication ratio with the current MSDP intelligent deduplication algorithm and affiliated streamers. Enabling variable-length deduplication might improve the deduplication ratio, but it might also increase CPU usage.
In variable-length deduplication, every segment has a variable size with configurable size boundaries. The NetBackup client examines and applies a secure hash algorithm (SHA-2) to the variable-length segments of the data. Each data segment is assigned a unique ID and NetBackup evaluates if any data segment with the same ID exists in the backup. If the data segment already exists, then the segment data is not stored again.
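The general idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm that NetBackup uses: the rolling hash, the `GEAR` table, the size boundaries (`MIN_SIZE`, `MAX_SIZE`), and the function names are all hypothetical, and the sizes are scaled down so that the example runs quickly. SHA-256, a member of the SHA-2 family, stands in for the segment ID.

```python
import hashlib
import random

# Hypothetical size boundaries, scaled down so the example runs quickly.
MIN_SIZE, MAX_SIZE = 1024, 16 * 1024
BOUNDARY_MASK = (1 << 12) - 1  # past MIN_SIZE, a boundary about every 4 KB

# One pseudo-random 32-bit value per byte value, used by the rolling hash.
GEAR = [random.Random(42).getrandbits(32) for _ in range(1)] * 0 or [
    v for v in (lambda r: [r.getrandbits(32) for _ in range(256)])(random.Random(42))
]


def variable_segments(data: bytes) -> list[bytes]:
    """Cut data where a rolling hash of the most recent bytes matches a pattern."""
    segments, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_pattern = length >= MIN_SIZE and (h & BOUNDARY_MASK) == 0
        if at_pattern or length >= MAX_SIZE:
            segments.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        segments.append(data[start:])  # trailing segment may be shorter
    return segments


def backup(data: bytes, store: dict[bytes, bytes]) -> int:
    """Store only segments whose ID is not yet known; return new bytes stored."""
    new_bytes = 0
    for segment in variable_segments(data):
        segment_id = hashlib.sha256(segment).digest()  # SHA-2 based segment ID
        if segment_id not in store:
            store[segment_id] = segment
            new_bytes += len(segment)
    return new_bytes


# Illustrative data: 256 KB of pseudo-random bytes, then a 10-byte insert.
size = 256 * 1024
original = random.Random(0).getrandbits(8 * size).to_bytes(size, "big")
inserted = original[:size // 2] + b"X" * 10 + original[size // 2:]

store: dict[bytes, bytes] = {}
first = backup(original, store)
second = backup(inserted, store)
print(f"first backup stored {first} bytes, second backup stored {second} bytes")
```

Because the boundaries depend on the content rather than on absolute offsets, the segments after the inserted bytes line up again with segments that are already stored, so the second backup stores only the segments around the insertion point. The per-byte hashing in `variable_segments` is also where the extra CPU cost comes from.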
Warning:
If you enable compression for the backup policy, variable-length deduplication does not work even when you configure it.
The following table describes the effect of variable-length deduplication on the data backup:
Table: Effect of variable-length deduplication
| Effect | Description |
|---|---|
| Effect on the deduplication ratio | Variable-length deduplication is beneficial if the data file is modified in a shifting mode, that is, when data is inserted, removed, or modified at a binary level. When such modified data is backed up again, variable-length deduplication achieves a higher deduplication ratio. Thus, the second and subsequent backups have higher deduplication ratios. |
| Effect on the CPU | Variable-length deduplication can be more resource-intensive than fixed-length deduplication. It needs more CPU cycles to compute segment boundaries, and the backup might take longer than with the fixed-length deduplication method. |
| Effect on data restore | Variable-length deduplication does not affect the data restore process. |
By default, variable-length deduplication is disabled on a NetBackup client. From NetBackup 10.2 onwards, you can enable variable-length deduplication by using the cacontrol command-line utility. In earlier versions of NetBackup, you can enable it by adding parameters to the pd.conf file. To enable the same settings for all NetBackup clients or policies, you must specify all the clients or policies in the pd.conf file.
From NetBackup 10.2 onwards, the default version of variable-length deduplication is VLD v2. If you have enabled variable-length deduplication in the pd.conf file and no backup images exist in the storage, VLD v2 is used by default. If backup images already exist in the storage, NetBackup continues to use VLD v1.
In a deduplication load balancing scenario, you must upgrade the media servers to NetBackup 8.1.1 or later and modify the pd.conf file on all the media servers. If a backup job selects an older media server (earlier than NetBackup 8.1.1) for the load balancing pool, fixed-length deduplication is used instead of variable-length deduplication. Avoid configuring media servers with different NetBackup versions in a load balancing scenario. The data segments generated from variable-length deduplication are different from the data segments generated from fixed-length deduplication. Therefore, load balancing media servers with different NetBackup versions results in a low deduplication ratio.
See Managing the variable-length deduplication using the cacontrol command-line utility.
See About the MSDP pd.conf configuration file.