About NetBackup Media Server Deduplication (MSDP)
The NetBackup Media Server Deduplication (MSDP) feature lets you choose where in the backup process deduplication is performed: at the source (client) side or at the target (NetBackup server) side. The system resource usage pattern, and therefore the performance tuning, differs depending on that choice.
To facilitate performance planning and tuning, this section gives a brief introduction to the technology. For a more detailed description, refer to the NetBackup Deduplication Guide.
MSDP deduplication technology is composed of four main components:
Data segmentation: This component divides a data stream into segments and calculates a fingerprint for each segment. Starting with NetBackup 8.1, the fingerprints are calculated with a SHA-2 algorithm. With proper data segmentation, you can achieve a higher deduplication ratio when the data stream has interspersed data changes.
Fingerprint lookup for deduplication: This component compares each newly created fingerprint against the existing fingerprints that are cached in memory. If a match is found, only a pointer to the existing segment is stored, not the segment itself. If no matching fingerprint is found in the cache, the corresponding data segment is written to the storage pool. Storing only unique data segments significantly reduces the space consumed in the storage pool.
Data store: This component manages the storage and maintenance of the data segments on persistent storage. It also facilitates read, write, and delete operations on the stored data segments.
Space reclamation: This component handles storage space reclamation to keep the data store robust, efficient, and well performing. When a data container in the data store, which holds a number of data segments, is no longer referenced because of a delete operation or image expiration, the space occupied by the container can be reclaimed. A container that has enough deleted space may go through a compaction operation to release that space.
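The four components above can be illustrated with a minimal Python sketch. It is a toy model for reasoning about the workflow, not the MSDP implementation. It assumes fixed-size segmentation (which, unlike the segmentation described above, does not realign after inserted data), uses SHA-256 as one member of the SHA-2 family, keeps all fingerprints in an in-memory dictionary, and reclaims space per segment with a reference count rather than per container with compaction. The names ToyDedupStore, SEGMENT_SIZE, backup, restore, expire, and stored_bytes are hypothetical and are not NetBackup APIs or settings.

```python
import hashlib

# Hypothetical segment size in bytes; tiny so the demo is easy to follow.
SEGMENT_SIZE = 8


def segment(data: bytes, size: int = SEGMENT_SIZE) -> list:
    """Data segmentation: split a stream into fixed-size segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(seg: bytes) -> str:
    """Fingerprint one segment with SHA-256 (a SHA-2 algorithm)."""
    return hashlib.sha256(seg).hexdigest()


class ToyDedupStore:
    """Toy model of fingerprint lookup, data store, and space reclamation."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes (the "storage pool")
        self.refcount = {}  # fingerprint -> number of pointers to the segment
        self.images = {}    # image name -> ordered list of fingerprints

    def backup(self, name: str, data: bytes) -> None:
        pointers = []
        for seg in segment(data):
            fp = fingerprint(seg)
            # Fingerprint lookup: write the segment only if it is new.
            if fp not in self.segments:
                self.segments[fp] = seg
            # In both cases the image keeps a pointer to the segment.
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            pointers.append(fp)
        self.images[name] = pointers

    def restore(self, name: str) -> bytes:
        """Rebuild an image by following its pointers into the pool."""
        return b"".join(self.segments[fp] for fp in self.images[name])

    def expire(self, name: str) -> None:
        """Space reclamation: drop segments that nothing references."""
        for fp in self.images.pop(name):
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:
                del self.refcount[fp]
                del self.segments[fp]

    def stored_bytes(self) -> int:
        return sum(len(seg) for seg in self.segments.values())


if __name__ == "__main__":
    store = ToyDedupStore()
    store.backup("mon", b"AAAAAAAABBBBBBBBCCCCCCCC")
    store.backup("tue", b"AAAAAAAABBBBBBBBDDDDDDDD")
    print(store.stored_bytes())  # 32: four unique segments for 48 bytes backed up
    store.expire("mon")
    print(store.stored_bytes())  # 24: the segment only "mon" referenced is reclaimed
```

In this sketch the second backup shares two of its three segments with the first, so only one new segment is written. Expiring the first image releases only the segment that no remaining image points to, which is the same idea as reclaiming a container once it is no longer referenced.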