Data segmentation
MSDP supports two types of segmentation:
Application data format-based segmentation: NetBackup provides a number of stream handlers that were developed specifically to process the data streams of particular applications, such as Oracle, MS SQL Server, and NDMP NAS. A stream handler breaks the data stream into segments in a way that makes similar data segments easier to identify, both within the data stream and across multiple backup images, which results in higher deduplication. Stream handlers can also separate image metadata from the actual application data, which raises deduplication rates further. This type of segmentation is less CPU-intensive and performs better than context-aware segmentation.
Data stream context-aware segmentation, also known as variable-length deduplication (VLD): The algorithm does not need to fully understand the data format. Instead, it scans the data by moving a predefined window through the stream byte by byte, efficiently generating a hash result at each position. Wherever a hash result matches a predefined value, that position becomes an anchor point, and segments are formed based on the anchors. Window sizes are configurable to allow tuning for specific workloads. This type of segmentation is application-data agnostic, but it usually runs slower because it is more CPU-intensive.
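The anchor-based approach described above can be illustrated with a minimal sketch of content-defined chunking. This is not the MSDP implementation: the polynomial rolling hash, the SHA-256 fingerprints, the names split_segments and fingerprints, and every constant (WINDOW, MASK, MIN_SIZE, MAX_SIZE) are assumptions chosen only to make the idea concrete.

```python
import hashlib
from typing import List

# All constants are illustrative assumptions, not MSDP defaults.
WINDOW = 48            # bytes covered by the sliding window
BASE = 257             # polynomial base for the rolling hash
MOD = (1 << 61) - 1    # large prime modulus
MASK = (1 << 10) - 1   # anchor when the low 10 hash bits are all ones
MIN_SIZE = 256         # never cut a segment shorter than this
MAX_SIZE = 8192        # always cut once a segment reaches this size


def split_segments(data: bytes) -> List[bytes]:
    """Split data into variable-length segments at rolling-hash anchors."""
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    segments = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: remove the oldest byte's contribution.
            h = (h - data[i - WINDOW] * drop) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        # An anchor is a position where the window hash matches the
        # predefined bit pattern; MIN_SIZE/MAX_SIZE bound the segment size.
        at_anchor = length >= MIN_SIZE and (h & MASK) == MASK
        if at_anchor or length >= MAX_SIZE:
            segments.append(data[start:i + 1])
            start = i + 1
            h = 0  # restart the window at the new segment
    if start < len(data):
        segments.append(data[start:])  # trailing partial segment
    return segments


def fingerprints(data: bytes) -> List[str]:
    """Return one SHA-256 fingerprint per segment (hash choice is an assumption)."""
    return [hashlib.sha256(s).hexdigest() for s in split_segments(data)]
```

Because each cut depends only on the bytes inside the window, inserting a few bytes early in a stream tends to shift the anchors along with the content, so later segments usually produce the same fingerprints as before. Fixed-size blocks would all shift and stop matching. The per-byte hashing in the loop is also what makes this approach more CPU-intensive than format-based segmentation.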