About sampling and predictive cache
MSDP caches fingerprints in memory for efficient deduplication lookup, using up to the size that is configured in MaxCacheSize. A new fingerprint cache lookup scheme, introduced in NetBackup release 10.1, reduces this memory usage. It splits the current memory cache into two components: the sampling cache (S-cache) and the predictive cache (P-cache). S-cache caches a percentage of the fingerprints from each backup and is used to find similar data among the samples of previous backups for deduplication. P-cache caches the fingerprints that are most likely to be used in the immediate future for deduplication lookup.
At the start of a job, a small portion of the fingerprints from its last backup is loaded into P-cache as initial seeding. Fingerprint lookup is done against P-cache first to find duplicates. Lookups that miss in P-cache are then searched against the S-cache samples to find possible matches in previous backup data. If a match is found, part of the matched backup's fingerprints is loaded into P-cache for future deduplication.
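The lookup flow above can be illustrated with a small sketch. This is a toy model only, not the MSDP implementation: the class name `PSCacheSketch`, the `sample_every` and `window` parameters, the every-Nth sampling rule, and the oldest-first eviction are all assumptions made for illustration.

```python
from collections import OrderedDict


class PSCacheSketch:
    """Toy model of a P-cache / S-cache fingerprint lookup (illustrative only)."""

    def __init__(self, p_capacity=1000, sample_every=4, window=8):
        self.p_capacity = p_capacity      # max fingerprints held in P-cache
        self.sample_every = sample_every  # assumed: keep 1 of every N fingerprints in S-cache
        self.window = window              # assumed: fingerprints loaded per seed or S-cache hit
        self.p_cache = OrderedDict()      # fingerprint -> True, oldest loaded first
        self.s_cache = {}                 # sampled fingerprint -> (backup_id, position)
        self.backups = {}                 # backup_id -> full fingerprint list (stand-in for stored data)

    def record_backup(self, backup_id, fingerprints):
        """Store a finished backup and sample its fingerprints into S-cache."""
        fingerprints = list(fingerprints)
        self.backups[backup_id] = fingerprints
        for pos in range(0, len(fingerprints), self.sample_every):
            self.s_cache[fingerprints[pos]] = (backup_id, pos)

    def _load(self, backup_id, start):
        """Load a window of one backup's fingerprints into P-cache."""
        for fp in self.backups[backup_id][start:start + self.window]:
            self.p_cache[fp] = True
            while len(self.p_cache) > self.p_capacity:
                self.p_cache.popitem(last=False)  # assumed oldest-first eviction

    def seed(self, last_backup_id):
        """Initial seeding at job start from the job's last backup."""
        self._load(last_backup_id, 0)

    def lookup(self, fp):
        """Check P-cache first, then fall back to the S-cache samples."""
        if fp in self.p_cache:
            return "p-hit"
        if fp in self.s_cache:
            backup_id, pos = self.s_cache[fp]
            self._load(backup_id, pos)  # pull in neighbours for future lookups
            return "s-hit"
        return "miss"
```

In this sketch, an S-cache hit pulls the neighbouring fingerprints of the matched backup into P-cache, so later lookups from the same region of data are served from P-cache directly.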
The S-cache and P-cache fingerprint lookup method is enabled for local and cloud storage volumes with MSDP non-BYO deployments, including Flex, Flex WORM, Flex Scale, NetBackup Appliance, AKS, and EKS deployments. This method is also enabled for cloud-only volumes on MSDP BYO platforms. On the platforms with cloud-only volume support, local volumes still use the original cache lookup method. You can find the S-cache and P-cache configuration parameters under the Cache section of the configuration file contentrouter.cfg.
Starting with NetBackup 10.2, the S-cache and P-cache fingerprint lookup method is used for local storage on new setups of Flex, Flex WORM, and NetBackup Appliance. An upgrade does not change the fingerprint lookup method of an existing setup.
The default values for S-cache and P-cache:
| Configuration | Default value |
|---|---|
| MaxCacheSize | 512MiB |
| MaxPredictiveCacheSize | 40% |
| MaxSamplingCacheSize | 20% |
| EnableLocalPredictiveSamplingCache in | true |
| EnableLocalPredictiveSamplingCache in | true |
On systems that use S-cache and P-cache, the local volume and cloud volumes share the same S-cache and P-cache size, and the overall memory is limited by UsableMemoryLimit.
The S-cache size is determined by the back-end MSDP capacity, that is, by the number of fingerprints in the back-end data. Assuming an average segment size of 32KB, the S-cache size is about 100MB per TB of back-end capacity. The P-cache size is determined by the number of concurrent jobs and by the data locality, or working set, of the incoming data. With a working set of 250MB per stream (about 5 million fingerprints), 100 concurrent streams need a minimum of 25GB of memory (100*250MB). The working set can be larger for certain applications with multiple streams and large data sets. P-cache is used for fingerprint deduplication lookup, and all fingerprints that are loaded into P-cache stay there until its allocated capacity is reached. Therefore, the larger the P-cache, the better the potential lookup hit rate and the higher the memory usage. Under-sizing S-cache or P-cache leads to reduced deduplication rates, and over-sizing increases the memory cost.
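The sizing rules of thumb above can be expressed as a short calculation. The function below is a hypothetical helper, not part of NetBackup. It uses the figures stated in this section (about 100MB of S-cache per TB of back-end capacity and a 250MB working set per stream) and assumes decimal units (1GB = 1000MB). The 50TB capacity in the usage example is an arbitrary illustration.

```python
def estimate_cache_sizes(backend_tb, concurrent_streams,
                         s_cache_mb_per_tb=100, working_set_mb_per_stream=250):
    """Return rough (S-cache MB, P-cache MB) estimates.

    Defaults come from the rules of thumb in the text: ~100MB of S-cache
    per TB of back-end capacity (32KB average segment size) and a 250MB
    working set per concurrent stream.
    """
    s_cache_mb = backend_tb * s_cache_mb_per_tb
    p_cache_mb = concurrent_streams * working_set_mb_per_stream
    return s_cache_mb, p_cache_mb


# Hypothetical system: 50TB back-end capacity, 100 concurrent streams.
s_mb, p_mb = estimate_cache_sizes(backend_tb=50, concurrent_streams=100)
print(f"S-cache: about {s_mb / 1000:g}GB, P-cache: at least {p_mb / 1000:g}GB")
```

Treat the results as starting estimates only, because the working set per stream can be larger for some applications.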