© 2026 Cohesity, Inc. All Rights Reserved.
NetBackup™ Deduplication Guide

About sampling and predictive cache

MSDP uses memory up to the size that is configured in MaxCacheSize to cache fingerprints for efficient deduplication lookup. A new fingerprint cache lookup scheme that was introduced in NetBackup release 10.1 reduces the memory usage. It splits the current memory cache into two components: the sampling cache (S-cache) and the predictive cache (P-cache). S-cache caches a percentage of the fingerprints from each backup and is used to find similar data from the samples of previous backups for deduplication. P-cache caches the fingerprints that are most likely to be used in the immediate future for deduplication lookup.

At the start of a job, a small portion of the fingerprints from its last backup is loaded into P-cache as initial seeding. The fingerprint lookup is done against P-cache first to find duplicates. Lookups that miss in P-cache are then searched against the S-cache samples to find possible matches in previous backup data. If a match is found, part of the matched backup's fingerprints is loaded into P-cache for future deduplication.
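The lookup flow described above can be illustrated with a toy model. This is a minimal sketch, not the MSDP implementation: the class name, the index-based sampling rule, and the `sample_rate`, `seed_count`, and `load_count` parameters are all hypothetical and are chosen only to make the seed, lookup, and load steps visible.

```python
class TwoTierFingerprintCache:
    """Toy model of an S-cache/P-cache lookup (illustrative only)."""

    def __init__(self, sample_rate=4, seed_count=2, load_count=4):
        self.sample_rate = sample_rate  # hypothetical: keep every Nth fingerprint as a sample
        self.seed_count = seed_count    # hypothetical: fingerprints seeded at job start
        self.load_count = load_count    # hypothetical: fingerprints loaded after an S-cache hit
        self.p_cache = set()            # fingerprints expected to be needed soon
        self.s_cache = {}               # sampled fingerprint -> (backup id, position)
        self.backups = {}               # backup id -> ordered fingerprints (stands in for on-disk data)

    def record_backup(self, backup_id, fingerprints):
        """Store a finished backup and add a sample of its fingerprints to S-cache."""
        self.backups[backup_id] = list(fingerprints)
        for pos in range(0, len(fingerprints), self.sample_rate):
            self.s_cache[fingerprints[pos]] = (backup_id, pos)

    def start_job(self, last_backup_id):
        """Seed P-cache with a small portion of the last backup's fingerprints."""
        previous = self.backups.get(last_backup_id, [])
        self.p_cache.update(previous[: self.seed_count])

    def lookup(self, fingerprint):
        """Return 'p-hit', 's-hit', or 'miss' for one incoming fingerprint."""
        if fingerprint in self.p_cache:
            return "p-hit"
        sample = self.s_cache.get(fingerprint)
        if sample is None:
            return "miss"
        # S-cache hit: load the neighboring fingerprints of the matched backup into P-cache.
        backup_id, pos = sample
        self.p_cache.update(self.backups[backup_id][pos : pos + self.load_count])
        return "s-hit"


cache = TwoTierFingerprintCache()
cache.record_backup("b1", ["f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7"])
cache.start_job("b1")
results = [cache.lookup(fp) for fp in ["f0", "f4", "f5", "zz"]]
print(results)
```

In this toy run, `f0` is found through the initial seeding, `f4` is found as a sample in S-cache, and `f5` is then found in P-cache because the S-cache hit loaded the fingerprints that follow `f4`.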

The S-cache and P-cache fingerprint lookup method is enabled for both local and cloud storage volumes in MSDP cluster deployments, including Flex Scale, AKS, and EKS deployments. On MSDP non-cluster platforms (NetBackup appliance, Flex, and BYO), this method is enabled for cloud volumes only; local volumes on these platforms still use the original cache lookup method. You can find the S-cache and P-cache configuration parameters under the Cache section of the configuration file contentrouter.cfg.

The default values for non-cluster deployments:

Configuration                                            Default value
MaxCacheSize                                             50%
MaxPredictiveCacheSize                                   20% (10% in NetBackup Appliance)
MaxSamplingCacheSize                                     5% (10% in NetBackup Appliance)
EnableLocalPredictiveSamplingCache in contentrouter.cfg  false
EnableLocalPredictiveSamplingCache in spa.cfg            false

The default values for cluster deployments:

Configuration                                            Default value
MaxCacheSize                                             512MiB
MaxPredictiveCacheSize                                   40%
MaxSamplingCacheSize                                     20%
EnableLocalPredictiveSamplingCache in spa.cfg            true
EnableLocalPredictiveSamplingCache in contentrouter.cfg  true

For MSDP cluster deployments, the local volume and cloud volume share the same S-cache and P-cache size. For non-cluster deployments, S-cache and P-cache are used only for cloud volumes, and MaxCacheSize is still used for local volumes. If the system is not used for cloud backups, MaxPredictiveCacheSize and MaxSamplingCacheSize can be set to a small value, for example, 1% or 128MiB, and MaxCacheSize can be set to a large value, for example, 50% or 60%. Similarly, if the system is used for cloud backups only, MaxCacheSize can be set to 1% or 128MiB, and MaxPredictiveCacheSize and MaxSamplingCacheSize can be set to a larger value.
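The tuning guidance for non-cluster deployments can be summarized as a small lookup. This is a sketch only: the function name and the workload labels are hypothetical, and the example values are the ones quoted in the text. Because the text gives no concrete "larger value" for the cloud-only case, those entries are left as `None` and need to be sized separately.

```python
def suggest_noncluster_cache_settings(workload):
    """Return example cache settings for a non-cluster MSDP server.

    workload is a hypothetical label: 'local-only', 'cloud-only', or 'mixed'.
    A value of None means the text gives no concrete number for that setting.
    """
    if workload == "local-only":
        # No cloud backups: shrink S-cache and P-cache, keep the local cache large.
        return {
            "MaxCacheSize": "50%",
            "MaxPredictiveCacheSize": "1%",
            "MaxSamplingCacheSize": "1%",
        }
    if workload == "cloud-only":
        # Cloud backups only: shrink the local cache, raise S-cache and P-cache.
        return {
            "MaxCacheSize": "1%",
            "MaxPredictiveCacheSize": None,
            "MaxSamplingCacheSize": None,
        }
    if workload == "mixed":
        # Documented non-cluster defaults (non-appliance values).
        return {
            "MaxCacheSize": "50%",
            "MaxPredictiveCacheSize": "20%",
            "MaxSamplingCacheSize": "5%",
        }
    raise ValueError(f"unknown workload: {workload!r}")


local_only = suggest_noncluster_cache_settings("local-only")
cloud_only = suggest_noncluster_cache_settings("cloud-only")
print(local_only)
print(cloud_only)
```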

The S-cache size is determined by the back-end MSDP capacity, that is, the number of fingerprints from the back-end data. Assuming an average segment size of 32KB, the S-cache size is about 100MB per TB of back-end capacity. The P-cache size is determined by the number of concurrent jobs and by the data locality, or working set, of the incoming data. With a working set of 250MB per stream (about 5 million fingerprints), for example, 100 concurrent streams need a minimum of 25GB of memory (100*250MB). The working set can be larger for certain applications with multiple streams and large data sets. Because P-cache is used for fingerprint deduplication lookup, and all fingerprints that are loaded into P-cache stay there until its allocated capacity is reached, a larger P-cache gives a better potential lookup hit rate at the cost of more memory. Under-sizing S-cache or P-cache reduces deduplication rates, and over-sizing increases the memory cost.
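The sizing rules above reduce to simple arithmetic. The following sketch applies the two rules of thumb from the text (about 100MB of S-cache per TB of back-end capacity, and a 250MB working set per concurrent stream). The function name and the 200TB example capacity are illustrative assumptions, and real working sets may be larger than the per-stream figure used here.

```python
S_CACHE_MB_PER_TB = 100      # from the text: ~100MB per TB at a 32KB average segment size
P_CACHE_MB_PER_STREAM = 250  # from the text: ~250MB working set per stream


def estimate_cache_mb(backend_tb, concurrent_streams):
    """Return (s_cache_mb, p_cache_mb) as rough minimum estimates."""
    s_cache_mb = backend_tb * S_CACHE_MB_PER_TB
    p_cache_mb = concurrent_streams * P_CACHE_MB_PER_STREAM
    return s_cache_mb, p_cache_mb


# Example: a hypothetical 200TB back end with 100 concurrent streams.
s_mb, p_mb = estimate_cache_mb(backend_tb=200, concurrent_streams=100)
print(f"S-cache: {s_mb / 1000:.0f}GB, P-cache: {p_mb / 1000:.0f}GB")
```

For 100 concurrent streams this gives the same 25GB P-cache minimum as the example in the text.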
