© 2026 Cohesity, Inc. All Rights Reserved.
NetBackup™ Backup Planning and Performance Tuning Guide

Data types and deduplication

Different data types deduplicate at different rates. MSDP (Media Server Deduplication Pool) both deduplicates and compresses data: deduplication is performed first, and the resulting data segments are then compressed before they are written to disk.
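
The ordering described above (deduplicate first, then compress only what remains) can be illustrated with a minimal sketch. It is not the MSDP implementation: the fixed 4096-byte segment size, the SHA-256 fingerprint, zlib compression, and the names `store` and `pool` are all assumptions chosen for illustration.

```python
import hashlib
import zlib

SEGMENT_SIZE = 4096  # illustrative fixed segment size, not an MSDP setting


def store(data: bytes, pool: dict) -> tuple:
    """Deduplicate first, then compress only the segments that are new.

    Returns (bytes scanned, bytes written to the pool after compression).
    """
    written = 0
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint in pool:
            continue  # duplicate segment: nothing is written
        compressed = zlib.compress(segment)  # compression happens after dedup
        pool[fingerprint] = compressed
        written += len(compressed)
    return len(data), written


pool = {}
# Eight distinct segments, each made of one repeated byte value.
first_backup = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(8))
scanned_1, written_1 = store(first_backup, pool)
scanned_2, written_2 = store(first_backup, pool)  # unchanged second backup
print(scanned_1, written_1, scanned_2, written_2)
```

In this sketch the first backup benefits only from compression, while the unchanged second backup writes nothing because every segment is already in the pool.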

Unstructured data

For sizing, it is important to understand which types of unstructured data are present in the environment. Some data types do not deduplicate well:

  • Encrypted files:

    Encrypted files do not deduplicate well. Even a small change often alters the entire file, which results in higher change rates than for non-encrypted files. Compression generally yields only small storage savings (less than 10% at best). There is also no deduplication between files, which further lowers deduplication rates.

  • Compressed, image, audio, and video files:

    Files that fall into this category will not deduplicate well, and there will be no savings from compression.
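
The effect on compression can be reproduced with a small experiment. This is a rough sketch under stated assumptions: zlib stands in for whatever compression the storage layer uses, random bytes from `os.urandom` stand in for encrypted or already-compressed content, and the helper `compression_saving` is hypothetical.

```python
import os
import zlib


def compression_saving(data: bytes) -> float:
    """Return the fraction of space saved by zlib compression (can be negative)."""
    return 1 - len(zlib.compress(data)) / len(data)


text_like = b"customer_id,order_id,status\n" * 4000
random_like = os.urandom(len(text_like))  # stand-in for encrypted content

text_saving = compression_saving(text_like)
random_saving = compression_saving(random_like)
print(f"text-like data:   {text_saving:.1%} saved")
print(f"random-like data: {random_saving:.1%} saved")
```

Repetitive text shrinks dramatically, while the random-looking input gains nothing and may even grow slightly because of the compression framing.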

Note that encryption and compression at the file system level, such as with NTFS, are transparent to NetBackup because the operating system decompresses and decrypts the files when they are read. As a result, backups may appear larger in FETB (front-end terabytes) than the space the data consumes on the file system. Data from these file systems will, however, see good deduplication and compression rates when it is written to MSDP.

Databases

Database deduplication will generally be lower than that observed for unstructured data. To achieve optimal deduplication, compression and encryption should not be enabled in the backup stream (for example, with RMAN directives for Oracle).

Database transaction logs do not deduplicate well because of the nature of the data, although some savings from compression may be observed. It is important to determine deduplication rates for database backups and transaction log backups separately.

Transparent database encryption options will lower deduplication and compression rates. Initial backups will show minimal space savings. The level of deduplication achieved between backups depends on the nature of the changes to the database. In general, OLTP databases, whose changes may be distributed throughout the database, show lower deduplication rates than OLAP instances, which tend to have more inserts than updates.

NDMP

The notes above for unstructured data apply to NDMP backups. In addition, the nature of NDMP can affect deduplication rates. NDMP defines the communication protocol between filers and backup targets. It does not define the data format. Veritas has developed stream handlers for several filers (NetApp and Dell EMC PowerScale) that allow an understanding of the data streams. Filers without a stream handler may show very low deduplication rates (for example, 20% or lower). In these cases, MSDP Variable Length Deduplication (VLD) should be enabled on the MSDP policies, and a significant increase in deduplication rates will generally be observed.
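
To see why variable-length segmentation can help when the layout of a data stream is not understood, consider the toy comparison below. It is a generic content-defined chunking sketch, not the MSDP VLD algorithm or a stream handler: the window size, divisor, window-sum boundary rule, and all function names are assumptions made for illustration.

```python
import hashlib
import random

WINDOW = 16      # bytes examined when looking for a boundary (toy value)
DIVISOR = 64     # gives an average segment of roughly 64 bytes (toy value)
FIXED_SIZE = 64  # segment size for the fixed-length comparison


def fixed_segments(data: bytes) -> list:
    """Cut the stream every FIXED_SIZE bytes, regardless of content."""
    return [data[i:i + FIXED_SIZE] for i in range(0, len(data), FIXED_SIZE)]


def variable_segments(data: bytes) -> list:
    """Cut wherever the sum of the preceding WINDOW bytes is divisible by DIVISOR."""
    segments, start = [], 0
    rolling = sum(data[:WINDOW])
    for end in range(WINDOW, len(data)):
        if rolling % DIVISOR == 0:
            segments.append(data[start:end])
            start = end
        # Slide the window: add the byte at end, drop the byte at end - WINDOW.
        rolling += data[end] - data[end - WINDOW]
    segments.append(data[start:])
    return segments


def duplicate_fraction(old: bytes, new: bytes, segmenter) -> float:
    """Fraction of the new stream's bytes that fall in segments already seen in old."""
    known = {hashlib.sha256(s).digest() for s in segmenter(old)}
    matched = sum(len(s) for s in segmenter(new)
                  if hashlib.sha256(s).digest() in known)
    return matched / len(new)


rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(20_000))
shifted = b"new header" + original  # 10 bytes inserted at the front

fixed_rate = duplicate_fraction(original, shifted, fixed_segments)
variable_rate = duplicate_fraction(original, shifted, variable_segments)
print(f"fixed-length:    {fixed_rate:.1%} duplicate")
print(f"variable-length: {variable_rate:.1%} duplicate")
```

Because the variable-length boundaries depend on the content rather than on the offset, they realign shortly after the inserted bytes, so almost all of the shifted stream is recognized as duplicate. With fixed-length segments every boundary moves and almost nothing matches.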

Virtualization

For virtualization workloads, supported file systems and volume managers should be used so that NetBackup can understand the structure of the data. On configurations that meet these requirements, the deduplication engine will respect file boundaries when segmenting the data stream and significant increases in deduplication rates will be observed.

Determining deduplication rates

Due to the wide variations in customer environments, even within specific workloads, Veritas does not publish expected deduplication rates.

It is recommended that customers perform tests in their own environments, using a representative subset of the data to be protected, to determine the actual deduplication rates for the schedule types to be implemented:

  • Initial Full

  • Daily Differential

  • Subsequent Full

  • Database Transaction Log

Deduplication rates can be found in the Activity Monitor in the Deduplication Rate column. When viewing the job details, there is also an entry for deduplication rates:

Oct 8, 2021 12:22:20 AM - Info media-server.example.com (pid=29340)
 StorageServer=PureDisk:mediaserver.example.com; Report=PDDO Stats
 (multi-threaded stream used) for (mediaserver.example.com): scanned:
 1447258 KB, CR sent: 6682 KB, CR sent over FC: 0 KB, dedup: 99.5%,
 cache hits: 11263 (99.2%), where dedup space saving:99.2%, compression
 space saving:0.3%

In this example, the deduplication rate that will be used for calculations is the total rate of 99.5%, which includes savings from compression.
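
When many jobs need to be reviewed, the figures can be pulled out of the job details with a short script. The following is a minimal sketch that assumes the entry has the same layout as the example above. The regular expression and the function name `parse_pddo_stats` are illustrative and are not part of NetBackup.

```python
import re

# The job detail entry shown above, joined onto one line.
entry = (
    "Oct 8, 2021 12:22:20 AM - Info media-server.example.com (pid=29340) "
    "StorageServer=PureDisk:mediaserver.example.com; Report=PDDO Stats "
    "(multi-threaded stream used) for (mediaserver.example.com): scanned: "
    "1447258 KB, CR sent: 6682 KB, CR sent over FC: 0 KB, dedup: 99.5%, "
    "cache hits: 11263 (99.2%), where dedup space saving:99.2%, compression "
    "space saving:0.3%"
)

# Assumed layout; adjust if the entry format differs in your environment.
PATTERN = re.compile(
    r"scanned:\s*(?P<scanned>\d+) KB, CR sent:\s*(?P<sent>\d+) KB.*?"
    r"dedup:\s*(?P<total>[\d.]+)%.*?"
    r"dedup space saving:\s*(?P<dedup>[\d.]+)%.*?"
    r"compression space saving:\s*(?P<comp>[\d.]+)%"
)


def parse_pddo_stats(line: str) -> dict:
    """Extract the KB counts and percentage figures from a PDDO Stats entry."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("no PDDO stats found in line")
    return {key: float(value) for key, value in match.groupdict().items()}


stats = parse_pddo_stats(entry)
# Total saving recomputed from the byte counts: 1 - sent / scanned.
recomputed = round(100 * (1 - stats["sent"] / stats["scanned"]), 1)
print(stats["total"], recomputed)
```

For this entry, the value recomputed from the scanned and sent byte counts agrees with the reported total of 99.5%, and the deduplication and compression savings (99.2% and 0.3%) add up to the same figure.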

Tests should be run over a period of weeks to capture typical change rates in the environment.
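
Once several weeks of results are available, one way to summarize them is a size-weighted rate per schedule type. The sketch below is only an illustration: the job figures are hypothetical placeholders rather than measurements, and computing the rate as 1 minus the data sent divided by the data scanned is an assumption about how to combine jobs, not a documented NetBackup formula.

```python
from collections import defaultdict

# Hypothetical test results: (schedule type, scanned KB, KB sent to storage).
# These figures are placeholders, not measurements from any environment.
jobs = [
    ("Initial Full", 1_000_000, 400_000),
    ("Daily Differential", 50_000, 10_000),
    ("Daily Differential", 150_000, 15_000),
    ("Subsequent Full", 1_000_000, 20_000),
]


def rates_by_schedule(results) -> dict:
    """Size-weighted saving per schedule type: 1 - total sent / total scanned."""
    scanned = defaultdict(int)
    sent = defaultdict(int)
    for schedule, scanned_kb, sent_kb in results:
        scanned[schedule] += scanned_kb
        sent[schedule] += sent_kb
    return {s: 1 - sent[s] / scanned[s] for s in scanned}


rates = rates_by_schedule(jobs)
for schedule, rate in rates.items():
    print(f"{schedule}: {rate:.1%}")
```

Weighting by size keeps a small job from skewing the result. In this sample the two differential jobs save 80% and 90% individually, but the weighted rate is 87.5% because the larger job dominates.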
