© 2026 Cohesity, Inc. All Rights Reserved.
NetBackup™ for Hadoop Administrator's Guide

Backing up NetBackup for Hadoop data

NetBackup for Hadoop data is backed up in parallel streams, in which the Hadoop DataNodes stream data blocks simultaneously to multiple backup hosts.

Note:

All the directories specified in the NetBackup for Hadoop backup selection must be snapshot-enabled before the backup runs.
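HDFS itself provides the command that makes a directory snapshottable, `hdfs dfsadmin -allowSnapshot <path>`, and it requires HDFS superuser privileges. The following Python sketch builds that command for each directory in a backup selection and either prints it (dry run) or runs it. The function names and example paths are hypothetical and are not part of NetBackup, and rejecting relative paths is a choice made only for this sketch.

```python
import shlex
import subprocess
from typing import List, Sequence


def allow_snapshot_commands(paths: Sequence[str]) -> List[List[str]]:
    """Build one `hdfs dfsadmin -allowSnapshot <path>` argv list per directory."""
    commands: List[List[str]] = []
    for path in paths:
        # This sketch only accepts absolute HDFS paths, to avoid surprises
        # from paths resolved relative to the user's HDFS home directory.
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute HDFS path, got {path!r}")
        commands.append(["hdfs", "dfsadmin", "-allowSnapshot", path])
    return commands


def enable_snapshots(paths: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute the allowSnapshot command for each directory."""
    commands = allow_snapshot_commands(paths)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if the hdfs command fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical backup selection; substitute the directories you back up.
    enable_snapshots(["/data/sales", "/data/logs"], dry_run=True)
```

After the commands have run, `hdfs lsSnapshottableDir` lists the snapshottable directories, which is one way to confirm the backup selection before the backup is scheduled.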

The following diagram provides an overview of the backup flow:

Figure: Backup flow


As illustrated in the diagram:

  1. A scheduled backup job is triggered from the primary server.

  2. The backup job for NetBackup for Hadoop data is a compound job. When the backup job is triggered, a discovery job runs first.

  3. During discovery, the first backup host connects to the NameNode and retrieves the details of the data that needs to be backed up.

  4. A workload discovery file is created on the backup host. The workload discovery file contains the details of the data that needs to be backed up from the different DataNodes.

  5. The backup host uses the workload discovery file to decide how the workload is distributed among the backup hosts. A workload distribution file is created for each backup host.

  6. An individual child job is executed for each backup host, and each child job backs up the data specified in that host's workload distribution file.

  7. Data blocks are streamed simultaneously from different DataNodes to multiple backup hosts.
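Steps 3 through 5 can be illustrated with a small Python sketch that takes the entries of a discovery result and splits them into one workload distribution file per backup host. The guide does not describe how NetBackup actually balances the workload or what its files contain, so the `DiscoveredItem` fields, the JSON file format, the host names, and the largest-first greedy balancing used here are all assumptions.

```python
import heapq
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class DiscoveredItem:
    """One entry of a hypothetical workload discovery file."""
    path: str
    size_bytes: int
    datanode: str


def distribute_workload(
    items: Sequence[DiscoveredItem], backup_hosts: Sequence[str]
) -> Dict[str, List[DiscoveredItem]]:
    """Greedy largest-first assignment: each item goes to the least-loaded host."""
    if not backup_hosts or len(set(backup_hosts)) != len(backup_hosts):
        raise ValueError("backup_hosts must be non-empty and contain no duplicates")
    # Min-heap of (assigned_bytes, host_index); the index breaks ties deterministically.
    heap: List[Tuple[int, int]] = [(0, i) for i in range(len(backup_hosts))]
    heapq.heapify(heap)
    plan: Dict[str, List[DiscoveredItem]] = {host: [] for host in backup_hosts}
    for item in sorted(items, key=lambda it: it.size_bytes, reverse=True):
        load, idx = heapq.heappop(heap)
        plan[backup_hosts[idx]].append(item)
        heapq.heappush(heap, (load + item.size_bytes, idx))
    return plan


def write_distribution_files(
    plan: Dict[str, List[DiscoveredItem]], out_dir: str
) -> List[Path]:
    """Write one JSON 'workload distribution file' per backup host."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for host, assigned in plan.items():
        target = out / f"{host}.json"
        target.write_text(json.dumps([asdict(it) for it in assigned], indent=2))
        written.append(target)
    return written


if __name__ == "__main__":
    discovered = [
        DiscoveredItem("/data/a", 300, "dn1"),
        DiscoveredItem("/data/b", 200, "dn2"),
        DiscoveredItem("/data/c", 100, "dn1"),
        DiscoveredItem("/data/d", 100, "dn3"),
    ]
    plan = distribute_workload(discovered, ["bh1", "bh2"])
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_distribution_files(plan, tmp):
            print(path.name, path.read_text())
```

With the four example items and two hosts, `bh1` receives `/data/a` and `/data/d` (400 bytes) and `bh2` receives `/data/b` and `/data/c` (300 bytes). Largest-first greedy assignment is a simple, common balancing heuristic; a real implementation might also weigh DataNode locality, which this sketch records but ignores.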

The compound backup job does not complete until all the child jobs have completed. After the child jobs complete, NetBackup cleans up all the snapshots from the NameNode. The compound backup job completes only after this cleanup is finished.
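The ordering of the compound job (all child jobs finish, then snapshot cleanup, then the compound job completes) can be sketched in Python with a thread pool. The child-job and cleanup callbacks, the host names, and the return value are hypothetical stand-ins rather than NetBackup interfaces, and the threads only model the parallel child jobs, not the actual streaming of blocks from DataNodes.

```python
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List


def run_compound_job(
    distribution: Dict[str, List[str]],
    run_child: Callable[[str, List[str]], None],
    cleanup_snapshots: Callable[[], None],
) -> List[str]:
    """Run one child job per backup host in parallel, then clean up snapshots.

    Returns the sorted list of backup hosts whose child job raised an exception;
    an empty list means every child job succeeded.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(distribution))) as pool:
        futures: Dict[str, Future] = {
            host: pool.submit(run_child, host, paths)
            for host, paths in distribution.items()
        }
        # Block until every child job has finished, successfully or not.
        wait(list(futures.values()))
    failed = sorted(host for host, fut in futures.items() if fut.exception() is not None)
    # Snapshot cleanup runs only after all child jobs are done; running it even
    # when some child jobs failed is an assumption of this sketch.
    cleanup_snapshots()
    return failed


if __name__ == "__main__":
    def child(host: str, paths: List[str]) -> None:
        print(f"{host}: backing up {paths}")

    failed = run_compound_job(
        {"bh1": ["/data/a", "/data/d"], "bh2": ["/data/b", "/data/c"]},
        child,
        lambda: print("cleaning up NameNode snapshots"),
    )
    print("compound job completed" if not failed else f"child jobs failed: {failed}")
```

Waiting on every future before calling the cleanup callback is what enforces the documented ordering; the return value only reports which hypothetical child jobs failed.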

See About backing up a NetBackup for Hadoop cluster.
