Troubleshoot PaaS workload protection and recovery issues

Backup job fails or falls back to full for a PaaS database

If you manually delete a backup image created by an incremental schedule, the next incremental backup may either fail or run as a full backup. The outcome depends on which image was removed.

For example, consider a backup sequence with images F1 (full), I1, I2, and I3 (incremental).

If the most recent incremental image (I3) is deleted, the next backup automatically switches to a full backup.
If an earlier incremental image (such as I1 or I2) is deleted, future incremental backups fail. The error indicates missing images between the last full backup and the current backup time.

The error message specifies the expected number of images (X) and the actual number found (Y) in the catalog.

Backup fails with error: 3808 Cannot check if the database exists.

You can see the following message in the Activity Monitor:

AuthorizationFailed -Message: The client '<clientId> '<objetId>' does not have authorization to perform action 'Microsoft.Sql/servers/databases/read' over scope '<resoourceId>' or the scope is invalid. If access was recently granted, refresh your credentials.

Explanation: This error occurs when the Snapshot Manager and NetBackup are deployed in AKS, and:

The media server pod node pool is a different node pool from the Snapshot Manager node pool.
Managed Identity is enabled in the Snapshot Manager Virtual Machine Scale set.

Workaround: Do any of the following:

In the media server for backup and restore, enable Managed Identity on the Scale set. Also, assign required permission in the role attached to this managed identity.
Create a storage unit on the MSDP server, and use only those media servers that have the Managed Identity feature enabled on Scale configuration.

Backup fails when the database or the resource group has a read-only lock applied, and is partially successful when the delete lock is applied.

Explanation: This issue occurs if the Read-only lock or Delete lock attribute is applied to the database or the resource group.

Workaround: Before performing any backup or restore, remove any existing Read-only lock and Delete lock attributes from the database or the resource group.

Status Code 150: Termination requested by administrator

Explanation: This appears when you manually cancel a backup or a restore job from the activity monitor and a database is created on the portal during the partial restore operation.

Workaround: Manually clean up the database on the provider portal, and temporary staging location at the universal share mount location under a specific directory created by the database name.

Stale status messages in the Activity monitor

Explanation: If the Snapshot Manager container service restarts abruptly; the provider-protected restore jobs may remain in the active state and you may not see the updated status on the activity monitor details page.

Workaround: Restart the workflow containers using the following command in the Snapshot Manager:

docker restart flexsnap-workflow-system-0-min flexsnap-workflow-general-0-min

After restarting the containers, the restore jobs are updated with the latest status in the activity monitor.

Status Code 233: Premature EOF encountered

Explanation: Appears if the client name used for the backup exceeds the length of 255 characters.

The bpdbm log confirms the same by displaying the following error message:

db_error_add_to_file: Length of client is too long. Got 278, but limit is 255. read_next_image: db_IMAGEreceive() failed: text exceeded allowed length (225)

Note:

This is observed when the primary server is RHEL.

Workaround: Rename the database such that the client name fits within the length of 255 characters.

EMM status: Unsupported media server.

Explanation: This occurs when the media server is running a lower version of NetBackup or is hosted on an unsupported platform.

Workaround: You must have at least one media server with the following configuration:

NetBackup version 11.1 or later.
Hosted on Red Hat Enterprise Linux

You can either add a new media server with the supported version and platform, or upgrade an existing one to meet these requirements.

An unexpected failure occurred

Explanation: Job details include the following error:

An unexpected failure occurred: Data plan execution failed with the message - "One or more errors occurred. (One or more errors occurred. (One or more errors occurred. (Exception of type 'System.OutOfMemoryException' was thrown.")

Workaround: This issue is caused by a limitation in Microsoft's SqlPackage utility.

To resolve it, increase the available memory on the media server to match the database size, and then retry the job.

There is no exact formula to calculate the required memory, but it is recommended to allocate at least twice the memory of the database schema size.

Error: Broken pipe (32), premature end of file encountered EXITING with status 42, network read failed

Or,

Status 174: media manager - system error occurred

Explanation:Occurs during backup if the policy prefix length during protection plan creation is larger than the allowed length. Due to this the file path length of the catalog image exceeds 256 chars and fails with the above error message in the Activity monitor.

The bpdbm log confirms the same by displaying the following error message:

<16> db_error_add_to_file: cannot stat(\\?\C:\Program Files\Cohesity NetBackup\NetBackup\db\images \azure-midb-1afb87487dc04ddc8fafe453dccb7ca3+ nbux-qa-bidi-rg+eastus+az-sql-mi-bidinet01+ testdb_bidinet02\1656000000\tmp\catstore\ BACKUPNOW+141a73e7-cdc4-4371-823a-f170447dba2d_ 1656349831_FULL.f_imgUserGroupNames0): No such file or directory (2) <16> ImageReadFilesFile::get_file_size: cannot stat(\\?\C:\Program Files\Veritas\NetBackup\db \images\azure-midb-1afb87487dc04ddc8fafe453d ccb7ca3+nbux-qa-bidi-rg+eastus+az-sql-mi-bidinet01+testdb_ bidinet02\1656000000\tmp\catstore\BACKUPNOW+141a73e7-cdc4-4371 -823a-f170447dba2d_1656349831_FULL.f_imgUserGroupNames0): No such file or directory (2) <16> ImageReadFilesFile::executeQuery: Cannot copy \\?\C:\Program Files\Veritas\NetBackup\db\images\azure-midb-1afb87487dc04ddc8fafe453dccb7 ca3+nbux-qa-bidi-rg+eastus+az-sql-mi-bidinet01+testdb_bidinet02\1 656000000\tmp\catstore\BACKUPNOW+141a73e7-cdc4-4371-823a-f170447d ba2d_1656349831_FULL.f_imgUserGroupNames0

Note:

This is observed when the primary server is Windows.

Workaround: Use a policy prefix name in the protection plan with length less than 10 characters, so that the total length of the catalog path is less than 256 characters.

Status Code 3801: Cannot complete the requested operation.

Explanation:NetBackup is not able to successfully carry out the requested operation.

Recommended action: Refer to the activity monitor details for the possible reasons for failure.

Status Code 3817: Cannot complete the pre-backup operation

Explanation: The error message seen in dbagentsutil logs as,pg_dump: error: query failed: ERROR: permission denied for table test;pg_dump: error: query was: LOCK TABLE public.test IN ACCESS SHARE MODE;Invoked operation: PRE_BACKUP failed

Occurs when you try to back up a database that has multiple tables with different roles. If tables have at least one different owner, other than the database owner, and it is not a member of the database owner role, then the backup may fail.

Recommended action: You must have a role that has access to all tables inside the database that you want to backup or restore.

For example, say that we wanted to back up the School database which has two tables.

student, owner is postgres
teacher, owner is schooladmin

Create a new role. Say, NBUbackupadmin

Run the following command to create the role:

postgres=> CREATE USER NBUbackupadmin WITH PASSWORD '***********';

CREATE ROLE

To make this new role a member of postgres and schooladmin role, run:

postgres=> GRANT postgres TO NBUbackupadmin;

GRANT ROLE

postgres=> GRANT schooladmin TO NBUbackupadmin;

GRANT ROLE

Note:

You must have a role who is either owner or member of the owner of the table, for all tables inside the database.

Backup fails with Status 40 (Network connection broken)

Explanation: Backups fail due to loss of connectivity to the media server.

Recommended action: You can restart the backup job if the policy has checkpoints enabled. Once the network issue is resolved, select the incomplete backup job in the web UI and click Resume. The job resumes from the point it was stopped. If the checkpoint is not enabled in the policy, the job shows up as a failed job in the web UI.

Backup job fails with the error: "Failed to backup database"

Explanation: The Job details contain additional details: ManagedIdentityCredential authentication unavailable. The requested identity is not assigned to this resource. The allocated media server does not have any Managed Identity attached to it.

Recommended action: When using system or user-managed identity for the PaaS Azure SQL and Managed Instances, apply the same permissions and rules to the media server(s) and the Snapshot Manager. If you use user-managed identity, attach the same user-managed identity to the media server(s) and the Snapshot Manager.

Error code 3842 - The requested backup type for the corresponding PaaS asset is unsupported.

Differential incremental backup is supported for only for Azure SQL server and Azure SQL Managed Instance. When you select an unsupported backup type, this error appears.

Backup job fails with the error: Invalid object name 'cdc.<schema_name>_<table_name>_CT'

Explanation: Backup failed due to an error while executing the backup query script on the Azure SQL Server. The following error message appears:

('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server] [SQL Server]Invalid object name 'cdc.<schema_name>_<table_name>_CT'. (208) (SQLExecDirectW)")

This issue may occur due to a mismatch between user tables and their associated CDC tables. The mismatch typically arises after schema changes or when database objects are moved between schemas after CDC is enabled.

Recommended action:

Do the following:

Disable CDC on the database before running the backup. Use the command:
# EXEC sys.sp_cdc_disable_db
Run a full backup after disabling CDC.

NetBackup automatically re-enables CDC during the next backup after you manually disable it.

Error code 3843 or 3844 - Failed to enable or disable CDC.

Appears when you do not have permissions to enable or disable the CDC.

Workaround: Give NetBackup the necessary permissions to enable or disable the CDC in your Azure environment.

Note:

Do not enable CDC manually. Provide the permissions to NetBackup to enable or disable the CDC.

Error: Client restore EXIT STATUS 5: the restore failed to recover the requested files Cloud policy restore error (2824)

Error: ERR - Failed to restore database [<db_name>] with name [<db_name>]. ERR - Failed to open file ''. Errno = 12: client restore EXIT STATUS 5: the restore failed to recover the requested files

Explanation: Occurs during restore if the backup image is generated on 10.2 media and restore goes to an older (< 10.2) media server.

Workaround: Change the restore media to 10.2 and remove the older media from storage.

AWS DynamoDB table does not have auto scaling enabled, after restoring from a backup image that has auto scaling enabled

Explanation: Currently the AWS API response does not show if a table has auto scaling enabled. So, during backup, this metadata is not captured in NetBackup, and as a result the restored table does not have auto scaling enabled.

Workaround: Enable the auto-scaling property of the restored DynamoDB table in the AWS portal manually.

CDC-enabled Azure SQL MI incremental backups: Dropping a CDC-enabled database leads to full backup without schema changes, instead of incremental.

Explanation: Azure SQL MI maintains CDC-enabled database details in the table cdc_jobs inside the msdb schema. When the database is dropped, its cdc_jobs entry should be deleted. Sometimes this entry does not get deleted from the cdc_jobs table. So, when a new database is created with the same db_id which already exists in the cdc_jobs table, the issue occurs.

Workaround: When you drop a database, check the entry of the dropped database in the cdc_jobs table of the msdb schema. If the entry is present there, delete it manually.

AWS RDS: Error while fetching details of db instance: An error occurred (SignatureDoesNotMatch) when calling the DescribeDBInstances operation: Signature expired.

Explanation: Failure of the RDS boto3 APIs shows this error. NetBackup shows this error for the DescribeDBInstances operation.

Workaround: Synchronize the date and time of the media server with the actual network date and time.

Also, verify if you are using the correct provider credentials.

Importing from a replica on a target NetBackup domain fails, with the status code 191.

Explanation: The import operation on the target domain may fail with the state code 191: No images successfully processed. The job details in the Activity monitor show: Failed to create the JSON payload.

Cause: The image that you are replicating to the target domain is created from a NetBackup 10.4 or older media server, which does not have the required metadata in the NetBackup catalog.

Workaround: Do any of the following:

Use a media server version later than 10.4 to use the AIR feature for the PaaS workloads.
Install EEB on a 10.4 media server to use this as a back-level media server for the AIR feature for PaaS workloads. Contact Cohesity Technical Support for more details.

Troubleshoot PaaS workload protection and recovery issues

Feedback

Feedback