MSDP-X and Primary server corrupted
- Note the storage server, cloud LSU and cloud bucket name.
Note the DR Passphrase also.
- If the DR package files (DRPackages) were not received by email, copy them from the primary pod to the local VM using the following command:
kubectl cp <primary-pod-namespace>/<primary-pod-name>:/mnt/nbdb/usr/openv/drpackage_<storageservername> <Path_where_to_copy_on_host_machine>
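As a minimal sketch of this copy step, the following shell helpers wrap the kubectl cp command above and then check that at least one .drpkg file arrived on the host. The function names copy_dr_packages and has_drpkg and the KUBECTL override variable are hypothetical and exist only for illustration; the .drpkg extension is taken from the import step later in this procedure.

```shell
# KUBECTL can be overridden (for example, with a stub) when testing; defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: copy the DR package location from the primary pod to the host.
# Usage: copy_dr_packages <namespace> <primary-pod-name> <storageservername> <destination>
copy_dr_packages() {
  $KUBECTL cp "$1/$2:/mnt/nbdb/usr/openv/drpackage_$3" "$4"
}

# Hypothetical helper: succeed only if the destination holds at least one .drpkg file.
# Usage: has_drpkg <destination>
has_drpkg() {
  find "$1" -type f -name '*.drpkg' | grep -q .
}
```

Running has_drpkg on the destination before deleting the environment gives an early warning if the copy produced nothing usable.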
- Delete the corrupted MSDP and Primary server by running the following command:
kubectl delete -f environment.yaml -n <namespace>
Note:
Perform this step carefully because it deletes NetBackup.
- Clean the PV and PVCs of primary and MSDP server as follows:
Get names of PV attached to primary and MSDP server PVC (catalog, log and data) using the kubectl get pvc -n <namespace> -o wide command.
Delete primary and MSDP server PVC (catalog, log and data) using the kubectl delete pvc <pvc-name> -n <namespace> command.
Delete the PV linked to primary server PVC using the kubectl delete pv <pv-name> command.
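The PVC and PV cleanup above can be sketched as a pair of shell functions. This is an illustration under assumptions, not a supported script: the function names and the KUBECTL override variable are hypothetical, and the PV name is read from the PVC's .spec.volumeName field before the PVC is deleted. The step above only names the PV linked to the primary server PVC, so the sketch is intended for those PVCs.

```shell
# KUBECTL can be overridden (for example, with a stub) when testing; defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Print the name of the PV bound to a PVC (empty if the PVC is unbound).
# Usage: pv_of_pvc <namespace> <pvc-name>
pv_of_pvc() {
  $KUBECTL get pvc "$2" -n "$1" -o jsonpath='{.spec.volumeName}'
}

# Delete a PVC, then the PV that was bound to it.
# Usage: delete_pvc_and_pv <namespace> <pvc-name>
delete_pvc_and_pv() {
  pv="$(pv_of_pvc "$1" "$2")" || return 1
  $KUBECTL delete pvc "$2" -n "$1" || return 1
  if [ -n "$pv" ]; then
    # --ignore-not-found tolerates a PV that the cluster already removed.
    $KUBECTL delete pv "$pv" --ignore-not-found
  fi
}
```

Looking up the PV name first matters because the binding can no longer be read from the PVC once the PVC is gone.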
- (EKS-specific) Navigate to mounted EFS directory and delete the content from primary_catalog folder by running the rm -rf /efs/* command.
- Modify the environment.yaml file: in the MSDP Scaleout and media server sections, change the CR spec from paused: false to paused: true. Save the file.
Note:
Ensure that only primary server is deployed.
Apply the modified environment.yaml file using the following command:
kubectl apply -f environment.yaml -n <namespace>
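One way to wait for the primary server pod after applying the file is kubectl wait, sketched below. The function name wait_for_primary, the KUBECTL override variable, and the 600s default timeout are assumptions for illustration. A pod in the Ready condition does not by itself prove that every NetBackup service inside it has started, so treat this as a first check only.

```shell
# KUBECTL can be overridden (for example, with a stub) when testing; defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: block until the primary pod reports Ready or the timeout expires.
# Usage: wait_for_primary <namespace> <primary-pod-name> [timeout]
wait_for_primary() {
  $KUBECTL wait --for=condition=Ready "pod/$2" -n "$1" --timeout="${3:-600s}"
}
```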
- After the primary server is up and running, perform the following:
Execute the kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash command in the primary server pod.
Increase the debug logs level on primary server.
Create a DRPackages directory at the persisted location using the mkdir /mnt/nblogs/DRPackages command.
- Copy the DR files that you copied earlier to the primary pod at /mnt/nblogs/DRPackages using the kubectl cp <Path_of_DRPackages_on_host_machine> <primary-pod-namespace>/<primary-pod-name>:/mnt/nblogs/DRPackages command.
- Execute the following steps after you exec into the primary server pod:
Change ownership of files in /mnt/nblogs/DRPackages using the chown nbsvcusr:nbsvcusr <file-name> command.
Deactivate NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health deactivate command.
Stop the NetBackup services using the /usr/openv/netbackup/bin/bp.kill_all command.
Execute the /usr/openv/netbackup/bin/admincmd/nbhostidentity -import -infile /mnt/nbdb/usr/openv/drpackage/<filename>.drpkg command.
Clear NetBackup host cache by running the bpclntcmd -clear_host_cache command.
Restart the pods as follows:
Navigate to the VRTSk8s-netbackup-<version>/scripts folder.
Run the cloudscale_restart.sh script as follows:
./cloudscale_restart.sh <action> <namespace>
Provide the namespace and the required action:
stop: Stops all the services under primary server (waits until all the services are stopped).
start: Starts all the services and waits until the services are up and running under primary server.
restart: Stops the services and waits until all the services are down. Then starts all the services and waits until the services are up and running.
Note:
You can ignore the policy job pod if it does not reach the running state. The policy job pod starts once the primary services start.
Refresh the certificate revocation list using the /usr/openv/netbackup/bin/nbcertcmd -getcrl command.
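The in-pod commands above, from the ownership change through the host cache clear, can be sketched as one shell function that runs them in order. The names run and import_host_identity and the DRY_RUN variable are hypothetical. By default the sketch only prints each command; set DRY_RUN=0 inside the primary pod to execute them. The path of the .drpkg file is passed as an argument rather than assumed, and some of these commands may prompt interactively, for example for the DR passphrase noted at the start of the procedure.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: import_host_identity <dr-packages-dir> <path-to-.drpkg-file>
import_host_identity() {
  # Change ownership of every file in the DR packages directory.
  for f in "$1"/*; do
    [ -e "$f" ] || continue
    run chown nbsvcusr:nbsvcusr "$f" || return 1
  done
  # Deactivate health probes before stopping services.
  run /opt/veritas/vxapp-manage/nb-health deactivate || return 1
  run /usr/openv/netbackup/bin/bp.kill_all || return 1
  # Import the host identity from the DR package, then clear the host cache.
  run /usr/openv/netbackup/bin/admincmd/nbhostidentity -import -infile "$2" || return 1
  run bpclntcmd -clear_host_cache
}
```

Stopping at the first failed command keeps a failed import from being followed by a cache clear and restart against a half-restored identity.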
- The SHA fingerprint is updated in the primary CR's status.
Run the primary server reconciler as follows:
Edit the environment (using the kubectl edit environment -n <namespace> command), change the paused field in the primary spec to true, and save it.
To enable the reconciler to run, edit the environment again and set the primary's paused field to false.
- From Web UI, allow reissue of token from primary server for MSDP, media and Snapshot Manager server as follows:
Navigate to Security > Host Mappings for the MSDP storage server and select Allow Auto reissue Certificate.
Repeat this for media and Snapshot Manager server entries.
- Edit the environment using kubectl edit environment -n <namespace> command and change paused field to false for MSDP.
- Perform from step 2 in the following section:
- Edit the environment CR and change the paused field to false for the media server.
- Once the media server pods are ready, perform a full catalog recovery using one of the following options:
Trigger a catalog recovery from the Web UI.
Or
Exec into primary pod and run bprecover -wizard command.
- Once recovery is completed, restart the NetBackup services:
Stop NetBackup services using the /usr/openv/netbackup/bin/bp.kill_all command.
Start NetBackup services using the /usr/openv/netbackup/bin/bp.start_all command.
- Activate NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health activate command.
- Verify, back up, and restore backup images on the NetBackup server to check whether the MSDP-X cluster has recovered.
- Verify that the Primary, Media, MSDP and Snapshot Manager servers are up and running.
- Verify that the Snapshot Manager is running.
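As a rough aid for the final verification steps, the following sketch filters the output of kubectl get pods and lists any pod whose STATUS column is neither Running nor Completed. The function name pods_not_ready is hypothetical, and the sketch assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE) produced with --no-headers. A Running status does not guarantee that all containers are ready, so an empty result is a necessary check, not a sufficient one.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS (third column) is neither Running nor Completed.
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example (replace <namespace> with the NetBackup namespace):
#   kubectl get pods -n <namespace> --no-headers | pods_not_ready
```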