Primary server corrupted
When catalog backup is taken on external media server
When catalog backup is taken on MSDP-X
- Copy the DRPackages files (packages) located at /mnt/nbdata/DRPackages/ (or wherever the DRPackages are located on the primary pod) from the pod to the host machine from which the Kubernetes Service cluster is accessed. Run the following command:
kubectl cp <primary-pod-namespace>/<primary-pod-name>:/mnt/nbdata/DRPackages <Path_where_to_copy_on_host_machine>
Preserve the DRPackages as they will be required during Catalog recovery.
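Before continuing, it can help to confirm that the copy actually produced DR package files on the host. The following is a minimal sketch and not part of the product tooling: `count_drpkg` is a hypothetical helper, and it assumes that the packages carry the `.drpkg` extension that appears later in this procedure.

```shell
# Hypothetical helper: count DR package files under a local directory.
# Assumption: DR packages use the .drpkg extension.
count_drpkg() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "directory not found: $dir" >&2
        return 1
    fi
    # tr strips the padding that some wc implementations add.
    find "$dir" -type f -name '*.drpkg' | wc -l | tr -d ' '
}

# Example (the path is a placeholder):
#   n=$(count_drpkg /path/to/copied/DRPackages)
#   [ "$n" -gt 0 ] || echo "no DR packages found; repeat the kubectl cp step"
```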
- Change CR spec from paused: false to paused: true in primary, mediaServers, and msdpScaleouts sections in environment object using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values \
    --set environment.primary.paused=true \
    --set environment.msdpscaleout.paused=true \
    --set environment.mediaServers[0].name=<original value> --set environment.mediaServers[0].paused=true \
    --set environment.mediaServers[0].nodeSelector.labelKey=<original value> --set environment.mediaServers[0].nodeSelector.labelValue=<original value> \
    --set environment.mediaServers[0].replicas=<original value> --set environment.mediaServers[0].minimumReplicas=<original value> \
    --set environment.mediaServers[0].storage.data.capacity=<original value>Gi \
    --set environment.mediaServers[0].storage.data.storageClassName=<original sc name> \
    --set environment.mediaServers[0].storage.log.capacity=<original value>Gi \
    --set environment.mediaServers[0].storage.log.storageClassName=<original sc name> \
    -n netbackup
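The `<original value>` placeholders in the command above have to be filled in from the currently deployed release. One way to look them up is to save the release values before changing anything. This sketch assumes the release is named `cloudscale` in the `netbackup` namespace, as in the command above. `save_release_values` is a hypothetical helper name, and the output file name in the example is arbitrary; `helm get values` is a standard Helm command.

```shell
# Hypothetical helper: save the values of a Helm release to a file so that
# the original mediaServers settings can be copied into the --set flags.
save_release_values() {
    release="$1"; namespace="$2"; outfile="$3"
    # --all includes chart defaults in addition to user-supplied values.
    helm get values "$release" -n "$namespace" --all -o yaml > "$outfile"
}

# Example:
#   save_release_values cloudscale netbackup cloudscale-values-before-dr.yaml
#   grep -n -A 20 'mediaServers' cloudscale-values-before-dr.yaml
```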
- Delete all statefulSets, deployments, and any existing jobs associated with the primary decoupled services.
kubectl get sts -n netbackup
NAME                   READY   AGE
flexsnap-rabbitmq      1/1     5h24m
msdpx-uss-controller   1/1     5h32m
nb-postgresql          1/1     5h53m
nbu-bpdbm              0/1     5h49m
nbu-log-viewer         0/1     5h49m
nbu-nbatd              1/1     5h49m
nbu-nbmqbroker         0/1     5h49m
nbu-nbwsapp            0/1     5h49m
nbu-policyjob          0/1     5h49m
nbu-policyjobmgr       0/1     5h49m
nbu-primary            0/1     5h49m

kubectl get deployment -n netbackup
NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
flexsnap-agent                                    1/1     1            1           5h23m
flexsnap-agent-2580d680823a4cfaa54c9fe9fdff8c84   1/1     1            1           3h2m
flexsnap-api-gateway                              1/1     1            1           5h23m
flexsnap-certauth                                 1/1     1            1           5h26m
flexsnap-coordinator                              1/1     1            1           5h23m
flexsnap-listener                                 1/1     1            1           5h23m
flexsnap-nginx                                    1/1     1            1           5h23m
flexsnap-notification                             1/1     1            1           5h23m
flexsnap-scheduler                                1/1     1            1           5h23m
nb-fluentbit-collector                            1/1     1            1           5h54m
nbu-requestrouter                                 1/1     1            1           5h46m

kubectl delete sts nbu-log-viewer nbu-bpdbm nbu-nbatd nbu-nbmqbroker nbu-nbwsapp nbu-policyjob nbu-policyjobmgr nbu-primary -n netbackup
statefulset.apps/nbu-bpdbm deleted
statefulset.apps/nbu-log-viewer deleted
statefulset.apps/nbu-nbatd deleted
statefulset.apps/nbu-nbmqbroker deleted
statefulset.apps/nbu-nbwsapp deleted
statefulset.apps/nbu-policyjob deleted
statefulset.apps/nbu-policyjobmgr deleted
statefulset.apps/nbu-primary deleted

kubectl delete deployment nbu-requestrouter -n netbackup
deployment.apps/nbu-requestrouter deleted

kubectl delete job nbu-bootstrapper -n netbackup
job.batch "nbu-bootstrapper" deleted from netbackup namespace
Delete the db-cert configmap. The bundle re-creates it automatically:
~/VRTSk8s-netbackup-<version>/helm$ kubectl get bundle -n netbackup
NAME      CONFIGMAP TARGET   SECRET TARGET   SYNCED   REASON   AGE
db-cert   dbcertpem                          True     Synced   3h54m

~/VRTSk8s-netbackup-<version>/helm$ kubectl get cm -n netbackup
NAME         DATA   AGE
appconf      1      3h30m
cs-config    1      4h12m
db-cert      1      3h54m
svcmapping   1      3h51m

~/VRTSk8s-netbackup-<version>/helm$ kubectl delete cm db-cert -n netbackup
configmap "db-cert" deleted from netbackup namespace

~/VRTSk8s-netbackup-<version>/helm$ kubectl get cm -n netbackup
NAME                          DATA   AGE
appconf                       1      3h30m
cs-config                     1      4h13m
db-cert                       1      7s
flexsnap-conf                 1      3h30m
flexsnap-deployment-conf-cp   24     3h30m
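In the sample output above, the db-cert configmap reappears a few seconds after it is deleted. If it does not show up immediately, a small polling loop such as the following sketch can confirm that it has been re-created before you proceed. `wait_for_configmap` is a hypothetical helper, and the 5-second polling interval and 60-second default timeout are arbitrary assumptions.

```shell
# Hypothetical helper: poll until a ConfigMap exists or a timeout expires.
wait_for_configmap() {
    namespace="$1"; name="$2"; timeout="${3:-60}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if kubectl get configmap "$name" -n "$namespace" >/dev/null 2>&1; then
            echo "configmap $name is present"
            return 0
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
    echo "configmap $name not found after ${timeout}s" >&2
    return 1
}

# Example:
#   wait_for_configmap netbackup db-cert 120
```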
- Remove the PVs and PVCs associated with the primary server decoupled pods.
~/Ampol$ kubectl delete pvc catalog-nbu-primary-0 data-nbu-primary-0 logs-nbu-primary-0 nbatd-data nbatd-logs nbmqbroker-log policy ws-log bpdbm-logs -n netbackup
persistentvolumeclaim "bpdbm-log" deleted from netbackup namespace
persistentvolumeclaim "catalog-nbu-primary-0" deleted from netbackup namespace
persistentvolumeclaim "data-nbu-primary-0" deleted from netbackup namespace
persistentvolumeclaim "logs-nbu-primary-0" deleted from netbackup namespace
persistentvolumeclaim "nbatd-data" deleted from netbackup namespace
persistentvolumeclaim "nbatd-logs" deleted from netbackup namespace
persistentvolumeclaim "nbmqbroker-log" deleted from netbackup namespace
persistentvolumeclaim "policyjob-log-nbu-policyjob-0" deleted from netbackup namespace
persistentvolumeclaim "policyjobmgr-log-nbu-policyjobmgr-0" deleted from netbackup namespace
persistentvolumeclaim "ws-log" deleted from netbackup namespace

~/Ampol/DR$ kubectl get pv -n netbackup | grep Released
pvc-0bd4e64b-e089-493a-8ff1-e2eeafa7a5e9   30Gi    RWO   Retain   Released   netbackup/logs-nbu-primary-0                    nb-
pvc-1a887c36-b860-424a-b35a-fc21313ef532   30Gi    RWO   Retain   Released   netbackup/policyjob-log-nbu-policyjob-0         nb-
pvc-1dd6720d-6cce-47f8-950c-5067a7dc384a   30Gi    RWO   Retain   Released   netbackup/ws-log                                nb-
pvc-2be78c3f-df8d-4724-b72d-dbb60a44452a   30Gi    RWO   Retain   Released   netbackup/nbmqbroker-log                        nb-
pvc-733438f4-da39-4031-a164-8a9706534ecd   110Gi   RWX   Retain   Released   netbackup/catalog-nbu-primary-0                 nb-
pvc-9d421d55-e136-4de7-aa48-3026e931ccb4   30Gi    RWO   Retain   Released   netbackup/data-nbu-primary-0                    nb-
pvc-c9e78f63-4ab3-4515-b41b-8711ef61ec10   2Gi     RWO   Retain   Released   netbackup/nbatd-data                            nb-
pvc-cfe6ec5c-8367-42c7-b3a9-d99fc7087504   30Gi    RWO   Retain   Released   netbackup/policyjobmgr-log-nbu-policyjobmgr-0   nb-
pvc-ab1e8d27-2da2-4340-aae4-c4bb702db7ce   30Gi    RWO   Retain   Released   netbackup/bpdbm-logs                            nb-

~/Ampol/DR$ kubectl delete pv pvc-0bd4e64b-e089-493a-8ff1-e2eeafa7a5e9 pvc-1a887c36-b860-424a-b35a-fc21313ef532 pvc-1dd6720d-6cce-47f8 pvc-733438f4-da39-4031-a164-8a9706534ecd pvc-9d421d55-e136-4de7-aa48-3026e931ccb4 pvc-c9e78f63-4ab3-4515-b41b-8711ef61ec10 pvc-cfe6ec5c-8367-42c7-b3a9-d99fc7087504 pvc-fe1e8d2 pvc-ab1e8d27-2da2-4340-aae4-c4bb702db7ce
persistentvolume "pvc-0bd4e64b-e089-493a-8ff1-e2eeafa7a5e9" deleted
persistentvolume "pvc-1a887c36-b860-424a-b35a-fc21313ef532" deleted
persistentvolume "pvc-1dd6720d-6cce-47f8-950c-5067a7dc384a" deleted
persistentvolume "pvc-2be78c3f-df8d-4724-b72d-dbb60a44452a" deleted
persistentvolume "pvc-733438f4-da39-4031-a164-8a9706534ecd" deleted
persistentvolume "pvc-9d421d55-e136-4de7-aa48-3026e931ccb4" deleted
persistentvolume "pvc-c9e78f63-4ab3-4515-b41b-8711ef61ec10" deleted
persistentvolume "pvc-cfe6ec5c-8367-42c7-b3a9-d99fc7087504" deleted
persistentvolume "pvc-fe1e8d27-2da2-4340-aae4-c4bb702db7ce" deleted
persistentvolume "pvc-ab1e8d27-2da2-4340-aae4-c4bb702db7ce" deleted
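Typing the generated PV names by hand is error-prone. As a rough sketch, the names of the Released volumes can instead be extracted from the `kubectl get pv` output. `released_pvs_for_namespace` is a hypothetical helper. It assumes the default `kubectl get pv --no-headers` column order (name, capacity, access modes, reclaim policy, status, claim), which is the order shown in the listing above.

```shell
# Hypothetical helper: read `kubectl get pv --no-headers` output on stdin and
# print the names of Released volumes whose claim was in the given namespace.
# Assumed columns: NAME CAPACITY ACCESS-MODES RECLAIM-POLICY STATUS CLAIM ...
released_pvs_for_namespace() {
    awk -v ns="$1" '$5 == "Released" && index($6, ns "/") == 1 { print $1 }'
}

# Example: review the list before passing the names to `kubectl delete pv`.
#   kubectl get pv --no-headers | released_pvs_for_namespace netbackup
```

Only delete volumes from this list after checking that every claim belongs to the primary server decoupled pods.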
- Perform the following:
Change CR spec from paused: false to paused: true for the environment using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set environment.paused=true -n netbackup
Reinstall the latest PostgreSQL release:
Save the dbSecret.yaml file to a persistent location using the following command:
kubectl get secret dbsecret -n netbackup -o yaml > dbSecret.yaml
Uninstall PostgreSQL using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set postgresql.enabled=false -n netbackup
Wait for the postgres pod to terminate.
Recreate the secret from the details saved above:
kubectl apply -f dbSecret.yaml
Reinstall PostgreSQL using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set postgresql.enabled=true -n netbackup
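The wait between uninstalling and reinstalling PostgreSQL can be scripted instead of watched manually. The sketch below uses `kubectl wait --for=delete`. `wait_for_pod_deleted` is a hypothetical helper, and the pod name `nb-postgresql-0` in the example is an assumption derived from the `nb-postgresql` StatefulSet shown earlier; check the actual name with `kubectl get pods -n netbackup`.

```shell
# Hypothetical helper: return once the named pod no longer exists.
wait_for_pod_deleted() {
    namespace="$1"; pod="$2"; timeout="${3:-300}"
    # Nothing to wait for if the pod is already gone.
    if ! kubectl get pod "$pod" -n "$namespace" >/dev/null 2>&1; then
        echo "pod $pod already terminated"
        return 0
    fi
    kubectl wait --for=delete "pod/$pod" -n "$namespace" --timeout="${timeout}s"
}

# Example (the pod name is an assumption):
#   wait_for_pod_deleted netbackup nb-postgresql-0 300
```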
- Retag and push the netbackup/main image with a unique tag (distinct from other decoupled pods, such as MR1), and add the serviceImage section under the primary environment with bootstrapper.main: <new-image-tag> using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values \
    --set environment.primary.serviceImageTag."bootstrapper\.main"="<newImage tag>" -n netbackup
Change CR spec from paused: true to paused: false in the primary server and environment sections of the environment object using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values \
    --set environment.primary.paused=false --set environment.paused=false -n netbackup
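The next step requires the primary server pod to be in a ready state. Rather than polling `kubectl get pods` by hand, the readiness check can be sketched with `kubectl wait`. `wait_for_pod_ready` is a hypothetical helper, and the default timeout is an arbitrary assumption that may need to be raised for a large environment.

```shell
# Hypothetical helper: block until a pod reports the Ready condition.
wait_for_pod_ready() {
    namespace="$1"; pod="$2"; timeout="${3:-900}"
    kubectl wait --for=condition=Ready "pod/$pod" -n "$namespace" --timeout="${timeout}s"
}

# Example (the pod name is a placeholder):
#   wait_for_pod_ready netbackup <primary-pod-name> 1800
```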
- After the primary server pod is in a ready state, change CR spec from paused: false to paused: true in the primary server section of the environment object using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values \
    --set environment.primary.paused=true -n netbackup
- Once the primary server is up and running, perform the following:
Execute the kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash command to exec into the primary pod.
Increase the debug logs level on the primary server (set VERBOSE = 6 in the bp.conf file).
Create a DRPackages directory at a persisted location using the following command:
mkdir /mnt/nbdata/usr/openv/drpackage
Deactivate NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health deactivate command.
Stop the NetBackup services using the following command:
/usr/openv/netbackup/bin/bp.kill_all
Check if all processes are terminated correctly using the /usr/openv/netbackup/bin/bpps command.
Perform the following steps for NBATD pod recovery:
Create the DRPackages directory on the persisted location /mnt/nblogs/ in the nbatd pod by executing the following commands:
kubectl exec -it -n <namespace> <nbatd-pod-name> -- /bin/bash
mkdir /mnt/nblogs/DRPackages
Copy the DR files that were saved when performing the DR backup to the nbatd pod at /mnt/nblogs/DRPackages using the following command:
kubectl cp <Path_of_DRPackages_on_host_machine> <nbatd-pod-namespace>/<nbatd-pod-name>:/mnt/nblogs/DRPackages
Execute the following steps in the nbatd pod:
Execute the kubectl exec -it -n <namespace> <nbatd-pod-name> -- /bin/bash command.
Deactivate nbatd health probes using the /opt/veritas/vxapp-manage/nbatd_health.sh disable command.
Stop the nbatd service using the /opt/veritas/vxapp-manage/nbatd_stop.sh 0 command.
Execute the /opt/veritas/vxapp-manage/nbatd_identity_restore.sh -infile /mnt/nblogs/DRPackages/<DR package name> command.
Copy the previously saved disaster recovery files back to the primary pod at the /mnt/nbdata/usr/openv/drpackage location using the following command:
kubectl cp <Path_of_DRPackages_on_host_machine> <primary-pod-namespace>/<primary-pod-name>:/mnt/nbdata/usr/openv/drpackage
Change the ownership of files in /mnt/nbdb/usr/openv/drpackage using the chown nbsvcusr:nbsvcusr <file-name> command.
Execute the /usr/openv/netbackup/bin/admincmd/nbhostidentity -import -infile /mnt/nbdb/usr/openv/drpackage/.drpkg command.
Clear the NetBackup host cache by running the bpclntcmd -clear_host_cache command.
Restart the pods as follows:
Navigate to the VRTSk8s-netbackup-<version>/scripts folder.
Run the cloudscale_restart.sh script with the Restart option as follows:
./cloudscale_restart.sh <action> <namespace>
Provide the namespace and the required action:
stop: Stops all the services under primary server (waits until all the services are stopped).
start: Starts all the services and waits until the services are up and running under primary server.
restart: Stops the services and waits until all the services are down. Then starts all the services and waits until the services are up and running.
Refresh the certificate revocation list using the /usr/openv/netbackup/bin/nbcertcmd -getcrl command.
- (Applicable for catalog backup taken on external media server)
Add respective media server entry in host properties using NetBackup Administration Console as follows:
Navigate to NetBackup Management > Host properties > Master Server > Add Additional server and add media server.
Restart the NetBackup services in the primary server pod and on the external media server:
Execute the kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash command in the primary server pod.
Run the /usr/openv/netbackup/bin/bp.kill_all command. After stopping all services restart the same using the /usr/openv/netbackup/bin/bp.start_all command.
External media server: Run the /usr/openv/netbackup/bin/bp.kill_all command. After stopping all services restart the services using the /usr/openv/netbackup/bin/bp.start_all command on the external media server.
Scale out the msdp nodepool and unpause the msdpScaleout using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set environment.msdpscaleout.paused=false -n netbackup
Ensure that the msdpScaleout CR is in a ready state before moving ahead.
Scale out the media nodepool and unpause the media object using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values \
    --set environment.mediaServers[0].name=<original name> --set environment.mediaServers[0].paused=false \
    --set environment.mediaServers[0].nodeSelector.labelKey=<original name> --set environment.mediaServers[0].nodeSelector.labelValue=<original value> \
    --set environment.mediaServers[0].replicas=<original value> --set environment.mediaServers[0].minimumReplicas=<original value> \
    --set environment.mediaServers[0].storage.data.capacity=<original value>Gi \
    --set environment.mediaServers[0].storage.data.storageClassName=<original sc name> \
    --set environment.mediaServers[0].storage.log.capacity=<original value>Gi \
    --set environment.mediaServers[0].storage.log.storageClassName=<original sc name> \
    -n netbackup
Perform catalog recovery:
Exec into the primary pod and run the bprecover -wizard command.
Reset the debug logs level on the primary server (remove VERBOSE = 6 from the bp.conf file).
Restart the pods as follows:
Navigate to the VRTSk8s-netbackup-<version>/scripts folder.
Run the cloudscale_restart.sh script with the Restart option as follows:
./cloudscale_restart.sh <action> <namespace>
Provide the namespace and the required action.
- (Applicable for catalog backup taken on MSDP-X)
From Web UI, allow reissue of token from primary server for MSDP only as follows:
Navigate to Security > Host Mappings for the MSDP storage server and select Allow Auto reissue Certificate.
Run the primary server reconciler as follows:
Run the following command to pause the primary server CR:
helm upgrade cloudscale cloudscale-<version>.tgz -n netbackup --reuse-values --set environment.primary.paused=true
Run the following command to un-pause the primary server CR:
helm upgrade cloudscale cloudscale-<version>.tgz -n netbackup --reuse-values --set environment.primary.paused=false
Scale out the msdp nodepool and unpause the msdpScaleout using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set environment.msdpscaleout.paused=false -n netbackup
Wait for the msdpScaleout CR to be ready before moving ahead.
Scale out the media nodepool and unpause the media object using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values \
    --set environment.mediaServers[0].name=<original name> --set environment.mediaServers[0].paused=false \
    --set environment.mediaServers[0].nodeSelector.labelKey=<original name> --set environment.mediaServers[0].nodeSelector.labelValue=<original value> \
    --set environment.mediaServers[0].replicas=<original value> --set environment.mediaServers[0].minimumReplicas=<original value> \
    --set environment.mediaServers[0].storage.data.capacity=<original value>Gi \
    --set environment.mediaServers[0].storage.data.storageClassName=<original sc name> \
    --set environment.mediaServers[0].storage.log.capacity=<original value>Gi \
    --set environment.mediaServers[0].storage.log.storageClassName=<original sc name> \
    -n netbackup
Ensure that the media pods are up and running before proceeding further.
Verify that the MSDP installation is successful and that the default MSDP storage server, STU, and disk pool are created with the old names. This can take some time.
If the storage server and disk pool are not created in the Web UI after waiting for some time, you can manually create the Storage Server (STS) using the following commands:
Create STS: nbdevconfig -creatests -storage_server <server name> -stype <server type> -media_server <media server>
Add STS and alias (using tpconfig): tpconfig -add -storage_server <server name> -stype <server type> -sts_user_id <user ID> -password <password>
After completing these steps:
Refresh the Web UI and verify that the storage server is listed.
Create a disk pool on the storage server using the same name as before.
Perform from step 2 in the following section:
Perform catalog recovery:
Exec into the primary pod and run the bprecover -wizard command.
Reset the debug logs level on the primary server (remove VERBOSE = 6 from the bp.conf file).
Restart the pods as follows:
Navigate to the VRTSk8s-netbackup-<version>/scripts folder.
Run the cloudscale_restart.sh script with the Restart option as follows:
./cloudscale_restart.sh <action> <namespace>
Provide the namespace and the required action.