Primary server corrupted

When catalog backup is taken on external media server
When catalog backup is taken on MSDP-X

Copy the DR package files from the primary pod (located at /mnt/nbdata/DRPackages/, or wherever the DR packages are stored on the pod) to the host machine from which the Kubernetes Service cluster is accessed.
Run the kubectl cp <primary-pod-namespace>/<primary-pod-name>:/mnt/nbdata/DRPackages <Path_where_to_copy_on_host_machine> command.
Preserve the DRPackages as they will be required during Catalog recovery.
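
Before anything is deleted, it may be worth confirming that the copy actually produced DR package files on the host. The following is a minimal sketch, assuming the packages carry the .drpkg extension referenced in the nbhostidentity step of this procedure. count_drpkg is a hypothetical helper name and is not part of NetBackup.

```shell
# count_drpkg DIR
# Prints how many *.drpkg files sit directly under DIR (hypothetical helper).
count_drpkg() {
  find "$1" -maxdepth 1 -type f -name '*.drpkg' | wc -l | tr -d ' '
}

# Example (the path stands in for <Path_where_to_copy_on_host_machine>):
#   [ "$(count_drpkg ./DRPackages)" -gt 0 ] || echo "no DR packages copied" >&2
```

If kubectl cp placed the files in a subdirectory, point the helper at that subdirectory instead.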

Change CR spec from paused: false to paused: true in the primary, mediaServers, and msdpScaleouts sections of the environment object using the following command:

helm upgrade cloudscale <path to cloudscale.tar.gz>  --reuse-values  \
--set environment.primary.paused=true \
--set environment.msdpscaleout.paused=true \
--set environment.mediaServers[0].name=<original value> --set environment.mediaServers[0].paused=true \
--set environment.mediaServers[0].nodeSelector.labelKey=<original value> --set environment.mediaServers[0].nodeSelector.labelValue=<original value> \
--set environment.mediaServers[0].replicas=<original value> --set environment.mediaServers[0].minimumReplicas=<original value> \
--set environment.mediaServers[0].storage.data.capacity=<original value>Gi \
--set environment.mediaServers[0].storage.data.storageClassName=<original sc name> \
--set environment.mediaServers[0].storage.log.capacity=<original value>Gi \
--set environment.mediaServers[0].storage.log.storageClassName=<original sc name> \
-n netbackup
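
The <original value> placeholders above have to match what the release currently uses. One way to capture those values before the upgrade is helm get values. The sketch below wraps it in a hypothetical save_release_values helper and assumes the release name cloudscale and the namespace netbackup used in this procedure.

```shell
# save_release_values RELEASE NAMESPACE OUTFILE
# Writes the user-supplied values of a Helm release to OUTFILE (hypothetical helper).
save_release_values() {
  helm get values "$1" -n "$2" -o yaml > "$3"
}

# Example (the output file name is an arbitrary choice):
#   save_release_values cloudscale netbackup ./cloudscale-values-before-dr.yaml
#   grep -n -A 20 'mediaServers' ./cloudscale-values-before-dr.yaml
```

Keeping this file alongside the DR packages also gives a reference for the later steps that unpause the media servers with the same values.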

Delete all statefulSets, deployments, and any existing jobs associated with the primary decoupled services.

kubectl get sts -n netbackup
NAME                   READY   AGE
flexsnap-rabbitmq      1/1     5h24m
msdpx-uss-controller   1/1     5h32m
nb-postgresql          1/1     5h53m
nbu-bpdbm              0/1     5h49m
nbu-log-viewer         0/1     5h49m
nbu-nbatd              1/1     5h49m
nbu-nbmqbroker         0/1     5h49m
nbu-nbwsapp            0/1     5h49m
nbu-policyjob          0/1     5h49m
nbu-policyjobmgr       0/1     5h49m
nbu-primary            0/1     5h49m

kubectl get deployment -n netbackup
NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
flexsnap-agent                                    1/1     1            1           5h23m
flexsnap-agent-2580d680823a4cfaa54c9fe9fdff8c84   1/1     1            1           3h2m
flexsnap-api-gateway                              1/1     1            1           5h23m
flexsnap-certauth                                 1/1     1            1           5h26m
flexsnap-coordinator                              1/1     1            1           5h23m
flexsnap-listener                                 1/1     1            1           5h23m
flexsnap-nginx                                    1/1     1            1           5h23m
flexsnap-notification                             1/1     1            1           5h23m
flexsnap-scheduler                                1/1     1            1           5h23m
nb-fluentbit-collector                            1/1     1            1           5h54m
nbu-requestrouter                                 1/1     1            1           5h46m

kubectl delete sts nbu-log-viewer nbu-bpdbm nbu-nbatd nbu-nbmqbroker nbu-nbwsapp nbu-policyjob nbu-policyjobmgr nbu-primary -n netbackup
statefulset.apps/nbu-bpdbm deleted
statefulset.apps/nbu-log-viewer deleted
statefulset.apps/nbu-nbatd deleted
statefulset.apps/nbu-nbmqbroker deleted
statefulset.apps/nbu-nbwsapp deleted
statefulset.apps/nbu-policyjob deleted
statefulset.apps/nbu-policyjobmgr deleted
statefulset.apps/nbu-primary deleted

kubectl delete deployment nbu-requestrouter -n netbackup
deployment.apps/nbu-requestrouter deleted

kubectl delete job nbu-bootstrapper -n netbackup
job.batch "nbu-bootstrapper" deleted from netbackup namespace
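
To double-check that no primary StatefulSet survived the deletion, the output of kubectl get sts -o name can be filtered. This is a small sketch, assuming (as in the listing above) that the primary decoupled services all use the nbu- name prefix. primary_sts_left is a hypothetical helper.

```shell
# primary_sts_left
# Reads `kubectl get sts -o name` output on stdin and prints every StatefulSet
# whose name starts with "nbu-". Prints nothing when none are left.
primary_sts_left() {
  grep -E '^statefulset\.apps/nbu-' || true
}

# Example:
#   left=$(kubectl get sts -n netbackup -o name | primary_sts_left)
#   [ -z "$left" ] || echo "still present: $left" >&2
```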

Delete the db-cert configmap. The bundle re-creates it automatically:

~/VRTSk8s-netbackup-<version>/helm$ kubectl get bundle -n netbackup
NAME      CONFIGMAP TARGET   SECRET TARGET   SYNCED   REASON   AGE
db-cert   dbcertpem                          True     Synced   3h54m

~/VRTSk8s-netbackup-<version>/helm$ kubectl get cm -n netbackup
NAME                             DATA   AGE
appconf                          1      3h30m
cs-config                        1      4h12m
db-cert                          1      3h54m
svcmapping                       1      3h51m
~/VRTSk8s-netbackup-<version>/helm$ kubectl delete cm db-cert -n netbackup
configmap "db-cert" deleted from netbackup namespace
~/VRTSk8s-netbackup-<version>/helm$ kubectl get cm -n netbackup
NAME                             DATA   AGE
appconf                          1      3h30m
cs-config                        1      4h13m
db-cert                          1      7s
flexsnap-conf                    1      3h30m
flexsnap-deployment-conf-cp      24     3h30m

Remove the PVCs and PVs associated with the primary server decoupled pods.

~/Ampol$ kubectl delete pvc catalog-nbu-primary-0 data-nbu-primary-0 logs-nbu-primary-0 nbatd-data nbatd-logs nbmqbroker-log policy ws-log bpdbm-logs -n netbackup
persistentvolumeclaim "bpdbm-log" deleted from netbackup namespace
persistentvolumeclaim "catalog-nbu-primary-0" deleted from netbackup namespace
persistentvolumeclaim "data-nbu-primary-0" deleted from netbackup namespace
persistentvolumeclaim "logs-nbu-primary-0" deleted from netbackup namespace
persistentvolumeclaim "nbatd-data" deleted from netbackup namespace
persistentvolumeclaim "nbatd-logs" deleted from netbackup namespace
persistentvolumeclaim "nbmqbroker-log" deleted from netbackup namespace
persistentvolumeclaim "policyjob-log-nbu-policyjob-0" deleted from netbackup namespace
persistentvolumeclaim "policyjobmgr-log-nbu-policyjobmgr-0" deleted from netbackup namespace
persistentvolumeclaim "ws-log" deleted from netbackup namespace
 
 
~/Ampol/DR$ kubectl get pv -n netbackup | grep Released
pvc-0bd4e64b-e089-493a-8ff1-e2eeafa7a5e9   30Gi       RWO            Retain           Released   netbackup/logs-nbu-primary-0                                                nb-
pvc-1a887c36-b860-424a-b35a-fc21313ef532   30Gi       RWO            Retain           Released   netbackup/policyjob-log-nbu-policyjob-0                                     nb-
pvc-1dd6720d-6cce-47f8-950c-5067a7dc384a   30Gi       RWO            Retain           Released   netbackup/ws-log                                                            nb-
pvc-2be78c3f-df8d-4724-b72d-dbb60a44452a   30Gi       RWO            Retain           Released   netbackup/nbmqbroker-log                                                    nb-
pvc-733438f4-da39-4031-a164-8a9706534ecd   110Gi      RWX            Retain           Released   netbackup/catalog-nbu-primary-0                                             nb-
pvc-9d421d55-e136-4de7-aa48-3026e931ccb4   30Gi       RWO            Retain           Released   netbackup/data-nbu-primary-0                                                nb-
pvc-c9e78f63-4ab3-4515-b41b-8711ef61ec10   2Gi        RWO            Retain           Released   netbackup/nbatd-data                                                        nb-
pvc-cfe6ec5c-8367-42c7-b3a9-d99fc7087504   30Gi       RWO            Retain           Released   netbackup/policyjobmgr-log-nbu-policyjobmgr-0                               nb-
pvc-ab1e8d27-2da2-4340-aae4-c4bb702db7ce   30Gi       RWO            Retain           Released   netbackup/bpdbm-logs                                                       nb-
 

~/Ampol/DR$ kubectl delete pv pvc-0bd4e64b-e089-493a-8ff1-e2eeafa7a5e9 pvc-1a887c36-b860-424a-b35a-fc21313ef532 pvc-1dd6720d-6cce-47f8 pvc-733438f4-da39-4031-a164-8a9706534ecd pvc-9d421d55-e136-4de7-aa48-3026e931ccb4 pvc-c9e78f63-4ab3-4515-b41b-8711ef61ec10 pvc-cfe6ec5c-8367-42c7-b3a9-d99fc7087504 pvc-fe1e8d2 pvc-ab1e8d27-2da2-4340-aae4-c4bb702db7ce
persistentvolume "pvc-0bd4e64b-e089-493a-8ff1-e2eeafa7a5e9" deleted
persistentvolume "pvc-1a887c36-b860-424a-b35a-fc21313ef532" deleted
persistentvolume "pvc-1dd6720d-6cce-47f8-950c-5067a7dc384a" deleted
persistentvolume "pvc-2be78c3f-df8d-4724-b72d-dbb60a44452a" deleted
persistentvolume "pvc-733438f4-da39-4031-a164-8a9706534ecd" deleted
persistentvolume "pvc-9d421d55-e136-4de7-aa48-3026e931ccb4" deleted
persistentvolume "pvc-c9e78f63-4ab3-4515-b41b-8711ef61ec10" deleted
persistentvolume "pvc-cfe6ec5c-8367-42c7-b3a9-d99fc7087504" deleted
persistentvolume "pvc-fe1e8d27-2da2-4340-aae4-c4bb702db7ce" deleted
persistentvolume "pvc-ab1e8d27-2da2-4340-aae4-c4bb702db7ce" deleted
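
PersistentVolumes are cluster-scoped, so the -n netbackup flag does not restrict kubectl get pv, and grep Released also matches released volumes whose claims belonged to other namespaces. The sketch below narrows the list by the CLAIM column instead. It assumes the default kubectl get pv column order shown above, and released_pvs_for_ns is a hypothetical helper.

```shell
# released_pvs_for_ns NAMESPACE
# Reads `kubectl get pv` table output on stdin and prints the names of PVs whose
# STATUS (column 5) is Released and whose CLAIM (column 6) is in NAMESPACE.
released_pvs_for_ns() {
  awk -v ns="$1" '$5 == "Released" && index($6, ns "/") == 1 { print $1 }'
}

# Example: review the list before deleting anything.
#   kubectl get pv | released_pvs_for_ns netbackup
```

Review the printed names against the PVCs deleted in the previous step before passing them to kubectl delete pv.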

Perform the following:
- Change CR spec from paused: false to paused: true for the environment using the following command:
  helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set environment.paused=true -n netbackup
- Reinstall the latest PostgreSQL release:
  - Save the dbSecret.yaml file to a persistent location using the following command:
    kubectl get secret dbsecret -n netbackup -o yaml > dbSecret.yaml
  - Uninstall PostgreSQL using the following command:
    helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set postgresql.enabled=false -n netbackup
    Wait for the postgres pod to terminate.
  - Recreate the secret from the details saved above:
    kubectl apply -f dbSecret.yaml
  - Reinstall PostgreSQL using the following command:
    helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set postgresql.enabled=true -n netbackup
Retag and push the netbackup/main image with a unique tag (distinct from the tags of other decoupled pods, such as MR1), then set bootstrapper.main: <new-image-tag> in the serviceImageTag section under the primary environment using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set environment.primary.serviceImageTag."bootstrapper\.main"="<new-image-tag>" -n netbackup
Change CR spec paused: true to paused: false in the primary server and environment sections in the environment using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set environment.primary.paused=false --set environment.paused=false -n netbackup
After the primary server pod is in ready state, change CR spec from paused: false to paused: true in primary server section in environment object using the following command:
helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set environment.primary.paused=true -n netbackup
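
The readiness check that precedes the re-pause above could be scripted with kubectl wait rather than polled by hand. This is a sketch only: the pod name nbu-primary-0 in the example is an assumption derived from the nbu-primary StatefulSet name, the 30m default timeout is arbitrary, and wait_pod_ready is a hypothetical helper.

```shell
# wait_pod_ready POD NAMESPACE [TIMEOUT]
# Blocks until the pod reports the Ready condition or the timeout expires.
# TIMEOUT defaults to 30m, an arbitrary value chosen for this sketch.
wait_pod_ready() {
  kubectl wait --for=condition=Ready "pod/$1" -n "$2" --timeout="${3:-30m}"
}

# Example (pod name is an assumption, see the text above):
#   wait_pod_ready nbu-primary-0 netbackup && echo "primary pod is ready"
```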
Once the primary server is up and running, perform the following:
- Execute the kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash command to exec into the primary pod.
- Increase the debug log level on the primary server (set VERBOSE = 6 in the bp.conf file).
- Create a drpackage directory at a persisted location using the following command:
  mkdir /mnt/nbdata/usr/openv/drpackage
- Deactivate NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health deactivate command.
- Stop the NetBackup services using the following command:
  /usr/openv/netbackup/bin/bp.kill_all
  Check if all processes are terminated correctly using the /usr/openv/netbackup/bin/bpps command.
- Perform the following steps for NBATD pod recovery:
  - Create the DRPackages directory at the persisted location /mnt/nblogs/ in the nbatd pod by executing the following commands:
    kubectl exec -it -n <namespace> <nbatd-pod-name> -- /bin/bash
    mkdir /mnt/nblogs/DRPackages
  - Copy the DR files that were saved during the DR backup to the nbatd pod at /mnt/nblogs/DRPackages using the following command:
    kubectl cp <Path_of_DRPackages_on_host_machine> <nbatd-pod-namespace>/<nbatd-pod-name>:/mnt/nblogs/DRPackages
  - Execute the following steps in the nbatd pod:
    Execute the kubectl exec -it -n <namespace> <nbatd-pod-name> -- /bin/bash command.
    Deactivate the nbatd health probes using the /opt/veritas/vxapp-manage/nbatd_health.sh disable command.
    Stop the nbatd service using the /opt/veritas/vxapp-manage/nbatd_stop.sh 0 command.
    Execute the /opt/veritas/vxapp-manage/nbatd_identity_restore.sh -infile /mnt/nblogs/DRPackages/<DR package name> command.
- Copy the previously saved disaster recovery files back to the primary pod at the /mnt/nbdata/usr/openv/drpackage location using the following command:
  kubectl cp <Path_of_DRPackages_on_host_machine> <primary-pod-namespace>/<primary-pod-name>:/mnt/nbdata/usr/openv/drpackage
- Change the ownership of the files in /mnt/nbdata/usr/openv/drpackage using the chown nbsvcusr:nbsvcusr <file-name> command.
- Execute the /usr/openv/netbackup/bin/admincmd/nbhostidentity -import -infile /mnt/nbdata/usr/openv/drpackage/.drpkg command.
- Clear the NetBackup host cache by running the bpclntcmd -clear_host_cache command.
- Restart the pods as follows:
  - Navigate to the VRTSk8s-netbackup-<version>/scripts folder.
  - Run the cloudscale_restart.sh script with the restart action as follows:
    ./cloudscale_restart.sh <action> <namespace>
    Provide the namespace and the required action:
    stop: Stops all the services under primary server (waits until all the services are stopped).
    start: Starts all the services and waits until the services are up and running under primary server.
    restart: Stops the services and waits until all the services are down. Then starts all the services and waits until the services are up and running.
- Refresh the certificate revocation list using the /usr/openv/netbackup/bin/nbcertcmd -getcrl command.
(Applicable for catalog backup taken on external media server)
- Add respective media server entry in host properties using NetBackup Administration Console as follows:
  Navigate to NetBackup Management > Host properties > Master Server > Add Additional server and add media server.
- Restart the NetBackup services in the primary server pod and on the external media server:
  - Execute the kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash command in the primary server pod.
  - Run the /usr/openv/netbackup/bin/bp.kill_all command. After all services have stopped, restart them using the /usr/openv/netbackup/bin/bp.start_all command.
  - External media server: Run the /usr/openv/netbackup/bin/bp.kill_all command on the external media server. After all services have stopped, restart them using the /usr/openv/netbackup/bin/bp.start_all command.
- Scale out the MSDP node pool and unpause the msdpScaleout using the following command:
  helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set environment.msdpscaleout.paused=false -n netbackup
  Ensure that the msdpScaleout CR is in a ready state before moving ahead.
- Scale out the media node pool and unpause the media object using the following command:
```
helm upgrade cloudscale <path to cloudscale.tar.gz>  --reuse-values \
--set environment.mediaServers[0].name=<original name> --set environment.mediaServers[0].paused=false \
--set environment.mediaServers[0].nodeSelector.labelKey=<original value> --set environment.mediaServers[0].nodeSelector.labelValue=<original value> \
--set environment.mediaServers[0].replicas=<original value> --set environment.mediaServers[0].minimumReplicas=<original value> \
--set environment.mediaServers[0].storage.data.capacity=<original value>Gi \
--set environment.mediaServers[0].storage.data.storageClassName=<original sc name> \
--set environment.mediaServers[0].storage.log.capacity=<original value>Gi \
--set environment.mediaServers[0].storage.log.storageClassName=<original sc name> \
-n netbackup
```
- Perform catalog recovery:
  - Exec into the primary pod and run the bprecover -wizard command.
  - Reset the debug log level on the primary server (remove VERBOSE = 6 from the bp.conf file).
  - Restart the pods as follows:
    Navigate to the VRTSk8s-netbackup-<version>/scripts folder.
    Run the cloudscale_restart.sh script with the restart action as follows:
    ./cloudscale_restart.sh <action> <namespace>
    Provide the namespace and the required action.
(Applicable for catalog backup taken on MSDP-X)
- From Web UI, allow reissue of token from primary server for MSDP only as follows:
  Navigate to Security > Host Mappings for the MSDP storage server and select Allow Auto reissue Certificate.
- Run the primary server reconciler as follows:
  - Run the following command to pause the primary server CR:
    helm upgrade cloudscale cloudscale-<version>.tgz -n netbackup --reuse-values --set environment.primary.paused=true
  - Run the following command to un-pause the primary server CR:
    helm upgrade cloudscale cloudscale-<version>.tgz -n netbackup --reuse-values --set environment.primary.paused=false
- Scale out the MSDP node pool and unpause the msdpScaleout using the following command:
  helm upgrade cloudscale <path to cloudscale.tar.gz> --reuse-values --set environment.msdpscaleout.paused=false -n netbackup
  Wait for the msdpScaleout CR to be ready before moving ahead.
- Scale out the media node pool and unpause the media object using the following command:
```
helm upgrade cloudscale <path to cloudscale.tar.gz>  --reuse-values \
--set environment.mediaServers[0].name=<original name> --set environment.mediaServers[0].paused=false \
--set environment.mediaServers[0].nodeSelector.labelKey=<original value> --set environment.mediaServers[0].nodeSelector.labelValue=<original value> \
--set environment.mediaServers[0].replicas=<original value> --set environment.mediaServers[0].minimumReplicas=<original value> \
--set environment.mediaServers[0].storage.data.capacity=<original value>Gi \
--set environment.mediaServers[0].storage.data.storageClassName=<original sc name> \
--set environment.mediaServers[0].storage.log.capacity=<original value>Gi \
--set environment.mediaServers[0].storage.log.storageClassName=<original sc name> \
-n netbackup
```
  Ensure that the media pods are up and running before proceeding further.
- Verify that the MSDP installation is successful and that the default MSDP storage server, STU, and disk pool are created with the old names. This can take some time.
  If the storage server and disk pool do not appear in the Web UI after waiting for some time, you can manually create the storage server (STS) using the following commands:
  Create the STS: nbdevconfig -creatests -storage_server <server name> -stype <server type> -media_server <media server>
  Add the STS credentials (using tpconfig): tpconfig -add -storage_server <server name> -stype <server type> -sts_user_id <user ID> -password <password>
  After completing these steps:
  - Refresh the Web UI and verify that the storage server is listed.
  - Create a disk pool on the storage server using the same name as before.
- Perform the steps from step 2 onward in the following section:
  “Scenario 2: MSDP Scaleout and its data is lost and the NetBackup primary server was destroyed and is re-installed”
- Perform catalog recovery:
  - Exec into the primary pod and run the bprecover -wizard command.
  - Reset the debug log level on the primary server (remove VERBOSE = 6 from the bp.conf file).
  - Restart the pods as follows:
    Navigate to the VRTSk8s-netbackup-<version>/scripts folder.
    Run the cloudscale_restart.sh script with the restart action as follows:
    ./cloudscale_restart.sh <action> <namespace>
    Provide the namespace and the required action.
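
The VERBOSE changes in the steps above (set VERBOSE = 6 before recovery, remove it afterwards) can be made repeatable with two small functions run inside the primary pod. This is a sketch under assumptions: it treats bp.conf as a plain KEY = VALUE text file, takes the file path as an argument (typically /usr/openv/netbackup/bp.conf), and the helper names set_verbose and clear_verbose are hypothetical.

```shell
# set_verbose FILE LEVEL
# Replaces any existing VERBOSE entry in FILE with "VERBOSE = LEVEL",
# or appends the entry when none exists.
set_verbose() {
  tmp=$(mktemp) || return 1
  { grep -v '^VERBOSE[[:space:]]*=' "$1" || true; } > "$tmp"
  printf 'VERBOSE = %s\n' "$2" >> "$tmp"
  # Write back with cat so the original file keeps its owner and mode.
  cat "$tmp" > "$1" && rm -f "$tmp"
}

# clear_verbose FILE
# Removes every VERBOSE entry from FILE and leaves all other lines untouched.
clear_verbose() {
  tmp=$(mktemp) || return 1
  { grep -v '^VERBOSE[[:space:]]*=' "$1" || true; } > "$tmp"
  cat "$tmp" > "$1" && rm -f "$tmp"
}

# Example:
#   set_verbose /usr/openv/netbackup/bp.conf 6     # before recovery
#   clear_verbose /usr/openv/netbackup/bp.conf     # after bprecover completes
```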
