Job remains in queue for long time
Job remains in queue for a long time with the 'Cloud scale media server is not available' reason as follows:

awaiting resource abc-stu. Waiting for resources. Reason: Cloud scale media server is not available., Media server: media1-media-0, Robot Type(Number): NONE(N/A), Media ID: N/A, Drive Name: N/A, Volume Pool: N/A, Storage Unit: default_stu_xyz.abc.com, Drive Scan Host: N/A, Disk Pool: default_dp_stu_xyz.abc.com, Disk Volume: PureDiskVolume

The above issue occurs due to one of the following reasons while creating the STU:
While selecting the media server, the Manually select option is selected and a specific elastic media server or the primary server is selected explicitly.
While selecting the media server, the Allow NetBackup to automatically select option is selected and the primary server is the only media server listed in the media server list.
Workaround:
To resolve the issue, perform the following:
If the Manually select option is selected for the media server, edit the respective storage unit and change the option to Allow NetBackup to automatically select.
If a non-default storage server is used and, while creating the STU, the Allow NetBackup to automatically select option is selected and the primary server is listed as the only media server in the media server list, edit the respective storage server, add an external or elastic media server to the media server list, and remove the primary server.
Job remains in queue for a long time with the "Media server is currently not connected to master server" or "Disk media server is not active" reason due to one of the following:
At least one elastic media server is 'Offline'.
The primary server is not present in the media server list of the default storage server when the value of minimumReplicas is set to 0.
awaiting resource default_stu_abc.com. Waiting for resources. Reason: Media server is currently not connected to master server, Media server: media1-media-0, Robot Type(Number): NONE(N/A), Media ID: N/A, Drive Name: N/A, Volume Pool: NetBackup, Storage Unit: default_stu_abc.com, Drive Scan Host: N/A, Disk Pool: default_dp_nbux-abc.com, Disk Volume: PureDiskVolume
awaiting resource default_stu_abc.com. Waiting for resources. Reason: Disk media server is not active, Media server: media1-media-0, Robot Type(Number): NONE(N/A), Media ID: N/A, Drive Name: N/A, Volume Pool: NetBackup, Storage Unit: default_stu_abc.com, Drive Scan Host: N/A, Disk Pool: default_dp_abc.com, Disk Volume: PureDiskVolume
Workaround: Perform the respective workaround depending on the reason for the issue:
Issue: At least one elastic media server is 'Offline'
Workaround: Perform one of the following options.

Option 1:
Change the value of replicas to a value greater than its current value and wait for the new media server pod to be in Running state.
Then the value of replicas can be changed back to its original value.
Use the following command to update the value of replicas in the mediaServers section:
helm upgrade cloudscale VRTSk8s-netbackup-11.0.1-0118/helm/cloudscale-11.0.1-118.tgz -n nbux --reuse-values \
  --set environment.mediaServers[0].name=media1 \
  --set environment.mediaServers[0].nodeSelector.labelKey=agentpool \
  --set environment.mediaServers[0].nodeSelector.labelValue=mediapool \
  --set environment.mediaServers[0].replicas=2 \
  --set environment.mediaServers[0].minimumReplicas=2 \
  --set environment.mediaServers[0].storage.data.capacity=60Gi \
  --set environment.mediaServers[0].storage.data.storageClassName=gp2-ebs-storage-class \
  --set environment.mediaServers[0].tag=11.0.1-0118 \
  --set environment.mediaServers[0].storage.log.capacity=10Gi \
  --set environment.mediaServers[0].storage.log.storageClassName=gp2-ebs-storage-class
Save the changes.
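The temporary replica bump in Option 1 can be scripted. The sketch below only constructs the two --set flags to append to the 'helm upgrade ... --reuse-values' command shown above; the starting value of 2 is an example taken from that command, not a requirement:

```shell
#!/bin/sh
# Sketch: raise replicas above its current value, then restore it later.
# The parameter path matches the helm command above; values are examples.
ORIG_REPLICAS=2                      # current value of replicas
TMP_REPLICAS=$((ORIG_REPLICAS + 1))  # any value greater than the current one

# Flag for the temporary bump (append to 'helm upgrade ... --reuse-values'):
BUMP="--set environment.mediaServers[0].replicas=${TMP_REPLICAS}"
# Flag to restore the original value once the new pod is Running:
RESTORE="--set environment.mediaServers[0].replicas=${ORIG_REPLICAS}"

echo "$BUMP"
echo "$RESTORE"
```

Run the upgrade with the bump flag first, wait for the new media server pod to reach Running state, then run it again with the restore flag.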
Option 2:
If the media server pod is not running, change the 'Offline' media server state to 'Deactivated' as follows:
Run the PATCH /config/media-servers/{hostName} API to change the 'machineState' of the Offline media server to 'Deactivated'.
Or, run the following command to exec into the primary server pod:
kubectl exec -it -n <namespace> <primaryServer-pod-name> -- bash
Execute the following command for Offline media server:
nbemmcmd -updatehost -machinename <offline-media-server-hostname> -machinestateop set_admin_pause -machinetype media -masterserver <primary-server-hostname>
Note:
Scaled-in media servers are marked as 'Offline' by media server elasticity.
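The API option above can be sketched with curl. The primary server name, media server name, token, API version, and the exact request-body schema below are placeholders and assumptions; confirm the payload against the NetBackup API reference for your release. The sketch prints the command instead of executing it, so it is safe to run anywhere:

```shell
#!/bin/sh
# Sketch of the PATCH /config/media-servers/{hostName} call (first option).
PRIMARY="primary.example.com"          # hypothetical primary server host
HOST="media1-media-0.example.com"      # the Offline media server
BODY='{"machineState": "DEACTIVATED"}' # assumed payload shape and value

# Printed rather than executed; replace $TOKEN with a real API key.
echo curl -X PATCH \
  "https://${PRIMARY}:1556/netbackup/config/media-servers/${HOST}" \
  -H "Authorization: Bearer \$TOKEN" \
  -H "Content-Type: application/vnd.netbackup+json;version=4.0" \
  -d "$BODY"
```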
The primary server is not present in the media server list of the default storage server when the value of minimumReplicas is set to 0, and the following error appears in the netbackup-operator pod logs:
Error in registering additional media servers in storage server. Please add manually.
Run the following command to obtain the netbackup-operator-pod logs:
kubectl logs <netbackup-operator-pod-name> -c netbackup-operator -n <netbackup-operator-namespace>
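To confirm this specific failure rather than scanning the full log, the output can be filtered for the error text quoted above. The sketch pipes a sample line standing in for real operator output so it runs anywhere; against a live cluster, the same grep would follow the kubectl logs command above:

```shell
#!/bin/sh
# Filter for the registration error quoted above.
PATTERN="Error in registering additional media servers"

# Sample line standing in for real operator log output:
SAMPLE="... Error in registering additional media servers in storage server. Please add manually."
echo "$SAMPLE" | grep -F "$PATTERN"

# Against a live cluster (placeholders as in the command above):
# kubectl logs <netbackup-operator-pod-name> -c netbackup-operator \
#   -n <netbackup-operator-namespace> | grep -F "$PATTERN"
```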
To resolve the issue, perform one of the following:
Set the value of minimumReplicas to a value greater than 0 and wait for at least one media server pod to be in ready state.
After the media server pod goes into running state, the value of minimumReplicas can be set back to 0.
Use the following command to update the value of minimumReplicas in the mediaServers section:
helm upgrade cloudscale VRTSk8s-netbackup-11.0.1-0118/helm/cloudscale-11.0.1-118.tgz -n nbux --reuse-values \
  --set environment.mediaServers[0].name=media1 \
  --set environment.mediaServers[0].nodeSelector.labelKey=agentpool \
  --set environment.mediaServers[0].nodeSelector.labelValue=mediapool \
  --set environment.mediaServers[0].replicas=2 \
  --set environment.mediaServers[0].minimumReplicas=2 \
  --set environment.mediaServers[0].storage.data.capacity=60Gi \
  --set environment.mediaServers[0].storage.data.storageClassName=gp2-ebs-storage-class \
  --set environment.mediaServers[0].tag=11.0.1-0118 \
  --set environment.mediaServers[0].storage.log.capacity=10Gi \
  --set environment.mediaServers[0].storage.log.storageClassName=gp2-ebs-storage-class
Save the changes.
Note:
When the value of minimumReplicas is greater than 0, the primary server is not present in the media servers list of the default storage server.