Job remains in queue for long time
Job remains in queue for a long time with the 'Cloud scale media server is not available' reason as follows:

awaiting resource abc-stu. Waiting for resources. Reason: Cloud scale media server is not available., Media server: media1-media-0, Robot Type(Number): NONE(N/A), Media ID: N/A, Drive Name: N/A, Volume Pool: N/A, Storage Unit: default_stu_xyz.abc.com, Drive Scan Host: N/A, Disk Pool: default_dp_stu_xyz.abc.com, Disk Volume: PureDiskVolume

The above issue occurs due to one of the following reasons while creating the STU:
While selecting the media server, the Manually select option is selected and a specific elastic media server or the primary server is selected explicitly.
While selecting the media server, the Allow NetBackup to automatically select option is selected and the primary server is the only media server listed in the media server list.
Workaround:
To resolve the issue, perform the following:
If the Manually select option is selected for the media server, edit the respective storage unit and change the option to Allow NetBackup to automatically select.
If a non-default storage server is used and, while creating the STU, the Allow NetBackup to automatically select option is selected and the primary server is listed as the only media server in the media server list, edit the respective storage server, add an external or elastic media server to the media server list, and remove the primary server.
Job remains in queue for a long time with the "Media server is currently not connected to master server" or "Disk media server is not active" reason due to one of the following:
At least one elastic media server is 'Offline'.
The primary server is not present in the media server list of the default storage server when the value of minimumReplicas is set to 0.
awaiting resource default_stu_abc.com. Waiting for resources. Reason: Media server is currently not connected to master server, Media server: media1-media-0, Robot Type(Number): NONE(N/A), Media ID: N/A, Drive Name: N/A, Volume Pool: NetBackup, Storage Unit: default_stu_abc.com, Drive Scan Host: N/A, Disk Pool: default_dp_nbux-abc.com, Disk Volume: PureDiskVolume
awaiting resource default_stu_abc.com. Waiting for resources. Reason: Disk media server is not active, Media server: media1-media-0, Robot Type(Number): NONE(N/A), Media ID: N/A, Drive Name: N/A, Volume Pool: NetBackup, Storage Unit: default_stu_abc.com, Drive Scan Host: N/A, Disk Pool: default_dp_abc.com, Disk Volume: PureDiskVolume
Workaround: Perform the respective workaround depending on the reason for the issue:
Issue: At least one elastic media server is 'Offline'
Workaround: Perform one of the following options.

Option 1:
Change the value of replicas to a value greater than its current value and wait for the new media server pod to be in Running state.
Then the value of replicas can be changed back to its original value.
Use the following command to update the value of replicas in the mediaServers section:
helm upgrade cloudscale VRTSk8s-netbackup-11.0.1-0118/helm/cloudscale-11.0.1-118.tgz -n nbux --reuse-values \
  --set environment.mediaServers[0].name=media1 \
  --set environment.mediaServers[0].nodeSelector.labelKey=agentpool \
  --set environment.mediaServers[0].nodeSelector.labelValue=mediapool \
  --set environment.mediaServers[0].replicas=2 \
  --set environment.mediaServers[0].minimumReplicas=2 \
  --set environment.mediaServers[0].storage.data.capacity=60Gi \
  --set environment.mediaServers[0].storage.data.storageClassName=gp2-ebs-storage-class \
  --set environment.mediaServers[0].tag=11.0.1-0118 \
  --set environment.mediaServers[0].storage.log.capacity=10Gi \
  --set environment.mediaServers[0].storage.log.storageClassName=gp2-ebs-storage-class
Save the changes.
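The temporary replica bump in Option 1 can be scripted. The sketch below only constructs the two --set flags to append to the 'helm upgrade ... --reuse-values' command shown above; the starting value of 2 is an example taken from that command, not a requirement:

```shell
#!/bin/sh
# Sketch: raise replicas above its current value, then restore it later.
# The parameter path matches the helm command above; values are examples.
ORIG_REPLICAS=2                      # current value of replicas
TMP_REPLICAS=$((ORIG_REPLICAS + 1))  # any value greater than the current one

# Flag for the temporary bump (append to 'helm upgrade ... --reuse-values'):
BUMP="--set environment.mediaServers[0].replicas=${TMP_REPLICAS}"
# Flag to restore the original value once the new pod is Running:
RESTORE="--set environment.mediaServers[0].replicas=${ORIG_REPLICAS}"

echo "$BUMP"
echo "$RESTORE"
```

Run the upgrade with the bump flag first, wait for the new media server pod to reach Running state, then run it again with the restore flag.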
Option 2:
If the media server pod is not running, change the 'Offline' media server state to 'Deactivated' as follows:
Run the PATCH /config/media-servers/{hostName} API to change the 'machineState' of the Offline media server to 'Deactivated'.
Or, run the following command to exec into the primary server pod:
kubectl exec -it -n <namespace> <primaryServer-pod-name> -- bash
Execute the following command for Offline media server:
nbemmcmd -updatehost -machinename <offline-media-server-hostname> -machinestateop set_admin_pause -machinetype media -masterserver <primary-server-hostname>
Note:
Scaled-in media servers are marked as 'Offline' by media server elasticity.
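The API option above can be sketched with curl. The primary server name, media server name, token, API version, and the exact request-body schema below are placeholders and assumptions; confirm the payload against the NetBackup API reference for your release. The sketch prints the command instead of executing it, so it is safe to run anywhere:

```shell
#!/bin/sh
# Sketch of the PATCH /config/media-servers/{hostName} call (first option).
PRIMARY="primary.example.com"          # hypothetical primary server host
HOST="media1-media-0.example.com"      # the Offline media server
BODY='{"machineState": "DEACTIVATED"}' # assumed payload shape and value

# Printed rather than executed; replace $TOKEN with a real API key.
echo curl -X PATCH \
  "https://${PRIMARY}:1556/netbackup/config/media-servers/${HOST}" \
  -H "Authorization: Bearer \$TOKEN" \
  -H "Content-Type: application/vnd.netbackup+json;version=4.0" \
  -d "$BODY"
```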
The primary server is not present in the media server list of the default storage server when the value of minimumReplicas is set to 0, and the following error appears in the netbackup-operator pod logs:
Error in registering additional media servers in storage server. Please add manually.
Run the following command to obtain the netbackup-operator-pod logs:
kubectl logs <netbackup-operator-pod-name> -c netbackup-operator -n <netbackup-operator-namespace>
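To confirm this specific failure rather than scanning the full log, the output can be filtered for the error text quoted above. The sketch pipes a sample line standing in for real operator output so it runs anywhere; against a live cluster, the same grep would follow the kubectl logs command above:

```shell
#!/bin/sh
# Filter for the registration error quoted above.
PATTERN="Error in registering additional media servers"

# Sample line standing in for real operator log output:
SAMPLE="... Error in registering additional media servers in storage server. Please add manually."
echo "$SAMPLE" | grep -F "$PATTERN"

# Against a live cluster (placeholders as in the command above):
# kubectl logs <netbackup-operator-pod-name> -c netbackup-operator \
#   -n <netbackup-operator-namespace> | grep -F "$PATTERN"
```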
To resolve the issue, perform one of the following:
Set the value of minimumReplicas to a value greater than 0 and wait for at least one media server pod to be in ready state.
After the media server pod goes into running state, the value of minimumReplicas can be set back to 0.
Use the following command to update the value of minimumReplicas in the mediaServers section:
helm upgrade cloudscale VRTSk8s-netbackup-11.0.1-0118/helm/cloudscale-11.0.1-118.tgz -n nbux --reuse-values \
  --set environment.mediaServers[0].name=media1 \
  --set environment.mediaServers[0].nodeSelector.labelKey=agentpool \
  --set environment.mediaServers[0].nodeSelector.labelValue=mediapool \
  --set environment.mediaServers[0].replicas=2 \
  --set environment.mediaServers[0].minimumReplicas=2 \
  --set environment.mediaServers[0].storage.data.capacity=60Gi \
  --set environment.mediaServers[0].storage.data.storageClassName=gp2-ebs-storage-class \
  --set environment.mediaServers[0].tag=11.0.1-0118 \
  --set environment.mediaServers[0].storage.log.capacity=10Gi \
  --set environment.mediaServers[0].storage.log.storageClassName=gp2-ebs-storage-class
Save the changes.
Note:
When the value of minimumReplicas is greater than 0, the primary server is not present in the media servers list of the default storage server.