Taint, Toleration, and Node affinity related issues in cpServer

The cpServer control pool pod is in pending state

If any of the following cpServer control pool pods is in pending state, perform the steps that follow:

flexsnap-agent, flexsnap-api-gateway, flexsnap-certauth, flexsnap-coordinator, flexsnap-idm, flexsnap-nginx, flexsnap-notification, flexsnap-scheduler, flexsnap-listener, flexsnap-postgresql, flexsnap-rabbitmq, flexsnap-fluentd-, flexsnap-fluentd

Obtain the pending pod's toleration and affinity status using the following command:
kubectl get pods <pod name> -n <namespace> -o yaml
Check whether the node affinity and tolerations of the pod match:
- the fields listed in cpServer.nodepool.controlpool or primary.nodeselector in the cloudscale-values.yaml file.
- the taint and label of the node pool mentioned in cpServer.nodeselector.controlpool or primary.nodeselector in the cloudscale-values.yaml file.
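The pod-side half of this comparison can be sketched as a small shell helper. This is a minimal sketch, not part of the product: the function name show_pod_constraints is hypothetical, and it assumes kubectl is already configured for the cluster. It prints only the scheduling-related fields of the pod spec so that they can be compared with the values file.

```shell
# Hypothetical helper: print only the scheduling constraints of a pod.
# $1 = pod name, $2 = namespace the pod runs in
show_pod_constraints() {
  for field in tolerations affinity.nodeAffinity nodeSelector; do
    echo "== .spec.$field =="
    # jsonpath extracts a single field from the pod specification
    kubectl get pod "$1" -n "$2" -o jsonpath="{.spec.$field}"
    echo
  done
}
```

For example, show_pod_constraints <pod name> <namespace> prints the three fields one after another; an empty value means the field is not set on the pod.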

If all of the above fields are correct and matching but the control pool pod is still in pending state, all the nodes in the node pool may be running at maximum capacity and unable to accommodate new pods. In that case, scale the node pool so that it has enough capacity for the pending pods.
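One way to confirm a capacity problem before scaling is to read the scheduler's events for the pod; messages that mention insufficient CPU or memory point to capacity rather than to a taint or label mismatch. The following is a minimal sketch under stated assumptions: both function names are hypothetical, the cluster is assumed to be AKS (as in the example later in this topic), and the node count is a value you choose.

```shell
# Hypothetical helper: list the FailedScheduling events recorded for a pod.
# $1 = pod name, $2 = namespace
show_scheduling_failures() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1,reason=FailedScheduling"
}

# Hypothetical helper: manually scale an AKS node pool (AKS is an assumption).
# $1 = resource group, $2 = cluster name, $3 = node pool name, $4 = node count
scale_aks_nodepool() {
  az aks nodepool scale \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --node-count "$4"
}
```

If the node pool is managed by the cluster autoscaler, adjusting the autoscaler limits is likely the more appropriate route than a manual scale.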

The cpServer data pool pod is in pending state

If any of the following cpServer data pool pods is in pending state, perform the steps that follow:

flexsnap-workflow, flexsnap-datamover

Obtain the pending pod's toleration and affinity status using the following command:
kubectl get pods <pod name> -n <namespace> -o yaml
Check whether the node affinity and tolerations of the pod match:
- the fields listed in cpServer.nodepool.datapool in the environment.yaml file.
- the taint and label of the node pool mentioned in cpServer.nodeselector.datapool in the cloudscale-values.yaml file.
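The node-side half of the comparison can be checked by listing the nodes that carry the expected label together with their taints. This is a sketch only: the function name nodes_in_pool is hypothetical, and the label key and value are whatever your values file specifies for the node pool.

```shell
# Hypothetical helper: list nodes that carry a given label, with their taints.
# $1 = label key, $2 = label value
nodes_in_pool() {
  kubectl get nodes -l "$1=$2" \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'
}
```

If the command returns no nodes, no node carries the label that the pod's node affinity expects. If nodes are listed but their taints are not covered by the pod's tolerations, the pod also stays in pending state.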

The Snapshot Manager operator (flexsnap-operator) pod is in pending state

Obtain the pending pod's toleration and affinity status using the following command:
kubectl get pods <pod name> -n <namespace> -o yaml
Check whether the node affinity and tolerations of the pod match:
- the fields listed in the operators-values.yaml file.
- the taint and label of the node pool mentioned in those fields.

Nodes configured with incorrect taint and label

If the nodes are configured with incorrect taint or label values, you can correct them. The following commands show how to do this on AKS, as an example:

az aks nodepool update \
  --resource-group <resource_group> \
  --cluster-name <cluster_name> \
  --name <nodepool_name> \
  --node-taints <key>=<value>:<effect> \
  --no-wait

az aks nodepool update \
  --resource-group <resource_group> \
  --cluster-name <cluster_name> \
  --name <nodepool_name> \
  --labels <key>=<value>
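After an update, the taints and labels that AKS reports for the node pool can be read back to confirm the change. This is a minimal sketch: the function name is hypothetical, and the query assumes that the node pool output exposes the nodeTaints and nodeLabels properties.

```shell
# Hypothetical helper: show the taints and labels configured on an AKS node pool.
# $1 = resource group, $2 = cluster name, $3 = node pool name
show_aks_nodepool_scheduling() {
  az aks nodepool show \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --query '{taints: nodeTaints, labels: nodeLabels}' \
    --output json
}
```

Because the taint update above is submitted with --no-wait, the new values may take some time to appear.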
