Use this procedure when Kubernetes pods are scheduled on an NCN but never reach the Running state.
In this situation, the kubectl get pod command shows pods that appear to be stuck in the Init or ContainerCreating state.
- Run the kubectl get pod -o wide command to identify the node where the pods are not starting.

      kubectl get pod -A -o wide | egrep 'Init|ContainerCreating'

  Example output:

      services   cray-sls-58cfdb7c46-b7dbj   0/2   Init:0/2            0   2d22h   10.39.0.165   ncn-w002   <none>   <none>
      services   gitea-vcs-65c98746b-jk5v7   0/2   ContainerCreating   0   2d3h    10.47.0.104   ncn-w002   <none>   <none>
  In this example, ncn-w002 is the node that may need attention.
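When many pods are stuck, it can help to summarize which nodes they are on. The following is a minimal sketch, not part of the documented tooling: stuck_pod_nodes is a hypothetical helper name, and it assumes the default column layout of kubectl get pod -A -o wide (NAMESPACE, NAME, READY, STATUS, ... NODE, NOMINATED NODE, READINESS GATES). It applies the same Init|ContainerCreating pattern as the egrep filter, but only to the STATUS column.

```shell
#!/usr/bin/env bash
# stuck_pod_nodes (hypothetical helper): read `kubectl get pod -A -o wide`
# output on stdin and print "<node> <count>" for each node that hosts pods
# whose STATUS matches Init or ContainerCreating.
#
# Column assumptions (default -A -o wide layout):
#   $4        = STATUS; NAMESPACE, NAME and READY are always single tokens.
#               The header line has "STATUS" here, so it never matches.
#   $(NF - 2) = NODE, counted from the end because some kubectl versions
#               print RESTARTS as e.g. "2 (5m ago)", which adds extra fields.
stuck_pod_nodes() {
  awk '$4 ~ /Init|ContainerCreating/ { print $(NF - 2) }' |
    sort |
    uniq -c |
    awk '{ print $2, $1 }'
}
```

Usage: kubectl get pod -A -o wide | stuck_pod_nodes. For the example output above, this would print ncn-w002 2. Counting from the end of the line for NODE is a deliberate choice so that restart annotations in the RESTARTS column do not shift the node name.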
Log in to the node identified in the previous step and run the following steps on it.
- Restart the kubelet service.

      systemctl restart kubelet
- Ensure that kubelet is running.

      systemctl status kubelet

  The output should report Active: active (running).
- Restart the containerd service.

      systemctl restart containerd
- Ensure that containerd is running.

      systemctl status containerd

  The output should report Active: active (running).
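The four substeps above can be combined into one function, sketched below under stated assumptions: restart_node_services is a hypothetical name, it is meant to be run as root on the affected NCN, and it uses systemctl is-active --quiet (which exits 0 only when the unit is active) as a scriptable stand-in for reading the systemctl status output by eye. It keeps the order given in this procedure: kubelet first, then containerd.

```shell
#!/usr/bin/env bash
# restart_node_services (hypothetical helper): restart kubelet, then
# containerd, and stop at the first unit that fails to restart or is not
# active afterwards. Assumes it runs as root on the affected NCN.
restart_node_services() {
  local unit
  for unit in kubelet containerd; do
    echo "Restarting ${unit}..."
    if ! systemctl restart "${unit}"; then
      echo "ERROR: failed to restart ${unit}" >&2
      return 1
    fi
    # is-active --quiet prints nothing and exits 0 only if the unit is active.
    if ! systemctl is-active --quiet "${unit}"; then
      echo "ERROR: ${unit} is not active after restart" >&2
      # Show the full status to help diagnose why the unit is not running.
      systemctl status "${unit}" --no-pager >&2
      return 1
    fi
    echo "${unit} is active."
  done
}
```

Usage on the node: restart_node_services. Stopping at the first failure avoids restarting containerd while kubelet is already known to be unhealthy; the printed status output is the same information the manual systemctl status check shows.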
Run the kubectl get pod command again. Within a few minutes, the pods should transition to the Running state.
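Instead of rerunning kubectl get pod by hand, kubectl wait can block until specific pods are ready. The sketch below is an illustration, not part of the documented procedure: wait_for_pods_ready and the POD_WAIT_TIMEOUT variable are hypothetical names, and the 300s default is an arbitrary assumption. Note that --for=condition=Ready is slightly stricter than the Running state: it also requires the pod's readiness probes to pass. If a controller replaces a stuck pod with a new one under a different name, waiting on the old name fails, so check the current pod names first.

```shell
#!/usr/bin/env bash
# wait_for_pods_ready (hypothetical helper): NAMESPACE POD [POD...]
# Block until each listed pod reports the Ready condition, or until
# POD_WAIT_TIMEOUT (assumed default 300s) expires for that pod.
# Returns non-zero if any pod did not become Ready in time.
wait_for_pods_ready() {
  if [ "$#" -lt 2 ]; then
    echo "usage: wait_for_pods_ready NAMESPACE POD [POD...]" >&2
    return 2
  fi
  local ns="$1"
  shift
  local timeout="${POD_WAIT_TIMEOUT:-300s}"
  local pod rc=0
  for pod in "$@"; do
    if ! kubectl wait --for=condition=Ready "pod/${pod}" -n "${ns}" --timeout="${timeout}"; then
      echo "WARNING: pod ${ns}/${pod} not Ready within ${timeout}" >&2
      rc=1
    fi
  done
  return "${rc}"
}
```

Usage with the pods from the example output: wait_for_pods_ready services cray-sls-58cfdb7c46-b7dbj gitea-vcs-65c98746b-jk5v7. kubectl wait watches the pod's status rather than polling on a fixed interval, which keeps the script short.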