Sort kubelet pods by their creation time #113041
Conversation
Force-pushed from b7fc175 to 1bdd40e
Do we want a test to prevent future regressions?

cc @harche due to evented PLEG work

/triage accepted
There is a corner case when blocking Pod termination via a lifecycle preStop hook, for example by using this StatefulSet:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: ubi
  serviceName: "ubi"
  replicas: 1
  template:
    metadata:
      labels:
        app: ubi
    spec:
      terminationGracePeriodSeconds: 1000
      containers:
        - name: ubi
          image: ubuntu:22.04
          command: ['sh', '-c', 'echo The app is running! && sleep 360000']
          ports:
            - containerPort: 80
              name: web
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - 'echo aaa; trap : TERM INT; sleep infinity & wait'
```

After creating, downscaling, force-deleting, and upscaling the replica like this:

```
> kubectl apply -f sts.yml
> kubectl scale sts web --replicas=0
> kubectl delete pod web-0 --grace-period=0 --force
> kubectl scale sts web --replicas=1
```

we end up with two pods running in the container runtime, while the API reports only one:

```
> kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          92s
```

```
> sudo crictl pods
POD ID          CREATED          STATE   NAME    NAMESPACE   ATTEMPT   RUNTIME
e05bb7dbb7e44   12 minutes ago   Ready   web-0   default     0         (default)
d90088614c73b   12 minutes ago   Ready   web-0   default     0         (default)
```

When now running `kubectl exec -it web-0 -- ps -ef`, there is a random chance that we hit the wrong container, the one reporting the lifecycle command `/bin/sh -c echo aaa; trap : TERM INT; sleep infinity & wait`. This is caused by the container lookup via its name (and no podUID) at:

https://github.com/kubernetes/kubernetes/blob/02109414e8816ceefce775e8b627d67cad6ced85/pkg/kubelet/kubelet_pods.go#L1905-L1914

And more specifically by the conversion of the pod result map to a slice in `GetPods`:

https://github.com/kubernetes/kubernetes/blob/02109414e8816ceefce775e8b627d67cad6ced85/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L407-L411

We now solve that unexpected behavior by tracking the creation time of the pod and sorting the result based on it. This makes the lookup always match the most recently created pod.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Force-pushed from 1bdd40e to b296f82
Sure, I added a unit test to cover that scenario. An e2e test would be more like trial and error, since the iteration order of the map is based on the key hashes.
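For context on why an e2e assertion here would be flaky: Go deliberately randomizes map iteration order, so with two sandboxes sharing the name `web-0` either one can come out first. A tiny standalone illustration (not kubelet code; the sandbox IDs are taken from the reproduction above):

```go
package main

import "fmt"

func main() {
	// Two sandboxes sharing the pod name "web-0", keyed by sandbox ID,
	// mirroring the duplicate-pod state from the reproduction above.
	pods := map[string]string{
		"e05bb7dbb7e44": "web-0",
		"d90088614c73b": "web-0",
	}
	// Go randomizes map iteration order, so across runs this loop
	// may print the two entries in either order.
	for id, name := range pods {
		fmt.Println(id, name)
	}
}
```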
/lgtm IMO, a unit test is enough. @matthyx agree?

/retest

/test pull-kubernetes-e2e-kind-ipv6

Yes

/lgtm

For approval: PTAL @Random-Liu @dchen1107 @derekwaynecarr @yujuhong @sjenning @mrunalp @klueska

/lgtm

I like this change, as there is some predictability in the order! This is a small enough change that I am happy to get it going. Thanks for the test case as well, @saschagrunert. /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dims, saschagrunert. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it:
There is a corner case when blocking Pod termination via a lifecycle preStop hook, for example by using this StatefulSet:
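```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: ubi
  serviceName: "ubi"
  replicas: 1
  template:
    metadata:
      labels:
        app: ubi
    spec:
      terminationGracePeriodSeconds: 1000
      containers:
        - name: ubi
          image: ubuntu:22.04
          command: ['sh', '-c', 'echo The app is running! && sleep 360000']
          ports:
            - containerPort: 80
              name: web
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - 'echo aaa; trap : TERM INT; sleep infinity & wait'
```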
After creating, downscaling, force-deleting, and upscaling the replica like this:
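```
> kubectl apply -f sts.yml
> kubectl scale sts web --replicas=0
> kubectl delete pod web-0 --grace-period=0 --force
> kubectl scale sts web --replicas=1
```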
We end up with two pods running in the container runtime, while the API reports only one:
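```
> kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          92s
```

```
> sudo crictl pods
POD ID          CREATED          STATE   NAME    NAMESPACE   ATTEMPT   RUNTIME
e05bb7dbb7e44   12 minutes ago   Ready   web-0   default     0         (default)
d90088614c73b   12 minutes ago   Ready   web-0   default     0         (default)
```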
When now running `kubectl exec -it web-0 -- ps -ef`, there is a random chance that we hit the wrong container, the one reporting the lifecycle command `/bin/sh -c echo aaa; trap : TERM INT; sleep infinity & wait`.

This is caused by the container lookup via its name (and no podUID) at:
https://github.com/kubernetes/kubernetes/blob/02109414e8816ceefce775e8b627d67cad6ced85/pkg/kubelet/kubelet_pods.go#L1905-L1914
And more specifically by the conversion of the pod result map to a slice in `GetPods`:

https://github.com/kubernetes/kubernetes/blob/02109414e8816ceefce775e8b627d67cad6ced85/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L407-L411
We now solve that unexpected behavior by tracking the creation time of the pod and sorting the result based on it. This makes the lookup always match the most recently created pod.
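A minimal sketch of the idea (illustrative only, not the exact PR diff; the `pod` struct, `createdAt` field, and `sortPodsNewestFirst` helper below are simplified stand-ins for `kubecontainer.Pod` and the real change in `GetPods`):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// pod is a simplified stand-in for kubecontainer.Pod; the createdAt
// field illustrates tracking the sandbox creation time.
type pod struct {
	name      string
	createdAt time.Time
}

// sortPodsNewestFirst orders the slice deterministically, newest first,
// so a lookup by name always hits the most recently created pod instead
// of whichever entry the map iteration happened to yield first.
func sortPodsNewestFirst(pods []*pod) {
	sort.SliceStable(pods, func(i, j int) bool {
		return pods[i].createdAt.After(pods[j].createdAt)
	})
}

func main() {
	pods := []*pod{
		{name: "web-0", createdAt: time.Now().Add(-12 * time.Minute)},
		{name: "web-0", createdAt: time.Now().Add(-2 * time.Minute)},
	}
	sortPodsNewestFirst(pods)
	fmt.Println(pods[0].createdAt) // the newer sandbox comes first
}
```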
Which issue(s) this PR fixes:
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2090782
Special notes for your reviewer:
cc @harche @rphillips
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: