
kubelet: Mark ready condition as false explicitly for terminal pods #110256

Merged
2 commits merged on Jun 9, 2022

Conversation

bobbypage
Member

@bobbypage bobbypage commented May 27, 2022

Terminal pods may continue to report a ready condition of true because
there is a delay in reconciling the ready condition of the containers
from the runtime with the pod status. It should be invalid for kubelet
to report a terminal phase with a true ready condition. To fix the
issue, explicitly override the ready condition to false for terminal
pods during status updates.

Signed-off-by: David Porter david@porter.me

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #108594

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed a 1.22 regression in kubelet that could result in invalid pod status updates being sent to the API server, where pods in a terminal phase could, in some cases, also report a ready condition of true.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 27, 2022
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 27, 2022
@bobbypage
Member Author

/retest

@robscott
Member

/retest

@bobbypage
Member Author

bobbypage commented May 28, 2022

xref: see #110257, where after lowering the pollInterval in the existing regression test for #108594, we were able to detect the invalid pod status update by kubelet of (phase=Failed, Ready=True).

@bobbypage
Member Author

bobbypage commented May 28, 2022

/cc @smarterclayton @rphillips

@aojea
Member

aojea commented May 28, 2022

/cc

@k8s-ci-robot k8s-ci-robot requested a review from aojea May 28, 2022 11:25
@rphillips
Member

Great find!

@rphillips
Member

/kind bug
/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels May 31, 2022
@bobbypage
Member Author

Thanks Tim! I agree with you: ideally, clients should ignore terminal pods and disregard the ready condition in the first place when the pod is terminal. We can't guarantee that all clients will do that, though.

However, at least pre-1.22 (before this regression was introduced), clients could assume that terminal pods would always report ready=false, which is no longer the case. This PR aims to revert to that behavior and ensure kubelet never reports an "invalid" status update of Phase=Terminal, Ready=True.

> That said, I guess we've had this problem FOREVER, right? If you are examining pods and not handling Phase properly, this doesn't make you worse than you were before.

Yup, this doesn't make it any worse for clients ignoring phases but rather brings back the guarantee that terminal pods will always report a ready status of false.

Member

@thockin thockin left a comment


/approve

I'm a big fan of leaving future-me clues.

We should move directories around so that podutil OWNERs don't need API approvers.

@@ -69,6 +71,15 @@ func GenerateContainersReadyCondition(spec *v1.PodSpec, containerStatuses []v1.C
}
}

// If the pod phase is failed, explicitly set the ready condition to false for containers since they may be in progress of terminating.
if podPhase == v1.PodFailed {
Member


Why is this not also checking PodSucceeded ? Should we leave a comment?

Member


And why does this not call generateContainersReadyConditionForTerminalPhase(podPhase) ?

Member Author


> Why is this not also checking PodSucceeded ? Should we leave a comment?

PodSucceeded is already handled above on line 66; see here.

> And why does this not call generateContainersReadyConditionForTerminalPhase(podPhase) ?

I created that helper function to be used in status_manager.go to be able to handle either phase. I returned the condition directly inline here, since that's what's done in the succeeded case above, but I agree with you, we can switch it to use the helper function as well.

Member Author


I updated, based on your feedback and reused generateContainersReadyConditionForTerminalPhase for both succeeded and failed cases, so the condition generation logic is in one place.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobbypage, dchen1107, mrunalp, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 8, 2022
@bobbypage
Member Author

/hold

to update based on feedback

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2022
bobbypage and others added 2 commits June 8, 2022 16:19
Terminal pods may continue to report a ready condition of true because
there is a delay in reconciling the ready condition of the containers
from the runtime with the pod status. It should be invalid for kubelet
to report a terminal phase with a true ready condition. To fix the
issue, explicitly override the ready condition to false for terminal
pods during status updates.

Signed-off-by: David Porter <david@porter.me>
Use a watch to detect invalid pod status updates in graceful node
shutdown node e2e test. By using a watch, all pod updates will be
captured while the previous logic required polling the api-server which
could miss some intermediate updates.

Signed-off-by: David Porter <david@porter.me>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 8, 2022
@bobbypage
Member Author

/unhold

Pushed an additional revision to address the suggestions here

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2022
@thockin
Member

thockin commented Jun 8, 2022

Assuming previous LGTM holds after minor changes:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 8, 2022
@bobbypage
Member Author

/retest

@bobbypage
Member Author

Cherrypicks:

1.24 - #110479
1.23 - #110480
1.22 - #110481

k8s-ci-robot added a commit that referenced this pull request Jun 9, 2022
…10256-upstream-release-1.23

Automated cherry pick of #110256: kubelet: Mark ready condition as false explicitly for terminal pods
k8s-ci-robot added a commit that referenced this pull request Jun 9, 2022
…10256-upstream-release-1.22

Automated cherry pick of #110256: kubelet: Mark ready condition as false explicitly for terminal pods
k8s-ci-robot added a commit that referenced this pull request Jun 10, 2022
…10256-upstream-release-1.24

Automated cherry pick of #110256: kubelet: Mark ready condition as false explicitly for terminal pods
Status: v1.ConditionFalse,
Reason: PodCompleted,
}
return generateContainersReadyConditionForTerminalPhase(podPhase)
Contributor


Going back and looking through this, the len(unknownContainers) == 0 above has me worried.

There are two phases in play - the apiserver phase, and the one the Kubelet tracks. Setting phase to terminal on the apiserver is roughly the same outcome as deleting the pod - the kubelet should inexorably converge the actual pod state to stopped. Likewise if the Kubelet observes a failure or evicts the pod, the phase should inexorably converge. However, internal to the Kubelet status loop the phase should match the actual observed state of the pod containers, and so we should never be in a terminal state here without these containers being in some state we recognize (I think).

This is something I want to have laid out in the pod lifecycle clarification KEP so we can look at all the factors and eliminate my "shoulds" from above - it might be that the correct status when succeeded + > 1 unknown container = pod ready = false here, always. But it's super subtle, and we need this sort of stuff in a clear top level doc, not in comment threads in a PR. I'll add it to the list.

@liggitt liggitt added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Jan 30, 2023
Labels
- approved: Indicates a PR has been approved by an approver from all required OWNERS files.
- area/kubelet
- area/test
- cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
- kind/bug: Categorizes issue or PR as related to a bug.
- kind/regression: Categorizes issue or PR as related to a regression from a prior release.
- lgtm: "Looks good to me", indicates that a PR is ready to be merged.
- priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
- release-note: Denotes a PR that will be considered when it comes time to generate release notes.
- sig/node: Categorizes an issue or PR as relevant to SIG Node.
- sig/testing: Categorizes an issue or PR as relevant to SIG Testing.
- size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
- triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

Kubelet reports Ready=true for pod and containers conditions during kubelet level evictions