Fix disruption controller permissions to allow patching pod's status #113580
Conversation
@mimowo: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate triage label. The instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @alculquicondor @liggitt
Commits d0910e9 to 6f54848
	},
}
if utilfeature.DefaultFeatureGate.Enabled(features.PodDisruptionConditions) {
	role.Rules = append(role.Rules, rbacv1helpers.NewRule("patch").Groups(legacyGroup).Resources("pods/status").RuleOrDie())
}
I'm a little surprised to see this added to a controller that did not have pod delete permissions... I thought the conditions were being added by the actors that were going to delete the pod
This controller is meant to disable a stale pod condition: it changes the condition's status to False if the pod is not terminating (there was no successful delete) within 2min. This is a best-effort tool that aims to resolve the issue that, in some situations when the delete request fails (for example during preemption), the condition might be left in place, misleading the job's pod failure policy. However, the disruption controller does not guarantee cleanup, as the pod may still terminate within the 2min window for whatever reason.
/lgtm
This also affects 1.25, right?
Right, but only in Alpha, so I don't think it requires a cherry-pick. The same decision not to cherry-pick was made for a similar PR: #112518
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: liggitt, mimowo
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #113578
Special notes for your reviewer:
The issue and the fix are very similar to this one: #112517.
There are no tests added in this PR. The permissions for the disruption controller are tested by the policy_test.go unit test, but only for the feature gates enabled by default. The testdata file controller-role-bindings.yaml, which the test asserts against, will be adjusted when the feature graduates to Beta in PR #113360.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: