Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix disruption controller permissions to allow patching pod's status #113580

Merged

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Nov 3, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #113578

Special notes for your reviewer:

The issue and the fix are very similar to this one: #112517.
There are no tests added in this PR. The permissions for disruption controller are tested by the policy_test.go unit test, but only for the feature gates enabled by default. The test asserts in the controller-role-bindings.yaml will be adjusted when the feature gradutes to Beta in the PR: #113360

Does this PR introduce a user-facing change?

Fix that disruption controller changes the status of a stale disruption condition after 2 min when the PodDisruptionConditions feature gate is enabled

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 3, 2022
@k8s-ci-robot
Copy link
Contributor

@mimowo: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Nov 3, 2022
@mimowo
Copy link
Contributor Author

mimowo commented Nov 3, 2022

/assign @alculquicondor @liggitt

@k8s-ci-robot k8s-ci-robot added sig/auth Categorizes an issue or PR as relevant to SIG Auth. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 3, 2022
@mimowo mimowo force-pushed the fix-disruption-controller-permissions branch from d0910e9 to 6f54848 Compare November 3, 2022 09:19
},
}
if utilfeature.DefaultFeatureGate.Enabled(features.PodDisruptionConditions) {
role.Rules = append(role.Rules, rbacv1helpers.NewRule("patch").Groups(legacyGroup).Resources("pods/status").RuleOrDie())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised to see this added to a controller that did not have pod delete permissions... I thought the conditions were being added by the actors that were going to delete the pod

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This controller is meant to disable a stale pod condition, It changes the status to False if the pod is non terminating (there was no successful delete) within 2min. This is a best effort tool that aims to resolve the issue that some in some situations, when the delete request fails, for example during preemption, then the condition might be left misleading the job's pod failure policy. However, the disruption controller does not guarantee removal as the pod may terminate within 2min for whatever reason.

@alculquicondor
Copy link
Member

/lgtm

This also affects 1.25, right?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 3, 2022
@mimowo
Copy link
Contributor Author

mimowo commented Nov 3, 2022

/lgtm

This also affects 1.25, right?

Right, but only in Alpha so I don't think it requires a cherry-pick. Not to cherry-pick was the decision in case of a similar PR: #112518

@liggitt
Copy link
Member

liggitt commented Nov 3, 2022

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 3, 2022
@k8s-ci-robot k8s-ci-robot merged commit 208b2b7 into kubernetes:master Nov 4, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Nov 4, 2022
@mimowo mimowo deleted the fix-disruption-controller-permissions branch March 18, 2023 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
4 participants