Fix disruption controller permissions to allow patching pod's status #113580
Conversation
@mimowo: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate triage label. The instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @alculquicondor @liggitt
Commits d0910e9 to 6f54848
	},
}
if utilfeature.DefaultFeatureGate.Enabled(features.PodDisruptionConditions) {
	role.Rules = append(role.Rules, rbacv1helpers.NewRule("patch").Groups(legacyGroup).Resources("pods/status").RuleOrDie())
}
I'm a little surprised to see this added to a controller that did not have pod delete permissions... I thought the conditions were being added by the actors that were going to delete the pod
This controller is meant to disable a stale pod condition: it changes the condition's status to False if the pod is not terminating (there was no successful delete) within 2min. This is a best-effort tool that aims to resolve the issue that, in some situations when the delete request fails (for example during preemption), the condition might be left in place, misleading the job's pod failure policy. However, the disruption controller does not guarantee cleanup, as the pod may still terminate within the 2min window for whatever reason.
/lgtm
This also affects 1.25, right?
Right, but only in Alpha, so I don't think it requires a cherry-pick. The same decision not to cherry-pick was made for a similar PR: #112518
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: liggitt, mimowo
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #113578
Special notes for your reviewer:
The issue and the fix are very similar to this one: #112517.
There are no tests added in this PR. The permissions for the disruption controller are tested by the policy_test.go unit test, but only for the feature gates enabled by default. The testdata file controller-role-bindings.yaml, which the test asserts against, will be adjusted when the feature graduates to Beta in PR #113360.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: