Add worker to clean up stale DisruptionTarget condition #111475
Conversation
/remove-sig api-machinery
Force-pushed from 53b272e to a403d60
Force-pushed from a403d60 to 5f09ef0
minor nits, controller changes
/lgtm
/approve
		return
	}
	dc.stalePodDisruptionQueue.AddAfter(key, d)
	klog.InfoS("Enqueued pod for stale DisruptionTarget condition cleanup", "pod", klog.KObj(pod))
Nit: most of the debugging info is at level 4.
Oops, left over from my debugging.
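For reference, the leveled form the nit points at would look like this (a minimal fragment; `pod` is the pod object already in scope in the handler):

```go
// Debugging detail is conventionally gated behind verbosity level 4,
// so this line only appears when the component runs with -v=4 or higher.
klog.V(4).InfoS("Enqueued pod for stale DisruptionTarget condition cleanup", "pod", klog.KObj(pod))
```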
		dc.queue.Forget(key)
		return true
	}
	utilruntime.HandleError(fmt.Errorf("syncing Pod %v to clear DisruptionTarget condition, requeueing: %v", key.(string), err))
Suggested change:
- utilruntime.HandleError(fmt.Errorf("syncing Pod %v to clear DisruptionTarget condition, requeueing: %v", key.(string), err))
+ utilruntime.HandleError(fmt.Errorf("error syncing Pod %v to clear DisruptionTarget condition, requeueing: %v", key.(string), err))
I somewhat disagree on this. This is already an error, so it shouldn't need the word error again. And if someday the implementation of HandleError changes to wrap the given error, it wouldn't read well.
But sure, given the current implementation of HandleError, your recommendation makes sense.
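For readers skimming the thread: the snippet under discussion follows the standard controller workqueue processing pattern. A hedged sketch of that shape (the method name and queue field are illustrative, not necessarily the PR's exact identifiers):

```go
func (dc *DisruptionController) processNextStalePodDisruptionWorkItem(ctx context.Context) bool {
	key, quit := dc.stalePodDisruptionQueue.Get()
	if quit {
		return false
	}
	defer dc.stalePodDisruptionQueue.Done(key)

	err := dc.syncStalePodDisruption(ctx, key.(string))
	if err == nil {
		// Success: reset the rate-limiting history for this key.
		dc.stalePodDisruptionQueue.Forget(key)
		return true
	}
	// Failure: surface the error and requeue with backoff.
	utilruntime.HandleError(fmt.Errorf("syncing Pod %v to clear DisruptionTarget condition, requeueing: %v", key.(string), err))
	dc.stalePodDisruptionQueue.AddRateLimited(key)
	return true
}
```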
/hold
@@ -181,9 +187,8 @@ func newFakeDisruptionController() (*disruptionController, *pdbStates) {
	dc.rsListerSynced = alwaysReady
	dc.dListerSynced = alwaysReady
	dc.ssListerSynced = alwaysReady
-	ctx := context.TODO()
Where is ctx.Done() coming from after removing this?
It's coming from the test. See newFakeDisruptionControllerWithTime.
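In other words, the lifetime of the context now belongs to the test. A minimal illustration of the calling pattern (assuming, as the helper's name suggests, that it takes the context and a start time):

```go
// The test owns the context, so ctx.Done() inside the controller fires
// when the test cancels, instead of relying on a context.TODO() created
// inside the fake-controller constructor.
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
dc, ps := newFakeDisruptionControllerWithTime(ctx, time.Now())
```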
To be able to write more precise unit tests in the future

Change-Id: I8f45947dfacca501acd856849bd978fad0f735cd
Force-pushed from 7fe1b17 to bb2353d
Change-Id: I907fbdf01e7ff08d823fb23aa168ff271d8ff1ee
Force-pushed from bb2353d to 4188d9b
/assign @lavalamp
/approve for client
/hold cancel
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
/remove-kind api-change
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, janetkuo, lavalamp, soltysh

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This is a requirement from #110959.
It is possible for a controller to add the DisruptionTarget condition and then fail before it is able to delete the pod. When the controller restarts, it might make a different decision, leaving a stale condition behind. This worker, added to the disruption controller, clears the condition if the pod does not get a DeletionTimestamp within 2 minutes (sketched below).
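A hedged sketch of that cleanup decision, assuming a clock on the controller (introduced by the first commit) and using illustrative helper names (`cacheKey` and `clearStalePodDisruption` are not necessarily the PR's exact identifiers):

```go
const stalePodDisruptionTimeout = 2 * time.Minute

func (dc *DisruptionController) syncStalePodDisruption(ctx context.Context, pod *v1.Pod) error {
	// A pod that is already being deleted carries no stale condition.
	if pod.DeletionTimestamp != nil {
		return nil
	}
	// Find the DisruptionTarget condition, if any.
	var cond *v1.PodCondition
	for i := range pod.Status.Conditions {
		if pod.Status.Conditions[i].Type == v1.PodConditionType("DisruptionTarget") {
			cond = &pod.Status.Conditions[i]
			break
		}
	}
	if cond == nil || cond.Status != v1.ConditionTrue {
		return nil
	}
	// If the 2-minute window has not elapsed yet, re-enqueue for the remainder.
	if remaining := cond.LastTransitionTime.Time.Add(stalePodDisruptionTimeout).Sub(dc.clock.Now()); remaining > 0 {
		dc.stalePodDisruptionQueue.AddAfter(cacheKey(pod), remaining)
		return nil
	}
	// No DeletionTimestamp after the timeout: the condition is stale, clear it.
	return dc.clearStalePodDisruption(ctx, pod)
}
```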
Which issue(s) this PR fixes:
Ref kubernetes/enhancements#3329
Special notes for your reviewer:
The first commit introduces a clock interface to the controller in order to write the unit tests more accurately. I proposed it initially as a separate PR in #111447, but it seems small enough to be in this PR as a separate commit.
This functionality is purposely not feature-gated: even if the PodDisruptionConditions feature gate is disabled, we still want to clear the condition (see the contrast sketch below).
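For contrast, this is the kind of check that was deliberately left out of the cleanup path (illustrative; `utilfeature` is k8s.io/apiserver/pkg/util/feature and `features` is the kubelet/controller feature registry):

```go
// Deliberately ABSENT from the cleanup worker: even when the gate is off,
// a condition written while it was on must still be cleaned up.
if !utilfeature.DefaultFeatureGate.Enabled(features.PodDisruptionConditions) {
	return nil // would leave stale conditions behind after a gate flip or downgrade
}
```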
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: