Update endpointslice controller maximum sync backoff delay to match expected sequence of delays #112353
Conversation
Update the maximum sync backoff value to 1000s to match the sequence of delays expected by the endpointslice controller when syncing Services.

Before this change the sequence was:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 100s

Now it is:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 512s, 1000s

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
@dgrisonnet: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I don't have enough context to help with review of this change.
I don't have any context either, but I personally feel that a human-made comment detailing the exact delay sequence of the controller is more trustworthy than a potential typo.
```diff
@@ -69,7 +69,7 @@ const (
 	// defaultSyncBackOff is the default backoff period for syncService calls.
 	defaultSyncBackOff = 1 * time.Second
 	// maxSyncBackOff is the max backoff period for syncService calls.
-	maxSyncBackOff = 100 * time.Second
+	maxSyncBackOff = 1000 * time.Second
```
It was introduced here:
https://github.com/kubernetes/kubernetes/pull/89438/files
This backoff is used when we fail the operation:
https://github.com/kubernetes/kubernetes/pull/89438/files#diff-8765679aa8ca8cc500a6dc038b8707cbc29ddc2d2f94f1da911e2cd5e4df5816R104
If we can't update the EndpointSlice within a 100s backoff, we're really in big trouble, or this is some kind of permanent error (like a too-big object or something). Extending this to 1000s in those cases makes sense to me.
Although I'm not sure if the original numbers were backed by some real-world scenarios...
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Update the maximum sync backoff value to 1000s to match the sequence of delays expected by the endpointslice controller when syncing Services:
Before this change the sequence was:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 100s

Now it is:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 512s, 1000s
Special notes for your reviewer:
The expectation for the sequence of delays was detailed in the following comment, but in practice the queue was configured for a shorter sequence:
kubernetes/pkg/controller/endpointslice/endpointslice_controller.go
Line 59 in 2b2be7f
Does this PR introduce a user-facing change?
/cc @aojea