Update endpointslice controller maximum sync backoff delay to match expected sequence of delays #112353
Conversation
Update the maximum sync backoff value to 1000s to match the sequence of delays expected by the endpointslice controller when syncing Services.

Before this change the sequence was:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 100s

Now it is:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 512s, 1000s

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
@dgrisonnet: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I don't have enough context to help with review of this change.
I don't have any context either, but I personally feel that a human-made comment detailing the exact delay sequence of the controller is more trustworthy than a potential typo.
```diff
@@ -69,7 +69,7 @@ const (
 	// defaultSyncBackOff is the default backoff period for syncService calls.
 	defaultSyncBackOff = 1 * time.Second
 	// maxSyncBackOff is the max backoff period for syncService calls.
-	maxSyncBackOff = 100 * time.Second
+	maxSyncBackOff = 1000 * time.Second
```
It was introduced here:
https://github.com/kubernetes/kubernetes/pull/89438/files
This backoff is used when we fail the operation:
https://github.com/kubernetes/kubernetes/pull/89438/files#diff-8765679aa8ca8cc500a6dc038b8707cbc29ddc2d2f94f1da911e2cd5e4df5816R104
If we can't update the EndpointSlice within a 100s backoff, we're really in big trouble, or this is some kind of permanent error (like a too-big object or something). Extending this to 1000s in those cases makes sense to me.
Although I'm not sure if the original numbers were backed by some real-world scenarios...
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Update the maximum sync backoff value to 1000s to match the sequence of delays expected by the endpointslice controller when syncing Services:
Before this change the sequence was:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 100s

Now it is:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 512s, 1000s
Special notes for your reviewer:
The expectation for the sequence of delays was detailed in the following comment, but in practice the queue was configured for a shorter sequence:
kubernetes/pkg/controller/endpointslice/endpointslice_controller.go
Line 59 in 2b2be7f
Does this PR introduce a user-facing change?
/cc @aojea