Bump default burst limit for discovery client to 300 #109141
Conversation
Hi @ulucinar. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
(force-pushed from 553dd90 to 534427f)
/assign @Jefftree
/ok-to-test
@deads2k @caesarxuchao Hello! Is there a blocker to get this merged?
We do have a fix for the discovery burst coming this release so this may not be necessary. There might be some opinions around this PR for increasing the burst though. /assign @lavalamp
Yeah this is a great question, what do you think @deads2k? APF has been beta and default-on for a while now. It's missing features for watch still, but discovery doesn't do watches. I think for most clusters, a burst of 300 and disabling client-side rate limits completely is not that different. And for the rest of clusters, people won't want it to break at 300 group versions. So I think this is a good first place to just disable the rate limit completely.
@wojtek-t may also have an opinion about disabling this rate limit. |
I think a high burst, even with a fairly high recharge rate, is different than no limit at all. I would rather go to 300 than remove it entirely at this stage.
We made this a GA criteria, but we haven't tested that way yet. I agree that discovery is a good place to start, but I'd rather have an organized, "test this with p&f" before removing the limit.
```diff
-	if config.Burst == 0 && config.QPS < 100 {
+	// if a burst limit is not already configured and
+	// an avg. rate of `defaultBurst` qps is not already configured
+	if config.Burst == 0 && config.QPS < defaultBurst {
```
The condition here is a bit weird, since if burst is already set to something like 2, it won't get fixed?
And if burst is set to zero, likely qps will also be zero, and then we don't modify qps?
hrmm... trying to page that back in...
I think the intent was two-fold:
- this should be a default, so if Burst is explicitly set we don't want to override it
- I vaguely remember the burst bucket defaulting to match QPS at some layer (at least at some point in time), so the `config.QPS < 100` guard was to prevent us from defaulting to a burst value lower than what would have been auto-selected. I'm not sure if this guard is still necessary.
Thanks for the clarification. Kept the `config.Burst == 0` part of the conjunction and removed the `config.QPS < defaultBurst` condition.
It's pretty early in the release cycle now...
+1
We're planning to have something done in this area this cycle (#109614 would be the first thing to scale-test), but without knowing where exactly we are, I'm against disabling it completely now.
I don't understand the goal of rate limiting the discovery. Are people mis-using the discovery? Are they mis-using it on purpose? What problem does this address?
I've seen many bugs where people misconfigured their components/scripts etc. and were overloading the control plane.
They are, a little bit: discovery results are often memory-cached, which means they re-trigger every time you restart your binary (including kubectl), and the rate limiter doesn't protect you from that either. They are much more static than most other APIs in Kubernetes, so people are less likely to poll them. Would it make sense to just disable the rate limiter for discovery in kubectl? Would that help?
Shouldn't server-side API priority and fairness address this now?
We've already bumped it to 300 in kubectl. I'm open to disabling it there, but we'd also like to at least see the discovery rate-limit bumped (or removed) in client-go. There are plenty of other Go clients (Helm comes to mind) that hit these discovery rate limits and provide a bad user experience; if it's safe to update the default I'd prefer to do that here in client-go rather than chase down and ask every client to override the defaults.
I agree that we don't want to have to fix it in every client. I'm still not sure I really understand what we get from the limiter.
The comments here are rehashing things addressed in previous comments: #109141 (comment) |
I also think we should remove this limit completely, but this is the improvement we can all agree on for the moment.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, ulucinar

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
See: kubernetes/kubernetes#109141 Signed-off-by: Laszlo Uveges <laszlo@giantswarm.io>
* Fixes crossplane-contrib#159
* No more client-side throttling timeouts like
  ```
  I1128 12:44:10.336621 1 request.go:665] Waited for 11.188964474s due to client-side throttling, not priority and fairness, request: GET:https://10.255.0.1:443/apis/cloudidentity.cnrm.cloud.google.com/v1beta1?timeout=32s
  I1128 12:44:20.345096 1 request.go:665] Waited for 5.989111872s due to client-side throttling, not priority and fairness, request: GET:https://10.255.0.1:443/apis/binaryauthorization.cnrm.cloud.google.com/v1beta1?timeout=32s
  I1128 12:44:31.086707 1 request.go:665] Waited for 1.174828735s due to client-side throttling, not priority and fairness, request: GET:https://10.255.0.1:443/apis/servicedirectory.cnrm.cloud.google.com/v1beta1?timeout=32s
  ```
* 300 value is aligned with associated kubectl fix at kubernetes/kubernetes#109141

Signed-off-by: Yury Tsarev <yury@upbound.io>
What type of PR is this?

/kind cleanup
/sig cli

What this PR does / why we need it:

The burst limit of the token bucket rate limiter of kubectl's discovery client has previously been bumped to 300 here. Similarly, this PR proposes bumping the burst limit of the default discovery client to 300. With a large number (over 100) of GroupVersions in the cluster, we are now in a better situation with kubectl, as it now uses `tbrl(b=300, r=50.0 qps)`, but other clients, such as Helm, still experience throttling during the discovery phase. As mentioned in this comment, it's not a good strategy to keep bumping tbrl parameters as CRD use cases evolve, but this PR proposes the limit currently in use by kubectl as the default; in my opinion, this will allow a consistent experience among API server clients like kubectl, Helm, and others. Currently, in cases where kubectl is not throttled, other clients (which use the default discovery client tbrl parameters) are being throttled.
Having proposed a bump for the default burst limit, I'd like to also ask why we do not remove the rate limiter from the discovery client as we have APF now.
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: