Bump default burst limit for discovery client to 300 #109141
Conversation
Hi @ulucinar. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
(force-pushed from 553dd90 to 534427f)
/assign @Jefftree
/ok-to-test
@deads2k @caesarxuchao Hello! Is there a blocker to get this merged?
We do have a fix for the discovery burst coming this release so this may not be necessary. There might be some opinions around this PR for increasing the burst though. /assign @lavalamp
Yeah this is a great question, what do you think @deads2k? APF has been beta and default-on for a while now. It's missing features for watch still, but discovery doesn't do watches. I think for most clusters, a burst of 300 and disabling client-side rate limits completely is not that different. And for the rest of clusters, people won't want it to break at 300 group versions. So I think this is a good first place to just disable the rate limit completely.
@wojtek-t may also have an opinion about disabling this rate limit. |
I think a high burst, even with a fairly high recharge rate, is different than no limit at all. I would rather go to 300 than remove it entirely at this stage.
We made this a GA criteria, but we haven't tested that way yet. I agree that discovery is a good place to start, but I'd rather have an organized, "test this with p&f" before removing the limit.
```diff
-	if config.Burst == 0 && config.QPS < 100 {
+	// if a burst limit is not already configured and
+	// an avg. rate of `defaultBurst` qps is not already configured
+	if config.Burst == 0 && config.QPS < defaultBurst {
```
The condition here is a bit weird, since if burst is already set to something like 2, it won't get fixed?
And if burst is set to zero, likely qps will also be zero, and then we don't modify qps?
hrmm... trying to page that back in...
I think the intent was two-fold:
- this should be a default, so if Burst is explicitly set we don't want to override it
- I vaguely remember the burst bucket defaulting to match QPS at some layer (at least at some point in time), so the `config.QPS < 100` guard was to prevent us from defaulting to a burst value lower than what would have been auto-selected. I'm not sure if this guard is still necessary.
Thanks for the clarification. Kept the `config.Burst == 0` part of the conjunction and removed the `config.QPS < defaultBurst` condition.
It's pretty early in the release cycle now...
+1
We're planning to have something done in this area this cycle (#109614 would be the first thing to scale-test), but without knowing where exactly we are, I'm against disabling it completely now.
I don't understand the goal of rate limiting the discovery. Are people mis-using the discovery? Are they mis-using it on purpose? What problem does this address?
I've seen many bugs where people misconfigured their components/scripts etc. and were overloading the control plane.
They are, a little bit: discovery results are often memory-cached, which means they re-trigger every time you restart your binary (including kubectl), and the rate limiter doesn't protect you from that either. They are much more static than most other APIs in Kubernetes, so people are less likely to poll them. Would it make sense to just disable the rate limiter for discovery in kubectl? Would that help?
Shouldn't server-side API priority and fairness address this now?
We've already bumped it to 300 in kubectl. I'm open to disabling it there, but we'd also like to at least see the discovery rate-limit bumped (or removed) in client-go. There are plenty of other Go clients (Helm comes to mind) that hit these discovery rate limits and provide a bad user experience; if it's safe to update the default I'd prefer to do that here in client-go rather than chase down and ask every client to override the defaults.
I agree that we don't want to have to fix it in every client. I'm still not sure I really understand what we get from the limiter.
The comments here are rehashing things addressed in previous comments: #109141 (comment) |
I also think we should remove this limit completely, but this is the improvement we can all agree on for the moment.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, ulucinar

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
See: kubernetes/kubernetes#109141 Signed-off-by: Laszlo Uveges <laszlo@giantswarm.io>
* Fixes crossplane-contrib#159
* No more client-side throttling timeouts like
  ```
  I1128 12:44:10.336621 1 request.go:665] Waited for 11.188964474s due to client-side throttling, not priority and fairness, request: GET:https://10.255.0.1:443/apis/cloudidentity.cnrm.cloud.google.com/v1beta1?timeout=32s
  I1128 12:44:20.345096 1 request.go:665] Waited for 5.989111872s due to client-side throttling, not priority and fairness, request: GET:https://10.255.0.1:443/apis/binaryauthorization.cnrm.cloud.google.com/v1beta1?timeout=32s
  I1128 12:44:31.086707 1 request.go:665] Waited for 1.174828735s due to client-side throttling, not priority and fairness, request: GET:https://10.255.0.1:443/apis/servicedirectory.cnrm.cloud.google.com/v1beta1?timeout=32s
  ```
* 300 value is aligned with associated kubectl fix at kubernetes/kubernetes#109141

Signed-off-by: Yury Tsarev <yury@upbound.io>
What type of PR is this?

/kind cleanup
/sig cli

What this PR does / why we need it:

The burst limit of the token bucket rate limiter of kubectl's discovery client has previously been bumped to 300 here. Similarly, this PR proposes bumping the burst limit of the default discovery client to 300. With a large number (over 100) of GroupVersions in the cluster, we are now in a better situation with kubectl, as it now uses `tbrl(b=300, r=50.0 qps)`, but other clients, such as Helm, still experience throttling during the discovery phase. As mentioned in this comment, it's not a good strategy to keep bumping tbrl parameters as CRD use cases evolve, but this PR proposes the limit currently in use by kubectl as the default; in my opinion, this will allow a consistent experience among API server clients like kubectl, Helm, and others. Currently, in cases where kubectl is not throttled, other clients (which use the default discovery client tbrl parameters) are being throttled.
Having proposed a bump for the default burst limit, I'd like to also ask why we do not remove the rate limiter from the discovery client as we have APF now.
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: