Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce default gzip compression level from 4 to 1 in apiserver #112299

Merged
merged 1 commit into from Sep 7, 2022

Conversation

shyamjvs
Copy link
Member

@shyamjvs shyamjvs commented Sep 7, 2022

Part-1 of the proposal here - #112296

/kind feature
(for lack of a better match?)

Does this PR introduce a user-facing change?

kube-apiserver: gzip compression switched from level 4 to level 1 to improve large list call latencies in exchange for higher network bandwidth usage (10-50% higher). This increases the headroom before very large unpaged list calls exceed request timeout limits.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Detailed doc showing why this change is safe and useful - https://docs.google.com/document/d/1rMlYKOVyujboAEG2epxSYdx7eyevC7dypkD_kUlBxn4

/assign @lavalamp @deads2k
/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Sep 7, 2022
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 7, 2022
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver labels Sep 7, 2022
defaultGzipContentEncodingLevel = 4
// defaultGzipContentEncodingLevel is set to 1 which uses least CPU compared to higher levels, yet offers
// similar compression ratios (off by at most 1.5x, but typically within 1.1x-1.3x). For further details see -
// https://docs.google.com/document/d/1rMlYKOVyujboAEG2epxSYdx7eyevC7dypkD_kUlBxn4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a tiny preference for mentioning the date when the SIG discussed it, or the issue you filed? I don't mind it, we just don't usually link to docs in the comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to the gh issue link. That should have all the details

@lavalamp
Copy link
Member

lavalamp commented Sep 7, 2022

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, shyamjvs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 7, 2022
@lavalamp
Copy link
Member

lavalamp commented Sep 7, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 7, 2022
@k8s-ci-robot k8s-ci-robot merged commit 8287e21 into kubernetes:master Sep 7, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Sep 7, 2022
@shyamjvs shyamjvs deleted the gzip-level-change branch September 7, 2022 22:30
@leilajal
Copy link
Contributor

leilajal commented Sep 8, 2022

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 8, 2022
@shyamjvs
Copy link
Member Author

shyamjvs commented Sep 9, 2022

@lavalamp Wdyt about backporting this change to previous versions too? This would also help with the "reflector unable to initialize within 60s timeout" problem that was discussed earlier under #111507

@shyamjvs
Copy link
Member Author

shyamjvs commented Sep 9, 2022

cc @mborsz

@lavalamp
Copy link
Member

lavalamp commented Sep 9, 2022

I could go either way on a backport, it's not really a bug? @liggitt may have a stronger opinion one way or the other

@mborsz
Copy link
Member

mborsz commented Sep 12, 2022

I agree that it's a gray area whether we should backport this or not, but given the relatively low risk and significant impact (we have seen a number of clusters hitting the "reflector unable to initialize within 60s timeout" due to unexpectedly low throughput of gzip with level=4) I personally would vote for backporting this.

@liggitt
Copy link
Member

liggitt commented Sep 12, 2022

I could also go either way.

I agree it's not exactly a functional bug, more an unexpected limitation. We know that at some point, creating enough objects will inevitably time out an unpaged request, even with gzip completely disabled. This change just increases the amount of headroom before we still hit a limit. That said, the risk does seem low, and there's not really a realistic workaround for someone hitting this limit on a cluster at large enough scale, so ~doubling the headroom is nice.

I was going to ask if we were seeing LIST requests of sufficient size to manifest the timeout with object counts within the bounds described by https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md#kubernetes-thresholds, but then noticed that things like secrets and configmaps have TBD entries for count limits :-/

@lavalamp
Copy link
Member

OK I finally thought of a way to decide: does this patch make it more or less likely to pass the conformance tests? I think the answer is "no change for most clusters; yes for clusters in particular resource constrained configurations".

I therefore (just barely) support backporting. Since this has had no negative impact on the scalability tests at head.

@shyamjvs
Copy link
Member Author

Thanks for driving down the ambiguity here @lavalamp and @liggitt. I'll shortly open the backport PRs.

k8s-ci-robot added a commit that referenced this pull request Sep 20, 2022
…2299-upstream-release-1.22

Automated cherry pick of #112299: Reduce default gzip compression level from 4 to 1 in
k8s-ci-robot added a commit that referenced this pull request Sep 20, 2022
…2299-upstream-release-1.23

Automated cherry pick of #112299: Reduce default gzip compression level from 4 to 1 in
k8s-ci-robot added a commit that referenced this pull request Sep 20, 2022
…2299-upstream-release-1.24

Automated cherry pick of #112299: Reduce default gzip compression level from 4 to 1 in
k8s-ci-robot added a commit that referenced this pull request Sep 20, 2022
…2299-upstream-release-1.25

Automated cherry pick of #112299: Reduce default gzip compression level from 4 to 1 in
chenchun pushed a commit to chenchun/kubernetes that referenced this pull request Mar 20, 2024
…!720)

PR112299-Reduce default gzip compression level from 4 to 1 in apiserver
Reduce default gzip compression level from 4 to 1 in apiserver

kubernetes#112299
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants