Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

winkernel proxier cache HNS data to improve syncProxyRules performance #109124

Merged

Conversation

daschott
Copy link
Contributor

@daschott daschott commented Mar 29, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

Resolves issues with proxy rules taking a long time to be synced on Windows. This PR reduces the number of expensive calls into HNS for each rule sync cycle and improves the overall proxier performance on Windows.

Which issue(s) this PR fixes:

Fixes #109162

This addresses the issue of failing service connectivity after a new Windows node has joined a cluster by reducing the time it takes to sync rules needed to reach desired state. During testing, we can see that time taken by kube-proxy on Windows to complete creation of 2000 services of type ClusterIP on a Node with ~5000 remote endpoints and 15 local endpoints/pods, is reduced as follows:

Winkernel proxier Windows Server 2019 Windows Server 2022
Without improvements 114 minutes 192 minutes
With improvements 110 minutes 10 minutes
With improvements + OS updates (shipping separately) 22 minutes 1 minute

NOTE: The last row is the performance after applying the changes on a Node with an OS update installed, which is shipping separately through Windows Servicing channels.

Environment Details: The node used to test these changes was a Hyper-V VM with 8vCPU cores and 32GB RAM.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Reduced time taken to sync proxy rules on Windows kube-proxy with kernelspace mode

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 29, 2022
@k8s-ci-robot
Copy link
Contributor

Welcome @daschott!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 29, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @daschott. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Mar 29, 2022
@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Mar 29, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 29, 2022
@daschott
Copy link
Contributor Author

/sig windows

@k8s-ci-robot k8s-ci-robot added sig/windows Categorizes an issue or PR as relevant to SIG Windows. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. area/release-eng Issues or PRs related to the Release Engineering subproject sig/release Categorizes an issue or PR as relevant to SIG Release. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 29, 2022
@daschott daschott force-pushed the daschott/winkernel-perf-fix branch from 31ec38d to df80c5f Compare March 29, 2022 22:34
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Mar 29, 2022
@feiskyer
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 30, 2022
@jsturtevant
Copy link
Contributor

/sig windows
/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 30, 2022
@sbangari
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 16, 2022
@daschott
Copy link
Contributor Author

/retest

@marosset marosset added this to In Review (v1.25) in SIG-Windows May 4, 2022
@marosset
Copy link
Contributor

marosset commented May 4, 2022

/hold cancel
/milestone v1.25

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 4, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 4, 2022
@k8s-ci-robot k8s-ci-robot merged commit 889e60a into kubernetes:master May 4, 2022
SIG-Windows automation moved this from In Review (v1.25) to Done (v1.25) May 4, 2022
k8s-ci-robot added a commit that referenced this pull request Jun 9, 2022
…9124-upstream-release-1.23

Automated cherry pick of #109124: Winkernel proxier cache HNS data to improve syncProxyRules
k8s-ci-robot added a commit that referenced this pull request Jun 9, 2022
…9124-upstream-release-1.24

Automated cherry pick of #109124: Winkernel proxier cache HNS data to improve syncProxyRules
k8s-ci-robot added a commit that referenced this pull request Jun 9, 2022
…9124-upstream-release-1.22

Automated cherry pick of #109124: Winkernel proxier cache HNS data to improve syncProxyRules
jsturtevant added a commit to marosset/kubernetes that referenced this pull request Jun 14, 2022
… cache HNS data to improve syncProxyRules"
jsturtevant added a commit to marosset/kubernetes that referenced this pull request Jun 14, 2022
… cache HNS data to improve syncProxyRules"
k8s-ci-robot added a commit that referenced this pull request Jun 15, 2022
…ry-pick-of-#109124-upstream-release-1.22

Revert "Automated cherry pick of #109124: Winkernel proxier cache HNS data to improve syncProxyRules"
k8s-ci-robot added a commit that referenced this pull request Jun 15, 2022
…ry-pick-of-#109124-upstream-release-1.23

Revert "Automated cherry pick of #109124: Winkernel proxier cache HNS data to improve syncProxyRules"
k8s-ci-robot added a commit that referenced this pull request Jun 30, 2022
…9124-upstream-release-1.22

Automated cherry pick of #109124: Winkernel proxier cache HNS data to improve syncProxyRules
k8s-ci-robot added a commit that referenced this pull request Jun 30, 2022
…9124-upstream-release-1.23

Automated cherry pick of #109124: Winkernel proxier cache HNS data to improve syncProxyRules
@mastersingh24
Copy link

@daschott - which build of Windows Server 2019 has the updates required for the improvements?

@daschott
Copy link
Contributor Author

@mastersingh24 KB5014669 or above has the needed changes. However, unfortunately they are disabled by default on Windows Server 2019 and require a RegKey to enable them. Please contact Microsoft support or AKS support for assistance on how to set these.

On Windows Server 2022, KB5014665 or above has the needed changes. They are enabled by default and do not require setting any Regkeys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
SIG-Windows
  
Done (v1.25)
Development

Successfully merging this pull request may close these issues.

Delay in processing HNS LB policies on kube-proxy start on Windows nodes results in unreachable services