Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote APIServerIdentity to Beta #113629

Merged
merged 5 commits into from Nov 8, 2022

Conversation

andrewsykim
Copy link
Member

@andrewsykim andrewsykim commented Nov 4, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR promotes the APIServerIdentity feature, per KEP-1965 updates for v1.26:

In addition, it adds e2e tests which were part of the Beta graduation criterias and also removes kube-apiserver flags used to configure the lease duration and renewal interval.

The following Beta criterias have been met:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Promote the `APIServerIdentity` feature to Beta. By default, each kube-apiserver will now create a Lease in the `kube-system` namespace. These lease objects can be used to identify the number of active API servers in the cluster, and may also be used for future features such as the Storage Version API. 

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 4, 2022
@sftim
Copy link
Contributor

sftim commented Nov 4, 2022

Changelog entry suggestion

Promote the `APIServerIdentity` feature to Beta. By default, each kube-apiserver will now create a Lease in the `kube-system` namespace. These lease objects can be used to identify the number of active API servers in the cluster, and may also be used for future features such as the Storage Version API. 

Not sure what StorageVersionAPI is; if possible, follow https://kubernetes.io/docs/contribute/style/style-guide/ when working out how to refer to it.

@andrewsykim andrewsykim force-pushed the apiserver-identity-beta branch 2 times, most recently from 669e27a to d781f1d Compare November 4, 2022 16:01
@andrewsykim
Copy link
Member Author

Changelog entry suggestion

Thanks for the suggestion! Updated :)

@andrewsykim
Copy link
Member Author

/hold

#113307 (comment) needs to be addressed first

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 4, 2022
@@ -480,9 +477,9 @@ func (c completedConfig) New(delegationTarget genericapiserver.DelegationTarget)
clock.RealClock{},
kubeClient,
holderIdentity,
int32(c.ExtraConfig.IdentityLeaseDurationSeconds),
3600,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Inside this function create a const for this an re-use it in the two spots:

const identityLeaseDurationSeconds = 3600

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, updated

controlPlaneAddress := nodeAddresses[0]

host := controlPlaneAddress + ":" + e2essh.SSHPort
result, err := e2essh.SSH("hostname", host, framework.TestContext.Provider)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, that is one way to do it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not using exec on pods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apiserver pods generally aren't part of the cluster (static pods managed only by a kubelet)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, exec a pod on that node as hostNetwork , not execing on the apiserver pod ... but maybe this is restricted, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubectl debug has tooling for running a more-privileged Pod on a node to learn about it; maybe we could use some of that approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to clarify, I commented because I was curious, we are close to code freeze and I don't mean to delay this PR

Comment on lines 126 to 128
h := fnv.New32a()
h.Write([]byte(strings.TrimSpace(hostname)))
leaseName := "kube-apiserver-" + fmt.Sprint(h.Sum32())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to update this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, but test will fail until the other PR merges

f.NamespacePodSecurityEnforceLevel = admissionapi.LevelPrivileged

ginkgo.It("kube-apiserver identity should persist after restart [Disruptive]", func() {
e2eskipper.SkipUnlessProviderIs("gce")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we do often?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, especially on tests where you have to ssh into the nodes or run some external operation against them

LabelSelector: "k8s.io/component=kube-apiserver",
})
framework.ExpectNoError(err)
framework.ExpectEqual(len(leases.Items), len(controlPlaneNodes), "unexpected number of leases")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this flake in CI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if you run it on a cluster while it is still being provisioned. Otherwise the node check above should prevent flakes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't hurt to add a retry loop here

// This shorten the GC check period to make the test run faster.
// Since we are testing GC behavior on leases we create, what happens to
// the real apiserver lease doesn't matter.
[]string{"--identity-lease-duration-seconds=1"},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to remove this test, because of the dependence to this flag. But I think the unit tests added in #113074 and the additional unit test I added in this PR should make up for the coverage we lose here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places we just let the integration test override the "constant"

// CertCallbackRefreshDuration is exposed so that integration tests can crank up the reload speed.
var CertCallbackRefreshDuration = 5 * time.Minute

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm -- updating the lease parameters into global vars seems worthwhile so we can run integration tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use public vars

@andrewsykim andrewsykim force-pushed the apiserver-identity-beta branch 5 times, most recently from 899fc01 to b423794 Compare November 7, 2022 21:56
@enj
Copy link
Member

enj commented Nov 7, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 7, 2022
@andrewsykim
Copy link
Member Author

/retest

@andrewsykim
Copy link
Member Author

/assign @lavalamp @deads2k

// the real apiserver lease doesn't matter.
controlplane.IdentityLeaseDurationSeconds = 1
controlplane.IdentityLeaseGCPeriod = time.Second
controlplane.IdentityLeaseRenewIntervalPeriod = time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you set them back when you're done to avoid weird things in other tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -- updated

@lavalamp
Copy link
Member

lavalamp commented Nov 8, 2022

Looks good except for the one comment

… behavior when restarting kube-apiserver

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…identity-lease-renew-interval-seconds

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…ey can be set by integration tests

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…public lease parameter vars

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2022
@lavalamp
Copy link
Member

lavalamp commented Nov 8, 2022

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2022
@aojea
Copy link
Member

aojea commented Nov 8, 2022

/test pull-kubernetes-e2e-gce-alpha-features

@k8s-ci-robot
Copy link
Contributor

@andrewsykim: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-gce-alpha-features 196a3b9 link false /test pull-kubernetes-e2e-gce-alpha-features

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@lavalamp
Copy link
Member

lavalamp commented Nov 8, 2022

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 8, 2022
@k8s-ci-robot k8s-ci-robot merged commit 3a99a59 into kubernetes:master Nov 8, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Nov 8, 2022
@fedebongio
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

8 participants