kubeadm: fix the bug that configurable KubernetesVersion not respected during kubeadm join #110791
Conversation
@SataQiu: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate triage label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SataQiu

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing:

added some comments here:

/hold
Force-pushed from 015c944 to 0a6b3d4
Force-pushed from 0a6b3d4 to bee3e14
// UnNormalizedKubernetesVersion stores the unnormalized target version of the control plane.
// Useful to restore the original target version before upload config.
// +k8s:conversion-gen=false
UnNormalizedKubernetesVersion string
`UnNormalized` can be `Unnormalized` as a whole word: https://en.wiktionary.org/wiki/unnormalize
@@ -100,6 +100,7 @@ func NormalizeKubernetesVersion(cfg *kubeadmapi.ClusterConfiguration) error {
	if err != nil {
		return err
	}
	cfg.UnNormalizedKubernetesVersion = cfg.KubernetesVersion
So I think the solution is clean, but it would basically take the user config and write it to the config to upload.
I have not tested any of this, so please correct me, but I think what happens is that if the user provides `ci/latest`, the `clusterconfiguration.kubernetesversion` is written in the config map as `ci/latest`.
As mentioned on the issue, one problem with that is that if an `init` node was using `ci/latest`, at that particular moment it can end up as `1.25-alpha...`, but if a joining node is added later in time (e.g. 4 months later) it can end up as `1.26-alpha...`, because `ci/latest` is now a much newer version and `ci/latest` will be resolved to that. This use case is strange and we should not worry too much about it, I agree, but what we can do instead is the following:
- check if the version is a CI version (`kubeadmutil.KubernetesIsCIVersion` above can be used)
- resolve the version
- write `ci/<resolved-version>` to `cfg.UnnormalizedKubernetesVersion`

This also puts a question on the name `UnnormalizedKubernetesVersion`, because `ci/<resolved-version>` is a mixture of normalized / unnormalized. Perhaps `CIKubernetesVersion` is better (matches `CIImageRepository` too)?
WDYT?
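The suggested flow could be sketched roughly as below. This is an illustrative stand-in, not kubeadm's actual code: `isCIVersion` mimics `kubeadmutil.KubernetesIsCIVersion`, and `resolveCIVersion` is a hypothetical resolver stubbed with a fixed build (the real one queries the CI artifact bucket):

```go
package main

import (
	"fmt"
	"strings"
)

// ciVersionPrefix mirrors the "ci/" label prefix kubeadm recognizes for CI builds.
const ciVersionPrefix = "ci/"

// isCIVersion is a simplified stand-in for kubeadmutil.KubernetesIsCIVersion.
func isCIVersion(v string) bool {
	return strings.HasPrefix(v, ciVersionPrefix)
}

// resolveCIVersion is a hypothetical resolver, stubbed with a fixed answer
// purely for illustration; the real resolution hits the CI release bucket.
func resolveCIVersion(label string) string {
	return "v1.24.3-rc.0.29+99b7713ba36e8f"
}

// versionToUpload implements the suggested flow: for CI versions, store
// "ci/<resolved-version>" instead of the raw moving label, so joining nodes
// see the same concrete build that the init node resolved.
func versionToUpload(userVersion string) string {
	if isCIVersion(userVersion) {
		resolved := resolveCIVersion(strings.TrimPrefix(userVersion, ciVersionPrefix))
		return ciVersionPrefix + resolved
	}
	return userVersion
}

func main() {
	fmt.Println(versionToUpload("ci/latest")) // stored as ci/<resolved-version>
	fmt.Println(versionToUpload("v1.24.0"))   // non-CI versions pass through
}
```

The point of storing `ci/<resolved-version>` is that the value uploaded to the `kubeadm-config` ConfigMap pins the exact build the init node resolved, rather than whatever `ci/latest` happens to point at months later.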
Yes, you are right. It seems that we should NOT save `ci/latest` directly in this scenario.
Force-pushed from bee3e14 to 5b2d6ca
/lgtm
This seems better to me. Thanks.
/hold
@pacoxu does this seem right to you?
…#110791-upstream-release-1.24 Automated cherry pick of #110791: kubeadm: fix the bug that configurable KubernetesVersion not
…#110791-upstream-release-1.23 Automated cherry pick of #110791: kubeadm: fix the bug that configurable KubernetesVersion not
…#110791-upstream-release-1.22 Automated cherry pick of #110791: kubeadm: fix the bug that configurable KubernetesVersion not
@SataQiu @neolit123 @pacoxu I'm seeing some failures in conformance tests against latest-1.24, perhaps since this was cherry-picked and a new build pushed with these changes?
Does the above look related to these changes? This is on a worker node btw, not a control plane node. Control plane node looks good:
@SataQiu the version parser should work, but perhaps something is missing and maybe we don't have a unit test.
Repro cluster configmap below, certainly seems the result of these changes:
Confirmed the obvious:
Kubeadm should strip and store the `ci` prefix before passing the version to the apimachinery version library.
@neolit123 I'm not sure it's that simple
From the error returned in the cloud-init preflight stdout, it seems that the version parsing failure is occurring as we enumerate through the kube-proxy and kubelet component configmap handlers:
Probably somewhere in here:
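The failure mode can be reproduced in isolation. The regex below is a simplified stand-in for apimachinery's `version.ParseSemantic` (not the real parser): a `ci/`-prefixed string is rejected, while the same string with the prefix stripped parses fine, which matches the `could not parse "ci/v1.24.3-rc.0.29+99b7713ba36e8f" as version` error seen on the joining node:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// semverRE loosely matches what a semantic-version parser accepts: an optional
// "v", then major.minor.patch, with optional pre-release and build metadata.
// This is an illustrative approximation, not apimachinery's actual grammar.
var semverRE = regexp.MustCompile(`^v?(\d+)\.(\d+)\.(\d+)(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

func parseVersion(s string) error {
	if !semverRE.MatchString(s) {
		return fmt.Errorf("could not parse %q as version", s)
	}
	return nil
}

func main() {
	stored := "ci/v1.24.3-rc.0.29+99b7713ba36e8f"

	// Feeding the stored string straight to the parser fails because of "ci/":
	fmt.Println(parseVersion(stored))

	// Stripping the "ci/" prefix first makes the same string parseable:
	fmt.Println(parseVersion(strings.TrimPrefix(stored, "ci/")))
}
```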
Yes, maybe our e2e tests are not robust enough. I'll look at the issue again.
The e2e tests pin a k8s version that is without `ci` prefixes. I guess that is mostly because we bake our own images from tars and treat it as a regular k8s version, not CI. It should be possible to test this scenario with unit tests, or modify our e2e tests somehow (probably harder to do).
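A unit-style table test for this scenario could look roughly like the sketch below. `normalizeForClusterConfig` here is a hypothetical, simplified stand-in for the fixed normalization path, not kubeadm's actual function; the resolved version is passed in directly so the check needs no network access:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeForClusterConfig is a simplified stand-in for the fixed behavior:
// a user-supplied CI label is resolved, and the stored value keeps the "ci/"
// prefix on a concrete version, while release versions pass through untouched.
func normalizeForClusterConfig(userVersion, resolved string) string {
	if strings.HasPrefix(userVersion, "ci/") {
		return "ci/" + resolved
	}
	return userVersion
}

func main() {
	// Table of cases mirroring the discussion: a moving CI label and a
	// regular release version. Version strings are made up for illustration.
	cases := []struct {
		user, resolved, want string
	}{
		{"ci/latest", "v1.25.0-alpha.0.123+abcdef", "ci/v1.25.0-alpha.0.123+abcdef"},
		{"v1.24.0", "v1.24.0", "v1.24.0"},
	}
	for _, c := range cases {
		got := normalizeForClusterConfig(c.user, c.resolved)
		fmt.Printf("%s => %s\n", c.user, got)
		if got != c.want {
			panic("unexpected stored version for " + c.user)
		}
	}
}
```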
Hi @jackfrancis, I guess you probably didn't update the kubeadm binary.

init control-plane node:
root@kind-control-plane:/# ./kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"25+", GitVersion:"v1.25.0-alpha.0.1261+d3092cd296f82d-dirty", GitCommit:"d3092cd296f82d24236f57f5928cd4755a080d5c", GitTreeState:"dirty", BuildDate:"2022-07-08T06:45:45Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/arm64"}
root@kind-control-plane:/# ./kubeadm init --kubernetes-version=ci/latest-1.24
[init] Using Kubernetes version: v1.24.3-rc.0.29+99b7713ba36e8f
[preflight] Running pre-flight checks
[WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet
[WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kind-control-plane kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.18.0.3]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.18.0.3 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.18.0.3 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 5.503461 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node kind-control-plane as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node kind-control-plane as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: ri7v1v.ctuj2k2l9pt17id4
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.18.0.3:6443 --token ri7v1v.ctuj2k2l9pt17id4 \
--discovery-token-ca-cert-hash sha256:3eb72a6cfa9616f8c1b8f9b061a91255e98add0870c966b84d7932e41a8a905f

check kubeadm-config:
root@kind-control-plane:/# kubectl get cm -n kube-system kubeadm-config -oyaml
apiVersion: v1
data:
ClusterConfiguration: |
apiServer:
extraArgs:
authorization-mode: Node,RBAC
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: ci/v1.24.3-rc.0.29+99b7713ba36e8f
networking:
dnsDomain: cluster.local
serviceSubnet: 10.96.0.0/12
scheduler: {}
kind: ConfigMap
metadata:
creationTimestamp: "2022-07-08T08:01:43Z"
name: kubeadm-config
namespace: kube-system
resourceVersion: "204"
uid: 6022a1cf-7f75-4a2c-99f5-607b0c75fbae

join worker node using old kubeadm:
root@kind-worker:/# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-19T15:42:59Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/arm64"}
root@kind-worker:/# kubeadm join 172.18.0.3:6443 --token ri7v1v.ctuj2k2l9pt17id4 \
--discovery-token-ca-cert-hash sha256:3eb72a6cfa9616f8c1b8f9b061a91255e98add0870c966b84d7932e41a8a905f
[preflight] Running pre-flight checks
[WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet
[WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get component configs: could not parse "ci/v1.24.3-rc.0.29+99b7713ba36e8f" as version
To see the stack trace of this error execute with --v=5 or higher

join worker using new kubeadm:
root@kind-worker:/# ./kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"25+", GitVersion:"v1.25.0-alpha.0.1261+d3092cd296f82d-dirty", GitCommit:"d3092cd296f82d24236f57f5928cd4755a080d5c", GitTreeState:"dirty", BuildDate:"2022-07-08T06:45:45Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/arm64"}
root@kind-worker:/# ./kubeadm join 172.18.0.3:6443 --token ri7v1v.ctuj2k2l9pt17id4 --discovery-token-ca-cert-hash sha256:3eb72a6cfa9616f8c1b8f9b061a91255e98add0870c966b84d7932e41a8a905f
[preflight] Running pre-flight checks
[WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet
[WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Thanks @SataQiu, I will check that!
From the control plane node, I am able to successfully init:
From a worker node, the pre-flight check fails:
tl;dr: we are using a new version of kubeadm that has the change in our repro:
@jackfrancis I have sent a PR to fix this #111021 |
@neolit123 I'll paste a verbosity=5 output from a failing node here.

Thanks @SataQiu! Hopefully my …
commented on #111021 |
/test pull-kubernetes-e2e-capz-conformance |
What type of PR is this?
/kind bug
What this PR does / why we need it:
kubeadm: fix the bug that configurable KubernetesVersion not respected during kubeadm join
Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#2713
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: