
Starting in 1.25 clusters, services of type=LB and xTP=Local sometimes do not update node backends on load balancers #112793

Closed
swetharepakula opened this issue Sep 29, 2022 · 15 comments
Assignees: thockin, alexanderConstantinescu
Labels: area/controller-manager, kind/bug, kind/regression, priority/critical-urgent, sig/network, triage/accepted

Comments

@swetharepakula (Member) commented Sep 29, 2022

What happened?

When upgrading nodes from 1.24 to 1.25 on a cluster whose master is already at 1.25, I noticed that my Services of type=LoadBalancer and xTP=Local have an incorrect set of nodes after the nodes have been upgraded. The set contains only the old nodes, which no longer exist, resulting in my service being unavailable through my load balancer.

What did you expect to happen?

I would expect the load balancer to be properly updated with the new set of nodes after the upgrade.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a 1.25 cluster with 1.24 nodes.
  2. Deploy a Service of type=LoadBalancer and xTP=Local.
  3. Upgrade the 1.24 nodes to 1.25.
  4. After the upgrade is finished, look at the node list for the load balancer.

Anything else we need to know?

The existing logging is not enough to diagnose the issue. I added some more logs and ran the KCM at log level 5 to find the root cause.

A change was introduced to reduce the number of syncs for xTP=Local services: #109706. With this change, there are situations where xTP=Local services never get updated.

The following is the chain of events.

  1. A node is created or deleted, which causes triggerNodeSync():

AddFunc: func(cur interface{}) {
	s.triggerNodeSync()
},
UpdateFunc: func(old, cur interface{}) {
	oldNode, ok := old.(*v1.Node)
	if !ok {
		return
	}
	curNode, ok := cur.(*v1.Node)
	if !ok {
		return
	}
	if !shouldSyncUpdatedNode(oldNode, curNode) {
		return
	}
	s.triggerNodeSync()
},
DeleteFunc: func(old interface{}) {
	s.triggerNodeSync()
},
},

  2. Inside triggerNodeSync(), the listing from the nodeLister (line 264) is filtered for Ready nodes only, so it does not include the new node (or still contains the deleted node), which means that c.needFullSync = false when line 281 is executed. A minimal sketch of this Ready-only filtering follows the code below.

// triggerNodeSync triggers a nodeSync asynchronously
func (c *Controller) triggerNodeSync() {
	c.nodeSyncLock.Lock()
	defer c.nodeSyncLock.Unlock()
	newHosts, err := listWithPredicates(c.nodeLister, allNodePredicates...)
	if err != nil {
		runtime.HandleError(fmt.Errorf("Failed to retrieve current set of nodes from node lister: %v", err))
		// if node list cannot be retrieve, trigger full node sync to be safe.
		c.needFullSync = true
	} else if !nodeSlicesEqualForLB(newHosts, c.knownHosts) {
		// Here the last known state is recorded as knownHosts. For each
		// LB update, the latest node list is retrieved. This is to prevent
		// a stale set of nodes were used to be update loadbalancers when
		// there are many loadbalancers in the clusters. nodeSyncInternal
		// would be triggered until all loadbalancers are updated to the new state.
		klog.V(2).Infof("Node changes detected, triggering a full node sync on all loadbalancer services")
		c.needFullSync = true
		c.knownHosts = newHosts
	}
	select {
	case c.nodeSyncCh <- struct{}{}:
		klog.V(4).Info("Triggering nodeSync")
		return
	default:
		klog.V(4).Info("A pending nodeSync is already in queue")
		return
	}
}
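
To illustrate the comparison above, here is a minimal, self-contained sketch (using simplified structs rather than the real *v1.Node type, and a stand-in for the upstream Ready predicate) of why a just-created, not-yet-Ready node is invisible to this snapshot and leaves needFullSync unset:

package main

import "fmt"

// node is a simplified stand-in for *v1.Node; only the fields needed for
// this illustration.
type node struct {
	name  string
	ready bool
}

// readyOnly mimics the effect of listing with a Ready predicate, the way
// triggerNodeSync builds newHosts.
func readyOnly(nodes []node) map[string]bool {
	out := map[string]bool{}
	for _, n := range nodes {
		if n.ready {
			out[n.name] = true
		}
	}
	return out
}

func equal(a, b map[string]bool) bool {
	if len(a) != len(b) {
		return false
	}
	for k := range a {
		if !b[k] {
			return false
		}
	}
	return true
}

func main() {
	// Last snapshot, taken while only the old node existed and was Ready.
	knownHosts := readyOnly([]node{{name: "old-node", ready: true}})

	// A new node has just been created but has not reported Ready yet.
	newHosts := readyOnly([]node{
		{name: "old-node", ready: true},
		{name: "new-node", ready: false}, // filtered out by the Ready check
	})

	// The sets compare equal, so needFullSync stays false even though the
	// cluster membership actually changed.
	fmt.Println("needFullSync:", !equal(newHosts, knownHosts)) // prints false
}

The node change only becomes visible to this check once the node reports Ready, which is what produces the later full sync described below.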

  3. Following the chain of functions that are called (across goroutines communicating via nodeSyncCh), we end up at nodeSyncInternal. Because c.needFullSync = false, we only sync services that were marked for retry. If the previous state was good, c.servicesToUpdate has 0 services before entering updateLoadBalancerHosts.

	if !c.needFullSyncAndUnmark() {
		// The set of nodes in the cluster hasn't changed, but we can retry
		// updating any services that we failed to update last time around.
		// It is required to call `c.cache.get()` on each Service in case there was
		// an update event that occurred between retries.
		var servicesToUpdate []*v1.Service
		for key := range c.servicesToUpdate {
			cachedService, exist := c.cache.get(key)
			if !exist {
				klog.Errorf("Service %q should be in the cache but not", key)
				continue
			}
			servicesToUpdate = append(servicesToUpdate, cachedService.state)
		}
		c.servicesToUpdate = c.updateLoadBalancerHosts(ctx, servicesToUpdate, workers)
		return

  4. Nodes are queried again from the nodeLister, but this time the new node (or the removal of the deleted node) is reflected.

func (c *Controller) updateLoadBalancerHosts(ctx context.Context, services []*v1.Service, workers int) (servicesToRetry sets.String) {
	klog.V(4).Infof("Running updateLoadBalancerHosts(len(services)==%d, workers==%d)", len(services), workers)
	// Include all nodes and let nodeSyncService filter and figure out if
	// the update is relevant for the service in question.
	nodes, err := listWithPredicates(c.nodeLister)
	if err != nil {
		runtime.HandleError(fmt.Errorf("failed to retrieve node list: %v", err))
		return serviceKeys(services)
	}
	// lock for servicesToRetry
	servicesToRetry = sets.NewString()
	lock := sync.Mutex{}
	doWork := func(piece int) {
		if shouldRetry := c.nodeSyncService(services[piece], c.lastSyncedNodes, nodes); !shouldRetry {
			return
		}
		lock.Lock()
		defer lock.Unlock()
		key := fmt.Sprintf("%s/%s", services[piece].Namespace, services[piece].Name)
		servicesToRetry.Insert(key)
	}
	workqueue.ParallelizeUntil(ctx, workers, len(services), doWork)
	c.lastSyncedNodes = nodes
	klog.V(4).Infof("Finished updateLoadBalancerHosts")
	return servicesToRetry
}

  5. nodeSyncService is then parallelized based on the length of services. In this case there are no services, so no updates are performed, but on line 808 c.lastSyncedNodes is still set to the nodes found in step 4, as shown in the sketch below.
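
A minimal sketch of this no-op pass, assuming only that workqueue.ParallelizeUntil runs its work function zero times when given zero pieces; the local variables below are hypothetical stand-ins for the controller fields:

package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// No services were marked for retry, so the slice handed to
	// updateLoadBalancerHosts is empty.
	services := []string{}

	// Stand-ins for c.lastSyncedNodes and the fresh, unfiltered listing.
	lastSyncedNodes := []string{"old-node"}
	freshNodes := []string{"new-node"}

	workqueue.ParallelizeUntil(context.TODO(), 5, len(services), func(piece int) {
		// Never executed: there are zero pieces of work.
		fmt.Println("syncing", services[piece])
	})

	// The snapshot is advanced anyway, so the node change is "consumed"
	// without any load balancer ever being updated.
	lastSyncedNodes = freshNodes
	fmt.Println("lastSyncedNodes:", lastSyncedNodes)
}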

A subsequent full sync does not fix things. We go through steps 1 through 5 again, the difference being c.needFullSync=true. This means that in steps 4 and 5, c.servicesToUpdate is not empty, resulting in the following:

  1. Inside nodeSyncService we filter based on predicates for xTP=Local and xTP=Cluster. In the xTP=Local case, since the predicates do not pay attention to Ready status, all of the nodes in c.lastSyncedNodes become the oldNodes. And since no node creations or deletions have occurred since step 5 overwrote c.lastSyncedNodes, the newNodes are the same set. This results in no sync (line 767).

func (c *Controller) nodeSyncService(svc *v1.Service, oldNodes, newNodes []*v1.Node) bool {
	retSuccess := false
	retNeedRetry := true
	if svc == nil || !wantsLoadBalancer(svc) {
		return retSuccess
	}
	newNodes = filterWithPredicates(newNodes, getNodePredicatesForService(svc)...)
	oldNodes = filterWithPredicates(oldNodes, getNodePredicatesForService(svc)...)
	if nodeNames(newNodes).Equal(nodeNames(oldNodes)) {
		return retSuccess
	}
	klog.V(4).Infof("nodeSyncService started for service %s/%s", svc.Namespace, svc.Name)
	if err := c.lockedUpdateLoadBalancerHosts(svc, newNodes); err != nil {
		runtime.HandleError(fmt.Errorf("failed to update load balancer hosts for service %s/%s: %v", svc.Namespace, svc.Name, err))
		return retNeedRetry
	}
	klog.V(4).Infof("nodeSyncService finished successfully for service %s/%s", svc.Namespace, svc.Name)
	return retSuccess
}

In the xTP=Cluster case, node Ready status is used to filter the nodes, so c.lastSyncedNodes would already contain the newly created node, but it would not yet be Ready. This means the node will not be part of oldNodes but will exist in newNodes, which allows the sync to continue as expected. A sketch contrasting the two predicate sets follows.
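
To make the contrast concrete, here is a minimal, self-contained sketch (simplified structs and hand-rolled predicates, not the real v1.Node type or the exact upstream getNodePredicatesForService helpers) showing why the filtered old and new node-name sets compare equal for xTP=Local but differ for xTP=Cluster:

package main

import "fmt"

// node is a simplified stand-in for *v1.Node carrying the state captured at
// snapshot time.
type node struct {
	name  string
	ready bool
}

type predicate func(node) bool

func notExcluded(n node) bool { return true } // placeholder for the exclude-from-LB check
func isReady(n node) bool     { return n.ready }

// predicatesFor roughly models the idea behind getNodePredicatesForService:
// xTP=Local services skip the Ready check (health checks cover readiness),
// xTP=Cluster services include it.
func predicatesFor(etpLocal bool) []predicate {
	if etpLocal {
		return []predicate{notExcluded}
	}
	return []predicate{notExcluded, isReady}
}

func names(nodes []node, preds []predicate) map[string]bool {
	out := map[string]bool{}
outer:
	for _, n := range nodes {
		for _, p := range preds {
			if !p(n) {
				continue outer
			}
		}
		out[n.name] = true
	}
	return out
}

func equal(a, b map[string]bool) bool {
	if len(a) != len(b) {
		return false
	}
	for k := range a {
		if !b[k] {
			return false
		}
	}
	return true
}

func main() {
	// oldNodes: the snapshot recorded in step 5, taken while new-node existed
	// but was still NotReady.
	oldNodes := []node{{"old-node", true}, {"new-node", false}}
	// newNodes: the current listing during the full sync; new-node is Ready now.
	newNodes := []node{{"old-node", true}, {"new-node", true}}

	local := predicatesFor(true)
	cluster := predicatesFor(false)

	// xTP=Local ignores readiness, so both filtered sets are {old-node, new-node}
	// and the sync is skipped.
	fmt.Println("Local needs sync:  ", !equal(names(oldNodes, local), names(newNodes, local)))
	// xTP=Cluster filters on readiness, so new-node is absent from oldNodes but
	// present in newNodes, and the sync proceeds.
	fmt.Println("Cluster needs sync:", !equal(names(oldNodes, cluster), names(newNodes, cluster)))
}

This is why xTP=Cluster services recover on the subsequent full sync while xTP=Local services stay stuck with the stale backend list.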

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2-gke.300", GitCommit:"6f9a8e57036ff71785ef9c90998437413a3a8ff5", GitTreeState:"clean", BuildDate:"2022-09-26T09:26:16Z", GoVersion:"go1.19.1 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Tested on GKE; however, this should theoretically be possible on any 1.25+ cluster.

OS version

N/A

Install tools

N/A

Container runtime (CRI) and version (if applicable)

N/A

Related plugins (CNI, CSI, ...) and versions (if applicable)

N/A

@swetharepakula swetharepakula added the kind/bug Categorizes issue or PR as related to a bug. label Sep 29, 2022
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Sep 29, 2022
@k8s-ci-robot (Contributor):

@swetharepakula: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 29, 2022
@swetharepakula (Member, Author):

/sig network

/cc @thockin @alexanderConstantinescu @bowei @panslava

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 29, 2022
@aojea (Member) commented Sep 29, 2022

/cc

@thockin thockin self-assigned this Sep 29, 2022
@thockin thockin added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on. area/controller-manager kind/regression Categorizes issue or PR as related to a regression from a prior release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 29, 2022
@swetharepakula (Member, Author):

I have tested with the code on master and looked through the changes made since 1.25 was cut, and this issue seems to be fixed there. I believe this issue is specific to 1.25.

@alexanderConstantinescu (Member):

/assign

I filed #112798, which provides an in-depth answer as to why that fixes the problem.

@liggitt (Member) commented Oct 6, 2022

with #112807 merged, is this resolved?

@alexanderConstantinescu (Member):

Yes, it should be. I think this was verified by @swetharepakula last Friday?

@swetharepakula (Member, Author):

This is resolved with #112807

/close

@k8s-ci-robot (Contributor):

@swetharepakula: Closing this issue.

In response to this:

This is resolved with #112807

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

lobziik added a commit to openshift/cloud-provider-aws that referenced this issue Oct 13, 2022
…sion

Service handling logic in k8s.io/cloud-provider v0.25.(0,1,2)
is broken. This issue was fixed upstream, but the fix has not been
released yet. We expect the fix to be included in v0.25.3, but for
now we need to replace the released `cloud-provider` library version
with the latest version from the `release-1.25` branch.

Xref: kubernetes/kubernetes#112793

This commit is expected to be dropped during next rebase.
@JoelSpeed (Contributor):

@alexanderConstantinescu The revert that you put up in #112807 went directly into the release-1.25 branch. I believe I am seeing the same issue again on release-1.26 and release-1.27. Did a revert or fix go up to the master branch that I've missed?

@alexanderConstantinescu (Member):

@JoelSpeed: There was no revert PR needed on master, because master (at that time) didn't have the bug. This is the PR I filed which attempted to fix the issue on 1.25: #112798 but which later got superseded by the revert PR. It explains the issue in greater depth.

What issue are you seeing?

@JoelSpeed (Contributor):

Observing on Azure, with the in-tree cloud provider in 1.27 (this could be a bug directly in the Azure code to be fair but wanted to rule this bug out too)

  • KCM boots up on new cluster
  • New worker nodes are added to the cluster, CSRs approved etc
  • Workers do not get added to the LB on first pass because they are not ready (Azure specific? We are using stableNodeSetPredicates)
  • Workers eventually go ready, but EnsureLoadBalancer never called again

The symptoms look the same as this bug, but I'm guessing that, now that we are using stableNodeSetPredicates, the Azure provider cannot exclude nodes based on their readiness, since the service controller will no longer tell it to update the nodes in the load balancer for an eTP: Local service?

@JoelSpeed (Contributor):

I think I found the issue, kubernetes-sigs/cloud-provider-azure#4230

@liggitt (Member) commented Sep 9, 2023

@JoelSpeed: There was no revert PR needed on master, because master (at that time) didn't have the bug. This is the PR I filed which attempted to fix the issue on 1.25: #112798 but which later got superseded by the revert PR. It explains the issue in greater depth.

The revert PR (#112807) only landed in the release-1.25 branch, and #112798 never merged... what resolved this issue in master and 1.26+?

Can you link to the later PRs in master that fixed the issue?

@alexanderConstantinescu (Member) commented Sep 11, 2023

what resolved this issue in master and 1.26+?

This bug was fixed on 1.26 during the feature cycle. Here's a timeline of events:

  1. [1.25 feature cycle]: PR Avoid re-syncing LBs for ETP=local services #109706 merges on 1.25 with a bug
  2. [1.26 feature cycle]: PR [CCM - service controller] Clean up node sync and fix re-sync of failed services.  #111663 merges on 1.26 with the fix to said bug (the bug was not known at the time, and this PR was originally created as a cleanup but ended up also fixing the bug)
  3. [1.25 code freeze cycle]: PR Revert 109706 + 111691 #112807 merges on 1.25, reverting 1.

I.e., we could have applied #112798 in step 3, but chose to revert instead.

Let me know if this is not clear?
