
fix error type when handling failures in scheduler #111999

Merged

kerthcet merged 3 commits into kubernetes:master from kerthcet:refactor/handle-failure on Aug 30, 2022

Conversation

@kerthcet (Member) commented Aug 24, 2022

Signed-off-by: kerthcet <kerthcet@gmail.com>

What type of PR is this?

/kind bug
/sig scheduling

What this PR does / why we need it:

When we report metrics about a scheduling error, we should also update the pod status with SchedulerError, to keep the two consistent.
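To make the intent concrete, here is a compressed sketch of the change (not the literal diff; the real schedulingCycle handles each failure case separately, and the FitError check shown here is an abridged stand-in):

// Inside the scheduler's schedulingCycle (sketch): classify the failure
// once, then use the same reason for both the metric and the pod status.
reason := v1.PodReasonUnschedulable
if _, isFitError := err.(*framework.FitError); !isFitError {
	// An internal error, not a "pod doesn't fit anywhere" outcome.
	metrics.PodScheduleError(fwk.ProfileName(), metrics.SinceInSeconds(start))
	reason = SchedulerError
}
sched.FailureHandler(ctx, fwk, podInfo, err, reason, nominatingInfo)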

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Pods that fail scheduling due to an expected error will have their status updated with the reason "SchedulerError"
rather than "Unschedulable".

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Signed-off-by: kerthcet <kerthcet@gmail.com>
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 24, 2022
@k8s-ci-robot (Contributor)

@kerthcet: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 24, 2022
@kerthcet (Member, Author)

/retest

@sanposhiho (Member)

I haven't investigated deeply, but I think this test failure is related to the change, not flakiness.
The test failed here while waiting for the pod condition's reason to become PodReasonUnschedulable. Seems related.

cond.Reason == v1.PodReasonUnschedulable && pod.Spec.NodeName == "", nil
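For readers without the test open, the quoted condition sits inside a polling helper along these lines (a paraphrased sketch; the helper name and location are assumptions):

// Sketch of the integration-test wait condition being referenced: the test
// polls until the pod is marked unschedulable and is still unassigned.
func podUnschedulable(c clientset.Interface, ns, name string) wait.ConditionFunc {
	return func() (bool, error) {
		pod, err := c.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		_, cond := podutil.GetPodCondition(&pod.Status, v1.PodScheduled)
		return cond != nil && cond.Status == v1.ConditionFalse &&
			cond.Reason == v1.PodReasonUnschedulable && pod.Spec.NodeName == "", nil
	}
}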

@kerthcet (Member, Author)

Thanks @sanposhiho, I'll dig into that.

Signed-off-by: kerthcet <kerthcet@gmail.com>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 25, 2022
@kerthcet (Member, Author)

/test pull-kubernetes-integration

Signed-off-by: kerthcet <kerthcet@gmail.com>
@kerthcet (Member, Author)

/retest

 }
-sched.FailureHandler(ctx, fwk, podInfo, err, v1.PodReasonUnschedulable, nominatingInfo)
+sched.FailureHandler(ctx, fwk, podInfo, err, reason, nominatingInfo)
@kerthcet (Member, Author)

This is a breaking change, since we'll change the reason used when updating the pod's status. I have no idea whether the previous behavior was intended, but IMO we should keep the metrics and the reason consistent. cc @Huang-Wei @alculquicondor @ahg-g

Member

Do we consider reason values part of the API after https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures?

Does CA rely on this value? I am guessing changing this to a different value is not an issue for CA, since we shouldn't be scaling up because of a scheduling error (and CA will not scale up anyway if the pod indeed fits).

Member

I guess it wouldn't break CA. Its logic for finding unschedulable pods uses a field selector like spec.nodeName==,status.phase!=Succeeded,status.phase!=Failed:

https://github.com/kubernetes/autoscaler/blob/81d70f94ad2094b2a20f1c8e52d15aff8df7bac1/cluster-autoscaler/utils/kubernetes/listers.go#L183-L194
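For context, the linked lister builds that server-side filter roughly like this (a sketch of the linked lines, not a verbatim copy):

// Sketch of CA's first-pass filter: list pods that are unassigned and
// neither Succeeded nor Failed, entirely via a field selector.
selector := fields.ParseSelectorOrDie(
	"spec.nodeName==,status.phase!=" + string(apiv1.PodSucceeded) +
		",status.phase!=" + string(apiv1.PodFailed))
listOptions := metav1.ListOptions{FieldSelector: selector.String()}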

However, in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-scale-up-work, it says:

Whenever a Kubernetes scheduler fails to find a place to run a pod, it sets "schedulable" PodCondition to false and reason to "unschedulable".

So cc @x13n for confirmation. (TL;DR: we want to figure out whether the scheduler updating a pod with the internal SchedulerError reason would break CA or not.)

@kerthcet (Member, Author)

Do we consider reason values part of the API after https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures?

From the description https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures#using-pod-statusreason-field, the reason field is not considered, as it's arbitrary and hard to track across versions. cc @alculquicondor @mimowo for confirmation.

@x13n (Member) commented Aug 29, 2022

CA actually filters pods by reason client-side: https://github.com/kubernetes/autoscaler/blob/81d70f94ad2094b2a20f1c8e52d15aff8df7bac1/cluster-autoscaler/utils/kubernetes/listers.go#L168-L173

When do we expect to get SchedulerError instead of PodReasonUnschedulable? Any error returned by some plugin? If so, CA probably shouldn't attempt to find a place for such a pod; there's no way to tell whether the error can be addressed by adding more capacity to the cluster.
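The client-side filter being referenced looks roughly like this (a sketch of the linked lines, not a verbatim copy; candidates and unschedulablePods are sketch variables):

// Sketch: CA keeps only pods whose PodScheduled condition is False with
// reason "Unschedulable". A pod whose condition carries "SchedulerError"
// would fall through this check and be ignored for scale-up.
for _, pod := range candidates {
	_, cond := podutil.GetPodCondition(&pod.Status, apiv1.PodScheduled)
	if cond != nil && cond.Status == apiv1.ConditionFalse &&
		cond.Reason == "Unschedulable" {
		unschedulablePods = append(unschedulablePods, pod)
	}
}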

Member

If the reason is in the v1 package, it's considered part of the API. We don't filter based on it for Job failure policy, but that's a specific feature. In any case, we should be good to add new values.

@x13n an error is expected in the cases where apiserver is unavailable (this is already the case), or there are parsing errors #111791 (here we are currently returning Unschedulable). So yes, in general CA should skip these pods. This means that CA is not skipping these errors, which is actually a bug.

@alculquicondor (Member) left a comment

/lgtm
/approve
/hold on @x13n to confirm the current understanding


@@ -150,8 +151,9 @@ func (sched *Scheduler) schedulingCycle(ctx context.Context, state *framework.Cy
 nominatingInfo = clearNominatedNode
 klog.ErrorS(err, "Error selecting node for pod", "pod", klog.KObj(pod))
 metrics.PodScheduleError(fwk.ProfileName(), metrics.SinceInSeconds(start))
+reason = SchedulerError
Member

we should probably move it to the v1 package.

But I would do it in a separate PR so that we can cherry-pick this as is.

@kerthcet (Member, Author)

Yes, I was planning to.
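A sketch of what that follow-up could look like (hypothetical at the time of this thread; the constant name and comment are assumptions, mirroring the existing PodReasonUnschedulable):

// staging/src/k8s.io/api/core/v1/types.go (hypothetical follow-up):
const (
	// PodReasonSchedulerError reason in the PodScheduled PodCondition would
	// mean the scheduler hit an internal error while scheduling the pod.
	PodReasonSchedulerError = "SchedulerError"
)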

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 29, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 29, 2022
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, kerthcet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 29, 2022
@x13n (Member) commented Aug 30, 2022

Got it. In that case I think it should be safe to merge this.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 30, 2022
@kerthcet (Member, Author)

/retest

@k8s-ci-robot k8s-ci-robot merged commit 04f8a5c into kubernetes:master Aug 30, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Aug 30, 2022
@kerthcet kerthcet deleted the refactor/handle-failure branch August 30, 2022 13:51
k8s-ci-robot added a commit that referenced this pull request Sep 7, 2022
…1999-upstream-release-1.23

Automated cherry pick of #111999: fix error type
k8s-ci-robot added a commit that referenced this pull request Sep 7, 2022
…1999-upstream-release-1.24

Automated cherry pick of #111999: fix error type
k8s-ci-robot added a commit that referenced this pull request Sep 7, 2022
…1999-upstream-release-1.25

Automated cherry pick of #111999: fix error type