Deflake a preemption test that may patch Node incorrectly #114350
Please note that we're already in Test Freeze for the … Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Wed Dec 7 17:58:56 UTC 2022.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Huang-Wei. The full list of commands accepted by this bot can be found here. The pull request process is described here.
@Huang-Wei: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc |
/lgtm
Also, I've reproduced the issue on my local kind cluster by patching the quantity to 5 just before the test and running the test in a loop. With the proposed changes, the test passes consistently.
@aojea |
@Huang-Wei thanks a lot for investigating and proposing the PR |
This will have to wait for 1.26.1; we'll merge in 1.27 and backport later.
/milestone 1.27
@aojea: The provided milestone is not valid for this repository.
@aojea please correct the milestone assignment as the robot suggests 😄 |
Now I understand why the existing tests were passing: initially the pods are fully unschedulable because …
/lgtm
/milestone v1.27 |
/retest |
@kubernetes/release-managers |
yes, cherry-pick please |
…50-upstream-release-1.26 Automated cherry pick of #114350: Deflake a preemption test that may patch Node incorrectly
What type of PR is this?
/kind failing-test
/kind flake
/sig scheduling
What this PR does / why we need it:
In the e2e-test grid, `[sig-scheduling] SchedulerPreemption [Serial] validates pod disruption condition is added to the preempted pod` is flaky. We had a discussion in Slack: https://kubernetes.slack.com/archives/C09TP78DV/p1670410754287509.
The preemptor was expected to be pending because the only unit of the extended resource was occupied by a low-priority pod, but the log shows the preemptor going straight to running.
A suspicious clue is that the e2e test patches the Node by updating only `capacity` (instead of both `capacity` and `allocatable`). This works most of the time because kubelet asynchronously syncs the `capacity` value to `allocatable`. But we'd better not count on that.
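The difference between the two patch shapes can be sketched as follows, using only the Go standard library. The helper names, the `example.com/fake-resource` resource name, and the merge-patch layout are assumptions for illustration; the actual e2e helper may build its patch differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceList maps a resource name to a quantity string, mirroring the
// shape of the capacity and allocatable fields in a Node's status.
type resourceList map[string]string

// nodeStatusPatch is a hypothetical helper type for a merge patch against
// the Node status; empty maps are omitted from the JSON output.
type nodeStatusPatch struct {
	Status struct {
		Capacity    resourceList `json:"capacity,omitempty"`
		Allocatable resourceList `json:"allocatable,omitempty"`
	} `json:"status"`
}

// capacityOnlyPatch sets only status.capacity and leaves allocatable to be
// synced later by kubelet (the behavior the test used to depend on).
func capacityOnlyPatch(name, qty string) ([]byte, error) {
	var p nodeStatusPatch
	p.Status.Capacity = resourceList{name: qty}
	return json.Marshal(p)
}

// capacityAndAllocatablePatch sets both fields in one patch, so nothing
// depends on the asynchronous sync.
func capacityAndAllocatablePatch(name, qty string) ([]byte, error) {
	var p nodeStatusPatch
	p.Status.Capacity = resourceList{name: qty}
	p.Status.Allocatable = resourceList{name: qty}
	return json.Marshal(p)
}

func main() {
	const res = "example.com/fake-resource" // hypothetical extended resource name

	only, err := capacityOnlyPatch(res, "1")
	if err != nil {
		panic(err)
	}
	both, err := capacityAndAllocatablePatch(res, "1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(only))
	fmt.Println(string(both))
}
```

With only `capacity` in the patch, the scheduler may keep seeing a stale `allocatable` value until kubelet catches up; sending both fields in the same patch closes that window.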
So this PR enforces setting both `capacity` and `allocatable` in the e2e test when we want to update a Node's resource capacity.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: