fix bugs of container cpu shares when cpu request set to zero #108832
Conversation
Hi @waynepeking348. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@andrewsykim @odinuge could you please take a look at this fix? Thanks!
/ok-to-test
/triage accepted
@yangjunmyfm192085 hi~ those checks have all passed now, could you please help merge this PR?
Thanks for creating this PR @waynepeking348!
This fixes a problem where a user could set a "sky high" cpu limit, then explicitly set 0
as the request, and starve all other workloads!
I think we should backport this one as well, since it fixes a real problem that can cause system instability!
/priority important-longterm
Except for the minor nit:
/lgtm
I think the changelog entry should be rephrased to something like the following (though English is not my native language, so I'm happy for others to phrase it better):
Fix relative cpu priority for pods where containers explicitly request zero cpu by giving the lowest priority instead of falling back to the cpu limit to avoid possible cpu starvation of other pods
@@ -138,7 +142,7 @@ func (m *kubeGenericRuntimeManager) calculateLinuxResources(cpuRequest, cpuLimit
 	// If request is not specified, but limit is, we want request to default to limit.
 	// API server does this for new containers, but we repeat this logic in Kubelet
 	// for containers running on existing Kubernetes clusters.
-	if cpuRequest.IsZero() && !cpuLimit.IsZero() {
+	if cpuRequest == nil {
Nit: I think we should either keep the && !cpuLimit.IsZero()
to stay in line with the comment above, or rewrite that comment to make the behavior explicitly clear in case someone comes along and tries to rewrite it.
This change is a no-op, but I think I prefer that one.
-	if cpuRequest == nil {
+	if cpuRequest == nil && !cpuLimit.IsZero() {
Since you engaged on the old PR:
/retest
@yangjunmyfm192085 @odinuge @ehashman hi~ I have changed the code to be in line with the comment,
and the changelog has also been updated according to @odinuge's suggestion. I think I need both the /lgtm and /approve labels to get it merged, thx~
/lgtm
@odinuge I think I still need an approved label to get it merged
/cc @mrunalp
/assign @mrunalp
hi, I think @mrunalp may have missed this. I have tried to contact him on Slack, but got no reply either. Can anyone help merge this PR, please?
/assign @derekwaynecarr @klueska @yujuhong
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mrunalp, odinuge, waynepeking348. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/milestone v1.25
@waynepeking348: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Milestone Maintainers Team and have them propose you as an additional delegate for this responsibility. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What type of PR is this?
/kind bug
What this PR does / why we need it:
For container resources, if limits are specified but requests are not, the API server defaults the requests to the limits, and the kubelet repeats this logic for containers running on existing Kubernetes clusters. However, this defaulting is applied incorrectly for cpu.
In fact, if the cpu request is explicitly set to zero (rather than left unspecified), the kubelet also treats it as equal to the limit, and sets the cpu shares according to the limit instead of to 2 (minShares).
This is unexpected. Suppose we have two containers, one with requests set as:
and the other with requests set as:
The cpu shares (in the container-level cgroup) for the first one would be set to 2048 and for the latter to 4096,
but the expected cpu shares are 2048 and 2 respectively.
Special notes for your reviewer:
I had a previous PR, #100986, but it's too out of date, so I opened a new one rebased on the current master.
Does this PR introduce a user-facing change?: