New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supply denominators #110164
Supply denominators #110164
Conversation
42f5193
to
69f8912
Compare
/retest |
69f8912
to
adf6b24
Compare
/retest |
/triage accepted |
/assign @dgrisonnet |
adf6b24
to
360d30d
Compare
/retest |
staging/src/k8s.io/apiserver/pkg/util/flowcontrol/metrics/timing_ratio_histogram.go
Outdated
Show resolved
Hide resolved
staging/src/k8s.io/apiserver/pkg/util/flowcontrol/metrics/timing_ratio_histogram_test.go
Outdated
Show resolved
Hide resolved
408a042
to
7568427
Compare
The latest force-push is a rebase onto |
@@ -162,6 +164,9 @@ type configController struct { | |||
// name to the state for that level. Every name referenced from a | |||
// member of `flowSchemas` has an entry here. | |||
priorityLevelStates map[string]*priorityLevelState | |||
|
|||
// totalLimitConsumers are to be kept appraised of the overall limits | |||
totalLimitConsumers []func(maxWaitingRequests, maxExecutingRequests int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm still struggling with understand why we need to complicate the code so much for metrics.
What I would do is simply in lines 546-548 hardcode the metrics (which are P&F metrics anyway) and do something like:
fcmetrics.ReadWriteConcurrencyGaugeVec.NewForLabelValuesSafe(0, 1, []string{fcmetrics.LabelValueWaiting, epmetrics.ReadOnlyKind}),SetDenominator(meal.MaxWaiting)
fcmetrics.ReadWriteConcurrencyGaugeVec.NewForLabelValuesSafe(0, 1, []string{fcmetrics.LabelValueWaiting, epmetrics.MutatingKind}).SetDenominator(meal.MaxWaiting)
(and same for mutating)
I think that passing thing around, all those fields etc. is making this code very hard to follow...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that lines 546--548 in apf_controller.go need to do what is primarily an up-call, to the higher layer in priority-and-fairness.go that updates the watermark data structures there and in maxinflight.go. You have proposed a solution that folds in the knowledge of how those watermark data structures are composed, lessening the modularity. That is not a deal-breaker, but something to consider.
I am struggling to remember why apf_filter.go:Interface allows registration of a multiplicity of request limit consumers; I think only one is needed. That would itself provide some simplification.
Note that every call to NewForLabelValuesSafe returns a distinct object --- with its own denominator variable. For this reason, the replacement you suggested would not have the desired effect.
I will think about ways to simplify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vector structure there is not really needed. We statically know that it has exactly four members. We can replace the vector with four individuals; that will eliminate the calls to NewForLabelValuesSafe with their attendant new object every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, wait, we need to use a vector so that all four time series show up as one metric with different label sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised, using a combination of the not-incorrect ideas above.
9512838
to
5f97b8c
Compare
) | ||
|
||
// Gaue of number of readonly requests waiting / limit on those. Can be used directly after GetReadOnlyWaitingConcurrency() | ||
var ReadOnlyWaitingConcurrency RatioedGauge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make those vars private (we want people to use Get* methods to ensure those get initialized correctly before).
}) | ||
return MutatingExecutingConcurrency | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And please squash the commits - this looks great now!
Thank you Mike! In my opinion this looks much better now and it's much easier to follow! |
@@ -78,6 +76,9 @@ func WithPriorityAndFairness( | |||
klog.Warningf("priority and fairness support not found, skipping") | |||
return handler | |||
} | |||
waitingMark.readOnlyObserver = fcmetrics.GetReadOnlyWaitingConcurrency() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be causing a race in our tests:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/110164/pull-kubernetes-unit/1550364149416988672
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I did not think that writes of the same value would be a problem. Easily fixed...
c9d6c72
to
d37e1b5
Compare
/retest |
/test pull-kubernetes-unit |
}, observationMaintenancePeriod, stopCh) | ||
var initMaxinflightOnce sync.Once | ||
|
||
func initMaxinflight(nonMutatingLimit, mutatingLimit int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for consistency with other places in this file: initMaxInFlight
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
watermark.readOnlyObserver.SetDenominator(float64(nonMutatingLimit)) | ||
klog.V(2).InfoS("Set denominator for readonly requests", "limit", nonMutatingLimit) | ||
} else { | ||
klog.V(2).InfoS("Did not set denominator for readonly requests") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest removing this - the lack of log is also an information. (Same below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
initMaxinflight(nonMutatingLimit, mutatingLimit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that synchronization on once
is relatively cheap, but still I think we should avoid that in the handler function.
Why not calling it directly from WithMaxInFlightLimit - all metrics should be initialized at this point anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure of that.
I was not confident that this piece of code should rely on that knowledge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -79,6 +80,13 @@ func WithPriorityAndFairness( | |||
return handler | |||
} | |||
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | |||
initAPFOnce.Do(func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for max-in-flight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
d37e1b5
to
631c695
Compare
Co-authored-by: JUN YANG <yang.jun22@zte.com.cn>
631c695
to
fdd921c
Compare
The force-pushes to 631c6950eae and fdd921c make the changes suggested above and add logging of whether the efficient or inefficient case happened. |
/lgtm Thanks! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: MikeSpreitzer, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes the metric denominator problems noted in issues #109846 and #110160 .
There is more APF metrics cleanup to follow in later PRs.
Which issue(s) this PR fixes:
Fixes #109846
Fixes #110160
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/sig api-machinery
/sig instrumentation
/cc @dgrisonnet
/cc @wojtek-t
@tkashem
@lavalamp