New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timing ratio histograms #110104
Add timing ratio histograms #110104
Conversation
da19f27
to
7523cd9
Compare
Here are the current benchmarking results. (base) mspreitz@mjs12 metrics % go test -benchmem -run=^$ -bench Histogram .
goos: darwin
goarch: amd64
pkg: k8s.io/apiserver/pkg/util/flowcontrol/metrics
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSampleAndWaterMarkHistogramsVecEltSimple-16 1000000 1230 ns/op 0 B/op 0 allocs/op
BenchmarkSampleAndWaterMarkHistogramsVecEltSafeEarly-16 987891 1230 ns/op 0 B/op 0 allocs/op
BenchmarkSampleAndWaterMarkHistogramsVecEltSafeLate-16 982856 1200 ns/op 0 B/op 0 allocs/op
BenchmarkTimingRatioHistogram-16 13167859 90.70 ns/op 0 B/op 0 allocs/op
BenchmarkTimingRatioHistogramVecEltSimple-16 13016456 89.85 ns/op 0 B/op 0 allocs/op
BenchmarkTimingRatioHistogramVecEltSafeEarly-16 7515996 160.1 ns/op 0 B/op 0 allocs/op
BenchmarkTimingRatioHistogramVecEltSafeLate-16 13404110 89.83 ns/op 0 B/op 0 allocs/op
PASS
ok k8s.io/apiserver/pkg/util/flowcontrol/metrics 9.665s |
7523cd9
to
943a6f8
Compare
63cb7f1
to
b67ae76
Compare
The next rev should
|
b67ae76
to
ca49068
Compare
/retest |
/triage accepted |
/triage accepted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have two small nits - other than that LGTM.
But I would like to see an LGTM from @dgrisonnet on the actual metrics changes.
@dgrisonnet - can you please take a look?
} | ||
underMember.Set(initialNumerator / initialDenominator) | ||
return &timingRatioHistogramInner{ | ||
getGaugeOfRatio: func() Gauge { return underMember }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation (there seem to be two tabs instead of one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github makes it hard (but not impossible) to select the initial tab on each line. Line 180 starts with one tab, line 181 starts with three. This is what gofmt
produces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's even inconsistent with line 205-209...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I just made the other change you suggested, and that made gofmt put one less tab on these lines.
numerator: initialNumerator, | ||
denominator: initialDenominator, | ||
}, | ||
nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move to the previous line:
...
}, nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a force-push that only made this change and re-run gofmt.
57aaa6d
to
a595f43
Compare
LGTM, but I want an LGTM from sig-instrumention folks before approving. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
(from the sig-instrumentation side of things)
I'm okay with the metrics changes here, since they are alpha metrics.
@@ -0,0 +1,223 @@ | |||
/* | |||
Copyright 2019 The Kubernetes Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong year.
// That is the registerable aspect of the underlying TimingHistogram. | ||
compbasemetrics.Registerable | ||
|
||
// timingRatioHistogramInner implements the RatioedObsserver aspect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/RatioedObsserver/RatioedObserver/
var errMetricNotFound = errors.New("not found") | ||
|
||
func TestTimingRatioHistogramVecElementSimple(t *testing.T) { | ||
samplesHistName := "tro_test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can basically be a constant, the way it is used in the tests.
a595f43
to
220ee8a
Compare
The latest force-push makes the changes suggested by @logicalhan . |
Following is the result of benchmarking the two kinds of histograms on my laptop. % go test -benchmem -run=^$ -bench Histogram . goos: darwin goarch: amd64 pkg: k8s.io/apiserver/pkg/util/flowcontrol/metrics cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz BenchmarkSampleAndWaterMarkHistogramsVecEltSafeEarly-16 980143 1230 ns/op 0 B/op 0 allocs/op BenchmarkSampleAndWaterMarkHistogramsVecEltSafeLate-16 932380 1216 ns/op 0 B/op 0 allocs/op BenchmarkTimingRatioHistogram-16 12665247 94.13 ns/op 0 B/op 0 allocs/op BenchmarkTimingRatioHistogramVecElementSimple-16 11015806 100.4 ns/op 0 B/op 0 allocs/op BenchmarkTimingRatioHistogramVecElementSafeEarly-16 7142589 172.0 ns/op 0 B/op 0 allocs/op BenchmarkTimingRatioHistogramVecElementSafeLate-16 11487517 96.67 ns/op 0 B/op 0 allocs/op PASS ok k8s.io/apiserver/pkg/util/flowcontrol/metrics 7.718s
220ee8a
to
0c0b7ca
Compare
And the latest force-push changes the timing ratio histogram test funcs to each use a distinct name for the test histogram, for additional log clarity and possibly correctness in parallel testing. |
Thanks @logicalhan - lgtming with your LGTM from the sig-instrumentation POV. /lgtm /retest |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: MikeSpreitzer, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
@MikeSpreitzer: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR adds timing ratio histograms, as an alternative to sample-and-watermark histograms,
and switches API Priority and Fairness metrics to use the timing ratio histograms.
Which issue(s) this PR fixes:
This partially addresses #108272
Special notes for your reviewer:
This is part of the campaign to replace sample-and-watermark histograms.
I am breaking #109742 into pieces for easier reviewing.
This is most of the middle piece; PRs #110515 and #110516, both now merged, are the rest of the middle piece.
There are more PRs to come; #110164 is next in line, and there will also be PRs that replace repeated indexing with element caching (as seen in #109742 and #110154) and do other cleanup.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/sig api-machinery
/cc @wojtek-t
/cc @tkashem
@deads2k
@lavalamp