New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Promote job metrics #113010
Promote job metrics #113010
Conversation
d2402e3
to
7bc6d30
Compare
/triage accepted |
7bc6d30
to
693f47b
Compare
This PR may require stable metrics review. Stable metrics are guaranteed to not change. Please review the documentation for the requirements and lifecycle of stable metrics and ensure that your metrics meet these guidelines. |
/assign @dgrisonnet |
@@ -37,7 +37,7 @@ var ( | |||
Subsystem: JobControllerSubsystem, | |||
Name: "job_sync_duration_seconds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a new metric without the job_
prefix and deprecate this one. Same for others below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while I agree that the job
prefix feels a bit redundant here since it is implied by the subsystem, metrics such as job_controller_finished_total
would not be very explicit. So, I am fine with keeping this prefix on some of the metrics that need to be explicitly scoped.
@@ -51,7 +51,7 @@ var ( | |||
Subsystem: JobControllerSubsystem, | |||
Name: "job_sync_total", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counters generally have the object they are counting in the plural form to denote that it accumulated multiple data, so I would recommend renaming this metric to syncs_total
.
@@ -37,7 +37,7 @@ var ( | |||
Subsystem: JobControllerSubsystem, | |||
Name: "job_sync_duration_seconds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while I agree that the job
prefix feels a bit redundant here since it is implied by the subsystem, metrics such as job_controller_finished_total
would not be very explicit. So, I am fine with keeping this prefix on some of the metrics that need to be explicitly scoped.
@@ -64,7 +64,7 @@ var ( | |||
Subsystem: JobControllerSubsystem, | |||
Name: "job_finished_total", | |||
Help: "The number of finished job", | |||
StabilityLevel: metrics.ALPHA, | |||
StabilityLevel: metrics.STABLE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
following @dgrisonnet's advice, this should be jobs_finished_total
?
Name: "job_pods_finished_total", | ||
Help: "The number of finished Pods that are fully tracked", | ||
Subsystem: JobControllerSubsystem, | ||
Name: "job_pods_finished_total", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would job_controller_pods_finished_total
be enough? or shall we keep the jobs_
prefix @dgrisonnet
693f47b
to
5d721b5
Compare
@@ -37,7 +37,7 @@ var ( | |||
Subsystem: JobControllerSubsystem, | |||
Name: "job_sync_duration_seconds", | |||
Help: "The time it took to sync a job", | |||
StabilityLevel: metrics.ALPHA, | |||
StabilityLevel: metrics.STABLE, | |||
Buckets: metrics.ExponentialBuckets(0.001, 2, 15), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self:
At 100 qps, creating 500 pods should take 5s. The max bucket is 16, so these buckets should give us enough signal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Name: "jobs_pods_finished_total", | ||
Help: "The number of finished Pods that are fully tracked", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
Name: "jobs_pods_finished_total", | |
Help: "The number of finished Pods that are fully tracked", | |
Name: "pods_finished_total", | |
Help: "The number of finished Pods that are fully tracked in Job statuses", |
/hold cancel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, I meant
/lgtm cancel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefixing with job
is also fine with me since it adds a bit of context, but we can also remove it, both make sense in my opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we can leave it as is to avoid disturbing existing users of this metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me, I would just change to singular pod
since it is used as a way to scope where the pods are coming from rather than as a quantitative object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean singular job? We are counting pods.
job_pods_finished_total
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, I totally meant singular job
Name: "job_finished_total", | ||
Help: "The number of finished job", | ||
StabilityLevel: metrics.ALPHA, | ||
Name: "jobs_finished_total", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with the terminology used in the status API (https://github.com/kubernetes/api/blob/v0.25.3/batch/v1/types.go#L326-L401), would it make sense to rename this to jobs_completed_total
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because it includes both Complete and Failed jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok, thank you for clarifying :)
Name: "jobs_pods_finished_total", | ||
Help: "The number of finished Pods that are fully tracked", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefixing with job
is also fine with me since it adds a bit of context, but we can also remove it, both make sense in my opinion
5d721b5
to
39d9981
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgrisonnet, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind cleanup
/sig apps
What this PR does / why we need it:
This should've been promoted along with the new cronjob controller, as part of kubernetes/enhancements#2214
Special notes for your reviewer:
/assign @alculquicondor
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: