Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote job metrics #113010

Merged
merged 1 commit into from Nov 8, 2022
Merged

Conversation

soltysh
Copy link
Contributor

@soltysh soltysh commented Oct 12, 2022

What type of PR is this?

/kind cleanup
/sig apps

What this PR does / why we need it:

This should've been promoted along with the new cronjob controller, as part of kubernetes/enhancements#2214

Special notes for your reviewer:

/assign @alculquicondor

Does this PR introduce a user-facing change?

Promote  job-related metrics to stable to follow IndexedJobs GA, the following metrics had their name updated to match metrics API guidelines:
- job_sync_total -> job_syncs_total
- job_finished_total -> jobs_finished_total

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/2214

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/apps Categorizes an issue or PR as relevant to SIG Apps. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 12, 2022
@soltysh
Copy link
Contributor Author

soltysh commented Oct 12, 2022

/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 12, 2022
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/stable-metrics Issues or PRs involving stable metrics sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 12, 2022
@k8s-triage-robot
Copy link

This PR may require stable metrics review.

Stable metrics are guaranteed to not change. Please review the documentation for the requirements and lifecycle of stable metrics and ensure that your metrics meet these guidelines.

@soltysh
Copy link
Contributor Author

soltysh commented Oct 12, 2022

/assign @dgrisonnet
for metrics part

@@ -37,7 +37,7 @@ var (
Subsystem: JobControllerSubsystem,
Name: "job_sync_duration_seconds",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a new metric without the job_ prefix and deprecate this one. Same for others below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while I agree that the job prefix feels a bit redundant here since it is implied by the subsystem, metrics such as job_controller_finished_total would not be very explicit. So, I am fine with keeping this prefix on some of the metrics that need to be explicitly scoped.

@@ -51,7 +51,7 @@ var (
Subsystem: JobControllerSubsystem,
Name: "job_sync_total",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counters generally have the object they are counting in the plural form to denote that it accumulated multiple data, so I would recommend renaming this metric to syncs_total.

@@ -37,7 +37,7 @@ var (
Subsystem: JobControllerSubsystem,
Name: "job_sync_duration_seconds",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while I agree that the job prefix feels a bit redundant here since it is implied by the subsystem, metrics such as job_controller_finished_total would not be very explicit. So, I am fine with keeping this prefix on some of the metrics that need to be explicitly scoped.

@@ -64,7 +64,7 @@ var (
Subsystem: JobControllerSubsystem,
Name: "job_finished_total",
Help: "The number of finished job",
StabilityLevel: metrics.ALPHA,
StabilityLevel: metrics.STABLE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following @dgrisonnet's advice, this should be jobs_finished_total?

Name: "job_pods_finished_total",
Help: "The number of finished Pods that are fully tracked",
Subsystem: JobControllerSubsystem,
Name: "job_pods_finished_total",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would job_controller_pods_finished_total be enough? or shall we keep the jobs_ prefix @dgrisonnet

@@ -37,7 +37,7 @@ var (
Subsystem: JobControllerSubsystem,
Name: "job_sync_duration_seconds",
Help: "The time it took to sync a job",
StabilityLevel: metrics.ALPHA,
StabilityLevel: metrics.STABLE,
Buckets: metrics.ExponentialBuckets(0.001, 2, 15),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self:

At 100 qps, creating 500 pods should take 5s. The max bucket is 16, so these buckets should give us enough signal.

Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 4, 2022
Comment on lines 84 to 85
Name: "jobs_pods_finished_total",
Help: "The number of finished Pods that are fully tracked",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
Name: "jobs_pods_finished_total",
Help: "The number of finished Pods that are fully tracked",
Name: "pods_finished_total",
Help: "The number of finished Pods that are fully tracked in Job statuses",

/hold cancel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, I meant
/lgtm cancel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefixing with job is also fine with me since it adds a bit of context, but we can also remove it, both make sense in my opinion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we can leave it as is to avoid disturbing existing users of this metric.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me, I would just change to singular pod since it is used as a way to scope where the pods are coming from rather than as a quantitative object

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean singular job? We are counting pods.

job_pods_finished_total?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, I totally meant singular job

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 4, 2022
Name: "job_finished_total",
Help: "The number of finished job",
StabilityLevel: metrics.ALPHA,
Name: "jobs_finished_total",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with the terminology used in the status API (https://github.com/kubernetes/api/blob/v0.25.3/batch/v1/types.go#L326-L401), would it make sense to rename this to jobs_completed_total?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because it includes both Complete and Failed jobs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok, thank you for clarifying :)

Comment on lines 84 to 85
Name: "jobs_pods_finished_total",
Help: "The number of finished Pods that are fully tracked",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefixing with job is also fine with me since it adds a bit of context, but we can also remove it, both make sense in my opinion

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 7, 2022
Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 7, 2022
@dgrisonnet
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2022
@k8s-ci-robot k8s-ci-robot merged commit aef9a37 into kubernetes:master Nov 8, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Nov 8, 2022
@soltysh soltysh deleted the promote_job_metrics branch November 8, 2022 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/stable-metrics Issues or PRs involving stable metrics cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants