enable health check SLI metrics for apiserver #112741

Merged
merged 2 commits into from Sep 27, 2022

Conversation

logicalhan
Member

@logicalhan logicalhan commented Sep 26, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR enables health check metrics for the apiserver. The metrics output looks like this:

# HELP kubernetes_healthcheck [ALPHA] This metric records the result of a single healthcheck.
# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="autoregister-completion",type="healthz"} 1
kubernetes_healthcheck{name="autoregister-completion",type="readyz"} 1
kubernetes_healthcheck{name="etcd",type="healthz"} 1
kubernetes_healthcheck{name="etcd",type="readyz"} 1
kubernetes_healthcheck{name="etcd-readiness",type="readyz"} 1
kubernetes_healthcheck{name="informer-sync",type="readyz"} 1
kubernetes_healthcheck{name="log",type="healthz"} 1
kubernetes_healthcheck{name="log",type="readyz"} 1
kubernetes_healthcheck{name="ping",type="healthz"} 1
kubernetes_healthcheck{name="ping",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/aggregator-reload-proxy-client-cert",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/aggregator-reload-proxy-client-cert",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/apiservice-openapi-controller",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/apiservice-openapi-controller",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/apiservice-openapiv3-controller",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/apiservice-openapiv3-controller",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/apiservice-registration-controller",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/apiservice-registration-controller",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/apiservice-status-available-controller",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/apiservice-status-available-controller",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/bootstrap-controller",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/bootstrap-controller",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/crd-informer-synced",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/crd-informer-synced",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/generic-apiserver-start-informers",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/generic-apiserver-start-informers",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/kube-apiserver-autoregistration",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/kube-apiserver-autoregistration",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/priority-and-fairness-config-consumer",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/priority-and-fairness-config-consumer",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/priority-and-fairness-config-producer",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/priority-and-fairness-config-producer",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/priority-and-fairness-filter",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/priority-and-fairness-filter",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/rbac/bootstrap-roles",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/rbac/bootstrap-roles",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/scheduling/bootstrap-system-priority-classes",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/scheduling/bootstrap-system-priority-classes",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/start-apiextensions-controllers",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/start-apiextensions-controllers",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/start-apiextensions-informers",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/start-apiextensions-informers",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/start-cluster-authentication-info-controller",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/start-cluster-authentication-info-controller",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/start-kube-aggregator-informers",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/start-kube-aggregator-informers",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/start-kube-apiserver-admission-initializer",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/start-kube-apiserver-admission-initializer",type="readyz"} 1
kubernetes_healthcheck{name="poststarthook/storage-object-count-tracker-hook",type="healthz"} 1
kubernetes_healthcheck{name="poststarthook/storage-object-count-tracker-hook",type="readyz"} 1
kubernetes_healthcheck{name="shutdown",type="readyz"} 1
# HELP kubernetes_healthchecks_total [ALPHA] This metric records the results of all healthcheck.
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="autoregister-completion",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="etcd",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="etcd-readiness",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="log",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="log",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="ping",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="ping",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/aggregator-reload-proxy-client-cert",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/aggregator-reload-proxy-client-cert",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/apiservice-openapi-controller",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/apiservice-openapi-controller",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/apiservice-openapiv3-controller",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/apiservice-openapiv3-controller",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/apiservice-registration-controller",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="poststarthook/apiservice-registration-controller",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/apiservice-registration-controller",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="poststarthook/apiservice-status-available-controller",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/apiservice-status-available-controller",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/bootstrap-controller",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="poststarthook/bootstrap-controller",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/bootstrap-controller",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="poststarthook/crd-informer-synced",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="poststarthook/crd-informer-synced",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/crd-informer-synced",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="poststarthook/generic-apiserver-start-informers",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/generic-apiserver-start-informers",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/kube-apiserver-autoregistration",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/kube-apiserver-autoregistration",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/priority-and-fairness-config-consumer",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/priority-and-fairness-config-consumer",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/priority-and-fairness-config-producer",status="error",type="readyz"} 2
kubernetes_healthchecks_total{name="poststarthook/priority-and-fairness-config-producer",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/priority-and-fairness-config-producer",status="success",type="readyz"} 13
kubernetes_healthchecks_total{name="poststarthook/priority-and-fairness-filter",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/priority-and-fairness-filter",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/rbac/bootstrap-roles",status="error",type="readyz"} 14
kubernetes_healthchecks_total{name="poststarthook/rbac/bootstrap-roles",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/rbac/bootstrap-roles",status="success",type="readyz"} 1
kubernetes_healthchecks_total{name="poststarthook/scheduling/bootstrap-system-priority-classes",status="error",type="readyz"} 10
kubernetes_healthchecks_total{name="poststarthook/scheduling/bootstrap-system-priority-classes",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/scheduling/bootstrap-system-priority-classes",status="success",type="readyz"} 5
kubernetes_healthchecks_total{name="poststarthook/start-apiextensions-controllers",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="poststarthook/start-apiextensions-controllers",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/start-apiextensions-controllers",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="poststarthook/start-apiextensions-informers",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/start-apiextensions-informers",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/start-cluster-authentication-info-controller",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/start-cluster-authentication-info-controller",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/start-kube-aggregator-informers",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/start-kube-aggregator-informers",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/start-kube-apiserver-admission-initializer",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/start-kube-apiserver-admission-initializer",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="poststarthook/storage-object-count-tracker-hook",status="success",type="healthz"} 1
kubernetes_healthchecks_total{name="poststarthook/storage-object-count-tracker-hook",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="shutdown",status="success",type="readyz"} 15
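
For readers who want to turn the counters above into a per-check success ratio, here is a minimal sketch. It is not part of this PR. The `SAMPLE` text is a handful of lines copied from the output above, and the helper names (`parse_samples`, `success_ratios`) are hypothetical. The parser is deliberately simplified and assumes label values contain no escaped quotes or braces, which holds for the output shown here but not for the Prometheus text format in general.

```python
import re
from collections import defaultdict

# A few lines copied from the metrics output above (not the full scrape).
SAMPLE = """\
# HELP kubernetes_healthchecks_total [ALPHA] This metric records the results of all healthcheck.
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="poststarthook/rbac/bootstrap-roles",status="error",type="readyz"} 14
kubernetes_healthchecks_total{name="poststarthook/rbac/bootstrap-roles",status="success",type="readyz"} 1
"""

# Simplified patterns: assume no escaped quotes or '}' inside label values.
LINE_RE = re.compile(
    r'^(?P<metric>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels")))
        yield match.group("metric"), labels, float(match.group("value"))


def success_ratios(text, check_type="readyz"):
    """Return {check name: success / (success + error)} for one check type."""
    counts = defaultdict(lambda: {"success": 0.0, "error": 0.0})
    for metric, labels, value in parse_samples(text):
        if metric != "kubernetes_healthchecks_total":
            continue
        if labels.get("type") != check_type:
            continue
        status = labels.get("status")
        if status in ("success", "error"):
            counts[labels["name"]][status] += value
    return {
        name: c["success"] / (c["success"] + c["error"])
        for name, c in counts.items()
        if c["success"] + c["error"] > 0
    }


if __name__ == "__main__":
    for name, ratio in sorted(success_ratios(SAMPLE).items()):
        print(f"{name}: {ratio:.3f}")
```

Since these are cumulative counters, a ratio computed from a single scrape covers the whole process lifetime; in practice one would more likely compare two scrapes or compute a rate in the monitoring system.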

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Expose health check SLI metrics on "metrics/slis" for apiserver

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver labels Sep 26, 2022
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 26, 2022
@logicalhan logicalhan force-pushed the health-check-metrics branch 2 times, most recently from 2ef3451 to b30b6b7 Compare September 27, 2022 15:46
@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Sep 27, 2022
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 27, 2022
Change-Id: I1b43e6dfea35b8c3bfdf5daaa8b42adff2fbc786
Change-Id: I0d87e29715432f772309a0d4a7305fff358c6d48
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 27, 2022
@lavalamp
Member

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 27, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, logicalhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 27, 2022
@fedebongio
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 27, 2022
@k8s-ci-robot k8s-ci-robot merged commit e11f23e into kubernetes:master Sep 27, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Sep 27, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
4 participants