New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Watch cache: use resource.group for object type in log messages and metrics #111807
Watch cache: use resource.group for object type in log messages and metrics #111807
Conversation
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Thu Aug 11 19:31:32 UTC 2022. |
/sig api-machinery |
/cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, and can be carried even a little further, as noted inline.
c.versioner, | ||
deadline, | ||
pred.AllowWatchBookmarks, | ||
c.objectType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The objectType
field no longer needs to be in the cacheWatcher
struct, and thus also no longer needs to be passed to newCacheWatcher
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, PTAL
@@ -1177,7 +1195,8 @@ type cacheWatcher struct { | |||
deadline time.Time | |||
allowWatchBookmarks bool | |||
// Object type of the cache watcher interests | |||
objectType reflect.Type | |||
objectType reflect.Type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is no longer used for anything and can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, PTAL
/triage accepted |
As noted in the description:
This is a "breaking" change in that it changes the metrics labels (e.g. |
@ncdc: there is a concept of metric maturity. Documented at https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#metric-lifecycle . The question is, does this PR change any metric that is declared to be stable? |
It looks like they are all alpha |
406d366
to
d2c6aa8
Compare
Here are the stable metrics I found in a scrape of the a running server built from the current
|
All CustomResources are treated as *unstructured.Unstructured, leading the watch cache to log anything related to CRs as Unstructured. This change uses the schema.GroupResource instead of object type for all type related log messages in the watch cache, resulting in distinct output for each CR type. Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
Use the group resource instead of objectType in watch cache metrics, because all CustomResources are grouped together as *unstructured.Unstructured, instead of 1 entry per type. Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
d2c6aa8
to
eb03850
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/assign @deads2k |
/test pull-kubernetes-dependencies |
Best answered by The change looks fine to me from a code perspective. I think we do make some promises about stable metrics, but I don't know if it extends to labels. /approve holding for instrumentation sign off from @logicalhan |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, MikeSpreitzer, ncdc The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a "breaking" change in that it changes the metrics labels (e.g.
*v1.Pod
becomespods
). Is that ok, or should I only make this change for*unstructured.Unstructured
?
We don't actually have any guarantees around label values (despite the fact that it may break alerts), we permit even changes to buckets on stable metrics. I am generally reticent about changes to label values anyway though since it can break alerts though. In this case, using group resource is actually more consistent with other metrics so this change generally looks good to me.
/lgtm
/hold cancel |
Thanks @deads2k and @logicalhan . BTW, #111807 (comment) says that the metrics affected by this PR are not stable. |
修复CustomerResource的watch_cache_capacity指标resource都是Unstructured的问题 修复CustomerResource的watch_cache_capacity指标resource都是Unstructured的问题 社区[pr111807](kubernetes#111807 1.20等低版本简单暴力修复下 另外pr111807是breaking change,从1.26起,内置资源的resource名称会发上变化,比如pod从*v1.Pod变为pods,建议我们在1.26之前的版本,只修复crd资源resource都是Unstructured的问题,其他内置对象,保持不变
What type of PR is this?
/kind bug
What this PR does / why we need it:
Update logging and metrics references to "object type" to be by "group resource" instead, so the logs/metrics can disambiguate custom resource types, instead of grouping them all together as
*unstructured.Unstructured
.Which issue(s) this PR fixes:
Fixes #111605
Special notes for your reviewer:
I have split this into 1 commit to update logging, and a separate commit to update metrics. Happy to split out and do logging first if desired, as per the discussion starting at #111605 (comment).
This does change the log/metrics keys for built-in types to also be by group resource. If that is not desirable, I can fix it so this change only applies to CRs.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: