Fix requeueing of cronjobs with every-style schedule #109250
Conversation
Hi @d-honeybadger. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig apps
@soltysh, @alculquicondor You are the two people I saw dealing with cronjobs recently, could you please take a look?
/ok-to-test
/retest
👋 @soltysh Could you please take a look whenever you have a minute?
/lgtm
/approve
 // for Network Time Protocol(NTP) time skews. If the time drifts are adjusted which in most
 // realistic cases would be around 100s, scheduled cron will still be executed without missing
 // the schedule.
-func nextScheduledTimeDuration(sched cron.Schedule, now time.Time) *time.Duration {
-	t := sched.Next(now).Add(nextScheduleDelta).Sub(now)
+func nextScheduledTimeDuration(cj batchv1.CronJob, sched cron.Schedule, now time.Time) *time.Duration {
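For context, here is a minimal sketch (not the exact PR code) of what the new signature enables: requeueing relative to the cronjob's last scheduled time instead of now. The nextScheduleDelta constant and the fallback to the creation time are assumptions based on the surrounding controller code.

import (
	"time"

	"github.com/robfig/cron/v3"
	batchv1 "k8s.io/api/batch/v1"
)

// Assumption: the package defines a small delta to tolerate NTP time skew.
const nextScheduleDelta = 100 * time.Millisecond

// Sketch: compute the requeue duration from the last scheduled time (or the
// creation time if the cronjob has never been scheduled) rather than from now,
// so @every schedules stay anchored across controller restarts and updates.
func nextScheduledTimeDuration(cj batchv1.CronJob, sched cron.Schedule, now time.Time) *time.Duration {
	base := cj.ObjectMeta.CreationTimestamp.Time
	if cj.Status.LastScheduleTime != nil {
		base = cj.Status.LastScheduleTime.Time
	}
	t := sched.Next(base).Add(nextScheduleDelta).Sub(now)
	if t < 0 {
		t = 0 // next schedule is already due; requeue immediately
	}
	return &t
}

The controller can then use the returned duration to requeue the cronjob key, e.g. via a delaying workqueue's AddAfter.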
I've noticed there are quite a lot of similarities between this method and getNextScheduleTime; I'll follow up with a squash. But this is very good - thank you!
Yeah, that's true - I was trying to keep the changes local, but that had its cons.
Thank you!
/triage accepted
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: d-honeybadger, soltysh

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind bug
/kind regression
What this PR does / why we need it:
In CronJob ControllerV2, cronjobs need to be requeued when it's time for the next schedule. This requeueing interval is currently calculated based on the time now. That works fine for a classic cron schedule, because the "next" schedule is aligned with a specific day/hour/minute from the schedule, but it breaks for an @every X schedule: there the "next" schedule becomes simply now + X, which is incorrect if a) kube-controller-manager was restarted and is syncing all cronjobs, or b) the cronjob's schedule was updated. Instead, for @every-style schedules it is important to always calculate the next time relative to the last scheduled time (or the cronjob's creation time if it has never been scheduled). Note that requeueing didn't exist before ControllerV2, so neither did this bug.
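To illustrate the difference, here is a minimal, hypothetical example using robfig/cron/v3 (the cron library vendored by Kubernetes); it is not code from this PR:

package main

import (
	"fmt"
	"time"

	"github.com/robfig/cron/v3"
)

func main() {
	sched, _ := cron.ParseStandard("@every 1h")

	lastScheduled := time.Date(2022, 4, 1, 10, 0, 0, 0, time.UTC)
	// Suppose kube-controller-manager restarts 50 minutes into the interval.
	now := lastScheduled.Add(50 * time.Minute)

	// Relative to now, the next run always looks a full hour away, so each
	// restart or schedule update silently pushes the job back.
	fmt.Println(sched.Next(now)) // 2022-04-01 11:50:00 +0000 UTC

	// Relative to the last scheduled time, the job still fires on schedule.
	fmt.Println(sched.Next(lastScheduled)) // 2022-04-01 11:00:00 +0000 UTC
}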
Which issue(s) this PR fixes:
Fixes #104012
Special notes for your reviewer:
Verified the fix with a local cluster (./hack/local-up-cluster.sh). Figured an e2e test for such a specific case would be overkill. Either way, lmk if I'm over- or under-testing this. There's also one unit test that has a comment on why it will fail. The reason for its failure is outside the scope of this PR and not introduced by it - lmk if it should be deleted or kept to indicate that a future fix is necessary.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: