Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix setting resource version on etcd3 deletion #113369

Merged

Conversation

wojtek-t
Copy link
Member

@wojtek-t wojtek-t commented Oct 26, 2022

Fix #113368

The resourceVersion returned in objects from delete responses is now consistent with the resourceVersion contained in the delete watch event

/kind bug
/sig api-machinery
/priority important-soon

/assign @liggitt @lavalamp @deads2k
/cc @stevekuznetsov

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 26, 2022
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. labels Oct 26, 2022
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Oct 26, 2022
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 26, 2022
@wojtek-t
Copy link
Member Author

/hold
To avoid accidental merge and give people a chance to react.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 26, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/apiserver approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 26, 2022
@liggitt
Copy link
Member

liggitt commented Oct 26, 2022

queued up, need to page back in context and edge cases

@lavalamp
Copy link
Member

It seems the prior code returned an RV which would not have been very useful for re-watching (which is why the watch path changes the RV). That makes it hard for people to have used it for anything useful, which reduces the chance that this change would break people.

I think I'm in favor of changing this?

@wojtek-t
Copy link
Member Author

It seems the prior code returned an RV which would not have been very useful for re-watching (which is why the watch path changes the RV). That makes it hard for people to have used it for anything useful, which reduces the chance that this change would break people.

That exactly reflects my thinking too.

@wojtek-t
Copy link
Member Author

Just fixed some unit test failures.

Copy link
Contributor

@stevekuznetsov stevekuznetsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This semantic certainly seems clearer than the previous one, and you don't need to go further than the test case to see why. If we change this, though, won't we also need to change the watch behavior, as today it sends the old object with the previous revision in a delete event?


if err := store.Delete(ctx, key, &example.Pod{}, &storage.Preconditions{}, storage.ValidateAllObjectFunc, nil); err != nil {
deletedObj := &example.Pod{}
if err := store.Delete(ctx, key, deletedObj, &storage.Preconditions{}, storage.ValidateAllObjectFunc, nil); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notably, I think it behooves us to actually expose this to the client, and handle it well in client libraries. IIRC when we delete immediately (as opposed to just setting the deletion timestamp + waiting for finalizers), the client today gets a metav1.Status only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notably, I think it behooves us to actually expose this to the client, and handle it well in client libraries

Agree - although I think that's a separate thing and we shouldn't tightly couple these changes.

@lavalamp
Copy link
Member

won't we also need to change the watch behavior, as today it sends the old object with the previous revision in a delete event?

Deletion watch events must already be correct, or clients wouldn't be able to resume following one.

@leilajal
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 27, 2022
@stevekuznetsov
Copy link
Contributor

ACK, I mis-remembered - we send the old object but overwrite the resource version in there to be the moment of deletion.

@@ -323,7 +323,9 @@ func (s *store) conditionalDelete(
origStateIsCurrent = true
continue
}
return decode(s.codec, s.versioner, origState.data, out, origState.rev)
deleteResp := txnResp.Responses[0].GetResponseDeleteRange()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about the edge cases of etcd deletion to know if it is possible for txnResp.Responses to be zero-length, or for txnResp.Responses[0].GetResponseDeleteRange() to return nil (the implementation certainly hints at the possibility)

Same question about the possibility of deleteResp.Header being nil

Can we be very defensive in what we assume about the etcd response here to avoid panics?

(xref https://github.com/kubernetes/kubernetes/pull/76675/files#diff-c84f7808522d6cd14f52c2d800c897bbff103fcee2c73fa3bbb13da81db72298 as an example of rare edges in etcd responses)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

txnResp.Responses[0].GetReponse*() calls are fairly ubiquitous in the current implementation, might be worth a pass to remove the possibility the out-of-bounds and nil-return panics from them all. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep this one scoped narrowly to the code path being modified, and I wouldn't assume that responses to delete requests work the same way as responses to write requests. As far as I can see, this is the only path looking at GetResponseDeleteRange or pulling details out of a successful delete transation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one, yes - I meant more broadly, if you are worried about edges (is it delete only, and if so, why?), I'm happy to pop in and handle the handful of cases where we index into .Responses and use those getters.

@liggitt
Copy link
Member

liggitt commented Oct 28, 2022

intent of the change lgtm, would like to see us be as defensive as possible with the etcd transaction result

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 29, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 31, 2022
@wojtek-t
Copy link
Member Author

@liggitt - thanks a lot for the review - addressed the comment

PTAL

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2022
return decode(s.codec, s.versioner, origState.data, out, origState.rev)

if len(txnResp.Responses) == 0 || txnResp.Responses[0].GetResponseDeleteRange() == nil {
return errors.New(fmt.Sprintf("invalid DeleteRange response: %v", txnResp.Responses))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making these errors turns what would previously have been a successful delete response into an error. Is it better to error so callers don't get incorrect resourceVersions, or return origState.rev in this case so callers don't get errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that, and my personal opinion is to rather return an error in this case.
I think that neither of these should generally happen looking at the implementation (though as you pointed out some corner cases are in theory possible).

But in those cases, I think it's better to error, than to send RV that would be misleading to users expect the semantic of "this is the current RV".

If you think very strongly the other way I can change that, but I think the current way is better (note that this code path is not used in watch, so it won't result in any broken watches or additional relists or anything like that - it's only the explicit Delete code path).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would it even mean to get a successful transaction against etcd, with no response and no recorded delete? Would we even be certain that the deletion occurred? Would we try to undo it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what I don't know about possible etcd responses... is it documented what we can rely on from successful delete transactions? If non-nil/non-zero .Responses[0].GetResponseDeleteRange().Header.Revision is guaranteed, erroring here seems fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entirety of "documentation" here is the godoc on the Responses field:

	// responses is a list of responses corresponding to the results from applying
	// success if succeeded is true or failure if succeeded is false.

So if our transaction succeeded, and we asked for at least one key to be deleted, we can assume that there is at least one entry there. Barring a bug in etcd (is that what we're being defensive about?), we'd expect that response to be a delete response, as well, since that was the RPC we just used.

The response header is ubiquitous and must exist for RPCs, so similarly, barring a bug in etcd here, the chain .Responses[0].GetResponseDeleteRange().Header.Revision should never go out of bounds or nil panic.

To quote you from a different conversation:

I want to make sure we're not adding new code that relies on an invariant the etcd API does not actually promise in a way that would turn previously 2xx operations into 5xx operations

To my reading, accessing the chain .Responses[0].GetResponseDeleteRange().Header.Revision here falls fully within the normal guarantees & expected behavior of etcd, so unless we also want to defend against bugs/places where these guarantees are not guaranteed by the type system, we should be fine without additional checks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if our transaction succeeded, and we asked for at least one key to be deleted, we can assume that there is at least one entry there. Barring a bug in etcd (is that what we're being defensive about?), we'd expect that response to be a delete response, as well, since that was the RPC we just used.

+1 - the "correct behavior from etcd" (with our assumption from k8s that all our transactions add/update/delete exactly one object) is that we can expect that for delete transactions this GetResponseDeleteRange() to always be non-nil.
So we're rather protecting from bugs in etcd than any desired behavior here.

So I agree with @stevekuznetsov here - all those checks are rather protecting from bugs in etcd, than from a "correct behavior of etcd" - the correct behavior is that all of those should be set correctly and the newly added error-checking branches should never be called.

@liggitt - PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'm just scarred from things like https://github.com/kubernetes/kubernetes/pull/76675/files#diff-c84f7808522d6cd14f52c2d800c897bbff103fcee2c73fa3bbb13da81db72298 where we thought aspects of the etcd response were guaranteed and it turns out they were not in rare scenarios

if the header/rev in the response is reliable, this lgtm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - I understand that. The PrevKV is somewhat special, because you have to explicitly request it.

Here, all we need should be set (assuming no bugs in etcd, so having error handling is definitely reasonable to have).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt
Copy link
Member

liggitt commented Nov 2, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 2, 2022
@wojtek-t
Copy link
Member Author

wojtek-t commented Nov 2, 2022

@liggitt - do you think we should cherrypick it back eventually, or we should leave the existing releases as they are?

@liggitt
Copy link
Member

liggitt commented Nov 2, 2022

umm..... I would tend to leave this 1.26+ only

@wojtek-t
Copy link
Member Author

wojtek-t commented Nov 2, 2022

umm..... I would tend to leave this 1.26+ only

ok - let's leave it for now - we can potentially revisit that decision later if needed...

@k8s-ci-robot k8s-ci-robot merged commit 421213b into kubernetes:master Nov 2, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Object returned from etcd3 Delete call is inconsistent with what watch returns
7 participants