Fix setting resource version on etcd3 deletion #113369
Conversation
/hold
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
queued up, need to page back in context and edge cases
It seems the prior code returned an RV which would not have been very useful for re-watching (which is why the watch path changes the RV). That makes it hard for people to have used it for anything useful, which reduces the chance that this change would break people. I think I'm in favor of changing this?
That exactly reflects my thinking too.
Force-pushed from 7951a96 to 7516a2b
Just fixed some unit test failures.
This semantic certainly seems clearer than the previous one, and you don't need to go further than the test case to see why. If we change this, though, won't we also need to change the watch behavior, as today it sends the old object with the previous revision in a delete event?
```diff
-if err := store.Delete(ctx, key, &example.Pod{}, &storage.Preconditions{}, storage.ValidateAllObjectFunc, nil); err != nil {
+deletedObj := &example.Pod{}
+if err := store.Delete(ctx, key, deletedObj, &storage.Preconditions{}, storage.ValidateAllObjectFunc, nil); err != nil {
```
Notably, I think it behooves us to actually expose this to the client, and handle it well in client libraries. IIRC when we delete immediately (as opposed to just setting the deletion timestamp + waiting for finalizers), the client today gets a metav1.Status only.
Notably, I think it behooves us to actually expose this to the client, and handle it well in client libraries
Agree - although I think that's a separate thing and we shouldn't tightly couple these changes.
Deletion watch events must already be correct, or clients wouldn't be able to resume following one.
/triage accepted
ACK, I mis-remembered - we send the old object but overwrite the resource version in there to be the moment of deletion.
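The watch behavior described here can be sketched as follows. This is a simplified, self-contained stand-in with hypothetical types (`deleteEvent`, `resourceVersionForDelete`), not the actual apiserver code, which lives in the etcd3 watcher's event transform:

```go
package main

import "fmt"

// deleteEvent is a hypothetical stand-in for an etcd watch delete event:
// PrevObjectRev is the resourceVersion stored inside the old (pre-delete)
// object, and ModRevision is the etcd revision at which the delete happened.
type deleteEvent struct {
	PrevObjectRev int64
	ModRevision   int64
}

// resourceVersionForDelete mirrors the behavior described above: the watch
// sends the old object, but the resourceVersion stamped on it is the
// revision of the deletion itself, so clients can safely resume a watch
// from that point.
func resourceVersionForDelete(ev deleteEvent) int64 {
	return ev.ModRevision
}

func main() {
	ev := deleteEvent{PrevObjectRev: 100, ModRevision: 105}
	fmt.Println(resourceVersionForDelete(ev)) // prints 105, not 100
}
```

This is why the delete event's RV (the deletion revision) differs from the RV the explicit Delete path used to return (the object's last-written revision), which is the inconsistency the PR fixes.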
```diff
@@ -323,7 +323,9 @@ func (s *store) conditionalDelete(
 				origStateIsCurrent = true
 				continue
 			}
-			return decode(s.codec, s.versioner, origState.data, out, origState.rev)
+			deleteResp := txnResp.Responses[0].GetResponseDeleteRange()
```
I don't know enough about the edge cases of etcd deletion to know if it is possible for `txnResp.Responses` to be zero-length, or for `txnResp.Responses[0].GetResponseDeleteRange()` to return nil (the implementation certainly hints at the possibility). Same question about the possibility of `deleteResp.Header` being nil.
Can we be very defensive in what we assume about the etcd response here to avoid panics?
(xref https://github.com/kubernetes/kubernetes/pull/76675/files#diff-c84f7808522d6cd14f52c2d800c897bbff103fcee2c73fa3bbb13da81db72298 as an example of rare edges in etcd responses)
`txnResp.Responses[0].GetResponse*()` calls are fairly ubiquitous in the current implementation; might be worth a pass to remove the possibility of out-of-bounds and nil-return panics from them all. WDYT?
I'd keep this one scoped narrowly to the code path being modified, and I wouldn't assume that responses to delete requests work the same way as responses to write requests. As far as I can see, this is the only path looking at GetResponseDeleteRange or pulling details out of a successful delete transaction.
This one, yes - I meant more broadly. If you are worried about edges (is it delete only, and if so, why?), I'm happy to pop in and handle the handful of cases where we index into `.Responses` and use those getters.
intent of the change lgtm, would like to see us be as defensive as possible with the etcd transaction result
Force-pushed from 7516a2b to 35038dd
@liggitt - thanks a lot for the review - addressed the comment, PTAL. /hold cancel
Force-pushed from 35038dd to bbcf5e3
```diff
-			return decode(s.codec, s.versioner, origState.data, out, origState.rev)
+			if len(txnResp.Responses) == 0 || txnResp.Responses[0].GetResponseDeleteRange() == nil {
+				return errors.New(fmt.Sprintf("invalid DeleteRange response: %v", txnResp.Responses))
+			}
```
Making these errors turns what would previously have been a successful delete response into an error. Is it better to error so callers don't get incorrect resourceVersions, or return origState.rev in this case so callers don't get errors?
I was thinking about that, and my personal opinion is to rather return an error in this case.
I think that neither of these should generally happen, looking at the implementation (though, as you pointed out, some corner cases are in theory possible).
But in those cases, I think it's better to error than to send an RV that would mislead users who expect the semantics of "this is the current RV".
If you think very strongly the other way I can change that, but I think the current way is better (note that this code path is not used in watch, so it won't result in any broken watches or additional relists or anything like that - it's only the explicit Delete code path).
What would it even mean to get a successful transaction against etcd, with no response and no recorded delete? Would we even be certain that the deletion occurred? Would we try to undo it?
I don't know what I don't know about possible etcd responses... is it documented what we can rely on from successful delete transactions? If a non-nil/non-zero `.Responses[0].GetResponseDeleteRange().Header.Revision` is guaranteed, erroring here seems fine.
The entirety of "documentation" here is the godoc on the `Responses` field:

```go
// responses is a list of responses corresponding to the results from applying
// success if succeeded is true or failure if succeeded is false.
```

So if our transaction succeeded, and we asked for at least one key to be deleted, we can assume that there is at least one entry there. Barring a bug in etcd (is that what we're being defensive about?), we'd expect that response to be a delete response as well, since that was the RPC we just used.
The response header is ubiquitous and must exist for RPCs, so similarly, barring a bug in etcd here, the chain `.Responses[0].GetResponseDeleteRange().Header.Revision` should never go out of bounds or nil-panic.
To quote you from a different conversation:

> I want to make sure we're not adding new code that relies on an invariant the etcd API does not actually promise in a way that would turn previously 2xx operations into 5xx operations

To my reading, accessing the chain `.Responses[0].GetResponseDeleteRange().Header.Revision` here falls fully within the normal guarantees & expected behavior of etcd, so unless we also want to defend against bugs/places where these guarantees are not guaranteed by the type system, we should be fine without additional checks.
> So if our transaction succeeded, and we asked for at least one key to be deleted, we can assume that there is at least one entry there. Barring a bug in etcd (is that what we're being defensive about?), we'd expect that response to be a delete response, as well, since that was the RPC we just used.

+1 - the "correct behavior from etcd" (with our assumption from k8s that all our transactions add/update/delete exactly one object) is that for delete transactions we can expect `GetResponseDeleteRange()` to always be non-nil. So we're protecting against bugs in etcd rather than against any desired behavior here.
So I agree with @stevekuznetsov here - all those checks protect against bugs in etcd rather than against the "correct behavior of etcd" - the correct behavior is that all of those should be set, and the newly added error-checking branches should never be hit.
@liggitt - PTAL
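The defensive extraction being discussed can be sketched as below. Note this is a self-contained illustration, not the PR's actual code: the types here (`TxnResponse`, `ResponseOp`, etc.) are simplified stand-ins for the etcd proto types (`etcdserverpb.TxnResponse` and friends), and the helper name `deleteRevision` is hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the etcd proto types; the real fields have the
// same shape but live in etcdserverpb.
type ResponseHeader struct{ Revision int64 }

type DeleteRangeResponse struct{ Header *ResponseHeader }

type ResponseOp struct{ deleteRange *DeleteRangeResponse }

func (r *ResponseOp) GetResponseDeleteRange() *DeleteRangeResponse { return r.deleteRange }

type TxnResponse struct{ Responses []*ResponseOp }

// deleteRevision extracts the revision at which a delete transaction was
// applied, returning an error instead of panicking if any link in the
// .Responses[0].GetResponseDeleteRange().Header.Revision chain is missing.
func deleteRevision(txnResp *TxnResponse) (int64, error) {
	if len(txnResp.Responses) == 0 {
		return 0, errors.New("invalid DeleteRange response: no responses")
	}
	deleteResp := txnResp.Responses[0].GetResponseDeleteRange()
	if deleteResp == nil {
		return 0, errors.New("invalid DeleteRange response: first response is not a DeleteRange")
	}
	if deleteResp.Header == nil {
		return 0, errors.New("invalid DeleteRange response: missing header")
	}
	return deleteResp.Header.Revision, nil
}

func main() {
	ok := &TxnResponse{Responses: []*ResponseOp{
		{deleteRange: &DeleteRangeResponse{Header: &ResponseHeader{Revision: 42}}},
	}}
	rev, err := deleteRevision(ok)
	fmt.Println(rev, err) // prints: 42 <nil>

	_, err = deleteRevision(&TxnResponse{})
	fmt.Println(err) // non-nil error: no responses in the transaction
}
```

As the discussion concludes, these branches guard only against etcd bugs; under etcd's documented behavior for a successful single-key delete transaction, they should never fire.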
ok, I'm just scarred from things like https://github.com/kubernetes/kubernetes/pull/76675/files#diff-c84f7808522d6cd14f52c2d800c897bbff103fcee2c73fa3bbb13da81db72298 where we thought aspects of the etcd response were guaranteed and it turns out they were not in rare scenarios
if the header/rev in the response is reliable, this lgtm
Sure - I understand that. The PrevKV is somewhat special, because you have to explicitly request it.
Here, everything we need should be set (assuming no bugs in etcd, so having error handling is definitely reasonable).
For posterity, about explicitly requesting PrevKV:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher.go#L236
/lgtm
@liggitt - do you think we should cherry-pick it back eventually, or should we leave the existing releases as they are?
umm..... I would tend to leave this 1.26+ only
ok - let's leave it for now - we can potentially revisit that decision later if needed...
Fix #113368
/kind bug
/sig api-machinery
/priority important-soon
/assign @liggitt @lavalamp @deads2k
/cc @stevekuznetsov