Fix setting resource version on etcd3 deletion #113369
Conversation
/hold
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
queued up, need to page back in context and edge cases
It seems the prior code returned an RV which would not have been very useful for re-watching (which is why the watch path changes the RV). That makes it hard for people to have used it for anything useful, which reduces the chance that this change would break people. I think I'm in favor of changing this?
That exactly reflects my thinking too.
Force-pushed from 7951a96 to 7516a2b
Just fixed some unit test failures.
This semantic certainly seems clearer than the previous one, and you don't need to go further than the test case to see why. If we change this, though, won't we also need to change the watch behavior, as today it sends the old object with the previous revision in a delete event?
```diff
-if err := store.Delete(ctx, key, &example.Pod{}, &storage.Preconditions{}, storage.ValidateAllObjectFunc, nil); err != nil {
+deletedObj := &example.Pod{}
+if err := store.Delete(ctx, key, deletedObj, &storage.Preconditions{}, storage.ValidateAllObjectFunc, nil); err != nil {
```
Notably, I think it behooves us to actually expose this to the client, and handle it well in client libraries. IIRC when we delete immediately (as opposed to just setting the deletion timestamp + waiting for finalizers), the client today gets a metav1.Status only.
Notably, I think it behooves us to actually expose this to the client, and handle it well in client libraries
Agree - although I think that's a separate thing and we shouldn't tightly couple these changes.
Deletion watch events must already be correct, or clients wouldn't be able to resume following one.
/triage accepted
ACK, I mis-remembered - we send the old object but overwrite the resource version in there to be the moment of deletion.
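The watch behavior described here can be sketched as follows. This is a simplified, self-contained stand-in with hypothetical types (`deleteEvent`, `resourceVersionForDelete`), not the actual apiserver code, which lives in the etcd3 watcher's event transform:

```go
package main

import "fmt"

// deleteEvent is a hypothetical stand-in for an etcd watch delete event:
// PrevObjectRev is the resourceVersion stored inside the old (pre-delete)
// object, and ModRevision is the etcd revision at which the delete happened.
type deleteEvent struct {
	PrevObjectRev int64
	ModRevision   int64
}

// resourceVersionForDelete mirrors the behavior described above: the watch
// sends the old object, but the resourceVersion stamped on it is the
// revision of the deletion itself, so clients can safely resume a watch
// from that point.
func resourceVersionForDelete(ev deleteEvent) int64 {
	return ev.ModRevision
}

func main() {
	ev := deleteEvent{PrevObjectRev: 100, ModRevision: 105}
	fmt.Println(resourceVersionForDelete(ev)) // prints 105, not 100
}
```

This is why the delete event's RV (the deletion revision) differs from the RV the explicit Delete path used to return (the object's last-written revision), which is the inconsistency the PR fixes.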
```diff
@@ -323,7 +323,9 @@ func (s *store) conditionalDelete(
 				origStateIsCurrent = true
 				continue
 			}
-			return decode(s.codec, s.versioner, origState.data, out, origState.rev)
+			deleteResp := txnResp.Responses[0].GetResponseDeleteRange()
```
I don't know enough about the edge cases of etcd deletion to know if it is possible for `txnResp.Responses` to be zero-length, or for `txnResp.Responses[0].GetResponseDeleteRange()` to return nil (the implementation certainly hints at the possibility). Same question about the possibility of `deleteResp.Header` being nil.
Can we be very defensive in what we assume about the etcd response here to avoid panics?
(xref https://github.com/kubernetes/kubernetes/pull/76675/files#diff-c84f7808522d6cd14f52c2d800c897bbff103fcee2c73fa3bbb13da81db72298 as an example of rare edges in etcd responses)
`txnResp.Responses[0].GetResponse*()` calls are fairly ubiquitous in the current implementation; might be worth a pass to remove the possibility of out-of-bounds and nil-return panics from them all. WDYT?
I'd keep this one scoped narrowly to the code path being modified, and I wouldn't assume that responses to delete requests work the same way as responses to write requests. As far as I can see, this is the only path looking at GetResponseDeleteRange or pulling details out of a successful delete transaction.
This one, yes - I meant more broadly. If you are worried about edges (is it delete only, and if so, why?), I'm happy to pop in and handle the handful of cases where we index into `.Responses` and use those getters.
intent of the change lgtm, would like to see us be as defensive as possible with the etcd transaction result
Force-pushed from 7516a2b to 35038dd
@liggitt - thanks a lot for the review - addressed the comment, PTAL. /hold cancel
Force-pushed from 35038dd to bbcf5e3
```diff
-			return decode(s.codec, s.versioner, origState.data, out, origState.rev)
+			if len(txnResp.Responses) == 0 || txnResp.Responses[0].GetResponseDeleteRange() == nil {
+				return errors.New(fmt.Sprintf("invalid DeleteRange response: %v", txnResp.Responses))
+			}
```
Making these errors turns what would previously have been a successful delete response into an error. Is it better to error so callers don't get incorrect resourceVersions, or return origState.rev in this case so callers don't get errors?
I was thinking about that, and my personal opinion is to rather return an error in this case.
I think that neither of these should generally happen, looking at the implementation (though, as you pointed out, some corner cases are in theory possible).
But in those cases, I think it's better to error than to send an RV that would mislead users who expect the semantics of "this is the current RV".
If you think very strongly the other way I can change that, but I think the current way is better (note that this code path is not used in watch, so it won't result in any broken watches or additional relists or anything like that - it's only the explicit Delete code path).
What would it even mean to get a successful transaction against etcd, with no response and no recorded delete? Would we even be certain that the deletion occurred? Would we try to undo it?
I don't know what I don't know about possible etcd responses... is it documented what we can rely on from successful delete transactions? If a non-nil/non-zero `.Responses[0].GetResponseDeleteRange().Header.Revision` is guaranteed, erroring here seems fine.
The entirety of "documentation" here is the godoc on the `Responses` field:

```go
// responses is a list of responses corresponding to the results from applying
// success if succeeded is true or failure if succeeded is false.
```

So if our transaction succeeded, and we asked for at least one key to be deleted, we can assume that there is at least one entry there. Barring a bug in etcd (is that what we're being defensive about?), we'd expect that response to be a delete response as well, since that was the RPC we just used.
The response header is ubiquitous and must exist for RPCs, so similarly, barring a bug in etcd here, the chain `.Responses[0].GetResponseDeleteRange().Header.Revision` should never go out of bounds or nil-panic.
To quote you from a different conversation:

> I want to make sure we're not adding new code that relies on an invariant the etcd API does not actually promise in a way that would turn previously 2xx operations into 5xx operations

To my reading, accessing the chain `.Responses[0].GetResponseDeleteRange().Header.Revision` here falls fully within the normal guarantees & expected behavior of etcd, so unless we also want to defend against bugs/places where these guarantees are not guaranteed by the type system, we should be fine without additional checks.
> So if our transaction succeeded, and we asked for at least one key to be deleted, we can assume that there is at least one entry there. Barring a bug in etcd (is that what we're being defensive about?), we'd expect that response to be a delete response, as well, since that was the RPC we just used.

+1 - the "correct behavior from etcd" (with our assumption from k8s that all our transactions add/update/delete exactly one object) is that for delete transactions we can expect `GetResponseDeleteRange()` to always be non-nil. So we're protecting against bugs in etcd rather than against any desired behavior here.
So I agree with @stevekuznetsov here - all those checks protect against bugs in etcd rather than against the "correct behavior of etcd" - the correct behavior is that all of those should be set, and the newly added error-checking branches should never be hit.
@liggitt - PTAL
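The defensive extraction being discussed can be sketched as below. Note this is a self-contained illustration, not the PR's actual code: the types here (`TxnResponse`, `ResponseOp`, etc.) are simplified stand-ins for the etcd proto types (`etcdserverpb.TxnResponse` and friends), and the helper name `deleteRevision` is hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the etcd proto types; the real fields have the
// same shape but live in etcdserverpb.
type ResponseHeader struct{ Revision int64 }

type DeleteRangeResponse struct{ Header *ResponseHeader }

type ResponseOp struct{ deleteRange *DeleteRangeResponse }

func (r *ResponseOp) GetResponseDeleteRange() *DeleteRangeResponse { return r.deleteRange }

type TxnResponse struct{ Responses []*ResponseOp }

// deleteRevision extracts the revision at which a delete transaction was
// applied, returning an error instead of panicking if any link in the
// .Responses[0].GetResponseDeleteRange().Header.Revision chain is missing.
func deleteRevision(txnResp *TxnResponse) (int64, error) {
	if len(txnResp.Responses) == 0 {
		return 0, errors.New("invalid DeleteRange response: no responses")
	}
	deleteResp := txnResp.Responses[0].GetResponseDeleteRange()
	if deleteResp == nil {
		return 0, errors.New("invalid DeleteRange response: first response is not a DeleteRange")
	}
	if deleteResp.Header == nil {
		return 0, errors.New("invalid DeleteRange response: missing header")
	}
	return deleteResp.Header.Revision, nil
}

func main() {
	ok := &TxnResponse{Responses: []*ResponseOp{
		{deleteRange: &DeleteRangeResponse{Header: &ResponseHeader{Revision: 42}}},
	}}
	rev, err := deleteRevision(ok)
	fmt.Println(rev, err) // prints: 42 <nil>

	_, err = deleteRevision(&TxnResponse{})
	fmt.Println(err) // non-nil error: no responses in the transaction
}
```

As the discussion concludes, these branches guard only against etcd bugs; under etcd's documented behavior for a successful single-key delete transaction, they should never fire.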
ok, I'm just scarred from things like https://github.com/kubernetes/kubernetes/pull/76675/files#diff-c84f7808522d6cd14f52c2d800c897bbff103fcee2c73fa3bbb13da81db72298 where we thought aspects of the etcd response were guaranteed and it turns out they were not in rare scenarios
if the header/rev in the response is reliable, this lgtm
Sure - I understand that. The PrevKV is somewhat special, because you have to explicitly request it.
Here, everything we need should be set (assuming no bugs in etcd, so having error handling is definitely reasonable).
For posterity, about explicitly requesting PrevKV:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher.go#L236
/lgtm
@liggitt - do you think we should cherry-pick it back eventually, or should we leave the existing releases as they are?
umm..... I would tend to leave this 1.26+ only
ok - let's leave it for now - we can potentially revisit that decision later if needed...
Fix #113368
/kind bug
/sig api-machinery
/priority important-soon
/assign @liggitt @lavalamp @deads2k
/cc @stevekuznetsov