Skip to content

txn: fix resolver cache usage for async commit #1629

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

cfzjywxk
Copy link
Contributor

@cfzjywxk cfzjywxk commented Apr 11, 2025

Ref: pingcap/tidb#59494

  1. Do not call checkSecondaries if the transaction status is determined for async commit transactions.
  2. Small refactor on the TxnStatus usages.
  3. Do not use ttl = 0 setting in test cases.

@cfzjywxk cfzjywxk added the type/bugfix This PR fixes a bug. label Apr 11, 2025
@ti-chi-bot ti-chi-bot bot added the dco-signoff: yes Indicates the PR's author has signed the dco. label Apr 11, 2025
Copy link

ti-chi-bot bot commented Apr 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from cfzjywxk, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 11, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

txnkv/txnlock/lock_resolver.go:1029

  • Typo in comment: 'wih' should be 'with'.
// The `checkAllSecondaries` finishes wih no errors.

@cfzjywxk cfzjywxk requested a review from Copilot April 11, 2025 10:49
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (3)

txnkv/txnlock/lock_resolver.go:1021

  • [nitpick] Consider renaming 'toResolveKeys' to 'resolveKeys' for clarity and consistency with its usage in the async commit resolution functions.
var toResolveKeys [][]byte

txnkv/txnlock/lock_resolver.go:1044

  • [nitpick] Consider adding a clarifying comment to explain the purpose and behavior when the 'resolveAsyncCommitLockReturn' failpoint is triggered to improve code readability.
if _, err := util.EvalFailpoint("resolveAsyncCommitLockReturn"); err == nil {

txnkv/txnlock/lock_resolver_test.go:18

  • Consider adding a test case that covers the branch for async commit transactions with undetermined status (i.e., where checkAllSecondaries is called) to ensure this logic is fully exercised.
lock := func(key, primary string, startTS uint64, useAsyncCommit bool, secondaries [][]byte) *kvrpcpb.LockInfo {

@cfzjywxk cfzjywxk force-pushed the fix_resolve_cache branch from 152f757 to ff3bb45 Compare April 14, 2025 07:00
@cfzjywxk
Copy link
Contributor Author

/retest

@cfzjywxk cfzjywxk added the require-LGT3 Indicates that the PR requires three LGTM. label Apr 14, 2025
Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understanding this part of code has become much more difficult than I would imagine...

Comment on lines +127 to +131
func (s TxnStatus) IsRolledBack() bool {
return s.ttl == 0 && s.commitTS == 0 && (s.action == kvrpcpb.Action_NoAction ||
s.action == kvrpcpb.Action_LockNotExistRollback ||
s.action == kvrpcpb.Action_TTLExpireRollback)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my memory, whether the TxnStatus type stands for commited or rolled back is strictly defined by the field ttl and commitTS. Is the previous definition wrong now? May this change cause other unexpected side effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TxnStatus is an abstraction that introduces some complexity. In the current implementation, TxnStatus is determined by the KV response from CheckTxnStatus. Whether a transaction is rolled back depends on the TTL and the specified action, while whether it's committed depends on whether the commit timestamp is greater than 0.

To make it clear two seperate functions IsCommitted and IsRolledback are abstarcated, according to the current CheckTxnStatus implementation.

TxnStatus::RolledBack => resp.set_action(Action::NoAction),
TxnStatus::TtlExpire => resp.set_action(Action::TtlExpireRollback),
TxnStatus::LockNotExist => resp.set_action(Action::LockNotExistRollback),
TxnStatus::Committed { commit_ts } => {
    resp.set_commit_version(commit_ts.into_inner())
}

Only from these four types of respones the transaction status is determined.

Comment on lines +1022 to +1024
toResolveKeys = make([][]byte, 0, len(status.primaryLock.Secondaries)+1)
toResolveKeys = append(toResolveKeys, status.primaryLock.Secondaries...)
toResolveKeys = append(toResolveKeys, l.Primary)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might include those keys that has already been resolved. But it seems it's still faster than checking secondary again and then resolve them as 1 RPC is faster than 2.
Is it that you've already considered that? I suggest that this can be noted in comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would add some comments. The perform should be minial as resolving async commit locks is not a common operation.

status.ttl = cmdResp.LockTtl
}
} else if cmdResp.LockTtl != 0 {
if cmdResp.LockTtl != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't remenber the purpose of this code. Could you explain this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The

if status.primaryLock != nil && status.primaryLock.UseAsyncCommit && !forceSyncCommit {
			if !lr.store.GetOracle().IsExpired(txnID, cmdResp.LockTtl, &oracle.Option{TxnScope: oracle.GlobalTxnScope}) {
				status.ttl = cmdResp.LockTtl
			}
		}

logic is to change TxnStatus.ttl to zero, so the

if status.ttl != 0
   return

if status.primaryLock != nil && status.primaryLock.usaAsyncCommit && !forceSyncCommit {
    resolveAsyncCommit
}

resolveAsyncCommit could be reached in the resolve function.

I’ve moved the TTL expiration check back into the resolve function. This helps avoid modifying the TxnStatus state deep inside the call stack—especially since setting the TTL to 0 can be confusing. It also ensures that the TTL field remains read-only throughout the process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In actual code, all lock TTLs should be greater than 0. Whether a lock has expired should ideally be determined by the upper layer, rather than by modifying the internal state of TxnStatus IMO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has signed the dco. require-LGT3 Indicates that the PR requires three LGTM. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants