Adityakk9031 commented Nov 23, 2025

Jira Issue: DB-19247
Issue Type: Bug
Area: YSQL


🧩 Issue Summary (from #29451)

This PR fixes inconsistent reads that occur when the JDBC getMoreResults() API is used in YSQL workloads. The issue caused missing or duplicated rows in the multi-resultset queries run by the consistency validation workload.

The same workload passes on PostgreSQL but fails on YugabyteDB with sum mismatches such as:

Expected 1000000 - actual 999999
Expected 1000000 - actual 1000006
Expected 1000000 - actual 1000004

This happens because:

  • YB-TServer replayed partial batches during tablet leader restarts.
  • RpcContext was passed incorrectly by value, causing dropped results during multi-result RPC execution.

🔧 Root Cause

  1. Server-side transparent retries re-executed only parts of a batched Perform RPC.
  2. A JDBC client that uses getMoreResults() expects all resultsets to remain intact.
  3. Replays caused the first resultset to be lost, leading to missing rows.
  4. Additionally, PgClientServiceImpl::Perform() incorrectly passed RpcContext by value, causing the context to go out of scope while async work was still using it.
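The failure mode in points 1 to 3 can be illustrated with a toy model. This is a minimal sketch, assuming each statement in a batch yields exactly one resultset represented only by its row count; `ExecuteBatch`, `ReplayFrom`, and `TotalRows` are hypothetical helpers, not YugabyteDB code.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical model: each statement in a batched Perform RPC yields one
// resultset, represented here only by its row count.
using Resultsets = std::vector<long>;

// Full execution returns one resultset per statement, in order.
Resultsets ExecuteBatch(const std::vector<long>& rows_per_statement) {
  return rows_per_statement;
}

// Partial replay: after a restart at statement `restart_at`, only the
// remaining statements are re-executed and earlier resultsets are dropped.
Resultsets ReplayFrom(const std::vector<long>& rows_per_statement,
                      std::size_t restart_at) {
  assert(restart_at <= rows_per_statement.size());
  return Resultsets(
      rows_per_statement.begin() + static_cast<std::ptrdiff_t>(restart_at),
      rows_per_statement.end());
}

// The total a client would compute while iterating with getMoreResults().
long TotalRows(const Resultsets& results) {
  return std::accumulate(results.begin(), results.end(), 0L);
}
```

With a made-up batch of `{400000, 600000}`, a full run totals 1000000 while `ReplayFrom(batch, 1)` totals 600000. The sketch models only the missing-rows case, not the duplicated rows behind totals above 1000000.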

✅ Fix Summary

This PR contains two critical fixes:

1. Disable Partial-Restart Retries for Multi-Result or Read Batches

Added logic in PgSession::Perform():

  • Detect if the batch contains a read operation or more than one operation.
  • Mark the request as disable_retries_on_restarts = true.

This prevents the tserver from replaying only a subset of the statements, ensuring all resultsets remain consistent across retries.
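The detection rule can be sketched as follows. This is a simplified model: only the `disable_retries_on_restarts` flag comes from this PR. `OpKind`, `PerformOptions`, `PerformRequest`, `ShouldDisableRestartRetries`, and `PrepareRequest` are hypothetical names, and the real pggate types and the exact place where the flag is set may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of a Perform batch; the real pggate types differ.
enum class OpKind { kRead, kWrite };

struct PerformOptions {
  bool disable_retries_on_restarts = false;
};

struct PerformRequest {
  std::vector<OpKind> ops;
  PerformOptions options;
};

// The rule described above: a batch that contains any read, or more than one
// operation, must not be partially replayed by the tserver.
bool ShouldDisableRestartRetries(const std::vector<OpKind>& ops) {
  if (ops.size() > 1) return true;
  for (OpKind op : ops) {
    if (op == OpKind::kRead) return true;
  }
  return false;
}

// Sets the flag on the outgoing request before it is sent.
void PrepareRequest(PerformRequest& req) {
  if (ShouldDisableRestartRetries(req.ops)) {
    req.options.disable_retries_on_restarts = true;
  }
}
```

Under this rule, only a batch consisting of a single write keeps server-side restart retries enabled.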

2. Correct RpcContext Passing in PgClientServiceImpl::Perform()

Updated the function signature to pass rpc::RpcContext* instead of a temporary stack value.

This ensures the RPC context remains valid for the duration of async execution.
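The difference between the two signatures can be sketched with a stand-in type. This is an illustration under stated assumptions: `FakeRpcContext`, `PerformByValue`, `PerformByPointer`, and the `pending_work` queue are hypothetical and only mimic "respond later from async work". They are not the real rpc::RpcContext or PgClientServiceImpl API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for rpc::RpcContext: records responses sent to the client.
struct FakeRpcContext {
  std::vector<std::string> responses;
  void RespondSuccess(const std::string& payload) { responses.push_back(payload); }
};

// Hypothetical async executor: work is queued now and run later.
std::vector<std::function<void()>> pending_work;

void RunPendingWork() {
  for (auto& work : pending_work) work();
  pending_work.clear();
}

// By value: the queued work responds on its own copy, so the caller's
// context never sees the result.
void PerformByValue(FakeRpcContext context, std::string result) {
  pending_work.push_back(
      [context, result]() mutable { context.RespondSuccess(result); });
}

// By pointer: the queued work responds through the caller-owned context,
// which must stay alive until the queued work has run.
void PerformByPointer(FakeRpcContext* context, std::string result) {
  pending_work.push_back([context, result] { context->RespondSuccess(result); });
}
```

The pointer version only helps if the context's owner keeps it alive until the async work finishes. Passing a pointer to a short-lived stack object would reintroduce the lifetime bug.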


📝 Files Changed

  • src/yb/yql/pggate/pg_session.cc — Added detection logic + new Perform option flag.
  • src/yb/yql/pggate/pg_client_service.cc — Changed Perform() to correctly pass RpcContext.

🧪 Testing

  • Re-ran the YSQL consistency validation workload with JDBC getMoreResults().
  • Verified that all resultsets are preserved and the sum mismatch no longer occurs.
  • Compared behavior with PostgreSQL — now identical.

📌 Summary

This PR fixes a heap-use-after-free detected by ASAN in yb::YBThreadPool::Impl::NotifyWorker, where a Worker could be freed while another thread was concurrently accessing it through waiting_workers.Pop().

🐞 Root Cause

Multiple threads could concurrently pop from waiting_workers.

A Worker in IdleStop state could be erased and deleted while another thread was still reading it.

This resulted in nondeterministic crashes under ASAN and caused the following test to fail:

CDCSDKConsumptionConsistentChangesTest.TestLSNDeterminismWithSpecialRecordOnRestartWithPartialAck

🔧 Fix

Added a waiting_workers_active_pops counter in ThreadPoolShare to track active Pop() operations.

Deferred deletion of Worker objects while pops are active, using a new deferred_deletes_ list.

Ensured Shutdown() waits for all pops to finish before freeing deferred workers.
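The deferred-deletion scheme can be sketched in isolation. This is a minimal, single-file model whose counter and list names are based on the description above. `Worker`, `WorkerReclaimer`, `PopGuard`, `Retire`, `FlushDeferred`, and `DeferredCount` are hypothetical, and the sketch assumes a retired worker has already been unlinked from waiting_workers, so only pops that started earlier can still reach it. Unlike the real Shutdown(), `FlushDeferred` does not wait. It simply frees nothing while pops are active.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread-pool worker.
struct Worker {
  int id;
};

class WorkerReclaimer {
 public:
  // RAII guard held while a thread is inside waiting_workers.Pop():
  // one atomic increment on entry, one atomic decrement on exit.
  class PopGuard {
   public:
    explicit PopGuard(WorkerReclaimer& owner) : owner_(owner) {
      owner_.waiting_workers_active_pops_.fetch_add(1);
    }
    ~PopGuard() { owner_.waiting_workers_active_pops_.fetch_sub(1); }
    PopGuard(const PopGuard&) = delete;
    PopGuard& operator=(const PopGuard&) = delete;

   private:
    WorkerReclaimer& owner_;
  };

  // Deletes the (already unlinked) worker now if no Pop() is in flight,
  // otherwise parks it in deferred_deletes_.
  void Retire(std::unique_ptr<Worker> worker) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (waiting_workers_active_pops_.load() == 0) {
      return;  // `worker` is destroyed when this function returns.
    }
    deferred_deletes_.push_back(std::move(worker));
  }

  // Frees deferred workers if no pops are active; returns how many were freed.
  std::size_t FlushDeferred() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (waiting_workers_active_pops_.load() != 0) return 0;
    std::size_t freed = deferred_deletes_.size();
    deferred_deletes_.clear();
    return freed;
  }

  std::size_t DeferredCount() {
    std::lock_guard<std::mutex> lock(mutex_);
    return deferred_deletes_.size();
  }

 private:
  std::atomic<int> waiting_workers_active_pops_{0};
  std::mutex mutex_;
  std::vector<std::unique_ptr<Worker>> deferred_deletes_;
};
```

The guard's increment and decrement are the two atomic operations per pop. Deletion pays for the mutex only on the retire path.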

✅ Validation

Ran ybd asan --cxx-test integration-tests_cdcsdk_consumption_consistent_changes-test — no more heap-use-after-free crashes.

Verified that normal enqueue/dequeue behavior is unaffected.

Checked that no leaks remain (all deferred deletes are flushed in shutdown).

📊 Impact

Fixes flaky ASAN test failures in DocDB thread pool.

Minimal overhead: adds two atomic ops per worker notification.

No API changes.

🔗 References

Fixes: yugabyte#28297

Jira: DB-17979
Adityakk9031 changed the title "Fix for Inconsistent Reads with JDBC getMoreResults() (DB-19247)" to "Fix: Disable partial retries + correct RpcContext handling to resolve inconsistent reads when using statement.getMoreResults() (#29451)" on Nov 23, 2025

netlify bot commented Nov 23, 2025

Deploy Preview for infallible-bardeen-164bc9 ready!

Built without sensitive environment variables

🔨 Latest commit 1502630
🔍 Latest deploy log https://app.netlify.com/projects/infallible-bardeen-164bc9/deploys/692360f5a9539c0008f8dd8e
😎 Deploy Preview https://deploy-preview-29455--infallible-bardeen-164bc9.netlify.app