
tests: add external starter mode realtikv harness#68463

Open
ChangRui-Ryan wants to merge 2 commits into
pingcap:masterfrom
ChangRui-Ryan:changrui_cse_test

Conversation

@ChangRui-Ryan
Contributor

@ChangRui-Ryan ChangRui-Ryan commented May 18, 2026

What problem does this PR solve?

Issue Number: ref #67765

Problem Summary

The existing next-gen RealTiKV scripts only run TiDB in-process through Go tests. They do not provide a way to validate starter mode against a real external tidb-server process, so startup config handling, MySQL protocol behavior, and status/config endpoint behavior in starter mode are not covered.

What changed and how does it work

Added an external starter-mode test harness under tests/realtikvtest/scripts/next-gen. The new script reuses the existing next-gen cluster bootstrap for PD/TiKV/TiKV-worker/MinIO, then starts a real next-gen tidb-server with deploy-mode = "starter" and runs tests/realtikvtest/startertest through the MySQL protocol.

The new starter tests verify the external server’s /config output, starter-specific sysvar contracts, max_allowed_packet enforcement at the protocol boundary, basic SQL behavior, and SHOW/SET SESSION_STATES roundtrip behavior.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Tests

    • Added comprehensive integration tests for external starter-mode TiDB, covering config endpoints, system variables, protocol enforcement, and session-state round trips.
    • Added a test runner to bootstrap a starter-mode server and execute starter-mode test suites.
    • Added a new test target for automated/sharded execution in CI.
  • Documentation

    • Added usage and setup documentation for the starter-mode test runner, prerequisites, options, and environment variables.


@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. labels May 18, 2026
@pantheon-ai

pantheon-ai Bot commented May 18, 2026

@ChangRui-Ryan I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 18, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lichunzhu for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f9ef6ba6-6977-4e12-80f8-113174d5eb89

📥 Commits

Reviewing files that changed from the base of the PR and between 1d304a2 and f7b45ea.

📒 Files selected for processing (3)
  • tests/realtikvtest/scripts/next-gen/README.md
  • tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh
  • tests/realtikvtest/startertest/starter_external_test.go
✅ Files skipped from review due to trivial changes (1)
  • tests/realtikvtest/scripts/next-gen/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/realtikvtest/startertest/starter_external_test.go
  • tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh

📝 Walkthrough

Walkthrough

Adds a README subsection, a bash test runner that starts an external starter-mode tidb-server with dynamic ports and readiness polling, a Bazel go_test target, and Go integration tests that validate starter /config, sysvar contracts, protocol max-packet enforcement, and session-state round-trip.

Changes

Starter Mode Test Infrastructure

Layer / File(s) Summary
Test runner documentation
tests/realtikvtest/scripts/next-gen/README.md
Added a README section documenting run-starter-tests-with-server.sh, including usage, prerequisites (NEXT_GEN=1 make server), arguments, and environment variables for starter tidb-server configuration.
Port discovery helper
tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh
find_available_port implementation that scans upward from a starting port and returns the first unused TCP port.
Run helper: readiness, cleanup, and polling
tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh
cleanup_tidb_server trap, and wait_for_tidb_server, which polls the status endpoint, checks the PID, prints log tails on failure, and enforces a readiness timeout.
Runner orchestration and test invocation
tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh
run_under_cluster writes a temporary starter.toml, starts tidb-server with chosen ports, exports TIDB_STARTER_TEST_DSN/status/max-allowed-packet/tikv-worker env vars, and runs go test for the selected suite with NEXT_GEN=1 and tags.
CLI glue / bootstrap integration
tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh
main and CLI routing that delegate to bootstrap-test-with-cluster.sh and support --under-cluster mode.
Test build configuration
tests/realtikvtest/startertest/BUILD.bazel
Bazel go_test target startertest_test with timeout = "moderate", flaky = True, shard_count = 4, and external deps on MySQL driver and testify/require.
External starter integration tests
tests/realtikvtest/startertest/starter_external_test.go
Four tests validating starter /config JSON, starter sysvar contracts and SET GLOBAL error expectations, max-allowed-packet protocol enforcement via oversized query, and session-states round-trip; plus helpers for DB/env and SQL/assertions.

Sequence Diagram(s)

sequenceDiagram
  participant Runner as run-starter-tests-with-server.sh
  participant PortFinder as find_available_port
  participant TiDB as tidb-server
  participant Status as wait_for_tidb_server
  participant GoTest as go test

  Runner->>PortFinder: request free SQL & status ports
  PortFinder-->>Runner: available ports
  Runner->>Runner: write temporary starter.toml (ports, max-allowed-packet, tikv-worker-url)
  Runner->>TiDB: start tidb-server (background)
  TiDB-->>Runner: background PID
  Runner->>Status: poll ${TIDB_STATUS_URL}/config until ready
  Status->>TiDB: HTTP GET /config
  TiDB-->>Status: JSON config response
  Status-->>Runner: server ready
  Runner->>GoTest: export env vars and run go test with NEXT_GEN=1, tags
  GoTest-->>Runner: test results
  Runner->>TiDB: cleanup trap kills tidb-server on exit

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • pingcap/tidb#68245: Related starter-mode sysvar gating and tests for require_secure_transport.
  • pingcap/tidb#68313: Changes affecting starter defaults for standby.enable-zero-backend which the new tests assert.
  • pingcap/tidb#68314: Starter-mode max_allowed_packet wiring referenced by the new runner and tests.

Suggested labels

size/L

Suggested reviewers

  • iosmanthus
  • wjhuang2016
  • yudongusa

Poem

🐰 I found the ports, I wrote the file,
The starter woke and ran a while.
I polled its heartbeat, fed the test,
Session states and packets pressed—
Hooray! The rabbit hops with style.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main change: adding an external starter mode test harness to the RealTiKV test infrastructure.
Description check ✅ Passed The description includes all required sections: issue reference, problem summary, explanation of changes, and completed checklist. Integration test is marked as included.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: context deadline exceeded"
level=error msg="Timeout exceeded: try increasing it by passing --timeout option"



@tiprow

tiprow Bot commented May 18, 2026

Hi @ChangRui-Ryan. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov

codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.5112%. Comparing base (e4f51b2) to head (f7b45ea).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68463        +/-   ##
================================================
- Coverage   77.2761%   76.5112%   -0.7650%     
================================================
  Files          2010       1992        -18     
  Lines        555474     557927      +2453     
================================================
- Hits         429249     426877      -2372     
- Misses       125305     130933      +5628     
+ Partials        920        117       -803     
Flag Coverage Δ
integration 41.5034% <ø> (+1.7093%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4679% <ø> (ø)
parser ∅ <ø> (∅)
br 49.9725% <ø> (-13.0354%) ⬇️


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/realtikvtest/scripts/next-gen/README.md`:
- Around line 80-83: The README uses relative paths assuming current directory
tests/realtikvtest/scripts/next-gen/ (e.g.,
"./run-starter-tests-with-server.sh"); update the commands to be copy-pasteable
from the repository root by prefixing with the repo-root path (e.g.,
"tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh") or
explicitly state "run from tests/realtikvtest/scripts/next-gen/" for each
affected command (notably the occurrences of
"./run-starter-tests-with-server.sh" around the shown block and the commands at
lines ~92-94) so users can run them directly from the repo root.

In `@tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh`:
- Around line 79-83: The EXIT cleanup currently only stops background processes
but doesn't remove the temporary data directory created in data_dir; update the
EXIT/trap cleanup block (the handler that kills tidb-server and other test
processes) to also remove the temp directory by adding a safe removal like: if
[[ -n "${data_dir:-}" && -d "${data_dir}" ]]; then rm -rf -- "${data_dir}"; fi
so the tmp artifacts created by data_dir are always cleaned up.

In `@tests/realtikvtest/startertest/starter_external_test.go`:
- Line 74: The test currently hard-codes the TiKV worker URL to
"localhost:19000" (asserting cfg.TiKVWorkerURL) which breaks when the harness
assigns a free port; instead assert the stable shape: parse cfg.TiKVWorkerURL
(or use a regex) and verify the host equals "localhost" and the port is a
non-empty numeric value (e.g., use net.SplitHostPort(cfg.TiKVWorkerURL) and
require.Equal(t, "localhost", host) plus require.Regexp(t, `^\d+$`, port)), or
compare against a harness-provided expected value if one is available.
- Around line 115-119: The test's error substring checks are case-sensitive (see
the require.Truef call that inspects err.Error()), causing flakes; change the
assertion to compare against a lowercase error string by calling
strings.ToLower(err.Error()) and match the lowercase literals
"max_allowed_packet", "packet bigger" and "invalid connection" so the
require.Truef(...) still checks the same three conditions but
case-insensitively; update the require.Truef invocation that references
err.Error() accordingly.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c64e7654-5b30-45b1-ad5d-07e3b6a4b435

📥 Commits

Reviewing files that changed from the base of the PR and between e4f51b2 and 1d304a2.

📒 Files selected for processing (4)
  • tests/realtikvtest/scripts/next-gen/README.md
  • tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh
  • tests/realtikvtest/startertest/BUILD.bazel
  • tests/realtikvtest/startertest/starter_external_test.go

Comment thread tests/realtikvtest/scripts/next-gen/README.md
Comment thread tests/realtikvtest/scripts/next-gen/run-starter-tests-with-server.sh Outdated
Comment thread tests/realtikvtest/startertest/starter_external_test.go Outdated
Comment thread tests/realtikvtest/startertest/starter_external_test.go
@ChangRui-Ryan
Contributor Author

/retest

@tiprow

tiprow Bot commented May 19, 2026

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.


In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


Labels

release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.


1 participant