test: cross version api compatibility test using tastora #4680

gupadhyaya · 2025-11-04T07:50:10Z

Summary

Adds bidirectional cross-version API compatibility tests using tastora to detect breaking API changes between versions.

How It Works

Current client → Old server: Uses current codebase's RPC client to connect to an old server version (via Docker image)
Old client → Current server: Compiles client code from an old version inside Docker and connects to the current server (built from the current branch)

Both directions must pass to ensure API compatibility.

Test Coverage

Tests all major API modules: Node, Header, State, P2P, Share, DAS, and Blob APIs across both bridge and light nodes.

Changes

Added cross-version compatibility test suite (nodebuilder/tests/tastora/api/cross_version_client_test.go)
Enhanced tastora framework with version-aware node creation (NewBridgeNodeWithVersion, NewLightNodeWithVersion)
Added buildCurrentNodeImage() to build Docker images from current codebase
Added GitHub Actions workflow for CI integration
Reverted local changes to transaction_test.go to match upstream

Running Tests

cd nodebuilder/tests/tastora/api
go test -tags=integration -v -run TestCrossVersionClientTestSuite/TestCrossVersionBidirectional -timeout 30m


- Add TearDownSuite method to BlobTestSuite for proper cleanup
- Add comprehensive parallel transaction test in TransactionTestSuite
- Add WithTxWorkerAccounts option to Framework for configuring worker accounts
- Add TxWorkerAccounts field to Config struct with validation
- Test verifies --tx.worker.accounts feature works correctly
- Includes proper error handling and detailed logging for debugging


.github/workflows/api-compatibility.yml

renaynay

A few comments here. The code in cross_version_client is a bit confusing, and I'd like to reduce it to the bare minimum necessary to run the tests.

The tests definitely pass locally?

api/client/client_test.go

renaynay · 2025-11-20T14:57:34Z

nodebuilder/p2p/p2p.go

-	return rms.Stat(), nil
+	stat := rms.Stat()
+
+	// Sanitize peer IDs to ensure backward compatibility with old clients that cannot decode certain peer ID formats.


Wait, what happened here?

renaynay · 2025-11-20T15:02:34Z

nodebuilder/tests/tastora/api/cross_version_client_test.go

+	// Use a longer timeout to handle Docker builds and all test combinations
+	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Minute)
+	defer cancel()
+	oldVersion := "v0.28.3-arabica"


It would be nice if we could grab this from somewhere (e.g., inject it from CI), because otherwise we'll have to keep updating the hardcoded value. This can be a follow-up (FLUP), but please track it.

renaynay · 2025-11-20T15:12:02Z

nodebuilder/tests/tastora/api/cross_version_client_test.go

+
+// waitForNodeReadyAndSynced waits for a node to be fully ready and synced before testing APIs.
+// This ensures APIs will work without needing individual timeouts.
+func (s *CrossVersionClientTestSuite) waitForNodeReadyAndSynced(ctx context.Context, client *rpcclient.Client, nodeName string, timeout time.Duration) {


This should be a method on the framework itself.

And it can definitely be cleaned up: instead of polling, just use SyncWait.

Also, checking that shares are available is not a necessary pre-check for this test.

Please also remove all the AI-generated comments from the code 😭

renaynay · 2025-11-20T15:13:59Z

nodebuilder/tests/tastora/api/cross_version_client_test.go

+	// Share APIs - node should be synced by now, but handle "data not available" gracefully for compatibility
+	err = client.Share.SharesAvailable(ctx, head.Height())
+	if err != nil && strings.Contains(err.Error(), "data not available") {
+		s.T().Logf("Share.SharesAvailable: data not available (known limitation with old light servers): %v", err)
+	} else {
+		require.NoError(s.T(), err)
+	}
+	_, err = client.Share.GetNamespaceData(ctx, head.Height(), namespace)
+	if err != nil && strings.Contains(err.Error(), "data not available") {
+		s.T().Logf("Share.GetNamespaceData: data not available (known limitation with old light servers): %v", err)
+	} else {
+		require.NoError(s.T(), err)
+	}
+	_, err = client.Share.GetEDS(ctx, head.Height())
+	if err != nil && strings.Contains(err.Error(), "data not available") {
+		s.T().Logf("Share.GetEDS: data not available (known limitation with old light servers): %v", err)
+	} else {
+		require.NoError(s.T(), err)
+	}


Here we should test both the expected success case and the expected failure case (e.g., specific expected errors) instead of allowing silent failures.

renaynay · 2025-11-20T15:15:43Z

nodebuilder/tests/tastora/api/cross_version_client_test.go

+	}
+
+	waitCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
+	err = client.DAS.WaitCatchUp(waitCtx)


IMO we don't need to test the wait methods, because they're just blocking methods that return an error if the ctx deadline is exceeded.

renaynay · 2025-11-20T15:18:06Z

nodebuilder/tests/tastora/api/cross_version_client_test.go

+	}
+
+	tmpDir := s.T().TempDir()
+	testProgram := s.generateTestClientProgram(serverRPCAddr, skipGetRow)


what's this?

chatton

I'm mainly just reviewing the approach of dynamic code generation and image building in this PR in cross_version_client_test.go

If my understanding is correct, what we are attempting to achieve here is creating a set of tests that allows us to use old client code with a new server version, and the other way around.

I don't think the current approach is the way this problem should be solved.

- It adds a lot of Docker logic that isn't really relevant to the scope of the test.
- The dynamic Go code is kind of hard to follow, and we don't get any type safety with it.
- It's also a bit weird and unintuitive.

In my opinion, a cleaner approach would be to create a separate binary: a simple CLI application that performs these checks. It would be built and tagged on every PR/tag. (This would need to be backported and built for previous versions, but that would be a one-time thing.)

It could be run like:

docker run ghcr.io/celestiaorg/celestia-node-compat-test:v6.1.2-arabica compat-test --rpc-url <some-url>

With this --rpc-url flag, all you need is two tags: one for your celestia-node and one for the compat-test container.

In this PR, there could be a type added that embeds the tastora *container.Node. This is a type that allows easy access to all the docker volume / exec functionality so none of that needs to be implemented here.

The Go code in this PR can just create an instance of CompatTester (or whatever the struct is called), run it with the --rpc-url of whichever node version you want, and then check the exit code and stderr of the container. If all assertions pass, it exits with code 0 and the test passes; otherwise you get an error message to display and the Go test fails.

renaynay · 2025-11-21T17:07:47Z

Thank you for the feedback here @chatton


.github/workflows/api-compatibility.yml

.github/workflows/build-compat-test-images.yml

chatton

Looking a lot better! Left a few suggestions for additional improvements

chatton · 2025-11-25T12:53:53Z

nodebuilder/tests/tastora/api/cross_version_client_test.go

+			s.T().Logf("Failed to pull image %s: %v", imageName, pullErr)
+			s.T().Logf("Attempting to build image locally as fallback...")
+
+			if buildErr := s.buildCompatTestImageLocally(ctx, clientVersion, imageName); buildErr != nil {


For compat tests, I would imagine we never actually want to build the image locally. It should always be something that has been built from a tag or from this PR (prior to this Go test running, i.e. as a previous step in the workflow).

If we intend to run the compatibility test for every PR, then we may need to build images locally, right? For both the PR branch and the branch it is based on, not necessarily always against a release version.

chatton · 2025-11-25T12:55:12Z

nodebuilder/tests/tastora/api/cross_version_client_test.go

+		NetworkMode: container.NetworkMode(networkID),
+	}
+
+	createResp, err := dockerClient.ContainerCreate(ctx, config, hostConfig, nil, nil, containerName)


Rather than doing raw container creation, take a look at how the hermes struct is implemented: you can create a type embedding *container.Node and get lots of functionality out of the box.
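The suggestion relies on ordinary Go struct embedding: a wrapper type gets the embedded type's methods promoted, so only compat-specific behavior has to be written. The sketch uses a local stand-in `Node` type because tastora's actual `container.Node` API is not reproduced here; `CompatTester` (the name floated earlier in the review), its fields, and `Args` are hypothetical.

```go
package main

import "fmt"

// Node is a stand-in for tastora's *container.Node; its method here is
// invented for the example and does not reflect tastora's real API.
type Node struct {
	Image string
}

// Name is an illustrative method that an embedding type gets for free.
func (n *Node) Name() string { return "node(" + n.Image + ")" }

// CompatTester embeds *Node, so Node's methods are promoted onto it.
type CompatTester struct {
	*Node
	RPCURL string
}

// Args builds the command line the compat-test binary would receive.
func (c *CompatTester) Args() []string {
	return []string{"compat-test", "--rpc-url", c.RPCURL}
}

func main() {
	ct := &CompatTester{Node: &Node{Image: "compat-test:dev"}, RPCURL: "http://bridge-0:26658"}
	fmt.Println(ct.Name()) // prints "node(compat-test:dev)" via the method promoted from *Node
	fmt.Println(ct.Args())
}
```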

chatton · 2025-11-25T12:55:53Z

cmd/compat-test/main.go

@@ -0,0 +1,192 @@
+package main


Yes, this is much nicer! I would recommend, however, that you pull this out into a separate PR so that it is easy to backport to all the relevant release branches.

chatton · 2025-11-25T12:56:31Z

nodebuilder/tests/tastora/api/cross_version_client_test.go

+	s.testAllAPIsWithOptions(ctx, client, false)
+}
+
+func (s *CrossVersionClientTestSuite) testAllAPIsWithOptions(ctx context.Context, client *rpcclient.Client, skipGetRow bool) {


I think we don't need this anymore, because we can replace it with the container built from this PR; that way all of the compat logic is in one place.

chatton · 2025-11-25T12:57:23Z

nodebuilder/tests/tastora/api/cross_version_client_test.go

+func (s *CrossVersionClientTestSuite) buildCompatTestImageLocally(ctx context.Context, version, imageName string) error {
+	s.T().Logf("Building compat-test image locally for version %s...", version)
+
+	repoRoot := s.getRepoRoot()
+	currentCommitCmd := exec.CommandContext(ctx, "git", "rev-parse", "HEAD")
+	currentCommitCmd.Dir = repoRoot
+	currentCommit, err := currentCommitCmd.Output()
+	if err != nil {
+		return fmt.Errorf("failed to get current commit: %w", err)
+	}
+	currentCommitStr := strings.TrimSpace(string(currentCommit))
+
+	defer func() {
+		restoreCmd := exec.CommandContext(ctx, "git", "checkout", currentCommitStr)
+		restoreCmd.Dir = repoRoot
+		_ = restoreCmd.Run()
+	}()
+
+	checkoutCmd := exec.CommandContext(ctx, "git", "checkout", version)
+	checkoutCmd.Dir = repoRoot
+	if output, err := checkoutCmd.CombinedOutput(); err != nil {
+		return fmt.Errorf("failed to checkout version %s: %w. Output: %s. Note: compat-test code may need to be backported to this version", version, err, string(output))
+	}
+
+	dockerfilePath := filepath.Join(repoRoot, "cmd/compat-test/Dockerfile")
+	if _, err := os.Stat(dockerfilePath); os.IsNotExist(err) {
+		return fmt.Errorf("cmd/compat-test/Dockerfile not found in version %s. The compat-test code needs to be backported to this version", version)
+	}
+
+	buildCmd := exec.CommandContext(ctx, "docker", "build",
+		"-f", "cmd/compat-test/Dockerfile",
+		"-t", imageName,
+		".",
+	)
+	buildCmd.Dir = repoRoot
+	output, err := buildCmd.CombinedOutput()
+	if err != nil {
+		return fmt.Errorf("docker build failed: %w\nOutput: %s", err, string(output))
+	}
+
+	return nil
+}


I don't think we need this at all. We shouldn't need to execute any docker/git commands via the CLI like this; IMO the cleanest approach is just making sure that the images exist before the test runs.

github-actions bot added the kind:break! label (Attached to breaking PRs) Nov 4, 2025

renaynay force-pushed the queued_sub_test branch from 4e6b1b9 to 55a2c26 November 7, 2025 19:34

queued submission tastora tests

5888ae3

renaynay force-pushed the queued_sub_test branch from 55a2c26 to 5888ae3 November 7, 2025 19:36

renaynay and others added 26 commits November 17, 2025 09:37

feat(state): Introduce queued submission (#4620)

9cea59f

deps(go.mod): Bump app v6.1.1-arabica (#4621)

fb1e041

fix(state): pass long lived core accessor ctx into txclient setup as …

bb91422

…ctx is now used to control lifecycle of tx workers (#4634)

queued submission tastora tests

8c187e4

simplify test to only parallel submission case

ec6b932

lint imports fix

2f5ffa8

more attempt to fix lint

840f382

fix(state): pass long lived core accessor ctx into txclient setup as …

daee3a4

…ctx is now used to control lifecycle of tx workers (#4635)

rene commit

4647185

cleanup

1fc2326

chore: update defaultNodeTag to v0.28.1-arabica

a054a0c

- Update to use v0.28.1-arabica, which contains fixes
- This version includes the queued submission feature with bug fixes

fixes

72041fd

cleanup and fix test

bda29a2

debugging

2689a5a

fix insufficientfund

d148066

default tag

3ce79ac

fixed

92a0675

framework improvements for robustness

f700fa1

cleanup framework

af925be

use tastora PR commit instead of local

bdea938

waitForTransactionInclusion instead of blocks

b6c2e04

more cleanup

126be80

fix lint

595fa0a

add multiversion node support

5760705

test: framework improvements for robustness (#4649)

f4e9e35

gupadhyaya requested a review from walldiss as a code owner November 18, 2025 16:03

fixes

de18b56

gupadhyaya changed the base branch from rene/queued_sub_test to main November 18, 2025 16:11

fix parity check failures

20d9b9a

github-advanced-security bot found potential problems Nov 18, 2025


.github/workflows/api-compatibility.yml (Fixed)

.github/workflows/api-compatibility.yml (Fixed)

gupadhyaya added 6 commits November 18, 2025 20:29

fix more parity failures

296a297

yml suggestions

08cc0a2

adding readiness check

7fd61e5

more fixes

63f8a2b

minor

b300dff

some optimizations

f09fea2

renaynay reviewed Nov 20, 2025


chatton requested changes Nov 21, 2025


change the design to use cmd script to execute the api compatibility …

f804641

…checks

github-advanced-security bot found potential problems Nov 24, 2025


.github/workflows/api-compatibility.yml (Fixed)

.github/workflows/build-compat-test-images.yml (Fixed)

gupadhyaya added 12 commits November 24, 2025 20:35

lint

1406da7

more lint

888340f

final lint

4434601

build fix

1587909

add skip for build

e22555e

remove skip logic

97ed9cc

backport fix

3298efc

minor fixes

bb69024

remove ai generated comments

ac98f93

refactor sync wait

09d1575

remove wait catching up check

f421cd5

fix

75bf7ec

chatton reviewed Nov 25, 2025


chatton mentioned this pull request Dec 1, 2025

test: introduce cel-compat binary #4721

Open


5 participants
