Skip to content

Conversation

@Lorak-mmk
Copy link
Collaborator

@Lorak-mmk Lorak-mmk commented Nov 14, 2025

Based on #1473 (which was merged) and #1479
The above PR changes the semantics in a way that allows us to write the test for RequiredHostAbsent error.
The test adds proxy rules that drop connection when query to system.local is received on node 1 (which is coordinator for DDL) and also prevent opening new connections to this node.
Now schema awaiting will perform check twice:

  • First check will return broken connection error for coordinator, and drop all connections
  • In second (and each subsequent) check checking schema agreement will require RequiredHostAbsent

Test needs to wait until schema agreement timeout - the semantics change means that it won't return error earlier.
If we need to make this test faster, then we need to make check_schema_agreement_with_required_node public and call it directly.

Fixes: #1362

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@Lorak-mmk Lorak-mmk added this to the 1.5.0 milestone Nov 14, 2025
@Lorak-mmk Lorak-mmk requested a review from wprzytula November 14, 2025 18:11
@Lorak-mmk Lorak-mmk self-assigned this Nov 14, 2025
@Lorak-mmk Lorak-mmk requested a review from Copilot November 14, 2025 18:12
@github-actions
Copy link

github-actions bot commented Nov 14, 2025

cargo semver-checks found no API-breaking changes in this PR.
Checked commit: 8a8625c

Copilot finished reviewing on behalf of Lorak-mmk November 14, 2025 18:13
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive test coverage for the RequiredHostAbsent error condition and refactors the schema agreement logic to better handle transient failures. The changes are based on semantic improvements from PR #1473 that allow detecting when a required coordinator node becomes unavailable during schema agreement checks.

Key Changes

  • Refactored await_schema_agreement_with_required_node to track the last attempt result and return more informative errors on timeout
  • Added test_schema_await_with_transient_failure to verify schema agreement succeeds even when initial checks fail
  • Added test_schema_await_required_host_absent to test the RequiredHostAbsent error condition by simulating connection failures to the coordinator node

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
scylla/tests/integration/session/schema_agreement.rs Adds two new integration tests that verify schema agreement behavior under failure conditions using proxy rules to simulate network issues
scylla/src/client/session.rs Refactors schema agreement implementation to eliminate the helper method and improve error reporting by tracking last attempt results

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Lorak-mmk Lorak-mmk force-pushed the test-required-host-absent branch from 4a1e59e to 18debc9 Compare November 16, 2025 15:00
@Lorak-mmk
Copy link
Collaborator Author

Rebased on main

@Lorak-mmk Lorak-mmk force-pushed the test-required-host-absent branch from 18debc9 to fc92c9b Compare November 16, 2025 15:29
@Lorak-mmk
Copy link
Collaborator Author

Lorak-mmk commented Nov 16, 2025

Rebased on current version of base PR, addressed comments.

@Lorak-mmk Lorak-mmk force-pushed the test-required-host-absent branch 2 times, most recently from 42cc396 to 4bb38fa Compare November 17, 2025 14:43
@Lorak-mmk
Copy link
Collaborator Author

Rebased on main

@Lorak-mmk Lorak-mmk requested a review from Copilot November 17, 2025 14:43
Copilot finished reviewing on behalf of Lorak-mmk November 17, 2025 14:44
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +290 to +308
"CREATE KEYSPACE {ks} WITH
REPLICATION = {{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}}"
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CREATE KEYSPACE statement is split across multiple lines without using a line continuation backslash (\). This will include a newline and indentation whitespace in the SQL string, which is inconsistent with the rest of the codebase.

Consider reformatting to a single line like other tests in this file (e.g., line 43), or use a backslash for line continuation like in simple_strategy.rs:

let mut request = Statement::new(format!(
    "CREATE KEYSPACE {ks} WITH REPLICATION = {{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}}"
));
Suggested change
"CREATE KEYSPACE {ks} WITH
REPLICATION = {{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}}"
"CREATE KEYSPACE {ks} WITH REPLICATION = {{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}}"

Copilot uses AI. Check for mistakes.
wprzytula
wprzytula previously approved these changes Nov 17, 2025
@wprzytula
Copy link
Collaborator

Doesn't this close #1349?

@Lorak-mmk
Copy link
Collaborator Author

I don't think so. It doesn't test the things that were changed in that PR (fetching only from one connection).

@Lorak-mmk
Copy link
Collaborator Author

The test failed on Cassandra, but not in the main part of the test but in droopping the keyspace at the end. Weird.
The error is:

thread 'session::schema_agreement::test_schema_await_required_host_absent' (12643) panicked at scylla/tests/integration/session/schema_agreement.rs:312:18:
called `Result::unwrap()` on an `Err` value: SchemaAgreementError(Timeout(1s))
stack backtrace:

The code is

            // Cleanup
            running_proxy.running_nodes[1].change_request_rules(Some(vec![]));
            session.await_schema_agreement().await.unwrap();
            session
                .query_unpaged(format!("DROP KEYSPACE {ks}"), &[])
                .await
                .unwrap();

with unwrap being line 312.

What is weird is that the error mentions SchemaAgreementError alone, not as part of ExecutionError.
Maybe the error message is wrong and the test failed somewhere else? This doesn't make much sense because in the logs I can see that the DROP was issued.

@Lorak-mmk
Copy link
Collaborator Author

Rebased on main. Added a check that uuid in error is correct.
Let's see if the error is reproducible.

@Lorak-mmk
Copy link
Collaborator Author

Now the test passed. I'll rerun it a few times.

wprzytula
wprzytula previously approved these changes Nov 18, 2025
@Lorak-mmk
Copy link
Collaborator Author

Cassandra failed again, but on a different test :D

thread 'session::pager::test_pager_timeouts' (12341) panicked at scylla/tests/integration/session/pager.rs:332:22:
called `Result::unwrap()` on an `Err` value: NextPageError(RequestFailure(RequestTimeout(100ms)))

SyntaxError will cause schema agreement process to fail immediately
after next commits (that will add error classification there).
This changes the logic of schema agreement to allow some error to end
the process immediately, without waiting until the timeout.
For now the error clasification is not done with a lot of effort, just
to have something that doesn't treat transient errors as non-transient,
but also classify some errors as non-transient.
@Lorak-mmk Lorak-mmk force-pushed the test-required-host-absent branch from 7761943 to 858f4af Compare November 21, 2025 17:11
@Lorak-mmk Lorak-mmk marked this pull request as draft November 21, 2025 17:11
After changing schema agreement logic, those tests became slow, as they
had to wait until schema agreement timeout in each case, which is by
default 60s.
@Lorak-mmk Lorak-mmk force-pushed the test-required-host-absent branch from 858f4af to 8a8625c Compare November 21, 2025 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test for SchemaAgreementError::RequiredHostAbsent

2 participants