Several tests are failing intermittently. I have divided the failures into two groups:
Tests that just need to wait until the data is replicated
It seems that in all of these cases the test asserts that some data is present, but it queries a replica that does not have it yet. The main fix would be to retry the check until the data is replicated:
FAILED tests/integration/adapter/incremental/test_base_incremental.py::TestIncrementalCompoundKey::test_compound_key - assert 200 == 180
FAILED tests/integration/adapter/materialized_view/test_materialized_view.py::TestCatchup::test_full_refresh_catchup_enabled - assert 3 == 4
FAILED tests/integration/adapter/incremental/test_schema_change.py::TestOnSchemaChange::test_append[schema_change_append] - assert 4 == 5
FAILED tests/integration/adapter/clickhouse/test_clickhouse_table_materializations.py::TestMergeTreeTableMaterialization - assert 10 >= 20
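As a rough idea of what "retry the check" could look like, here is a minimal polling sketch. The helper name `wait_until`, its parameters and the default timeout are assumptions made for illustration; none of this exists in the test suite today.

```python
import time


def wait_until(check, timeout=30.0, interval=0.5):
    """Call check() repeatedly until it returns a truthy value or the timeout expires.

    Returns the last value produced by check() so the caller can assert on it.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        # Stop as soon as the check passes, or give up once the deadline is reached.
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)


# Hypothetical usage inside a test; `count_rows` stands in for whatever
# query the test already runs against the replica:
#
#     assert wait_until(lambda: count_rows() == 200), "data was not replicated in time"
```

A test that is failing for a real reason would still fail, only after the timeout instead of immediately.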
CH error codes that we need to retry
There are situations where an operation fails because CH cannot process it at that particular moment. If the error is retryable, we should retry until the operation succeeds.
Received ClickHouse exception, code: 244, server response: Code: 244. DB::Exception: Got unexpected ZooKeeper error ZNODEEXISTS (at index 10) for part all_0_0_0. (UNEXPECTED_ZOOKEEPER_ERROR) (for url https://***:8443)
Code: 242. DB::Exception: Table is shutting down (zookeeper path: /clickhouse/tables/230e7b4e-a580-41ac-a1ee-2958c4a99a02/default). Stack trace
Code: 279, server response: Code: 279. DB::Exception: All connection tries failed. / Code: 32. DB::Exception: Attempt to read after eof. (ATTEMPT_TO_READ_AFTER_EOF
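One possible shape for this retry, as a sketch only: parse the `Code: N` number out of the exception message and retry when it is in an allow-list. The set of codes below is copied from the messages above; whether each of them is really safe to retry is an assumption that needs confirming. The names `RETRYABLE_CODES`, `error_code` and `run_with_retry` are hypothetical, and the sketch catches a generic `Exception` rather than any specific driver exception class.

```python
import re
import time

# Codes taken from the error messages in this issue; whether each one is
# actually safe to retry is an assumption that should be confirmed.
RETRYABLE_CODES = {242, 244, 279}

_CODE_RE = re.compile(r"Code: (\d+)")


def error_code(exc):
    """Extract the first 'Code: N' number from an exception message, or None."""
    match = _CODE_RE.search(str(exc))
    return int(match.group(1)) if match else None


def run_with_retry(operation, attempts=5, delay=1.0):
    """Run operation(), retrying when it raises an error with a retryable code."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Re-raise immediately for unknown codes, or when out of attempts.
            if error_code(exc) not in RETRYABLE_CODES or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff between attempts
```

Capping the number of attempts, rather than retrying forever, keeps a persistent failure from hanging the test run.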