MCOL-6326: cpimport rollback drop ranges with custom tests#3974
MCOL-6326: cpimport rollback drop ranges with custom tests#3974mariadb-SergeyZefirov wants to merge 13 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new custom testing framework and a specific test case for cpimport rollback, alongside updates to the ExtentMap logic to ensure CP ranges are invalidated when the High Water Mark (HWM) is updated during a rollback. Feedback focuses on several critical issues: the customtests step in the CI pipeline may be executing on the host runner instead of within the Docker container, and the new --no-mtr flag might skip necessary environment initialization. Additionally, the cpimport-rollback.sh script contains a logic error in its success assertion, lacks cleanup for large temporary files, and relies on a non-deterministic sleep which introduces a race condition.
| "apk add bash && " + | ||
| get_build_command("run_mtr.sh") + | ||
| " --container-name " + getContainerName("customtests") + | ||
| " --distro " + platform + | ||
| " --triggering-event " + event + | ||
| " --no-mtr" + | ||
| if std.endsWith(result, "ASan") then " --run-as-extern" else "" + | ||
| " && " + get_build_command("../tests/custom/cpimport-rollback.sh") |
There was a problem hiding this comment.
The customtests step appears to execute the test script on the host runner rather than inside the Docker container. Additionally, run_mtr.sh --no-mtr (as implemented in build/run_mtr.sh) skips the execInnerDocker call, which typically handles the execution context and initialization within the container. If cpimport-rollback.sh depends on a running ColumnStore instance and the cpimport binary, it will likely fail if executed directly on the host.
|
|
||
| echo "MAX(dt): $RESULT" | ||
|
|
||
| if [ "$RESULT" != "2026-03-01" ]; |
There was a problem hiding this comment.
The test expectation seems incorrect. The script loads data with 2025-03-01, then attempts to load data with 2026-03-01 but kills the process to trigger a rollback. If the rollback is successful, the 2026-03-01 data should be removed, and MAX(dt) should return 2025-03-01. As written, the test will fail (exit 1) if the rollback works correctly.
| if [ "$RESULT" != "2026-03-01" ]; | |
| if [ "$RESULT" != "2025-03-01" ]; |
| if [[ $NO_MTR = true ]]; then | ||
| echo "Skipping MTR altogether." | ||
| else | ||
| execInnerDocker "${CONTAINER_NAME}" "${MTR_RUN_COMMAND}" | ||
| fi |
There was a problem hiding this comment.
The implementation of --no-mtr skips the execInnerDocker call entirely. If this call is responsible for initializing the container environment (e.g., starting services or setting up the database), skipping it may leave the container in an uninitialized state, causing subsequent tests that rely on a running database to fail.
| seq 1 5000000 | awk '{print $1",2025-03-01"}' > /tmp/test_data0.csv | ||
| echo '100000000,\N' >>/tmp/test_data0.csv | ||
| seq 1 50000000 | awk '{print $1",2026-03-01"}' > /tmp/test_data.csv | ||
| echo '110000000,\N' >>/tmp/test_data.csv |
There was a problem hiding this comment.
The script generates large temporary files (approx. 1GB) in /tmp but does not clean them up. This can lead to disk exhaustion on CI runners. It is recommended to use a trap to ensure these files are removed upon script exit.
| seq 1 5000000 | awk '{print $1",2025-03-01"}' > /tmp/test_data0.csv | |
| echo '100000000,\N' >>/tmp/test_data0.csv | |
| seq 1 50000000 | awk '{print $1",2026-03-01"}' > /tmp/test_data.csv | |
| echo '110000000,\N' >>/tmp/test_data.csv | |
| trap 'rm -f /tmp/test_data0.csv /tmp/test_data.csv' EXIT | |
| seq 1 5000000 | awk '{print $1",2025-03-01"}' > /tmp/test_data0.csv | |
| echo '100000000,\N' >>/tmp/test_data0.csv | |
| seq 1 50000000 | awk '{print $1",2026-03-01"}' > /tmp/test_data.csv | |
| echo '110000000,\N' >>/tmp/test_data.csv |
| CP_PID=$! | ||
|
|
||
| echo "4. Waiting 1 second for extents to allocate..." | ||
| sleep 1 |
There was a problem hiding this comment.
Using a fixed sleep 1 to wait for cpimport to allocate extents is a race condition. On a slow system, the process might not have reached the allocation phase, while on a fast system, it might have already finished. Consider a more robust synchronization method or a longer wait time to ensure the process is in the expected state before sending the signal.
No description provided.