refactor(integration-tests): Deprecate IntegrationTestLogs and tests.utils.config.IntegrationTestPathConfig.#2241
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughRefactored integration test fixture system to introduce a new ChangesDataset Abstraction and Fixtures
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
integration-tests/tests/test_log_converter.py (1)
86-90:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAddress substring ambiguity and non-deterministic ordering in event validation.
Two substantive issues:
Substring false-positive for
TEST1. The test data contains messages ending withTEST1throughTEST11. The checkf"TEST{idx + 1}" not in messageatidx=0tests for"TEST1", which is a substring of both"TEST10"and"TEST11". If clp-s returnsTEST10orTEST11at the first position, the validation silently passes despite checking the wrong event.Non-deterministic ordering for equal timestamps. All 11 log events share the same timestamp (
1427089710122 ms). The clp-s search code iterates through messages sequentially as stored in the archive, but there is no explicit guarantee that events with identical timestamps will be returned in insertion order. Any change to the compression or search pipeline could alter this ordering without notice.Minimum fix — use
.endswith()to eliminate substring matches:Proposed fix
for idx, line in enumerate(lines): event = json.loads(line) message = event.get("message", "") - if f"TEST{idx + 1}" not in message: + if not message.endswith(f"TEST{idx + 1}"): pytest.fail( f"Expected 'TEST{idx + 1}' in message of event {idx + 1}, but got: {event}" )This addresses the substring issue but retains the ordering assumption. If clp-s does not guarantee insertion order for equal timestamps, replace with an order-independent set comparison (as suggested in the original diff proposal).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/tests/test_log_converter.py` around lines 86 - 90, The current loop over lines uses substring containment (if f"TEST{idx + 1}" not in message) which yields false-positives for TEST1 matching TEST10/11 and also assumes deterministic ordering for equal timestamps; update the validation in the loop that processes event/message (variables event, message, idx) to use message.endswith(f"TEST{idx + 1}") to avoid substring matches, and for robustness against non-deterministic ordering of identical-timestamp events replace the positional checks with an order-independent comparison: collect all extracted TEST labels from messages and compare that set/list against the expected set of {"TEST1",..."TEST11"} (or assert sorted lists) instead of relying on sequential idx matching.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@integration-tests/tests/data/text_singlefile/metadata.json`:
- Around line 5-6: Update the timestamp values in metadata.json so they
accurately reflect the log events: change the "begin_ts" and "end_ts" JSON
fields from 1427089710000 to 1427089710122 (the actual event epoch milliseconds
for 2015-03-23 05:48:30,122) so the metadata window matches the timestamps in
simple.txt.
In `@integration-tests/tests/fixtures/path_configs.py`:
- Line 38: PackagePathConfig is being constructed with
integration_tests_project_root as test_root_dir which causes
PackagePathConfig.__post_init__ to mkdir temp_config_dir under the project root;
change the constructor call to pass integration_test_path_config.test_cache_dir
(the same cache dir used by CompressionTestPathConfig and
ConversionTestPathConfig) so temp_config_dir is created under the dedicated
cache directory and aligns behavior across PackagePathConfig,
CompressionTestPathConfig and ConversionTestPathConfig.
In `@integration-tests/tests/test_log_converter.py`:
- Around line 37-40: The current calculation of num_log_events opens files
directly in a generator expression, risking leaked file handles; change the
logic to iterate over text_singlefile.metadata.file_names with an explicit with
context manager when opening (text_singlefile.logs_path /
file_name).open(encoding="utf-8") so each file is closed deterministically,
e.g., open each file inside a with block and sum lines (sum(1 for _ in fh)) into
num_log_events; update the code around the num_log_events computation to use
this pattern.
---
Outside diff comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 86-90: The current loop over lines uses substring containment (if
f"TEST{idx + 1}" not in message) which yields false-positives for TEST1 matching
TEST10/11 and also assumes deterministic ordering for equal timestamps; update
the validation in the loop that processes event/message (variables event,
message, idx) to use message.endswith(f"TEST{idx + 1}") to avoid substring
matches, and for robustness against non-deterministic ordering of
identical-timestamp events replace the positional checks with an
order-independent comparison: collect all extracted TEST labels from messages
and compare that set/list against the expected set of {"TEST1",..."TEST11"} (or
assert sorted lists) instead of relying on sequential idx matching.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6831764d-9ae2-474c-9fac-0b8555d2ba7c
📒 Files selected for processing (23)
integration-tests/tests/conftest.pyintegration-tests/tests/data/text_singlefile/logs/simple.txtintegration-tests/tests/data/text_singlefile/metadata.jsonintegration-tests/tests/fixtures/datasets.pyintegration-tests/tests/fixtures/integration_test_logs.pyintegration-tests/tests/fixtures/path_configs.pyintegration-tests/tests/package_tests/clp_json/data/json-multifile/README.mdintegration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-08.jsonlintegration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-09.jsonlintegration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-11.jsonlintegration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-19.jsonlintegration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-21.jsonlintegration-tests/tests/package_tests/clp_json/test_clp_json.pyintegration-tests/tests/package_tests/clp_text/data/text-multifile/README.mdintegration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day01.txtintegration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day04.txtintegration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day07.txtintegration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day10.txtintegration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day13.txtintegration-tests/tests/package_tests/clp_text/test_clp_text.pyintegration-tests/tests/test_identity_transformation.pyintegration-tests/tests/test_log_converter.pyintegration-tests/tests/utils/config.py
💤 Files with no reviewable changes (14)
- integration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-08.jsonl
- integration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-19.jsonl
- integration-tests/tests/package_tests/clp_json/data/json-multifile/README.md
- integration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day01.txt
- integration-tests/tests/conftest.py
- integration-tests/tests/package_tests/clp_text/data/text-multifile/README.md
- integration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-21.jsonl
- integration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-11.jsonl
- integration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day13.txt
- integration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day04.txt
- integration-tests/tests/package_tests/clp_json/data/json-multifile/logs/sts-135-2011-07-09.jsonl
- integration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day10.txt
- integration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day07.txt
- integration-tests/tests/fixtures/integration_test_logs.py
| @@ -5,11 +5,10 @@ | |||
|
|
|||
There was a problem hiding this comment.
The changes to this file need further discussion. We want to support both small and large datasets in integration tests.
For these tests specifically, we intentionally use large datasets to ensure compression and decompression remain lossless at scale.
Switching them to the new local datasets (or simply adding local datasets) would also be out of scope for this PR.
There was a problem hiding this comment.
discussion in progress
There was a problem hiding this comment.
Restored the large datasets as discussed.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
integration-tests/tests/conftest.py (1)
12-17:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRegister
tests.fixtures.downloaded_datasetsinpytest_pluginsor remove the unused fixtures.The
hive_24hrandpostgresqlfixtures defined intests/fixtures/downloaded_datasets.pyare not registered inpytest_pluginsand are not referenced anywhere in the codebase. Either add the module to the plugin list if tests will use these fixtures, or remove the unused fixture definitions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/tests/conftest.py` around lines 12 - 17, The pytest_plugins list in conftest.py is missing the module that defines the hive_24hr and postgresql fixtures; either add "tests.fixtures.downloaded_datasets" to the pytest_plugins array so pytest registers those fixtures (pytest_plugins variable) or delete the unused fixture definitions from tests/fixtures/downloaded_datasets.py (fixtures named hive_24hr and postgresql) if they aren’t needed; update accordingly and run tests to ensure no import/fixture-not-found errors remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@integration-tests/tests/fixtures/downloaded_datasets.py`:
- Around line 36-47: In DownloadedDataset.__post_init__, add a one-line comment
explaining that object.__setattr__ is intentionally used to bypass frozen=True
so derived/normalized fields (name, tarball_path, extraction_dir) can be set
after construction; then either (A) change the dataclass API to make the raw
constructor value an InitVar (e.g., name_input) or mark name as init=False and
compute/assign the normalized name there, or (B) if keeping the current
signature, document in the class docstring that the constructor argument for
name is normalized (stripped) and may differ from the supplied value so callers
are not surprised. Ensure the comment references __post_init__,
object.__setattr__, and the fields name/tarball_path/extraction_dir.
- Line 95: Update the docstring for the function that returns a
DownloadedDataset so the article is correct: change the return description from
"An DownloadedDataset instance providing metadata for the downloaded logs." to
"A DownloadedDataset instance providing metadata for the downloaded logs." —
locate the docstring referencing DownloadedDataset (the function that returns a
DownloadedDataset) and fix the ":return:" line accordingly.
---
Outside diff comments:
In `@integration-tests/tests/conftest.py`:
- Around line 12-17: The pytest_plugins list in conftest.py is missing the
module that defines the hive_24hr and postgresql fixtures; either add
"tests.fixtures.downloaded_datasets" to the pytest_plugins array so pytest
registers those fixtures (pytest_plugins variable) or delete the unused fixture
definitions from tests/fixtures/downloaded_datasets.py (fixtures named hive_24hr
and postgresql) if they aren’t needed; update accordingly and run tests to
ensure no import/fixture-not-found errors remain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c0ca1f1c-ca63-45a1-9411-e6b711ad1757
📒 Files selected for processing (9)
integration-tests/tests/conftest.pyintegration-tests/tests/data/README.mdintegration-tests/tests/fixtures/downloaded_datasets.pyintegration-tests/tests/fixtures/sample_datasets.pyintegration-tests/tests/package_tests/clp_json/test_clp_json.pyintegration-tests/tests/package_tests/clp_text/test_clp_text.pyintegration-tests/tests/test_identity_transformation.pyintegration-tests/tests/test_log_converter.pyintegration-tests/tests/utils/classes.py
Description
This PR migrates the testing suite with respect to the new designs implemented in #2181 (
IntegrationTestDataset) and #2106 (tests.utils.classes.IntegrationTestPathConfig). TheIntegrationTestLogsandtests.utils.config.IntegrationTestPathConfigobjects are now deprecated, and all related flows have been modified accordingly.Checklist
breaking change.
Validation performed
Ran
uv run pytest -m 'package'anduv run pytest -m 'core'; all tests pass.Summary by CodeRabbit
Tests
Chores