Conversation
Implement DVC batch download in test_set_runner and dvc_handler. Add tests for DvcHandler and ComparisonRunner.
Add dependency support with version-suffix disambiguation for shared dependencies referenced with different timestamps across XMLs. Add dry-run mode with MinIO presence checks for cases, references, dependencies, and doc folders. Add testcase verification and error handling with log rotation. Replace tuple types with frozen dataclasses (DependencyKey, AwsCredentials).
Add DVC access key and secret parameters to TeamCity settings. Remove duplicate nearfield test case entry from testbench table.
…cal paths handling
…ve logging for dependency handling
There was a problem hiding this comment.
Pull request overview
This PR updates the Deltares testbench’s DVC integration and the MinIO→DVC migration tooling, focusing on more efficient DVC downloads (batch fetch/checkout), improved dependency handling, and better operational robustness (logging, timeouts, and CI parameterization).
Changes:
- Added batch DVC download support (including temporary AWS credential injection via env vars) and updated the test runner to prefetch all DVC targets up front.
- Enhanced the MinIO→DVC migration tool to detect missing MinIO data, migrate dependency references, and improve logging/CLI ergonomics (e.g.,
--dry-run, log rotation). - Updated CI/configuration and tests to support the new DVC behavior and expanded path pattern handling.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| test/deltares_testbench/tools/minio_dvc_migration/xml_file_with_testcase_data.py | Migrates XMLs to DVC conventions; adds dependency rewrite + MinIO presence checks + dependency version mapping helpers. |
| test/deltares_testbench/tools/minio_dvc_migration/testcase_data.py | Adds dependency fields/logic, MinIO presence checks, and DVC tracking helpers for dependencies. |
| test/deltares_testbench/tools/minio_dvc_migration/s3_client.py | Refactors AWS credentials loading to a dataclass. |
| test/deltares_testbench/tools/minio_dvc_migration/migrate_xmls.py | Adds --dry-run, log rotation, and refactors migration workflow into run_migration(). |
| test/deltares_testbench/test/utils/handlers/test_dvc_handler.py | New unit tests for DvcHandler batch download + env restoration + root detection. |
| test/deltares_testbench/test/tools/minio_dvc_migration/test_xml_file_with_testcase_data.py | Adds tests for dependency migration, dependency deduping, and version suffix mapping. |
| test/deltares_testbench/test/tools/minio_dvc_migration/test_testcase_data.py | Adds tests for dependency handling and MinIO presence helpers; refactors helper return type. |
| test/deltares_testbench/test/tools/minio_dvc_migration/test_s3_url_info.py | Extends S3→local path mapping test coverage for deeper case paths. |
| test/deltares_testbench/test/tools/minio_dvc_migration/test_migrate_xmls.py | Adds tests for TeeStream and _rotate_log_file(). |
| test/deltares_testbench/test/suite/test_ComparisonRunner.py | Updates expectations to validate batch DVC preparation instead of per-case DVC download logging. |
| test/deltares_testbench/test/helpers/xml_config_helper.py | Fixes dependency XML helper to write cases_path (not version) into <dependency> element text. |
| test/deltares_testbench/src/utils/logging/file_logger.py | Makes stderr writes robust for non-str message inputs. |
| test/deltares_testbench/src/utils/handlers/handler_factory.py | Avoids creating empty destination directories for DVC downloads. |
| test/deltares_testbench/src/utils/handlers/dvc_handler.py | Introduces AWS env credential context manager, NoSCM repo init, and batch fetch/checkout API. |
| test/deltares_testbench/src/suite/test_set_runner.py | Adds DVC batch preparation, includes dependencies, and adjusts dependency download logic. |
| test/deltares_testbench/.gitignore | Attempts to allow tracking .dvc files under data/. |
| ci/teamcity/Delft3D/vars/dimr_testbench_table.csv | Removes a TeamCity test row entry. |
| ci/teamcity/Delft3D/settings.kts | Adds TeamCity parameters for DVC testbench credentials. |
| .dvc/config | Increases S3 read timeout for the DVC remote. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def __collect_dependency_dvc_files(self, dvc_configs: list[TestCaseConfig]) -> list[str]: | ||
| """Collect .dvc file paths for DVC dependencies so they are included in the batch download.""" | ||
| seen: set[str] = set() | ||
| dvc_files: list[str] = [] | ||
| for config in dvc_configs: | ||
| if not config.dependency or config.dependency.version != "DVC": | ||
| continue | ||
| location = next((loc for loc in config.locations if loc.type == PathType.INPUT), None) | ||
| if location is None: | ||
| continue | ||
| dep_dvc_path = os.path.abspath( | ||
| Paths().rebuildToLocalPath( | ||
| Paths().mergeFullPath(location.root, location.from_path, config.dependency.cases_path, "input.dvc") | ||
| ) | ||
| ) | ||
| if dep_dvc_path not in seen and os.path.isfile(dep_dvc_path): | ||
| seen.add(dep_dvc_path) | ||
| dvc_files.append(dep_dvc_path) |
There was a problem hiding this comment.
The DVC dependency batch-include path assumes every dependency is tracked as <cases_path>/input.dvc. However the migration tool tracks dependencies by running repo.add() on the dependency directory itself, which produces <cases_path>.dvc (next to the directory), not input.dvc inside it. This mismatch will cause DVC dependencies to be skipped from the batch download and later copy steps to fail. Build the dependency .dvc path consistent with how dependencies are added (likely ${location.root}/${config.dependency.cases_path}.dvc, without appending input.dvc).
| if dependency_version == "DVC": | ||
| # DVC dependencies are already checked out at {location.root}/{cases_path}/input/. | ||
| # Place the dependency as a sibling of the test case's input directory, | ||
| # so that relative paths like ../local_dir resolve correctly from input_work. | ||
| local_path = Paths().rebuildToLocalPath( | ||
| Paths().mergeFullPath(location.root, config.path.prefix, config.dependency.local_dir) | ||
| ) | ||
| if os.path.isdir(local_path) and any(os.scandir(local_path)): | ||
| logger.info("Dependency directory already exists: Skipping download") | ||
| return | ||
|
|
||
| source_input_path = Path( | ||
| Paths().rebuildToLocalPath( | ||
| Paths().mergeFullPath(location.root, location.from_path, config.dependency.cases_path, "input") | ||
| ) | ||
| ) | ||
| if source_input_path.is_dir(): | ||
| logger.info(f"Copying DVC dependency from {source_input_path} to {local_path}") | ||
| shutil.copytree(str(source_input_path), local_path, dirs_exist_ok=True) | ||
| else: | ||
| logger.error( | ||
| f"DVC dependency source not found at {source_input_path}. " | ||
| "Ensure the dependency's .dvc files have been checked out." | ||
| ) | ||
| return |
There was a problem hiding this comment.
In the dependency_version == "DVC" branch, the code copies from <cases_path>/input, but dependencies are treated elsewhere as plain directories (downloaded/tracked at <cases_path>). Copying from .../input will fail if the dependency folder doesn't have an input/ subfolder (and it also depends on location.from_path, which isn't used for DVC case paths). Align the source path with how dependencies are added to DVC (copy from the checked-out dependency directory itself, and compute it without appending /input).
| objects = list(rewinder.list_objects(prefix, timestamp)) | ||
| return "found" if objects else "missing" |
There was a problem hiding this comment.
_check_s3_prefix() materializes the entire object listing with list(...) just to test existence. For large prefixes this can be very slow and memory-heavy. Prefer a short-circuit approach (e.g., next(rewinder.list_objects(...), None) or an iterator any(...)) so you only fetch the first key needed to decide found/missing.
| objects = list(rewinder.list_objects(prefix, timestamp)) | |
| return "found" if objects else "missing" | |
| first_object = next(rewinder.list_objects(prefix, timestamp), None) | |
| return "found" if first_object is not None else "missing" |
| """Update dependency elements: set version to 'DVC' and rename path to DVC convention. | ||
|
|
||
| The dependency folder name is renamed by prepending 'c00_' if it doesn't start with 'c'. | ||
| A version suffix (e.g. '_v1') is appended when the dependency_version_map contains a mapping | ||
| for the (s3_path, version) pair, which happens when a dependency path has multiple versions. |
There was a problem hiding this comment.
The docstring claims the dependency folder is renamed by prepending 'c00_' when it doesn’t start with 'c', but the implementation calls rename_dependency_path_for_dvc() which currently only appends a version suffix and does not do any 'c00_' prefixing. Please update the docstring to match the actual behavior, or implement the documented rename so downstream code/tests aren’t misled about the resulting folder structure.
| """Update dependency elements: set version to 'DVC' and rename path to DVC convention. | |
| The dependency folder name is renamed by prepending 'c00_' if it doesn't start with 'c'. | |
| A version suffix (e.g. '_v1') is appended when the dependency_version_map contains a mapping | |
| for the (s3_path, version) pair, which happens when a dependency path has multiple versions. | |
| """Update dependency elements: set version to 'DVC' and rewrite the dependency path. | |
| The dependency path is normalized by `rename_dependency_path_for_dvc`. | |
| A version suffix (for example, '_v1') is appended when `dependency_version_map` | |
| contains a mapping for the `(s3_path, version)` pair, which happens when a | |
| dependency path has multiple versions. |
| def build_dependency_version_map( | ||
| xmls: list[XmlFileWithTestCaseData], | ||
| ) -> dict[DependencyKey, str]: | ||
| """Build a mapping from DependencyKey to a version suffix. | ||
|
|
||
| When a dependency S3 path is referenced with multiple distinct versions across | ||
| the provided XMLs, each version gets a suffix like '_v1', '_v2', etc. sorted | ||
| chronologically. Paths that have only one version across all XMLs are omitted | ||
| from the map (they don't need a suffix). | ||
| """ | ||
| path_versions: dict[str, set[str]] = defaultdict(set) | ||
| for xml in xmls: | ||
| for tc in xml.testcases: | ||
| if tc.has_dependency and tc.dependency_version: | ||
| path_versions[tc.dependency_s3_path].add(tc.dependency_version) | ||
|
|
||
| version_map: dict[DependencyKey, str] = {} | ||
| for s3_path, versions in path_versions.items(): | ||
| if len(versions) > 1: | ||
| for i, version in enumerate(sorted(versions), start=1): | ||
| version_map[DependencyKey(s3_path, version)] = f"_v{i}" |
There was a problem hiding this comment.
build_dependency_version_map() treats any non-empty dependency_version as a sortable version string, and uses sorted(versions) while the docstring says versions are sorted chronologically. If versions can include non-timestamps (e.g. "NO VERSION") or multiple timestamp formats, lexicographic sorting and including non-parseable values can produce unstable or incorrect suffix assignments. Consider filtering to versions where rewind_timestep_2_datetime() succeeds and sorting by the parsed datetime rather than raw strings.
| @@ -1,5 +1,6 @@ | |||
| /logs/** | |||
| /data/** | |||
There was a problem hiding this comment.
The negation !/data/**/*.dvc likely won’t take effect because /data/** also ignores the intermediate directories under /data, and Git won’t descend into ignored directories to see re-included files. To reliably track .dvc files under data/, you typically also need to unignore the directories (e.g. !/data/**/) or narrow the original ignore pattern so the directories remain visible.
| /data/** | |
| /data/** | |
| !/data/**/ |
| def run_migration(xmls_to_migrate: list[XmlFileWithTestCaseData], dry_run: bool = False) -> None: | ||
| """Execute the migration workflow. | ||
|
|
||
| When dry_run is True, only checks MinIO for file existence without | ||
| downloading, adding to DVC, pushing, or rewriting XMLs. | ||
| """ | ||
| rewinder = setup_minio_rewinder(BASE_URL) | ||
|
|
||
| # Always validate DVC repo and remote configuration | ||
| repo_root = find_dvc_root_in_parent_directories(Path(__file__).resolve()) | ||
| print(f"Found DVC repo at: {repo_root}") | ||
|
|
||
| config_path = repo_root / ".dvc" / "config" | ||
| if not config_path.exists(): | ||
| raise FileNotFoundError(f"DVC config not found: {config_path}") | ||
| config_text = config_path.read_text() | ||
| if 'remote "storage"' not in config_text: | ||
| raise ValueError("DVC remote 'storage' is not configured in .dvc/config") | ||
| print("DVC remote 'storage' is configured") | ||
|
|
||
| repo = Repo(str(repo_root)) | ||
| print("DVC repository initialized successfully") | ||
|
|
||
| total_found = 0 | ||
| total_missing = 0 | ||
| total_errors = 0 | ||
| total_doc_moves = 0 | ||
| total_skipped = 0 | ||
| failed_xmls: list[str] = [] | ||
|
|
||
| for i, xml_file in enumerate(xmls_to_migrate, start=1): | ||
| print(f"\n[{i}/{len(xmls_to_migrate)}] {xml_file.xml_file.name} ({len(xml_file.testcases)} testcases)") | ||
|
|
||
| if dry_run: | ||
| counts = xml_file.check_minio_presence(rewinder) | ||
| total_found += counts["found"] | ||
| total_missing += counts["missing"] | ||
| total_errors += counts["errors"] | ||
| total_doc_moves += counts["doc_moves"] | ||
| else: | ||
| try: | ||
| skipped = xml_file.verify_and_filter_testcases(rewinder) | ||
| total_skipped += len(skipped) | ||
| if not xml_file.testcases: | ||
| print(f" WARNING: All testcases missing on MinIO, skipping XML entirely") | ||
| continue | ||
|
|
||
| xml_file.download_from_minio_in_new_folder_structure(rewinder=rewinder) | ||
| xml_file.move_testcases_doc_folder_to_parent() | ||
| dvc_files = [] | ||
| dvc_files.extend(xml_file.add_to_dvc(repo=repo)) | ||
| push_dvc_files_to_remote(repo, dvc_files) | ||
| xml_file.migrate_xml_to_dvc() |
There was a problem hiding this comment.
run_migration() never builds/applies the dependency version suffix mapping, even though XmlFileWithTestCaseData supports dependency_version_map and helpers exist. Without applying this map before downloading/adding to DVC/migrating XML, two XMLs referencing the same dependency path with different versions can be downloaded into the same local directory and then tracked under the same .dvc, causing data corruption/overwrites. Build the map once for xmls_to_migrate (e.g., via build_dependency_version_map + apply_dependency_version_map) before the per-XML loop so both filesystem paths and rewritten XML dependency paths get the correct suffixes.
| repo = Repo(str(repo_root)) | ||
| print("DVC repository initialized successfully") |
There was a problem hiding this comment.
Repo(str(repo_root)) will use the default SCM integration (typically Git). This script is intended to run in environments where SCM may be absent/unneeded (and the PR description mentions NoSCM()), so initialization can fail unnecessarily. Consider constructing the repo with scm=NoSCM() (and importing it) to match DvcHandler and avoid requiring a Git checkout.
What was done
This pull request introduces significant improvements to the handling of DVC-based test cases in the Deltares testbench system. The main enhancement is the implementation of a batch download process for DVC test case data, which improves efficiency and reliability. Additionally, the code now better manages dependencies and credentials for DVC operations, and includes several refactorings for clarity and maintainability.
.dvcfiles for test cases using DVC, including dependenciesNoSCM()for cases where source control management is not required.dvc/configto increase the S3 read timeout, reducing the risk of timeouts during large downloads. (.dvc/config).gitignoresettings to support new DVC credentials and ensure.dvcfiles are not ignored.Evidence of the work done
<add video/figures if applicable>
Tests
<add testcase numbers if applicable, Issue number>
Documentation
<add description of changes if applicable, Issue number>
Issue link