Fix download logic #230
Conversation
… is handled directly by Anemoi
2c868b7 to bdfb08b
Pull request overview
This PR simplifies the dataset download logic by removing custom file-locking and progress tracking in favor of delegating to the anemoi-datasets library's built-in chunked loading capabilities. The changes address issue #229, which reported errors when using load_in_parts for downloading datasets.
Changes:
- Simplified download logic to use anemoi-datasets' built-in chunk handling and progress tracking
- Removed custom file-locking implementation and manual part tracking
- Removed CLI commands `init`, `load`, and `load_in_parts` in favor of a unified `create` command
- Upgraded anemoi-datasets dependency to >=0.5.27 to get necessary fixes
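The unified flow described above can be sketched roughly as follows. This is a minimal illustration only: the names `DatasetStatus`, `dataset_status`, `create_dataset`, and `_chunked_build` are hypothetical stand-ins, not the actual icenet_mp or anemoi-datasets API, and a marker file substitutes for the library's own completeness check.

```python
import shutil
from enum import Enum, auto
from pathlib import Path


class DatasetStatus(Enum):
    """Illustrative states for a local dataset copy."""
    MISSING = auto()      # nothing on disk yet
    INCOMPLETE = auto()   # a partial build exists and can be resumed
    COMPLETE = auto()     # dataset is fully built


def dataset_status(path: Path) -> DatasetStatus:
    """Hypothetical status check; a marker file stands in for asking the
    dataset library whether the store is fully initialised."""
    if not path.exists():
        return DatasetStatus.MISSING
    if (path / ".complete").exists():
        return DatasetStatus.COMPLETE
    return DatasetStatus.INCOMPLETE


def create_dataset(path: Path, overwrite: bool = False) -> Path:
    """Unified 'create' flow: optionally wipe, check status, build or resume."""
    if overwrite and path.exists():
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    status = dataset_status(path)
    if status is DatasetStatus.COMPLETE:
        return path  # already built; nothing to download
    _chunked_build(path, resume=(status is DatasetStatus.INCOMPLETE))
    return path


def _chunked_build(path: Path, resume: bool) -> None:
    """Stand-in for the library's chunked create/load call, which tracks
    per-chunk progress itself (so no custom file locks are needed)."""
    path.mkdir(parents=True, exist_ok=True)
    (path / ".complete").touch()
```

The point of the pattern is that resumption and overwriting are decided once, up front, from the on-disk state, while all chunk-level bookkeeping is left to the library call.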
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| icenet_mp/data_processors/data_downloader.py | Replaced manual load_in_parts logic with simplified create/download flow using anemoi-datasets' built-in capabilities; removed file-locking and manual progress tracking |
| icenet_mp/cli/datasets.py | Removed init, load, and load_in_parts CLI commands; simplified inspect command |
| icenet_mp/types/simple_datatypes.py | Removed AnemoiCreateArgs dataclass; reordered remaining types alphabetically |
| icenet_mp/types/init.py | Removed AnemoiCreateArgs export |
| tests/conftest.py | Added MockAnemoiCreateArgs class to replace removed AnemoiCreateArgs in test fixtures |
| tests/test_cli.py | Updated CLI help test expectations to remove defunct commands |
| tests/data_processors/test_data_downloader.py | Removed all tests for the old load_in_parts implementation (435+ lines removed) |
| pyproject.toml | Removed filelock dependency; upgraded anemoi-datasets to >=0.5.27 |
| uv.lock | Reflected dependency changes from pyproject.toml |
Comments suppressed due to low confidence (1)
tests/data_processors/test_data_downloader.py:67
- All tests for the download functionality have been removed, leaving the new create(), download(), initialise(), status(), and load_in_chunks() methods completely untested. The fixtures downloader_with_file_dataset and downloader_with_directory_dataset are defined but unused. Consider adding tests for the new download logic to verify correct behavior in various scenarios: successful download, resuming partial downloads, handling corrupted datasets, overwrite functionality, and the status checking logic.
```python
from pathlib import Path

import pytest
from omegaconf import DictConfig, OmegaConf

from icenet_mp.data_processors.data_downloader import DataDownloader
from icenet_mp.data_processors.preprocessors.ipreprocessor import IPreprocessor


class DummyPreprocessor(IPreprocessor):
    """A dummy preprocessor for testing."""

    def download(self, preprocessor_path: Path) -> None:  # pragma: no cover - unused
        preprocessor_path.mkdir(parents=True, exist_ok=True)


def _build_downloader(tmp_path: Path, dataset_cfg: dict) -> DataDownloader:
    """Helper to create a DataDownloader with a dummy preprocessor."""
    full_cfg: DictConfig = OmegaConf.create(
        {
            "base_path": str(tmp_path),
            "data": {
                "datasets": {
                    "test": {
                        "name": "test",
                        "preprocessor": {"type": "dummy"},
                        **dataset_cfg,
                    }
                },
            },
        }
    )
    return DataDownloader("test", full_cfg, DummyPreprocessor)


@pytest.fixture
def downloader_with_file_dataset(tmp_path: Path) -> DataDownloader:
    """Fixture that creates a downloader with a file-based dataset path."""
    downloader = _build_downloader(
        tmp_path,
        {
            "start": "2020-01-01",
            "end": "2020-01-31",
            "group_by": "monthly",
        },
    )
    downloader.path_dataset = tmp_path / "test.zarr"
    # Create the file to ensure it exists
    downloader.path_dataset.touch()
    return downloader


@pytest.fixture
def downloader_with_directory_dataset(tmp_path: Path) -> DataDownloader:
    """Fixture that creates a downloader with a directory-based dataset path."""
    downloader = _build_downloader(
        tmp_path,
        {
            "start": "2020-01-01",
            "end": "2020-01-31",
            "group_by": "monthly",
        },
    )
    downloader.path_dataset = tmp_path / "test_dir.zarr"
    downloader.path_dataset.mkdir(parents=True, exist_ok=True)
    return downloader
```
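Since the fixtures above are currently unused, one of the scenarios the review asks for (distinguishing a file-based dataset from a directory store, as the two fixtures set up) could be exercised along these lines. `dataset_exists` is a hypothetical, self-contained stand-in for the downloader's status check, not the real `DataDownloader` method:

```python
import tempfile
from pathlib import Path


def dataset_exists(path: Path) -> bool:
    """Stand-in status check: a dataset may be a single file or a
    zarr-style directory store with at least one chunk present."""
    return path.is_file() or (path.is_dir() and any(path.iterdir()))


def test_file_and_directory_datasets() -> None:
    tmp = Path(tempfile.mkdtemp())

    file_ds = tmp / "test.zarr"
    file_ds.touch()                    # file-based dataset, like the first fixture
    assert dataset_exists(file_ds)

    dir_ds = tmp / "test_dir.zarr"
    dir_ds.mkdir()                     # empty directory store, like the second fixture
    assert not dataset_exists(dir_ds)  # empty store treated as not yet downloaded

    (dir_ds / "0.0").touch()           # a chunk has arrived
    assert dataset_exists(dir_ds)
```

Variants of the same shape could cover the other scenarios listed (overwrite, resume, corrupted store) by seeding the temporary directory with the corresponding on-disk state before the assertion.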
@IFenton: are you happy with this after our coworking yesterday?
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
tests/data_processors/test_data_downloader.py:67
`tests/data_processors/test_data_downloader.py` no longer contains any test cases (only fixtures/helpers) and nothing else in `tests/` references these fixtures. Consider deleting this file entirely, or add focused tests for the new `DataDownloader` flow (e.g., `create()` behavior around `status()`/`finalise()`), so changes to download logic remain covered.
IFenton
left a comment
That all seems to work. Happy to merge, though it would be good to update the README.
✏️ Update the README for the changes made in #230
Updated create logic:
- `overwrite`: delete the existing dataset and skip to (4)

Updated download logic:

Removed:
Closes #229