Skip to content

Use pattern matching to filter invalid tile filenames from discovery#107

Merged
avsm merged 8 commits intomainfrom
copilot/fix-tmp-tiles-discovery
Dec 2, 2025
Merged

Use pattern matching to filter invalid tile filenames from discovery#107
avsm merged 8 commits intomainfrom
copilot/fix-tmp-tiles-discovery

Conversation

Copy link
Contributor

Copilot AI commented Nov 21, 2025

Fix tmp tiles in tile discovery

The issue: When geotessera is interrupted during tile download, temporary files with pattern .{filename}_tmp_* are left in the tiles directory. These temporary files cause warnings during tile discovery.

Changes

  • Updated discover_npy_tiles(), discover_geotiff_tiles(), discover_zarr_tiles() - Added except ValueError to silently skip files that don't match expected filename patterns
  • Updated discover_tiles() - Removed over-strict pattern pre-filtering that was causing false negatives
  • Added cram test coverage - New test in cli.t verifies that temporary files are silently ignored during tile discovery
  • Merged main branch - Integrated latest changes from main branch

Implementation

The solution relies on existing filename parsing functions that validate filenames against expected patterns using regex. When a file doesn't match the pattern, the parsing function raises ValueError, which is caught and the file is silently skipped:

def discover_npy_tiles(base_dir: Path) -> List[Tile]:
    for npy_file in embeddings_dir.rglob("*.npy"):
        if npy_file.name.endswith("_scales.npy"):
            continue
        
        try:
            tile = Tile.from_npy(npy_file, base_dir)  # Calls _parse_npy_filename()
            if tile.is_available():
                tiles.append(tile)
        except ValueError:
            # Skip files that don't match expected pattern
            # ValueError is raised by _parse_npy_filename when pattern doesn't match
            continue
        except Exception as e:
            logging.warning(f"Failed to load tile {npy_file}: {e}")

This approach is robust to any extra files (temporary files, hidden files, incorrectly named files, etc.) that may be present in tile directories.

Testing

Original prompt

This section details on the original issue you should resolve

<issue_title>tmp tiles should be ignored in tile discovery</issue_title>
<issue_description>If geotessera is interrupted when downloading a tile, a temporary file might be left in the tiles directory. This causes a warning when tile discovery happens: https://github.com/ucam-eo/geotessera/blob/main/geotessera/tiles.py#L314-L345

As a fix, we should just exclude temporary files from the glob (or filter them out afterwards).</issue_description>

Comments on the Issue (you are @copilot in this section)


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits November 21, 2025 14:57
Co-authored-by: avsm <53164+avsm@users.noreply.github.com>
Co-authored-by: avsm <53164+avsm@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix issue with ignoring tmp tiles in tile discovery Filter temporary download files from tile discovery Nov 21, 2025
Copilot AI requested a review from avsm November 21, 2025 15:02
Copy link
Contributor

@avsm avsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a different approach: when discovering tiles, ignore all files that fail to pattern match the regexp for tiles and skip them. If the relevant npy and scales files exist, then that's sufficient for the function to return that metadata. Be robust to extra files being present, which might include temp files but also be other files.

@avsm avsm assigned Copilot and unassigned avsm and Copilot Nov 21, 2025
@avsm
Copy link
Contributor

avsm commented Nov 21, 2025

@copilot do another round of changes in response to my review

Copilot AI and others added 2 commits November 21, 2025 15:59
…icit temp file check

Co-authored-by: avsm <53164+avsm@users.noreply.github.com>
…eption handling

Co-authored-by: avsm <53164+avsm@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Nov 21, 2025

Take a different approach: when discovering tiles, ignore all files that fail to pattern match the regexp for tiles and skip them. If the relevant npy and scales files exist, then that's sufficient fo...

Updated the implementation to use pattern-based filtering as suggested. Files that don't match the expected tile naming patterns (validated by regex in the parsing functions) are now silently skipped via ValueError exception handling. This approach is more robust as it handles any invalid filename, not just temporary files.

Changes in commit 588afe9

Copilot AI changed the title Filter temporary download files from tile discovery Use pattern matching to filter invalid tile filenames from discovery Nov 21, 2025
Copilot AI requested a review from avsm November 21, 2025 16:03
@avsm avsm marked this pull request as ready for review November 21, 2025 16:50
@avsm
Copy link
Contributor

avsm commented Nov 22, 2025

@copilot the tests are failing due to this

…gatives

Co-authored-by: avsm <53164+avsm@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Nov 22, 2025

@copilot the tests are failing due to this

Fixed the test failures. The issue was that the regex pattern pre-filtering in discover_tiles() was preventing valid .npy files from being found, causing the function to incorrectly fall back to other formats. Removed the pre-filtering - the actual pattern validation now only happens inside the individual discover_*_tiles() functions via ValueError exception handling.

Changes in commit 9abf770

@avsm
Copy link
Contributor

avsm commented Dec 2, 2025

@copilot remove the test case and modify one of the .t cram tests to create manually created tmp files and run the discovery tiles from a cli function to check the warnings dont appear

- Removed tests/test_tile_discovery.py unit test file
- Added new cram test in tests/cli.t that creates temporary files manually
- Test verifies info command silently ignores temp files without warnings
- Test creates files matching typical temporary file patterns (.{name}_tmp_*)
- Test validates tile count remains correct and no warnings appear

Co-authored-by: avsm <53164+avsm@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 2, 2025

@copilot remove the test case and modify one of the .t cram tests to create manually created tmp files and run the discovery tiles from a cli function to check the warnings dont appear

Replaced unit tests with cram test as requested. The new test in cli.t:

  • Creates temporary files manually (.grid_tmp pattern)
  • Creates an invalid filename
  • Runs info command to verify no warnings appear
  • Validates tile count remains correct (16 tiles)

Changes in commit 440e6d6

@avsm
Copy link
Contributor

avsm commented Dec 2, 2025

@copilot merge your branch against the current main branch to get all the latest changes

Copy link
Contributor Author

Copilot AI commented Dec 2, 2025

@copilot merge your branch against the current main branch to get all the latest changes

Merged main branch successfully. The branch now includes all latest changes from main (26 commits ahead). No conflicts occurred during the merge.

Changes in commit e955f72

@avsm avsm merged commit f50c712 into main Dec 2, 2025
9 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tmp tiles should be ignored in tile discovery

2 participants