Add metadata-precheck module to validate SFTP metadata excels upfront #816

Draft
Aberdur wants to merge 8 commits into BU-ISCIII:develop from Aberdur:feature/metadata-precheck-module

Conversation

@Aberdur
Contributor

@Aberdur Aberdur commented Nov 12, 2025

PR Description

  • Introduce a new CLI command, metadata-precheck, that scans remote SFTP folders (skipping _invalid_samples) and validates every metadata Excel file before running download/wrapper
  • Add SchemaMapper helper to normalise headers, map rows to schema properties, and reconcile enum values without relying on ontologies in the spreadsheet
  • For each row, run full JSON Schema validation (same rules as validate) so structural/enum/type errors are detected early
  • Produce per-lab/per-file reports showing valid/invalid files, invalid sample counts, detailed messages, JSON summary and optional Excel export
  • Print a Rich table with the top issues per lab to highlight common problems immediately
  • Document the new command in the README (usage, options, rationale) so users know to run it before processing batches
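
The header normalisation, label-to-property mapping, enum reconciliation and "top issues" summary described above could look roughly like the sketch below. It is not the code in this PR: the schema fragment, the `label` key, and every function name (`normalise_header`, `map_row`, `reconcile_enum`, `check_row`, `top_issues`) are hypothetical and chosen only for illustration. The PR runs full JSON Schema validation per row, while this sketch uses only the standard library and substitutes a simplified required/enum check.

```python
import re
from collections import Counter

# Hypothetical schema fragment: property names, labels and enum values are
# illustrative assumptions, not taken from the project's real schema.
SCHEMA = {
    "required": ["sample_id", "host_gender"],
    "properties": {
        "sample_id": {"label": "Sample ID", "type": "string"},
        "host_gender": {
            "label": "Host Gender",
            "type": "string",
            "enum": ["Male", "Female"],
        },
    },
}


def normalise_header(header):
    """Lower-case the text and collapse runs of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", str(header).lower()).strip()


def map_row(row, schema):
    """Translate spreadsheet labels to schema property names.

    Unknown columns and empty cells are dropped.
    """
    lookup = {
        normalise_header(spec["label"]): prop
        for prop, spec in schema["properties"].items()
    }
    mapped = {}
    for header, value in row.items():
        prop = lookup.get(normalise_header(header))
        if prop is not None and value not in (None, ""):
            mapped[prop] = value
    return mapped


def reconcile_enum(value, allowed):
    """Return the canonical enum spelling on a case-insensitive match, else the value."""
    by_key = {normalise_header(option): option for option in allowed}
    return by_key.get(normalise_header(value), value)


def check_row(row, schema):
    """Simplified stand-in for JSON Schema validation: required and enum checks only."""
    errors = [
        f"missing required property: {prop}"
        for prop in schema["required"]
        if prop not in row
    ]
    for prop, value in row.items():
        allowed = schema["properties"][prop].get("enum")
        if allowed and reconcile_enum(value, allowed) not in allowed:
            errors.append(f"invalid value for {prop}: {value!r}")
    return errors


def top_issues(rows, schema, limit=3):
    """Count error messages across rows and return the most frequent ones."""
    counts = Counter()
    for row in rows:
        counts.update(check_row(map_row(row, schema), schema))
    return counts.most_common(limit)
```

Calling `top_issues(rows, SCHEMA)` on the rows of one lab's spreadsheet would give the kind of per-lab ranking that the Rich table is meant to surface. Keeping the mapping step separate from the checking step means the checker only ever sees property names, never spreadsheet labels.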

Example

image

PR Checklist

  • This comment contains a description of changes (with reason).
  • Make sure your code lints (black and flake8).
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@Aberdur Aberdur self-assigned this Nov 12, 2025
Member

@saramonzon saramonzon left a comment

Docstrings are still missing; please use the same docstring format as the other modules.

Can SchemaMapper be used in other modules? If so, should it be moved to utils?

class SchemaMapper:
    """Keep all header normalisation + casting logic encapsulated for reuse and readability."""

    SAMPLE_FALLBACKS = [
Member

The code should contain no labels, only properties.

@Aberdur Aberdur marked this pull request as draft November 13, 2025 07:28