1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -21,6 +21,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Add input_directories to upload-results [#808](https://github.com/BU-ISCIII/relecov-tools/pull/808)
- update ERROR handling in mail [#808](https://github.com/BU-ISCIII/relecov-tools/pull/808)
- Add labs to laboratory_adress.json [#813](https://github.com/BU-ISCIII/relecov-tools/pull/813)
- Add metadata-precheck module to validate SFTP metadata excels upfront [#816](https://github.com/BU-ISCIII/relecov-tools/pull/816)

#### Fixes

41 changes: 41 additions & 0 deletions README.md
@@ -23,6 +23,7 @@ relecov-tools is a set of helper tools for the assembly of the different element
- [send-mail](#send-mail)
- [read-bioinfo-metadata](#read-bioinfo-metadata)
- [Configuration of module `read-bioinfo-metadata`](#configuration-of-module-read-bioinfo-metadata)
- [metadata-precheck](#metadata-precheck)
- [validate](#validate)
- [map](#map)
- [upload-to-ena](#upload-to-ena)
@@ -132,6 +133,7 @@ Commands:
  upload-to-gisaid       parsed data to create files to upload to gisaid
  update-db              upload the information included in json file to...
  read-bioinfo-metadata  Create the json compliant from the Bioinfo...
  metadata-precheck      Scan metadata excels in the SFTP before download.
  metadata-homogeneizer  Parse institution metadata lab to the one used...
  pipeline-manager       Create the symbolic links for the samples which...
  build-schema           Generates and updates JSON Schema files from...
@@ -271,6 +273,45 @@ Options:

##### Configuration of module `read-bioinfo-metadata`

#### metadata-precheck

Run this module at the start of the SFTP workflow to inspect every metadata Excel file uploaded by the labs before launching the heavier download and processing steps. It leaves the remote files untouched (nothing is deleted), verifies that the template columns are present, and validates each row against the JSON schema, so you can see right away which labs and samples need correcting.

```
$ relecov-tools metadata-precheck --help
Usage: relecov-tools metadata-precheck [OPTIONS]

Inspect remote metadata Excels and report missing required data.

Options:
  -u, --user TEXT                 User name for login to sftp server.
  -p, --password TEXT             Password for the user to login.
  -f, --conf_file TEXT            Configuration file (not params file). If omitted,
                                  default values are taken from configuration.json and
                                  extra_config.json.
  -o, --output_dir TEXT           Directory where logs and reports will be saved. Falls
                                  back to the default logs path when not provided.
  -t, --target_folders TEXT       Target remote folders. Accepts a JSON-like list
                                  (e.g. ["LAB001/batch1","LAB002"]). Use "ALL" to open an
                                  interactive selector or leave empty to scan every folder.
  --export-excel / --no-export-excel
                                  Generate an Excel summary alongside the JSON report.
                                  Disabled by default.
  --help                          Show this message and exit.
```
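
As an illustration of the three `--target_folders` cases described in the help text (a JSON-like list, `"ALL"`, or nothing), the sketch below shows one way such a value could be interpreted. `parse_target_folders` is a hypothetical helper written for this example, not the function relecov-tools actually uses, and the fallback for non-JSON input is an assumption.

```python
import json
from typing import List, Optional, Union


def parse_target_folders(raw: Optional[str]) -> Union[str, List[str], None]:
    """Interpret a --target_folders value (hypothetical helper, not relecov-tools code)."""
    if raw is None or not raw.strip():
        # Nothing given: the caller would scan every folder.
        return None
    text = raw.strip()
    if text == "ALL":
        # The caller would open the interactive selector.
        return "ALL"
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: tolerate unquoted lists such as [LAB001, LAB002]
        # by splitting on commas and stripping brackets and quotes.
        parsed = [part.strip().strip("'\"") for part in text.strip("[]").split(",")]
    if not isinstance(parsed, list):
        # A single folder given as a bare JSON value.
        parsed = [parsed]
    # Drop empty entries and normalise everything to strings.
    return [str(item) for item in parsed if str(item).strip()]
```

With this sketch, `'["LAB001/batch1","LAB002"]'` yields a two-element list, `"ALL"` is passed through unchanged, and an omitted option yields `None`.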

**Highlights**

- Recursively scans the SFTP tree (skipping folders tagged as `_invalid_samples`) and downloads only the metadata `.xlsx` files, to a temporary location, for validation.
- Checks the Excel structure (headers, duplicate samples, missing IDs) and runs the same JSON-schema validation used by `validate`, including enum/type/anyOf rules.
- Writes per-lab and per-file reports under the selected `output_dir`, including:
  - `<timestamp>_metadata_precheck.log` and `*_log_summary.json`.
  - `metadata_precheck_report_<batch>_<hex>.json`, listing the labs, files, number of samples per status, and detailed invalid samples/errors.
  - An optional Excel summary when `--export-excel` is enabled.
- Prints a Rich table summarising, per lab, the number of valid/invalid files, samples, invalid samples, and the top validation issues so you can spot problems at a glance.
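
To make the structural checks in the list above more concrete (headers, duplicate samples, missing IDs), here is a minimal sketch that works on rows already read into dictionaries. The function name, the column names and the message wording are all assumptions chosen for illustration; the module's real checks may differ.

```python
from collections import Counter
from typing import Dict, List


def check_structure(
    header: List[str],
    rows: List[Dict[str, str]],
    required_columns: List[str],
    id_column: str,
) -> List[str]:
    """Return human-readable structural issues for one sheet (illustrative only)."""
    issues: List[str] = []

    # 1. Every template column must be present in the header row.
    missing = [col for col in required_columns if col not in header]
    if missing:
        issues.append(f"missing columns: {', '.join(missing)}")

    # Without the ID column the remaining checks cannot run.
    if id_column not in header:
        return issues

    ids = [str(row.get(id_column) or "").strip() for row in rows]

    # 2. Rows whose sample ID is empty (1-based row positions).
    empty_rows = [pos for pos, value in enumerate(ids, start=1) if not value]
    if empty_rows:
        issues.append(f"rows without {id_column}: {empty_rows}")

    # 3. Sample IDs that appear more than once.
    counts = Counter(value for value in ids if value)
    duplicated = sorted(value for value, n in counts.items() if n > 1)
    if duplicated:
        issues.append(f"duplicated sample IDs: {', '.join(duplicated)}")

    return issues
```

An empty list means the sheet passed these structural checks; schema validation of the individual values would be a separate step.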

Run this module before `download`/`wrapper` to ensure the labs fix their metadata first; once the report is clean, `validate` should succeed on the first try.
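
The per-lab summary mentioned in the highlights (valid/invalid files, samples, invalid samples, top issues) could be aggregated along the following lines. This is only a sketch: the shape of `file_results` (keys `lab`, `samples`, `errors`) is a hypothetical structure invented for the example and is not the layout of the real JSON report.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List


def summarise_by_lab(file_results: List[Dict[str, Any]], top_n: int = 3) -> Dict[str, Dict[str, Any]]:
    """Aggregate per-file results into one summary per lab (illustrative only)."""
    summary: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {
            "valid_files": 0,
            "invalid_files": 0,
            "samples": 0,
            "invalid_samples": 0,
            "issues": Counter(),
        }
    )
    for result in file_results:
        lab = summary[result["lab"]]
        errors = result["errors"]  # assumed shape: {sample_id: [message, ...]}
        lab["samples"] += result["samples"]
        lab["invalid_samples"] += len(errors)
        # A file counts as invalid as soon as one of its samples has errors.
        lab["invalid_files" if errors else "valid_files"] += 1
        for messages in errors.values():
            lab["issues"].update(messages)

    # Replace the raw counter with the most frequent issues only.
    return {
        name: {
            **{key: value for key, value in data.items() if key != "issues"},
            "top_issues": data["issues"].most_common(top_n),
        }
        for name, data in summary.items()
    }
```

Keeping the issue counter per lab makes it cheap to show only the few most frequent messages, which is what a summary table needs.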

The [`bioinfo_config.json`](relecov_tools/conf/bioinfo_config.json) file is a configuration file used by the `read-bioinfo-metadata` module. Its purpose is to specify **which files to search for** and **how to extract relevant information** from a folder containing bioinformatics results. With this configuration, the module identifies parameters and results for each sample and returns them in a standardized JSON format.

Structure:
65 changes: 65 additions & 0 deletions relecov_tools/__main__.py
@@ -28,6 +28,7 @@
import relecov_tools.ena_upload
import relecov_tools.pipeline_manager
import relecov_tools.build_schema
import relecov_tools.metadata_precheck
import relecov_tools.wrapper
import relecov_tools.upload_results
import relecov_tools.base_module
@@ -343,6 +344,70 @@ def download(
            sys.exit(1)


@relecov_tools_cli.command(help_priority=3)
@click.option("-u", "--user", help="User name for login to sftp server")
@click.option("-p", "--password", help="Password for the user to login")
@click.option(
    "-f",
    "--conf_file",
    help="Configuration file (not params file)",
)
@click.option(
    "-o",
    "--output_dir",
    "--output-dir",
    "--output_folder",
    "--out-folder",
    "--output_location",
    "--output_path",
    "--out_dir",
    "--output",
    "output_dir",
    type=click.Path(file_okay=False, resolve_path=True),
    help="Directory where the generated output and logs will be saved",
)
@click.option(
    "-t",
    "--target_folders",
    is_flag=False,
    flag_value="ALL",
    default=None,
    help='Flag: Select which folders will be targeted giving [paths] or via prompt. For multiple folders use ["folder1", "folder2"]',
)
@click.option(
    "--export-excel/--no-export-excel",
    default=False,
    help="Generate an Excel summary alongside the JSON report",
)
@click.pass_context
def metadata_precheck(
    ctx,
    user,
    password,
    conf_file,
    output_dir,
    target_folders,
    export_excel,
):
    """Inspect remote metadata Excels and report missing required data."""
    debug = ctx.obj.get("debug", False)
    args_merged = merge_with_extra_config(
        ctx=ctx,
        add_extra_config=True,
    )
    try:
        precheck = relecov_tools.metadata_precheck.MetadataPrecheck(**args_merged)
        precheck.execute_process()
    except Exception as e:
        if debug:
            log.exception(f"EXCEPTION FOUND: {e}")
            raise
        else:
            log.exception(f"EXCEPTION FOUND: {e}")
            stderr.print(f"EXCEPTION FOUND: {e}")
            sys.exit(1)


# metadata
@relecov_tools_cli.command(help_priority=3)
@click.option(