Build automated tooling to produce QC reports comparing harmonized output against reference data, based on the grading methodology developed in #196.
## Inputs
- Harmonized output TSVs (from pipeline)
- Reference data summary statistics (e.g., TOPMed DCC aggregates or the raw data)
- Variable mapping configuration (transform specs defining which harmonized variables to compare)
## Outputs
- Machine-readable scores (JSON or YAML) — per-variable grades, sample counts, summary statistics, overall cohort grade summary
- Visual HTML report — human-readable summary with per-variable detail, distribution comparisons, and methodological notes
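A minimal sketch of how the machine-readable payload could be assembled, assuming a per-variable results dict keyed by harmonized variable name. The field names, helper name (`build_score_report`), and example values are illustrative assumptions, not a finalized schema:

```python
import json
from collections import Counter

def build_score_report(cohort, variable_results):
    """Assemble the machine-readable QC payload (illustrative schema).

    variable_results: {name: {"grade": str, "n": int, "stats": dict}}
    """
    grade_counts = Counter(v["grade"] for v in variable_results.values())
    return {
        "cohort": cohort,
        "variables": variable_results,
        "summary": {
            "n_variables": len(variable_results),
            "grade_counts": dict(grade_counts),  # overall cohort grade summary
        },
    }

# Dummy values for illustration only.
report = build_score_report(
    "COPDGene",
    {
        "age_at_enrollment": {"grade": "A", "n": 1000, "stats": {"mean": 60.0}},
        "smoking_status": {"grade": "B", "n": 998, "stats": {"n_categories": 3}},
    },
)
print(json.dumps(report, indent=2))
```

Because the payload stays at the level of grades and summary statistics, the same dict can feed both the JSON/YAML artifact and the HTML report renderer.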
## Scoring
Uses the grading rubric defined in #196 (A+ through D for both continuous and categorical variables).
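To show the shape of an automated grader for continuous variables: the sketch below bins the absolute standardized mean difference between harmonized and reference summaries into letter grades. The thresholds here are placeholders; the actual cutoffs, and the categorical-variable rubric, are defined in #196:

```python
def grade_continuous(harmonized_mean, reference_mean, reference_sd):
    """Grade a continuous variable by standardized mean difference.

    Thresholds are illustrative placeholders; real cutoffs live in #196.
    """
    if reference_sd == 0:
        return "A+" if harmonized_mean == reference_mean else "D"
    delta = abs(harmonized_mean - reference_mean) / reference_sd
    for grade, cutoff in [("A+", 0.01), ("A", 0.05), ("B", 0.15), ("C", 0.30)]:
        if delta <= cutoff:
            return grade
    return "D"
```

Note that a grader in this form needs only summary statistics from both sides, which keeps it compatible with the enclave constraint below.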
## Approach
- Start with one cohort (COPDGene — structurally simplest, established baseline)
- Reports should work as pipeline artifacts (generated alongside harmonized data)
- Summary statistics only — no participant-level data leaves the enclave
## Context
Chris Siege developed and validated this comparison methodology manually across 9 cohorts. This issue is about automating it as reusable tooling. See #196 for the scoring rubric and framework details.