feat(gl): native --gl loader (replaces gl_to_locator.py) by stsmall · Pull Request #47 · kr-colab/ReLocator

stsmall · 2026-05-12T18:56:15Z

Summary

Follow-up to the microsat native loader (#46) per your note that GL should get the same treatment. loc.load_genotypes(gl=..., bam_list=..., gl_mode=...) and locator --gl ... --bam_list ... --gl_mode {dosage,full_gl} now load beagle GL data directly. scripts/gl_to_locator.py is gone; parsing helpers are in locator/_gl.py. Both dosage and full_gl modes from the script are preserved end-to-end; downstream filtering behavior is unchanged (full_gl flows through the same filter_dosage_matrix path the script's TSV used via --matrix).

Imputation lives in the loader (continuous-dosage path), consistent with the microsat PR's resolution and what the script does today.

Branched off main after #45 merged; independent of #46 (no rebase dependency).
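For readers who have not met the beagle GL layout the loader consumes, here is a minimal parsing sketch. It assumes the ANGSD-style layout: a header row, then one row per site holding a marker ID, two allele codes, and three likelihood columns per sample. `parse_beagle_gl` is a hypothetical name used for illustration, not the helper in locator/_gl.py, and the sketch skips gzip handling and input validation.

```python
import io


def parse_beagle_gl(handle):
    """Parse beagle GL text into marker IDs and per-sample GL triplets.

    Hypothetical helper for illustration only. Returns (markers, triplets),
    where triplets[site][sample] is a (p_aa, p_ab, p_bb) tuple.
    """
    header = handle.readline().split()
    # Three leading columns (marker, allele1, allele2), then 3 columns per sample.
    n_samples = (len(header) - 3) // 3
    markers, triplets = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        markers.append(fields[0])
        values = [float(x) for x in fields[3:]]
        triplets.append(
            [tuple(values[3 * i : 3 * i + 3]) for i in range(n_samples)]
        )
    return markers, triplets


# Two sites, two samples; the second site's first sample is uninformative.
example = io.StringIO(
    "marker\tallele1\tallele2\tInd0\tInd0\tInd0\tInd1\tInd1\tInd1\n"
    "chr1_100\t0\t1\t0.9\t0.1\t0.0\t0.0\t0.2\t0.8\n"
    "chr1_250\t2\t3\t0.333333\t0.333333\t0.333333\t0.1\t0.8\t0.1\n"
)
markers, triplets = parse_beagle_gl(example)
```

The header in this layout carries only placeholder sample labels (Ind0, Ind1, ...), which is presumably why --gl is paired with --bam_list: the BAM list would supply the real sample IDs in column order. That reading is an assumption, not something the PR states.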

Changes

locator/_gl.py — module-level parsing helpers (lifted from the removed converter script).
locator/loaders.py — _load_from_gl + gl=/bam_list=/gl_mode= plumbing in load_genotypes, dispatched via is_dosage_matrix → filter_dosage_matrix.
locator/cli.py — --gl, --bam_list, --gl_mode {dosage,full_gl} flags.
tests/test_gl_helpers.py, tests/test_gl_input.py, tests/test_gl_cli.py — helper unit tests, loader-level tests, and an end-to-end CLI subprocess test (replaces the old test_input_extensions.py).
scripts/gl_to_locator.py — deleted.
docs/genotype_likelihoods.md — user-facing guide for the native loader.

Filter thresholds (min_maf=0.01, max_missing_frac=0.10, gl_missing_threshold=0.4) mirror the original script defaults and are hard-coded in _load_from_gl. Can be surfaced as CLI flags in a follow-up if desired.
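To make the three thresholds concrete, here is a rough per-site sketch of how they could interact. The semantics are assumptions, not taken from _load_from_gl: a sample is treated as missing at a site when its largest GL falls below gl_missing_threshold (an uninformative triplet such as 0.333/0.333/0.333), and allele frequency is estimated from expected dosage over the remaining samples. `is_missing` and `keep_site` are hypothetical names.

```python
MIN_MAF = 0.01
MAX_MISSING_FRAC = 0.10
GL_MISSING_THRESHOLD = 0.4


def is_missing(triplet, threshold=GL_MISSING_THRESHOLD):
    """Assumed rule: a call is missing when no genotype is favored enough."""
    return max(triplet) < threshold


def keep_site(site_triplets, min_maf=MIN_MAF, max_missing_frac=MAX_MISSING_FRAC):
    """Return True if a site passes the missingness and MAF filters."""
    n_total = len(site_triplets)
    called = [t for t in site_triplets if not is_missing(t)]
    missing_frac = (n_total - len(called)) / n_total
    if not called or missing_frac > max_missing_frac:
        return False
    # Expected count of the second allele per sample is P(AB) + 2 * P(BB).
    freq = sum(p_ab + 2 * p_bb for _, p_ab, p_bb in called) / (2 * len(called))
    maf = min(freq, 1 - freq)
    return maf >= min_maf


# Ten samples per site.
site_ok = [(1.0, 0.0, 0.0)] * 9 + [(0.0, 1.0, 0.0)]
site_sparse = (
    [(1.0, 0.0, 0.0)] * 4 + [(0.0, 0.0, 1.0)] * 4 + [(0.333, 0.333, 0.334)] * 2
)
site_monomorphic = [(1.0, 0.0, 0.0)] * 10

results = [keep_site(s) for s in (site_ok, site_sparse, site_monomorphic)]
```

Here the first site is kept, the second is dropped because 20% of its samples are missing, and the third is dropped because it shows no variation.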

Test plan

pixi run pytest tests/test_gl_helpers.py tests/test_gl_input.py tests/test_gl_cli.py -v passes.
Full suite green on this branch (247 tests).
pixi run ruff check + pixi run ruff format --check clean.
Reviewer: spot-check that full_gl mode still produces a (3 * n_sites, n_samples) matrix matching what gl_to_locator.py --gl_mode full_gl + --matrix used to produce.
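The two output shapes named in the test plan can be illustrated with a small sketch. It uses plain lists so it has no dependencies; the real loader presumably works on arrays. Two things are assumptions here: that dosage is the expected count of the second allele, P(AB) + 2·P(BB), and that full_gl mode lays the three likelihood rows for each site next to each other. The actual row order in the (3 * n_sites, n_samples) matrix is exactly what the reviewer is asked to confirm against the old script. `to_dosage` and `to_full_gl` are hypothetical names.

```python
def to_dosage(triplets):
    """Collapse GL triplets to expected dosage: one row per site."""
    return [[p_ab + 2 * p_bb for _, p_ab, p_bb in site] for site in triplets]


def to_full_gl(triplets):
    """Keep all three likelihoods: three rows per site (assumed adjacent)."""
    rows = []
    for site in triplets:
        for k in range(3):
            rows.append([sample[k] for sample in site])
    return rows


# triplets[site][sample] = (p_aa, p_ab, p_bb); 2 sites, 3 samples.
triplets = [
    [(0.9, 0.1, 0.0), (0.0, 0.2, 0.8), (0.0, 1.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.1, 0.8, 0.1), (0.0, 0.0, 1.0)],
]
n_sites, n_samples = len(triplets), len(triplets[0])

dosage = to_dosage(triplets)
full_gl = to_full_gl(triplets)

dosage_shape = (len(dosage), len(dosage[0]))
full_gl_shape = (len(full_gl), len(full_gl[0]))
```

With these inputs, dosage_shape is (n_sites, n_samples) and full_gl_shape is (3 * n_sites, n_samples).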

Parsing helpers are lifted byte-identically from scripts/gl_to_locator.py. Next commits wire them into DataLoaderMixin and the CLI; the script is removed last. Mirrors the locator._microsat structure that landed on microsats-sculpin.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

loc.load_genotypes(gl=..., bam_list=..., gl_mode=...) returns an (n_sites, n_samples) float dosage matrix in dosage mode, the same representation the continuous-dosage path produces, or a (3*n_sites, n_samples) matrix in full_gl mode. Both flow through the existing is_dosage_matrix dispatch into filter_dosage_matrix, so nothing downstream changes. Missing samples are imputed to the per-site mean dosage (dosage mode) or the per-site mean GL triplet (full_gl mode), matching the script's behavior. tests/test_input_extensions.py is renamed to tests/test_gl_input.py and rewritten to exercise the loader end-to-end via Locator(...). The CLI wiring follows in the next commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
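The per-site mean imputation described in this commit message can be sketched as follows. This is a dependency-free illustration, not the loader's code: it assumes missing entries are marked with NaN and that a site with no observed samples is left as NaN. `impute_site_mean` is a hypothetical name.

```python
import math


def impute_site_mean(rows):
    """Replace NaN entries in each row (site) with that row's observed mean."""
    imputed = []
    for row in rows:
        observed = [v for v in row if not math.isnan(v)]
        # Assumption: with nothing observed there is no mean, so keep NaN.
        fill = sum(observed) / len(observed) if observed else math.nan
        imputed.append([fill if math.isnan(v) else v for v in row])
    return imputed


# Rows are sites, columns are samples.
rows = [
    [0.0, math.nan, 2.0],
    [1.0, 1.0, math.nan],
]
imputed = impute_site_mean(rows)
```

If the full_gl matrix stores each likelihood component as its own row, applying the same row-wise fill to all three rows of a site would give the per-site mean GL triplet. That layout is an assumption.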

Threads the new args into loc.load_genotypes alongside vcf/zarr/matrix. --gl requires --bam_list (enforced in the elif branch of the loader's dispatch). No intermediate TSV and no preprocessing script are needed.
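A rough sketch of the flag wiring and the pairing check, assuming an argparse-based CLI. The flag names come from the PR; everything else (the `build_parser` and `check_gl_args` names, the error text, and `dosage` as the default mode) is hypothetical. As in the PR, the pairing rule is checked in loader-side code rather than in the parser.

```python
import argparse

GL_MODES = ("dosage", "full_gl")


def build_parser():
    """Hypothetical parser exposing only the three GL-related flags."""
    parser = argparse.ArgumentParser(prog="locator-sketch")
    parser.add_argument("--gl", help="beagle GL file")
    parser.add_argument("--bam_list", help="BAM list paired with --gl")
    # Default mode is an assumption; the PR does not state one.
    parser.add_argument("--gl_mode", choices=GL_MODES, default="dosage")
    return parser


def check_gl_args(gl=None, bam_list=None, gl_mode="dosage"):
    """Loader-side check: return True when the GL path should be taken."""
    if gl is None:
        return False
    if bam_list is None:
        raise ValueError("--gl requires --bam_list")
    if gl_mode not in GL_MODES:
        raise ValueError(f"gl_mode must be one of {GL_MODES}")
    return True


args = build_parser().parse_args(
    ["--gl", "calls.beagle.gz", "--bam_list", "bams.txt", "--gl_mode", "full_gl"]
)
use_gl = check_gl_args(args.gl, args.bam_list, args.gl_mode)
```

Keeping the check in the loader means callers using the Python API directly get the same error as CLI users.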

The native loader (loc.load_genotypes(gl=..., bam_list=..., gl_mode=...) and locator --gl --bam_list --gl_mode {dosage,full_gl}) fully supersedes scripts/gl_to_locator.py. Parsing helpers live in locator._gl; both dosage and full_gl modes are preserved end-to-end. Also drop the gl_to_locator.py references from the _load_from_matrix docstring and ValueError message, the _load_from_gl docstring, the cli.py --matrix help text, the filters.py NaN ValueError message, and the test_gl_input.py module docstring, since the script is no longer the recommended GL preprocessing path.

User-facing guide for the native flag, --bam_list pairing, --gl_mode {dosage,full_gl}, and the hard-coded filter thresholds. The parent CLAUDE.md project note is updated separately (it's outside the ReLocator repo).

codecov · 2026-05-12T19:10:34Z

Codecov Report

❌ Patch coverage is 94.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.46%. Comparing base (be59c4d) to head (cc79a0c).
⚠️ Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
locator/_gl.py	95.71%	3 Missing ⚠️
locator/cli.py	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
+ Coverage   58.49%   59.46%   +0.97%     
==========================================
  Files          27       28       +1     
  Lines        3518     3617      +99     
==========================================
+ Hits         2058     2151      +93     
- Misses       1460     1466       +6

Flag	Coverage Δ
unittests	`59.46% <94.00%> (+0.97%)`	⬆️

stsmall and others added 5 commits May 12, 2026 11:56

feat(gl): add --gl, --bam_list, --gl_mode CLI flags

4d11a61

docs(gl): document native --gl loader

cc79a0c

stsmall force-pushed the gl-native-loader branch from f47f6a1 to cc79a0c Compare May 12, 2026 19:00

stsmall wants to merge 5 commits into kr-colab:main from stsmall:gl-native-loader
