Skip to content

feat: check that all input FASTQs can be opened before counting#29

Open
werner291 wants to merge 2 commits into
fulcrumgenomics:mainfrom
werner291:wk/check-input-fastqs-readable
Open

feat: check that all input FASTQs can be opened before counting#29
werner291 wants to merge 2 commits into
fulcrumgenomics:mainfrom
werner291:wk/check-input-fastqs-readable

Conversation

@werner291

@werner291 werner291 commented May 6, 2026

Copy link
Copy Markdown

Closes #6.

Opens each input FASTQ up front and reports every unreadable path in one error, so a typo on the last file doesn't surface only after earlier files have been processed.

Count::execute also loads --library and optionally --essential-genes / --nonessential-genes / --control-guides before the heavy work. Happy to extend the pre-flight to those too if you'd prefer one consolidated error covering all inputs.

@werner291 werner291 requested a review from tfenne as a code owner May 6, 2026 08:31
@werner291 werner291 force-pushed the wk/check-input-fastqs-readable branch from 1817b7d to 609f664 Compare May 6, 2026 08:34
@werner291 werner291 marked this pull request as draft May 18, 2026 12:07
Open each input FASTQ for reading up front so a missing or unreadable
file is reported immediately rather than after earlier files have been
processed. All inputs are checked so the user sees every problem in one
error rather than fixing them one at a time.

Closes fulcrumgenomics#6
@werner291 werner291 force-pushed the wk/check-input-fastqs-readable branch from 609f664 to 508cfc3 Compare May 18, 2026 12:11
@werner291 werner291 marked this pull request as ready for review May 18, 2026 12:11

@nh13 nh13 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate you tackling this, unfortunately this will fail on named pipes and shell redirection, which will need to be tackled as part of this PR.

The pre-flight check called File::open on every input path. For
stream-backed paths (FIFOs, character devices like /dev/stdin, and
shell process substitution backed by /dev/fd/N pipes) opening twice
either blocks waiting for a writer or consumes bytes the main read
path never sees, so what was intended as a friendly early error
broke perfectly valid invocations like

  guide-counter count -i <(zcat reads.fq.gz) ...
  mkfifo /tmp/p && producer > /tmp/p & guide-counter count -i /tmp/p ...

Pre-flight regular files only. Classify with fs::metadata so the
symlink /dev/stdin resolves to whatever stdin actually points at;
non-regular inputs fall through to the main read path, which
opens them exactly once.

Adds tests covering a real mkfifo'd named pipe and /dev/stdin.
@werner291 werner291 force-pushed the wk/check-input-fastqs-readable branch from ad7b3d3 to d52c13e Compare May 29, 2026 12:03
@werner291

Copy link
Copy Markdown
Author

Good catch. The fixup narrows the pre-flight to regular files; non-regular paths (FIFOs, /dev/stdin, <(...) via /dev/fd/N) skip and the main read path handles them. fs::metadata follows symlinks, so /dev/stdin resolves by whatever stdin actually points at. Test uses a real mkfifo'd FIFO.

@werner291 werner291 requested a review from nh13 May 29, 2026 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check all input FASTQ files exist and are readable before starting counting

2 participants