
Write Log.out and Log.progress.out with real STAR-equivalent content #55

@pinin4fjords


Summary

rustar does not write Log.out (the verbose run log) or Log.progress.out (per-chunk progress timestamps), although STAR writes both alongside Log.final.out. Consumers that parse these files for parameter dumps, per-phase progress, warnings, memory usage, or chunk-level mapping rates get nothing.

The goal is real content parity, not stubs that mimic STAR's section-header structure with placeholder content. Files that look like STAR's verbose log but carry only a {:#?} Debug dump of the params and three timestamps are worse for consumers than absent files: they pass file-existence checks but fail every actual parse.

STAR reference behaviour

  • Log.out is the verbose run log, written incrementally by source/InOutStreams.cpp plus parameter-dump and per-phase update calls scattered across source/Parameters.cpp, source/Aligner.cpp, and source/sjdbInsertJunctions.cpp. Content:
    • Full parameter dump with every default value (STAR's parameter format, one name<TAB>value line per parameter).
    • Per-phase progress messages (..... loading genome, ..... started mapping, ..... finished mapping, etc.) with timestamps.
    • Warnings (WARNING --X ...) and informational notes emitted during the run.
    • Final timing and memory usage info.
  • Log.progress.out is updated periodically (roughly every minute) during alignment, one line per chunk reporting reads processed and mapping speed.
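The Log.out content listed above implies a writer that emits each event as it happens and flushes immediately, so a crashed run still leaves a useful log. A minimal sketch in Rust, assuming a hypothetical RunLog type (not an existing rustar API). The name<TAB>value parameter line follows the format described above; the Unix-seconds timestamp and the bare "WARNING " prefix are simplifying assumptions, not STAR's byte-for-byte output.

```rust
use std::fmt::Display;
use std::io::{self, Write};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical STAR-style Log.out writer (illustrative, not rustar's API).
pub struct RunLog<W: Write> {
    out: W,
}

impl<W: Write> RunLog<W> {
    pub fn new(out: W) -> Self {
        RunLog { out }
    }

    /// One `name<TAB>value` line per parameter, defaults included.
    pub fn dump_params<K: Display, V: Display>(&mut self, params: &[(K, V)]) -> io::Result<()> {
        for (name, value) in params {
            writeln!(self.out, "{}\t{}", name, value)?;
        }
        self.out.flush()
    }

    /// Timestamped phase message, written when the phase starts or ends.
    /// Assumption: Unix seconds; STAR prints a human-readable date instead.
    pub fn phase(&mut self, msg: &str) -> io::Result<()> {
        let secs = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        writeln!(self.out, "{} ..... {}", secs, msg)?;
        self.out.flush()
    }

    /// Warning line; callers pass the "--X ..." part.
    pub fn warning(&mut self, msg: &str) -> io::Result<()> {
        writeln!(self.out, "WARNING {}", msg)?;
        self.out.flush()
    }

    /// Returns the underlying writer (useful for tests with Vec<u8>).
    pub fn into_inner(self) -> W {
        self.out
    }
}
```

In rustar the writer would wrap the Log.out file and be handed to the genome-load, sjdb-insertion and mapping phases, so each phase logs at the moment it runs rather than in a summary written at exit.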

Reproducer

#!/usr/bin/env bash
set -euo pipefail
mkdir -p /tmp/rustar-mre-logout && cd /tmp/rustar-mre-logout

BASE=https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a
curl -fsLO $BASE/reference/genome.fasta
curl -fsL  $BASE/reference/genes_with_empty_tid.gtf.gz | gunzip -c > genes.gtf
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_1.fastq.gz
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_2.fastq.gz

RUSTAR=ghcr.io/scverse/rustar-aligner:dev
STAR=community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4

mkdir -p idx-rustar idx-star
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner --runMode genomeGenerate \
    --genomeDir idx-rustar --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7
docker run --rm -v $PWD:/w -w /w $STAR STAR --runMode genomeGenerate \
    --genomeDir idx-star --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7

COMMON=(--readFilesIn SRR6357072_1.fastq.gz SRR6357072_2.fastq.gz --readFilesCommand zcat
        --runThreadN 4 --sjdbGTFfile genes.gtf --twopassMode Basic --runRNGseed 0
        --outSAMtype BAM Unsorted)

docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner \
    --genomeDir idx-rustar "${COMMON[@]}" --outFileNamePrefix RUS.
docker run --rm -v $PWD:/w -w /w $STAR STAR \
    --genomeDir idx-star "${COMMON[@]}" --outFileNamePrefix STAR.

echo "=== STAR Log* files ==="; ls STAR.Log*
echo "=== rustar Log* files ==="; ls RUS./Log*

Observed: STAR writes STAR.Log.final.out, STAR.Log.out and STAR.Log.progress.out; rustar writes only RUS./Log.final.out.
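The ls comparison in the reproducer can also be expressed as a check that reports which of STAR's three log files are missing for a given prefix, which would suit a regression test once the files exist. A minimal sketch in Rust; missing_logs is a hypothetical helper. STAR prepends --outFileNamePrefix verbatim, so the prefix "STAR." yields STAR.Log.out. To match where rustar currently writes (RUS./Log.final.out), pass "RUS./" as the prefix.

```rust
use std::path::Path;

/// The log files STAR writes for every mapping run.
const STAR_LOG_FILES: [&str; 3] = ["Log.final.out", "Log.out", "Log.progress.out"];

/// Hypothetical helper: returns the expected log paths that do not exist
/// as regular files, with `prefix` prepended exactly as given.
pub fn missing_logs(prefix: &str) -> Vec<String> {
    STAR_LOG_FILES
        .iter()
        .map(|name| format!("{prefix}{name}"))
        .filter(|path| !Path::new(path).is_file())
        .collect()
}
```

Against today's rustar output, missing_logs("RUS./") would return the Log.out and Log.progress.out paths; an empty result is the parity target.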

Suggested approach

This is structural: Log.out needs progress hooks during the long-running phases (genome load, suffix-array build, per-chunk alignment) so that events are written as they happen, not at the end. Log.progress.out needs a periodic writer separate from the main alignment loop. Both need STAR-format parameter dumps and warning emission paths.

This is not a one-PR drive-by; it is deferred until someone commits to content fidelity. Stubs are not the goal; see the rejected approach in the conversation on PR #44.
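For the periodic Log.progress.out writer, one option is a dedicated thread that wakes on a timer, reads shared atomic counters bumped by the alignment workers, and appends a line, so the alignment loop never does file I/O for progress. A minimal sketch in Rust using only std; Counters, spawn_progress_writer and the four-column layout are hypothetical simplifications, not STAR's actual Log.progress.out columns. In production the interval would be about a minute, matching STAR.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Shared counters that alignment workers would bump (hypothetical names).
#[derive(Default)]
pub struct Counters {
    pub reads: AtomicU64,
    pub unique: AtomicU64,
}

/// Spawns a thread that appends one progress line every `interval`,
/// independent of the alignment loop. Sending on (or dropping) the returned
/// Sender makes it write one final line and return the writer.
pub fn spawn_progress_writer<W: Write + Send + 'static>(
    mut out: W,
    counters: Arc<Counters>,
    interval: Duration,
) -> (Sender<()>, JoinHandle<io::Result<W>>) {
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || -> io::Result<W> {
        let start = Instant::now();
        // Simplified header; STAR's real Log.progress.out has more columns.
        writeln!(out, "elapsed_s\treads\tM_reads_per_hr\tpct_unique")?;
        loop {
            // Sleep one interval, or wake early on a stop message / disconnect.
            let stop = !matches!(stop_rx.recv_timeout(interval), Err(RecvTimeoutError::Timeout));
            let secs = start.elapsed().as_secs_f64();
            let reads = counters.reads.load(Ordering::Relaxed);
            let unique = counters.unique.load(Ordering::Relaxed);
            let speed = if secs > 0.0 { reads as f64 / 1e6 / (secs / 3600.0) } else { 0.0 };
            let pct = if reads > 0 { 100.0 * unique as f64 / reads as f64 } else { 0.0 };
            writeln!(out, "{:.0}\t{}\t{:.2}\t{:.2}", secs, reads, speed, pct)?;
            out.flush()?;
            if stop {
                return Ok(out);
            }
        }
    });
    (stop_tx, handle)
}
```

Relaxed loads are enough here because each counter is read independently for a best-effort snapshot; the final line is written after the stop signal, so it reflects every increment made before the caller stopped the writer.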

Severity

Low. nf-core/rnaseq currently works around this by declaring these outputs optional: true. The gap affects provenance and QC tooling that parses STAR's verbose log.


Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). Split out from #28.
