perf: parallel ISA-L compression and pwrite output pipeline #666
Closed
KimYannn wants to merge 1 commit into
abc71f6 to 9d57b42
377a19a to 7769cc7
- Replace zlib with ISA-L for gzip compression in worker threads, using SIMD-accelerated stateless deflate for parallel chunk compression
- Add pwrite-based multi-threaded output, bypassing the single-writer bottleneck
- Implement an adaptive flight batch manager with timeout and size controls
- Add bounded mismatch counting in overlap analysis for early exit
- Decouple the FASTQ reader/writer and add zstd input/output support
- Switch the zst backend to seekable output with autotuned workers
- Refine the thread budget split and writer-path autotuning
- Keep -w as the worker thread count (backward-compatible with upstream); extra reader/writer threads are added automatically
- Add a CI workflow for Ubuntu and macOS with ISA-L/Highway/htslib builds
- Fix Debian/Ubuntu multiarch lib paths in Makefile FIND_STATIC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
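Parallel chunk compression works because a gzip file may be a series of independent members (RFC 1952): each worker can turn its chunk into a complete member, and the members are concatenated in input order. A minimal sketch, assuming C++17 and no external libraries. To stay dependency-free it swaps ISA-L's SIMD deflate for uncompressed "stored" deflate blocks (RFC 1951, BTYPE=00), so it shows the framing and ordering only, not the compression itself. `crc32Ieee`, `gzipMemberStored`, and `parallelGzip` are hypothetical names, not code from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <string_view>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum gzip stores per member.
std::uint32_t crc32Ieee(std::string_view data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char b : data) {
        crc ^= b;
        for (int k = 0; k < 8; ++k) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Appends the low `n` bytes of v in little-endian order.
void putLE(Bytes& out, std::uint32_t v, int n) {
    for (int i = 0; i < n; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Builds one self-contained gzip member for a chunk. The deflate body uses stored
// (uncompressed) blocks; a real backend such as ISA-L would emit compressed data here.
Bytes gzipMemberStored(std::string_view chunk) {
    Bytes out = {0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 0, 255};  // ID1 ID2 CM FLG MTIME(4) XFL OS
    std::size_t pos = 0;
    do {  // runs at least once so an empty chunk still gets one final block
        const std::size_t len = std::min<std::size_t>(65535, chunk.size() - pos);
        const bool last = pos + len == chunk.size();
        out.push_back(last ? 1 : 0);  // BFINAL bit + BTYPE=00, padded to a byte boundary
        putLE(out, static_cast<std::uint32_t>(len), 2);   // LEN
        putLE(out, static_cast<std::uint16_t>(~len), 2);  // NLEN (one's complement of LEN)
        out.insert(out.end(), chunk.begin() + pos, chunk.begin() + pos + len);
        pos += len;
    } while (pos < chunk.size());
    putLE(out, crc32Ieee(chunk), 4);                          // CRC32
    putLE(out, static_cast<std::uint32_t>(chunk.size()), 4);  // ISIZE (size mod 2^32)
    return out;
}

// Builds the members for fixed-size chunks concurrently and concatenates them in order.
Bytes parallelGzip(std::string_view input, std::size_t chunkSize) {
    if (chunkSize == 0) chunkSize = 1;  // guard against an endless split loop
    std::vector<std::future<Bytes>> parts;
    std::size_t pos = 0;
    do {
        const std::string_view chunk = input.substr(pos, chunkSize);
        parts.push_back(std::async(std::launch::async, gzipMemberStored, chunk));
        pos += chunk.size();
    } while (pos < input.size());
    Bytes out;
    for (auto& part : parts) {  // get() in submission order preserves the input order
        const Bytes member = part.get();
        out.insert(out.end(), member.begin(), member.end());
    }
    return out;
}
```

In the PR the member body would come from ISA-L's stateless API (for example `isal_deflate_stateless`) rather than stored blocks. The cost of independent members is a slightly lower ratio, since each chunk starts with an empty history window and cannot reference the previous chunk's data.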
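The pwrite-based output removes the single writer by letting every worker write its own byte range with POSIX `pwrite()`, which takes an explicit offset and does not move the shared file position. The only serial step is learning the offset: chunk i starts where chunk i-1 ends, so only chunk sizes need to be chained. A minimal sketch, assuming Linux/POSIX and C++17. `OffsetChain` and `writeChunksParallel` are hypothetical names. The Summary describes a lock-free offset ring buffer; this sketch instead keeps one atomic slot per chunk and spin-waits on the predecessor's slot, which avoids mutexes but is not lock-free in the formal sense.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>
#include <fcntl.h>   // open() flags for callers
#include <unistd.h>  // pwrite()

// Slot i holds the end offset of chunk i, or -1 until chunk i has reserved its range.
class OffsetChain {
public:
    explicit OffsetChain(std::size_t chunks) : ends_(chunks) {
        for (auto& e : ends_) e.store(-1, std::memory_order_relaxed);
    }

    // Waits for chunk idx-1 to publish its end, then publishes this chunk's end.
    // Only sizes pass through this chain; the bytes are written in parallel.
    std::int64_t reserve(std::size_t idx, std::int64_t size) {
        std::int64_t start = 0;
        if (idx > 0) {
            while ((start = ends_[idx - 1].load(std::memory_order_acquire)) < 0) {
                std::this_thread::yield();  // predecessor has not reserved yet
            }
        }
        ends_[idx].store(start + size, std::memory_order_release);
        return start;
    }

private:
    std::vector<std::atomic<std::int64_t>> ends_;
};

// Writes every chunk from its own thread; file order follows chunk index.
// (One thread per chunk keeps the sketch short; real code would use a bounded pool.)
bool writeChunksParallel(int fd, const std::vector<std::string>& chunks) {
    OffsetChain chain(chunks.size());
    std::atomic<bool> ok{true};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        workers.emplace_back([&, i] {
            const std::string& data = chunks[i];
            const off_t base = static_cast<off_t>(
                chain.reserve(i, static_cast<std::int64_t>(data.size())));
            std::size_t done = 0;
            while (done < data.size()) {
                const ssize_t n = ::pwrite(fd, data.data() + done, data.size() - done,
                                           base + static_cast<off_t>(done));
                if (n < 0 && errno == EINTR) continue;  // interrupted: retry
                if (n <= 0) { ok = false; return; }     // error or no progress
                done += static_cast<std::size_t>(n);    // handle short writes
            }
        });
    }
    for (auto& t : workers) t.join();
    return ok.load();
}
```

Because the offset is published before the write starts, a failed or slow `pwrite()` never blocks later chunks from reserving their ranges.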
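A batch manager with "timeout and size controls" usually releases a batch on whichever comes first: a size limit is reached, or the oldest buffered record has waited too long. A minimal sketch, assuming C++17. `BatchBuffer` is a hypothetical class, not the PR's `flight_batch_manager` API, and it uses fixed thresholds; the adaptive tuning mentioned in the commit message is not modeled. Time is passed in explicitly so the logic can be tested without sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Releases a batch when it reaches maxItems or maxBytes, or when its oldest
// record has waited at least `timeout`.
class BatchBuffer {
public:
    using Clock = std::chrono::steady_clock;
    using Batch = std::vector<std::string>;

    BatchBuffer(std::size_t maxItems, std::size_t maxBytes, Clock::duration timeout)
        : maxItems_(maxItems), maxBytes_(maxBytes), timeout_(timeout) {}

    // Adds one record; returns the batch if this record filled it.
    std::optional<Batch> add(std::string record, Clock::time_point now) {
        if (items_.empty()) oldest_ = now;  // the timeout counts from the first record
        bytes_ += record.size();
        items_.push_back(std::move(record));
        if (items_.size() >= maxItems_ || bytes_ >= maxBytes_) return take();
        return std::nullopt;
    }

    // Called periodically (e.g. from a consumer loop); releases a partial batch
    // whose oldest record has waited at least `timeout`.
    std::optional<Batch> poll(Clock::time_point now) {
        if (!items_.empty() && now - oldest_ >= timeout_) return take();
        return std::nullopt;
    }

private:
    Batch take() {
        Batch out;
        out.swap(items_);  // hand over the records and leave an empty buffer
        bytes_ = 0;
        return out;
    }

    std::size_t maxItems_;
    std::size_t maxBytes_;
    Clock::duration timeout_;
    Batch items_;
    std::size_t bytes_ = 0;
    Clock::time_point oldest_{};
};
```

In real use `now` would be `BatchBuffer::Clock::now()`. The size limits keep batches large under heavy input, while the timeout bounds latency when input slows down, for example near the end of a stdin stream.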
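Bounded mismatch counting pays off because overlap detection for paired-end reads tries many candidate offsets and rejects most of them: once the mismatch count exceeds the allowed limit, the remaining bases cannot change the verdict. A minimal sketch, assuming C++17 and that R2 has already been reverse-complemented. `countMismatchesBounded` and `findOverlapOffset` are hypothetical names, not the PR's overlap-analysis code, and the scan only tries non-negative offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts mismatches between two windows but stops as soon as the count exceeds
// `limit`; any return value greater than `limit` simply means "rejected".
int countMismatchesBounded(std::string_view a, std::string_view b, int limit) {
    const std::size_t n = std::min(a.size(), b.size());
    int mismatches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i] && ++mismatches > limit) return mismatches;  // early exit
    }
    return mismatches;
}

// Slides the reverse-complemented R2 along R1 and returns the first offset whose
// overlap (at least minOverlap bases) has at most `limit` mismatches, or -1.
int findOverlapOffset(std::string_view r1, std::string_view r2rc,
                      std::size_t minOverlap, int limit) {
    for (std::size_t offset = 0; offset + minOverlap <= r1.size(); ++offset) {
        const std::size_t len = std::min(r1.size() - offset, r2rc.size());
        if (len < minOverlap) break;  // R2 is shorter than the required overlap
        if (countMismatchesBounded(r1.substr(offset, len), r2rc.substr(0, len), limit) <= limit) {
            return static_cast<int>(offset);
        }
    }
    return -1;
}
```

For R1 `AAAACGTACGT` and reverse-complemented R2 `ACGTACGTTT` with a limit of 1, offsets 0 to 2 are each abandoned after their second mismatch, and offset 3 matches exactly.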
7769cc7 to 0f29e8b
Summary
- … `pwrite()` using a lock-free offset ring buffer, eliminating the writer thread as a serialization point
- … `-static` linking that caused ISA-L relocation overflow on ARM64
- … (`scripts/bench_e2e.py`): PE/SE/stdin-stdout modes, hardware profiling, and dual-mode (baseline vs. optimized) comparison with JSON merge capability

Key Performance Improvements
Changed Files
- `peprocessor.cpp/h`, `seprocessor.cpp/h`, `writerthread.cpp/h`, `writer.cpp/h`
- `isal_compress.h`, `flight_batch_manager.cpp/h`, `trace_profiler.cpp/h`
- `Makefile`, `.github/workflows/ci.yml`
- `scripts/bench_e2e.py` (replaces `bench_e2e.sh`)
- `docs/rfc/`, 2 benchmark results in `docs/bench/`

Test Plan
`python scripts/bench_e2e.py ./fastp_baseline ./fastp_opt`

🤖 Generated with Claude Code