A comprehensive collection of bioinformatics tools and scripts for sequence analysis, quality control, and data processing.
Requirements: Python 3.13+, Biopython 1.86+
-
fastx.py - FASTA/FASTQ file operations at the sequence level
- Degenerate base counting
- Sequence validation and filtering
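A minimal sketch of degenerate base counting and validation, assuming "degenerate" means any IUPAC nucleotide code other than A, C, G and T. The names count_degenerate, is_valid_dna and filter_records are hypothetical and are not the fastx.py API.

```python
# Hypothetical helpers illustrating the idea; not the actual fastx.py API.
IUPAC_DEGENERATE = frozenset("RYSWKMBDHVN")  # every IUPAC code except A, C, G, T
VALID_DNA = frozenset("ACGT") | IUPAC_DEGENERATE


def count_degenerate(seq: str) -> int:
    """Count IUPAC degenerate (ambiguous) bases in a DNA sequence."""
    return sum(1 for base in seq.upper() if base in IUPAC_DEGENERATE)


def is_valid_dna(seq: str) -> bool:
    """Return True if every character is an IUPAC nucleotide code (gaps not allowed)."""
    return all(base in VALID_DNA for base in seq.upper())


def filter_records(records, max_degenerate=0):
    """Yield (name, seq) pairs that are valid DNA with at most max_degenerate ambiguous bases."""
    for name, seq in records:
        if is_valid_dna(seq) and count_degenerate(seq) <= max_degenerate:
            yield name, seq
```

For example, count_degenerate("ACGTNNRY") counts the two N, one R and one Y.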
-
readsets.py - Read pair operations (R1/R2 aware)
- Complexity filtering
- Paired-end sequence analysis
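The complexity metric used by readsets.py is not documented here, so the sketch below substitutes Shannon entropy of base composition as a stand-in. It shows the R1/R2-aware part: a pair is kept only when both mates pass. The function names and the default threshold are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of base composition: 0 for a homopolymer, at most 2 for ACGT."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq.upper())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def pair_passes(r1: str, r2: str, min_entropy: float = 1.0) -> bool:
    """Keep a read pair only if BOTH mates meet the entropy threshold (assumed rule)."""
    return shannon_entropy(r1) >= min_entropy and shannon_entropy(r2) >= min_entropy
```

Filtering pairs jointly keeps R1 and R2 files in sync, which downstream paired-end tools expect.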
-
sambams.py - SAM/BAM file manipulation
- CIGAR string filtering
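CIGAR filtering starts with parsing the string into (length, operation) pairs using the operation codes from the SAM specification. This sketch filters on total clipped bases purely as an example criterion; the actual filters in sambams.py may differ, and parse_cigar, clipped_bases and passes_clip_filter are hypothetical names.

```python
import re

# Operation codes defined by the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '10S90M' into [(10, 'S'), (90, 'M')]."""
    if cigar == "*":
        return []  # CIGAR unavailable, e.g. for an unmapped read
    tokens = CIGAR_RE.findall(cigar)
    # Reject strings containing characters the regex did not consume.
    if "".join(n + op for n, op in tokens) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in tokens]


def clipped_bases(cigar: str) -> int:
    """Total soft-clipped (S) plus hard-clipped (H) bases."""
    return sum(n for n, op in parse_cigar(cigar) if op in ("S", "H"))


def passes_clip_filter(cigar: str, max_clip: int = 5) -> bool:
    """Example criterion: keep alignments with at most max_clip clipped bases."""
    return clipped_bases(cigar) <= max_clip
```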
-
fastx_utils.py - Utility functions
- Sequence complement/reverse complement (DNA, RNA, IUPAC)
- FASTA/FASTQ parsing helpers
- File collection utilities
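Complementing IUPAC codes follows fixed pairs (R/Y, K/M, B/V, D/H, with S, W and N mapping to themselves), so a translation table is enough. This is a sketch of the technique, not the fastx_utils.py implementation; complement and reverse_complement are illustrative names.

```python
def _table(src: str, dst: str) -> dict[int, int]:
    """Translation table covering both upper- and lowercase codes."""
    return str.maketrans(src + src.lower(), dst + dst.lower())


# A<->T (or U), C<->G, R<->Y, K<->M, B<->V, D<->H; S, W, N are self-complementary.
DNA_COMP = _table("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")
RNA_COMP = _table("ACGURYSWKMBDHVN", "UGCAYRSWMKVHDBN")


def complement(seq: str, rna: bool = False) -> str:
    """Complement each base; characters outside the table (e.g. '-') are left unchanged."""
    return seq.translate(RNA_COMP if rna else DNA_COMP)


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Reverse complement of a DNA (default) or RNA sequence."""
    return complement(seq, rna)[::-1]
```

For example, reverse_complement("ATGC") gives "GCAT", and reverse_complement("AUGC", rna=True) gives "GCAU".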
-
macguffin_classes.py - Core classes
- Primer - Genomic primer representation (chromosome, position, coordinates)
- RunCollection - Pipeline run organization (samples, read sets)
- RunSet - Paired-end read set management
-
configs.py - Configuration settings
-
tools/blastools.py - BLAST result manipulation
- Self-hit removal from BLAST reports
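A minimal sketch of self-hit removal, assuming BLAST tabular output (-outfmt 6 or 7) with the default column order, where qseqid and sseqid are the first two columns. remove_self_hits is a hypothetical name, not the blastools.py API.

```python
def remove_self_hits(lines):
    """Yield BLAST tabular lines whose query ID differs from the subject ID."""
    for line in lines:
        if line.startswith("#"):
            yield line  # outfmt 7 comment lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # Default column order: qseqid, sseqid, pident, ...
        if len(fields) >= 2 and fields[0] != fields[1]:
            yield line
```

With outfmt 7, the "# N hits found" comment will no longer match the remaining row count after filtering.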
-
tools/ref_filler.py - Reference sequence expansion
- Fills alignment gaps by extracting variants from aligned sequences
- Generates subsequence FASTA files for missing regions
-
fasta/pairwise_align.py - Pairwise sequence alignment
- Aligns sequence pairs using Biopython
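To show what a global pairwise alignment score means, here is a dependency-free Needleman-Wunsch sketch with a linear gap penalty. It computes the score only, with no traceback, and is a stand-in for illustration rather than the Biopython code that pairwise_align.py uses. The scoring values are hypothetical defaults.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score (linear gaps), using two DP rows."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against prefixes of b
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: (mis)match, gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

For comparison, Biopython's Bio.Align.PairwiseAligner configured with mode "global", match_score 1, mismatch_score -1 and gap_score -1 should report the same value from its score() method.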
-
tools/ann_to_bed.py - Format conversion
- Converts UCSC RepMask annotation format to BED format
-
clstr_splitter.py - CD-HIT cluster parser
- Splits CD-HIT cluster output into individual cluster files
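A CD-HIT .clstr file lists each cluster under a ">Cluster N" line, followed by member lines such as "0	120nt, >seqA... *". The sketch below parses that layout and writes one ID list per cluster. The output naming (cluster_N.txt) and the function names are hypothetical, not what clstr_splitter.py produces.

```python
import re
from pathlib import Path

MEMBER_RE = re.compile(r">(.+?)\.\.\.")  # sequence ID sits between '>' and '...'


def parse_clstr(lines) -> dict[int, list[str]]:
    """Map CD-HIT cluster number to its member sequence IDs, in file order."""
    clusters: dict[int, list[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
        elif line and current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].append(match.group(1))
    return clusters


def write_clusters(clusters: dict[int, list[str]], outdir) -> None:
    """Write each cluster's member IDs to its own file (hypothetical naming)."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    for num, members in clusters.items():
        (outdir / f"cluster_{num}.txt").write_text("\n".join(members) + "\n")
```

The representative sequence is the member line ending in "*"; other members end with their identity to it, such as "at +/99.15%".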
-
dircutadapt.py - Cutadapt wrapper
- Batch adapter trimming for FASTQ files in a directory
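A sketch of batch trimming, assuming single-end files and a single 3' adapter, passed with cutadapt's -a option, with output written via -o. The helper names, the suffix list and the trimmed_ output prefix are hypothetical; paired-end trimming would also need -A and -p.

```python
from pathlib import Path

FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def is_fastq(path) -> bool:
    """True if the file name ends with a common (optionally gzipped) FASTQ suffix."""
    return Path(path).name.endswith(FASTQ_SUFFIXES)


def cutadapt_command(fastq, outdir, adapter: str) -> list[str]:
    """argv for trimming one single-end FASTQ file with a 3' adapter."""
    fastq = Path(fastq)
    out = Path(outdir) / f"trimmed_{fastq.name}"  # hypothetical naming scheme
    return ["cutadapt", "-a", adapter, "-o", str(out), str(fastq)]


def commands_for_directory(indir, outdir, adapter: str) -> list[list[str]]:
    """One cutadapt command per FASTQ file directly inside indir, in sorted order."""
    return [cutadapt_command(p, outdir, adapter)
            for p in sorted(Path(indir).iterdir())
            if p.is_file() and is_fastq(p)]
```

Each command can then be run with subprocess.run(cmd, check=True). Building argv lists instead of shell strings avoids quoting problems with unusual file names.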
-
subsample.py - Subsampling tool
- Generates subsampled FASTQ files using seqtk
- Optional bulk processing from a TSV file
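seqtk subsamples with "seqtk sample -s<seed> <in.fq> <n>" and writes to stdout. n may be a read count or a fraction. This sketch wraps that call and reads jobs from a TSV. The three-column TSV layout (input, output, n) and all function names are assumptions, not subsample.py's actual format.

```python
import csv
import subprocess


def seqtk_sample_cmd(fastq, n, seed: int = 100) -> list[str]:
    """argv for `seqtk sample`; n is a read count (e.g. 10000) or a fraction (e.g. 0.1)."""
    return ["seqtk", "sample", f"-s{seed}", str(fastq), str(n)]


def read_jobs(tsv_lines) -> list[list[str]]:
    """Parse hypothetical TSV rows: input_fastq <TAB> output_fastq <TAB> n."""
    return [row for row in csv.reader(tsv_lines, delimiter="\t")
            if row and not row[0].startswith("#")]


def run_job(fastq, out, n, seed: int = 100) -> None:
    """Run one subsampling job; seqtk prints to stdout, so redirect it to the output file."""
    with open(out, "w") as fh:
        subprocess.run(seqtk_sample_cmd(fastq, n, seed), stdout=fh, check=True)
```

Using the same seed for R1 and R2 keeps the sampled mates paired.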
-
fastq/extract_index_files.py - Index extraction
- Extracts index sequences from paired-end FASTQ (generates I1/I2 from R1/R2)
- Supports gzipped input
-
download/fasta_subseq_dl.py - Sequence download
- Downloads FASTA subsequences from NCBI using accession IDs and coordinates
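NCBI's E-utilities efetch endpoint can return a subsequence directly through its seq_start, seq_stop (1-based, inclusive) and strand (1 = plus, 2 = minus) parameters. This is a sketch of that approach, not necessarily how fasta_subseq_dl.py queries NCBI. efetch_url and fetch_fasta are hypothetical names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, start: int, stop: int, strand: int = 1) -> str:
    """Build an efetch URL for a 1-based, inclusive nucleotide subsequence in FASTA."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        "strand": strand,  # 1 = plus strand, 2 = minus strand
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accession: str, start: int, stop: int, strand: int = 1) -> str:
    """Download the subsequence (requires network access)."""
    with urlopen(efetch_url(accession, start, stop, strand), timeout=30) as resp:
        return resp.read().decode()
```

Without an API key, NCBI limits clients to 3 requests per second, so bulk downloads should be throttled.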
-
assembly.py - De Bruijn-like sequence assembly
- Graph-based read assembly using greedy overlap matching
- Uses depth-first search (DFS) for path finding
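A minimal de Bruijn graph sketch: distinct k-mers become edges between their (k-1)-mer prefix and suffix, and a path is walked with Hierholzer's algorithm, a DFS-based method for finding an Eulerian path. This is a standard textbook approach swapped in for illustration. It does not implement assembly.py's greedy overlap matching, and it ignores sequencing errors, coverage and repeats longer than k-1.

```python
from collections import defaultdict


def build_graph(reads, k: int) -> dict[str, list[str]]:
    """De Bruijn graph: one edge prefix -> suffix per distinct k-mer."""
    kmers = sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm (iterative DFS) over a copy of the adjacency lists."""
    in_deg: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for t in targets:
            in_deg[t] += 1
    # Start where out-degree exceeds in-degree by one, else at any node.
    start = next((v for v in graph if len(graph[v]) - in_deg[v] == 1), next(iter(graph)))
    adj = {v: list(ts) for v, ts in graph.items()}
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if adj.get(v):
            stack.append(adj[v].pop())  # follow an unused edge deeper
        else:
            path.append(stack.pop())    # dead end: record node while backtracking
    return path[::-1]


def assemble(reads, k: int) -> str:
    """Spell the contig: first node, then the last base of each following node."""
    path = eulerian_path(build_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, assemble(["ACGTT", "GTTGA"], 3) reconstructs "ACGTTGA" from the two overlapping reads.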
The archive/ directory contains deprecated/retired tools:
- fadiff.py - FASTA difference (superseded by standard tools)
- fauniq.py - FASTA unique (superseded by standard tools)
- fqfilter.py - FASTQ filtering
- cigar_filter.py - CIGAR filtering (see sambams.py)
- map_accessiontaxid.py - Accession to TaxID mapping
- taxid_annotate.py - TaxID annotation
- snp_primer_validate.py - SNP primer validation
- read_pair_merger.py - Paired-end merging
- tabtodb.py - TAB to database conversion
- windowshopper.py - Window-based sequence analysis
- rpm_prep - RPM packaging preparation