Caution
GenoType is coming together but is not ready for the big time yet. In
particular, its BAM parser and compression handling are under-tested and may
have correctness issues. FASTQ and FASTA parsing, along with the SeqOps
sequence-operations API, are comparatively more stable and pass internal
tests. That said, breaking changes should be expected: this is
pre-alpha software.
GenoType's goal is to fill a gap in the TypeScript ecosystem by providing a fully type-safe, performant, idiomatic library for parsing and processing genomic data in any of the major bioinformatic data formats. It's built with an obsession with developer experience and is meant to let users compose their own pipelines of sequence transformations in a fluent DSL. For example, the following pipeline, which mirrors a Unix pipeline of operations from the excellent SeqKit command line interface, can be composed in TypeScript like so:
import { seqops } from "genotype";

const results = await seqops(genomeSequences)
  .grep({ pattern: /^chr\d+/, target: "id" }) // Find chromosome sequences
  .filter({ minLength: 100, maxGC: 60 }) // Quality filtering
  .sample({ n: 1000, strategy: "reservoir" }) // Statistical sampling
  .sort({ by: "length", order: "desc" }) // Compression-optimized sorting
  .rmdup({ by: "sequence", caseSensitive: false }) // Remove duplicates
  .writeFasta("analyzed_genome.fasta");

A majority of features provided in SeqKit will also be provided by GenoType, plus some extra goodies that make sense in the context of a TypeScript library (e.g., K-mer set algebra or higher-order function combinators).
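To make "K-mer set algebra" concrete, here is a minimal sketch in plain TypeScript. It is not GenoType's API: the helper names (`kmerSet`, `union`, `intersection`, `difference`, `jaccard`) are hypothetical and exist only to illustrate the idea of treating the distinct k-mers of a sequence as a set and combining those sets.

```typescript
// Hypothetical helpers for illustration only; none of these are GenoType exports.

/** Collect the distinct k-mers of a sequence, ignoring case. */
function kmerSet(sequence: string, k: number): Set<string> {
  if (!Number.isInteger(k) || k < 1) {
    throw new RangeError("k must be a positive integer");
  }
  const seq = sequence.toUpperCase();
  const kmers = new Set<string>();
  for (let i = 0; i + k <= seq.length; i++) {
    kmers.add(seq.slice(i, i + k));
  }
  return kmers;
}

/** Elements present in either set. */
function union<T>(a: Set<T>, b: Set<T>): Set<T> {
  const out = new Set<T>();
  a.forEach((x) => out.add(x));
  b.forEach((x) => out.add(x));
  return out;
}

/** Elements present in both sets. */
function intersection<T>(a: Set<T>, b: Set<T>): Set<T> {
  const out = new Set<T>();
  a.forEach((x) => {
    if (b.has(x)) out.add(x);
  });
  return out;
}

/** Elements present in `a` but not in `b`. */
function difference<T>(a: Set<T>, b: Set<T>): Set<T> {
  const out = new Set<T>();
  a.forEach((x) => {
    if (!b.has(x)) out.add(x);
  });
  return out;
}

/** Jaccard similarity: |A ∩ B| / |A ∪ B| (defined as 1 for two empty sets). */
function jaccard<T>(a: Set<T>, b: Set<T>): number {
  const unionSize = union(a, b).size;
  return unionSize === 0 ? 1 : intersection(a, b).size / unionSize;
}

const left = kmerSet("ACGTACGT", 3);
const right = kmerSet("CGTACGTT", 3);

console.log(intersection(left, right).size); // 4 shared 3-mers
console.log(Array.from(difference(right, left))); // [ "GTT" ]
console.log(jaccard(left, right)); // 0.8
```

Plain `Set<string>` keeps the sketch short; a real implementation working on whole genomes would more likely pack k-mers into integers or use a probabilistic structure to bound memory.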
Also on the GenoType roadmap are Rust implementations that bring SIMD acceleration and multi-core parallelism to particular operations. These optimizations, together with the overall speed of modern JavaScript runtimes, will make GenoType's performance competitive with if not better than SeqKit, which is implemented in Go. Additionally, Rust implementations can be compiled to native extensions or WebAssembly modules, which opens the possibility for GenoType to be run on the server, in the browser, or anywhere in between.
GenoType uses Effect and specifically Effect Platform as its internal standard library for all file I/O and platform-specific operations. This architectural choice makes GenoType runtime-agnostic, running seamlessly on Node.js, Bun, and Deno without any runtime-specific code paths. Effect Platform provides a unified abstraction layer that handles platform differences internally, eliminating the need for manual runtime detection and conditional logic throughout the codebase. This approach also positions GenoType for potential browser compatibility in the future, as Effect Platform can target browser APIs through the same unified interface.
GenoType was designed with scripting in mind. As of 2025, the story for
scripting in JavaScript and TypeScript is exceptionally good, a fact that is
underappreciated in data science and bioinformatics. Runtimes like
Bun and Deno can run standalone
scripts and handle inline dependencies by default, much as the Python package manager
uv does for Python scripts, with no
need for virtual environments or dependency hell. Unlike Python scripts, Bun
and Deno scripts also benefit from years of JavaScript runtime optimization.
Most remarkably, Bun and Deno scripts don't even require the runtime itself
to be installed on the machine where they run. Both runtimes can compile scripts
into portable standalone executables
for most popular OS/CPU architecture targets, with ease of cross-compilation to
rival Go or Zig.
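As a rough illustration of that scripting workflow, the sketch below is a complete standalone script in plain TypeScript. It deliberately avoids GenoType's API, which is still changing, and the file name `gc.ts` and the helpers `parseFasta` and `gcPercent` are hypothetical. A file like this can be run directly with `bun run gc.ts` or `deno run gc.ts`, and turned into a standalone executable with `bun build --compile` or `deno compile`.

```typescript
// gc.ts: a dependency-free sketch of a standalone script (hypothetical file name).
// Reports the GC percentage of each record in a small inline FASTA string.

interface FastaRecord {
  id: string;
  sequence: string;
}

/** Parse FASTA text into records; the id is the header up to the first whitespace. */
function parseFasta(text: string): FastaRecord[] {
  const records: FastaRecord[] = [];
  const lines = text.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue;
    if (line.charAt(0) === ">") {
      records.push({ id: line.slice(1).split(/\s+/)[0], sequence: "" });
    } else if (records.length > 0) {
      // Sequence lines may wrap, so append to the most recent record.
      records[records.length - 1].sequence += line;
    }
  }
  return records;
}

/** Percentage of G and C bases in a sequence (0 for an empty sequence). */
function gcPercent(sequence: string): number {
  if (sequence.length === 0) return 0;
  const upper = sequence.toUpperCase();
  let gc = 0;
  for (let i = 0; i < upper.length; i++) {
    const base = upper.charAt(i);
    if (base === "G" || base === "C") gc++;
  }
  return (100 * gc) / upper.length;
}

const exampleFasta = ">seq1 example record\nACGT\nGGCC\n>seq2\nATAT\n";

parseFasta(exampleFasta).forEach((record) => {
  console.log(`${record.id}\t${gcPercent(record.sequence).toFixed(1)}%`);
});
```

The input is inlined so the sketch stays runtime-neutral; a real script would read a file or standard input through whichever I/O layer the project uses.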
Be it for bioinformatics in the browser, portable scripting, or backend data crunching, GenoType's goal is to enable users to write intricate, high-throughput pipelines of genomic data transformations and run them anywhere. Providing a composable, type-safe DSL for building portable, dependency-free bioinformatic processing programs is what GenoType is all about.
Warning
GenoType is not available on npm yet, so the following will not work. Instead,
if you're using Bun, you can add it from source (again, at your own risk) into
your own project with bun add github:nrminor/genotype.
bun add @nrminor/genotype

Problem: You've received Illumina sequencing data from a collaborator. As always, it needs quality control before analysis.
import { seqops } from "genotype";
import { FastqParser } from "genotype/formats";

// Parse FASTQ with quality encoding specification (or automatic detection)
const parser = new FastqParser({
  qualityEncoding: "phred33", // Specify encoding, or omit for automatic detection
  parseQualityScores: true, // Enable quality score parsing for QC
});
const reads = parser.parseFile("SRR12345678.fastq.gz");

// Build a comprehensive QC pipeline
const qcStats = await seqops(reads)
  // Step 1: Quality filtering
  .quality({
    minScore: 20, // Phred score threshold
    trim: true, // Enable quality trimming
    trimThreshold: 20, // Sliding window quality
    trimWindow: 4, // Window size
  })
  // Step 2: Length filtering (post-trimming)
  .filter({
    minLength: 35, // Minimum read length
    maxLength: 151, // Remove anomalously long reads
  })
  // Step 3: Contamination screening
  .filter({
    pattern: /^[ACGTN]+$/, // Valid bases only
    hasAmbiguous: false, // No ambiguous bases
  })
  // Step 4: Calculate statistics
  .stats({ detailed: true });

console.log(`
QC Report:
- Input reads: ${qcStats.totalSequences}
- Passed QC: ${qcStats.passedSequences} (${qcStats.passRate}%)
- Mean quality: ${qcStats.meanQuality}
- N50: ${qcStats.n50}
- GC content: ${qcStats.gcContent}%
`);

Problem: You need to extract specific amplicon regions from sequencing data using PCR primers, with support for long reads and biological validation.
import { primer, seqops } from "genotype";

// Define primers with validation and IUPAC support
const forwardPrimer = primer`TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG`; // Nextera adapter
const reversePrimer = primer`GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG`; // Nextera adapter

// Simple amplicon extraction (90% use case)
const basicAmplicons = await seqops(reads)
  .amplicon(forwardPrimer, reversePrimer) // Extract between primers
  .filter({ minLength: 100, maxLength: 500 }) // Quality filtering
  .writeFasta("extracted_amplicons.fasta");

// Advanced workflow with performance optimization for long reads
const optimizedAmplicons = await seqops(nanoporeReads)
  .quality({ minScore: 10 }) // Nanopore quality threshold
  .amplicon(forwardPrimer, reversePrimer, {
    maxMismatches: 2, // Allow sequencing errors
    searchWindow: { forward: 200, reverse: 200 }, // 🔥 100x+ speedup for long reads
    flanking: false, // Inner region only (exclude primers)
    outputMismatches: true, // Include debugging info
  })
  .filter({ minLength: 200, maxLength: 800 }) // Target region length
  .rmdup({ by: "sequence" }) // Remove PCR duplicates
  .validate({ mode: "strict" }) // Biological validation
  .writeFasta("validated_amplicons.fasta");

// Real-world COVID-19 diagnostic example
import { FastqParser } from "genotype/formats";

const covidResults = await seqops(
  new FastqParser().parseFile("covid_samples.fastq.gz"),
)
  .quality({ minScore: 20, trim: true })
  .amplicon(
    primer`ACCAGGAACTAATCAGACAAG`, // N gene forward
    primer`CAAAGACCAATCCTACCATGAG`, // N gene reverse
    3, // Allow for sequencing errors
  )
  .validate({ mode: "strict" })
  .stats({ detailed: true });

console.log(`Found ${covidResults.count} COVID amplicons`);

Problem: You need to design guide RNAs for CRISPR, filtering for optimal GC content and checking for off-target sites.
import { seqops } from "genotype";

// extractKmers, calculateMFE, and searchGenome are assumed to be defined
// elsewhere; they are not shown in this example.
const potentialGuides = await seqops(sequences)
  // Extract all possible 20bp guide sequences
  .transform({
    custom: (seq) => extractKmers(seq, 20),
  })
  // Filter for optimal guide characteristics
  .filter({
    minGC: 40, // Minimum 40% GC
    maxGC: 60, // Maximum 60% GC
    pattern: /GG$/, // Must end with PAM-adjacent GG
  })
  // Remove guides with problematic sequences
  .filter({
    custom: (guide) => {
      // No poly-T (terminates RNA pol III)
      if (/TTTT/.test(guide.sequence)) return false;
      // No extreme secondary structure
      const mfe = calculateMFE(guide.sequence);
      if (mfe < -10) return false;
      return true;
    },
  })
  // Check for off-targets in genome
  .filter({
    custom: async (guide) => {
      // Options are passed as an object; `name: value` arguments are not valid TypeScript
      const offTargets = await searchGenome(guide.sequence, { maxMismatches: 3 });
      return offTargets.length === 1; // Only one perfect match
    },
  })
  .collect();

console.log(`Found ${potentialGuides.length} suitable guide RNAs`);

Problem: RNA-seq data needs preprocessing before differential expression analysis.
import { seqops } from "genotype";

// rnaseqReads, rRNABloomFilter, and calculateComplexity are assumed to be
// defined elsewhere; they are not shown in this example.
const processedReads = await seqops(rnaseqReads)
  // Remove adapter sequences
  .clean({
    removeAdapters: true,
    adapters: ["AGATCGGAAGAGC", "AATGATACGGCGAC"],
  })
  // Filter rRNA contamination (using bloom filter for speed)
  .filter({
    custom: (seq) => !rRNABloomFilter.contains(seq.sequence),
  })
  // Remove low-complexity sequences
  .filter({
    custom: (seq) => calculateComplexity(seq.sequence) > 0.5,
  })
  // Deduplicate while preserving read counts
  .deduplicate({
    by: "sequence",
    keepCounts: true,
  })
  // Convert to format for aligner
  .transform({ upperCase: true })
  .writeFastq("processed_rnaseq.fastq");

Problem: Validate assembled viral genomes before submission to GenBank.
import { seqops } from "genotype";

// assemblies, findORFs, and metadata are assumed to be defined elsewhere;
// they are not shown in this example.
const validationReport = await seqops(assemblies)
  // Check genome completeness
  .filter({
    minLength: 29000, // SARS-CoV-2 minimum
    maxLength: 30000, // SARS-CoV-2 maximum
  })
  // Validate sequence content
  .validate({
    mode: "strict",
    allowAmbiguous: true, // Some Ns acceptable
    maxAmbiguous: 100, // But not too many
    action: "reject", // Reject invalid sequences
  })
  // Check for frameshifts in coding regions
  .validate({
    custom: async (seq) => {
      const orfs = await findORFs(seq.sequence);
      return orfs.every((orf) => orf.length % 3 === 0);
    },
  })
  // Add metadata for submission
  .annotate({
    organism: "Severe acute respiratory syndrome coronavirus 2",
    molType: "genomic RNA",
    isolate: metadata.isolate,
    country: metadata.country,
    collectionDate: metadata.date,
  })
  .stats({ detailed: true });

if (validationReport.passedSequences === validationReport.totalSequences) {
  console.log("✅ All genomes passed validation");
} else {
  console.log(
    `⚠️ ${validationReport.failedSequences} genomes failed validation`,
  );
}
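
To make the length and ambiguity thresholds in the validation example concrete, here is one plausible reading of them in plain TypeScript. This is a sketch under stated assumptions, not GenoType's implementation: `countAmbiguous` and `passesBasicChecks` are hypothetical names, and "ambiguous" is assumed to mean any character other than A, C, G, T, or U.

```typescript
// Hypothetical helpers for illustration only; not GenoType exports.

interface BasicCheckOptions {
  minLength: number; // shortest acceptable genome, in bases
  maxLength: number; // longest acceptable genome, in bases
  maxAmbiguous: number; // most non-ACGTU characters tolerated
}

/** Count characters that are not an unambiguous base (assumption: A, C, G, T, U). */
function countAmbiguous(sequence: string): number {
  const matches = sequence.match(/[^ACGTU]/gi);
  return matches === null ? 0 : matches.length;
}

/** True when the sequence is within the length range and ambiguity budget. */
function passesBasicChecks(sequence: string, options: BasicCheckOptions): boolean {
  if (sequence.length < options.minLength) return false;
  if (sequence.length > options.maxLength) return false;
  return countAmbiguous(sequence) <= options.maxAmbiguous;
}

// Toy thresholds so the example is easy to follow by eye.
const toyOptions: BasicCheckOptions = { minLength: 4, maxLength: 10, maxAmbiguous: 1 };

console.log(countAmbiguous("ACGTNNRY")); // 4
console.log(passesBasicChecks("ACGTN", toyOptions)); // true
console.log(passesBasicChecks("ACNNGT", toyOptions)); // false (two ambiguous bases)
```

With the thresholds from the example (29000, 30000, and 100), the same function would reject a truncated assembly or one dominated by Ns before any ORF analysis runs.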