name: bio-genome-assembly-long-read-assembly
description: De novo genome assembly from Oxford Nanopore or PacBio long reads using Flye and Canu. Produces highly contiguous assemblies suitable for complete bacterial genomes and resolving complex regions. Use when assembling genomes from ONT or PacBio reads.
tool_type: cli
primary_tool: Flye
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command
Assemble genomes from Oxford Nanopore (ONT) or PacBio long reads for highly contiguous assemblies.
| Tool | Speed | Memory | Best For |
|------|-------|--------|----------|
| Flye | Fast | Moderate | General purpose, bacteria, ONT |
| Canu | Slow | High | High accuracy, complex genomes |
| Wtdbg2 | Very fast | Low | Draft assemblies |
Note: For PacBio HiFi data, see the dedicated hifi-assembly skill which covers hifiasm.
conda install -c bioconda flye
# Oxford Nanopore
flye --nano-raw reads.fastq.gz --out-dir flye_output --threads 16
# PacBio CLR
flye --pacbio-raw reads.fastq.gz --out-dir flye_output --threads 16
# PacBio HiFi
flye --pacbio-hifi reads.fastq.gz --out-dir flye_output --threads 16
| Option | Read Type |
|--------|-----------|
| `--nano-raw` | ONT regular reads |
| `--nano-corr` | ONT corrected reads |
| `--nano-hq` | ONT Q20+ reads (Guppy 5+) |
| `--pacbio-raw` | PacBio CLR |
| `--pacbio-corr` | PacBio corrected |
| `--pacbio-hifi` | PacBio HiFi/CCS |
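Picking the wrong read-type flag is an easy mistake when scripting Flye runs. The following is a minimal sketch of a lookup helper built from the table above. The function name `flye_read_flag` and the short labels (`ont-raw`, `ont-hq`, and so on) are hypothetical names chosen for illustration, not part of Flye.

```bash
#!/usr/bin/env bash
# Map a short platform label to the matching Flye read-type flag.
# The labels are hypothetical; the flags are the ones listed in the table.
flye_read_flag() {
    case "$1" in
        ont-raw)     echo "--nano-raw" ;;
        ont-corr)    echo "--nano-corr" ;;
        ont-hq)      echo "--nano-hq" ;;
        pacbio-raw)  echo "--pacbio-raw" ;;
        pacbio-corr) echo "--pacbio-corr" ;;
        pacbio-hifi) echo "--pacbio-hifi" ;;
        *)
            echo "unknown read type: $1" >&2
            return 1
            ;;
    esac
}

# Example use (commented out so the sketch runs without Flye installed):
# flye "$(flye_read_flag ont-hq)" reads.fastq.gz --out-dir flye_output --threads 16
```

An unknown label returns a non-zero status, so a calling script that uses `set -e` stops before launching an assembly with the wrong mode.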
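| Option | Description |
|--------|-------------|
| `--out-dir` | Output directory |
| `--threads` | Number of threads |
| `--genome-size` | Estimated genome size (e.g., 5m, 100m) |
| `--iterations` | Polishing iterations (default: 1) |
| `--meta` | Metagenome mode |
| `--plasmids` | Recover plasmids (option from older releases; check `flye --help` for your version) |
| `--keep-haplotypes` | Don't collapse haplotypes |
| `--scaffold` | Enable scaffolding |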
# Genome size is optional in Flye 2.8 and later; pass an estimate when you have one
flye --nano-raw reads.fq.gz --out-dir output --genome-size 5m
# Size formats: 1000, 1k, 1m, 1g
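When the same size estimate feeds other calculations (for example a coverage check), it helps to convert the suffixed form to a plain base count. The sketch below assumes decimal multipliers (k = 10^3, m = 10^6, g = 10^9); `size_to_bases` is a hypothetical helper name, not a Flye command.

```bash
#!/usr/bin/env bash
# Convert a genome-size string such as 1000, 1k, 5m or 2.6g to a base count.
# Assumption: suffixes are decimal multipliers (k=1e3, m=1e6, g=1e9).
size_to_bases() {
    awk -v s="$1" 'BEGIN {
        mult = 1
        unit = tolower(substr(s, length(s), 1))
        if (unit == "k")      mult = 1000
        else if (unit == "m") mult = 1000000
        else if (unit == "g") mult = 1000000000
        # Strip the suffix when one was recognized
        if (mult > 1) s = substr(s, 1, length(s) - 1)
        # Reject anything that is not a plain non-negative number
        if (s !~ /^[0-9]+(\.[0-9]+)?$/) exit 1
        printf "%.0f\n", s * mult
    }'
}

# Example use:
# size_to_bases 5m
```

Malformed input makes the function return a non-zero status instead of printing a misleading number.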
flye_output/
├── assembly.fasta # Final assembly
├── assembly_graph.gfa # Assembly graph
├── assembly_info.txt # Contig statistics
└── flye.log # Log file
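A quick look at `assembly_info.txt` often answers the first questions after a run: how many contigs there are and whether any are circular. The sketch below assumes the tab-separated layout Flye writes, with a `#`-prefixed header, the contig length in column 2 and a `Y`/`N` circularity flag in column 4. `summarize_flye_info` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Summarize a Flye assembly_info.txt file.
# Assumed columns: seq_name, length, cov., circ. (Y/N), then further fields.
summarize_flye_info() {
    awk -F '\t' '
        /^#/ { next }                        # skip the header line
        {
            n++
            total += $2
            if ($4 == "Y") circ++
            if ($2 > longest) longest = $2
        }
        END {
            printf "contigs=%d circular=%d total_bp=%.0f longest_bp=%.0f\n",
                   n, circ, total, longest
        }' "$1"
}

# Example use:
# summarize_flye_info flye_output/assembly_info.txt
```

For a bacterial isolate, a single circular contig close to the expected genome size is a good sign that the chromosome assembled completely.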
# Bacterial isolate
flye \
    --nano-raw bacteria.fastq.gz \
    --out-dir bacteria_assembly \
    --genome-size 5m \
    --threads 16

# Metagenome
flye \
    --nano-raw metagenome.fastq.gz \
    --out-dir meta_assembly \
    --meta \
    --threads 32

# Isolate with plasmid recovery
flye \
    --nano-raw isolate.fastq.gz \
    --out-dir assembly \
    --plasmids \
    --threads 16
conda install -c bioconda canu
# ONT reads
canu -p assembly -d canu_output genomeSize=5m -nanopore reads.fastq.gz
# PacBio HiFi
canu -p assembly -d canu_output genomeSize=5m -pacbio-hifi reads.fastq.gz
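| Option | Description |
|--------|-------------|
| `-p` | Assembly prefix |
| `-d` | Output directory |
| `genomeSize=` | Estimated size (required) |
| `maxThreads=` | Max threads |
| `maxMemory=` | Max memory (e.g., 64g) |
| `useGrid=false` | Disable grid execution |
| `correctedErrorRate=` | Allowed error rate in overlaps between corrected reads |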
| Option | Read Type |
|--------|-----------|
| `-nanopore` | ONT reads |
| `-nanopore-raw` | ONT raw (deprecated) |
| `-pacbio` | PacBio CLR |
| `-pacbio-hifi` | PacBio HiFi/CCS |
# Local run on a single machine (no grid), with capped threads and memory
canu -p asm -d output genomeSize=5m \
    -nanopore reads.fq.gz \
    useGrid=false \
    maxThreads=16 \
    maxMemory=32g
High-Quality Mode (PacBio HiFi)
canu -p asm -d output genomeSize=5m \
    -pacbio-hifi reads.fq.gz \
    correctedErrorRate=0.01
canu_output/
├── assembly.contigs.fasta # Contigs
├── assembly.unassembled.fasta
├── assembly.report
└── assembly.seqStore/
conda install -c bioconda wtdbg
# Assemble
wtdbg2 -x ont -g 5m -t 16 -i reads.fq.gz -o draft
# Consensus
wtpoa-cns -t 16 -i draft.ctg.lay.gz -o draft.ctg.fa
| Preset | Platform |
|--------|----------|
| `-x ont` | Oxford Nanopore |
| `-x ccs` | PacBio HiFi/CCS |
| `-x rs` | PacBio RSII |
| `-x sq` | PacBio Sequel |
#!/bin/bash
set -euo pipefail
# Usage: <script> READS OUTDIR [GENOME_SIZE]
READS=$1
OUTDIR=$2
SIZE=${3:-5m}   # genome size estimate, defaults to 5m
echo "=== ONT Bacterial Assembly ==="
# Flye assembly
flye \
    --nano-raw "$READS" \
    --out-dir "${OUTDIR}/flye" \
    --genome-size "$SIZE" \
    --threads 16
# Stats
echo "Assembly statistics:"
cat "${OUTDIR}/flye/assembly_info.txt"
echo "Assembly: ${OUTDIR}/flye/assembly.fasta"
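Because an assembly can run for a long time before failing, it may be worth validating inputs first. The following is a minimal sketch of two preflight checks that could sit near the top of such a pipeline script. `require_tool` and `require_reads` are hypothetical helper names introduced here for illustration.

```bash
#!/usr/bin/env bash
# Preflight checks to run before starting a long assembly job.

# Succeeds only if the given command is available on PATH.
require_tool() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "error: '$1' not found on PATH" >&2
        return 1
    }
}

# Succeeds only if the reads file exists and is not empty.
require_reads() {
    [ -s "$1" ] || {
        echo "error: reads file '$1' is missing or empty" >&2
        return 1
    }
}

# Example use at the top of a pipeline script:
# require_tool flye
# require_reads "$READS"
```

Under `set -e`, either check failing stops the script with a clear message rather than a mid-run error from the assembler.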
Hybrid Assembly (Long + Short)
#!/bin/bash
set -euo pipefail
# Usage: <script> LONG_READS SHORT_R1 SHORT_R2 OUTDIR
LONG=$1
SHORT_R1=$2
SHORT_R2=$3
OUTDIR=$4
# 1. Long-read assembly with Flye
flye --nano-raw "$LONG" --out-dir "${OUTDIR}/flye" --genome-size 5m --threads 16
# 2. Polish with short reads (Pilon), using $SHORT_R1 and $SHORT_R2
# See assembly-polishing skill
| Metric | Bacterial | Eukaryotic |
|--------|-----------|------------|
| Contigs | 1-10 | 100-1000+ |
| N50 | >1 Mb | Variable |
| Complete chromosomes | Often | Rare |
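N50 is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly size. As a rough sketch, it can be computed from any assembly FASTA with standard tools. `fasta_n50` is a hypothetical helper name, and dedicated QC tools report the same metric with more context.

```bash
#!/usr/bin/env bash
# Compute N50 of a FASTA file using awk and sort.
fasta_n50() {
    # Step 1: print one length per sequence (handles multi-line records)
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }
        { len += length($0) }
        END { if (len > 0) print len }
    ' "$1" |
    # Step 2: sort lengths from longest to shortest
    sort -rn |
    # Step 3: walk down the list until half the total length is covered
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { print lens[i]; exit }
            }
        }'
}

# Example use:
# fasta_n50 flye_output/assembly.fasta
```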
Low contiguity:
- Check coverage (aim for more than 30x)
- Try increasing polishing iterations in Flye (`--iterations`)
- Consider supplementing with short reads

High memory usage:
- Use Flye (more memory efficient than Canu)
- Reduce threads
- Filter reads by length/quality

Assembly errors:
- Polish with Pilon/medaka
- Validate with short reads
- Check for contamination
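The coverage check mentioned in the list above can be approximated as total read bases divided by the expected genome size. The sketch below assumes standard four-line FASTQ records (sequence on every second line of four) and a genome size given in plain bases. `estimate_coverage` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Estimate sequencing depth: total bases in a FASTQ / genome size in bases.
# Assumption: four-line FASTQ records; gzip-compressed input ends in .gz.
estimate_coverage() {
    local reads="$1" genome_bp="$2"

    # Guard against a missing, non-numeric or zero genome size
    [ "$genome_bp" -gt 0 ] 2>/dev/null || {
        echo "error: genome size must be a positive integer" >&2
        return 1
    }

    # Decompress on the fly when needed, then sum sequence line lengths
    if [[ "$reads" == *.gz ]]; then
        gzip -dc "$reads"
    else
        cat "$reads"
    fi | awk -v g="$genome_bp" '
        NR % 4 == 2 { bases += length($0) }
        END { printf "%.1f\n", bases / g }'
}

# Example use for a 5 Mb genome:
# estimate_coverage reads.fastq.gz 5000000
```

A result well below 30 suggests that more sequencing, rather than parameter tuning, is the more likely fix for a fragmented assembly.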
hifi-assembly - PacBio HiFi assembly with hifiasm
assembly-polishing - Polish long-read assemblies
assembly-qc - QUAST and BUSCO assessment
short-read-assembly - Hybrid with Illumina
long-read-sequencing - Read QC and alignment