FuFiHLA: Full Field HLA allele typing for Long Reads

FuFiHLA is a pipeline for full-field HLA allele typing and consensus sequence construction from long-read sequencing data. It currently supports PacBio HiFi data for six clinically important transplant genes: HLA-A, -B, -C, -DQA1, -DQB1 and -DRB1.

Highlights

Reference-free: does not depend on a specific reference genome version such as GRCh38 or CHM13
Improved consensus accuracy compared with StarPhase

Citation: TBD

Installation

Install from Bioconda (recommended):

conda install -c bioconda -c conda-forge fufihla

Quick Test

From inside the test folder, which contains test.fa.gz, run:

fufihla --fa test.fa.gz --out test_dir

The output includes:

test_dir/ → pipeline logs
test_dir.out → result output
test_dir.err → stderr log

Usage

To use the latest reference allele sequences from IMGT, run:

fufihla-ref-prep

This creates a directory named ref_data containing the reference allele sequences (ref.gene.fa.gz). This directory can then be passed to fufihla with --refdir.

Run the pipeline:

# with the default bundled reference allele sequences (IPD-IMGT/HLA-V3.61.0)
fufihla --fa <input_reads.fa.gz> --out <output_dir>
# or with a specific version of the reference data, such as the ref_data directory created by fufihla-ref-prep
fufihla --fa <input_reads.fa.gz> --out <output_dir> --refdir <reference_data_directory> [--hifi | --ont] [--debug]

Arguments

<input_reads.fa.gz>: raw long reads (PacBio HiFi by default) in .fa, .fa.gz, .fq or .fq.gz format
<output_dir>: directory for pipeline outputs
--refdir <reference_data_directory> (optional): path to a reference allele dataset; if omitted, the default bundled set is used
--hifi / --ont (optional): input read type, either PacBio HiFi (--hifi, the default) or Oxford Nanopore (--ont)
--debug (optional): keep all intermediate files; otherwise only the consensus results are kept
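To type several samples in one go, a small loop around the options above is enough. The sketch below is a hypothetical helper (run_batch is not part of FuFiHLA). It derives each output name from the read file name, passes --refdir only when the REFDIR variable is set, and calls the command named in FUFIHLA (default fufihla), so that FUFIHLA=echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# run_batch (hypothetical helper, not part of FuFiHLA): run the pipeline once
# per read file. The output name is the file name up to its first dot,
# e.g. reads/s1.fa.gz -> s1, which yields s1/, s1.out and s1.err.
# Set REFDIR to pass --refdir; set FUFIHLA=echo for a dry run.
run_batch() {
  local cmd reads sample
  cmd=${FUFIHLA:-fufihla}
  for reads in "$@"; do
    sample=$(basename "$reads")
    sample=${sample%%.*}          # strip .fa.gz, .fq.gz, etc.
    if [ -n "${REFDIR:-}" ]; then
      "$cmd" --fa "$reads" --out "$sample" --refdir "$REFDIR" || return
    else
      "$cmd" --fa "$reads" --out "$sample" || return
    fi
  done
}
```

For example, `REFDIR=ref_data run_batch reads/*.fa.gz` runs every sample against a freshly prepared reference set. Because the name is cut at the first dot, sample names that themselves contain dots would need a different naming rule.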

Outputs

A typical run produces:

<output_dir>/consensus/*_asm*.fa        → consensus allele FASTA sequences

Allele calls are printed to <output_dir>.out in PAF-like format with minimap2 tags. Example:

HLA-A*01:01:01:01  cons_HLA-A*01_01_01_01  ...  cs:Z::3503
HLA-A*26:01:01:01  cons_HLA-A*26_01_01_01  ...  cs:Z::3517

Column 1 → the allele name called by FuFiHLA
Column 2 → the name of the consensus sequence, built on the allele named in its suffix (e.g. cons_HLA-A*01_01_01_01 is built on HLA-A*01:01:01:01)
Last column (cs:Z) → minimap2 cs tag encoding base-level matches and differences:
- Known alleles: cs:Z::3503 → perfect match over 3503 bp
- Novel alleles: the cs:Z tag contains substitutions (*), insertions (+) or deletions (-)
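To summarize a run, the cs tag of each call can be scanned for the edit operators listed above. The sketch below is a hypothetical helper (classify_calls is not part of FuFiHLA) and assumes one call per line with a cs:Z: field somewhere on the line, tab- or space-separated. It labels a call "known" when the tag contains no *, + or - operator and "novel" otherwise, and counts each operator: the short cs form writes one * per substituted base and one + or - per insertion or deletion event. A cs tag without edits only shows that the aligned region matches; it does not by itself check how much of the allele the alignment covers.

```shell
#!/usr/bin/env bash
# classify_calls (hypothetical helper, not part of FuFiHLA): label each allele
# call as "known" or "novel" from its minimap2 cs tag and count the edits.
# Reads the files given as arguments, or stdin if none are given.
classify_calls() {
  awk '{
    cs = ""
    for (i = 1; i <= NF; i++)
      if ($i ~ /^cs:Z:/) { cs = substr($i, 6); break }   # drop the "cs:Z:" prefix
    if (cs == "") next                                 # skip lines without a cs tag
    s = cs; subs = gsub(/\*/, "", s)                   # one "*" per substituted base
    s = cs; ins  = gsub(/\+/, "", s)                   # one "+" per insertion event
    s = cs; dels = gsub(/-/, "", s)                    # one "-" per deletion event
    status = (subs + ins + dels == 0) ? "known" : "novel"
    printf "%s\t%s\tsub=%d\tins=%d\tdel=%d\n", $1, status, subs, ins, dels
  }' "$@"
}
```

For example, `classify_calls test_dir.out` after the Quick Test prints one line per called allele with its status and edit counts.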

Running tips

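If you already have aligned reads in a BAM file, you can extract the reads overlapping the six target genes and run FuFiHLA on them; this should give results similar to running on the full WGS read set.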

## save the six gene locations in BED format, based on the gene annotation file
## (the coordinates must match the reference build the BAM was aligned to)
echo "chr6	29942254	29945755
chr6	31268254	31272571
chr6	31353362	31357442
chr6	32578769	32589848
chr6	32636717	32643200
chr6	32660031	32667132" > sel.bed

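## extract the reads covering the target gene regions (the BAM must be indexed)
samtools view -bh "${bam}" --region-file sel.bed | samtools fasta - | gzip -c > out.fa.gz
## run FuFiHLA on the extracted reads
fufihla --fa out.fa.gz --out <output_dir>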
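The BED intervals above are described as coming from a gene annotation file. A minimal sketch of that step follows, assuming a GENCODE-style GTF whose gene records carry a gene_name "..." attribute and use chr-prefixed contig names (Ensembl GTFs use 6 instead of chr6). gtf_to_bed and the PAD flank variable are hypothetical names, not part of FuFiHLA. GTF coordinates are 1-based and inclusive, so the start is shifted down by one to give 0-based, half-open BED; intervals produced this way need not match the coordinates listed above exactly, since those may have been derived differently.

```shell
#!/usr/bin/env bash
# gtf_to_bed (hypothetical helper, not part of FuFiHLA): write BED intervals
# (0-based, half-open, gene name in column 4) for the six FuFiHLA genes from
# a GTF read from the arguments or stdin. PAD (default 0) adds flanking bp.
gtf_to_bed() {
  awk -v pad="${PAD:-0}" 'BEGIN {
      FS = OFS = "\t"
      split("HLA-A HLA-B HLA-C HLA-DQA1 HLA-DQB1 HLA-DRB1", g, " ")
      for (i in g) want[g[i]] = 1
    }
    $3 == "gene" && match($9, /gene_name "[^"]+"/) {
      name = substr($9, RSTART + 11, RLENGTH - 12)   # text between the quotes
      if (!(name in want)) next
      start = $4 - 1 - pad                           # 1-based GTF -> 0-based BED
      if (start < 0) start = 0
      print $1, start, $5 + pad, name
    }' "$@"
}
```

For example, `zcat annotation.gtf.gz | gtf_to_bed > sel.bed` (file name hypothetical) writes one line per target gene in annotation order; samtools reads only the first three BED columns, so the name column is informational.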
