PALACE is a computational framework that combines deep learning models and conjugate graph theory to assemble high-quality, high-confidence phage genomes from metagenomic sequencing data. PALACE currently supports standard paired-end reads as well as Oxford Nanopore (ONT) and PacBio SMRT (PB) reads. The assembled phage genomes analyzed in the manuscript are available on Google Drive.
- Clone the repository and enter the directory:
git clone https://github.com/deepomicslab/PALACE
cd ./PALACE
- Create a conda environment with all dependencies and enter the environment:
mamba env create --prefix=./PALACE -f environment.yml
mamba activate ./PALACE
Or
conda env create --prefix=./PALACE -f environment.yml
conda activate ./PALACE
- Create a build directory and compile PALACE under it (use sudo, if required):
cd seqGraph_phage/
cd build
make
chmod u+x ./matching
cd ../scripts/
python setup.py build_ext --inplace
- pysam==0.17.0
- numpy==1.20.2
- sklearn==1.1.1
- biopython==1.78
- matplotlib==3.4.2
See https://pytorch.org/get-started/previous-versions/ for instructions on installing the PyTorch packages below:
- torch==1.7.1
- torch-cluster==1.5.9
- torch-geometric==1.7.0
- torch-scatter==2.0.6
- torch-sparse==0.6.9
- torch-spline-conv==1.2.1
- torch-summary==1.4.5
- torchvision==0.8.0a0
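A quick way to see whether an existing environment already satisfies these pins is to compare them against the installed distributions. The following is a minimal sketch, not part of PALACE: the `REQUIRED` table is an illustrative subset of the lists above, the helper name `find_mismatches` is hypothetical, and it assumes that `sklearn` is installed under its distribution name `scikit-learn`. Local build suffixes such as `+cu110` on PyTorch wheels are ignored when comparing.

```python
from importlib import metadata

# Illustrative subset of the pins listed above. Keys are distribution names
# as pip reports them (assumption: "sklearn" is published as "scikit-learn").
REQUIRED = {
    "pysam": "0.17.0",
    "numpy": "1.20.2",
    "scikit-learn": "1.1.1",
    "biopython": "1.78",
    "matplotlib": "3.4.2",
    "torch": "1.7.1",
}


def find_mismatches(required, get_version=metadata.version):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    problems = {}
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Drop a local build suffix such as "+cu110" before comparing.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every listed package is present at the pinned version.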
- bwa: software package for read mapping.
- samtools: reading, writing, editing, indexing, and viewing SAM/BAM/CRAM files.
- fastp: fast all-in-one preprocessing for FASTQ files.
- spades: pre-assembly.
- ncbi-blast: sequence alignment tool.
- htslib
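Before building, it can help to confirm that the command-line tools are reachable on `PATH`. The sketch below is an assumption-laden convenience check rather than anything shipped with PALACE: the executable names (`spades.py` for SPAdes; `makeblastdb`, `blastn`, and `tblastn` for ncbi-blast) reflect the usual installs of those tools, `find_missing_tools` is a hypothetical helper, and htslib is left out because it is primarily a library.

```python
import shutil

# Executable names are assumptions based on each tool's usual install:
# SPAdes ships `spades.py`; BLAST+ ships `makeblastdb`, `blastn`, `tblastn`.
REQUIRED_TOOLS = [
    "bwa", "samtools", "fastp", "spades.py",
    "makeblastdb", "blastn", "tblastn",
]


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    print("missing tools:", ", ".join(missing) if missing else "none")
```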
Install the prerequisites first, then clone the repository and enter the directory:
git clone https://github.com/deepomicslab/PALACE
# create a new mamba (conda) env
mamba create -n palace ## or conda create -n palace
mamba activate palace ## or conda activate palace
cd ./PALACE/seqGraph_phage/
cd build
make
chmod u+x ./matching
cd ../scripts/
python setup.py build_ext --inplace
- Configure the config.txt file; here is a demo file.
  - fastq1: Read 1 of the paired FASTQ files.
  - fastq2: Read 2 of the paired FASTQ files.
  - phagedb: Phage reference database; the latest phage reference database can be downloaded from here.
  - protein_db: Phage protein database; the latest phage protein database can be downloaded from here.
  - gcn_model: Deep learning model for phage contig prediction; can be downloaded from here.
  - threads: Number of threads to use.
  - out_dir: Output directory.
  - prefix: Prefix for intermediate files; can be the sample name.
  - PYTHON: Python path.
  - BWA: bwa path.
  - SAMTOOLS: samtools path.
  - FASTP: fastp path.
  - SPADES: spades.py path.
  - NCBI_BIN: ncbi-blast bin path; must contain makeblastdb, blastn, and tblastn.
  - PALACE: PALACE path.
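A missing or misspelled key in config.txt is easy to overlook, so a small pre-flight check can be useful. The sketch below assumes the file uses one `KEY=VALUE` pair per line with optional `#` comments; that layout is an assumption, since the demo file is the authoritative reference for the real format. The helper names `parse_config` and `missing_keys` are hypothetical.

```python
# Keys taken from the list above.
CONFIG_KEYS = [
    "fastq1", "fastq2", "phagedb", "protein_db", "gcn_model",
    "threads", "out_dir", "prefix",
    "PYTHON", "BWA", "SAMTOOLS", "FASTP", "SPADES", "NCBI_BIN", "PALACE",
]


def parse_config(text):
    """Parse KEY=VALUE lines (assumed format), skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {raw!r}")
        # Remove surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values):
    """Return the expected keys that are absent or empty, in list order."""
    return [key for key in CONFIG_KEYS if not values.get(key)]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1] if len(sys.argv) > 1 else "config.txt") as handle:
        print("missing keys:", missing_keys(parse_config(handle.read())))
```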
- Run PALACE:
bash PALACE_PATH/pipe.sh config.txt
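When PALACE is run for many samples, the same command can be launched from a script. The following is a minimal sketch that only wraps the command shown above; `build_command` and `run_palace` are hypothetical helper names, and the paths passed to them are placeholders for your own installation and config file.

```python
import subprocess
from pathlib import Path


def build_command(palace_path, config_path):
    """Return the argv list for `bash PALACE_PATH/pipe.sh config.txt`."""
    return ["bash", str(Path(palace_path) / "pipe.sh"), str(config_path)]


def run_palace(palace_path, config_path):
    """Run the pipeline; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(palace_path, config_path), check=True)
```

Passing the command as a list avoids shell quoting problems when a path contains spaces.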
- 01-qc/: fastp output.
- 02-assembly/: raw assembly produced by SPAdes with --meta.
- 03-search/: contains three main intermediate files:
  - hit_seqs.out: contigs containing phage proteins.
  - node_scores.out: the second column is the score predicted by the deep learning network.
  - {prefix}_ref_names.txt: phage references identified by k-mer alignment.
- 04-match/: contains the graph structure of the conjugate graph ({prefix}_filtered_graph.txt) and the results of the graph decomposition ({prefix}_all_result.txt).
- 05-furth: contains the local matching result based on the phage reference.
- final_result: contains the final results: final contig paths for phages ({prefix}_final.txt), cycle paths for phages ({prefix}_cycle.txt), and phage sequences in FASTA format ({prefix}_final.fasta).
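After a run, the output layout and the assembled sequences can be inspected with a few lines of code. The sketch below assumes the directories listed above sit directly under the configured out_dir, which is an assumption about the layout, and it reads the FASTA file with plain string handling rather than Biopython. The helper names `missing_dirs` and `fasta_lengths` are hypothetical.

```python
from pathlib import Path

# Directory names as listed above; assumed to sit directly under out_dir.
EXPECTED_DIRS = [
    "01-qc", "02-assembly", "03-search", "04-match", "05-furth", "final_result",
]


def missing_dirs(out_dir):
    """Return the expected output directories that do not exist yet."""
    return [name for name in EXPECTED_DIRS if not (Path(out_dir) / name).is_dir()]


def fasta_lengths(path):
    """Return {record_id: sequence_length} for a FASTA file."""
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # The record id is the first whitespace-separated header token.
                fields = line[1:].split()
                current = fields[0] if fields else ""
                lengths[current] = 0
            elif line and current is not None:
                lengths[current] += len(line)
    return lengths
```

For example, `fasta_lengths(Path(out_dir) / "final_result" / f"{prefix}_final.fasta")` gives one entry per assembled phage sequence.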
PALACE is developed by DeepOmics lab under the supervision of Dr. Li Shuaicheng, City University of Hong Kong, Hong Kong, China. Should you have any queries, please feel free to contact us by [email protected] or [email protected].
This project is licensed under the MIT License - see the LICENSE.txt file for details.