-
Notifications
You must be signed in to change notification settings - Fork 195
feat: added wrapper for Paraphase #3071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
MagdalenaZZ
wants to merge
43
commits into
snakemake:master
Choose a base branch
from
MagdalenaZZ:my-new-snakemake-wrapper-paraphrase
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 39 commits
Commits
Show all changes
43 commits
Select commit
Hold shift + click to select a range
1abdb90
feat: added wrapper for Paraphrase by PacBio, which takes HiFi aligne…
MagdalenaZZ 4cda233
bug: Update bio/paraphase/meta.yaml
MagdalenaZZ 32e74dd
bug: Update bio/paraphase/meta.yaml
MagdalenaZZ 7b8a51c
bug: Update bio/paraphase/meta.yaml
MagdalenaZZ 70ebcaa
Merge branch 'master' into my-new-snakemake-wrapper-paraphrase
MagdalenaZZ 8e09198
Merge branch 'master' into my-new-snakemake-wrapper-paraphrase
MagdalenaZZ a2f85b3
Update bio/paraphase/test/Snakefile
MagdalenaZZ 0e0295b
Update bio/paraphase/test/Snakefile
MagdalenaZZ 9112bd1
Update bio/paraphase/wrapper.py
MagdalenaZZ 68e6be6
feat: improved environment
MagdalenaZZ 73454a1
Update meta.yaml
MagdalenaZZ 59010d1
Update bio/paraphase/test/Snakefile
MagdalenaZZ ac24eaa
Update wrapper.py
MagdalenaZZ ac87abe
bug: small fix
MagdalenaZZ 876aad7
bug: fixes
MagdalenaZZ bd65d78
bug: fixes
MagdalenaZZ 3d62ef3
bug: fixes
MagdalenaZZ f9bb343
bug: fixing flies
MagdalenaZZ d8f96dc
Update Snakefile
MagdalenaZZ b64175c
Update Snakefile
MagdalenaZZ 18da384
Update wrapper.py
MagdalenaZZ a4856c3
bug: fixing files
MagdalenaZZ 4ce43b2
bug: fix
MagdalenaZZ 7ad0aee
bug: fixed files
MagdalenaZZ 178e901
Update bio/paraphase/meta.yaml
MagdalenaZZ 108c9ca
Update bio/paraphase/test/Snakefile
MagdalenaZZ b63cf7f
Delete bio/paraphase/used_wrappers.yaml
MagdalenaZZ c5ed609
feat: linting
MagdalenaZZ 2f918ef
Merge branch 'master' into my-new-snakemake-wrapper-paraphrase
MagdalenaZZ 1baf9b8
Remove file handle close
fgvieira e12dcc5
Remove trailing spaces
fgvieira e41cb61
Remove unused imports
fgvieira c9dc011
Update wrapper.py
MagdalenaZZ e568525
Update meta.yaml
MagdalenaZZ 4cc9676
Update bio/paraphase/wrapper.py
MagdalenaZZ 5b81e0d
bug: linting wrapper.py
MagdalenaZZ 6745c6b
bug: linting
MagdalenaZZ ef1ba6e
Merge branch 'master' into my-new-snakemake-wrapper-paraphrase
MagdalenaZZ 34bf6b8
Remove import
fgvieira 02c8953
Merge branch 'master' into my-new-snakemake-wrapper-paraphrase
MagdalenaZZ 8631711
bug: black linting
MagdalenaZZ 781e912
Merge branch 'master' into my-new-snakemake-wrapper-paraphrase
MagdalenaZZ 502217e
Merge branch 'master' into my-new-snakemake-wrapper-paraphrase
johanneskoester File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
channels: | ||
- bioconda | ||
- conda-forge | ||
- defaults | ||
dependencies: | ||
- minimap2=2.28 | ||
- paraphase=3.1.1 | ||
- htslib =1.20 | ||
- samtools =1.20 | ||
- bcftools =1.20 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
name: poraphrase | ||
url: https://github.com/PacificBiosciences/paraphase | ||
description: paraphase resolves SNVs in gene families for PacBio longreads from whole genomes | ||
authors: Magdalena | ||
input: | ||
bam: Path to bam file with aligned genomic reads | ||
fasta: genome fasta file for target genome | ||
faidx: fasta .fai index for the genome fasta file | ||
output: | ||
vcf: vcf file with resulting phased variants | ||
bam: bam file with realigned reads | ||
bai: bai index file for the realigned reads | ||
merged_vcf: VCF file containing all newly discovered variants, phased by gene copy and haplotype | ||
params: | ||
extra: additional parameters that will be forwarded to paraphase |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Set the custom wrapper prefix for the entire workflow | ||
|
||
|
||
rule paraphase: | ||
input: | ||
bam="test_data/PDPK1.bam", | ||
fasta="test_data/chr16s.fa", | ||
faidx="test_data/homo_sapiens.fa.fai", | ||
output: | ||
bam="paraphase/PDPK1_realigned.paraphase.bam", | ||
bai="paraphase/PDPK1_realigned.paraphase.bam.bai", | ||
merged_vcf="paraphase/PDK1_paraphase.vcf.gz", | ||
vcf_header="paraphase/vcf_chromosome_header.vcf", | ||
params: | ||
extra="-g PDPK1 --genome 38", | ||
log: | ||
"paraphase/PDPK1_realigned.vcf.log", | ||
benchmark: | ||
"paraphase/PDPK1_paraphase.vcf.benchmark.tsv" | ||
threads: 1 | ||
resources: | ||
mem_mb=1024, | ||
wrapper: | ||
"main/bio/paraphase" | ||
Binary file not shown.
Binary file not shown.
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
chr1 2012 6 2012 2013 | ||
chr2 2012 2025 2012 2013 | ||
chr3 2012 4044 2012 2013 | ||
chr4 2012 6063 2012 2013 | ||
chr5 71099999 8082 71099999 71100000 | ||
chr6 2012 71108088 2012 2013 | ||
chr7 2012 71110107 2012 2013 | ||
chr8 2012 71112126 2012 2013 | ||
chr9 2012 71114145 2012 2013 | ||
chr10 2012 71116165 2012 2013 | ||
chr11 2012 71118185 2012 2013 | ||
chr12 2012 71120205 2012 2013 | ||
chr13 2012 71122225 2012 2013 | ||
chr14 2012 71124245 2012 2013 | ||
chr15 2012 71126265 2012 2013 | ||
chr16 2012 71128285 2012 2013 | ||
chr17 2012 71130305 2012 2013 | ||
chr18 2012 71132325 2012 2013 | ||
chr19 2012 71134345 2012 2013 | ||
chr20 2012 71136365 2012 2013 | ||
chr21 2012 71138385 2012 2013 | ||
chr22 2012 71140405 2012 2013 | ||
chrX 2012 71142424 2012 2013 | ||
chrY 2012 71144443 2012 2013 | ||
chrM 2012 71146462 2012 2013 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
import glob | ||
from snakemake.shell import shell | ||
from tempfile import TemporaryDirectory | ||
|
||
log = snakemake.log_fmt_shell(stdout=True, stderr=True) | ||
extra = snakemake.params.get("extra", "") | ||
|
||
|
||
with TemporaryDirectory() as tmpdirname: | ||
|
||
### Run paraphase | ||
shell( | ||
f""" | ||
(paraphase --bam {snakemake.input.bam} \ | ||
--reference {snakemake.input.fasta} \ | ||
--out {tmpdirname} \ | ||
{snakemake.params.genome} \ | ||
{extra}) {log} | ||
""", | ||
tmpdirname=tmpdirname, # Pass tmpdirname to the shell environment | ||
) | ||
shell( | ||
""" | ||
touch {snakemake.output} | ||
""" | ||
) | ||
|
||
### Create a new VCF header | ||
|
||
# Get the paths from Snakemake input and output objects | ||
input_faidx = snakemake.input.faidx | ||
output_vcf_header = snakemake.output.vcf_header | ||
|
||
# Open the .fai index file and read lines | ||
with open(input_faidx, "r") as fai_file: | ||
lines = fai_file.readlines() | ||
|
||
# Open the output file and write formatted header lines | ||
with open(output_vcf_header, "w") as output: | ||
for line in lines: | ||
contig_id, length = line.split()[:2] # Assuming the first two elements are ID and length | ||
output.write(f"##contig=<ID={contig_id},length={length}>\n") | ||
|
||
### Concatenating, reheadering, and sorting the zipped and indexed VCF files, and copy the remapped reads | ||
vcf_res = glob.glob(f"{tmpdirname}/*_vcfs/*vcf") | ||
if vcf_res: | ||
for vcf in vcf_res: | ||
bgzip_cmd = f"bgzip -c {vcf} > {vcf}.gz" | ||
shell(bgzip_cmd) | ||
index_cmd = f"bcftools index {vcf}.gz" | ||
shell(index_cmd) | ||
print(f"Compressed and indexed: {vcf}.gz") | ||
|
||
params_variant_files = " ".join([f"{vcf}.gz" for vcf in vcf_res]) | ||
shell( | ||
f"bcftools concat -a -Oz {params_variant_files} | " | ||
f"bcftools annotate --header-lines {snakemake.output.vcf_header} | " | ||
f"bcftools sort -Oz -o {snakemake.output.merged_vcf}" | ||
) | ||
print(f"Merged, reheadered, and sorted VCF file created: {snakemake.output.merged_vcf}") | ||
|
||
# Copy out bam and bai files | ||
bam_res = glob.glob(f"{tmpdirname}/*.bam") | ||
bai_res = glob.glob(f"{tmpdirname}/*.bai") | ||
# print("BAM RES: ", bam_res, bai_res) | ||
shell( | ||
f""" | ||
cp -pr {' '.join(bam_res)} {snakemake.output.bam}; | ||
cp -pr {' '.join(bai_res)} {snakemake.output.bai} | ||
""" | ||
) | ||
else: | ||
print("No output VCF or BAM files were produced by paraphase, I hope this is what you were expecting, human?") | ||
shell(f"touch {snakemake.output.merged_vcf}") | ||
|
||
|
||
with TemporaryDirectory() as tmpdirname: | ||
|
||
### Run paraphase | ||
shell( | ||
f""" | ||
(paraphase --bam {snakemake.input.bam} \ | ||
--reference {snakemake.input.fasta} \ | ||
--out {tmpdirname} \ | ||
{snakemake.params.genome} \ | ||
{extra}) {log} | ||
""", | ||
tmpdirname=tmpdirname, # Pass tmpdirname to the shell environment | ||
) | ||
fgvieira marked this conversation as resolved.
Show resolved
Hide resolved
|
||
shell( | ||
""" | ||
touch {snakemake.output} | ||
""" | ||
) | ||
|
||
### Create a new VCF header | ||
|
||
# Get the paths from Snakemake input and output objects | ||
input_faidx = snakemake.input.faidx | ||
output_vcf_header = snakemake.output.vcf_header | ||
|
||
# Open the .fai index file and read lines | ||
with open(input_faidx, "r") as fai_file: | ||
lines = fai_file.readlines() | ||
|
||
# Open the output file and write formatted header lines | ||
with open(output_vcf_header, "w") as output: | ||
for line in lines: | ||
contig_id, length = line.split()[:2] # Assuming the first two elements are ID and length | ||
output.write(f"##contig=<ID={contig_id},length={length}>\n") | ||
output.close() | ||
fgvieira marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
### Concatenating, reheadering, and sorting the zipped and indexed VCF files, and copy the remapped reads | ||
vcf_res = glob.glob(f"{tmpdirname}/*_vcfs/*vcf") | ||
if vcf_res: | ||
for vcf in vcf_res: | ||
bgzip_cmd = f"bgzip -c {vcf} > {vcf}.gz" | ||
shell(bgzip_cmd) | ||
index_cmd = f"bcftools index {vcf}.gz" | ||
shell(index_cmd) | ||
print(f"Compressed and indexed: {vcf}.gz") | ||
|
||
params_variant_files = " ".join([f"{vcf}.gz" for vcf in vcf_res]) | ||
shell( | ||
f"bcftools concat -a -Oz {params_variant_files} | " | ||
f"bcftools annotate --header-lines {snakemake.output.vcf_header} | " | ||
f"bcftools sort -Oz -o {snakemake.output.merged_vcf}" | ||
) | ||
print(f"Merged, reheadered, and sorted VCF file created: {snakemake.output.merged_vcf}") | ||
|
||
# Copy out bam and bai files | ||
bam_res = glob.glob(f"{tmpdirname}/*.bam") | ||
bai_res = glob.glob(f"{tmpdirname}/*.bai") | ||
# print("BAM RES: ", bam_res, bai_res) | ||
shell( | ||
f""" | ||
cp -pr {' '.join(bam_res)} {snakemake.output.bam}; | ||
cp -pr {' '.join(bai_res)} {snakemake.output.bai} | ||
""" | ||
) | ||
else: | ||
print("No output VCF or BAM files were produced by paraphase, I hope this is what you were expecting, human?") | ||
shell(f"touch {snakemake.output.merged_vcf}") | ||
fgvieira marked this conversation as resolved.
Show resolved
Hide resolved
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.