Closed
35 commits
41d5c92
Adding beagle5 to impute5, waiting for PR
gichas Jun 19, 2025
9f2e115
pause
gichas Jun 25, 2025
bbde3c4
Adding Beagle5 to phaseimpute (test issue)
gichas Jul 7, 2025
6e9e654
Adding Beagle 5.2 to phaseimpute impute tools
gichas Jul 9, 2025
23081a8
Adding Beagle 5.2 to phaseimpute impute tools
gichas Jul 9, 2025
b55df83
Update
gichas Jul 9, 2025
6999c60
Adding Beaglev5.2 to phaseimpute
gichas Jul 10, 2025
06fa701
Updating subworkflow tests (test with map)
gichas Jul 10, 2025
8d6785e
Update nf-schema plugin version to resolve conflict
gichas Jul 10, 2025
ab01b1b
Updating test file
gichas Jul 10, 2025
2554639
Address review comments
gichas Jul 16, 2025
5d3e10f
Apply BEAGLE5 review corrections
gichas Jul 16, 2025
27a7346
Align BEAGLE5 workflow with MINIMAC4 logic
gichas Jul 17, 2025
062e5f1
Merge remote-tracking branch 'upstream/dev' into beagle
gichas Jul 18, 2025
33d78b8
Add BEAGLE5 multi-chromosome test
gichas Jul 20, 2025
6ac5316
Allow Beagle5 without posfile input
Jul 21, 2025
2dca88a
Update subworkflows/local/vcf_impute_beagle5/main.nf
gichas Jul 21, 2025
e92448e
Update workflows/phaseimpute/main.nf
gichas Jul 21, 2025
6cf5d52
Update workflows/phaseimpute/main.nf
gichas Jul 21, 2025
c89561a
Update subworkflows/local/vcf_impute_beagle5/main.nf
gichas Jul 21, 2025
d8c4663
Update subworkflows/local/vcf_impute_beagle5/main.nf
gichas Jul 21, 2025
853c90e
Update usage.md
gichas Jul 21, 2025
addfcb1
Merge branch 'beagle' of github.com:gichas/phaseimpute into beagle
gichas Jul 21, 2025
bc1cf4f
Update channel handling and tests
gichas Jul 22, 2025
a8a8536
Update subworkflows/local/vcf_impute_beagle5/main.nf
gichas Jul 23, 2025
eca2408
Update main.nf
gichas Jul 25, 2025
73a7fb7
Update CHANGELOG.md
LouisLeNezet Jul 25, 2025
de73247
Update subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf
LouisLeNezet Jul 25, 2025
28cff03
Update subworkflows/local/vcf_impute_beagle5/main.nf
LouisLeNezet Jul 25, 2025
ef14b6b
Update subworkflows/local/vcf_impute_beagle5/meta.yml
LouisLeNezet Jul 25, 2025
590d301
Update subworkflows/local/vcf_impute_beagle5/meta.yml
LouisLeNezet Jul 25, 2025
ae2d04b
Update subworkflows/local/vcf_impute_beagle5/main.nf
gichas Jul 25, 2025
294d83d
Update Beagle subworkflow test snapshot
gichas Jul 25, 2025
4487311
Update map names, update beagle test snapshot
gichas Jul 28, 2025
ad47858
Add seed to Beagle5 subworkflow for the test reproductibility
gichas Sep 3, 2025
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -32,6 +32,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#183](https://github.com/nf-core/phaseimpute/pull/183) - Remove wrongfully added files in `BAM_EXTRACT_REGION_SAMTOOLS`.
- [#185](https://github.com/nf-core/phaseimpute/pull/185) - Fix CSV generation and check that all mentioned path files exist.
- [#189](https://github.com/nf-core/phaseimpute/pull/189) - Set meta map id as string to avoid error when using numbers in csv files.
- [#200](https://github.com/nf-core/phaseimpute/pull/200) - Add BEAGLE5 support for genotype imputation.

### `Dependencies`

@@ -44,6 +45,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| `r-stitch` | 1.6.10 | 1.7.3 |
| `shapeit5` | 1.0.0 | 5.1.1 |
| `vcflib` | 1.0.3 | 1.0.14 |
| `beagle5` | | 5.2 |

## v1.0.0 - Black Labrador [2024-12-09]

2 changes: 1 addition & 1 deletion README.md
@@ -41,7 +41,7 @@ The whole pipeline consists of five main steps, each of which can be run separat
- **Position Extraction** for targeted imputation sites.

4. **Imputation (`--impute`)**: This is the primary step, where genotypes in the target dataset are imputed using the prepared reference panel. The main steps are:
- **Imputation** of the target dataset using tools like [Glimpse1](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html), [Glimpse2](https://odelaneau.github.io/GLIMPSE/), [Stitch](https://github.com/rwdavies/stitch), or [Quilt](https://github.com/rwdavies/QUILT).
- **Imputation** of the target dataset using tools like [Glimpse1](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html), [Glimpse2](https://odelaneau.github.io/GLIMPSE/), [Stitch](https://github.com/rwdavies/stitch), [Quilt](https://github.com/rwdavies/QUILT), or [Beagle5](https://faculty.washington.edu/browning/beagle/beagle.html).
- **Ligation** of imputed chunks to produce a final VCF file per sample, with all chromosomes unified.

5. **Validation (`--validate`)**: Assesses imputation accuracy by comparing the imputed dataset to a truth dataset. This step leverages the [Glimpse2](https://odelaneau.github.io/GLIMPSE/) concordance process to summarize differences between two VCF files.
56 changes: 56 additions & 0 deletions conf/steps/imputation_beagle5.config
@@ -0,0 +1,56 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
----------------------------------------------------------------------------------------
*/

process {
    // Configuration for the BEAGLE5 imputation subworkflow

    // Impute the variants with BEAGLE5
    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_BEAGLE5:.*' {
        publishDir = [ enabled: false ]
        tag = {"${meta.id} ${meta.chr}"}
    }

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_BEAGLE5:BEAGLE5_BEAGLE' {
        ext.args = { "gp=true ap=true chrom=${meta.chr}" }
        ext.prefix = { "${meta.id}.${meta.chr}.beagle5" }
        publishDir = [ enabled: false ]
    }

    // Convert BCF to VCF if necessary
    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_BEAGLE5:BCFTOOLS_VIEW' {
        ext.args = ["--output-type z", "--write-index=csi"].join(' ')
        ext.prefix = { "${meta.id}.${meta.chr}.converted" }
        publishDir = [ enabled: false ]
    }

    // Index the imputed VCF files
    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_BEAGLE5:BCFTOOLS_INDEX_BEAGLE' {
        ext.args = ''
        publishDir = [ enabled: false ]
    }

    // Concatenate the imputed chromosomes
    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_BEAGLE5:.*' {
        publishDir = [
            path: { "${params.outdir}/imputation/beagle5/concat" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_BEAGLE5:BCFTOOLS_CONCAT' {
        ext.args = ["--output-type z", "--write-index=tbi"].join(' ')
        ext.prefix = { "${meta.id}.beagle5" }
    }
}
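
As a rough illustration of how the `ext.prefix` and `ext.args` closures for `BEAGLE5_BEAGLE` resolve per task, the sketch below mirrors them as two shell functions. The function names `beagle5_prefix` and `beagle5_args` and the meta values (`sampleA`, `chr21`) are made up for the example and are not part of the pipeline. In Beagle, `gp=true` and `ap=true` should request posterior genotype and allele probabilities in the output, and `chrom=` restricts the run to one chromosome.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the config closures above; illustrative only.

# Mirrors: ext.prefix = { "${meta.id}.${meta.chr}.beagle5" }
beagle5_prefix() {
    local id="$1" chr="$2"
    printf '%s.%s.beagle5\n' "$id" "$chr"
}

# Mirrors: ext.args = { "gp=true ap=true chrom=${meta.chr}" }
beagle5_args() {
    local chr="$1"
    printf 'gp=true ap=true chrom=%s\n' "$chr"
}

# Example with made-up meta values (id "sampleA", chr "chr21")
beagle5_prefix sampleA chr21   # prints: sampleA.chr21.beagle5
beagle5_args chr21             # prints: gp=true ap=true chrom=chr21
```

Because the prefix embeds `meta.chr`, each chromosome yields its own intermediate file, which is what allows the later `BCFTOOLS_CONCAT` step to merge them under the single `${meta.id}.beagle5` prefix.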
2 changes: 1 addition & 1 deletion conf/test_all.config
@@ -42,7 +42,7 @@ params {
steps = "all"

// Impute tools
tools = "glimpse1,glimpse2,stitch,quilt"
tools = "glimpse1,glimpse2,stitch,quilt,beagle5"
}

process {
2 changes: 1 addition & 1 deletion conf/test_all_fullchr.config
@@ -29,7 +29,7 @@ params {

// Pipeline steps
steps = "all"
tools = "glimpse1,glimpse2,quilt,stitch"
tools = "glimpse1,glimpse2,quilt,stitch,beagle5"
depth = 1

// Panelprep optional args
47 changes: 47 additions & 0 deletions conf/test_beagle5.config
@@ -0,0 +1,47 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests with BEAGLE5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/phaseimpute -profile test_beagle5,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

process {
    resourceLimits = [
        cpus: 4,
        memory: '8.GB',
        time: '1.h'
    ]
}

params {
    config_profile_name = 'Test profile for BEAGLE5'
    config_profile_description = 'Minimal test dataset to check BEAGLE5 imputation function'

    // Input data
    input = "${projectDir}/tests/csv/sample_vcf.csv"
    input_region = "${projectDir}/tests/csv/region.csv"
    panel = "${projectDir}/tests/csv/panel.csv"

    // Map file
    map = "${projectDir}/tests/csv/map_plink.csv"

    // Region file
    input_region = "${projectDir}/tests/csv/region.csv"

    // Genome references
    fasta = params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz"

    // Pipeline steps
    steps = 'impute'

    // Imputation tools
    tools = 'beagle5'

    // Main options
    outdir = 'results'
}
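
The `params` block above assigns `input_region` twice with the same value. A quick way to spot such repeats is sketched below, assuming one `key = value` assignment per line. The helper name `find_duplicate_keys` is hypothetical, and the check ignores config scopes, so it is only meaningful for a flat block such as `params` (keys like `publishDir` legitimately repeat across `withName` selectors).

```shell
#!/usr/bin/env bash
# List keys assigned more than once in a config file.
# Assumption: one `key = value` per line; lines starting with // do not match.
find_duplicate_keys() {
    local config="$1"
    grep -E '^[[:space:]]*[A-Za-z_][A-Za-z0-9_.]*[[:space:]]*=' "$config" \
        | sed -E 's/^[[:space:]]*([A-Za-z_][A-Za-z0-9_.]*)[[:space:]]*=.*/\1/' \
        | sort | uniq -d
}

# Example on a throwaway file that repeats one key
demo="$(mktemp)"
printf '%s\n' \
    'params {' \
    '    input = "samples.csv"' \
    '    input_region = "region.csv"' \
    '    // Region file' \
    '    input_region = "region.csv"' \
    '}' > "$demo"
find_duplicate_keys "$demo"   # prints: input_region
rm -f "$demo"
```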
2 changes: 1 addition & 1 deletion conf/test_validate.config
@@ -30,7 +30,7 @@ params {
// Genome references
fasta = params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz"
posfile = "${projectDir}/tests/csv/posfile_vcf_index.csv"
map = "${projectDir}/tests/csv/map.csv"
map = "${projectDir}/tests/csv/map_glimpse.csv"

// Pipeline steps
steps = "validate"
24 changes: 21 additions & 3 deletions docs/usage.md
@@ -307,6 +307,7 @@ For starting from the imputation steps, the required flags are:
| `GLIMPSE2` | ✅ | ✅ ¹ | ✅ | ✅ | ✅ | ❌ |
| `QUILT` | ✅ | ✅ ² | ✅ | ❌ | ✅ | ✅ ⁴ |
| `STITCH` | ✅ | ✅ ² | ✅ | ❌ | ❌ | ✅ ³ |
| `BEAGLE5` | ✅ | ✅ ¹ | ✅ | ✅ | ❌ | ❌ |

> ¹ Alignment files as well as variant calling format (i.e. BAM, CRAM, VCF or BCF)
> ² Alignment files only (i.e. BAM or CRAM)
@@ -332,12 +333,12 @@ When the number of samples exceeds the batch size, the pipeline will split the s

To summarize:

- If you have Variant Calling Format (VCF) files, join them into a single file and choose either GLIMPSE1 or GLIMPSE2.
- If you have Variant Calling Format (VCF) files, join them into a single file and choose either GLIMPSE1, GLIMPSE2 or BEAGLE5.
- If you have alignment files (e.g., BAM or CRAM), all tools are available, and processing will occur in `batch_size`:
- GLIMPSE1 and STITCH may induce batch effects, so all samples need to be imputed together.
- GLIMPSE2 and QUILT can process samples in separate batches.

## Imputation tools `--steps impute --tools [glimpse1, glimpse2, quilt, stitch]`
## Imputation tools `--steps impute --tools [glimpse1, glimpse2, quilt, stitch, beagle5]`

You can choose different software to perform the imputation. In the following sections, the typical commands for running the pipeline with each software are included. Multiple tools can be selected by separating them with a comma (eg. `--tools glimpse1,quilt`).

@@ -477,6 +478,23 @@ nextflow run nf-core/phaseimpute \

Make sure the CSV file with the input panel is the output from `--steps panelprep` or has been previously prepared.

### BEAGLE5

[BEAGLE5](https://faculty.washington.edu/browning/beagle/beagle.html) is a software package for analyzing large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. BEAGLE can phase genotype data and perform genotype imputation.

```bash
nextflow run nf-core/phaseimpute \
--input samplesheet.csv \
--panel samplesheet_reference.csv \
--steps impute \
--tools beagle5 \
--outdir results \
--genome GRCh37 \
-profile docker
```

The CSV file provided in `--panel` must be prepared with `--steps panelprep` and must contain four columns [panel, chr, vcf, index].
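
A minimal sketch of a pre-flight check for the panel CSV described above. It assumes the file is comma-separated and that its first line is exactly `panel,chr,vcf,index`; the exact spelling and order of the header, the helper name `check_panel_header`, and the example row are assumptions made for illustration, not pipeline behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the first line of the CSV matches
# the four column names listed above (assumed order and spelling).
check_panel_header() {
    local csv="$1"
    local header
    # Read the first line and drop a trailing carriage return, if any
    header="$(head -n 1 "$csv" | tr -d '\r')"
    [ "$header" = "panel,chr,vcf,index" ]
}

# Example on a throwaway file with a made-up panel row
demo="$(mktemp)"
printf '%s\n' \
    'panel,chr,vcf,index' \
    'examplepanel,chr21,/path/to/panel.chr21.vcf.gz,/path/to/panel.chr21.vcf.gz.csi' > "$demo"
if check_panel_header "$demo"; then
    echo "panel header OK"
else
    echo "unexpected panel header"
fi
rm -f "$demo"
```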

## Start with validation `--steps validate`

<img src="images/metro/Validate.png" alt="concordance_metro" width="600"/>
@@ -519,7 +537,7 @@ This mode runs all the previous steps. This requires several flags:
- `--input input.csv`: The samplesheet containing the input sample files in `bam` or `cram` format.
- `--depth`: The final depth of the input file [default: 1].
- `--genome` or `--fasta`: The reference genome of the samples.
- `--tools [glimpse1, glimpse2, quilt, stitch]`: A selection of one or more of the available imputation tools.
- `--tools [glimpse1, glimpse2, quilt, stitch, beagle5]`: A selection of one or more of the available imputation tools.
- `--panel panel.csv`: The samplesheet containing the reference panel files in `vcf.gz` format.
- `--remove_samples`: (optional) A comma-separated list of samples to remove from the reference.
- `--input_truth input_truth.csv`: The samplesheet containing the truth VCF files in `vcf` format.
2 changes: 1 addition & 1 deletion modules.json
@@ -218,4 +218,4 @@
}
}
}
}
}
23 changes: 23 additions & 0 deletions modules/nf-core/beagle5/beagle/beagle5-beagle.diff

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions modules/nf-core/beagle5/beagle/environment.yml
51 changes: 51 additions & 0 deletions modules/nf-core/beagle5/beagle/main.nf
75 changes: 75 additions & 0 deletions modules/nf-core/beagle5/beagle/meta.yml