
Commit 802972d

Update README.md
fixing the consensus fastq pipe
1 parent 18cce1d commit 802972d

1 file changed

Lines changed: 1 addition & 1 deletion

Exercises/historical_population_size/README.md

@@ -30,7 +30,7 @@ srun --mem-per-cpu=5g --time=3:00:00 --account=populationgenomics --pty bash
Starting from mapped reads, the first step is to produce a consensus sequence in FASTQ format, which stores both the sequence and a quality score for each base and will be used for QC filtering. The consensus sequence has A, T, C or G at homozygous sites and other letters ([IUPAC codes](https://www.bioinformatics.org/sms/iupac.html)) at heterozygous sites. To make the consensus calls we use the samtools/bcftools suite: `bcftools mpileup` first builds the pileup of reads at each position, `bcftools call` then generates a consensus sequence, and `vcfutils.pl` converts that consensus to FASTQ (with some additional filtering). Because these tools can read from standard input and write to standard output, we link them with Unix pipes and run the whole pipeline (`bcftools mpileup` -> `bcftools call` -> `vcfutils.pl`) as a single command:

```bash
# bcftools mpileup: pileup of S_Hungarian-2 reads on chromosome 2, streamed as uncompressed BCF (-Ou)
# bcftools call -c: consensus calls from that pileup, written as VCF to stdout
# vcfutils.pl vcf2fq: depth/mapping-quality filtering and conversion to FASTQ
~/populationgenomics/software/bcftools mpileup -Q 30 -Ou -q 30 -f chr2.fa -r 2 S_Hungarian-2.chr2.bam \
  | ~/populationgenomics/software/bcftools call -c \
  | ~/populationgenomics/software/vcfutils.pl vcf2fq -d 5 -D 100 -Q 30 > S_Hungarian-2.chr2.fq
```
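The three stages of the pipe can fail independently. By default a Bash pipeline's exit status is that of its last command, so if `bcftools mpileup` or `bcftools call` dies part-way, the pipeline can still report success and leave a truncated `.fq` file behind. Below is a minimal sketch of a guard, assuming Bash; the function name `make_consensus_fq` and the `TOOLS` variable are hypothetical (not part of the course material), while the flags are exactly the ones used in the command above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the consensus pipeline.
# TOOLS is an assumed variable; it defaults to the course software directory.
make_consensus_fq() {
  local ref="$1" region="$2" bam="$3" out="$4"
  local tools="${TOOLS:-$HOME/populationgenomics/software}"
  (
    # pipefail: the pipeline fails if ANY stage fails, not just the last one
    set -o pipefail
    "$tools/bcftools" mpileup -Q 30 -Ou -q 30 -f "$ref" -r "$region" "$bam" \
      | "$tools/bcftools" call -c \
      | "$tools/vcfutils.pl" vcf2fq -d 5 -D 100 -Q 30 > "$out"
  ) || {
    # do not leave a truncated FASTQ behind
    rm -f "$out"
    return 1
  }
}

# Usage: make_consensus_fq chr2.fa 2 S_Hungarian-2.chr2.bam S_Hungarian-2.chr2.fq
```

Running the pipeline in a subshell keeps `pipefail` from leaking into the rest of the interactive session.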

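To see the IUPAC heterozygote codes and the effect of filtering in the resulting file, a quick tally of the consensus letters can help. This sketch assumes the output conventions of `vcfutils.pl vcf2fq`: upper-case letters for sites that passed the depth (`-d`/`-D`) and mapping-quality (`-Q`) filters, lower-case letters for sites that failed them, `n`/`N` where there is no call, and the two-base IUPAC codes M, R, W, S, Y, K for heterozygotes, with sequence and quality possibly wrapped over several lines. `summarise_consensus` is a hypothetical awk-based helper, not a course tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally consensus letters in a vcf2fq-style FASTQ.
# Assumed conventions: upper case = passed filters, lower case = failed,
# n/N = no call, M R W S Y K = heterozygous IUPAC codes.
summarise_consensus() {
  awk '
    # state 0: expecting an "@" header; 1: sequence lines; 2: quality lines
    state == 0 {
      if (substr($0, 1, 1) == "@") { state = 1; seqlen = 0; quallen = 0 }
      next
    }
    state == 1 {
      # a "+" line ends the (possibly wrapped) sequence
      if (substr($0, 1, 1) == "+") { state = (seqlen > 0) ? 2 : 0; next }
      seqlen += length($0)
      for (i = 1; i <= length($0); i++) {
        ch = substr($0, i, 1)
        if (ch ~ /[ACGT]/) hom++
        else if (ch ~ /[MRWSYK]/) het++
        else if (ch ~ /[acgtmrwsyk]/) low++
        else if (ch == "n" || ch == "N") nocall++
        else other++
      }
      next
    }
    state == 2 {
      # quality lines may start with "@", so count characters instead
      quallen += length($0)
      if (quallen >= seqlen) state = 0
      next
    }
    END {
      printf "hom_pass\t%d\n", hom
      printf "het_pass\t%d\n", het
      printf "low_quality\t%d\n", low
      printf "no_call\t%d\n", nocall
      printf "other\t%d\n", other
      if (hom + het > 0) printf "het_fraction\t%.4f\n", het / (hom + het)
    }
  ' "$@"
}

# Usage: summarise_consensus S_Hungarian-2.chr2.fq
```

`het_fraction` is simply heterozygous / (homozygous + heterozygous) among the upper-case (filter-passing) sites; it is a rough per-site heterozygosity, not a replacement for the downstream analysis.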
The consensus pipeline takes an aligned BAM file and a reference genome as input. `bcftools mpileup` summarizes the coverage of mapped reads on the reference sequence at single-base-pair resolution, `bcftools call -c` calls the consensus sequence from that pileup, and `vcfutils.pl vcf2fq` filters the consensus and converts it to FASTQ format. Some parameter explanations:
