Issue: error when FASTQ data is concatenated into the output/tmp reads files #7

@csittk

I tried running origami-alignment with both fastq.gz and fastq input files.
In its first step, origami-alignment decompresses and concatenates the FASTQ files into output/tmp/left_reads.fq and output/tmp/right_reads.fq. The cutadapt step that follows always fails with an error:

This is cutadapt 1.15 with Python 2.7.6
Command line parameters: -n 3 --overlap 10 -e 0 --discard-untrimmed -m 20 -a CTGCTGTCCG -A CTGCTGTCCG -o output/tmp/l_same_aa.fq -p output/tmp/r_same_aa.fq output/tmp/left_reads.fq output/tmp/right_reads.fq
Running on 1 core
Trimming 2 adapters with at most 0.0% errors in paired-end mode ...
cutadapt: error: In read named 'SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50': length of quality sequence (48) and length of read (50) do not match
This is cutadapt 1.15 with Python 2.7.6
Command line parameters: -n 3 --overlap 10 -e 0 --discard-untrimmed -m 20 -a CTGCTGTCAT -A CTGCTGTCAT -o output/tmp/l_same_bb.fq -p output/tmp/r_same_bb.fq output/tmp/left_reads.fq output/tmp/right_reads.fq
Running on 1 core
Trimming 2 adapters with at most 0.0% errors in paired-end mode ...
cutadapt: error: In read named 'SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50': length of quality sequence (48) and length of read (50) do not match
This is cutadapt 1.15 with Python 2.7.6
Command line parameters: -n 3 --overlap 10 -e 0 --discard-untrimmed -m 20 -a CTGCTGTCCG -A CTGCTGTCAT -o output/tmp/l_diff_ab.fq -p output/tmp/r_diff_ab.fq output/tmp/left_reads.fq output/tmp/right_reads.fq
Running on 1 core
Trimming 2 adapters with at most 0.0% errors in paired-end mode ...
cutadapt: error: In read named 'SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50': length of quality sequence (48) and length of read (50) do not match
This is cutadapt 1.15 with Python 2.7.6
Command line parameters: -n 3 --overlap 10 -e 0 --discard-untrimmed -m 20 -a CTGCTGTCAT -A CTGCTGTCCG -o output/tmp/l_diff_ba.fq -p output/tmp/r_diff_ba.fq output/tmp/left_reads.fq output/tmp/right_reads.fq
Running on 1 core
Trimming 2 adapters with at most 0.0% errors in paired-end mode ...
cutadapt: error: In read named 'SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50': length of quality sequence (48) and length of read (50) do not match
This is cutadapt 1.15 with Python 2.7.6
Command line parameters: -n 3 --overlap 10 -e 0 --discard-trimmed -m 20 -o output/tmp/l_neither.fq -p output/tmp/r_neither.fq SRR6010260_R1.fastq SRR6010260_R2.fastq

I checked the raw FASTQ files: for read 'SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50', the quality string and the sequence are both 50 characters long, as expected. In output/tmp/right_reads.fq, however, two characters are missing from the quality string, which causes the error.

Raw FASTQ data

@SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50
GTGCTGGGGTCCATGTGGGCCAGATGCCCTGGGCCCTGGGCAGGGCCAGG
+SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50
@d0@0<<<C/<D<C1<1<<1<@ghhc??CGC1E<@/1CGEFCDHHH@E??

output/tmp/right_reads.fq

@SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50
GTGCTGGGGTCCATGTGGGCCAGATGCCCTGGGCCCTGGGCAGGGCCAGG
+SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50
@d0@0<<<C/<D<C1<1<<1<@ghhc??CGC1E<@CGEFCDHHH@E??
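
To pinpoint what differs between the two quality lines above, a small comparison helps. This is a minimal sketch that assumes the damaged line differs from the raw one by a single contiguous deletion; the function name `missing_span` is made up for illustration and is not part of origami-alignment or cutadapt.

```python
# Quality lines copied from the two records shown above.
raw_qual = "@d0@0<<<C/<D<C1<1<<1<@ghhc??CGC1E<@/1CGEFCDHHH@E??"
tmp_qual = "@d0@0<<<C/<D<C1<1<<1<@ghhc??CGC1E<@CGEFCDHHH@E??"


def missing_span(before, after):
    """Return (0-based offset, deleted text), assuming `after` equals
    `before` with one contiguous run of characters removed."""
    # Length of the longest common prefix.
    start = 0
    while start < len(after) and before[start] == after[start]:
        start += 1
    # Length of the longest common suffix that does not overlap the prefix.
    end = 0
    while end < len(after) - start and before[-1 - end] == after[-1 - end]:
        end += 1
    return start, before[start:len(before) - end]


print(len(raw_qual), len(tmp_qual))      # 50 48
print(missing_span(raw_qual, tmp_qual))  # (35, '/1')
```

The two missing characters are `/1`, at positions 36 and 37 (1-based) of the quality string.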

It appears that the step that concatenates the FASTQ files removed the characters "/1" from the quality line of this read?
Could you please look into this? I am not sure how it affects the downstream alignment if the cutadapt step does not work properly.
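
To see how many reads are affected before cutadapt runs, the intermediate files can be scanned for records whose sequence and quality lengths differ. The following is a rough sketch, not part of origami-alignment; it assumes strict 4-line FASTQ records (no wrapped sequence lines), and the helper names `open_fastq` and `length_mismatches` are hypothetical.

```python
import gzip
import io


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def length_mismatches(handle):
    """Yield (read_name, seq_len, qual_len) for every 4-line record
    whose sequence and quality strings have different lengths."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            break
        if not header.strip():  # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\r\n")
        handle.readline()       # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        if len(seq) != len(qual):
            yield header[1:].rstrip("\r\n"), len(seq), len(qual)


# Demo on the damaged record from output/tmp/right_reads.fq shown above.
damaged = io.StringIO(
    "@SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50\n"
    "GTGCTGGGGTCCATGTGGGCCAGATGCCCTGGGCCCTGGGCAGGGCCAGG\n"
    "+SRR6010260.33 WIGTC-HISEQ:1:1212:1644:2156 length=50\n"
    "@d0@0<<<C/<D<C1<1<<1<@ghhc??CGC1E<@CGEFCDHHH@E??\n"
)
bad = list(length_mismatches(damaged))
print(bad)
```

On a real run, passing `open_fastq("output/tmp/right_reads.fq")` and `open_fastq("SRR6010260_R2.fastq")` to `length_mismatches` in turn would show whether the mismatches exist only in the intermediate file.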

Thanks.
