
Merged libraries do not show lower-count k-mers #30

@bnavarrodominguez

Dear Gene,

I have a large sequencing library that I needed to split into 10 smaller files so I could run FastK on different nodes. Following the instructions in the README, I ran FastK on the split files with the following command:

$ for file in library_split_*; do mkdir "tmp.${file}"; FastK -v -t5 -k31 -M50 -T24 -P"tmp.${file}" "$file"; done

This produced a .hist file and a .ktab file for each split FASTQ file. I then looked at the k-mer count histogram of each split file:

$ Histex -G library_split_01.hist > library_split_01.histogram
$ head library_split_01.histogram
1       6062202409
2       3370987439
3       1728287765
4       894614808
5       482057568

I then merged the split k-mer tables with Fastmerge and generated a histogram for the merged k-mer table:

$ Fastmerge -T12 -t -h library_fastmerged library_split_*ktab
$ Histex -G library_fastmerged.hist > library_fastmerged.histogram
$ head library_fastmerged.histogram
4       2032698049
5       522131235
6       134785342
7       33514609
8       420971175

I noticed that the merged histogram contains no k-mers with a count below 4. I repeated the process several times, merging different subsets of the split files, and the merged histograms consistently lack the low counts (they start at 4 or 5). I'm not sure whether this behavior is expected, since I don't understand why there are no single-occurrence k-mers. Is this a bug, or am I misunderstanding or misusing the tool?
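To check this systematically across splits and merges, a small shell sketch like the following can report the lowest count present in each histogram. It assumes the Histex -G output is two whitespace-separated columns (count, frequency) sorted by increasing count, as in the `head` output shown here; `min_count` is a hypothetical helper, not part of FastK.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FastK). Assumes a two-column Histex -G
# text file (count, frequency) sorted by increasing count.
# Prints the first count whose frequency is greater than zero.
min_count() {
  awk '$2 > 0 { print $1; exit }' "$1"
}

# Compare every split histogram with the merged one; missing files are skipped.
for h in library_split_*.histogram library_fastmerged.histogram; do
  [ -f "$h" ] || continue
  printf '%s\t%s\n' "$h" "$(min_count "$h")"
done
```

For the files above, this would list the lowest count of each split histogram next to that of library_fastmerged.histogram, making the gap in the merged table easy to spot.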

Thanks for your assistance!
