
Comparative Genomics Exercise 6: Taxonomic profile of metagenomic samples

Jaime Huerta-Cepas edited this page Dec 16, 2025 · 11 revisions

Taxonomic profile of metagenomic samples

Running mOTUs taxonomic profiler

Metagenomic samples are obtained by sequencing all the DNA present in a sample simultaneously, and they usually contain thousands of sequences from different microbial species. The first step in characterizing a metagenomic sample is usually to describe which species are present and at what abundance (i.e. to build a taxonomic profile of the sample).

mOTUs is a program widely used for measuring the abundance of different microbial species in metagenomic samples. Its developers have built a database of marker genes for more than 7700 microbial species; the software maps the reads from a sample against this database and counts the number of reads matching each marker gene of each species. The mOTUs output is a tab-separated file with the names of the screened species and their abundance in the sample. mOTUs can report either the relative abundance (the abundance of each species normalized by the total abundance in the sample) or the absolute abundance (the number of reads mapped to each reference species; use the -c flag when calling mOTUs). You can also change the taxonomic level of the screening with the -k flag (e.g. -k phylum will report the abundance of each phylum rather than of each species).
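To make the difference between the two abundance modes concrete, the sketch below converts read counts into relative abundances by dividing each count by the column total. The file `toy_counts.tsv`, the helper `to_relative`, and the three taxa are made up for illustration and are not mOTUs output; only the two-column, tab-separated layout (name, abundance) is taken from the description above.

```shell
# Toy two-column table (taxon <TAB> read count); values are invented for illustration.
printf 'species_A\t60\nspecies_B\t30\nspecies_C\t10\n' > toy_counts.tsv

# Hypothetical helper: reads the file twice.
# First pass sums the counts; second pass divides each count by that sum.
to_relative() {
    awk -F'\t' 'NR==FNR { total += $2; next }
                total > 0 { printf "%s\t%.4f\n", $1, $2 / total }' "$1" "$1"
}

to_relative toy_counts.tsv > toy_relative.tsv
cat toy_relative.tsv
```

After normalization the values in the second column sum to 1, so samples sequenced at different depths are put on the same scale.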

Taxonomic profiles are routinely used to compare samples of different origins and to find species enriched under particular conditions. For instance, they have been widely used to identify taxa enriched in samples from colorectal cancer (CRC) patients compared to samples from control patients. We will run mOTUs on three human gut samples: two from control patients and one from a CRC patient.

$ motus profile -s /home/compgenomics/metagenomics/data/ERR688359.fastq.gz -o CTR.motus -t 20
$ motus profile -s /home/compgenomics/metagenomics/data/ERR688435.fastq.gz -o CRC.motus -t 20
$ motus profile -s /home/compgenomics/metagenomics/data/ERR688360.fastq.gz -o CTR_1.motus -t 20

The mOTUs output tells us a lot about our samples. For instance, we can measure the alpha diversity (the microbial diversity within a sample). There are several alpha diversity measurements. The simplest one is the total number of species present in the sample, which we can count directly from the mOTUs output.

$ perl -F"\t" -lane 'print if $F[1]>0' CTR.motus | wc -l
  • How many species do you detect in each sample?
  • Which one is the most abundant?
  • Can we compare the taxonomic profiles of different samples directly from the mOTUs results?
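Species counts ignore how evenly the abundance is spread across species. As one example of another alpha diversity measurement, the sketch below computes the Shannon index, H = -sum(p_i * ln p_i), alongside the species count. It assumes the layout described earlier (tab-separated, name in column 1, abundance in column 2) and that header lines start with `#`; the file `toy_profile.tsv`, the helper `alpha_diversity`, and all values are invented for illustration. Rows that do not correspond to a species, if a profile contains any, would need to be filtered out first.

```shell
# Toy profile: one '#' header line plus three taxa; abundances are made up.
printf '#taxon\tsample\nspecies_A\t0.5\nspecies_B\t0.5\nspecies_C\t0\n' > toy_profile.tsv

# Hypothetical helper: prints richness (taxa with abundance above zero)
# and the Shannon index (natural log) for a two-column profile.
alpha_diversity() {
    awk -F'\t' '
        /^#/ { next }                           # skip header lines
        $2 > 0 { n++; total += $2; ab[n] = $2 } # keep detected taxa only
        END {
            # Normalize by the column total so counts and fractions both work.
            for (i = 1; i <= n; i++) { p = ab[i] / total; h -= p * log(p) }
            printf "richness\t%d\nshannon\t%.4f\n", n, h
        }' "$1"
}

alpha_diversity toy_profile.tsv
```

Two equally abundant species give H = ln 2; a sample dominated by a single species scores closer to 0 even when its species count is the same.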
