Tools for DNA metabarcoding data analysis

26 November 2013

Analyzing NGS data in a DNA metabarcoding context

DNA metabarcoding relies heavily on next-generation sequencing (NGS) and therefore requires the ability to handle very large sequence datasets. The OBITools package meets this requirement with a set of programs specifically designed for analyzing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking taxonomic annotation into account helps to set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys and diet analyses.

Barcode inference and testing

To be used as a DNA barcode, a genome locus should vary only slightly among individuals of the same species and differ markedly between species. From a practical point of view, a barcode locus should also be flanked by two conserved regions in which PCR primers can be designed.

/ecoPrimers/ is a program that finds such loci in a set of genomic sequences and evaluates their quality in terms of universality (taxonomic coverage) and taxonomic discrimination capacity (barcode specificity).
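The two quality criteria can be illustrated with a minimal Python sketch. It is not the /ecoPrimers/ implementation. It assumes that coverage is the fraction of target taxa that are amplified, and that specificity is the fraction of amplified taxa whose barcodes are shared with no other taxon. The function name `barcode_indices` and the toy data are hypothetical.

```python
from collections import defaultdict


def barcode_indices(amplified, all_taxa):
    """Return (coverage, specificity) for a primer pair.

    amplified: dict mapping a taxon name to the set of barcode sequences
               amplified for it (hypothetical in-silico PCR results).
    all_taxa:  set of all target taxa the primers are expected to amplify.
    """
    # Taxa for which at least one barcode was amplified.
    amplified_taxa = {t for t, seqs in amplified.items() if seqs and t in all_taxa}

    # Map each barcode sequence to the taxa that carry it.
    owners = defaultdict(set)
    for taxon in amplified_taxa:
        for seq in amplified[taxon]:
            owners[seq].add(taxon)

    # Assumption: a taxon is discriminated when none of its barcodes
    # is shared with another taxon.
    discriminated = {
        t for t in amplified_taxa
        if all(owners[seq] == {t} for seq in amplified[t])
    }

    coverage = len(amplified_taxa) / len(all_taxa) if all_taxa else 0.0
    specificity = len(discriminated) / len(amplified_taxa) if amplified_taxa else 0.0
    return coverage, specificity


# Toy example: four target taxa, three amplified, two sharing a barcode.
taxa = {"A", "B", "C", "D"}
results = {"A": {"ACGT"}, "B": {"ACGT"}, "C": {"TTGA"}}
coverage, specificity = barcode_indices(results, taxa)
print(f"coverage={coverage:.2f} specificity={specificity:.2f}")
```

In this toy dataset, taxon D is not amplified, which lowers coverage, and taxa A and B share the same barcode, which lowers specificity.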

/ecoPCR/ is an electronic PCR program. It helps to estimate the quality of barcode primers. /ecoPCR/ output can be post-processed with the /OBITools/ to compute barcode coverage and barcode specificity.
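The idea behind an electronic PCR can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the /ecoPCR/ algorithm: it scans only the given strand, tolerates a fixed number of mismatches per primer, and handles only the letters A, C, G and T. All function names and parameters are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count differing positions between two strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(template, primer, max_mismatches):
    """Return start positions where the primer matches within the mismatch limit."""
    n = len(primer)
    return [
        i for i in range(len(template) - n + 1)
        if mismatches(template[i:i + n], primer) <= max_mismatches
    ]


def in_silico_pcr(template, forward, reverse, max_mismatches=0,
                  min_len=0, max_len=1000):
    """Return the barcodes (primers excluded) amplified from the given strand."""
    # On the given strand, the reverse primer appears as its reverse complement.
    reverse_site = reverse_complement(reverse)
    amplicons = []
    for start in find_sites(template, forward, max_mismatches):
        barcode_start = start + len(forward)
        for end in find_sites(template, reverse_site, max_mismatches):
            # Keep only pairs where the reverse site lies downstream of the
            # forward primer and the barcode length is within bounds.
            if end >= barcode_start and min_len <= end - barcode_start <= max_len:
                amplicons.append(template[barcode_start:end])
    return amplicons


# Toy template: flanking bases, forward primer, barcode, reverse primer site.
forward, reverse = "ACGTAC", "GATTACA"
template = "GG" + forward + "TTTTCCCC" + reverse_complement(reverse) + "AA"
print(in_silico_pcr(template, forward, reverse))
```

Running such a search over a reference database and recording which taxa yield an amplicon gives the kind of data from which barcode coverage and specificity can be computed.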

Fast and exact comparison and clustering of sequences

/sumaclust/ and /sumatra/ compare sequences in a way that is both fast and exact. These tools were developed for the type of data generated by DNA metabarcoding, i.e. short DNA sequences. /sumaclust/ and /sumatra/ are designed to be used at two key steps in the analysis of metabarcoding data: the filtering of erroneous sequences (removal of PCR and sequencing errors) and the clustering of sequences into MOTUs (Molecular Operational Taxonomic Units).
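As an illustration of these two steps, the sketch below computes an exact pairwise identity and then groups sequences greedily around abundant centers. It is a minimal Python sketch, not the /sumatra/ or /sumaclust/ implementation. It assumes an identity based on the longest common subsequence (LCS) normalized by the longer sequence, and a simple abundance-ordered greedy clustering. The function names, the threshold and the read counts are hypothetical.

```python
def lcs_length(a, b):
    """Exact length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def identity(a, b):
    """LCS length divided by the length of the longer sequence (assumed measure)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def greedy_cluster(counts, threshold=0.97):
    """Cluster sequences around centers, most abundant sequences first.

    counts: dict mapping each unique sequence to its read count.
    Returns a dict mapping each cluster center to its member sequences.
    """
    # Abundant sequences are more likely to be genuine, so they seed clusters.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    clusters = {}
    for seq in ordered:
        for center in clusters:
            if identity(seq, center) >= threshold:
                clusters[center].append(seq)
                break
        else:
            # No existing center is similar enough: start a new cluster.
            clusters[seq] = [seq]
    return clusters


# Toy example: one abundant sequence, one likely error variant, one distinct sequence.
reads = {"ACGTACGTAC": 100, "ACGTACGTAA": 3, "TTTTGGGGCC": 50}
motus = greedy_cluster(reads, threshold=0.85)
for center, members in motus.items():
    print(center, members)
```

Here the rare variant differing by one base joins the cluster of the abundant sequence, as a PCR or sequencing error would be expected to, while the unrelated sequence forms its own MOTU.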

