The exponential growth of data on BOLD can make it challenging for experts and end-users to curate reference libraries. To address this challenge, partners of the Biodiversity Genomics Europe (BGE) project developed an automated pipeline and a manual curation tool.
BOLD Library Curation Pipeline
The BOLD Library Curation Pipeline was developed to automate the analysis of data on BOLD and to limit manual curation by experts to the cases where it is truly necessary.
The pipeline processes and curates BOLD public barcode vouchers and sequence data. It implements standardised quality assessment criteria developed by BGE to evaluate and rank DNA barcode sequences for library curation.
Key features of the pipeline are:
- Comprehensive Quality Assessment: Evaluates specimens against 16 standardised criteria, including metadata completeness, voucher information, sequence quality, and phylogenetic analyses
- Advanced Phylogenetic Analysis: Includes OTU clustering for genetic diversity assessment
- BAGS Species Assessment: Automated species-level quality grading system
- Geographic Representation: Country representative selection for balanced geographic sampling
- Scalable Architecture: Family-level database splitting for efficient analysis of large datasets
- FAIR Compliance: Built with reproducibility and provenance tracking using Snakemake workflows
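To make a few of the features above more concrete, here is a minimal Python sketch of criteria-based scoring, country representative selection, and family-level splitting. It is an illustration under stated assumptions, not the pipeline's actual code: the four checks, the 500 bp length threshold, the record field names, and the function names are all hypothetical and do not correspond to the 16 BGE criteria.

```python
from collections import defaultdict

# Hypothetical checks for illustration only; these are NOT the 16 BGE criteria.
CRITERIA = {
    "has_species_id": lambda r: bool(r.get("species")),
    "has_voucher": lambda r: bool(r.get("voucher_id")),
    "has_country": lambda r: bool(r.get("country")),
    # The 500 bp threshold is an assumed value, not taken from the pipeline.
    "min_length": lambda r: len(r.get("sequence") or "") >= 500,
}


def score(record):
    """Count how many of the illustrative checks a record passes."""
    return sum(1 for check in CRITERIA.values() if check(record))


def country_representatives(records):
    """Keep the highest-scoring record for each (species, country) pair."""
    best = {}
    for rec in records:
        species, country = rec.get("species"), rec.get("country")
        if not species or not country:
            continue  # records without both fields cannot represent a country
        key = (species, country)
        if key not in best or score(rec) > score(best[key]):
            best[key] = rec
    return best


def split_by_family(records):
    """Group records by family so each group can be analysed separately."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get("family") or "unknown"].append(rec)
    return dict(groups)
```

In this sketch a record with a species name, voucher, country, and a 650 bp sequence would outrank a record from the same species and country that lacks a voucher and has a shorter sequence, so it would be kept as the country representative.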
Library Curation Tool
The Library Curation Tool is a decentralised browser interface designed to support taxonomic experts in building a curated DNA barcode reference library for European animal species. It enables manual review and validation of BOLD records and metadata that the automated bioinformatics pipeline has pre-processed. The curation datasets are available here, and a video tutorial on how to use the tool is available here.
The pipeline and the tool were presented at the July Community Meeting by Ben Price (Natural History Museum, UK) and Stephan Kuehbander (SNSB), respectively.
If you have any comments or suggestions about either the tool or the pipeline (or both), please send us an email – we will get back to you as soon as possible!
Photo by Nana Smirnova on Unsplash.