The taxon hypothesis paradigm: On the unambiguous detection and communication of taxa

Urmas Kõljalg*, Henrik R. Nilsson, Dmitry Schigel, Leho Tedersoo, Karl Henrik Larsson, Tom W. May, Andy F.S. Taylor, Thomas Stjernegaard Jeppesen, Tobias Guldberg Frøslev, Björn D. Lindahl, Kadri Põldmaa, Irja Saar, Ave Suija, Anton Savchenko, Iryna Yatsiuk, Kristjan Adojaan, Filipp Ivanov, Timo Piirmann, Raivo Pöhönen, Allan Zirk, Kessy Abarenkov

*Corresponding author for this work

Research output: Contribution to journal (article, peer-reviewed)

107 Citations (Scopus)

Abstract

Here, we describe the taxon hypothesis (TH) paradigm, which covers the construction, identification, and communication of taxa as datasets. Defining taxa as datasets of individuals and their traits will make taxon identification and, most importantly, communication of taxa precise and reproducible. This will allow datasets with standardized and atomized traits to be used digitally in identification pipelines and communicated through persistent identifiers. Such datasets are particularly useful in the context of formally undescribed or even physically undiscovered species if data such as sequences from samples of environmental DNA (eDNA) are available. Implementing the TH paradigm will, to some extent, remove the impediment to hastily discover and formally describe all extant species, in that the TH paradigm allows discovery and communication of new species and other taxa even in the absence of formal descriptions. The TH datasets can be connected to a taxonomic backbone, providing access to the vast information associated with the tree of life. In parallel to the description of the TH paradigm, we demonstrate how it is implemented in the UNITE digital taxon communication system. UNITE TH datasets include rich data on individuals and their rDNA ITS sequences. These datasets are equipped with digital object identifiers (DOIs) that serve to fix their identity in our communication. All datasets are also connected to a GBIF taxonomic backbone. Researchers processing their eDNA samples using UNITE datasets will thus be able to publish their findings as taxon occurrences in the GBIF data portal. UNITE species hypothesis (species-level TH) datasets are increasingly utilized in taxon identification pipelines, and even formally undescribed species can be identified and communicated by using UNITE. The TH paradigm seeks to achieve unambiguous, unique, and traceable communication of taxa and their properties at any level of the tree of life. It offers a rapid way to discover and communicate undescribed species in identification pipelines and data portals before they are lost to the sixth mass extinction.

Original language: English
Article number: 1910
Number of pages: 24
Journal: Microorganisms
Volume: 8
Issue number: 12
Publication status: Published - Dec 2020

Bibliographical note

We thank the UNITE Community for collecting, identifying, and DNA sequencing specimens and other biological samples, and INSDC, and especially NCBI, for open data services which included DNA sequences, taxon names, and associated properties. We are much obliged to those who did the hard work of generating and uploading rDNA ITS sequences into INSDC databases. Without your effort the TH paradigm would never have existed in its current form. We thank Markus Döring (GBIF/Catalogue of Life) for his work on the integration of the Species Hypotheses and Barcode Index Numbers in the GBIF Taxonomic Backbone. We thank Donald Hobern and Tony Kuo (International Barcode of Life Consortium) for valuable discussions about linking Barcode Index Numbers to the GBIF Taxonomic Backbone. We thank Thomas Pape (Natural History Museum of Denmark, University of Copenhagen) for consultation on registration of zoological names.

Keywords

  • Biodiversity informatics
  • Discovery of species
  • DNA taxonomy
  • Metabarcoding
  • Microbial species
  • Species hypotheses
  • Taxon hypotheses
  • Taxonomy
