Recent metagenomic studies have revealed that marine plankton are far more diverse than previously thought (Carradec et al. 2018, Salazar et al. 2019, Duarte et al. 2020), with hundreds of thousands of genetically distinct taxa and more than 116 million genes documented for eukaryotic plankton and 47 million genes for prokaryotes. However, the taxonomy, the function, or both remain unknown for more than half of the planktonic ‘omic’ sequences. This unprecedented volume of data on planktonic communities calls for innovative, data-driven approaches to quantify and observe the biogeographic importance of these communities (Faure et al. 2021).
Marine plankton play a fundamental role in global biogeochemical cycles and marine food webs. They are also sentinels of environmental change. Gathering more information about their genomics can help us better describe plankton distributions at the global scale and further understand their response to environmental change.
The Blue-Cloud demonstrator Plankton Genomics responds to this challenge by mining the rich metagenomic and metatranscriptomic data collected during the Tara Oceans mission and combining them with in situ or climatological environmental information to infer the function, taxonomy and distribution of the large share of sequences that remain unknown. This article explores the main results of the demonstrator and its intended evolution.
The demonstrator is led by the European Bioinformatics Institute (EMBL-EBI) and created by the Faculty of Sciences at Sorbonne University.