Biogenomics

About

The European Nucleotide Archive (ENA) provides a comprehensive open record of the world's nucleotide sequencing information and a platform for the management and analysis of sequence and related data. It covers raw sequencing data, sequence assembly information, functional annotation and a host of further data types; its content is measured in millions of taxa, hundreds of thousands of sequenced libraries and petabytes of storage. ENA is operated by the EMBL European Bioinformatics Institute (EMBL-EBI) and is designated by the ELIXIR infrastructure both as a Core Data Resource and as a Deposition Database.

ENA’s portfolio of services includes user support (helpdesk, training), websites (data submissions, a browser with search, explore and download functions), RESTful interfaces (data submissions, data discovery, metadata interrogation) and a host of downloadable utilities to support data submissions and access. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), ENA drives international standards and best practice in its domain.

EMBL-EBI operates APIs for ENA discovery and ENA data retrieval, which appear to be suitable endpoints for connecting to the Blue-Cloud data discovery and access service. The ENA system contains many data types and classes and a huge volume of data, only part of which is marine related. Blue-Cloud should therefore focus on data and information relevant to the marine domain, and on data types such as samples and their analyses. In addition, the ENA system offers several algorithms and pipelines for processing data, which might be used in a ‘smart’ way for Blue-Cloud.

Type & number of data sets

ENA covers many data types in a number of interlinked database tables. A list can be found at https://www.ebi.ac.uk/ena/portal/api/results?dataPortal=ena
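As an illustration, the list of data types can also be read programmatically from the URL above. The following is a minimal Python sketch, not an official client: the helper names (results_url, parse_tsv, list_result_types) are hypothetical, and it assumes the endpoint returns tab-separated text with a header row, which should be checked against the ENA Portal API documentation.

```python
import csv
import io
import urllib.parse
import urllib.request

# Base URL taken from the link given in the text.
PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def results_url(data_portal="ena"):
    """Build the URL that lists the available result (data) types."""
    query = urllib.parse.urlencode({"dataPortal": data_portal})
    return f"{PORTAL_API}/results?{query}"


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts.

    Assumption: the response is TSV with one header line. Column names
    are taken from that header and are not hard-coded here.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def list_result_types(data_portal="ena", timeout=30):
    """Fetch and parse the list of result types (requires network access)."""
    with urllib.request.urlopen(results_url(data_portal), timeout=timeout) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

Calling list_result_types() returns one dictionary per data type, keyed by whatever column names the service sends, so the sketch does not depend on a particular set of columns.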

Data can be retrieved through RESTful services in several formats, with straightforward file download options: EMBL flat file format, FASTA format for sequences, and XML format. Details about the formats: https://ena-browser-docs.readthedocs.io/en/latest/browser/search/advanced.html#downloadena-records
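A record download in one of these formats could be sketched as follows. This is a hedged example rather than a definitive implementation: it assumes the ENA Browser API exposes records under the path pattern https://www.ebi.ac.uk/ena/browser/api/{format}/{accession}, which is not stated in this text and should be verified against the ENA documentation. The function names and the accession "XX000000" are placeholders.

```python
import urllib.parse
import urllib.request

# Assumption: base URL of the ENA Browser API (not given in the text).
BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# The three formats named in the text, as assumed URL path segments.
SUPPORTED_FORMATS = ("embl", "fasta", "xml")


def record_url(accession, fmt="fasta"):
    """Build the download URL for one record in the requested format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Percent-encode the accession so it is safe as a single path segment.
    return f"{BROWSER_API}/{fmt}/{urllib.parse.quote(accession, safe='')}"


def fetch_record(accession, fmt="fasta", timeout=30):
    """Download one record as text (requires network access)."""
    with urllib.request.urlopen(record_url(accession, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, fetch_record("XX000000", "xml") would request the XML form of the placeholder accession; a real accession has to be substituted before the call returns anything useful.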

Core Services

The ENA browser brings together a set of services via web interfaces, built upon underlying APIs. Two services are of relevance for Blue-Cloud: ENA discovery and ENA data retrieval.

Instructions for using the APIs and building machine-to-machine services can be found in the documentation of the ENA Portal API: https://www.ebi.ac.uk/ena/portal/api/doc
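A machine-to-machine discovery request might look like the sketch below. It assumes the Portal API offers a search endpoint that accepts result, query, fields, format and limit parameters; the function name search_url and the example query and field names are illustrative only, and the valid values should be taken from the documentation linked above.

```python
import urllib.parse

# Base URL of the ENA Portal API, as used in the links in this text.
PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def search_url(result, query, fields, limit=10, fmt="tsv"):
    """Build a search URL for one result type.

    Assumption: the endpoint is /search and understands the parameters
    below; check names and allowed values in the Portal API documentation.
    """
    params = {
        "result": result,            # data type to search, e.g. "sample"
        "query": query,              # filter expression in ENA query syntax
        "fields": ",".join(fields),  # columns to return
        "format": fmt,               # response format
        "limit": limit,              # maximum number of records
    }
    return f"{PORTAL_API}/search?{urllib.parse.urlencode(params)}"


# Illustrative call: the query and the field names are placeholders.
example = search_url("sample", 'country="Norway"', ["accession", "country"], limit=5)
```

Building the URL separately from the HTTP call keeps the request easy to log and test; the resulting URL can be passed to any HTTP client, and restricting result to samples matches the marine-focused selection that Blue-Cloud is interested in.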

Figure 1: Illustration of growth of selected data types since March 2016


Blue-Cloud partners involved