
Blue-Cloud in the standardisation ecosystem

Standards are at the core of the EU single market, as they facilitate the adoption, collection, transfer and reuse of data. Although standards bring considerable benefits to open science, they are still not widely adopted, largely because awareness of their advantages is lacking.
The European Commission addresses this issue in its new EU Strategy on Standardisation. One of the most critical ‘standardisation urgencies’ highlighted in the strategy is the gap in the adoption of standards for data interoperability, data sharing and data re-use in support of the Common European Data Spaces.
 

Blue-Cloud contributions to standards

The Blue-Cloud’s Data Discovery and Access Service (DD&AS), Virtual Research Environments (VREs) and related services and catalogues are standards-based interoperable infrastructures. The Blue-Cloud’s effort on interoperability and harmonisation via global standardisation can be assessed and reproduced by other research communities too. Indeed, the adoption of standards allows the DD&AS and VRE to exchange structured data across several information systems.

 

Blue-Cloud DD&AS

The Data Discovery and Access Service facilitates the discovery and retrieval of data sets for external and internal Blue-Cloud users. The data sets comprise measurement data and derived data products managed in Blue Data Infrastructures (BDIs), which interact machine-to-machine with the DD&AS to support federated discovery and access.
The BDIs currently federated in the Blue-Cloud DD&AS are:

  • SeaDataNet CDI data service
  • SeaDataNet data products service
  • EMODnet Chemistry data products service
  • EurOBIS collections service
  • Euro-Argo and Argo GDAC data service
  • ELIXIR - European Nucleotide Archive (ENA) data service
  • EcoTaxa image service
  • ICOS Marine data service
  • SOCAT - Surface Ocean CO2 Atlas service   

Together, these BDIs manage more than 10 million data sets and data products for physics, biology, geology, chemistry, bathymetry, genomics, biodiversity, and geophysics, which can be queried and accessed through the common DD&AS interface. The query mechanism follows a two-step approach:

  • At collection level, a common metadata profile following the ISO 19115 and ISO 19139 metadata standards is generated for each federated BDI using the DAB brokerage service. This allows users to identify relevant data collections using free-text, spatial and temporal search criteria on a common catalogue covering all federated BDIs.
  • In the next step, users can drill down to individual data sets at granule level within a selected BDI, adding search criteria specific to that BDI. Finally, users can compose and submit shopping requests for the associated data sets, which can then be downloaded from their MyBlueCloud dashboard. A shopping basket can hold data sets from multiple BDIs. This approach also makes it possible to add more BDIs and is suited to adding semantic brokering.

To illustrate the principle: the SeaDataNet CDI service manages circa 800 data collections at the first level, which resolve to circa 2.6 million individual data sets at the second level.
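The two-step query described above can be sketched in a few lines of Python. This is only an illustrative model, not the DD&AS implementation: the class names, field names, sample records and helper functions are all hypothetical, and the data lives in memory rather than in a brokered catalogue.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Collection:
    """First-level record: one entry in the common catalogue (hypothetical model)."""
    bdi: str
    title: str
    bbox: tuple  # (west, south, east, north) in degrees
    start: date
    end: date


@dataclass
class Granule:
    """Second-level record: one individual data set held by a BDI (hypothetical model)."""
    bdi: str
    collection: str
    granule_id: str
    attributes: dict = field(default_factory=dict)


def bbox_overlaps(a, b):
    """True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_collections(catalogue, text="", bbox=None, period=None):
    """Step 1: free-text, spatial and temporal search across all BDIs."""
    hits = []
    for c in catalogue:
        if text and text.lower() not in c.title.lower():
            continue
        if bbox and not bbox_overlaps(c.bbox, bbox):
            continue
        if period and (c.end < period[0] or c.start > period[1]):
            continue
        hits.append(c)
    return hits


def search_granules(granules, collection, **criteria):
    """Step 2: drill down within one BDI using BDI-specific criteria."""
    return [
        g for g in granules
        if g.bdi == collection.bdi
        and g.collection == collection.title
        and all(g.attributes.get(k) == v for k, v in criteria.items())
    ]


# Invented sample records, for illustration only.
catalogue = [
    Collection("SeaDataNet CDI", "Temperature profiles", (-10, 35, 5, 45),
               date(2000, 1, 1), date(2010, 12, 31)),
    Collection("EurOBIS", "Plankton occurrences", (0, 50, 10, 60),
               date(2005, 1, 1), date(2015, 12, 31)),
]
granules = [
    Granule("SeaDataNet CDI", "Temperature profiles", "cdi-001", {"instrument": "CTD"}),
    Granule("SeaDataNet CDI", "Temperature profiles", "cdi-002", {"instrument": "XBT"}),
    Granule("EurOBIS", "Plankton occurrences", "obis-001", {"taxon": "Copepoda"}),
]

# Step 1: collection-level search on the common catalogue.
step1 = search_collections(catalogue, text="temperature", bbox=(-5, 40, 0, 42))
# Step 2: granule-level search with a criterion specific to the selected BDI.
step2 = search_granules(granules, step1[0], instrument="CTD")

# A shopping basket may mix data sets from several BDIs.
basket = [g.granule_id for g in step2]
basket += [g.granule_id for g in search_granules(granules, catalogue[1], taxon="Copepoda")]
print(basket)
```

Keeping the collection-level criteria generic and the granule-level criteria open-ended (`**criteria`) mirrors the idea that further BDIs can be added without changing the first step.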

 

Blue-Cloud VRE

The Data Analytics Framework includes the services required to execute the analytics methods and processes provided by scientists. It transparently uses the power of the underlying distributed computing cloud and a wide range of standard statistical methods, available out of the box, to perform analytical computations. In this way, the Blue-Cloud VRE acts as a modern, easy-to-use computational platform tailored to scientific data analysis, and it turns any scientific result into a reusable, repeatable and shareable product that can be verified, analysed and compared with previously generated results. This is achieved through the automatic generation of provenance information, which is documented in the Blue-Cloud Catalogue in a standard way. This is a driving force that promotes open science and the adoption of standards.
The Blue-Cloud VRE is based on the OAuth2 standard for authentication and authorisation, the same standard used by the D4Science infrastructure.
Standardisation approaches are also implemented in the development and operation of the Blue-Cloud VLabs.
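As a rough illustration of what relying on OAuth2 means for a client, the sketch below builds (but does not send) a token request using the client-credentials grant defined in RFC 6749. The text does not say which grant type the VRE uses, so the grant type is an assumption, and the endpoint URL, client identifier and secret are placeholders, not real Blue-Cloud or D4Science values.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, scope=None):
    """Build an OAuth2 client-credentials token request without sending it."""
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    body = urllib.parse.urlencode(form).encode("ascii")
    # The client authenticates with HTTP Basic authentication.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(token_url, data=body, headers=headers, method="POST")


def bearer_header(access_token):
    """Header to attach to later API calls once a token has been issued."""
    return {"Authorization": f"Bearer {access_token}"}


# Placeholder endpoint and credentials, for illustration only.
req = build_token_request("https://auth.example.org/token", "my-client", "my-secret")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Because the request shape is fixed by the standard, the same client code could in principle talk to any compliant authorisation server by changing only the endpoint and credentials.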

 

Blue-Cloud VLabs: contributions to standardisation

VLab

Contribution to standardisation

Zoo and Phytoplankton EOV products

Data products in this VLab are created using reusable notebooks and scripts in R, Python and Julia, which ensures the reproducibility of the products.
This opens an opportunity to develop new standardised computational environments (e.g. containerised), where the input data and scripts can be stored together.

Plankton Genomics

The VLab consists of a notebook together with its functions, dependencies and input data. The notebook and functions are bundled together through pre-installed dependencies in D4Science.

Marine Environmental Indicators

The data are consolidated and improved using standards for data access, data delivery, data publication, and data/metadata formats.

Global Record of Stocks and Fisheries

This VLab combines semantic technologies for a global knowledge base with geospatial technologies to manage fisheries data:

  • The core facilities of the Global Record of Stocks and Fisheries (GRSF) knowledge base collect and semantically integrate data using the ISO 21127:2014 standard and the CIDOC-CRM ontology, as well as its family of extensions (e.g. MarineTLO).
  • The Fisheries Atlas is a comprehensive solution based on standards for metadata, especially in the geospatial domain. These include: Dublin Core metadata terms (as the main metadata model in geoflow), ISO 19115-19119/19139, ISO 19119/19139, and the OGC’s WMS, WFS, CSW and WPS. These are leveraged through IT middleware (R packages and scripts) to run the workflows. The results of the data harmonisation and standardisation are accessible through the metadata-driven OpenFairViewer and other public protocols for data access.

Aquaculture Monitor

This VLab follows the approach of the Fisheries Atlas and thus builds on KER1 resources, combined with a solid metadata-driven framework for geospatial data. In addition, it adds CLS domain-specific analytics for remote sensing data: one Jupyter notebook for detecting aquaculture cages based on S1 and another for land-type classification based on S2.
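
The OGC services named for the Fisheries Atlas (WMS, WFS, CSW, WPS) all accept key-value-pair requests over HTTP, which is part of what makes standards-based data access easy to script. The sketch below only builds request URLs and sends nothing. The base URL and the layer name `atlas:catches` are hypothetical, and the `application/json` output format is an assumption that depends on the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

OGC_SERVICES = {"WMS", "WFS", "CSW", "WPS"}


def capabilities_url(base_url, service):
    """Return a GetCapabilities URL for one of the four OGC services."""
    if service not in OGC_SERVICES:
        raise ValueError(f"unsupported OGC service: {service}")
    return base_url + "?" + urlencode({"service": service, "request": "GetCapabilities"})


def wfs_getfeature_url(base_url, type_name, count=10, output_format="application/json"):
    """Return a WFS 2.0.0 GetFeature URL for a single feature type."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,  # WFS 2.0.0 parameter name
        "count": count,          # maximum number of features to return
        "outputFormat": output_format,  # assumed to be supported by the server
    }
    return base_url + "?" + urlencode(params)


BASE = "https://geoserver.example.org/ows"  # hypothetical endpoint

caps = capabilities_url(BASE, "CSW")
features = wfs_getfeature_url(BASE, "atlas:catches", count=5)
print(caps)
print(features)
print(parse_qs(urlsplit(features).query)["typeNames"])
```

A client would typically fetch the capabilities document first to discover which layers, feature types and output formats a given server actually offers.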