Experts from the partners involved provided information and examples about the technical and conceptual issues they are tackling thanks to the Blue-Cloud framework.
This demonstrator will compile and process several existing data resources available under different European marine networks, data systems and research infrastructures, and apply big data analysis and machine learning (e.g. neural networks) methods to create fit-for-purpose data products, using an open, transparent methodology that runs in Blue-Cloud. It will focus on biological Essential Ocean Variables (EOVs) related to plankton.
A method for creating gridded climatologies from zooplankton observations was also presented, developed by the GHER group at the University of Liège using the variational inverse method combined with a neural network.
“It was very motivating to see the interaction of the participants. There were some very interesting questions, and with the results of the survey we can understand better the needs of the community and how to progress our work in the demonstrator. Also, the fact that 61% of participants are interested in re-using the models developed with their own data shows the great potential of Blue-Cloud as a collaborative research environment to share data and re-use workflows”. Patricia Cabrera (VLIZ)
The webinar welcomed more than 100 participants, with a solid presence from France, Italy and other European countries, but also from Africa, Asia, and the Americas. More than 70% of the attendees came from an academic/research background, which was also reflected in the types of questions discussed in the highly interactive Q&A session. We have collected some of the most interesting questions and answers for the benefit of those who could not join the webinar.
What is the weight of each input variable in the prediction of Chla?
Renosh P.R. (LOV): We have not yet computed the sensitivity of the model to each input variable.
How do you validate HPLC pigment data in 3D?
Renosh P.R. (LOV): We have a database that gathers reference HPLC pigment profiles from all over the ocean. We first estimate Chl from the diagnostic pigments using the Uitz et al. (2006) equations. We then extract the matchups of remote sensing reflectance, PAR, SLA and physical data corresponding to these stations and use them as inputs to our model to derive a vertical profile of Chl. Finally, the Chl retrieved from our model is compared to the HPLC-derived Chl.
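The first step mentioned above, estimating Chl from HPLC diagnostic pigments, boils down to a weighted sum of pigment concentrations. As a rough illustration (not the demonstrator's code), the sketch below uses the seven diagnostic pigments and the weighting coefficients commonly cited from Uitz et al. (2006); verify the coefficients against the original paper before any real use.

```python
# Illustrative sketch: weighted sum of HPLC diagnostic pigment
# concentrations (mg m^-3), following the diagnostic-pigment approach
# of Uitz et al. (2006). Weights are the commonly cited coefficients
# and should be checked against the original paper.

DP_WEIGHTS = {
    "fucoxanthin": 1.41,
    "peridinin": 1.41,
    "hex_fucoxanthin": 1.27,  # 19'-hexanoyloxyfucoxanthin
    "but_fucoxanthin": 0.35,  # 19'-butanoyloxyfucoxanthin
    "alloxanthin": 0.60,
    "total_chl_b": 1.01,
    "zeaxanthin": 0.86,
}

def dp_weighted_sum(pigments_mg_m3):
    """Weighted sum of diagnostic pigment concentrations (mg m^-3)."""
    return sum(DP_WEIGHTS[name] * pigments_mg_m3.get(name, 0.0)
               for name in DP_WEIGHTS)

# Hypothetical HPLC sample (made-up concentrations, mg m^-3):
sample = {"fucoxanthin": 0.20, "zeaxanthin": 0.05, "total_chl_b": 0.10}
print(round(dp_weighted_sum(sample), 4))  # → 0.426
```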
Is there any specific reason why you are following the Chl conc deeper than say 200m? Would your neural net output be different if you used only the information higher than a specific depth - say 200m?
Renosh P.R. (LOV): In the global ocean, some profiles in well-mixed waters have a mixed layer depth (MLD) greater than 200 m; within the mixed layer, phytoplankton biomass is homogeneous and still significant. In some cases the MLD can exceed 1000 m. That is why it is important to retrieve Chl below 200 m.
Why do you calculate Chla below 300 m depth, i.e. are there significant differences between 300 and 1000m depth?
Renosh P.R. (LOV): Yes, there are differences for some regions/seasons. In the global ocean, some profiles in well-mixed waters have a mixed layer depth (MLD) greater than 200 m; within the mixed layer, phytoplankton biomass is homogeneous and still significant. In some cases the MLD can exceed 1000 m.
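For readers unfamiliar with the MLD concept used in these answers: a common way to diagnose it is a density-threshold criterion (e.g. the depth where potential density exceeds a near-surface reference value by about 0.03 kg m⁻³, as in de Boyer Montégut et al., 2004). The sketch below is a minimal illustration with made-up profile values, not the demonstrator's actual method.

```python
# Minimal sketch: mixed layer depth (MLD) from a potential-density
# profile via a threshold criterion (density increase of 0.03 kg/m3
# relative to a ~10 m reference depth). Made-up values; illustrative only.

def mixed_layer_depth(depths_m, sigma, ref_depth=10.0, threshold=0.03):
    """First depth where density exceeds the reference density + threshold."""
    # density at the reference level (first level at or below ref_depth)
    sigma_ref = next(s for z, s in zip(depths_m, sigma) if z >= ref_depth)
    for z, s in zip(depths_m, sigma):
        if z > ref_depth and s > sigma_ref + threshold:
            return z
    return depths_m[-1]  # well-mixed down to the bottom of the profile

# Made-up winter-like profile: nearly homogeneous density down to ~250 m
depths = [5, 10, 50, 100, 200, 250, 300, 500]
sigma  = [26.50, 26.50, 26.51, 26.51, 26.52, 26.52, 26.60, 26.80]
print(mixed_layer_depth(depths, sigma))  # → 300 (an MLD deeper than 200 m)
```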
What aspects of the Blue-Cloud approach support your work?
Gert Everaert (VLIZ): Having the Virtual Research Environment available drastically decreased our computational time needed to calibrate the model. So, for people not having access to high-performance computing, this really helps.
The model feeds on worldwide data. Is it possible to use the same method on a more limited dataset, and how?
Renosh P.R. (LOV): This method is designed only for open ocean waters (we used a bathymetric mask of 1500 m depth). If you want to compare your data with the estimates from our model, we recommend making sure your pigment stations are from open ocean waters.
How could someone that has a time series of e.g. phytoplankton collaborate with Blue-Cloud?
Patricia Cabrera (VLIZ): By providing your data through one of the pertinent Blue Data infrastructures, such as EMODnet Biology/EurOBIS. This data can then be accessed from the Blue-Cloud VRE, and the methodologies developed in this demonstrator can be applied to other datasets.
Is data exclusively from LifeWatch or did you combine with other data sources?
Gert Everaert (VLIZ): For now only LifeWatch, but our aim is to also work with the data products of the other partners that are currently being developed.
I have pigment data from the Bay of Bengal region, but it is near the coast; is it possible to compare it with remote sensing?
Renosh P.R. (LOV): This method is designed only for open ocean waters (we used a bathymetric mask of 1500 m depth). If you want to compare your data with the estimates from our model, we recommend making sure your pigment stations are from open ocean waters.
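If you have station coordinates and water depths from a bathymetry product (e.g. GEBCO), a quick open-ocean screen along the lines of the 1500 m mask described above could look like the following hypothetical sketch (station names and depths are made up):

```python
# Hypothetical sketch: keep only stations whose local water depth
# exceeds 1500 m, mimicking the open-ocean bathymetric mask described
# in the answer above. Depths are made-up values as would be looked up
# from a bathymetry product (e.g. GEBCO); positive = depth in metres.

MIN_DEPTH_M = 1500.0  # open-ocean cut-off mentioned in the answer

def open_ocean_stations(stations):
    """Filter (name, lat, lon, depth_m) tuples to open-ocean ones."""
    return [s for s in stations if s[3] > MIN_DEPTH_M]

stations = [
    ("BoB-coastal", 17.5, 83.5, 80.0),     # near-shore: excluded
    ("BoB-offshore", 12.0, 88.0, 3200.0),  # deep water: kept
    ("Arabian-Sea", 15.0, 65.0, 3800.0),   # deep water: kept
]
print([s[0] for s in open_ocean_stations(stations)])
# → ['BoB-offshore', 'Arabian-Sea']
```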
Blue-Cloud Virtual Research Environments
How can the "VRE" solution presented here facilitate collaboration (with other people with programming skills, but also with colleagues without such skills)? Are the resulting "workflows" publicly accessible and reusable?
Patricia Cabrera (VLIZ): Yes, the workflows will be publicly available to users in the Blue-Cloud Virtual Lab in the near future. The scripts are written to be reusable, and the Blue-Cloud VRE allows users to access the data and to (re)use the methodologies. Users without technical knowledge will be able to change input data/parameters and run the analyses. The scripts are documented and explained, so non-technical users will also be able to learn and see the different steps of the analysis in progress.
Here "workflow" means the content of a Jupyter notebook, if I'm not wrong. Considering dependencies, how are all dependencies "declared", and what is behind the "full snapshot of the dependency tree"? Did you use a dependency manager and/or containers to achieve this?
Alex Barth (ULiege): In Julia, the full dependency tree (i.e. direct dependencies and dependencies of dependencies) is saved in a manifest file (Manifest.toml). The Julia package manager can instantiate from such a manifest file to replicate an exact environment (with all used Julia packages, including all libraries). The instantiation of an environment is automatic if you use Jupyter notebooks. More information is available at https://julialang.github.io/Pkg.jl/v1/toml-files/#Manifest.toml
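In practice, the workflow described here amounts to shipping the Project.toml and Manifest.toml files alongside the notebook and letting the package manager recreate the pinned environment. The snippet below is a generic illustration of that step, not the demonstrator's actual setup:

```julia
# Generic illustration (not the demonstrator's environment):
# reproduce a Julia environment pinned by Project.toml + Manifest.toml.
using Pkg
Pkg.activate(".")   # use the project in the current directory
Pkg.instantiate()   # install the exact versions recorded in Manifest.toml
```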
Considering workflow accessibility, does the Blue-Cloud Virtual Lab allow exporting workflows using standards like CWL, RO-Crate or others? Or are all workflows Jupyter-notebook oriented, and thus exportable as .ipynb or similar?
Patricia Cabrera (VLIZ): In this demonstrator, we will have notebook-oriented workflows: Jupyter notebooks (in Python/Julia) or R Markdown files.
Considering "scripts", is there any review of the accessible scripts? Are there "good practices" or an SDK, or a dedicated community/helpdesk for contributing to the Blue-Cloud Virtual Labs?
Patricia Cabrera (VLIZ): The scripts will be open and accessible soon, so anyone can review and adapt the script, or suggest improvements.
During the webinar, our experts asked a series of questions to the audience, providing an interesting overview of the audience's composition and of researchers' expectations from Blue-Cloud. Please navigate the infographic below to see the results.
On Thursday 3 June, Blue-Cloud is co-hosting a joint workshop with the All-Atlantic Ocean Research Alliance (AANChOR), the G7 Future of the Seas and Oceans Initiative, and the two Horizon 2020 projects iAtlantic and AtlantECO, as a side event at All-Atlantic 2021.
The "Plankton Genomics: Multidisciplinary data mining to assess plankton distributions" webinar took place on 23 April 2021. It welcomed more than 70 participants and gave the opportunity to learn more about the Blue-Cloud Plankton Genomics demonstrator.
Read the highlights and watch the webinar recording!