The ocean is a difficult ecosystem to observe: it is constantly evolving and not easily accessible, especially in the deep sea. Observing it requires several complementary systems, ranging from satellites to underwater vehicles, buoys and many others.
As satellites can only observe the first few millimeters of the ocean surface, it is also necessary to deploy buoys, which collect in situ data and can sample deep into the water column. However, most buoys are fixed, while gliders, which can travel underwater at great depths, are quite expensive. Research cruises also collect data, but they are costly, data can be lost, and continuity between cruises cannot always be ensured. With all this in mind, aggregating, processing, inter-comparing and combining all past, present and future data is a necessity. Of course, this process can consume considerable time and storage space, and analysing one data set at a time might cause researchers to lose the 'big picture' (i.e. the ecosystem) or to miss the specific trends affecting their area of study.
The Blue-Cloud added value
The use of Big Data technologies will pave the way for new solutions. Blue-Cloud will compile and process several data resources currently available under different European marine networks, data centers, and research infrastructures, applying big data analysis and machine learning methods in order to obtain FAIR data products (Findable, Accessible, Interoperable, Reusable). These methods include exploring ways of adopting cloud storage, cloud computing, deep learning and neural networks for supporting big data processes for validation, extraction, interpolation, and generation of products.
However, Blue-Cloud focuses not only on data storage, discovery and access services, but also on developing and providing analytical frameworks with Virtual Research Environments (VREs), in which users can set up and run analytical processes to support their research.
The Blue-Cloud use cases, coordinated by our partner IFREMER, employ several Big Data technologies such as NoSQL databases, Data Cubes, and Machine Learning, especially in the two demonstrators “Zoo- and Phytoplankton Essential Ocean Variables Products” and “Marine Environmental Indicators”.
The NoSQL databases (e.g. a time-series database) will be used to visualise input data in the studied areas and to compare the results of different data analysis processes. They can also be employed for data selection on portals, as they can quickly retrieve specific observations from large volumes of data. Data Cubes are used for data processing, including parallel processing. They will also allow us to observe specific areas through different input data and parameter measurements. The Cubes also make it easier and faster to collocate satellite data with in situ data from platforms such as ships, observatories, buoys, gliders, rovers, or drones. Semantic Web technologies are more useful for machine-to-machine processes, as they make data machine-readable.
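To make the idea of collocation more concrete, here is a minimal sketch in Python of how in situ observations could be matched to the nearest cell of a gridded satellite data cube. It is an illustration only and not the Blue-Cloud implementation: the function names (`nearest_index`, `collocate`), the `max_time_gap` parameter, the tiny sea surface temperature cube and the observation values are all hypothetical, and a nearest-neighbour match is just one of several possible collocation strategies.

```python
from bisect import bisect_left


def nearest_index(axis, value):
    """Return the index of the entry in a sorted axis that is closest to value."""
    pos = bisect_left(axis, value)
    if pos == 0:
        return 0
    if pos == len(axis):
        return len(axis) - 1
    before, after = axis[pos - 1], axis[pos]
    return pos if (after - value) < (value - before) else pos - 1


def collocate(cube, times, lats, lons, observations, max_time_gap=6.0):
    """Pair each in situ observation with the nearest cell of the cube.

    cube is indexed as cube[time][lat][lon]. Observations whose nearest
    time step is more than max_time_gap hours away are skipped.
    """
    matches = []
    for obs in observations:
        t = nearest_index(times, obs["time"])
        if abs(times[t] - obs["time"]) > max_time_gap:
            continue  # no satellite pass close enough in time
        i = nearest_index(lats, obs["lat"])
        j = nearest_index(lons, obs["lon"])
        matches.append(
            {
                "platform": obs["platform"],
                "in_situ": obs["value"],
                "satellite": cube[t][i][j],
            }
        )
    return matches


# Hypothetical sea surface temperature cube (degrees Celsius):
# 2 time steps x 2 latitudes x 3 longitudes.
times = [0.0, 24.0]  # hours since an arbitrary reference time
lats = [40.0, 41.0]
lons = [5.0, 6.0, 7.0]
cube = [
    [[14.1, 14.3, 14.6], [13.8, 14.0, 14.2]],
    [[14.4, 14.5, 14.9], [14.0, 14.1, 14.5]],
]

# Hypothetical in situ observations from different platforms.
observations = [
    {"platform": "buoy", "time": 2.0, "lat": 40.2, "lon": 5.9, "value": 14.2},
    {"platform": "glider", "time": 23.0, "lat": 40.9, "lon": 6.8, "value": 14.4},
    {"platform": "ship", "time": 12.5, "lat": 40.0, "lon": 5.0, "value": 14.0},
]

matches = collocate(cube, times, lats, lons, observations)
for match in matches:
    print(match)
```

In this sketch the ship observation is dropped because it falls too far in time from either satellite time step, while the buoy and glider measurements are each paired with one grid cell. The sketch ignores details that matter in practice, such as longitude wrap-around at the date line, missing values and depth. With a real data cube the same lookup would typically be done by an array library that supports labelled, chunked and parallel access rather than by plain Python lists.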