Ocean Data Quality Assessment through Outlier Detection-enhanced Active Learning

Improving ocean data quality: enhanced active learning
Training Academy event
29 January 2024 11:00–12:00
Online

On 29 January, the Blue-Cloud team organised an internal training session featuring presentations by Drs. Na Li, Yiyang Qi, Ruyue Xin, and Zhiming Zhao from the University of Amsterdam. The focus was on exploring techniques to enhance the quality of ocean data.

The presentations emphasised the importance of global ocean observation initiatives such as Argo, GLOSS, and EMSO in ocean and climate research. The Argo network generates a vast volume of observational data, but sensor malfunctions and transmission errors introduce quality issues that make rigorous quality assessment necessary. Existing methods, including machine learning approaches, are hampered by scarce labelled data and imbalanced datasets.

To address these challenges, the Blue-Cloud team proposed the ODEAL framework for ocean data quality assessment, which leverages Active Learning (AL) to reduce the workload of human experts in the quality assessment workflow and utilises outlier detection algorithms for effective model initialisation. The team conducted extensive experiments on five large-scale realistic Argo datasets to evaluate the proposed method, including the effectiveness of AL query strategies and the approach for constructing the initial dataset.
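
The workflow described above, an outlier detector seeding the initial labelled set followed by uncertainty-based querying, can be illustrated with a minimal sketch. This is not the ODEAL implementation. It assumes a median/MAD outlier score as a stand-in for the outlier detection algorithms, a one-feature logistic regression as the classifier, synthetic measurements in place of Argo data, and a lookup of the true label in place of the human expert. All function names and parameters (`outlier_scores`, `active_learning`, `seed_size`, `budget`) are hypothetical.

```python
import math
import random
import statistics


def outlier_scores(values):
    """Score each value by its distance from the median, in MAD units."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values]) or 1.0
    return [abs(v - med) / mad for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, epochs=1000, lr=0.01):
    """Fit p(bad | x) = sigmoid(w * x + b) with batch gradient descent."""
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return w, b


def active_learning(scores, oracle, seed_size=5, budget=20):
    """Seed with outlier scores, then query the most uncertain points."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Initial set: the most typical and the most anomalous points, so that
    # both classes are likely to be present despite the class imbalance.
    initial = order[:seed_size] + order[-seed_size:]
    labels = {i: oracle(i) for i in initial}
    for _ in range(budget):
        w, b = fit_logistic([scores[i] for i in labels], list(labels.values()))
        pool = [i for i in range(len(scores)) if i not in labels]
        # Uncertainty sampling: query the prediction closest to 0.5.
        query = min(pool, key=lambda i: abs(sigmoid(w * scores[i] + b) - 0.5))
        labels[query] = oracle(query)
    model = fit_logistic([scores[i] for i in labels], list(labels.values()))
    return initial, labels, model


# Synthetic stand-in for sensor readings: 200 points, 20 of them "bad" spikes.
rng = random.Random(0)
truth = [0] * 200
for i in rng.sample(range(200), 20):
    truth[i] = 1
values = []
for bad in truth:
    spike = rng.choice((-1, 1)) * rng.uniform(6, 12) if bad else 0.0
    values.append(15.0 + rng.gauss(0, 1) + spike)

scores = outlier_scores(values)
# The oracle plays the role of the human expert who labels a queried point.
initial, labels, (w, b) = active_learning(scores, oracle=lambda i: truth[i])
predicted = [int(sigmoid(w * s + b) > 0.5) for s in scores]
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
print(f"labelled {len(labels)} of {len(truth)} points, accuracy {accuracy:.3f}")
```

In this sketch the expert labels 30 of the 200 points: 10 chosen by the outlier score and 20 chosen by the model's uncertainty. Replacing the `min(...)` query with a random draw from `pool` would give a random-sampling baseline of the kind the experiments compare against.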

The results indicate that this framework enhances quality assessment efficiency by up to 465.5% with the uncertainty-based query strategy compared to random sampling and reduces overall annotation costs by up to 76.9% using the initial set constructed with outlier detectors.

Blue-Cloud EOV Workbenches

In Blue-Cloud 2026, advanced Workbenches (WBs) are being developed to streamline data handling for Essential Ocean Variables (EOV). Leveraging cloud technology, AI, and advanced analytics, these tools improve the harmonisation, validation, and qualification of data from various sources. They are set to be adopted by key initiatives to consistently produce validated data collections that enhance ocean simulations. The EOV Workbenches cover Physics, Eutrophication, and Ecosystem-level Variables.

Would you like to watch this recording? Contact us!