Tiny Ocean: Physically Meaningful Dimensionality Reduction for pCO2 Reconstruction

  • Published: Feb 7, 2025
  • Abstract: Accurate reconstruction of the full pCO2 field is a crucial element in estimating air-sea carbon flux and ultimately understanding the magnitude and trend of the ocean carbon sink. The ocean data used to build machine learning models that reconstruct the full pCO2 field rely on variables of different kinds (spatial, temporal, and physical features) and are distributed inhomogeneously in space and time. The data are also high-dimensional, with over a dozen features and hundreds of thousands of highly correlated data points. It is therefore beneficial to seek physically meaningful ways to reduce the dimensionality of the data for easier manipulation. Clustering methods are promising, but they are often based on the Euclidean distance in feature space, which is not a good tracer of similarity in the space of the target variable, in this case pCO2. To alleviate this problem, we build a new distance that follows pCO2 more closely. We introduce the concept of information imbalance, which quantifies the asymmetry of information between different metrics, and use it to evaluate the information content of various candidate distances with respect to the target, the distance in pCO2 space, looking for the “sweet spot” between dimensionality reduction and preservation of information. This new distance can be used to perform physically meaningful clustering and to reduce dimensionality along both axes (features and number of data points), to build more agile machine learning models, and to aid visualization and physical interpretation of the data.
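
To make the idea of information imbalance concrete, here is a minimal Python sketch. It is not taken from the talk: it assumes the rank-based form commonly used for this quantity, in which the imbalance from a distance A to a distance B is 2/N times the average rank, under B, of each point's nearest neighbour under A. Values near 0 mean A predicts neighbourhoods in B well; values near 1 mean A carries no information about B. The synthetic data, the variable names, and the helper functions `pairwise`, `rank_matrix`, and `information_imbalance` are all hypothetical and stand in for real features and pCO2 values.

```python
import numpy as np

def pairwise(x):
    """Euclidean distance matrix for an (n_points, n_features) array."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def rank_matrix(dist):
    """ranks[i, j] = position of j when points are sorted by distance from i."""
    n = dist.shape[0]
    order = np.argsort(dist, axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks

def information_imbalance(dist_a, dist_b):
    """Assumed rank-based imbalance from distance A to distance B."""
    n = dist_a.shape[0]
    d_a = dist_a.astype(float).copy()
    np.fill_diagonal(d_a, np.inf)      # exclude self when finding the A-neighbour
    nn_a = np.argmin(d_a, axis=1)      # nearest neighbour of each point under A
    d_b = dist_b.astype(float).copy()
    np.fill_diagonal(d_b, -np.inf)     # self gets rank 0, nearest neighbour rank 1
    ranks_b = rank_matrix(d_b)
    return 2.0 / n * ranks_b[np.arange(n), nn_a].mean()

# Synthetic stand-in data (an assumption for illustration, not ocean data):
# the "target" depends on feature 0 only; feature 1 is irrelevant noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 2))
target = 2.0 * features[:, 0] + 1.0

dist_target = pairwise(target[:, None])
imb_informative = information_imbalance(pairwise(features[:, [0]]), dist_target)
imb_uninformative = information_imbalance(pairwise(features[:, [1]]), dist_target)

print(f"feature 0 -> target: {imb_informative:.3f}")
print(f"feature 1 -> target: {imb_uninformative:.3f}")
```

In this toy setting, the distance built from the relevant feature should score close to 0 and the one built from the irrelevant feature close to 1. Ranking candidate distances by this score against the distance in target space is one way to look for the trade-off between fewer features and retained information that the abstract describes.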
