Meteo Imp

Evaluation of Kalman Filter for meteorological time series imputation for Eddy Covariance applications

This project was developed by Simone Massaro as a master's thesis in Ecosystem Analysis and Modelling at the Faculty of Forest Sciences and Forest Ecology of the Georg-August-Universität Göttingen, Germany.

A Kalman Filter is implemented in PyTorch for the imputation of meteorological time series in an Eddy Covariance context.

Repository Structure

Meteo Imp package

The majority of the code for this project is in the lib_nbs folder, which contains a set of Jupyter notebooks that use the nbdev library to generate the meteo_imp package. The documentation can be browsed at https://mone27.github.io/meteo_imp/libs

Installation

The Meteo Imp package can be installed using: pip install -U git+https://github.com/mone27/meteo_imp.git

Analysis

The analysis folder contains all the code that uses the Meteo Imp package to train a Kalman Filter model and assess its performance against the state-of-the-art methods.

Abstract

Eddy Covariance (EC) is a state-of-the-art technique to measure greenhouse gas exchange. EC towers also measure meteorological variables, but instrument failures leave gaps in the data. Many use cases of EC data, especially land surface modelling, require continuous meteorological time series as input. Therefore, it is necessary to impute the gaps in the meteorological time series. ONEFlux, one of the most widely used EC post-processing pipelines, imputes the missing data using either Marginal Distribution Sampling (MDS), which uses other observations from similar meteorological conditions, or ERA-Interim (ERA-I), which is a global meteorological dataset. The imputation performance of these methods is limited for short and medium gaps (up to 1 week), which constitute the majority of EC meteorological gaps.

In this work, I assess an imputation method for meteorological variables based on a Kalman Filter (KF). Its advantage is that the prediction combines information from the ERA-I dataset, inter-variable correlation and temporal autocorrelation. Moreover, the KF is a probabilistic method, so for each data point the prediction is not a single value but an entire distribution, which provides an interpretable quantification of the uncertainty of the model predictions.
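To make the probabilistic idea concrete, here is a minimal sketch of gap filling with a Kalman Filter. It is not the meteo_imp implementation: it assumes a univariate random-walk model with hand-picked noise variances, uses only the forward filter, and exploits temporal autocorrelation alone (no ERA-I input, no inter-variable correlation). The function name kalman_impute and all parameter values are hypothetical.

```python
def kalman_impute(obs, process_var=0.1, obs_var=0.5, init_mean=0.0, init_var=10.0):
    """Forward Kalman filter for a 1-D random-walk model (illustrative only).

    obs: list of floats, with None marking a missing observation.
    Returns a list of (mean, variance) tuples, one Gaussian per time step.
    """
    mean, var = init_mean, init_var
    out = []
    for y in obs:
        # Predict step: a random walk keeps the mean, uncertainty grows.
        var = var + process_var
        if y is not None:
            # Update step: only run when an observation is available.
            gain = var / (var + obs_var)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        # Inside a gap the predicted distribution is the imputed value.
        out.append((mean, var))
    return out


# Toy series (made-up values) with a three-step gap.
series = [10.0, 10.5, None, None, None, 11.0]
filtered = kalman_impute(series)
for y, (m, v) in zip(series, filtered):
    print(y, round(m, 3), round(v, 3))
```

In this sketch the variance grows at every step inside the gap and shrinks again once an observation returns, which is the uncertainty quantification described above in its simplest form.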

I evaluate the KF by comparing the imputation performance with the state-of-the-art approaches (MDS and ERA-I) using data from the FLUXNET 2015 site of Hainich (DE-Hai) with gaps up to one week long. The KF outperforms the state-of-the-art approaches across all analyzed variables, except for precipitation, for which all methods are comparable. I observed an average reduction of the imputation error of 33% compared to ERA-I and 57% compared to MDS, when excluding precipitation. I further explore aspects that influence the performance of the KF: in general, the error increases with the gap length only up to 24 hours, the use of ERA-I data improves the model predictions, and the inter-variable correlation is effectively utilized. The main limitations of the KF approach are: 1) the best performance is achieved only when fine-tuning the model parameters to the specific conditions of the gap, which increases the deployment complexity; 2) the current implementation of the KF is affected by numerical stability issues, which, when all variables are missing, limit the maximum gap length to 15 hours; 3) careful initialization of the KF parameters and selection of the training conditions are required to mitigate the difficulty in learning the model parameters. However, I expect that all these issues can be resolved or at least significantly mitigated by further research.
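As an illustration of how a relative error reduction like the one reported above can be computed, the sketch below assumes the error is measured as RMSE over the gap. The choice of RMSE is an assumption of this example, not a statement of the thesis's metric. The helper names and the numbers are hypothetical toy values, not results from the analysis.

```python
import math


def rmse(truth, pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def relative_reduction(baseline_err, new_err):
    """Percentage by which new_err is lower than baseline_err."""
    return 100.0 * (baseline_err - new_err) / baseline_err


# Toy example: observed values hidden by an artificial gap, and two
# hypothetical imputations of that gap (all numbers are made up).
truth = [1.0, 2.0, 3.0]
baseline_pred = [1.0, 2.0, 5.0]
model_pred = [1.0, 2.0, 4.0]

baseline_err = rmse(truth, baseline_pred)
model_err = rmse(truth, model_pred)
reduction = relative_reduction(baseline_err, model_err)
print(round(reduction, 1))
```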

full text available here

Additional code

The development branch of the repository contains all the intermediate steps for development, in particular:

an imputation method for meteo time series using Gaussian Processes Factor Analysis (GPFA)
a simple KF imputation method based on the pykalman library