In the MeteoSwiss data warehouse (DWH), all measurement data are consolidated, prepared for users and stored long-term in standardised form. Preparation includes the aggregation and calculation of meteorological parameters, quality control consisting of completeness and plausibility checks, and the homogenisation of long-term series, with the aim of providing data users with reliable data series.
Aggregation and calculation
Aggregation means deriving a time series with a lower frequency than the initial data series. Functions such as averaging, summation or extreme-value searches are used for aggregation. For example, in temporal aggregation, measurements taken every ten minutes are aggregated to hourly, daily, monthly and annual values. In spatial aggregation, the measurements from the weather stations located in a specific area are combined to determine a value for that area.
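As a rough sketch of temporal and spatial aggregation (in Python with pandas; the series, station identifiers and aggregation choices are illustrative and not the DWH implementation), ten-minute values can be resampled to lower frequencies and several stations combined into an area value:

```python
import numpy as np
import pandas as pd

# Synthetic ten-minute temperature series (illustrative values only)
idx = pd.date_range("2024-02-01", periods=6 * 24 * 7, freq="10min")
temp_10min = pd.Series(np.random.default_rng(0).normal(5.0, 2.0, len(idx)), index=idx)

# Temporal aggregation: derive lower-frequency series from the ten-minute data
hourly_mean = temp_10min.resample("1h").mean()    # averaging
daily_max = temp_10min.resample("1D").max()       # extreme-value search
daily_sum = temp_10min.resample("1D").sum()       # summation (as used e.g. for precipitation)
monthly_mean = hourly_mean.resample("MS").mean()  # monthly values from hourly ones

# Spatial aggregation: one value per area from several stations (hypothetical station IDs)
stations = pd.DataFrame({"STA1": hourly_mean,
                         "STA2": hourly_mean + 1.0,
                         "STA3": hourly_mean - 0.5})
area_mean = stations.mean(axis=1)
```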
Calculation means determining a derived parameter at the frequency of the initial data. Functions used for calculation include pressure reduction, difference calculation, ratio determination and unit conversion. Examples of derived parameters are air pressure reduced to sea level, the foehn index, and wind speed in knots or km/h.
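The following sketch shows two such calculation functions: a simplified (isothermal) reduction of station pressure to sea level and wind-speed unit conversions. The reduction formula is a common textbook approximation and the function names are made up for illustration; the operational reduction used in the DWH may differ.

```python
import math

def reduce_to_sea_level(p_station_hpa: float, elevation_m: float, temp_c: float) -> float:
    """Isothermal barometric reduction of station pressure to sea level (approximation)."""
    t_kelvin = temp_c + 273.15
    # p0 = p * exp(g * h / (R_dry_air * T))
    return p_station_hpa * math.exp(9.80665 * elevation_m / (287.05 * t_kelvin))

def mps_to_kmh(v: float) -> float:
    return v * 3.6              # m/s -> km/h

def mps_to_knots(v: float) -> float:
    return v * 3600.0 / 1852.0  # m/s -> kn (1 kn = 1852 m per hour)

# Example: a station at 556 m measuring 965 hPa at 5 °C
print(round(reduce_to_sea_level(965.0, 556.0, 5.0), 1))  # reduced pressure in hPa
```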
Completeness check
A completeness check detects gaps in the measurement data. Shorter time gaps (up to 1 h or 6 h, depending on the parameter) at a weather station are filled automatically by means of a ten-minute-based interpolation. The permitted gap length depends on the measurement parameter in question: longer gaps can be filled automatically for air pressure than for precipitation, for example, because air pressure varies less in space and time than precipitation does. Automatically filled values are flagged with a marker so that it remains clear in retrospect how the value was created.
Larger gaps that are not automatically interpolated can be filled manually, drawing on knowledge of the weather situation and/or comparisons with other weather stations. As a rule, gaps of up to 24 h are interpolated on a ten-minute basis, gaps of more than one day on a daily basis and gaps of more than ten days on a monthly basis. Manually interpolated gaps are likewise flagged accordingly.
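A minimal sketch of the automatic gap filling could look as follows; the linear interpolation in time, the gap threshold and the flag column are assumptions made for the example, since the operational rules and thresholds are parameter-dependent.

```python
import numpy as np
import pandas as pd

def fill_short_gaps(series: pd.Series, max_gap_steps: int) -> pd.DataFrame:
    """Interpolate only gaps of at most `max_gap_steps` consecutive ten-minute values."""
    is_na = series.isna()
    run_id = (is_na != is_na.shift()).cumsum()          # label consecutive runs
    run_len = is_na.groupby(run_id).transform("sum")    # length of each missing-value run
    fillable = is_na & (run_len <= max_gap_steps)       # short gaps only
    interpolated = series.interpolate(method="time", limit_area="inside")
    value = series.where(~fillable, interpolated)       # long gaps stay empty
    flag = fillable & value.notna()                     # mark every filled value
    return pd.DataFrame({"value": value, "interpolated": flag})

idx = pd.date_range("2024-02-01 00:00", periods=12, freq="10min")
raw = pd.Series([3.1, 3.0, np.nan, np.nan, 2.8, 2.9, 3.0, np.nan, 3.2, 3.1, 3.0, 2.9], index=idx)
result = fill_short_gaps(raw, max_gap_steps=6)          # 6 steps = 1 h of ten-minute data
```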
Plausibility check
In a plausibility check, the measurements are subjected to rule-based and model-based tests. Values that violate one or more of the tests are flagged and classified as implausible or doubtful. Implausible values are automatically eliminated from the data series and, where possible, automatically interpolated. The measurements that are flagged as doubtful are assessed by a specialist the following day and, if necessary, manually corrected or confirmed.
Rule-based tests are based on logical, mathematical rules; a simplified code sketch follows the list below.
- Hard limit tests detect physically impossible and thus implausible values (e.g. wind speeds of >100 m/s).
- Soft limit tests are based on station-specific limits and indicate whether a value is climatologically doubtful (e.g. temperature in February in Zurich >25 °C).
- Consistency tests include comparisons with redundant measurements, comparisons within the same measurement location (e.g. a weather station reporting precipitation and sunshine at the same time) or comparisons within the measured variable (e.g. mean wind speed < gust peak).
- Variability tests can be used, for example, to detect frozen wind sensors (e.g. variability of wind speed during a 6-hour period <0.1 m/s) or extreme differences between two measurements (e.g. a humidity difference of >30% between two ten-minute values).
- Record value tests are triggered when a measurement is among the five highest or lowest measurements ever recorded at a weather station.
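Translated into code, the rule-based tests could look roughly as follows; the thresholds, column names and flag layout are purely illustrative and do not reflect the operational configuration.

```python
import pandas as pd

# Illustrative thresholds; operational limits are parameter- and station-specific
HARD_MAX_WIND_MPS = 100.0  # hard limit: physically impossible above this
SOFT_MAX_TEMP_C = 25.0     # soft limit: climatologically doubtful (e.g. Zurich in February)

def rule_based_flags(df: pd.DataFrame) -> pd.DataFrame:
    """One boolean column per test; True marks a suspicious ten-minute value.

    Expects columns 'wind_mean', 'wind_gust', 'temp', 'precip', 'sunshine',
    'rel_hum' (hypothetical names, not the DWH schema).
    """
    flags = pd.DataFrame(index=df.index)
    flags["hard_wind"] = df["wind_mean"] > HARD_MAX_WIND_MPS              # hard limit
    flags["soft_temp"] = df["temp"] > SOFT_MAX_TEMP_C                     # soft limit
    flags["wind_consistency"] = df["wind_mean"] > df["wind_gust"]         # mean must not exceed gust peak
    flags["precip_vs_sun"] = (df["precip"] > 0) & (df["sunshine"] >= 10)  # rain with full sunshine
    flags["frozen_sensor"] = df["wind_mean"].rolling(36).std() < 0.1      # 6 h of ten-minute values
    flags["humidity_jump"] = df["rel_hum"].diff().abs() > 30              # extreme step change
    return flags
```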
Model-based tests are based on statistical models that are trained on the manually processed data sets. These tests can draw on historical and current data, predictive comparison variables, and metadata (e.g. station elevation).
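As a heavily simplified illustration, a model-based test could train a regression model on manually checked data, using neighbouring stations and metadata as predictors, and flag a new measurement when it deviates too far from the model's expectation. The features, the linear model and the threshold below are assumptions made for the sketch, not the models actually used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: manually quality-controlled target values (y) and
# predictors such as two neighbouring stations and an elevation difference (X).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
y_train = X_train @ np.array([0.6, 0.3, -0.2]) + rng.normal(scale=0.5, size=500)

model = LinearRegression().fit(X_train, y_train)
resid_std = np.std(y_train - model.predict(X_train))  # typical model error on training data

def model_based_flag(x_new: np.ndarray, y_new: float, n_sigma: float = 4.0) -> bool:
    """Flag a measurement as doubtful if it is far outside the model's expectation."""
    residual = y_new - model.predict(x_new.reshape(1, -1))[0]
    return abs(residual) > n_sigma * resid_std
```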
Homogenisation
For selected, particularly significant measurement series, the data are homogenised months or years after the measurements are taken. Homogenisation involves removing and correcting systematic measurement errors and jumps in the data series, which can arise from station relocations, instrument faults or instrument changes. Homogenised data represent the highest quality level of data.
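As a simplified illustration of the idea behind homogenisation, the segment of a series before a known break (for example a documented station relocation) can be shifted so that its offset relative to a reference series matches the period after the break. The function below assumes the break position is already known; operational homogenisation also detects breaks statistically and applies far more refined, often seasonally varying, corrections.

```python
import numpy as np

def adjust_known_break(candidate: np.ndarray, reference: np.ndarray, break_idx: int) -> np.ndarray:
    """Shift the segment before `break_idx` so it lines up with the most recent period."""
    diff = candidate - reference                        # remove the common climate signal
    offset = diff[break_idx:].mean() - diff[:break_idx].mean()
    adjusted = candidate.astype(float)
    adjusted[:break_idx] = adjusted[:break_idx] + offset
    return adjusted
```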