The eupp package provides access to a variety of gridded data sets made available within the scope of the European Post-Processing benchmark project.

The gridded data sets consist of (re)analysis data, used as the gridded ground truth in some scenarios; deterministic and ensemble forecasts for the training and test periods defined within the project; and hindcasts (or reforecasts). While the data sets differ in form and extent, the eupp package provides a uniform interface to download and, to some extent, process the data.

Purpose of this article

This article (Getting started with gridded data) shows the main use of the eupp_*_gridded() functionality, with some minimal examples of how to work with the data. To this end, different types of gridded data sets are used in different situations. Dedicated articles highlight the specific characteristics of, and give explicit examples for, the different types of data.

Note that these articles often refer back to this ‘getting started’ article, as most functions and procedures work the same way regardless of the type of gridded data.

Underlying concept

The data set has been designed and prepared by colleagues at the R&D Department of the Royal Meteorological Institute of Belgium (RMI) in Brussels. It consists of different ECMWF products (see LICENSE), with access granted via the europeanweather.cloud S3 bucket.

All gridded data sets are stored as GRIB version 1 files, each alongside a GRIB index file. These files can be accessed directly, but doing so may be inconvenient for many users. The eupp package therefore provides an interface to download the data.

Figure: Rough scheme of the download/processing procedure.

Regardless of the product or subset, the procedure is always the same:

  1. Define what data should be downloaded (eupp_config()).
  2. Download/retrieve the data (as GRIB version 1, NetCDF, or stars object).

This article contains a series of links to the article “Gridded data: Advanced”. Casual users do not need to follow them, but they may provide useful insights for more advanced users, programmers, and supporters.

Under the hood, the eupp package performs a series of intermediate steps to achieve (2):

  1. Defining the GRIB index files required to identify the necessary GRIB messages
  2. Downloading and parsing the GRIB index files to identify files and byte ranges
  3. Partially downloading the GRIB files (only the required messages, via curl) and storing them in a new GRIB version 1 file.
  4. If a NetCDF file has been requested: making the required manipulations on the GRIB file and converting it to NetCDF, which requires ecCodes to be installed.
  5. If a stars object has been requested: reading the NetCDF file. This goes through the intermediate step of creating a NetCDF file; thus ecCodes is required as well.
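Steps 2 and 3 rely on the offset and length stored in the GRIB index for each message. The sketch below shows how such pairs can be turned into HTTP byte ranges for a partial download. The helper byte_range() is hypothetical and not part of eupp, and we assume (not confirmed here) that curl fetches the messages via HTTP range requests.

```r
# Hypothetical helper (not part of eupp): convert the offset/length pairs
# of GRIB index entries into HTTP Range header values. HTTP byte ranges
# are inclusive, hence the last byte is offset + len - 1.
byte_range <- function(offset, len) {
  offset <- as.numeric(offset)  # doubles avoid integer overflow for large files
  len    <- as.numeric(len)
  sprintf("bytes=%.0f-%.0f", offset, offset + len - 1)
}

# Offsets/lengths of two messages from the inventory example in this article
idx <- data.frame(param  = c("2t", "cp"),
                  offset = c(12191040, 12602400),
                  length = c(23412, 23412))
idx$range <- byte_range(idx$offset, idx$length)
print(idx)
```

Each resulting range selects exactly one GRIB message, so only the requested fields have to be transferred instead of the full file.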

Define dataset to be downloaded

Step one: before downloading any data, a configuration object must be created using eupp_config(), which contains the specification of the data to be retrieved.

# Loading the package
library("eupp")

# Create custom configuration
conf <- eupp_config(product   = "forecast",
                    level     = "surf",
                    type      = "ens",
                    date      = "2017-07-01",
                    parameter = c("cp", "2t"),
                    steps     = c(24L, 240L), # +1 and +10 days ahead
                    cache     = "_cache")     # optional; caching grib index

Getting the inventory

Looking at the GRIB inventory is typically not done by end users, but it is handy for seeing which messages will be downloaded, or for inspecting the available messages before downloading the data itself.

inv <- eupp_get_inventory(conf)
head(inv)
##                                                              path domain
## 529       data/fcs/surf/EU_forecast_ctr_surf_params_2017-07_0.grb      g
## 546       data/fcs/surf/EU_forecast_ctr_surf_params_2017-07_0.grb      g
## 2729      data/fcs/surf/EU_forecast_ctr_surf_params_2017-07_0.grb      g
## 2746      data/fcs/surf/EU_forecast_ctr_surf_params_2017-07_0.grb      g
## 115371 data/fcs/surf/EU_forecast_ens_surf_params_2017-07-01_0.grb      g
## 115388 data/fcs/surf/EU_forecast_ens_surf_params_2017-07-01_0.grb      g
##        levtype step_char param class type stream expver leg_number    offset
## 529        sfc        24    2t    od   cf   enfo   0001          1  12191040
## 546        sfc        24    cp    od   cf   enfo   0001          1  12602400
## 2729       sfc       240    2t    od   cf   enfo   0001          1  63022200
## 2746       sfc       240    cp    od   cf   enfo   0001          1  63433560
## 115371     sfc        24    2t    od   pf   enfo   0001          1 609540720
## 115388     sfc        24    cp    od   pf   enfo   0001          1 609952080
##        length param_id number       init step      valid
## 529     23412      167      0 2017-07-01   24 2017-07-02
## 546     23412      143      0 2017-07-01   24 2017-07-02
## 2729    23412      167      0 2017-07-01  240 2017-07-11
## 2746    23412      143      0 2017-07-01  240 2017-07-11
## 115371  23412      167      1 2017-07-01   24 2017-07-02
## 115388  23412      143      1 2017-07-01   24 2017-07-02
dim(inv)
## [1] 204  17
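The valid column of the inventory is the initialization time plus the forecast step (in hours). A base-R sketch of this relation, assuming a 00 UTC initialization (suggested, but not confirmed, by the "_0" suffix of the file names):

```r
# Assumption: initialization at 00 UTC on the date given in the inventory
init  <- as.POSIXct("2017-07-01 00:00", tz = "UTC")
steps <- c(24L, 240L)          # forecast steps in hours
valid <- init + steps * 3600   # POSIXct arithmetic works in seconds
format(valid, "%Y-%m-%d")
```

This reproduces the init/step/valid combinations listed in the inventory above.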

In this case, the configuration (conf) defines a set of 204 messages to be processed/downloaded. To see which messages are available, one can set up a configuration for a specific product/level/type/date without specifying steps or parameters. This returns the full inventory with all available parameters and steps.
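The count of 204 can be checked by hand. The configuration requests 2 parameters and 2 steps, and the inventory lists a control member (type cf, number 0) as well as perturbed members (type pf, numbers starting at 1). Assuming the 50 perturbed members of the ECMWF ensemble, this gives 2 × 2 × 51 = 204 messages. The toy table below is not the real inventory, only an enumeration of these combinations:

```r
# Assumption: 1 control member (type "cf", number 0) plus
# 50 perturbed members (type "pf", numbers 1-50) = 51 members.
msgs <- expand.grid(param  = c("cp", "2t"),
                    step   = c(24L, 240L),
                    number = 0:50,
                    stringsAsFactors = FALSE)
msgs$type <- ifelse(msgs$number == 0L, "cf", "pf")

nrow(msgs)        # 2 parameters x 2 steps x 51 members
table(msgs$type)  # 4 control and 200 perturbed messages
```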

Getting data: GRIB format

From eupp_get_inventory() we know that there are 204 fields matching our configuration. eupp_download_gridded() allows us to retrieve the data in the original GRIB version 1 file format by specifying output_format = "grib".

The function first downloads/parses the GRIB index file (using the cache if specified) to determine which GRIB messages are required given the configuration (conf), and then downloads the required messages. All messages matching the configuration are stored in one single file specified by output_file (GRIB version 1 file format).

eupp_download_gridded(conf, output_file = "_test.grb", overwrite = TRUE)
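Storing all messages in one file works because a GRIB file is simply a sequence of self-contained messages, each starting with the bytes "GRIB" and ending with "7777". The sketch below writes toy raw vectors (stand-ins for real downloaded messages) back to back; it illustrates the idea and is not eupp's actual implementation:

```r
# Toy stand-ins for downloaded messages: GRIB edition 1 messages start with
# the ASCII bytes "GRIB" and end with "7777" (the payload here is fake).
chunks <- list(c(charToRaw("GRIB"), as.raw(1:10), charToRaw("7777")),
               c(charToRaw("GRIB"), as.raw(1:20), charToRaw("7777")))

out <- tempfile(fileext = ".grb")
con <- file(out, open = "wb")
for (chunk in chunks) writeBin(chunk, con)  # write messages back to back
close(con)

file.size(out)  # sum of all message lengths: 18 + 28 = 46 bytes
```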

Alongside the GRIB file ("_test.grb"), an .rds file ("_test.grb.rds") is stored containing the GRIB inventory (meta information about the fields). While not strictly required, this allows the GRIB file to be interpolated without ecCodes being installed (see next section).

Interpolate GRIB files

The eupp package can interpolate GRIB data directly. Commonly, this requires additional libraries, such as ecCodes, which are able to read the GRIB meta information (index).

stars can also read GRIB files directly (via rgdal), but it does not return this meta information. eupp_interpolate_grib() therefore does the following:

  • Checks whether the .rds file exists alongside the GRIB file to be interpolated (see previous section). If so, this information is used to perform the interpolation (does not require ecCodes).
  • If the .rds file does not exist, grib_ls (ecCodes) is called to create the inventory/index from the GRIB file.
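The two bullet points amount to a simple fallback. Below is a sketch with a hypothetical helper inventory_source() (not part of eupp) that only decides which source of meta information would be used:

```r
# Hypothetical helper (not part of eupp): decide where the inventory of a
# GRIB file would come from, following the two bullet points above.
inventory_source <- function(grib_file) {
  rds_file <- paste0(grib_file, ".rds")     # e.g. "_test.grb.rds"
  if (file.exists(rds_file)) return("rds")  # no ecCodes needed
  if (nzchar(Sys.which("grib_ls"))) return("grib_ls")
  stop("no .rds inventory found and grib_ls (ecCodes) is not available")
}

# Usage with a temporary (empty) GRIB file and a matching .rds inventory
grib_file <- tempfile(fileext = ".grb")
invisible(file.create(grib_file))
saveRDS(data.frame(param = "2t", step = 24L), paste0(grib_file, ".rds"))
inventory_source(grib_file)  # "rds"
```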

Currently, eupp_interpolate_grib() can only interpolate to one or multiple points (POINT features). The interpolation is performed via stars before the result is manipulated and brought into a ‘more usable’ form.
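With bilinear = TRUE, the value at a point is a weighted average of the four surrounding grid points. The standalone sketch below implements textbook bilinear interpolation on a regular grid in base R; it assumes increasing coordinate vectors and is not the stars code path used by eupp:

```r
# Textbook bilinear interpolation on a regular grid (standalone sketch).
# x, y: increasing grid coordinates; z: matrix with z[i, j] at (x[i], y[j]).
bilinear <- function(x, y, z, x0, y0) {
  stopifnot(x0 >= min(x), x0 <= max(x), y0 >= min(y), y0 <= max(y))
  # Index of the lower-left grid point; clamp so i + 1 and j + 1 stay valid
  i <- min(findInterval(x0, x), length(x) - 1L)
  j <- min(findInterval(y0, y), length(y) - 1L)
  # Relative position of the target point within the grid cell (0 to 1)
  tx <- (x0 - x[i]) / (x[i + 1L] - x[i])
  ty <- (y0 - y[j]) / (y[j + 1L] - y[j])
  (1 - tx) * (1 - ty) * z[i, j]      + tx * (1 - ty) * z[i + 1L, j] +
    (1 - tx) * ty     * z[i, j + 1L] + tx * ty       * z[i + 1L, j + 1L]
}

# Toy linear field on a 1-degree grid around Innsbruck
lon <- c(10, 11, 12)
lat <- c(47, 48)
z   <- outer(lon, lat, function(a, b) 2 * a + 3 * b)
bilinear(lon, lat, z, x0 = 11.39, y0 = 47.27)
```

Because bilinear interpolation reproduces linear fields exactly, the example returns 2 × 11.39 + 3 × 47.27 = 164.59.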

Point locations

First, an sf object containing the target locations has to be created. Only point locations are allowed, and the object must have a valid coordinate reference system (CRS).

library("sf")
locations <- data.frame(name = c("Innsbruck", "Brussels"),
                        lon  = c(11.39, 4.35),
                        lat  = c(47.27, 50.85))
(locations <- st_as_sf(locations, coords = c("lon", "lat"), crs = 4326))
## Simple feature collection with 2 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 4.35 ymin: 47.27 xmax: 11.39 ymax: 50.85
## Geodetic CRS:  WGS 84
##        name            geometry
## 1 Innsbruck POINT (11.39 47.27)
## 2  Brussels  POINT (4.35 50.85)

Once available, the GRIB file can be interpolated.

ip <- eupp_interpolate_grib("_test.grb", at = locations,
                            atname = "name", bilinear = TRUE)

Warnings issued by this call come from readGDAL() (rgdal) and can be ignored at this point. By default, a wide format is returned, but a long format can be retrieved if needed (wide = FALSE).

head(ip[, 1:11]) # First 11 columns only
##         init      valid step            geometry      name        cp_0
## 1 2017-07-01 2017-07-02   24 POINT (11.39 47.27) Innsbruck 0.008243402
## 2 2017-07-01 2017-07-11  240 POINT (11.39 47.27) Innsbruck 0.024329524
## 3 2017-07-01 2017-07-02   24  POINT (4.35 50.85)  Brussels 0.001813469
## 4 2017-07-01 2017-07-11  240  POINT (4.35 50.85)  Brussels 0.012095566
##          cp_1       cp_10       cp_11       cp_12        cp_13
## 1 0.006312119 0.006833686 0.009337233 0.004342669 0.0068883202
## 2 0.035008109 0.038468658 0.036461267 0.056630942 0.0335263306
## 3 0.002549515 0.002692184 0.002620182 0.001578808 0.0008776283
## 4 0.006306648 0.015253906 0.006379013 0.038313293 0.0048063660
# Long format; contains more extensive information
# (differs between rds/grib_ls).
head(eupp_interpolate_grib("_test.grb", at = locations,
                           atname = "name", wide = FALSE), n = 3)
##                                                      path domain levtype
## 1 data/fcs/surf/EU_forecast_ctr_surf_params_2017-07_0.grb      g     sfc
## 2 data/fcs/surf/EU_forecast_ctr_surf_params_2017-07_0.grb      g     sfc
## 3 data/fcs/surf/EU_forecast_ctr_surf_params_2017-07_0.grb      g     sfc
##   step_char param class type stream expver leg_number   offset length param_id
## 1        24 t2m_0    od   cf   enfo   0001          1 12191040  23412      167
## 2        24  cp_0    od   cf   enfo   0001          1 12602400  23412      143
## 3       240 t2m_0    od   cf   enfo   0001          1 63022200  23412      167
##   number       init step      valid            geometry        value      name
## 1      0 2017-07-01   24 2017-07-02 POINT (11.39 47.27) 2.778642e+02 Innsbruck
## 2      0 2017-07-01   24 2017-07-02 POINT (11.39 47.27) 8.243402e-03 Innsbruck
## 3      0 2017-07-01  240 2017-07-11 POINT (11.39 47.27) 2.799946e+02 Innsbruck
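The difference between the two formats amounts to a reshape. Using three values from the long-format output above, base R's reshape() yields one row per location and step, with one column per parameter/member. This mirrors the layout of the wide output but is not necessarily how eupp builds it:

```r
# Long format: one row per location, step and parameter/member
long <- data.frame(name  = "Innsbruck",
                   step  = c(24L, 24L, 240L),
                   param = c("t2m_0", "cp_0", "t2m_0"),
                   value = c(277.8642, 0.008243402, 279.9946))

# Wide format: one row per location and step, one column per parameter
wide <- reshape(long, direction = "wide",
                idvar = c("name", "step"), timevar = "param")
names(wide) <- sub("^value\\.", "", names(wide))  # "value.cp_0" -> "cp_0"
print(wide)  # cp_0 is NA for step 240 as it is missing in this toy subset
```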

Please consult the documentation of eupp_interpolate_grib() (?eupp_interpolate_grib) for details on all arguments, including those not demonstrated here.

Further functionality

The eupp package contains some additional functionality to download/process gridded data sets. All of it, however, goes through grib_to_netcdf (ecCodes), which comes with a series of benefits and drawbacks. A separate article demonstrates this functionality; when using it, keep in mind that it must be considered ‘experimental’.