The article “Getting started with gridded data” shows the basic use of the eupp gridded data interface. For those interested, this article gives more insight into how the package works under the hood.
Some functions might be useful for (i) debugging or (ii) building additional functionality around the eupp package. In more detail, the gridded data set functionality works as follows:
1. eupp_config(): function which returns an object of class eupp_config.
2. eupp_download_gridded() or eupp_get_gridded(): retrieve the data in different formats (the first allows for GRIB version 1 and different NetCDF file formats; the latter for stars objects).
Below the surface eupp performs the following steps:
- Downloads the required segments of the GRIB files (via curl) and stores the requested messages in a new GRIB version 1 file.
- If NetCDF has been requested: converts the GRIB file to NetCDF (requires ecCodes).
- If a stars object has been requested: reads the NetCDF file. This goes through the intermediate step of creating a NetCDF file; thus ecCodes is necessary.
When calling eupp_download_gridded() a file will be created on success (GRIB version 1 or NetCDF), while eupp_get_gridded() returns a stars object in the active R session. Temporary files (stored in tempdir()) are deleted as soon as they are no longer needed.
To demonstrate the intermediate steps listed above, a data set specification (configuration) is required. For this purpose a small subset of gridded surface ensemble forecast data is used.
cache = "_cache": Enables GRIB index caching, which is useful if the same GRIB index file has to be accessed multiple times (as in this article).
library("eupp")
(conf <- eupp_config(product = "forecast",                    # forecasts
                     type    = "ens",                         # ensemble forecasts
                     level   = "surface",                     # surface fields
                     date    = c("2017-05-05", "2017-06-05"), # 'random' dates; ISO YYYY-mm-dd
                     cache   = "_cache"))                     # enable caching
## EUPP Config
## Product: forecast (fcs)
## Level: surface
## Type: ens
## Date(s): 2017-05-05,2017-06-05
## Parameter: all available
## Steps: all available
## Members: all available
## Version: 0
## Cache: _cache
## Area: not defined
So far, an R object of class eupp_config has been created, which is used further down the pipeline to process the request.
The next step is to determine the URL(s) of the file(s) which have to be accessed to process the request. This is done by the function eupp_get_source_urls().
# Required GRIB index files:
eupp_get_source_urls(conf, fileext = "index")
## [1] "https://storage.ecmwf.europeanweather.cloud/eumetnet-postprocessing-benchmark-training-dataset/data/fcs/surf/EU_forecast_ctr_surf_params_2017-05_0.grb.index"
## [2] "https://storage.ecmwf.europeanweather.cloud/eumetnet-postprocessing-benchmark-training-dataset/data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb.index"
## [3] "https://storage.ecmwf.europeanweather.cloud/eumetnet-postprocessing-benchmark-training-dataset/data/fcs/surf/EU_forecast_ctr_surf_params_2017-06_0.grb.index"
## [4] "https://storage.ecmwf.europeanweather.cloud/eumetnet-postprocessing-benchmark-training-dataset/data/fcs/surf/EU_forecast_ens_surf_params_2017-06-05_0.grb.index"
As shown above, four different files have to be accessed as we (i) are asking for forecasts issued on two different dates (date) and (ii) have not explicitly defined members, so we need both control run forecasts (handled as member = 0) and perturbed forecasts (members 1, 2, …).
When fileext is not defined (fileext = NULL; default), one gets the URLs of the corresponding GRIB files for direct access.
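The four index files listed above follow a visible naming pattern: control run files are organized by month, ensemble files by initialization date. As a rough illustration, the file names can be rebuilt from the two dates. The sprintf() patterns below are inferred from the printed URLs and are an assumption, not necessarily the templates eupp uses internally.

```r
# Dates from the configuration above
dates <- as.Date(c("2017-05-05", "2017-06-05"))

# Assumed file name patterns (inferred from the URLs printed above):
# control run files are monthly, ensemble files are per initialization date
ctr <- sprintf("EU_forecast_ctr_surf_params_%s_0.grb.index", format(dates, "%Y-%m"))
ens <- sprintf("EU_forecast_ens_surf_params_%s_0.grb.index", format(dates, "%Y-%m-%d"))

# Interleave control/ensemble per date: 2 dates x 2 file types = 4 files
index_files <- as.vector(rbind(ctr, ens))
print(index_files)
```

Two dates times two file types (control run and perturbed members) yields the four files.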
In this scenario we assume no prior knowledge of what is available. To find out, the configuration conf from above can be used to get a complete list of all messages in the GRIB index files listed above by calling eupp_get_inventory().
eupp_get_inventory() internally calls eupp_get_source_urls(..., fileext = "index"), downloads the index files (line-wise JSON strings), parses them, and combines them into an object of class c("eupp_inventory", "data.frame") (a basic data.frame; no dedicated S3 methods so far).
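To make the parsing step more concrete, here is a minimal sketch of turning line-wise JSON strings into a data.frame. It assumes the jsonlite package is installed; the two JSON lines are made up for illustration (field names mimic the inventory columns shown in this article), and the code is not the parser eupp actually uses.

```r
library("jsonlite")

# Two made-up index lines (one JSON object per line); illustrative only
lines <- c(
  '{"domain": "g", "levtype": "sfc", "param": "2t", "offset": 253676400, "length": 23412}',
  '{"domain": "g", "levtype": "sfc", "param": "10u", "offset": 253699920, "length": 23412}'
)

# Parse each line into a one-row data.frame, then stack the rows
rows <- lapply(lines, function(x) as.data.frame(fromJSON(x), stringsAsFactors = FALSE))
inv_demo <- do.call(rbind, rows)

# Prepend a custom class while keeping data.frame behaviour
class(inv_demo) <- c("eupp_inventory_demo", "data.frame")
print(inv_demo)
```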
# Getting inventory (based on `conf` from above)
inv <- eupp_get_inventory(conf)
class(inv)
## [1] "eupp_inventory" "data.frame"
dim(inv)
## [1] 292740 17
head(inv)
## path domain levtype
## 11001 data/fcs/surf/EU_forecast_ctr_surf_params_2017-05_0.grb g sfc
## 11002 data/fcs/surf/EU_forecast_ctr_surf_params_2017-05_0.grb g sfc
## 11003 data/fcs/surf/EU_forecast_ctr_surf_params_2017-05_0.grb g sfc
## 11004 data/fcs/surf/EU_forecast_ctr_surf_params_2017-05_0.grb g sfc
## 11005 data/fcs/surf/EU_forecast_ctr_surf_params_2017-05_0.grb g sfc
## 11006 data/fcs/surf/EU_forecast_ctr_surf_params_2017-05_0.grb g sfc
## step_char param class type stream expver leg_number offset length
## 11001 0 2t od cf enfo 0001 1 253676400 23412
## 11002 0 10u od cf enfo 0001 1 253699920 23412
## 11003 0 10v od cf enfo 0001 1 253723440 23412
## 11004 0 tcc od cf enfo 0001 1 253746960 23412
## 11005 0 tp od cf enfo 0001 1 253770480 23412
## 11006 0 100u od cf enfo 0001 1 253794000 23412
## param_id number init step valid
## 11001 167 0 2017-05-05 0 2017-05-05
## 11002 165 0 2017-05-05 0 2017-05-05
## 11003 166 0 2017-05-05 0 2017-05-05
## 11004 164 0 2017-05-05 0 2017-05-05
## 11005 228 0 2017-05-05 0 2017-05-05
## 11006 228246 0 2017-05-05 0 2017-05-05
As cache is enabled, the resulting data.frame is stored in R's RDS file format in the cache folder, using an MD5 checksum of the original URL to keep track of its origin. When downloading another set of data stored in the same GRIB file (thus, same GRIB index file), the cached file is used, which can significantly improve performance.
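The caching idea can be sketched with base R only. Both helper functions below (md5_string() and cached_fetch()) are hypothetical and written for this illustration; the actual file naming and logic inside eupp may differ.

```r
# Hypothetical helper: MD5 checksum of a string using base R only
# (tools::md5sum() works on files, so the string is written to a temp file)
md5_string <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(x), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical helper: return the cached result if present, else fetch and store it
cached_fetch <- function(url, cache_dir, fetch) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  rds <- file.path(cache_dir, paste0(md5_string(url), ".rds"))
  if (file.exists(rds)) return(readRDS(rds))
  res <- fetch(url)
  saveRDS(res, rds)
  res
}

# Usage with a dummy 'fetch' function standing in for the actual download
cache_dir <- file.path(tempdir(), "_cache_demo")
unlink(cache_dir, recursive = TRUE)   # start with an empty cache
n_calls <- 0
dummy_fetch <- function(url) {
  n_calls <<- n_calls + 1             # count how often we really "download"
  data.frame(url = url, stringsAsFactors = FALSE)
}
url <- "https://example.org/some_file.grb.index"
a <- cached_fetch(url, cache_dir, dummy_fetch)  # fetched and stored
b <- cached_fetch(url, cache_dir, dummy_fetch)  # read from cache
```

The second call does not invoke dummy_fetch() again as the RDS file named after the checksum already exists.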
The object returned contains the path of the GRIB file (not the full URL) alongside a series of additional fields which differ between products. This inventory tells us that the following parameters (param), steps (step), and ensemble members (number; perturbation number) are available.
unique(inv$param)
## [1] "2t" "10u" "10v" "tcc" "tp" "100u" "100v" "cape" "stl1"
## [10] "sshf" "slhf" "tcw" "tcwv" "swvl1" "ssr" "str" "sd" "cp"
## [19] "cin" "ssrd" "strd" "vis" "10fg6" "mn2t6" "mx2t6"
unique(inv$step)
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [19] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## [37] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
## [55] 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
## [73] 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
## [91] 90 93 96 99 102 105 108 111 114 117 120 123 126 129 132 135 138 141
## [109] 144 150 156 162 168 174 180 186 192 198 204 210 216 222 228 234 240
unique(inv$number)
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
## [26] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
## [51] 50
The full URL to the GRIB files can be constructed from inv$path and $BASEURL from eupp:::eupp_get_url_config() (can be redefined using the system environment variable EUPP_BASEURL). eupp:::eupp_get_url_config() returns not only the BASEURL but also a series of template strings for the different files in the bucket.
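As a small sketch of this URL construction: the default base URL below is an assumption read off the URLs printed earlier in this article, and the environment variable lookup only mimics the described behaviour rather than calling eupp itself.

```r
# Assumed default base URL (taken from the URLs printed earlier in this article)
default_baseurl <- "https://storage.ecmwf.europeanweather.cloud/eumetnet-postprocessing-benchmark-training-dataset"

# Use the EUPP_BASEURL environment variable if set, else the assumed default
baseurl <- Sys.getenv("EUPP_BASEURL", unset = default_baseurl)

# Path as stored in inv$path
path <- "data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb"

# Full URL of the GRIB file and of its index file
grib_url  <- paste(baseurl, path, sep = "/")
index_url <- paste0(grib_url, ".index")
print(c(grib_url, index_url))
```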
A more practical approach is to define the data set configuration more explicitly (as we now know what is needed). Given that cache was used above, the GRIB index file should be loaded from disk within a few seconds.
library("eupp")
(conf <- eupp_config(product   = "forecast",
                     type      = "ens",
                     level     = "surface",
                     date      = c("2017-05-05", "2017-06-05"),
                     parameter = c("tp", "sd"),         # total precipitation + snow depth
                     steps     = seq(13, 15, by = 2L),  # +13 and +15 hour ahead forecast
                     members   = c(10, 14),             # perturbation 10 and 14 (why not)
                     cache     = "_cache"))             # use caching
## EUPP Config
## Product: forecast (fcs)
## Level: surface
## Type: ens
## Date(s): 2017-05-05,2017-06-05
## Parameter: tp, sd
## Steps: 13, 15
## Members: 10, 14
## Version: 0
## Cache: _cache
## Area: not defined
Getting the required part of the inventory given the configuration above:
(inv <- eupp_get_inventory(conf))
## path domain
## 14503 data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb g
## 14515 data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb g
## 14591 data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb g
## 14603 data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb g
## 16703 data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb g
## 16715 data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb g
## 16791 data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb g
## 16803 data/fcs/surf/EU_forecast_ens_surf_params_2017-05-05_0.grb g
## 158003 data/fcs/surf/EU_forecast_ens_surf_params_2017-06-05_0.grb g
## 158015 data/fcs/surf/EU_forecast_ens_surf_params_2017-06-05_0.grb g
## 158091 data/fcs/surf/EU_forecast_ens_surf_params_2017-06-05_0.grb g
## 158103 data/fcs/surf/EU_forecast_ens_surf_params_2017-06-05_0.grb g
## 160203 data/fcs/surf/EU_forecast_ens_surf_params_2017-06-05_0.grb g
## 160215 data/fcs/surf/EU_forecast_ens_surf_params_2017-06-05_0.grb g
## 160291 data/fcs/surf/EU_forecast_ens_surf_params_2017-06-05_0.grb g
## 160303 data/fcs/surf/EU_forecast_ens_surf_params_2017-06-05_0.grb g
## levtype step_char param class type stream expver number leg_number
## 14503 sfc 13 tp od pf enfo 0001 10 1
## 14515 sfc 13 sd od pf enfo 0001 10 1
## 14591 sfc 13 tp od pf enfo 0001 14 1
## 14603 sfc 13 sd od pf enfo 0001 14 1
## 16703 sfc 15 tp od pf enfo 0001 10 1
## 16715 sfc 15 sd od pf enfo 0001 10 1
## 16791 sfc 15 tp od pf enfo 0001 14 1
## 16803 sfc 15 sd od pf enfo 0001 14 1
## 158003 sfc 13 tp od pf enfo 0001 10 1
## 158015 sfc 13 sd od pf enfo 0001 10 1
## 158091 sfc 13 tp od pf enfo 0001 14 1
## 158103 sfc 13 sd od pf enfo 0001 14 1
## 160203 sfc 15 tp od pf enfo 0001 10 1
## 160215 sfc 15 sd od pf enfo 0001 10 1
## 160291 sfc 15 tp od pf enfo 0001 14 1
## 160303 sfc 15 sd od pf enfo 0001 14 1
## offset length param_id init step valid
## 14503 334505760 23412 228 2017-05-05 13 2017-05-05 13:00:00
## 14515 334788000 35036 141 2017-05-05 13 2017-05-05 13:00:00
## 14591 336534960 23412 228 2017-05-05 13 2017-05-05 13:00:00
## 14603 336817200 35036 141 2017-05-05 13 2017-05-05 13:00:00
## 16703 385237320 23412 228 2017-05-05 15 2017-05-05 15:00:00
## 16715 385519560 35036 141 2017-05-05 15 2017-05-05 15:00:00
## 16791 387266280 23412 228 2017-05-05 15 2017-05-05 15:00:00
## 16803 387548520 35036 141 2017-05-05 15 2017-05-05 15:00:00
## 158003 335894520 23412 228 2017-06-05 13 2017-06-05 13:00:00
## 158015 336176760 35036 141 2017-06-05 13 2017-06-05 13:00:00
## 158091 337931640 23412 228 2017-06-05 13 2017-06-05 13:00:00
## 158103 338213880 35036 141 2017-06-05 13 2017-06-05 13:00:00
## 160203 386824320 23412 228 2017-06-05 15 2017-06-05 15:00:00
## 160215 387106560 35036 141 2017-06-05 15 2017-06-05 15:00:00
## 160291 388860960 23412 228 2017-06-05 15 2017-06-05 15:00:00
## 160303 389143200 35036 141 2017-06-05 15 2017-06-05 15:00:00
dim(inv)
## [1] 16 17
The number of observations (rows) in inv matches our expectation, as we are asking for (i) two different initialization dates, (ii) two parameters, (iii) two forecast steps (lead times), and (iv) two different members (\(2^4 = 16\)).
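The count of 16 can be double-checked by enumerating all combinations of the four request dimensions. This sketch uses base R only and the values from the configuration above; it does not touch the inventory itself.

```r
# All combinations of dates, parameters, steps, and members from the configuration
combos <- expand.grid(
  date   = c("2017-05-05", "2017-06-05"),
  param  = c("tp", "sd"),
  step   = c(13, 15),
  number = c(10, 14),
  stringsAsFactors = FALSE
)
nrow(combos)
```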
The data sets can be retrieved in three different formats which, however, build on each other (top down):
1. GRIB version 1 (requires curl/rcurl)
2. NetCDF (requires ecCodes)
3. stars (requires the stars package plus ecCodes)
Given the inventory above, the eupp package first downloads segments of the original GRIB file via curl byte range requests. The result is stored in one GRIB file. If this is what has been requested by the user, that's it (1). If the user requests a NetCDF file, the GRIB file is stored temporarily and then converted to NetCDF (the console tool grib_set is used to perform some ensemble-related manipulations; the file is then converted to NetCDF via grib_to_netcdf). When asking for stars objects we go through the two steps above before reading the data sets via read_stars() (stars package). The conversion GRIB > NetCDF > stars is required to do some naming manipulation.
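To illustrate the byte range idea, the offset and length columns of the inventory translate into HTTP range specifications as sketched below. The values are the first two rows of the inventory shown above; how eupp builds and sends its requests internally is not shown here, so treat this as an assumption about the general technique.

```r
# Offsets and lengths of the first two messages from the inventory shown above
msgs <- data.frame(offset = c(334505760, 334788000),
                   length = c(23412, 35036))

# HTTP byte ranges are inclusive: first byte = offset, last byte = offset + length - 1
msgs$range <- sprintf("bytes=%.0f-%.0f", msgs$offset, msgs$offset + msgs$length - 1)
print(msgs$range)
```

Each string identifies exactly one GRIB message within the remote file, so only the requested messages have to be transferred.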
Download data as GRIB Version 1:
eupp_download_gridded(conf, "my_new_file.grib", "grib")
Download data and store as NetCDF:
eupp_download_gridded(conf, "my_new_file.nc", "nc")
The argument netcdf_kind can be used to control the -k flag when calling grib_to_netcdf (defaults to 3); see the eupp_download_gridded() and grib_to_netcdf manuals.
Getting data as stars object:
x <- eupp_get_gridded(conf)
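The GRIB to NetCDF conversion step described above can be sketched as a call to the ecCodes command line tool. The file names are hypothetical, the preceding grib_set step is omitted, and the command is only executed if grib_to_netcdf and the input file are actually available; this is an illustration, not the code used inside eupp.

```r
# Hypothetical file names for illustration
grib_file   <- "my_new_file.grib"
nc_file     <- "my_new_file.nc"
netcdf_kind <- 3   # passed to the -k flag

# Arguments for the ecCodes tool: -k <kind> -o <output file> <input file>
args <- c("-k", as.character(netcdf_kind), "-o", nc_file, grib_file)

# Only run the conversion if the tool and the input file are available
if (nzchar(Sys.which("grib_to_netcdf")) && file.exists(grib_file)) {
  status <- system2("grib_to_netcdf", args)
} else {
  message("Would run: grib_to_netcdf ", paste(args, collapse = " "))
}
```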