This page shows how to import observational data so that they can be used with foehnix. foehnix uses zoo objects (time series objects) to handle the input data. This page is not intended to be a complete manual or introduction to zoo! The zoo package itself comes with a range of vignettes, including frequently asked questions and detailed introductory material. If you are not familiar with the package and/or the following examples are not enough to get started, please visit the zoo package website for more information.
Imagine you have a CSV file which looks as follows:
## "date_time","dd","ff","rh","t"
## "2006-01-01 01:00:00",171,0.6,90,-0.4
## "2006-01-01 02:00:00",268,0.3,100,-1.8
## "2006-01-01 03:00:00",NA,NA,79,0.9
## "2006-01-01 04:00:00",152,2.1,88,-0.6
## "2006-01-01 05:00:00",319,0.7,100,-2.6
## "2006-01-01 06:00:00",36,0.1,99,-1.7
## "2006-01-01 07:00:00",338,1,100,-2.1
## "2006-01-01 08:00:00",324,1.1,100,-2.7
## "2006-01-01 09:00:00",303,0.2,100,-2.6
The output shows the first 10 rows (header and 9 data rows) of the demo data set “ellboegen_A.csv” with hourly observations. This is the “default R CSV file format” (see write.csv) where the values are comma-separated, strings are quoted, missing values are written as NA, and no unnecessary blanks are added to the file. The columns contain:
- date_time: date and time information in the format YYYY-mm-dd HH:MM:SS
- dd: wind direction in degrees
- ff: wind speed in meters per second
- rh: relative humidity
- t: 2m air temperature
To import the data set and convert the data into a zoo object, all we have to do is call the function zoo::read.zoo like this:
# Load package 'zoo'
library("zoo")
# Import the data set
data <- read.zoo("../pkgdown/data/ellboegen_A.csv", format = "%Y-%m-%d %H:%M:%S",
tz = "UTC", sep = ",", header = TRUE)
The call read.zoo(...) loads the demo data set (here ../pkgdown/data/ellboegen_A.csv) and specifies the date/time format (format; read.zoo expects this information in the first column by default), the time zone (tz), the separator used in the CSV file (sep), and that the file has a header line (header = TRUE). Internally, read.zoo calls read.table, tries to extract the date/time information, and creates a zoo object.
head(data, n = 4)
## dd ff rh t
## 2006-01-01 01:00:00 171 0.6 90 -0.4
## 2006-01-01 02:00:00 268 0.3 100 -1.8
## 2006-01-01 03:00:00 NA NA 79 0.9
## 2006-01-01 04:00:00 152 2.1 88 -0.6
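To try the import without the demo file, the same call can be fed from a character string. The following is a minimal sketch, assuming the `text` argument of read.zoo; the variable names `csv` and `x` are only illustrative, and the three rows are copied from the file preview above.

```r
library("zoo")

# Inline stand-in for the first rows of a file such as ellboegen_A.csv
csv <- '"date_time","dd","ff","rh","t"
"2006-01-01 01:00:00",171,0.6,90,-0.4
"2006-01-01 02:00:00",268,0.3,100,-1.8
"2006-01-01 03:00:00",NA,NA,79,0.9'

# Same arguments as for the file, but reading from 'text'
x <- read.zoo(text = csv, format = "%Y-%m-%d %H:%M:%S",
              tz = "UTC", sep = ",", header = TRUE)

class(x)         # the resulting object is of class "zoo"
class(index(x))  # the time index is a POSIXct vector
colnames(x)      # the date_time column became the index and is no longer a data column
```

Inspecting the index class is a quick way to verify that the date/time column was parsed as date/time rather than kept as character strings.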
In case a crest station is available, we need to combine observations from two stations, in this case two demo data sets for Ellbögen (ellboegen_A.csv) and Sattelberg (sattelberg_A.csv; our crest station). Both files are in the very same format. Thus, we can load both data sets as follows:
# Load package 'zoo' if not already loaded
library("zoo")
# Import data set 'Ellboegen'
ell <- read.zoo("../pkgdown/data/ellboegen_A.csv", format = "%Y-%m-%d %H:%M:%S",
tz = "UTC", sep = ",", header = TRUE)
# Import data set 'Sattelberg'
sat <- read.zoo("../pkgdown/data/sattelberg_A.csv", format = "%Y-%m-%d %H:%M:%S",
tz = "UTC", sep = ",", header = TRUE)
All we have to do is combine the two objects ell and sat, which can be done using merge (zoo provides the method merge.zoo for zoo objects). merge automatically takes care that the two time series are matched properly (conditional on date/time). Calling data <- merge(ell, sat) followed by head(data, n = 8) yields:
## dd.ell ff.ell rh.ell t.ell dd.sat ff.sat rh.sat t.sat
## 2006-01-01 01:00:00 171 0.6 90 -0.4 NA NA NA NA
## 2006-01-01 02:00:00 268 0.3 100 -1.8 NA NA NA NA
## 2006-01-01 03:00:00 NA NA 79 0.9 NA NA NA NA
## 2006-01-01 04:00:00 152 2.1 88 -0.6 NA NA NA NA
## 2006-01-01 05:00:00 319 0.7 100 -2.6 176 13.1 100 -7.1
## 2006-01-01 06:00:00 36 0.1 99 -1.7 184 10.0 100 -6.9
## 2006-01-01 07:00:00 338 1.0 100 -2.1 188 7.2 100 -6.6
## 2006-01-01 08:00:00 324 1.1 100 -2.7 194 5.8 100 -6.6
By default, missing data are filled with NA (missing value). As the demo data set for station Sattelberg starts four hours later than the one for Ellbögen, the first four rows for sat (01:00:00 to 04:00:00) are empty. As the variable names in both files are identical, R automatically appends .ell or .sat to the original variable names (columns in the CSV file). If we would like to have nicer names, we can set them manually, e.g.:
# Rename the variables in 'sat'
names(sat) <- paste("crest", names(sat), sep = "_")
# Show new names
names(sat)
## [1] "crest_dd" "crest_ff" "crest_rh" "crest_t"
Then combine the data sets once again with data <- merge(ell, sat), which overwrites data; head(data, n = 8) now shows:
## dd ff rh t crest_dd crest_ff crest_rh crest_t
## 2006-01-01 01:00:00 171 0.6 90 -0.4 NA NA NA NA
## 2006-01-01 02:00:00 268 0.3 100 -1.8 NA NA NA NA
## 2006-01-01 03:00:00 NA NA 79 0.9 NA NA NA NA
## 2006-01-01 04:00:00 152 2.1 88 -0.6 NA NA NA NA
## 2006-01-01 05:00:00 319 0.7 100 -2.6 176 13.1 100 -7.1
## 2006-01-01 06:00:00 36 0.1 99 -1.7 184 10.0 100 -6.9
## 2006-01-01 07:00:00 338 1.0 100 -2.1 188 7.2 100 -6.6
## 2006-01-01 08:00:00 324 1.1 100 -2.7 194 5.8 100 -6.6
And that’s it. This object (data) could now be used as input for the foehnix method.
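The merging behaviour described above can be reproduced with two small hand-made series. This is a minimal sketch with hypothetical objects `valley` and `crest` (not the demo data), where the crest series starts two hours later than the valley series.

```r
library("zoo")

# Hypothetical hourly time stamps; the crest series covers only the last two hours
t_valley <- as.POSIXct("2006-01-01 01:00:00", tz = "UTC") + 3600 * 0:3
t_crest  <- t_valley[3:4]

valley <- zoo(cbind(ff = c(0.6, 0.3, NA, 2.1), t = c(-0.4, -1.8, 0.9, -0.6)), t_valley)
crest  <- zoo(cbind(ff = c(13.1, 10.0), t = c(-7.1, -6.9)), t_crest)

# Identical column names: merge() appends the object names as suffixes
names(merge(valley, crest))

# Renaming the crest columns first gives more readable names
names(crest) <- paste("crest", names(crest), sep = "_")
both <- merge(valley, crest)

# Time steps without crest observations are filled with NA
both
```

Renaming before merging keeps the valley station's names untouched, so the merged object has the same column layout as the combined demo data shown above.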
The next demo data set contains the very same observations as the data set above; however, there are distinct differences in the format of the CSV file (see ellboegen_B.csv, sattelberg_B.csv):
## dd ff rh t date_time
## 171 0.6 90 -0.4 20060101010000
## 268 0.3 100 -1.8 20060101020000
## missing missing 79 0.9 20060101030000
## 152 2.1 88 -0.6 20060101040000
## 319 0.7 100 -2.6 20060101050000
## 36 0.1 99 -1.7 20060101060000
## 338 1.0 100 -2.1 20060101070000
## 324 1.1 100 -2.7 20060101080000
## 303 0.2 100 -2.6 20060101090000
In contrast to ‘data set A’, the file solely contains numeric values, except for the missing values (missing). The date/time information is coded as integer (YYYYmmddHHMMSS; last column) and there is no explicit column separator (columns are separated by one or multiple blanks).
To be able to import the data set we have to specify the format. In contrast to ‘data set A’ we need:
- format and FUN to convert the integers (column date_time) into POSIXt
- index.column = "date_time" to tell zoo where the date/time information is stored
- na.strings, which defines what the “missing values” in the CSV file look like
No sep argument is needed, as the default of read.table treats any white space as a separator. Overall we can read the file(s) like this:
# Load library (if not yet done)
library("zoo")
# Custom function to convert the integers (date_time) into POSIXct;
# the strings are parsed directly in the requested time zone
FUN <- function(x, format, tz, ...) as.POSIXct(strptime(sprintf("%.0f", x), format = format, tz = tz))
# Import data set
data <- read.zoo("../pkgdown/data/ellboegen_B.csv", format = "%Y%m%d%H%M%S", tz = "UTC",
FUN = FUN, index.column = "date_time",
header = TRUE, na.strings = "missing")
head(data, n = 3)
## dd ff rh t
## 2006-01-01 01:00:00 171 0.6 90 -0.4
## 2006-01-01 02:00:00 268 0.3 100 -1.8
## 2006-01-01 03:00:00 NA NA 79 0.9
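The conversion step performed by the custom function can also be looked at in isolation. The sketch below uses three made-up numeric time stamps in the YYYYmmddHHMMSS layout (the vector `x` and the names `chr` and `idx` are illustrative only) and shows the two steps: formatting the numbers as plain digit strings, then parsing them as date/time in UTC.

```r
# Numeric time stamps in the YYYYmmddHHMMSS layout of 'data set B'
x <- c(20060101010000, 20060101020000, 20060102000000)

# sprintf("%.0f", ...) yields plain digit strings without scientific notation
chr <- sprintf("%.0f", x)
chr

# Parse the strings directly in the target time zone
idx <- as.POSIXct(strptime(chr, format = "%Y%m%d%H%M%S", tz = "UTC"))
format(idx, "%Y-%m-%d %H:%M:%S")
```

Going through sprintf() first matters because strptime() works on character strings; formatting with "%.0f" ensures that every time stamp is handed over as a full 14-digit string.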
We can do the very same for the second data set (Sattelberg) and combine the data from Sattelberg and Ellbögen:
# Loading library 'zoo'
library("zoo")
# User-defined function to convert date/time information;
# the strings are parsed directly in the requested time zone
FUN <- function(x, format, tz, ...) as.POSIXct(strptime(sprintf("%.0f", x), format = format, tz = tz))
# Read ellboegen data set
ell <- read.zoo("../pkgdown/data/ellboegen_B.csv", format = "%Y%m%d%H%M%S", tz = "UTC",
FUN = FUN, index.column = "date_time",
header = TRUE, na.strings = "missing")
sat <- read.zoo("../pkgdown/data/sattelberg_B.csv", format = "%Y%m%d%H%M%S", tz = "UTC",
FUN = FUN, index.column = "date_time",
header = TRUE, na.strings = "missing")
# Rename columns in 'sat'
names(sat) <- paste("crest", names(sat), sep = "_")
# Combine
data <- merge(ell, sat)
# Show first 8 entries
head(data, n = 8)
## dd ff rh t crest_dd crest_ff crest_rh crest_t
## 2006-01-01 01:00:00 171 0.6 90 -0.4 NA NA NA NA
## 2006-01-01 02:00:00 268 0.3 100 -1.8 NA NA NA NA
## 2006-01-01 03:00:00 NA NA 79 0.9 NA NA NA NA
## 2006-01-01 04:00:00 152 2.1 88 -0.6 NA NA NA NA
## 2006-01-01 05:00:00 319 0.7 100 -2.6 176 13.1 100 -7.1
## 2006-01-01 06:00:00 36 0.1 99 -1.7 184 10.0 100 -6.9
## 2006-01-01 07:00:00 338 1.0 100 -2.1 188 7.2 100 -6.6
## 2006-01-01 08:00:00 324 1.1 100 -2.7 194 5.8 100 -6.6
The function zoo::read.zoo has a wide range of arguments (see also ?read.table) which allow importing many different file formats. If you need more information, please visit the zoo package page on CRAN, where you can find manuals and vignettes with more details about the zoo package and how to import/create zoo time series objects in R.
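As one more illustration of this flexibility, the sketch below reads a hypothetical third layout that is not part of the demo data: semicolon-separated columns, decimal commas, and a day-first date format. It assumes that extra arguments such as dec are passed on to read.table; the string `txt` is made up for this example.

```r
library("zoo")

# Hypothetical file content: ';' as separator, ',' as decimal mark, day-first dates
txt <- "date_time;ff;t
01.01.2006 01:00;0,6;-0,4
01.01.2006 02:00;0,3;-1,8"

# 'sep' and 'dec' are handed over to read.table, 'format' describes the date layout
x <- read.zoo(text = txt, format = "%d.%m.%Y %H:%M", tz = "UTC",
              sep = ";", dec = ",", header = TRUE)
x
```

Only the arguments describing the file layout change; the resulting zoo object has the same structure as in the examples above.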