This page shows how to import observational data so that they can be used with foehnix. foehnix uses zoo objects (time series objects) to handle the input data. This page is not intended to be a complete manual or introduction to zoo. The zoo package itself comes with a range of vignettes, including frequently asked questions and detailed introductory material. If you are not familiar with the package, or if the following examples are not enough to get started, please visit the zoo package website for more information.

Example A

Imagine you have a CSV file which looks as follows:

## "date_time","dd","ff","rh","t"
## "2006-01-01 01:00:00",171,0.6,90,-0.4
## "2006-01-01 02:00:00",268,0.3,100,-1.8
## "2006-01-01 03:00:00",NA,NA,79,0.9
## "2006-01-01 04:00:00",152,2.1,88,-0.6
## "2006-01-01 05:00:00",319,0.7,100,-2.6
## "2006-01-01 06:00:00",36,0.1,99,-1.7
## "2006-01-01 07:00:00",338,1,100,-2.1
## "2006-01-01 08:00:00",324,1.1,100,-2.7
## "2006-01-01 09:00:00",303,0.2,100,-2.6

The output shows the first 10 lines (header and 9 data rows) of the demo data set “ellboegen_A.csv” with hourly observations. This is the “default R CSV file format” (see write.csv): values are comma-separated, strings are quoted, missing values are written as NA, and no unnecessary blanks are added to the file. The columns contain:

  • date_time: a column with date and time information (format YYYY-mm-dd HH:MM:SS)
  • dd: wind direction in degrees
  • ff: wind speed in meters per second
  • rh: relative humidity
  • t: 2m air temperature

Import data

To import the data set and convert the data into zoo, all we have to do is to use the function zoo::read.zoo like this:

# Load package 'zoo'
library("zoo")
# Import the data set
data <- read.zoo("../pkgdown/data/ellboegen_A.csv", format = "%Y-%m-%d %H:%M:%S",
                 tz = "UTC", sep = ",", header = TRUE)

The call read.zoo(...) loads the demo data set (here ../pkgdown/data/ellboegen_A.csv), specifies the date/time format (format; read.zoo expects this information in the first column by default), a time zone argument (tz), the separator in the CSV file (sep), and that we do have a header line (header = TRUE). read.zoo calls read.table, tries to extract the date/time information, and creates a zoo object.

head(data, n = 4)
##                      dd  ff  rh    t
## 2006-01-01 01:00:00 171 0.6  90 -0.4
## 2006-01-01 02:00:00 268 0.3 100 -1.8
## 2006-01-01 03:00:00  NA  NA  79  0.9
## 2006-01-01 04:00:00 152 2.1  88 -0.6
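
To check that an import worked as intended, it can help to look at the class of the object and of its time index. The following is a minimal, self-contained sketch: it writes a few rows in the layout of ellboegen_A.csv to a temporary file (the file and the object name x are only used for illustration) and reads them back with the same read.zoo call as above.

```r
library("zoo")

# Hypothetical mini file in the same layout as ellboegen_A.csv
csv <- tempfile(fileext = ".csv")
writeLines(c(
  '"date_time","dd","ff","rh","t"',
  '"2006-01-01 01:00:00",171,0.6,90,-0.4',
  '"2006-01-01 02:00:00",268,0.3,100,-1.8',
  '"2006-01-01 03:00:00",NA,NA,79,0.9'
), csv)

# Same arguments as used for the demo data set
x <- read.zoo(csv, format = "%Y-%m-%d %H:%M:%S",
              tz = "UTC", sep = ",", header = TRUE)

class(x)         # "zoo"
class(index(x))  # "POSIXct" "POSIXt"
dim(x)           # 3 rows (time steps), 4 columns (variables)

# TRUE if all time steps are equally spaced (here: hourly)
is.regular(x, strict = TRUE)
```

If the index is not of class POSIXct, the format or tz argument most likely does not match the date/time column in the file.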

Import and combine data

In case a crest station is available, we need to combine observations from two stations, in this case two demo data sets for Ellbögen (ellboegen_A.csv) and Sattelberg (sattelberg_A.csv; our crest station). Both files are in the very same format. Thus, we can load both data sets as follows:

# Load package 'zoo' if not already loaded
library("zoo")
# Import data set 'Ellboegen'
ell <- read.zoo("../pkgdown/data/ellboegen_A.csv", format = "%Y-%m-%d %H:%M:%S",
                tz = "UTC", sep = ",", header = TRUE)
# Import data set 'Sattelberg'
sat <- read.zoo("../pkgdown/data/sattelberg_A.csv", format = "%Y-%m-%d %H:%M:%S",
                tz = "UTC", sep = ",", header = TRUE)

All we have to do is combine the two objects ell and sat, which can be done using merge (zoo provides a merge method for zoo objects, merge.zoo). merge automatically takes care that the two time series are matched properly (conditional on date/time).

# Combine data
data <- merge(ell, sat)
head(data, n = 8)
##                     dd.ell ff.ell rh.ell t.ell dd.sat ff.sat rh.sat t.sat
## 2006-01-01 01:00:00    171    0.6     90  -0.4     NA     NA     NA    NA
## 2006-01-01 02:00:00    268    0.3    100  -1.8     NA     NA     NA    NA
## 2006-01-01 03:00:00     NA     NA     79   0.9     NA     NA     NA    NA
## 2006-01-01 04:00:00    152    2.1     88  -0.6     NA     NA     NA    NA
## 2006-01-01 05:00:00    319    0.7    100  -2.6    176   13.1    100  -7.1
## 2006-01-01 06:00:00     36    0.1     99  -1.7    184   10.0    100  -6.9
## 2006-01-01 07:00:00    338    1.0    100  -2.1    188    7.2    100  -6.6
## 2006-01-01 08:00:00    324    1.1    100  -2.7    194    5.8    100  -6.6

By default, missing data are filled with NA (missing value). As the demo data set for station Sattelberg starts four hours later than the one for Ellbögen, the first four rows for sat (01:00:00 to 04:00:00) are empty. As the variable names in both files are identical, R automatically appends .ell or .sat to the original variable names (columns in the CSV file). If we would like to have nicer names, we can set them manually, e.g.:

# Rename the variables in 'sat'
names(sat) <- paste("crest", names(sat), sep = "_")
# Show new names
names(sat)
## [1] "crest_dd" "crest_ff" "crest_rh" "crest_t"

And combine the data set once again (overwrites data):

# Combine (again)
data <- merge(ell, sat)
head(data, n = 8)
##                      dd  ff  rh    t crest_dd crest_ff crest_rh crest_t
## 2006-01-01 01:00:00 171 0.6  90 -0.4       NA       NA       NA      NA
## 2006-01-01 02:00:00 268 0.3 100 -1.8       NA       NA       NA      NA
## 2006-01-01 03:00:00  NA  NA  79  0.9       NA       NA       NA      NA
## 2006-01-01 04:00:00 152 2.1  88 -0.6       NA       NA       NA      NA
## 2006-01-01 05:00:00 319 0.7 100 -2.6      176     13.1      100    -7.1
## 2006-01-01 06:00:00  36 0.1  99 -1.7      184     10.0      100    -6.9
## 2006-01-01 07:00:00 338 1.0 100 -2.1      188      7.2      100    -6.6
## 2006-01-01 08:00:00 324 1.1 100 -2.7      194      5.8      100    -6.6

And that’s it. This object (data) could now be used as input for the foehnix method.
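
The merging behaviour described above can be tried out on two small series without any files. The sketch below uses made-up values and only two variables per station; besides the default call it shows two arguments of merge.zoo, all and suffixes, which are not used elsewhere on this page.

```r
library("zoo")

# Two small hourly series (made-up values); 'sat' starts two hours later
t_ell <- as.POSIXct("2006-01-01 01:00:00", tz = "UTC") + 3600 * 0:3
t_sat <- as.POSIXct("2006-01-01 03:00:00", tz = "UTC") + 3600 * 0:1
ell <- zoo(cbind(dd = c(171, 268, NA, 152), ff = c(0.6, 0.3, NA, 2.1)), t_ell)
sat <- zoo(cbind(dd = c(176, 184), ff = c(13.1, 10.0)), t_sat)

# Default: union of all time stamps, gaps are filled with NA
both_all <- merge(ell, sat)
nrow(both_all)      # 4

# all = FALSE: keep only time stamps present in both series
both_common <- merge(ell, sat, all = FALSE)
nrow(both_common)   # 2

# suffixes: choose the suffixes appended to duplicated column names
both_named <- merge(ell, sat, suffixes = c("valley", "crest"))
names(both_named)
```

Whether rows with missing crest observations should be kept depends on the analysis; the default (all = TRUE) keeps them, which matches the output shown above.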

Example B

The next demo data set contains the very same observations as the data set above; however, there are distinct differences in the format of the CSV file (see ellboegen_B.csv, sattelberg_B.csv):

## dd ff rh t date_time
##     171     0.6  90 -0.4 20060101010000
##     268     0.3 100 -1.8 20060101020000
## missing missing  79  0.9 20060101030000
##     152     2.1  88 -0.6 20060101040000
##     319     0.7 100 -2.6 20060101050000
##      36     0.1  99 -1.7 20060101060000
##     338     1.0 100 -2.1 20060101070000
##     324     1.1 100 -2.7 20060101080000
##     303     0.2 100 -2.6 20060101090000

In contrast to ‘data set A’, the file contains only numeric values, except for the missing values, which are coded as missing. The date/time information is coded as an integer (YYYYmmddHHMMSS; last column), and there is no explicit column separator (columns are separated by one or more blanks).

Import data

To be able to import the data set we have to describe its format. In contrast to ‘data set A’ we need:

  • a different date/time format (format)
  • a custom function FUN to convert the integers (column date_time) into POSIXct
  • an additional argument index.column = "date_time" to tell zoo where the date/time information is stored
  • an argument na.strings which defines what the “missing values” in the CSV file look like

Overall we can read the file(s) like this:

# Load library (if not yet done)
library("zoo")
# Custom function to convert the integers (date_time) into POSIXct.
# sprintf("%.0f", x) turns the number into a string without exponent.
FUN <- function(x, format, tz, ...) as.POSIXct(strptime(sprintf("%.0f", x), format, tz = tz))
# Import data set
data <- read.zoo("../pkgdown/data/ellboegen_B.csv", format = "%Y%m%d%H%M%S", tz = "UTC",
                 FUN = FUN, index.column = "date_time", 
                 header = TRUE, na.strings = "missing")
head(data, n = 3)
##                      dd  ff  rh    t
## 2006-01-01 01:00:00 171 0.6  90 -0.4
## 2006-01-01 02:00:00 268 0.3 100 -1.8
## 2006-01-01 03:00:00  NA  NA  79  0.9
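
To see what the conversion function does, it can be applied to a few values directly. The sketch below uses a stand-alone helper with the hypothetical name to_posixct and default arguments chosen for this example; it follows the same idea as FUN above. The values are too large for R's integer type, so they are stored as doubles, and sprintf("%.0f", ...) writes them out as plain digit strings before they are parsed.

```r
# Stand-alone helper for illustration (name and defaults are assumptions)
to_posixct <- function(x, format = "%Y%m%d%H%M%S", tz = "UTC") {
  as.POSIXct(strptime(sprintf("%.0f", x), format, tz = tz))
}

# Time stamps as they appear in the 'date_time' column
stamps <- c(20060101010000, 20060101000000)

# Plain digit strings, no exponent
sprintf("%.0f", stamps)
## [1] "20060101010000" "20060101000000"

# Converted to date/time in UTC
to_posixct(stamps)
```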

Import and combine

We can do the very same for the second data set (Sattelberg) and combine the data from Sattelberg and Ellbögen:

# Loading library 'zoo'
library("zoo")
# User-defined function to convert date/time information (integer to POSIXct)
FUN <- function(x, format, tz, ...) as.POSIXct(strptime(sprintf("%.0f", x), format, tz = tz))
# Read ellboegen data set
ell <- read.zoo("../pkgdown/data/ellboegen_B.csv", format = "%Y%m%d%H%M%S", tz = "UTC",
                FUN = FUN, index.column = "date_time", 
                header = TRUE, na.strings = "missing")
sat <- read.zoo("../pkgdown/data/sattelberg_B.csv", format = "%Y%m%d%H%M%S", tz = "UTC",
                FUN = FUN, index.column = "date_time", 
                header = TRUE, na.strings = "missing")
# Rename columns in 'sat'
names(sat) <- paste("crest", names(sat), sep = "_")
# Combine
data <- merge(ell, sat)
# Show first 8 entries
head(data, n = 8)
##                      dd  ff  rh    t crest_dd crest_ff crest_rh crest_t
## 2006-01-01 01:00:00 171 0.6  90 -0.4       NA       NA       NA      NA
## 2006-01-01 02:00:00 268 0.3 100 -1.8       NA       NA       NA      NA
## 2006-01-01 03:00:00  NA  NA  79  0.9       NA       NA       NA      NA
## 2006-01-01 04:00:00 152 2.1  88 -0.6       NA       NA       NA      NA
## 2006-01-01 05:00:00 319 0.7 100 -2.6      176     13.1      100    -7.1
## 2006-01-01 06:00:00  36 0.1  99 -1.7      184     10.0      100    -6.9
## 2006-01-01 07:00:00 338 1.0 100 -2.1      188      7.2      100    -6.6
## 2006-01-01 08:00:00 324 1.1 100 -2.7      194      5.8      100    -6.6

Other Formats

The function zoo::read.zoo has a wide range of arguments (see also ?read.table), which allows a wide range of file formats to be imported. If you need more information, please visit the zoo package page on CRAN, where you can find manuals and vignettes with more details about the zoo package and how to import/create zoo time series objects in R.
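
As one more illustration of how arguments are handed on to read.table, the sketch below reads a semicolon-separated file with decimal commas and a day-first date format. The file content is made up for this example and is not one of the foehnix demo data sets.

```r
library("zoo")

# Hypothetical file: ';' as separator, ',' as decimal mark, dd.mm.YYYY HH:MM
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "date_time;ff;t",
  "01.01.2006 01:00;0,6;-0,4",
  "01.01.2006 02:00;0,3;-1,8"
), csv)

# 'sep', 'dec' and 'header' are passed on to read.table
x <- read.zoo(csv, format = "%d.%m.%Y %H:%M", tz = "UTC",
              sep = ";", dec = ",", header = TRUE)
head(x)
```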