Evaluates Data Filter Rules for foehnix Mixture Moel Calls

foehnix models allow to specify an optional foehnix_filter. If a filter is given only a subset of the data set provided to foehnix is used for the foehn classification. A typical example is a wind direction filter such that only observations (times) are used where the observed wind direction was within a user defined wind sector corresponding to the wind direction during foehn events for a specific location. However, the filter option allows to even implement complex filter rules if required. The 'Details' section contains further information and examples how this filter rules can be used.

foehnix_filter(x, filter, cols = NULL)

# S3 method for class 'foehnix.filter'
print(x, ...)

Arguments

x: object of class zoo or data.frame containing the observations.
filter: can be NULL (no filter applied), a function operating on x, or a named list with a simple filter rule (numeric of length two) or custom filter functions. Details provided in the 'Details' section.
cols: NULL or a character vector containing the names in 'x' which are not allowed to contain missing values. If NULL all elements have to be non-missing.
...: currently unused.

Value

Returns a vector of integers corresponding to those rows in the data set x which fulfill all filte criteria. If input filter = NULL an integer vector 1:nrow(x) is returned.

Details

Foehn winds often (not always) show a very specific wind direction due to the canalization of the air flow trough the local topography. The foehnix_filter option allows to subset the data according to a user defined set of filters from simple filters to complex filters.

These filters classify each observation (each row in x) as good (within filter), bad (outside filter), and ugly (at least one variable required to apply the filter was NA).

No filter: If filter = NULL no filter will be applied and the whole data set provided is used to do the foehn classification (all observations will be treated as 'good').

Simple filter rules: The filter is a named list containing one or several numeric vectors of length 2 with finite numeric values. The name of the list element defines the column of the data set (input x), the numeric vector of length 2 the range which should be used to filter the data. This is the simplest option to apply the mentioned wind direction filter. Examples:

filter = list(dd = c(43, 223)): applies the filter to column x$dd. The filter classifies observations/rows as 'good' (within filter) if x$dd >= 43 & x$dd <= 223.
filter = list(dd = c(330, 30): similar to the filter rule above, allows to specify a wind sector going trough 0 (if dd is wind direction in degrees between [0, 360]). The filter classifies observations/rows as 'good' (within filter) if x$dd >= 330 | x$dd <= 30.
filter = list(dd = c(43, 223), crest_dd = c(90, 270): two filter rules, one for x$dd, one for x$crest_dd. The filter classifies observations/rows as 'good' (within filter) if x$dd >= 43 & x$dd <= 223 AND x$crest_dd >= 330 | x$crest_dd <= 30. If an observation/row does not fulfill one or the other rule the observation/row is classified as 'bad' (outside filter), if one of x$dd or x$crest_dd is NA the corresponding observation/row will be classified as 'ugly'.
Filters are not restricted to wind direction (as shown in the examples above)!

Custom filter functions: Instead of only providing a segment/sector defined by two finite numeric values (see 'Simple filter' above) a named list of functions can be provided. These functions DO HAVE TO return a vector of logical values (TRUE (good),FALSE (bad), or NA (ugly)) of length nrow{x}. If not, an error will be thrown. The function will be applied to the column specified by the name of the list element. Some examples:

filter = list(dd = function(x) x >= 43 & x <= 223): The function will be applied to x$dd. A vector with TRUE, FALSE, or NA is returned for each 1:nrow{x} which takes NA if is.na(x$dd), TRUE if x$dd >= 43 & x$dd <= 223 and FALSE else. Thus, this filter is the very same as the first example in the 'Simple filter' section above.
filter = list(ff = function(x) x > 2.0): Custom filter applied to column x$ff. A vector with TRUE, FALSE, and NA is returned for each observation 1:nrow{x} which takes NA if is.na(x$ff), TRUE if x$ff > 2.0, and FALSE else.
filter = list(ff = function(x) ..., dd = function(x) ...): two filter functions, one applied to x$ff, one to x$dd. Note that observations/rows will be classified as 'ugly' if one of the two filters returns NA. If no NA is returned the observation is classified as 'good' if both return TRUE, and as 'bad' (outside filter) if at least one returns FALSE.

Complex filters: If filter is a function this filter function will be applied to the full input object x. This allows to write functions of any complexity. As an example:

filter = function(x) (x$dd >= 43 & x$dd <= 223) & x$ff >= 2.0: Input x to the filter function is the object as provided to the foehnix_filter function (x). Thus, the different columns of the object can be accessed directly trough their names (e.g., x$dd, x$ff). A vector of length nrow(x) with TRUE, FALSE, and NA has to be returned. Only those classified as 'good' (TRUE) will be used for classification.

Author

Reto Stauffer

Examples

# Loading example data set and conver to zoo time series
# time series object (station Ellboegen).
ellboegen <- demodata("ellboegen")

# Case 1:
# -----------------
# Filter for observations where the wind direction is
# within 100 - 260 (southerly flow):
idx_south <- foehnix_filter(ellboegen, list(dd = c(100, 260)))
print(idx_south)
#> 
#> Foehnix Filter Object:
#> Call: foehnix_filter(x = ellboegen, filter = list(dd = c(100, 260)))
#> Total data set length:            105370
#> The good (within filter):          50072 (47.5 percent)
#> The bad (outside filter):          52576 (49.9 percent)
#> The ugly (NA; missing values):      2722 ( 2.6 percent)

# Same filter but for northerly flows, taking rows with
# wind direction observations (dd) smaller than 45 or
# larger than 315 degrees:
idx_north <- foehnix_filter(ellboegen, list(dd = c(315, 45)))
print(idx_north)
#> 
#> Foehnix Filter Object:
#> Call: foehnix_filter(x = ellboegen, filter = list(dd = c(315, 45)))
#> Total data set length:            105370
#> The good (within filter):          18143 (17.2 percent)
#> The bad (outside filter):          84505 (80.2 percent)
#> The ugly (NA; missing values):      2722 ( 2.6 percent)

par(mfrow = c(1,3))
hist(ellboegen$dd,                 xlab = "dd", main = "all observations")
hist(ellboegen$dd[idx_south$good], xlab = "dd", main = "southerly winds")
hist(ellboegen$dd[idx_north$good], xlab = "dd", main = "northerly winds")


# Case 2:
# -----------------
# A second useful option is to add two filters:
# the wind direction at the target station (here Ellboegen)
# has to be within c(43, 223), the wind direction at the
# corresponding crest station (upstream, crest of the European Alps)
# has to show southerly flows with a wind direction from
# 90 degrees (East) to 270 degrees (West).

# Loading combined demo data set
data <- demodata()

# Now apply a wind filter
my_filter  <- list(dd = c(43, 223), crest_dd = c(90, 270))
filter_obj <- foehnix_filter(data, my_filter)
print(filter_obj)
#> 
#> Foehnix Filter Object:
#> Call: foehnix_filter(x = data, filter = my_filter)
#> Total data set length:            108425
#> The good (within filter):          24236 (22.4 percent)
#> The bad (outside filter):          50538 (46.6 percent)
#> The ugly (NA; missing values):     33651 (31.0 percent)

# Subsetting the 'good' rows
data <- data[filter_obj$good,]

summary(subset(data, select = c(dd, crest_dd)))
#>      Index                              dd           crest_dd  
#>  Min.   :2006-01-01 01:00:00.00   Min.   : 43.0   Min.   : 90  
#>  1st Qu.:2008-05-24 03:45:00.00   1st Qu.:124.0   1st Qu.:176  
#>  Median :2011-03-03 22:30:00.00   Median :132.0   Median :181  
#>  Mean   :2011-12-01 13:06:51.30   Mean   :131.3   Mean   :182  
#>  3rd Qu.:2016-01-05 15:15:00.00   3rd Qu.:138.0   3rd Qu.:187  
#>  Max.   :2018-12-24 05:00:00.00   Max.   :223.0   Max.   :270