Statistical Model

The automated foehn classification foehnix is based on a two-component mixture model. The basic idea is that two unobservable components (or clusters) exist: one component for situations without foehn and one component for situations with foehn. foehnix uses an unsupervised statistical method to identify the two components based on a set of observed values (e.g., wind speed, gust speed, potential temperature differences) and to model the probability that a specific observation is related to a foehn event.

The statistical model consists of two parts: one part to identify the two components, and a second part modelling the probability that a specific observation belongs to component 1 or component 2. The latter is known as the concomitant model.

The density of a two-component mixture distribution $h(\dots)$ in its general form is specified as follows for a specific observation $i$:

  • $h(y_i, \mathbf{x}_i, \mathit{\theta}, \mathit{\alpha}) = \underbrace{(1 - \pi(\mathbf{x}_i, \mathit{\alpha})) \cdot f(y_i, \mathit{\theta}_1)}_{\text{component 1}} + \underbrace{\pi(\mathbf{x}_i, \mathit{\alpha}) \cdot f(y_i, \mathit{\theta}_2)}_{\text{component 2}}$

… where $y$ is the covariate for the first part of the statistical model to identify components 1 and 2, and $\mathbf{x}$ are the covariates for the concomitant model. The density of the mixture distribution $h$ is the sum (or superposition) of the densities of the two components ($f$; e.g., the Gaussian distribution), weighted by the probability $\pi$ from the concomitant model, which describes whether or not a specific observation belongs to component 2. $\mathit{\theta} = (\mathit{\theta}_1, \mathit{\theta}_2)$ are the distribution parameters of the components, $\mathit{\alpha}$ the parameters of the concomitant model.
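As a rough illustration, the following Python sketch evaluates this mixture density for Gaussian components. It is not the foehnix implementation; the function name, the fixed probability pi, and the example parameters are made up for demonstration:

```python
# Minimal sketch of the two-component mixture density h (Gaussian components,
# fixed mixing probability pi); illustrative only, not the foehnix code.
import numpy as np
from scipy.stats import norm

def mixture_density(y, pi, theta1, theta2):
    """theta1, theta2: (mu, sigma) of component 1 (no foehn) and 2 (foehn)."""
    return (1 - pi) * norm.pdf(y, *theta1) + pi * norm.pdf(y, *theta2)

# Example: evaluate h on a grid of (hypothetical) observations.
y = np.linspace(-5, 15, 201)
h = mixture_density(y, pi=0.3, theta1=(0.0, 1.5), theta2=(8.0, 2.0))
```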

The concomitant model can be any model which fulfills $\pi \in ]0,1[$, e.g., a constant value or intercept-only model (a mixture model without concomitants), or any kind of probability model. foehnix uses a logistic regression model of the following form:

  • $\log\big(\frac{\pi}{1 - \pi}\big) = \mathbf{x}^\top \mathit{\alpha}; \quad \pi = \frac{\exp(\mathbf{x}^\top \mathit{\alpha})}{1 + \exp(\mathbf{x}^\top \mathit{\alpha})}$
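In code, the logistic link can be expressed as follows (a sketch, assuming a design matrix X whose first column is all ones for the intercept; not the foehnix implementation):

```python
# Sketch of the logistic link pi = exp(x'alpha) / (1 + exp(x'alpha)).
import numpy as np
from scipy.special import expit

def concomitant_probability(X, alpha):
    """X: (N, k) concomitant covariate matrix, alpha: (k,) coefficients."""
    return expit(X @ alpha)  # numerically stable inverse logit, pi in ]0,1[
```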

The final foehn probability of the two-component mixture model, also known as the a-posteriori probability, is given by:

  • $\hat{p}(y, \mathbf{x}, \mathit{\theta}, \mathit{\alpha}) = \frac{\pi(\mathbf{x}, \mathit{\alpha}) \cdot f(y, \mathit{\theta}_2)}{(1 - \pi(\mathbf{x}, \mathit{\alpha})) \cdot f(y, \mathit{\theta}_1) + \pi(\mathbf{x}, \mathit{\alpha}) \cdot f(y, \mathit{\theta}_2)}$

… where $\hat{p}$ in our case represents the probability of foehn. All one has to know are the parameters $\mathit{\theta}$ and $\mathit{\alpha}$, which can be estimated using an appropriate M-estimator such as maximum likelihood.
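A corresponding sketch of the a-posteriori probability, again assuming Gaussian component densities (the function below is hypothetical, not part of foehnix):

```python
# Sketch: a-posteriori (foehn) probability of the two-component mixture.
from scipy.stats import norm

def posterior(y, pi, theta1, theta2):
    d1 = (1 - pi) * norm.pdf(y, *theta1)  # weighted density, component 1 (no foehn)
    d2 = pi * norm.pdf(y, *theta2)        # weighted density, component 2 (foehn)
    return d2 / (d1 + d2)
```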

Parameter Estimation

The likelihood of a mixture model can usually not be maximized directly. One possibility to estimate the coefficients is an iterative expectation-maximization (EM) algorithm. The EM algorithm optimizes the following log-likelihood:

  • $\ell = \sum_{i=1}^N \big( (1 - \hat{p}_i) \cdot \log(f(y_i, \mathit{\theta}_1)) + \hat{p}_i \cdot \log(f(y_i, \mathit{\theta}_2)) + (1 - \hat{p}_i) \cdot \log(1 - \pi(\mathbf{x}_i, \mathit{\alpha})) + \hat{p}_i \cdot \log(\pi(\mathbf{x}_i, \mathit{\alpha})) \big)$

… with $\hat{p}_i$ as specified above (a-posteriori probability). $N$ represents the number of observations.
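For Gaussian components this log-likelihood can be written as follows (a sketch with hypothetical names, mirroring the equation above term by term):

```python
# Sketch of the EM log-likelihood for Gaussian components.
import numpy as np
from scipy.stats import norm

def em_loglik(y, p_hat, pi, theta1, theta2):
    return np.sum(
        (1 - p_hat) * norm.logpdf(y, *theta1)  # component 1 term
        + p_hat * norm.logpdf(y, *theta2)      # component 2 term
        + (1 - p_hat) * np.log(1 - pi)         # concomitant term, component 1
        + p_hat * np.log(pi)                   # concomitant term, component 2
    )
```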

The EM algorithm is specified as follows:

  • Initialization: initialize values for $\mathit{\theta}$ and $\mathit{\alpha}$.

  • Estimation: compute the posterior class probability $\hat{p}(y, \mathbf{x}, \mathit{\theta}, \mathit{\alpha}) = \frac{\pi(\mathbf{x}, \mathit{\alpha}) \cdot f(y, \mathit{\theta}_2)}{(1 - \pi(\mathbf{x}, \mathit{\alpha})) \cdot f(y, \mathit{\theta}_1) + \pi(\mathbf{x}, \mathit{\alpha}) \cdot f(y, \mathit{\theta}_2)}$

  • Maximization: estimate $\mathit{\theta}$ and $\mathit{\alpha}$ which maximize the likelihood, using the posterior class probability $\hat{p}$ from the estimation step as weights.

The EM steps are repeated until the likelihood improvement falls below a certain threshold or the maximum number of iterations is reached.

Gaussian Mixture Model Without Concomitants

The simplest case is a Gaussian two-component mixture model without concomitants. In this case the density of the two components is the density $\phi$ of the Gaussian distribution with its parameters $\mathit{\theta}_1 = (\mu_1, \sigma_1)$ and $\mathit{\theta}_2 = (\mu_2, \sigma_2)$, where $\mu$ and $\sigma$ are the location and scale parameters of the Gaussian distribution, i.e., the mean and standard deviation.

Initialization step

First, initial values for the parameters ($\mathit{\theta}$) and the posterior weights ($\hat{p}$) have to be specified. $\mathit{\alpha}$ does not have to be initialized as no concomitant model is used in this case. To be able to do so, each observation $y_i~\forall~i = 1, \dots, N$ has to be attributed to one of the two components. This initial membership is denoted as $z$ and takes the value 1 if observation $y_i$ belongs to component 2 and 0 otherwise. This initial attribution defines that observations with high values of $y$ belong to component 2 and observations with low values of $y$ to component 1.

Note: Depending on the model specification this can lead to models where foehnix returns the probability for no foehn rather than the a-posteriori probability of foehn. However, the switch argument of the foehnix(...) function allows you to control this behavior (see foehnix manual page).

foehnix uses the following initialization for the two-component Gaussian mixture model without concomitants:

  1. Initialize class membership: $z_i = \begin{cases}1 & \text{if}~y_i \ge \bar{y} \\ 0 & \text{else}\end{cases}$
  2. Initial parameters for $\mathit{\theta}^{(0)}$ using weighted empirical moments for $\mu_1$ and $\mu_2$, and the standard deviation of $y$ as initial guess for $\sigma_1$ and $\sigma_2$:
    • $\mu_1^{(0)} = \frac{1}{\sum_{i=1}^{N} (1-z_i)} \sum_{i=1}^{N} (1-z_i) \cdot y_i$
    • $\mu_2^{(0)} = \frac{1}{\sum_{i=1}^{N} z_i} \sum_{i=1}^{N} z_i \cdot y_i$
    • $\sigma_1^{(0)} = \sigma_2^{(0)} = \big(\frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2\big)^\frac{1}{2}$
  3. Initialize $\pi^{(0)} = 0.5$.
  4. Given $\mathit{\theta}^{(0)}$ and $\pi^{(0)}$: calculate the a-posteriori probability:
    • $\hat{p}^{(0)} = \frac{\pi^{(0)} \cdot \phi(y, \mathit{\theta}_2^{(0)})}{(1 - \pi^{(0)}) \cdot \phi(y, \mathit{\theta}_1^{(0)}) + \pi^{(0)} \cdot \phi(y, \mathit{\theta}_2^{(0)})}$
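The four initialization steps translate into a few lines of code; the sketch below (hypothetical, not the foehnix source) follows them one by one:

```python
# Sketch of the initialization for the Gaussian case without concomitants.
import numpy as np
from scipy.stats import norm

def initialize(y):
    z = (y >= y.mean()).astype(float)           # step 1: class membership
    mu1 = np.sum((1 - z) * y) / np.sum(1 - z)   # step 2: weighted moments
    mu2 = np.sum(z * y) / np.sum(z)
    sigma = y.std()                             # common initial scale guess
    pi0 = 0.5                                   # step 3
    d1 = (1 - pi0) * norm.pdf(y, mu1, sigma)    # step 4: a-posteriori probability
    d2 = pi0 * norm.pdf(y, mu2, sigma)
    return (mu1, sigma), (mu2, sigma), pi0, d2 / (d1 + d2)
```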

Once the required elements have been initialized, start the EM algorithm for $j = 1, \dots, \text{maxit}$:

  1. Update $\pi^{(j)} = \text{mean}(\hat{p}^{(j-1)})$.
  2. Obtain new $\mathit{\theta}^{(j)}$ using $\hat{p}^{(j-1)}$:
    • $\mu_1^{(j)} = \frac{1}{\sum_{i=1}^{N} (1 - \hat{p}_i^{(j-1)})} \sum_{i=1}^{N} (1 - \hat{p}_i^{(j-1)}) \cdot y_i$
    • $\mu_2^{(j)} = \frac{1}{\sum_{i=1}^{N} \hat{p}_i^{(j-1)}} \sum_{i=1}^{N} \hat{p}_i^{(j-1)} \cdot y_i$
    • $\sigma_1^{(j)} = \Big(\frac{1}{\sum_{i=1}^{N} (1-\hat{p}_i^{(j-1)})} \sum_{i=1}^{N} (1 - \hat{p}_i^{(j-1)}) \cdot (y_i - \mu_1^{(j)})^2\Big)^\frac{1}{2}$
    • $\sigma_2^{(j)} = \Big(\frac{1}{\sum_{i=1}^{N} \hat{p}_i^{(j-1)}} \sum_{i=1}^{N} \hat{p}_i^{(j-1)} \cdot (y_i - \mu_2^{(j)})^2\Big)^\frac{1}{2}$
  3. Update posterior probabilities $\hat{p}^{(j)}$:
    • $\hat{p}^{(j)} = \frac{\pi^{(j)} \cdot \phi(y, \mathit{\theta}_2^{(j)})}{(1 - \pi^{(j)}) \cdot \phi(y, \mathit{\theta}_1^{(j)}) + \pi^{(j)} \cdot \phi(y, \mathit{\theta}_2^{(j)})}$
  4. Calculate the likelihood $\ell^{(j)}$. If $j = 1$, proceed with the next iteration (step 1).
  5. For $j > 1$: if $(\ell^{(j)} - \ell^{(j-1)}) < \text{tol}$, the likelihood could not be improved in iteration $j$ (converged or stuck): stop the EM algorithm and return the parameters of iteration $j-1$. If $j = \text{maxit}$: the maximum number of iterations has been reached, stop the EM algorithm and return the parameters of iteration $j$. Else proceed with step 1 until one of the stopping criteria is reached.
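Putting the pieces together, a compact sketch of this EM loop, building on initialize() and em_loglik() from the sketches above (simplified in that it returns the current rather than the previous iteration's parameters on convergence):

```python
# Sketch of the EM loop for the Gaussian case without concomitants.
import numpy as np
from scipy.stats import norm

def em(y, maxit=100, tol=1e-8):
    theta1, theta2, pi, p = initialize(y)
    ll_old = -np.inf
    for j in range(1, maxit + 1):
        pi = p.mean()                                    # step 1: update pi
        w1, w2 = 1 - p, p
        mu1 = np.sum(w1 * y) / np.sum(w1)                # step 2: weighted moments
        mu2 = np.sum(w2 * y) / np.sum(w2)
        sd1 = np.sqrt(np.sum(w1 * (y - mu1) ** 2) / np.sum(w1))
        sd2 = np.sqrt(np.sum(w2 * (y - mu2) ** 2) / np.sum(w2))
        theta1, theta2 = (mu1, sd1), (mu2, sd2)
        d1 = (1 - pi) * norm.pdf(y, *theta1)             # step 3: update posteriors
        d2 = pi * norm.pdf(y, *theta2)
        p = d2 / (d1 + d2)
        ll = em_loglik(y, p, pi, theta1, theta2)         # step 4: likelihood
        if j > 1 and ll - ll_old < tol:                  # step 5: stopping criteria
            break
        ll_old = ll
    return theta1, theta2, pi, p
```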

Gaussian Mixture Model With Concomitants

The optimizer for a two-component Gaussian mixture model with additional concomitants is very similar, except that the concomitant model (logistic regression model) also has to be updated. For mixture models with concomitants, the probabilities $\pi$ are a function of the concomitant covariates $\mathbf{x}$ and the regression coefficients $\mathit{\alpha}$.

The following algorithm is used:

  1. Initialize class membership $z$ as for the Gaussian mixture model without concomitants.
  2. Initialize coefficients $\mathit{\theta}^{(0)}$ as for the Gaussian mixture model without concomitants.
  3. Given $z^{(0)}$ and $\mathbf{x}$: estimate the logistic regression model to obtain the parameters $\mathit{\alpha}^{(0)}$, calculate $\pi^{(0)} = \frac{\exp(\mathbf{x}^\top \mathit{\alpha}^{(0)})}{1 + \exp(\mathbf{x}^\top \mathit{\alpha}^{(0)})}$ (see logistic regression vignette).
  4. Calculate the a-posteriori probability $\hat{p}^{(0)}$ as for the Gaussian mixture model without concomitants.

The EM algorithm for $j = 1, \dots, \text{maxit}$:

  1. Update $\pi^{(j)}$ by updating the concomitant model (logistic regression model) using $\hat{p}^{(j-1)}$ as response for the concomitant model (see logistic regression vignette).
  2. Obtain new $\mathit{\theta}^{(j)}$ as for the Gaussian mixture model without concomitants.
  3. Update posterior probabilities $\hat{p}^{(j)}$ as for the Gaussian mixture model without concomitants.
  4. Calculate the likelihood as for the Gaussian mixture model without concomitants.
  5. As for the Gaussian mixture model without concomitants: proceed with step 1 until one of the stopping criteria is reached.
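The only genuinely new ingredient is step 1. foehnix fits the logistic regression with its own iteratively re-weighted least squares solver (see logistic regression vignette); purely as an illustration, the same update could be sketched with a binomial GLM that accepts the continuous response $\hat{p}^{(j-1)}$:

```python
# Sketch of the concomitant update: regress p_hat on the concomitant
# covariates; an illustrative stand-in, not the foehnix solver itself.
import statsmodels.api as sm

def update_concomitants(X, p_hat):
    """X: (N, k) design matrix incl. intercept column; p_hat in ]0,1[."""
    fit = sm.GLM(p_hat, X, family=sm.families.Binomial()).fit()
    return fit.params, fit.fittedvalues  # alpha^(j) and pi^(j)
```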

Logistic Mixture Model

The logistic two-component mixture models can be estimated just like the Gaussian ones, except that the component density is the density of the logistic distribution and that the weighted empirical moments for $\sigma_1$ and $\sigma_2$, the scale parameters of the logistic distribution, become:

  • $\sigma_1^{(j)} = \Big(\frac{1}{\sum_{i=1}^{N} (1-\hat{p}_i^{(j-1)})} \sum_{i=1}^{N} (1 - \hat{p}_i^{(j-1)}) \cdot (y_i - \mu_1^{(j)})^2\Big)^\frac{1}{2} \cdot \frac{\sqrt{3}}{\pi}$
  • $\sigma_2^{(j)} = \Big(\frac{1}{\sum_{i=1}^{N} \hat{p}_i^{(j-1)}} \sum_{i=1}^{N} \hat{p}_i^{(j-1)} \cdot (y_i - \mu_2^{(j)})^2\Big)^\frac{1}{2} \cdot \frac{\sqrt{3}}{\pi}$
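The correction factor $\frac{\sqrt{3}}{\pi}$ (with $\pi$ here denoting the mathematical constant, not the concomitant probability) follows from the variance of the logistic distribution, which relates the standard deviation to the scale parameter:

  • $\text{Var}(y) = \frac{\sigma^2 \pi^2}{3} \quad\Rightarrow\quad \sigma = \text{sd}(y) \cdot \frac{\sqrt{3}}{\pi}$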

Censored and Truncated Models

In case of a censored or truncated mixture model, the distributional parameters $\mathit{\theta}$ of the components of the mixture model cannot be calculated using weighted empirical moments. In these cases a numerical likelihood-based solver is used to estimate $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$.
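As an illustration of such a solver, the sketch below fits $(\mu, \sigma)$ for a single left-censored Gaussian component by weighted maximum likelihood; the threshold c, the weights w, and all names are hypothetical, and foehnix's actual solver may differ in detail:

```python
# Sketch: numerical likelihood-based estimation of (mu, sigma) for one
# component when y is left-censored at threshold c.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_censored(y, w, c):
    cens = y <= c  # observations at or below the censoring point

    def nll(par):
        mu, log_sigma = par
        sigma = np.exp(log_sigma)  # optimize log(sigma) to keep sigma > 0
        ll = np.where(cens,
                      norm.logcdf(c, mu, sigma),   # censored contribution
                      norm.logpdf(y, mu, sigma))   # uncensored contribution
        return -np.sum(w * ll)

    res = minimize(nll, x0=[y.mean(), np.log(y.std())], method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])  # (mu, sigma)
```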