Statistical Model

The automated foehn classification foehnix is based on a two-component mixture model. The basic idea is that two unobservable components (or clusters) exist: one component for situations without foehn and one component for situations with foehn. foehnix uses an unsupervised statistical method to identify the two components based on a set of observed values (e.g., wind speed, gust speed, potential temperature differences) and to model the probability that a specific observation is related to a foehn event.

The statistical model consists of two parts: one part to identify the two components, and a second part modelling the probability that a specific observation belongs to component 1 or component 2. The latter is known as the concomitant model.

The density of a two-component mixture distribution $h(\dots)$ in its general form is specified as follows for a specific observation $i$:

  • $h(y_i, \mathbf{x}_i, \mathit{\theta}, \mathit{\alpha}) = \underbrace{(1 - \pi(\mathbf{x}_i, \mathit{\alpha})) \cdot f(y_i, \mathit{\theta}_1)}_{\text{component 1}} + \underbrace{\pi(\mathbf{x}_i, \mathit{\alpha}) \cdot f(y_i, \mathit{\theta}_2)}_{\text{component 2}}$

… where $y$ is the covariate for the first part of the statistical model to identify components 1 and 2, and $\mathbf{x}$ are the covariates for the concomitant model. The density of the mixture distribution $h$ is the sum (or superposition) of the densities of the two components ($f$; e.g., the Gaussian distribution), weighted by the probability $\pi$ from the concomitant model, which describes whether or not a specific observation belongs to component 2. $\mathit{\theta} = (\mathit{\theta}_1, \mathit{\theta}_2)$ are the distribution parameters of the components, $\mathit{\alpha}$ the parameters of the concomitant model.
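As a rough illustration, the following Python sketch evaluates this mixture density for Gaussian components. It is not the foehnix implementation; the function name, the fixed probability pi, and the example parameters are made up for demonstration:

```python
# Minimal sketch of the two-component mixture density h (Gaussian components,
# fixed mixing probability pi); illustrative only, not the foehnix code.
import numpy as np
from scipy.stats import norm

def mixture_density(y, pi, theta1, theta2):
    """theta1, theta2: (mu, sigma) of component 1 (no foehn) and 2 (foehn)."""
    return (1 - pi) * norm.pdf(y, *theta1) + pi * norm.pdf(y, *theta2)

# Example: evaluate h on a grid of (hypothetical) observations.
y = np.linspace(-5, 15, 201)
h = mixture_density(y, pi=0.3, theta1=(0.0, 1.5), theta2=(8.0, 2.0))
```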

The concomitant model can be any model which fulfills $\pi \in ]0,1[$, e.g., a constant value or intercept-only model (a mixture model without concomitants), or any kind of probability model. foehnix uses a logistic regression model of the following form:

  • $\log\big(\frac{\pi}{1 - \pi}\big) = \mathbf{x}^\top \mathit{\alpha}; \quad \pi = \frac{\exp(\mathbf{x}^\top \mathit{\alpha})}{1 + \exp(\mathbf{x}^\top \mathit{\alpha})}$
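In code, the logistic link can be expressed as follows (a sketch, assuming a design matrix X whose first column is all ones for the intercept; not the foehnix implementation):

```python
# Sketch of the logistic link pi = exp(x'alpha) / (1 + exp(x'alpha)).
import numpy as np
from scipy.special import expit

def concomitant_probability(X, alpha):
    """X: (N, k) concomitant covariate matrix, alpha: (k,) coefficients."""
    return expit(X @ alpha)  # numerically stable inverse logit, pi in ]0,1[
```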

The final foehn probability of the two-component mixture model, also known as the a-posteriori probability, is given by:

  • $\hat{p}(y, \mathbf{x}, \mathit{\theta}, \mathit{\alpha}) = \frac{\pi(\mathbf{x}, \mathit{\alpha}) \cdot f(y, \mathit{\theta}_2)}{(1 - \pi(\mathbf{x}, \mathit{\alpha})) \cdot f(y, \mathit{\theta}_1) + \pi(\mathbf{x}, \mathit{\alpha}) \cdot f(y, \mathit{\theta}_2)}$

… where $\hat{p}$ in our case represents the probability of foehn. All one has to know are the parameters $\mathit{\theta}$ and $\mathit{\alpha}$, which can be estimated using an appropriate M-estimator such as maximum likelihood.
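A corresponding sketch of the a-posteriori probability, again assuming Gaussian component densities (the function below is hypothetical, not part of foehnix):

```python
# Sketch: a-posteriori (foehn) probability of the two-component mixture.
from scipy.stats import norm

def posterior(y, pi, theta1, theta2):
    d1 = (1 - pi) * norm.pdf(y, *theta1)  # weighted density, component 1 (no foehn)
    d2 = pi * norm.pdf(y, *theta2)        # weighted density, component 2 (foehn)
    return d2 / (d1 + d2)
```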

Parameter Estimation

The likelihood of a mixture model can usually not be maximized directly. One possibility to estimate the coefficients is an iterative expectation-maximization (EM) algorithm. The EM algorithm optimizes the following log-likelihood:

  • $\ell = \sum_{i=1}^N \big( (1 - \hat{p}_i) \cdot \log(f(y_i, \mathit{\theta}_1)) + \hat{p}_i \cdot \log(f(y_i, \mathit{\theta}_2)) + (1 - \hat{p}_i) \cdot \log(1 - \pi(\mathbf{x}_i, \mathit{\alpha})) + \hat{p}_i \cdot \log(\pi(\mathbf{x}_i, \mathit{\alpha})) \big)$

… with $\hat{p}_i$ as specified above (a-posteriori probability). $N$ represents the number of observations.
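For Gaussian components this log-likelihood can be written as follows (a sketch with hypothetical names, mirroring the equation above term by term):

```python
# Sketch of the EM log-likelihood for Gaussian components.
import numpy as np
from scipy.stats import norm

def em_loglik(y, p_hat, pi, theta1, theta2):
    return np.sum(
        (1 - p_hat) * norm.logpdf(y, *theta1)  # component 1 term
        + p_hat * norm.logpdf(y, *theta2)      # component 2 term
        + (1 - p_hat) * np.log(1 - pi)         # concomitant term, component 1
        + p_hat * np.log(pi)                   # concomitant term, component 2
    )
```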

The EM algorithm is specified as follows:

  • Initialization: initialize values for $\mathit{\theta}$ and $\mathit{\alpha}$.

  • Estimation: compute the posterior class probability $\hat{p}(y, \mathbf{x}, \mathit{\theta}, \mathit{\alpha}) = \frac{\pi(\mathbf{x}, \mathit{\alpha}) \cdot f(y, \mathit{\theta}_2)}{(1 - \pi(\mathbf{x}, \mathit{\alpha})) \cdot f(y, \mathit{\theta}_1) + \pi(\mathbf{x}, \mathit{\alpha}) \cdot f(y, \mathit{\theta}_2)}$

  • Maximization: estimate $\mathit{\theta}$ and $\mathit{\alpha}$ which maximize the likelihood, using the posterior class probability $\hat{p}$ from the estimation step as weights.

The EM steps are repeated until the likelihood improvement falls below a certain threshold or the maximum number of iterations is reached.

Gaussian Mixture Model Without Concomitants

The simplest case is a Gaussian two-component mixture model without concomitants. In this case the density of the two components is the density $\phi$ of the Gaussian distribution with its parameters $\mathit{\theta}_1 = (\mu_1, \sigma_1)$ and $\mathit{\theta}_2 = (\mu_2, \sigma_2)$, where $\mu$ and $\sigma$ are the location and scale parameters of the Gaussian distribution, i.e., the mean and standard deviation.

Initialization step

First, initial values for the parameters ($\mathit{\theta}$) and the posterior weights ($\hat{p}$) have to be specified. $\mathit{\alpha}$ does not have to be initialized as no concomitant model is used in this case. To be able to do so, each observation $y_i~\forall~i = 1, \dots, N$ has to be attributed to one of the two components. This initial membership is denoted as $z$ and takes the value 1 if observation $y_i$ belongs to component 2 and 0 otherwise. This initial attribution defines that observations with high values of $y$ belong to component 2 and observations with low values of $y$ to component 1.

Note: Depending on the model specification this can lead to models where foehnix returns the probability for no foehn rather than the a-posteriori probability of foehn. However, the switch argument of the foehnix(...) function allows you to control this behavior (see foehnix manual page).

foehnix uses the following initialization for the two-component Gaussian mixture model without concomitants:

  1. Initialize class membership: $z_i = \begin{cases}1 & \text{if}~y_i \ge \bar{y} \\ 0 & \text{else}\end{cases}$
  2. Initial parameters for $\mathit{\theta}^{(0)}$ using weighted empirical moments for $\mu_1$ and $\mu_2$, and the standard deviation of $y$ as initial guess for $\sigma_1$ and $\sigma_2$:
    • $\mu_1^{(0)} = \frac{1}{\sum_{i=1}^{N} (1-z_i)} \sum_{i=1}^{N} (1-z_i) \cdot y_i$
    • $\mu_2^{(0)} = \frac{1}{\sum_{i=1}^{N} z_i} \sum_{i=1}^{N} z_i \cdot y_i$
    • $\sigma_1^{(0)} = \sigma_2^{(0)} = \big(\frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2\big)^\frac{1}{2}$
  3. Initialize $\pi^{(0)} = 0.5$.
  4. Given $\mathit{\theta}^{(0)}$ and $\pi^{(0)}$: calculate the a-posteriori probability:
    • $\hat{p}^{(0)} = \frac{\pi^{(0)} \cdot \phi(y, \mathit{\theta}_2^{(0)})}{(1 - \pi^{(0)}) \cdot \phi(y, \mathit{\theta}_1^{(0)}) + \pi^{(0)} \cdot \phi(y, \mathit{\theta}_2^{(0)})}$
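The four initialization steps translate into a few lines of code; the sketch below (hypothetical, not the foehnix source) follows them one by one:

```python
# Sketch of the initialization for the Gaussian case without concomitants.
import numpy as np
from scipy.stats import norm

def initialize(y):
    z = (y >= y.mean()).astype(float)           # step 1: class membership
    mu1 = np.sum((1 - z) * y) / np.sum(1 - z)   # step 2: weighted moments
    mu2 = np.sum(z * y) / np.sum(z)
    sigma = y.std()                             # common initial scale guess
    pi0 = 0.5                                   # step 3
    d1 = (1 - pi0) * norm.pdf(y, mu1, sigma)    # step 4: a-posteriori probability
    d2 = pi0 * norm.pdf(y, mu2, sigma)
    return (mu1, sigma), (mu2, sigma), pi0, d2 / (d1 + d2)
```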

Once the required elements have been initialized, start the EM algorithm for $j = 1, \dots, \text{maxit}$:

  1. Update $\pi^{(j)} = \text{mean}(\hat{p}^{(j-1)})$.
  2. Obtain new $\mathit{\theta}^{(j)}$ using $\hat{p}^{(j-1)}$:
    • $\mu_1^{(j)} = \frac{1}{\sum_{i=1}^{N} (1 - \hat{p}_i^{(j-1)})} \sum_{i=1}^{N} (1 - \hat{p}_i^{(j-1)}) \cdot y_i$
    • $\mu_2^{(j)} = \frac{1}{\sum_{i=1}^{N} \hat{p}_i^{(j-1)}} \sum_{i=1}^{N} \hat{p}_i^{(j-1)} \cdot y_i$
    • $\sigma_1^{(j)} = \Big(\frac{1}{\sum_{i=1}^{N} (1-\hat{p}_i^{(j-1)})} \sum_{i=1}^{N} (1 - \hat{p}_i^{(j-1)}) \cdot (y_i - \mu_1^{(j)})^2\Big)^\frac{1}{2}$
    • $\sigma_2^{(j)} = \Big(\frac{1}{\sum_{i=1}^{N} \hat{p}_i^{(j-1)}} \sum_{i=1}^{N} \hat{p}_i^{(j-1)} \cdot (y_i - \mu_2^{(j)})^2\Big)^\frac{1}{2}$
  3. Update posterior probabilities $\hat{p}^{(j)}$:
    • $\hat{p}^{(j)} = \frac{\pi^{(j)} \cdot \phi(y, \mathit{\theta}_2^{(j)})}{(1 - \pi^{(j)}) \cdot \phi(y, \mathit{\theta}_1^{(j)}) + \pi^{(j)} \cdot \phi(y, \mathit{\theta}_2^{(j)})}$
  4. Calculate the likelihood $\ell^{(j)}$. If $j = 1$, proceed with the next iteration (step 1).
  5. For $j > 1$: if $(\ell^{(j)} - \ell^{(j-1)}) < \text{tol}$, the likelihood could not be improved in iteration $j$ (converged or stuck): stop the EM algorithm and return the parameters of iteration $j-1$. If $j = \text{maxit}$: the maximum number of iterations has been reached, stop the EM algorithm and return the parameters of iteration $j$. Else proceed with step 1 until one of the stopping criteria is reached.
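Putting the pieces together, a compact sketch of this EM loop, building on initialize() and em_loglik() from the sketches above (simplified in that it returns the current rather than the previous iteration's parameters on convergence):

```python
# Sketch of the EM loop for the Gaussian case without concomitants.
import numpy as np
from scipy.stats import norm

def em(y, maxit=100, tol=1e-8):
    theta1, theta2, pi, p = initialize(y)
    ll_old = -np.inf
    for j in range(1, maxit + 1):
        pi = p.mean()                                    # step 1: update pi
        w1, w2 = 1 - p, p
        mu1 = np.sum(w1 * y) / np.sum(w1)                # step 2: weighted moments
        mu2 = np.sum(w2 * y) / np.sum(w2)
        sd1 = np.sqrt(np.sum(w1 * (y - mu1) ** 2) / np.sum(w1))
        sd2 = np.sqrt(np.sum(w2 * (y - mu2) ** 2) / np.sum(w2))
        theta1, theta2 = (mu1, sd1), (mu2, sd2)
        d1 = (1 - pi) * norm.pdf(y, *theta1)             # step 3: update posteriors
        d2 = pi * norm.pdf(y, *theta2)
        p = d2 / (d1 + d2)
        ll = em_loglik(y, p, pi, theta1, theta2)         # step 4: likelihood
        if j > 1 and ll - ll_old < tol:                  # step 5: stopping criteria
            break
        ll_old = ll
    return theta1, theta2, pi, p
```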

Gaussian Mixture Model With Concomitants

The optimizer for a two-component Gaussian mixture model with additional concomitants is very similar, except that the concomitant model (logistic regression model) also has to be updated. For mixture models with concomitants, the probabilities $\pi$ are a function of the concomitant covariates $\mathbf{x}$ and the regression coefficients $\mathit{\alpha}$.

The following algorithm is used:

  1. Initialize class membership $z$ as for the Gaussian mixture model without concomitants.
  2. Initialize coefficients $\mathit{\theta}^{(0)}$ as for the Gaussian mixture model without concomitants.
  3. Given $z^{(0)}$ and $\mathbf{x}$: estimate the logistic regression model to obtain the parameters $\mathit{\alpha}^{(0)}$, calculate $\pi^{(0)} = \frac{\exp(\mathbf{x}^\top \mathit{\alpha}^{(0)})}{1 + \exp(\mathbf{x}^\top \mathit{\alpha}^{(0)})}$ (see logistic regression vignette).
  4. Calculate the a-posteriori probability $\hat{p}^{(0)}$ as for the Gaussian mixture model without concomitants.

The EM algorithm for $j = 1, \dots, \text{maxit}$:

  1. Update $\pi^{(j)}$ by updating the concomitant model (logistic regression model) using $\hat{p}^{(j-1)}$ as response for the concomitant model (see logistic regression vignette).
  2. Obtain new $\mathit{\theta}^{(j)}$ as for the Gaussian mixture model without concomitants.
  3. Update posterior probabilities $\hat{p}^{(j)}$ as for the Gaussian mixture model without concomitants.
  4. Calculate the likelihood as for the Gaussian mixture model without concomitants.
  5. As for the Gaussian mixture model without concomitants: proceed with step 1 until one of the stopping criteria is reached.
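The only genuinely new ingredient is step 1. foehnix fits the logistic regression with its own iteratively re-weighted least squares solver (see logistic regression vignette); purely as an illustration, the same update could be sketched with a binomial GLM that accepts the continuous response $\hat{p}^{(j-1)}$:

```python
# Sketch of the concomitant update: regress p_hat on the concomitant
# covariates; an illustrative stand-in, not the foehnix solver itself.
import statsmodels.api as sm

def update_concomitants(X, p_hat):
    """X: (N, k) design matrix incl. intercept column; p_hat in ]0,1[."""
    fit = sm.GLM(p_hat, X, family=sm.families.Binomial()).fit()
    return fit.params, fit.fittedvalues  # alpha^(j) and pi^(j)
```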

Logistic Mixture Model

The logistic two-component mixture models can be estimated just like the Gaussian ones, except that the component density is the density of the logistic distribution and that the weighted empirical moments for $\sigma_1$ and $\sigma_2$, the scale parameters of the logistic distribution, become:

  • $\sigma_1^{(j)} = \Big(\frac{1}{\sum_{i=1}^{N} (1-\hat{p}_i^{(j-1)})} \sum_{i=1}^{N} (1 - \hat{p}_i^{(j-1)}) \cdot (y_i - \mu_1^{(j)})^2\Big)^\frac{1}{2} \cdot \frac{\sqrt{3}}{\pi}$
  • $\sigma_2^{(j)} = \Big(\frac{1}{\sum_{i=1}^{N} \hat{p}_i^{(j-1)}} \sum_{i=1}^{N} \hat{p}_i^{(j-1)} \cdot (y_i - \mu_2^{(j)})^2\Big)^\frac{1}{2} \cdot \frac{\sqrt{3}}{\pi}$
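The correction factor $\frac{\sqrt{3}}{\pi}$ (with $\pi$ here denoting the mathematical constant, not the concomitant probability) follows from the variance of the logistic distribution, which relates the standard deviation to the scale parameter:

  • $\text{Var}(y) = \frac{\sigma^2 \pi^2}{3} \quad\Rightarrow\quad \sigma = \text{sd}(y) \cdot \frac{\sqrt{3}}{\pi}$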

Censored and Truncated Models

In case of a censored or truncated mixture model, the distributional parameters $\mathit{\theta}$ of the components of the mixture model cannot be calculated using weighted empirical moments. In these cases a numerical likelihood-based solver is used to estimate $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$.
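As an illustration of such a solver, the sketch below fits $(\mu, \sigma)$ for a single left-censored Gaussian component by weighted maximum likelihood; the threshold c, the weights w, and all names are hypothetical, and foehnix's actual solver may differ in detail:

```python
# Sketch: numerical likelihood-based estimation of (mu, sigma) for one
# component when y is left-censored at threshold c.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_censored(y, w, c):
    cens = y <= c  # observations at or below the censoring point

    def nll(par):
        mu, log_sigma = par
        sigma = np.exp(log_sigma)  # optimize log(sigma) to keep sigma > 0
        ll = np.where(cens,
                      norm.logcdf(c, mu, sigma),   # censored contribution
                      norm.logpdf(y, mu, sigma))   # uncensored contribution
        return -np.sum(w * ll)

    res = minimize(nll, x0=[y.mean(), np.log(y.std())], method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])  # (mu, sigma)
```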