vignettes/mixedmodel.Rmd
The automated foehn classification `foehnix` is based on a two-component mixture model. The basic idea is that two unobservable components (or clusters) exist: one component for situations without foehn, and one component for situations with foehn. `foehnix` uses an unsupervised statistical method to identify the two components based on a set of observed values (e.g., wind speed, gust speed, potential temperature differences) and to model the probability whether or not a specific observation is related to a foehn event.
The statistical model consists of two parts: one part to identify the two components, and a second part modelling the probability that a specific observation belongs to component 1 or component 2. The latter is known as the concomitant model.
The density of a two-component mixture distribution \(h(\dots)\) in its general form is specified as follows for a specific observation \(i\):

\[
h(y_i, \mathbf{x}_i, \mathit{\theta}, \mathit{\alpha}) = (1 - \pi(\mathbf{x}_i, \mathit{\alpha})) \cdot f(y_i, \mathit{\theta}_1) + \pi(\mathbf{x}_i, \mathit{\alpha}) \cdot f(y_i, \mathit{\theta}_2),
\]

where \(y\) is the covariate for the first part of the statistical model identifying components 1 and 2, and \(\mathbf{x}\) are the covariates for the concomitant model. The density of the mixture distribution \(h\) is the sum (or superposition) of the densities of the two components (\(f\); e.g., the Gaussian distribution), weighted by the probability \(\pi\) from the concomitant model, which describes whether or not a specific observation belongs to component 2. \(\mathit{\theta} = (\mathit{\theta}_1, \mathit{\theta}_2)\) are the distribution parameters of the components, and \(\mathit{\alpha}\) are the parameters of the concomitant model.
The concomitant model can be any model which fulfills \(\pi \in~]0,1[\), e.g., a constant value or intercept-only model (a mixture model without concomitants), or any kind of probability model. `foehnix` uses a logistic regression model of the following form:

\[
\pi(\mathbf{x}, \mathit{\alpha}) = \frac{\exp(\mathbf{x}^\top \mathit{\alpha})}{1 + \exp(\mathbf{x}^\top \mathit{\alpha})}
\]
The final foehn probability of the two-component mixture model, also known as the a-posteriori probability, is given by:

\[
\hat{p}(y_i, \mathbf{x}_i, \mathit{\theta}, \mathit{\alpha}) = \frac{\pi(\mathbf{x}_i, \mathit{\alpha}) \cdot f(y_i, \mathit{\theta}_2)}{(1 - \pi(\mathbf{x}_i, \mathit{\alpha})) \cdot f(y_i, \mathit{\theta}_1) + \pi(\mathbf{x}_i, \mathit{\alpha}) \cdot f(y_i, \mathit{\theta}_2)},
\]

where \(\hat{p}\) in our case represents the probability of foehn. All one has to know are the parameters \(\mathit{\theta}\) and \(\mathit{\alpha}\), which can be estimated using an appropriate M-estimator such as maximum likelihood.
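To make these quantities concrete, the following minimal R sketch evaluates \(\pi\), the mixture density \(h\), and the a-posteriori probability \(\hat{p}\) for the Gaussian case. All data and parameter values are invented for illustration and are not part of `foehnix`:

```r
# Hypothetical toy values, for illustration only.
y     <- c(2.1, 8.4, 5.0)            # covariate for the two components
x     <- cbind(1, c(0.3, 2.5, 1.2))  # concomitant design matrix (intercept + covariate)
alpha <- c(-1.0, 0.8)                # concomitant coefficients
theta <- list(mu1 = 2, sigma1 = 1, mu2 = 8, sigma2 = 1.5)

pi_i <- drop(plogis(x %*% alpha))          # logistic model: exp(x'a) / (1 + exp(x'a))
f1   <- dnorm(y, theta$mu1, theta$sigma1)  # density of component 1
f2   <- dnorm(y, theta$mu2, theta$sigma2)  # density of component 2

h    <- (1 - pi_i) * f1 + pi_i * f2  # mixture density h
phat <- pi_i * f2 / h                # a-posteriori probability of foehn
phat
```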
The likelihood of a mixture model can usually not be maximized directly. One possibility to estimate the coefficients is an iterative expectation maximization (EM) algorithm. The EM algorithm optimizes the following log-likelihood:

\[
\ell = \sum_{i=1}^{N} \Big[ (1 - \hat{p}_i) \cdot \log\big((1 - \pi_i) \cdot f(y_i, \mathit{\theta}_1)\big) + \hat{p}_i \cdot \log\big(\pi_i \cdot f(y_i, \mathit{\theta}_2)\big) \Big]
\]

with \(\hat{p}\) as specified above (the a-posteriori probability); \(N\) represents the number of observations.
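Continuing the toy sketch above, this log-likelihood can be computed in one line (again purely illustrative):

```r
# Expected (complete-data) log-likelihood given the posterior weights phat.
ll <- sum((1 - phat) * log((1 - pi_i) * f1) + phat * log(pi_i * f2))
ll
```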
The EM algorithm is specified as follows:

1. Initialization: initialize values for \(\mathit{\theta}\) and \(\mathit{\alpha}\).
2. Estimation: compute the posterior class probability
   \[
   \hat{p}(y_i, \mathbf{x}_i, \mathit{\theta}, \mathit{\alpha}) = \frac{\pi(\mathbf{x}_i, \mathit{\alpha}) \cdot f(y_i, \mathit{\theta}_2)}{(1 - \pi(\mathbf{x}_i, \mathit{\alpha})) \cdot f(y_i, \mathit{\theta}_1) + \pi(\mathbf{x}_i, \mathit{\alpha}) \cdot f(y_i, \mathit{\theta}_2)}
   \]
3. Maximization: estimate \(\mathit{\theta}\) and \(\mathit{\alpha}\) which maximize the likelihood, using the posterior class probability \(\hat{p}\) from the estimation step as weights.
The EM steps are repeated until the likelihood improvement falls below a certain threshold or the maximum number of iterations is reached.
The simplest case is a Gaussian two-component mixture model without concomitants. In this case the density of the two components is the density \(\phi\) of the Gaussian distribution with parameters \(\mathit{\theta}_1 = (\mu_1, \sigma_1)\) and \(\mathit{\theta}_2 = (\mu_2, \sigma_2)\), where \(\mu\) and \(\sigma\) are the location and scale parameters of the Gaussian distribution, i.e., mean and standard deviation.
First, initial values for the parameters \(\mathit{\theta}\) and the posterior weights \(\hat{p}\) have to be specified. \(\mathit{\alpha}\) does not have to be initialized as no concomitant model is used in this case. To do so, each observation \(y_i~\forall~i = 1, \dots, N\) is attributed to one of the two components. This initial membership, denoted \(z\), takes the value 1 if observation \(y_i\) belongs to component 2 and 0 otherwise. The initial attribution assumes that observations with high values of \(y\) belong to component 2 and observations with low values of \(y\) to component 1.
Note: Depending on the model specification this can lead to models where `foehnix` returns the probability of no foehn rather than the a-posteriori probability of foehn. However, the `switch` argument of the `foehnix(...)` function allows you to control this behavior (see the `foehnix` manual page).
`foehnix` uses the following initialization for the two-component Gaussian mixture model without concomitants: the initial membership \(z\) is determined from the covariate \(y\) as described above, and the component parameters are initialized with the corresponding (weighted) empirical moments of \(y\); a sketch is shown below.
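A minimal R sketch of such an initialization on artificial data. The exact internals of `foehnix` may differ in detail; this only illustrates the idea of moment-based starting values:

```r
# Artificial data: two overlapping Gaussian clusters.
set.seed(6020)
y <- c(rnorm(100, mean = 2, sd = 1), rnorm(100, mean = 8, sd = 1.5))

# Initial membership: observations above the overall mean are
# attributed to component 2 (z = 1), the others to component 1.
z <- as.numeric(y > mean(y))

# Weighted empirical moments as starting values for theta.
mu1    <- sum((1 - z) * y) / sum(1 - z)
mu2    <- sum(z * y)       / sum(z)
sigma1 <- sqrt(sum((1 - z) * (y - mu1)^2) / sum(1 - z))
sigma2 <- sqrt(sum(z       * (y - mu2)^2) / sum(z))

phat <- z        # initial posterior weights
pi_i <- mean(z)  # constant pi (no concomitants)
```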
Once the required elements have been initialized, the EM algorithm (estimation and maximization steps as outlined above) is iterated for \(j = 1, \dots, \text{maxit}\); a worked sketch follows below.
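Continuing the initialization sketch, a didactic R implementation of the EM loop for the Gaussian case without concomitants might look as follows (a sketch under the assumptions above, not the `foehnix` implementation):

```r
maxit  <- 100   # maximum number of EM iterations
tol    <- 1e-8  # threshold for the likelihood improvement
ll_old <- -Inf

for (j in seq_len(maxit)) {
    # Estimation step: posterior class probability.
    f1   <- dnorm(y, mu1, sigma1)
    f2   <- dnorm(y, mu2, sigma2)
    phat <- pi_i * f2 / ((1 - pi_i) * f1 + pi_i * f2)

    # Maximization step: weighted empirical moments ...
    mu1    <- sum((1 - phat) * y) / sum(1 - phat)
    mu2    <- sum(phat * y)       / sum(phat)
    sigma1 <- sqrt(sum((1 - phat) * (y - mu1)^2) / sum(1 - phat))
    sigma2 <- sqrt(sum(phat       * (y - mu2)^2) / sum(phat))
    pi_i   <- mean(phat)  # ... and the (constant) mixture probability.

    # Log-likelihood (as defined above) for the stopping rule.
    f1 <- dnorm(y, mu1, sigma1); f2 <- dnorm(y, mu2, sigma2)
    ll <- sum((1 - phat) * log((1 - pi_i) * f1) + phat * log(pi_i * f2))
    if (ll - ll_old < tol) break
    ll_old <- ll
}
c(mu1 = mu1, sigma1 = sigma1, mu2 = mu2, sigma2 = sigma2)
```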
The optimizer for a two-component Gaussian mixture model with additional concomitants is very similar, except that the concomitant model (a logistic regression model) also has to be updated in each iteration. For mixture models with concomitants the probabilities \(\mathit{\pi}\) are a function of the concomitant covariates \(\mathbf{x}\) and the regression coefficients \(\mathit{\alpha}\).
The following algorithm is used: the EM steps for \(j = 1, \dots, \text{maxit}\) are the same as above, with an additional update of the concomitant model in each maximization step (see the sketch below).
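The additional step can be sketched as a logistic regression of the posterior weights on the concomitant covariates. The snippet below continues the EM sketch and uses R's `glm()` with a hypothetical covariate `x1` for illustration; the actual `foehnix` internals may differ:

```r
# Hypothetical concomitant covariate, for illustration only.
x1 <- runif(length(y))

# Update of the concomitant model inside the EM loop: logistic
# regression with the posterior weights phat as response.
# quasibinomial() avoids the non-integer response warning that
# family = binomial would raise for responses in (0, 1).
cm    <- glm(phat ~ x1, family = quasibinomial())
alpha <- coef(cm)    # updated concomitant coefficients
pi_i  <- fitted(cm)  # updated pi(x, alpha), one value per observation
```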
The logistic two-component mixture models can be estimated in the same way as the Gaussian ones, except that the component density is the density of the logistic distribution, and that the weighted empirical moments for \(\sigma_1\) and \(\sigma_2\), the scale parameters of the logistic distribution, become (here \(\pi\) denotes the mathematical constant, as the variance of the logistic distribution is \(\sigma^2 \pi^2 / 3\)):

\[
\sigma_1 = \frac{\sqrt{3}}{\pi} \cdot \sqrt{\frac{\sum_{i=1}^{N} (1 - \hat{p}_i) \cdot (y_i - \mu_1)^2}{\sum_{i=1}^{N} (1 - \hat{p}_i)}}, \qquad
\sigma_2 = \frac{\sqrt{3}}{\pi} \cdot \sqrt{\frac{\sum_{i=1}^{N} \hat{p}_i \cdot (y_i - \mu_2)^2}{\sum_{i=1}^{N} \hat{p}_i}}
\]
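In R this moment matching is a one-liner per component (continuing the sketch above):

```r
# Scale of the logistic distribution via moment matching:
# Var = sigma^2 * pi^2 / 3  =>  sigma = sqrt(3 * Var) / pi
# ('pi' is the mathematical constant here, not the mixture probability).
sigma1 <- sqrt(3 * sum((1 - phat) * (y - mu1)^2) / sum(1 - phat)) / pi
sigma2 <- sqrt(3 * sum(phat       * (y - mu2)^2) / sum(phat))     / pi
```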
In the case of a censored or truncated mixture model the distributional parameters \(\mathit{\theta}\) of the components cannot be calculated using weighted empirical moments. In these cases a numerical likelihood-based solver is used to estimate \(\mu_1\), \(\mu_2\), \(\sigma_1\), and \(\sigma_2\).
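As an illustration of such a numerical solver, the sketch below maximizes the weighted log-likelihood of one component for data left-censored at zero using `optim()`. The censoring point, data, and weights are hypothetical; this is not the `foehnix` solver:

```r
# Weighted negative log-likelihood of a Gaussian left-censored at 'left':
# censored observations contribute P(Y <= left), the rest the density.
nll <- function(par, y, w, left = 0) {
    mu    <- par[1]
    sigma <- exp(par[2])  # optimize log(sigma) to keep the scale positive
    d <- ifelse(y <= left,
                pnorm(left, mu, sigma),
                dnorm(y, mu, sigma))
    -sum(w * log(d))
}

yc  <- pmax(y, 0)  # artificially censor the toy data at zero
fit <- optim(c(mean(yc), log(sd(yc))), nll, y = yc, w = phat)
mu2    <- fit$par[1]       # numerical estimates for component 2
sigma2 <- exp(fit$par[2])
```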