`foehnix` objects also provide asymptotic inference for the estimated coefficients of the mixture model (see statistical model and logistic regression with IWLS).
Let’s start by estimating our demo `foehnix` foehn diagnosis model. The data set has already been prepared (object `data`). Details on how to generate the `data` object and more information about the `foehnix` model specification can be found on the getting started manual page.
# Estimate the model
mod <- foehnix(diff_t ~ ff + rh, data = data)
The `summary` method prints the test statistics for both parts of the mixture model, the component model and the concomitant model (if specified).
summary(mod, detailed = TRUE)
##
## Call: foehnix(formula = diff_t ~ ff + rh, data = data)
##
## Number of observations (total) 113952 (5527 due to inflation)
## Removed due to missing values 20014 (17.6 percent)
## Outside defined wind sector 0 (0.0 percent)
## Used for classification 93938 (82.4 percent)
##
## Climatological foehn occurance 75.87 percent (on n = 93938)
## Mean foehn probability 75.61 percent (on n = 93938)
##
## Log-likelihood: -221461.2, 7 effective degrees of freedom
## Corresponding AIC = 442936.3, BIC = 443002.5
## Time required for model estimation: 18.9 seconds
##
## Cluster separation (ratios close to 1 indicate
## well separated clusters):
##                          prior   size post>0 ratio
## Component 1 (foehn)       0.76  71268  93682  0.76
## Component 2 (no foehn)    0.24  22670  62085  0.37
##
## ---------------------------------
##
## Concomitant model: z test of coefficients
##                   Estimate  Std. error   z value  Pr(>|z|)
## cc.(Intercept)  2.34150441  0.04768558    49.103 < 2.2e-16 ***
## cc.ff          -1.88353356  0.00519504  -362.564 < 2.2e-16 ***
## cc.rh           1.76129854  0.00077906  2260.790 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Number of IWLS iterations: 1 (algorithm converged)
## Dispersion parameter for binomial family taken to be 1.
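As a quick cross-check of the coefficient table above, each z value is the estimate divided by its standard error, and the two-sided p value follows from the standard normal distribution. The sketch below retypes the printed (rounded) numbers by hand, so the last digits can differ slightly from the output; the variable names are illustrative only.

```r
# Estimates and standard errors copied from the printed summary above
# (rounded values, so results agree with the table only up to rounding).
est <- c("cc.(Intercept)" = 2.34150441, "cc.ff" = -1.88353356, "cc.rh" = 1.76129854)
se  <- c("cc.(Intercept)" = 0.04768558, "cc.ff" =  0.00519504, "cc.rh" = 0.00077906)

# z statistic: estimate relative to its standard error
z <- est / se

# Two-sided p value under the standard normal distribution
pval <- 2 * pnorm(-abs(z))

print(round(z, 2))
print(pval < 2.2e-16)
```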
The inference for the two location parameters \(\mu_1\) and \(\mu_2\) of the two components is based on asymptotic theory. We assume that the estimators of our coefficients are unbiased. Thus, the expectation of each estimated coefficient is the true coefficient itself (\(\text{E}(\hat{\mu}_1) = \mu_1\), \(\text{E}(\hat{\mu}_2) = \mu_2\)).
In general form, the covariance matrix of a linear model (one component, unweighted) for a set of \(i = 1, \dots, N\) observations can be expressed as follows:
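The standard textbook form that matches the symbols defined next is sketched here; the left-hand-side notation \(\text{Cov}(\hat{\mu}_\bullet)\) is an assumption and may differ from the notation used elsewhere in this manual:

\[
\text{Cov}(\hat{\mu}_\bullet) = \hat{\sigma}^2 \, (\mathbf{X}^\top \mathbf{X})^{-1},
\qquad
\hat{\sigma}^2 = \frac{\mathit{\epsilon}^\top \mathit{\epsilon}}{N - P}
\]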
… where \(N\) is the sample size, \(P\) the number of parameters or covariates, \(\mathit{\epsilon} = \mathit{y} - \mu_\bullet\) the model residuals, and \(\mathbf{X}\) the model matrix of the linear model. In the case of a `foehnix` model we have two components, each of which is an intercept-only model (\(\mu_1\) and \(\mu_2\) do not depend on additional covariates). Thus, \(P = 1\) and \(\mathbf{X}\) is an \(N \times 1\) matrix of 1s. The estimates are based on a set of weighted observations \(y\), in this example `diff_t` (\(y =\) `diff_t`). The weights are the a-posteriori probabilities \(\hat{\mathit{p}}\) (the foehn probabilities) of the `foehnix` model and have to be taken into account when calculating the standard errors. With these weights the standard error for component 2 can be written as:
… or much simpler:
The same holds for component 1 except that our weights are \(1 - \hat{\mathit{p}}\):
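The role of the weights can be illustrated with a weighted intercept-only regression in base R. This is a minimal sketch under stated assumptions: the data are simulated (not the demo data set), the posterior probabilities `p_hat` are a stand-in computed from known parameters, and the standard error is the ordinary weighted-least-squares one, which may be normalised differently from the expression used by `foehnix`.

```r
set.seed(1)
N <- 1000
z <- rbinom(N, 1, 0.3)                          # latent component indicator
y <- rnorm(N, mean = ifelse(z == 1, 5, 0), sd = 1)

# Stand-in a-posteriori probabilities for component 2 (assumption: known
# mixture parameters; a fitted model would supply these instead).
d1    <- 0.7 * dnorm(y, mean = 0, sd = 1)
d2    <- 0.3 * dnorm(y, mean = 5, sd = 1)
p_hat <- d2 / (d1 + d2)

# Component 2: intercept-only model weighted by p_hat
fit2 <- lm(y ~ 1, weights = p_hat)
mu2  <- unname(coef(fit2))
se2  <- sqrt(diag(vcov(fit2)))

# Component 1: same model, weights 1 - p_hat
fit1 <- lm(y ~ 1, weights = 1 - p_hat)
mu1  <- unname(coef(fit1))

# The same standard error by hand: sigma^2 * (X' W X)^{-1},
# with X an N x 1 matrix of ones and P = 1.
X          <- matrix(1, nrow = N, ncol = 1)
eps        <- y - mu2
sigma2     <- sum(p_hat * eps^2) / (N - 1)
se2_manual <- drop(sqrt(sigma2 * solve(crossprod(X, p_hat * X))))

print(c(mu1 = mu1, mu2 = mu2))
print(c(lm = unname(se2), manual = se2_manual))
```

Because \(\mathbf{X}\) is a single column of ones, the weighted cross-product reduces to the sum of the weights, which is why the intercept-only case admits a much simpler scalar expression.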
If a concomitant model has been specified, `summary` will also return the corresponding z statistics for the estimated regression coefficients of the logistic regression model (see logistic regression with IWLS).
The covariance matrix of a logistic regression model with regression coefficients \(\mathit{\alpha}\) (\(P\) coefficients) and a dispersion parameter of 1 for the binomial family is given as:
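In the usual IWLS notation this is the inverse of the weighted cross-product of the model matrix. The sketch below is consistent with the definition of \(\mathit{\omega}\) that follows; the exact layout and the \(\text{diag}(\cdot)\) notation are assumptions:

\[
\text{Cov}(\hat{\mathit{\alpha}})
= \big(\mathbf{X}^\top \text{diag}(\mathit{\omega}^2)\, \mathbf{X}\big)^{-1}
= \big(\mathbf{X}^\top \text{diag}\big(\mathit{\pi} (1 - \mathit{\pi})\big)\, \mathbf{X}\big)^{-1}
\]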
where \(\mathbf{X}\) is the model matrix of the concomitant model (un-standardized) of dimension \(N \times P\) and \(\mathit{\omega} = (\mathit{\pi} * (1 - \mathit{\pi}))^\frac{1}{2}\). \(\mathit{\pi}\) is the final response, i.e. the probability returned by the logistic regression model. The diagonal of the covariance matrix contains the variances of the estimated regression coefficients. Thus,
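the standard errors of the coefficients are the square roots of the diagonal elements of this matrix.

A minimal base R sketch of this calculation follows. It assumes simulated covariates that merely reuse the names `ff` and `rh`, and it uses `glm()` as a stand-in for the IWLS solver; the result should agree with `vcov()` only up to the IWLS convergence tolerance.

```r
set.seed(42)
N   <- 500
ff  <- rnorm(N)                                  # simulated covariates
rh  <- rnorm(N)
eta <- -0.5 + 1.2 * ff - 0.8 * rh                # hypothetical true predictor
yb  <- rbinom(N, 1, plogis(eta))

fit <- glm(yb ~ ff + rh, family = binomial())

X      <- model.matrix(fit)                      # N x P model matrix
pi_hat <- fitted(fit)                            # final response
omega  <- sqrt(pi_hat * (1 - pi_hat))

# Scale each row of X by omega_i, then invert the cross-product
Xw        <- X * omega
cov_alpha <- solve(crossprod(Xw))

# Standard errors and z statistics
se_alpha <- sqrt(diag(cov_alpha))
z_alpha  <- coef(fit) / se_alpha

print(cbind(Estimate = coef(fit), `Std. error` = se_alpha, `z value` = z_alpha))
```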