vignettes/inference.Rmd
`foehnix` objects also provide asymptotic inference of the estimated coefficients of the mixture model (see statistical model and logistic regression with IWLS).
Let’s start with estimating our demo flexmix foehn diagnosis model. We have already prepared the data sets (object `data`). Details on how to generate the `data` object and more information about the `foehnix` model specification can be found on the getting started manual page.
# Estimate the model
mod <- foehnix(diff_t ~ ff + rh, data = data)
The `summary` method prints the test statistics for both parts of the mixture model, the component model and the concomitant model (if specified).
summary(mod, detailed = TRUE)
##
## Call: foehnix(formula = diff_t ~ ff + rh, data = data)
##
## Number of observations (total) 113952 (5527 due to inflation)
## Removed due to missing values 20014 (17.6 percent)
## Outside defined wind sector 0 (0.0 percent)
## Used for classification 93938 (82.4 percent)
##
## Climatological foehn occurance 75.87 percent (on n = 93938)
## Mean foehn probability 75.61 percent (on n = 93938)
##
## Log-likelihood: -221461.2, 7 effective degrees of freedom
## Corresponding AIC = 442936.3, BIC = 443002.5
## Time required for model estimation: 19.8 seconds
##
## Cluster separation (ratios close to 1 indicate
## well separated clusters):
##                         prior   size post>0 ratio
## Component 1 (foehn)      0.76  71268  93682  0.76
## Component 2 (no foehn)   0.24  22670  62085  0.37
##
## ---------------------------------
##
## Concomitant model: z test of coefficients
##                   Estimate  Std. error   z value  Pr(>|z|)
## cc.(Intercept)  2.34150441  0.04768558    49.103 < 2.2e-16 ***
## cc.ff          -1.88353356  0.00519504  -362.564 < 2.2e-16 ***
## cc.rh           1.76129854  0.00077906  2260.790 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Number of IWLS iterations: 1 (algorithm converged)
## Dispersion parameter for binomial family taken to be 1.
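The `z value` and `Pr(>|z|)` columns can be derived from the first two columns. As a minimal sketch, assuming the usual Wald test (estimate divided by its standard error, compared against a standard normal distribution), the printed values can be recomputed by hand. The numbers are copied from the output above; the variable names `est`, `se`, `z`, and `p` are illustrative only.

```r
# Estimates and standard errors copied from the printed summary above
est <- c("cc.(Intercept)" = 2.34150441,
         "cc.ff"          = -1.88353356,
         "cc.rh"          = 1.76129854)
se  <- c(0.04768558, 0.00519504, 0.00077906)

# Wald z statistic: estimate divided by its standard error
z <- est / se

# Two-sided p-value under a standard normal reference distribution
p <- 2 * pnorm(-abs(z))

print(round(z, 2))
print(p < 2.2e-16)
```

The recomputed z statistics agree with the printed ones only up to the rounding of the displayed standard errors.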
Inference for the two location parameters $\mu_1$ and $\mu_2$ of the two components is based on asymptotic theory. We expect the estimates of our coefficients to be unbiased. Thus, the expectation of each estimated coefficient is the coefficient itself ($E(\hat{\mu}_1) = \mu_1$, $E(\hat{\mu}_2) = \mu_2$).
In a general form the covariance matrix of a linear model (one component, unweighted) for a set of observations $y$ can be expressed as follows:
$$\mathrm{cov}(\hat{\beta}) = \frac{e^\top e}{N - P} \, (X^\top X)^{-1}$$
… where $N$ is the sample size, $P$ the number of parameters or covariates, $e$ the model residuals, and $X$ the model matrix of the linear model. In case of a `foehnix` model we have two components where each component consists of an intercept-only model ($\mu_1$ and $\mu_2$ do not depend on additional covariates). Thus, $P = 1$ and $X$ is an $N \times 1$ matrix with 1s. The estimates are based on a set of weighted observations $y$, in this example `diff_t`. The weights are the a-posteriori probabilities $\hat{p}$ (the foehn probabilities) of the `foehnix` model and have to be taken into account when calculating the standard errors. With these
weights the standard error for component 2 can be written
as:
… or much simpler:
The same holds for component 1 except that our weights are one minus the a-posteriori probabilities ($1 - \hat{p}$):
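For illustration, here is a minimal R sketch of weighted standard errors for an intercept-only component. It assumes the textbook weighted least-squares form $\mathrm{se}(\hat{\mu}) = \sqrt{\sum w_i e_i^2 / ((N - 1) \sum w_i)}$, which is the scaling `lm()` uses with `weights`. This form, the simulated data, the helper `wls_se`, and all variable names are assumptions and may differ from the exact expression `foehnix` uses internally.

```r
# Simulated two-cluster data; NOT foehnix data or output
set.seed(1)
y <- c(rnorm(250, mean = 0), rnorm(250, mean = 5))

# Hypothetical a-posteriori probabilities of component 2
p <- plogis(2 * (y - 2.5))

# Weighted location and standard error for an intercept-only model (P = 1),
# textbook WLS form: sqrt(sum(w * e^2) / ((N - 1) * sum(w)))
wls_se <- function(y, w) {
  mu <- sum(w * y) / sum(w)
  se <- sqrt(sum(w * (y - mu)^2) / ((length(y) - 1) * sum(w)))
  c(mu = mu, se = se)
}

comp2 <- wls_se(y, p)      # component 2: weights p
comp1 <- wls_se(y, 1 - p)  # component 1: weights 1 - p

# Cross-check component 2 against lm() with the same weights
fit2 <- lm(y ~ 1, weights = p)
ref2 <- coef(summary(fit2))[1, c("Estimate", "Std. Error")]

print(rbind(manual = comp2, lm = ref2))
```

The cross-check against `lm()` only confirms that the hand-written expression matches ordinary weighted least squares; it says nothing about the numbers `summary` reports for a fitted `foehnix` object.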
If a concomitant model has been specified, `summary` will also return the corresponding z statistics for the estimated regression coefficients of the logistic regression model (see logistic regression with IWLS).
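As a rough sketch of where such z statistics come from in an ordinary logistic regression, the following R code rebuilds the covariance matrix by hand. It assumes the standard result $(X^\top W X)^{-1}$ with $W = \mathrm{diag}(\pi (1 - \pi))$ and a dispersion of 1, and compares it with `vcov()` from `glm()`. The data are simulated and the variable names are illustrative. It does not use `foehnix` internals, which rely on their own IWLS implementation.

```r
# Simulated binary response; NOT foehnix data
set.seed(42)
n  <- 1000
x  <- rnorm(n)
yb <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Reference fit with base R
fit <- glm(yb ~ x, family = binomial())

X      <- model.matrix(fit)     # n x Q model matrix
pi_hat <- fitted(fit)           # fitted probabilities
w      <- pi_hat * (1 - pi_hat) # diagonal of W

# (X' W X)^{-1}: multiplying X by w scales row i by w[i]
cov_manual <- solve(crossprod(X, X * w))

# Standard errors and Wald z statistics
se_manual <- sqrt(diag(cov_manual))
z_manual  <- coef(fit) / se_manual

print(cbind(se = se_manual, z = z_manual))
```

The manual result matches `vcov(fit)` only up to the convergence tolerance of the fitting algorithm, so the comparison should use a small tolerance rather than exact equality.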
The covariance matrix of a logistic regression model with regression coefficients $\alpha$ (with $Q$ coefficients) and a dispersion parameter of 1 for the binomial family is given as:
$$\mathrm{cov}(\hat{\alpha}) = (X^\top W X)^{-1}$$
where $X$ is the model matrix of the concomitant model (un-standardized) of dimension $N \times Q$ and $W = \mathrm{diag}\big(\pi (1 - \pi)\big)$. $\pi$ is the final response, the probability returned by the logistic regression model. The diagonal of the covariance matrix contains the variances of the estimated regression coefficients. Thus,