STATA FUN: The scaling problem in logistic multilevel regression

Aug 12, 2012

The scaling problem in logit and probit regression models has been described by several authors (Auspurg and Hinz 2011, Best and Wolf 2012, Karlson et al. 2012, Mood 2010, Williams 2009), and for the single-level case some remedies have been proposed (Hoetker 2007 presented -complogit-, Kohler et al. 2011 presented -khb-). For multilevel models, however, the problem has so far largely gone unacknowledged. Hox (2010, pp. 133-139) is one of the rare instances where the problem has been described for the logistic multilevel model, but a Stata implementation has been lacking. Dirk Enzmann and Ulrich Kohler have now provided one with the command -meresc- (install via -ssc install meresc-; see slides with explanations by the creators here).

In order to be identified, logistic regression models need to assume a distribution for the level-1 error term. Customarily, a standard logistic distribution is used, with a mean of zero and a variance of amath sigma_R^2 = pi^2/3 ~~ 3.29 endamath. The problem with the logistic model is that whenever an explanatory variable that explains some of the level-1 variance is added to the equation, the level-1 residual variance (representing the unobserved heterogeneity) does not actually decrease as it would in OLS regression. Instead, it is rescaled to follow the standard logistic distribution again. Consequently, on top of any substantive changes, all parameters in the model, both fixed and random effects, are rescaled as well. So when a new variable is added to the equation, a change in another coefficient could be due to A) substantive changes (confounding, mediation, or suppression), or B) reduced unobserved heterogeneity. Importantly, using a standard logistic regression model, we cannot tell which is which just by looking at the logit coefficients, odds ratios, and variance components.
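The rescaling can be made visible with a small simulation. The following is only a sketch on made-up data (the variable names, coefficients, seed, and sample size are assumptions for illustration, not taken from the example further down), and it uses a single-level logit to keep things short. Because x2 is generated independently of x1, there is no confounding, so any change in the coefficient of x1 reflects rescaling only.

```stata
// Simulated data: all names and values here are hypothetical
clear
set seed 12345
set obs 100000
generate double x1 = rnormal()
generate double x2 = rnormal()          // independent of x1: no confounding
generate double u  = runiform()
generate double e  = ln(u/(1-u))        // standard logistic error, variance pi^2/3
generate double ystar = x1 + 2*x2 + e   // latent outcome, true coefficient of x1 is 1
generate byte y = ystar > 0             // observed binary outcome

// Model without x2: x2 is part of the unobserved heterogeneity
quietly logit y x1
scalar b_short = _b[x1]

// Model with x2: less unobserved heterogeneity, coefficients are rescaled
quietly logit y x1 x2
scalar b_long = _b[x1]

display "b[x1] without x2: " %6.3f b_short
display "b[x1] with x2:    " %6.3f b_long
```

The coefficient of x1 from the first model should come out clearly smaller than the one from the second model, even though x2 neither confounds nor mediates the effect of x1.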

To tackle this problem, Hox (2010) suggests standardizing the parameter estimates in a way that makes the 'real' change in parameter values visible. The procedure is as follows:

1. Estimate a null model that only contains the outcome variable and calculate the total variance in the outcome, amath sigma_0^2 = sigma_u0^2 + sigma_R^2 endamath, where the level-1 residual variance is amath sigma_R^2 ~~ 3.29 endamath and amath sigma_u0^2 endamath is the intercept variance.
2. Estimate the model of interest m and calculate its total variance, amath sigma_m^2 = sigma_F^2 + sigma_u0^2 + sigma_R^2 endamath, where amath sigma_F^2 endamath is the variance of the linear predictor from the fixed part of m and amath sigma_u0^2 endamath is now the intercept variance estimated in m.
3. The scale correction factor then equals amath sigma_0^2/sigma_m^2 endamath.
4. Rescale all parameter estimates of model m by multiplying A) the variance estimates by the scale correction factor amath sigma_0^2/sigma_m^2 endamath, and B) the fixed effects parameter estimates by the square root of the scale correction factor, amath sqrt(sigma_0^2/sigma_m^2) endamath.
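The arithmetic of the steps above can be sketched with Stata scalars. All numbers below are invented purely to show the calculation, and the scalar names are hypothetical. They do not come from any estimated model.

```stata
// Hypothetical variance components, chosen only to illustrate the arithmetic
scalar var_R       = c(pi)^2/3   // level-1 residual variance, about 3.29
scalar var_u0_null = 1.00        // intercept variance in the null model (assumed)
scalar var_u0_m    = 0.80        // intercept variance in model m (assumed)
scalar var_F       = 0.50        // variance of the fixed-part linear predictor of m (assumed)
scalar b_m         = 0.40        // some fixed effect estimate from model m (assumed)

// Total variances of the null model and of model m
scalar var_0 = var_u0_null + var_R
scalar var_m = var_F + var_u0_m + var_R

// Scale correction factor
scalar scf = var_0/var_m

// Variances are multiplied by the factor, fixed effects by its square root
scalar var_u0_resc = var_u0_m*scf
scalar b_resc      = b_m*sqrt(scf)

display "Scale correction factor:     " %6.4f scf
display "Rescaled intercept variance: " %6.4f var_u0_resc
display "Rescaled coefficient:        " %6.4f b_resc
```

With these made-up values the factor is a little below 1, so both the intercept variance and the coefficient shrink slightly when put back on the scale of the null model.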

An example illustrates the use of -meresc-. Download the .do-file here.

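// Install the user-written commands from SSC if they are missing
capture which meresc
if _rc ssc install meresc
capture which r2_mz
if _rc ssc install r2_mz
capture which esttab  // esttab is part of the estout package
if _rc ssc install estout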

webuse pisa2000, clear

// M0
xtmelogit pass_read || id_school: , var
r2_mz  // Necessary for the output
est sto m0

// M1
xtmelogit pass_read female || id_school: , var
r2_mz
est sto m1

// M1 rescaled
meresc
est sto m1sc

// M2
xtmelogit pass_read female high_school college || id_school: , var
r2_mz
est sto m2

// M2 rescaled
meresc
est sto m2sc


esttab m0 m1 m1sc m2 m2sc, drop(lns1_1_1:_cons) ///
    stat(Var_Rresc Var_u1 Var_u1resc deviance, ///
        labels("Var(Res.-r)" "Var(Int.)" "Var(Int.-r)" "Deviance")) ///
    mtitles("M0" "M1" "M1 rescaled" "M2" "M2 rescaled") ///
    nonumbers ///
    note("Var(Res.-r): Residual variance sigma_r^2," ///
        "rescaled; Var(Int.): Intercept variance" ///
        "sigma_u0^2;") ///
    addnote("Var(Int.-r): Intercept variance sigma_u0^2, rescaled")

The table shows that the effects of rescaling are rather small in this case.
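As a rough cross-check, the scale correction factor for M2 could also be computed by hand from the stored estimates. This is a sketch only: it assumes the estimates m0 and m2 from the example are still stored and the data are still in memory, it introduces the hypothetical names xb_m2, var_u0_null, var_u0_m, var_F, and scf, and it follows the steps attributed to Hox (2010) above. It is not necessarily how -meresc- computes its results internally.

```stata
// Intercept variance of the null model
// (xtmelogit stores the log of the standard deviation in lns1_1_1)
estimates restore m0
scalar var_u0_null = exp(2*[lns1_1_1]_b[_cons])

// Intercept variance and fixed-part linear predictor of M2
estimates restore m2
scalar var_u0_m = exp(2*[lns1_1_1]_b[_cons])
predict double xb_m2, xb
quietly summarize xb_m2 if e(sample)
scalar var_F = r(Var)

// Scale correction factor: total variance of M0 over total variance of M2
scalar var_R = c(pi)^2/3
scalar scf = (var_u0_null + var_R)/(var_F + var_u0_m + var_R)
display "Scale correction factor for M2: " %6.4f scf
```

A factor close to 1 would agree with the small differences between the original and rescaled columns.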

References

Auspurg, Katrin, and Thomas Hinz. 2011. "Gruppenvergleiche bei Regressionen mit binären abhängigen Variablen. Probleme und Fehleinschätzungen am Beispiel von Bildungschancen im Kohortenverlauf." [Group comparisons for regression models with binary dependent variables] Zeitschrift für Soziologie 40(1):62-73.

Best, Henning, and Christof Wolf. 2012. "Modellvergleich und Ergebnisinterpretation in Logit- und Probit-Regressionen." [Comparing nested models and interpreting results from logit and probit regression] Kölner Zeitschrift für Soziologie und Sozialpsychologie 64(2):377-395. doi: 10.1007/s11577-012-0167-4

Hoetker, Glenn. 2007. "The Use of Logit and Probit Models in Strategic Management Research. Critical Issues." Strategic Management Journal 28(4):331-343. doi: 10.1002/smj.582

Hox, Joop. 2010. Multilevel Analysis. Techniques and Applications, 2nd ed. Routledge.

Karlson, Kristian Bernt, Anders Holm, and Richard Breen. 2012. "Comparing Regression Coefficients Between Same-sample Nested Models Using Logit and Probit. A New Method." Sociological Methodology 42(1):286-313. doi: 10.1177/0081175012444861

Kohler, Ulrich, Kristian Bernt Karlson, and Anders Holm. 2011. "Comparing Coefficients of Nested Nonlinear Probability Models." Stata Journal 11(3):420-438.

Mood, Carina. 2010. "Logistic Regression. Why We Cannot Do What We Think We Can Do, and What We Can Do About It." European Sociological Review 26(1):67-82. doi: 10.1093/esr/jcp006

Williams, Richard. 2009. "Using Heterogeneous Choice Models to Compare Logit and Probit Coefficients Across Groups." Sociological Methods & Research 37(4):531-559. doi: 10.1177/0049124109335735