Jan 10, 2018

Expectation Maximization (EM) for missing values using Stata

The code below replicates the analyses in Allison (2002, pp. 21-3).

use "https://statisticalhorizons.com/wp-content/uploads/college.dta", clear
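
// Optional check (an addition, not part of Allison's analysis): a minimal
// sketch for inspecting how much data is missing before building the tables.
// misstable is a built-in Stata command; the variable list mirrors the one
// used in the rest of this code.
misstable summarize gradrat csat lenroll private stufac rmbrd act
misstable patterns gradrat csat lenroll private stufac rmbrd act, frequency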

// Table 4.1
eststo clear
estpost summarize gradrat csat lenroll private stufac rmbrd act
esttab using test.tex, cells("count(label(Nonmissing cases)) mean(label(Mean) fmt(2)) sd(label(SD) fmt(2))") ///
       nomtitle nonumber ///
       title(Descriptive Statistics for College Data Based on Available Cases) ///
       booktabs replace

// Table 4.2
 
eststo clear
eststo: regress gradrat csat lenroll private stufac rmbrd

#delimit ;
esttab using test.tex, cells("b(fmt(3) label(Coefficient))
                              se(fmt(3) label(Standard Error))
                              t(fmt(2) label(t Statistic))
                              p(fmt(4) label(p Value))")
                       order(_cons) coeflabel(_cons "Intercept")
                       nomtitle nonumber
                       title(Regression that predicts GRADRAT Using Listwise Deletion)
                       booktabs append ;
#delimit cr

// EM imputation
mi set mlong
mi register imputed gradrat csat lenroll private stufac rmbrd act 
mi impute mvn gradrat csat lenroll private stufac rmbrd act, emonly
matrix m = r(Beta_em)' // Transpose matrix of imputed means
matrix C = corr(r(Sigma_em)) // Matrix of correlations
matrix variances = diag((vecdiag(r(Sigma_em)))) // Matrix of variances
matrix sds = vecdiag(cholesky(variances))' // Vector of standard deviations
matrix descriptives = m, sds // Matrix needed for Table 4.3
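
// Optional check (an addition, not in the original post): a minimal sketch
// that rebuilds the covariance matrix from the standard deviations and the
// correlations, so the pieces can be compared with the EM output.
// The names D and Sigma_check are hypothetical and used only for illustration.
matrix D = diag(sds)                     // Diagonal matrix of standard deviations
matrix Sigma_check = D * C * D           // Should match the EM covariance matrix up to rounding
matrix list descriptives, format(%9.2f)  // Means and standard deviations
matrix list Sigma_check, format(%9.3f)   // Reconstructed covariances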

// Table 4.3
esttab matrix(descriptives, fmt(2 2)) using test.tex, ///
       nomtitle title("Means and Standard Deviations from the EM Algorithm") ///
       booktabs append

// Table 4.4
esttab matrix(C, fmt(3 3)) using test.tex, ///
       nomtitle title("Correlations from the EM Algorithm") ///
       booktabs append

// Table 4.5
drop *                                         // Get rid of data but not matrices
ssd init gradrat csat lenroll private stufac rmbrd act 
ssd set observations 1302
ssd set means (stata) m
ssd set sd (stata) sds
ssd set corr (stata) C
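
// Optional check (an addition, not in the original post): a minimal sketch
// for confirming that the summary statistics data are complete before the
// model is fit. ssd status and ssd list are built-in subcommands of ssd.
ssd status   // Reports whether any required statistics are still missing
ssd list     // Shows the stored means, standard deviations, and correlations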

eststo clear

eststo: sem (gradrat <- csat lenroll private stufac rmbrd)

#delimit ;
esttab using test.tex, cells("b(fmt(3) label(Coefficient))
                              se(fmt(3) label(Standard Error))
                              t(fmt(2) label(t Statistic))
                              p(fmt(4) label(p Value))")
                       order(_cons) coeflabel(_cons "Intercept")
                       nomtitle nonumber
                       title(Regression that predicts GRADRAT Based on the EM Algorithm)
                       keep(gradrat:) eqlabels("", none) // Removes equation label
                       booktabs append ;
#delimit cr

Reference

Allison, Paul D. 2002. Missing Data. Sage. doi: 10.4135/9781412985079