STATA FUN: Morgan and Winship's (2007) example for bias due to conditioning on a collider in Stata

Jul 9, 2013

Morgan and Winship (2007: p. 66) illustrate Pearl's (2009) concern about conditioning on a collider variable using a simple example.

The general problem of conditioning on a collider is as follows. Consider three variables A, B, and C, with both A and B being causes of C: A → C ← B. (Formally, any variable C that has two arrows pointing to it along a given path is a collider on that path.) Unlike a confounder (an uncontrolled common cause of A and B), a collider does not induce a zero-order correlation between A and B. However, conditioning on a collider, for example by controlling for it in a regression or by selecting the sample on it, can induce a conditional correlation between A and B even when the two are unconditionally unrelated.

Morgan and Winship's example shows just that: A college admits applicants on the basis of their SAT scores and an interview-based rating of their motivation. Applicants in the top 15 per cent of the sum of SAT score and motivation rating are admitted. Among all applicants, SAT scores and motivation ratings are largely uncorrelated.
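
Why selection on a sum induces a negative association can be sketched for an idealized version of this setup. The derivation below assumes, purely for illustration, that A (SAT) and B (motivation) are independent standard normal variables and that admission requires A + B > c. The symbols S, D, and c are introduced here and do not come from Morgan and Winship's text.

```latex
\begin{align*}
S &= A + B, \qquad D = A - B, \qquad
  \operatorname{Cov}(S, D) = \operatorname{Var}(A) - \operatorname{Var}(B) = 0, \\
A &= \tfrac{1}{2}(S + D), \qquad B = \tfrac{1}{2}(S - D), \\
\operatorname{Cov}(A, B \mid S > c)
  &= \tfrac{1}{4}\bigl[\operatorname{Var}(S \mid S > c) - \operatorname{Var}(D \mid S > c)\bigr]
   = \tfrac{1}{4}\bigl[\operatorname{Var}(S \mid S > c) - 2\bigr] < 0 .
\end{align*}
```

Because S and D are jointly normal and uncorrelated, they are independent, so selecting on S leaves Var(D) = 2 unchanged, while truncating S pushes its variance below Var(S) = 2. The same argument applies to the rejected group (S ≤ c).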

// Generate and label two variables
drawnorm sat motivation, ///
         n(250) ///
         means(.007, -.053) ///
         sds(1.01, 1.02) ///
         corr(1, .035, 1) cstorage(lower) ///
         clear seed(1)
label var sat        "SAT"
label var motivation "Motivation"

// Only the 15 per cent at the top are admitted
gen admission_sc = sat + motivation
_pctile admission_sc, percentiles(85)

gen admission = (admission_sc > r(r1))
label var admission "Admission status"
label define admission 1 "Admitted applicant" ///
                       0 "Rejected applicant"
label val admission admission
drop admission_sc

// Plot as in Morgan and Winship, p. 67:
twoway (scatter motivation sat if admission == 1) ///
       (scatter motivation sat if admission == 0) ///
       , ///
       legend(label(1 "Admitted applicants") ///
              label(2 "Rejected applicants") ///
              pos(5) ring(0)) ///
       ylabel(none) xlabel(none) ///
       name(collider1, replace)

// Enhanced plot with fitted lines and correlations
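// Format each correlation to two decimals for the legend; a display format
// avoids the stray digits that round() can leave in a local macro
quietly correlate motivation sat
local r_overall : display %4.2f r(rho)
quietly correlate motivation sat if admission == 1
local r_admitted : display %4.2f r(rho)
quietly correlate motivation sat if admission == 0
local r_rejected : display %4.2f r(rho)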
    
twoway (scatter motivation sat if admission == 1) ///
       (scatter motivation sat if admission == 0) ///
       (lfit motivation sat) ///
       (lfit motivation sat if admission == 1) ///
       (lfit motivation sat if admission == 0) ///
       , ///
       legend(label(1 "Admitted applicants") ///
              label(2 "Rejected applicants") ///
              label(3 "Overall fit, {it:r} = `r_overall'") ///
              label(4 "Fit for admitted, {it:r} = `r_admitted'") ///
              label(5 "Fit for rejected, {it:r} = `r_rejected'") ///
              pos(5) ring(0)) ///
       ylabel(none) xlabel(none) ytitle("Motivation")

The figures show that the very small correlation between motivation and SAT score in the full sample becomes much stronger once we condition on admission status.
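
The correlations behind the fitted lines can also be printed directly. The following is a minimal, self-contained sketch rather than part of the original example: it rebuilds the simulated applicants with the parameters used above and compares the overall correlation with the correlation within each admission group. The matrix names M, SD, and C and the scalar names r_all, r_adm, and r_rej are chosen here for illustration, and the exact draws may differ across Stata versions.

```stata
// Recreate the simulated applicants (same parameters and seed as above)
matrix M  = (.007, -.053)
matrix SD = (1.01, 1.02)
matrix C  = (1, .035 \ .035, 1)
drawnorm sat motivation, n(250) means(M) sds(SD) corr(C) clear seed(1)

// Admit the top 15 per cent of the summed score
gen admission_sc = sat + motivation
_pctile admission_sc, percentiles(85)
gen admission = (admission_sc > r(r1))

// Correlation in the full sample, then within each admission group
quietly correlate motivation sat
scalar r_all = r(rho)
quietly correlate motivation sat if admission == 1
scalar r_adm = r(rho)
quietly correlate motivation sat if admission == 0
scalar r_rej = r(rho)

display "overall:  " %5.2f r_all
display "admitted: " %5.2f r_adm
display "rejected: " %5.2f r_rej
```

If collider bias is at work, the two within-group correlations should be clearly more negative than the overall one.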

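The same pattern appears in an OLS regression. Once admission status enters the model, the coefficient on SAT changes sign, from 0.10 to -0.21: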

regress motivation sat, beta
estimates store m1
regress motivation sat admission, beta
estimates store m2

estimates table m1 m2, b(%7.2f) se(%7.2f) stats(N) label

----------------------------------------------
                Variable |   m1        m2     
-------------------------+--------------------
                     SAT |    0.10     -0.21  
                         |    0.06      0.06  
        Admission status |              1.60  
                         |              0.17  
                Constant |   -0.12     -0.36  
                         |    0.06      0.06  
-------------------------+--------------------
                       N |     250       250  
----------------------------------------------
                                  legend: b/se
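
Two further ways of conditioning on admission status are the partial correlation and separate regressions by group. The sketch below is an illustration under the same simulation parameters, not part of the original example. The matrix names M, SD, and C and the scalar names b_adm and b_rej are assumptions made here, and the numbers may differ from the table above depending on the Stata version.

```stata
// Recreate the simulated applicants (same parameters and seed as above)
matrix M  = (.007, -.053)
matrix SD = (1.01, 1.02)
matrix C  = (1, .035 \ .035, 1)
drawnorm sat motivation, n(250) means(M) sds(SD) corr(C) clear seed(1)

// Admit the top 15 per cent of the summed score
gen admission_sc = sat + motivation
_pctile admission_sc, percentiles(85)
gen admission = (admission_sc > r(r1))

// Partial correlation of motivation and SAT, holding admission fixed
pcorr motivation sat admission

// Slope of SAT within each admission group
quietly regress motivation sat if admission == 1
scalar b_adm = _b[sat]
quietly regress motivation sat if admission == 0
scalar b_rej = _b[sat]

display "slope among admitted: " %5.2f b_adm
display "slope among rejected: " %5.2f b_rej
```

Stratifying by admission is the same kind of conditioning as adding admission to the regression, so the group-specific slopes should point in the same direction as the SAT coefficient in model m2.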

Cole et al. (2010) present additional illustrations for this problem.

References

Cole, Stephen R., Robert W. Platt, Enrique F. Schisterman, Haitao Chu, Daniel Westreich, David Richardson, and Charles Poole. 2010. "Illustrating Bias Due to Conditioning on a Collider." International Journal of Epidemiology 39(2):417-420. doi: 10.1093/ije/dyp334

Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal Inference. Methods and Principles for Social Research. Cambridge University Press.

Pearl, Judea. 2009. Causality. Models, Reasoning, and Inference, 2nd ed. Cambridge University Press.