Aug 20, 2012

Publication-style correlation tables in Stata

Quite a few Stata users have written commands that create ready-made correlation tables for use in publications. Stata's own -correlate- and -pwcorr- lack many desirable features: means and standard deviations cannot be included in the table automatically, columns cannot be numbered, and only variable names, not variable labels, can be shown in the table.
sysuse auto
correlate price-foreign, means


-estpost-, -mkcorr-, -corrtab-, and -makematrix- are all user-written commands that aim to improve on Stata's default table.

 

-estpost-

-estpost- stems from Ben Jann's mighty -estout- package. The general use of the command is as follows:
sysuse auto
capture which estout
if _rc ssc install estout
estpost correlate price-trunk, matrix
esttab, unstack not nonum compress noobs
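Because -esttab- can also write to a file, the same table can be exported instead of only being displayed. The following is a minimal sketch; the file name corr_table.rtf is an arbitrary choice for illustration, and the resulting file will probably still need some formatting by hand:

```stata
* Sketch: post the correlations, then write the table to an RTF file.
* The file name corr_table.rtf is only an example.
sysuse auto, clear
capture which estout
if _rc ssc install estout
estpost correlate price-trunk, matrix
esttab using corr_table.rtf, unstack not nonum compress noobs replace
```

The -replace- option overwrites an existing file of the same name, so the lines can be rerun without errors.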


In order to display both the upper and the lower triangle of the matrix, the -nohalf- option of -estpost- needs to be used:
estpost correlate price-trunk, matrix nohalf
However, getting -esttab- to use variable labels or to include descriptive statistics does not seem to be trivial, although the latter can be done. Finally, note that -estpost correlate- uses pairwise deletion by default, not listwise deletion, so it behaves like -pwcorr- rather than -correlate-. To obtain a correlation matrix based on listwise deletion, the -listwise- option needs to be specified.
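The difference between the two deletion rules only shows up when some variables have missing values. A small sketch with the auto data, where rep78 is missing for a few cars, illustrates this; the choice of variables is mine and only serves the illustration:

```stata
* Sketch: listwise versus pairwise deletion when a variable has missings.
sysuse auto, clear
count if missing(rep78)

* -correlate- drops every car with a missing value on any listed variable
correlate price mpg rep78

* -pwcorr- uses all available cars per pair; -obs- prints the N of each pair
pwcorr price mpg rep78, obs

* -estpost correlate- is pairwise by default; -listwise- switches the rule
estpost correlate price mpg rep78, matrix listwise
esttab, unstack not nonum compress noobs
```

With -listwise-, the coefficients from -estpost correlate- should match those from -correlate-; without it, they should match those from -pwcorr-.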

 

-mkcorr-

-mkcorr- by Glenn Hoetker comes with many nice features and only a few drawbacks.
sysuse auto, clear
capture which mkcorr
if _rc ssc install mkcorr
mkcorr price-trunk, log(corr_table)
It allows summary statistics (mean, SD, minimum, and maximum) to be included in the table:
mkcorr price-trunk, log(corr_table) replace means
The use of variable labels:
mkcorr price-trunk, log(corr_table) replace lab
The use of numbers in the column headers:
mkcorr price-trunk, log(corr_table) replace num
The inclusion of p-values:
mkcorr price-trunk, log(corr_table) replace means sig 
Furthermore, the number of decimal places can be set via -cdec(#)- for correlations and -mdec(#)- for summary statistics.
The main drawback is that the table is not displayed in the results window; it can only be written to a tab-separated text file that then needs additional formatting in Word or Excel. Also, the p-values are rather ugly; an option to denote statistical significance with stars would be a great improvement. Again, the default is to provide correlations based on pairwise deletion; the option -casewise- yields results based on listwise deletion.
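The options shown above can presumably be combined in a single call. The sketch below assumes that they work together as they do individually; the choice of two decimal places is arbitrary. As a stopgap for the missing stars, the built-in -pwcorr- can at least flag significant correlations in the results window via its -star(#)- option, although it does not export anything:

```stata
* Sketch: one -mkcorr- call with the options discussed above.
* Assumes the options can be combined; two decimals is an arbitrary choice.
sysuse auto, clear
capture which mkcorr
if _rc ssc install mkcorr
mkcorr price-trunk, log(corr_table) replace means lab num sig casewise cdec(2) mdec(2)

* Built-in alternative for on-screen stars, also with listwise deletion
pwcorr price-trunk, listwise star(.05)
```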

-corrtab-

-corrtab- by Fred Wolfe is a tool with somewhat more limited functionality, or rather, with more functions that I don't find to be of great use.
sysuse auto, clear
capture which corrtab
if _rc ssc install corrtab
corrtab price-trunk 
-corrtab- also allows suppressing correlations based on their p-values when using the option -print(#)-:
corrtab price-trunk, print(.10)
Or, based on the absolute value of the correlation coefficient when specifying -above(#)-:
corrtab price-trunk, above(.4)

Correlations can also be suppressed based on their position in the list of variables. The option -var(#)- lists only the first # variables in the columns; later variables are included only in the rows of the table:
corrtab price-trunk, var(3)
Correlations can be sorted by size for a single variable specified in -vsort()-:
corrtab price-trunk, vsort(price)
Using variable labels requires some complicated additional commands, and the default treatment of missing values is again pairwise deletion.
In sum, there seems to be little reason to ever use -corrtab-.

 

-makematrix-

-makematrix- by Nick Cox is a very flexible tool for all sorts of applications, one of them being correlation tables. He explains it a bit in Cox (2003), but it seems to require too much tweaking to get ready-made publication-style correlation tables in a short amount of time. Furthermore, there's a bug in the command: the option -listwise- yields a pairwise correlation matrix instead of a matrix based on listwise deletion.
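For orientation, a basic call might look like the sketch below. The syntax is written from memory of the command's help file and should be treated as an assumption: -makematrix- runs the command after the colon for each pair of variables and collects the result named in -from()-, here r(rho) as returned by -correlate-. The command has to be installed first (-findit makematrix- should locate it).

```stata
* Sketch, assuming the syntax: makematrix, from(result) [options] : command varlist
* The variables and the display format are arbitrary choices.
sysuse auto, clear
makematrix, from(r(rho)) format(%4.2f) : correlate price mpg headroom trunk
```

Even if this runs as intended, it only yields the bare correlation matrix; means, labels, and significance markers would all have to be added by hand.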

In sum, the commands reviewed here all leave a lot to be desired; -mkcorr- is the command that seems to be the most useful one to me.

References

Cox, Nicholas J. 2003. "Speaking Stata. Problems with Tables, Part II." Stata Journal 3(4):420-439.