by country, sort: ipolate value year, generate(interpolated_value)
creates a new variable interpolated_value in which gaps are filled with linearly interpolated values. However, missing values before the first or after the last observed value of a time series are not filled in, because -ipolate- treats these as extrapolation. To fill in these time points as well, specify the -epolate- option:
by country, sort: ipolate value year, generate(interpolated_value) epolate
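To make the difference between the two calls concrete, here is a minimal sketch on a made-up five-year series for a single hypothetical country "A". The data and the variable names ip_inner and ip_full are assumptions for illustration only.

```stata
// Toy data (hypothetical): one country, gaps at the start, middle, and end
clear
input str1 country year value
"A" 2000 .
"A" 2001 10
"A" 2002 .
"A" 2003 14
"A" 2004 .
end

// Interior gaps only
by country, sort: ipolate value year, generate(ip_inner)

// Interior gaps plus linear extrapolation at both ends
by country, sort: ipolate value year, generate(ip_full) epolate

sort country year
list country year value ip_inner ip_full, sepby(country)
// ip_inner by year 2000-2004: .  10  12  14  .
// ip_full  by year 2000-2004: 8  10  12  14  16
```

Only the 2002 gap is filled without -epolate-; with it, the line through the two nearest observed points is extended to 2000 and 2004.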
A useful additional variable for sensitivity checks is a dummy indicator distinguishing between original and interpolated values:
// Create interpolation indicator
// (1 = value was missing and has been filled in; missing if no value is available at all)
generate interpolated_value_dummy = missing(value) if !missing(interpolated_value)
label define interpolated_value_dummy 0 "Original value" 1 "Interpolated value"
label values interpolated_value_dummy interpolated_value_dummy
label variable interpolated_value_dummy "Value interpolation y/n"
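One possible way to use the indicator in a sensitivity check is sketched below. The dependent variable outcome and the choice of -regress- are assumptions for illustration; any estimation command could take their place.

```stata
// How many observations were filled in?
tabulate interpolated_value_dummy, missing

// Model using original and interpolated values (outcome is a hypothetical variable)
regress outcome interpolated_value

// Same model restricted to originally observed values
regress outcome interpolated_value if interpolated_value_dummy == 0
```

If the two sets of estimates differ substantially, the results depend on the interpolated observations and should be interpreted with caution.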
A general problem with these interpolated values is that, when they are analyzed as if they were observed data, standard errors will be underestimated (Allison 2002), even if the linear interpolations are consistent. An alternative approach, to which the same caveat applies, is Lowess smoothing (Cleveland 1979) via -lowess-.
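A rough sketch of how -lowess- might be applied per country follows. It rests on several assumptions: the generate() option of -lowess- should not be combinable with by(), hence the loop; smoothed values should only be returned for observations where value is observed, so the remaining gaps are filled with -ipolate- afterwards; and every country has enough observed values for -lowess- to run. The names country_id, tmp_smooth, smoothed_value, and smoothed_filled are hypothetical, and the default bandwidth is used.

```stata
// Numeric group identifier (works for string or numeric country variables)
egen country_id = group(country)

generate smoothed_value = .
summarize country_id, meanonly
forvalues c = 1/`r(max)' {
    // Smooth each country's series separately, without drawing a graph
    lowess value year if country_id == `c', generate(tmp_smooth) nograph
    replace smoothed_value = tmp_smooth if country_id == `c'
    drop tmp_smooth
}

// Fill the gaps between the smoothed points
by country_id, sort: ipolate smoothed_value year, generate(smoothed_filled)
```

Unlike plain interpolation, this also replaces the observed values with their smoothed counterparts, so it is worth checking whether that is desired.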
References
Allison, Paul D. 2002. Missing Data. Sage.
Cleveland, William S. 1979. "Robust Locally Weighted Regression and Smoothing Scatterplots." Journal of the American Statistical Association 74(368):829-836. doi: 10.2307/2286407