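Notes from experience: using the R mice package to apply different imputation methods to different variables, and to skip some variables, in multiple imputation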

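While running multiple imputation on a data set with the R mice package, I had two specific requirements: (1) impute only some of the variables, namely those with a high proportion of missing values, rather than all of them, while still keeping every variable in the data set; (2) use a different imputation method for each variable (for example, some variables affected by multicollinearity need the "cart" method).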

After a long search online, I finally found the solution in one article. The relevant passages are recorded below.


Imputations can be created as


R> imp <- mice(nhanes2, me = c("polyreg", "pmm", "logreg", "norm"))

where function mice.impute.polyreg() is used to impute the first (categorical) variable age, mice.impute.pmm() for the second numeric variable bmi, function mice.impute.logreg() for the third binary variable hyp and function mice.impute.norm() for the numeric variable chl. The me parameter is a legal abbreviation of the method argument.
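The same idea can be written with a named method vector, so that the assignment does not depend on counting columns. The following is only a minimal sketch, assuming a recent mice release (3.x) in which the method vector returned by a dry run (maxit = 0) carries the variable names. The choice of "cart" for bmi and "norm" for chl is purely illustrative and is not taken from the article.

```r
library(mice)

# Dry run: no iterations, just collect the default settings for nhanes2.
ini  <- mice(nhanes2, maxit = 0, printFlag = FALSE)
meth <- ini$method            # named vector; "" for complete columns such as age

# Override the method for individual variables by name (illustrative choices).
meth["bmi"] <- "cart"         # classification and regression trees
meth["chl"] <- "norm"         # Bayesian linear regression

imp  <- mice(nhanes2, method = meth, m = 5, seed = 123, printFlag = FALSE)
done <- mice::complete(imp, 1)  # first completed data set

imp$method                    # shows the method actually used per variable
```

Variables that are not overridden keep the default method that mice chose for their type.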


The mice() function will automatically skip imputation of variables that are complete. One of the problems in previous versions of this function was that all incomplete data needed to be imputed. In mice 2.9 it is possible to skip imputation of selected incomplete variables by specifying the empty method "". This works as long as the incomplete variable that is skipped is not being used as a predictor for imputing other variables. The mice() function will detect this case, and automatically remove the variable from the predictor list. For example, suppose that we do not want to impute bmi, but still want to retain it in the imputed data. We can run the following

R> imp <- mice(nhanes2, meth = c("", "", "logreg", "norm"))

This statement runs because bmi is removed from the predictor list. When removal is not possible, the program aborts with an error message like Error in check.predictorMatrix(predictorMatrix, method, varnames, nmis, : Variable bmi is used, has missing values, but is not imputed.
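The passage above describes mice 2.9; how later releases treat a skipped variable that is still listed as a predictor may differ. A cautious way to skip bmi is therefore to remove it from the predictor matrix explicitly rather than rely on automatic removal. The sketch below assumes a recent mice release (3.x) with named method vectors, and the seed and m values are arbitrary.

```r
library(mice)

# Dry run to obtain the default method vector and predictor matrix.
ini  <- mice(nhanes2, maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix

# Skip bmi: empty method, and never use it to predict other variables.
meth["bmi"]   <- ""
pred[, "bmi"] <- 0            # column = variable used as predictor

imp  <- mice(nhanes2, method = meth, predictorMatrix = pred,
             m = 5, seed = 123, printFlag = FALSE)
done <- mice::complete(imp, 1)

colSums(is.na(done))          # missing values remaining per variable
```

In the completed data bmi is still present with its original missing values, while hyp and chl are imputed from the remaining variables.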

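*Note: when the methods are given positionally as above, the vector must follow the column order of the variables and contain one entry for every column, with none left out; otherwise mice() will stop with an error.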


Reference: Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1-67.

https://www.jstatsoft.org/article/download/v045i03/550
