Analyzing and Pooling Results From Multiply Imputed Data

Multiple imputation is a statistical technique for handling missing data. It outperforms classical approaches to missing data, such as listwise deletion or mean imputation, by producing less biased parameter estimates and more efficient standard errors. Multiple imputation takes a dataset with missing values and creates m complete datasets that can then be analyzed with complete-data analytic approaches.
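
To make the idea of "m complete datasets" concrete, here is a minimal sketch of creating multiply imputed data. It assumes the `mice` package and R's built-in `airquality` data (which has missing values); this is just for illustration and is not necessarily the package or data used later in the post.

```r
# Minimal sketch: create m = 5 complete datasets from data with missingness.
# Assumes the mice package; not necessarily what this post uses.
library(mice)

# airquality has missing values in Ozone and Solar.R
imp <- mice(airquality, m = 5, seed = 123, printFlag = FALSE)

# extract one of the completed datasets (e.g., the first) to inspect it
head(complete(imp, action = 1))
```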

In this blog post, I will explain how to analyze multiply imputed data and then pool the results. When I was first learning about multiple imputation, one of the most confusing parts was how to combine the results from each imputed dataset. I had m sets of results rather than just one. Which one was correct? How did I get a single "final" result? That is what we are going to go over today.

The main thing to keep in mind about analyzing multiply imputed data is that you always do the analysis - or extract the statistical information - separately for each imputed dataset. Then you pool (aka combine) the results from each analysis with specific formulas. Two things you DON'T do are 1) analyze only one of the imputed datasets and use that as the final result or 2) stack the imputed datasets together and analyze that "super" dataset to get the final result. (For some types of analyses, a stacked "supermatrix" dataset can be used to obtain unbiased parameter estimates; the difficulty is obtaining correct standard errors and/or confidence intervals. See Lang & Little (2014) for a creative solution, though.)
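
As a rough sketch of that analyze-then-pool workflow, the code below fits the same regression model separately to each imputed dataset and then pools the m sets of results. It again assumes `mice` and the `airquality` example from above; the model (`Ozone ~ Solar.R + Wind`) is purely illustrative.

```r
# Minimal sketch of the analyze-then-pool workflow, assuming a mice
# "mids" object called imp (as created in the earlier sketch).
library(mice)

imp <- mice(airquality, m = 5, seed = 123, printFlag = FALSE)

# 1) fit the same model separately to each of the m imputed datasets
fits <- with(imp, lm(Ozone ~ Solar.R + Wind))

# 2) pool the m sets of estimates and standard errors with Rubin's rules
pooled <- pool(fits)
summary(pooled)
```

Note that the pooling step is where the m sets of results become a single "final" set of estimates and standard errors; neither picking one imputed dataset nor stacking them would accomplish this correctly.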

For this blog post, I am assuming we have already created our multiply imputed datasets. Here, we will just be going over what to do once we have a set of multiply imputed datasets and wish to conduct our analysis. For readers interested in the code I used to set up the blog post, I show how I created the imputed data, but I will not be discussing those details. For the statistical programming, I will be using R, an open-source statistical software. For more information about R, go to <https://www.r-project.org/about.html>.

I will also be using a package that I created called `str2str` (read as "structure to structure"), which contains a lot of simple wrapper functions for converting R objects from one structure to another. I find that using these functions saves a few lines of code and generally makes code easier to read. If you want to learn more about the package, you can go to the str2str documentation webpage. Because I used R Markdown for the analyses, the blog post itself is saved as a PDF. Click here to download the PDF file.