Software for multiple imputation
Multiple imputation
Multiple imputation is a simulation-based approach to the statistical
analysis of incomplete data. In multiple imputation, each missing
datum is replaced by m>1 simulated values. The resulting
m versions of the complete data can then be analyzed by
standard complete-data methods, and the results combined to produce
inferential statements (e.g. interval estimates or p-values) that
incorporate missing-data uncertainty.
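
The combining step above can be illustrated with a small sketch. This page does not say which combining rules are used; the sketch assumes Rubin's rules for a scalar quantity, which are the rules commonly used with multiple imputation. The function name `combine_estimates` and the input numbers are hypothetical and are not part of NORM, CAT, MIX or PAN.

```python
from statistics import mean, variance


def combine_estimates(estimates, variances):
    """Combine m complete-data results for one scalar quantity.

    Assumes Rubin's rules (an assumption; not taken from this page).
    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m > 1 estimates and one variance per estimate")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance

    if b == 0:
        # No variation between imputations: no missing-data uncertainty
        # is detected, so the reference t distribution becomes a normal.
        df = float("inf")
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


if __name__ == "__main__":
    # Hypothetical results from m = 5 imputed data sets.
    q, t, df = combine_estimates(
        [10.2, 9.8, 10.5, 10.1, 9.9],
        [0.40, 0.38, 0.42, 0.41, 0.39],
    )
    print(f"estimate={q:.3f}  std.err={t ** 0.5:.3f}  df={df:.1f}")
```

An interval estimate would then be the pooled estimate plus or minus a t quantile (with the returned degrees of freedom) times the square root of the total variance.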
See our FAQ page for answers to frequently asked questions about
multiple imputation.
Libraries for S-PLUS
At present, four different software packages are available for
creating multiple imputations in S-PLUS.
NORM
- Multiple imputation of multivariate continuous data under a
normal model. Routines described in Chapter 5 of Schafer (1997a).
CAT
- Multiple imputation of multivariate categorical data under
loglinear models. Routines described in Chapters 7-8 of Schafer (1997a).
MIX
- Multiple imputation of mixed continuous and categorical data under
the general location model. Routines described in Chapter 9 of Schafer (1997a).
PAN
- Multiple imputation of panel data or clustered data under a
multivariate linear mixed-effects model. Routines described in Schafer (1997b).
All four packages (NORM, CAT, MIX and PAN) are available as
functions in S-PLUS. For efficiency, the computationally intensive
portions are carried out in Fortran-77; the compiled Fortran
object code is dynamically loaded in the S-PLUS session.
S-PLUS for Unix:
Each package comes in the form of a shar archive, including Fortran
source that must be compiled for your particular system.
S-PLUS Version 3.3 for Windows:
Each package is a self-extracting zip (*.exe) file. Executing the
file will create an S-PLUS library.
S-PLUS Version 4.0 for Windows:
Each package is a self-extracting zip (*.exe) file. Executing the
file will create an S-PLUS library.
Having trouble with NORM in S-PLUS version 4.5 for Windows? Try
replacing your current "norm.obj" file with this new
version.
Stand-alone packages for Windows 95/98/NT
We have also been developing free, stand-alone applications for
Windows 95, 98, and NT. As of July 1999, one package is available.
NORM
- Version 2.02 for Windows 95/98/NT. Multiple imputation of multivariate
continuous data under a normal model. This is a major update of
the package that we first released in 1997. It has lots of great new
features. Check it out!
Download NORM Version 2.03 for Windows.
Future Windows software releases. We are still working on stand-alone
Windows versions of our other software packages CAT, MIX, and PAN. The
next package to be released will be PAN, perhaps by late summer
1999. Quality software takes time to develop, especially with our
limited resources. Please be patient; we are working as fast as we can!
Authorship and use
This software was written by Joe Schafer of the
Department of Statistics, The Pennsylvania State University. Maren
Olsen (same affiliation) assisted in the development of the
stand-alone Windows applications. The software may be distributed free
of charge and used by anyone if credit is given. It has been tested
fairly well, but it comes with no guarantees and the authors assume no
liability for its use or misuse.
Acknowledgements
Development of this software has been supported by grant 2R44CA65147-02
from the National Institutes of Health, and by grant 1-P50-DA10075 from
the National Institute on Drug Abuse (NIDA). This ongoing work is carried
out at The Pennsylvania State University, in the Department of Statistics
and at the NIDA-supported Center for the Study of Prevention through
Innovative Methodology.
Problems? Questions?
Because our software is distributed free of charge, our ability to
handle users' questions is limited. We are unable to provide detailed
advice regarding the use of these packages with specific data sets. In
our experience, many questions arise because users are unfamiliar with
the new statistical techniques implemented here. Potential users of
our software should first become thoroughly familiar with the
technique of multiple imputation (see our FAQ
page). You should also browse the documentation provided with each
software package.
References
- Schafer, J.L. (1997a). Analysis of Incomplete Multivariate Data.
Chapman & Hall, London.
- Schafer, J.L. (1997b). Imputation of missing covariates under a general
linear mixed model. Technical report, Dept. of Statistics, Penn
State University.
Created by Joe Schafer (jls@stat.psu.edu). Revised July 12, 1999.