This homework uses the modified Parental HIV data set found in Canvas
under this assignment. Use this template qmd
file.
Part 1: Exploring Imputation Techniques
- Identify the amount of missing data in the entire data set.
- Identify the amount of missing data in the 2 Parental Bonding and 10
Brief Symptom Inventory scales.
- Explore and describe bivariate missing patterns between the parental
overprotection subscale and a different scale variable.
- Single impute parent_overprotection using the
hotdeck(dataset, variable = "var") function in VIM. See
vignette("donorImp") for more information.
- Multiply impute parent_overprotection using a non-mice based
imputation method of your choice that has a random component to it.
- Calculate the point estimate \(Q\) and the variance \(U\) from each
imputation.
- Pool the estimates across imputations.
- Comparison of estimates. Create a summary table and a plot containing
the point estimate and 95% CI for the parental overprotection variable
under a) complete case analysis, b) the single (hot deck) imputation,
and c) the multiple imputation from the earlier steps. Summarize your
findings.
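As a reminder for the pooling step, the standard way to combine the
per-imputation estimates is Rubin's rules, sketched below. Only \(Q\) and
\(U\) come from the steps above; the index \(i\), the number of
imputations \(m\), and the symbols \(\bar{Q}\), \(\bar{U}\), \(B\), \(T\)
and \(\nu\) are notation assumed for this sketch.

\[
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m} Q_i
  && \text{(pooled point estimate)} \\
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m} U_i
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\left(Q_i - \bar{Q}\right)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B
  && \text{(total variance)}
\end{aligned}
\]

A 95% interval is then \(\bar{Q} \pm t_{\nu,\,0.975}\sqrt{T}\), where the
classical degrees of freedom are
\(\nu = (m-1)\left[1 + \bar{U} / \left((1 + 1/m)\,B\right)\right]^2\).
Software may apply a small-sample adjustment to \(\nu\), so intervals
computed by hand can differ slightly from packaged output.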
Part 2: Multiple Imputation using Chained Equations
- Build a better imputation model for parental_overprotection. Do this
by imputing pb01-pb25, then recreating the parental_overprotection scale
post-imputation. “Talk me through” your process.
- Explore missing data patterns in other (non-scale) variables before
you build your model. Not all variables should be considered in the
imputation models, but be sure to include gender and hookey. Use tables
and plots, but discuss all output you show and don’t create output that
won’t be discussed.
- Multiply impute this data set using MICE, with between \(m=5\) and
\(m=10\) imputations. Make sure the imputation models used for each
variable are shown in your final output. Adjust any that may not make
sense for their variable type.
- Update the summary table and plot from Part 1 and describe how your
new model performed relative to the others.
- After controlling for other measures, what is the effect of gender
on the odds a student will skip school? Adjust the model for
fit or stability as needed. Report your results in a nice table and/or
plot.
- Fit this model on the complete cases (no imputation).
- Fit this model on the multiply imputed data sets from the prior
problem, report the pooled estimates and intervals.
- Interpret the effect of gender on playing hookey. Did it change from
the complete case model?
- Create a plot to compare the results for all coefficients in the
model.
- What are the biggest differences you notice? Would the
inference/interpretation of the effect of any covariate on the odds of a
student skipping school change depending on what model you use?
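For the interpretation questions, recall that a logistic regression
coefficient is on the log-odds scale and converts to an odds ratio by
exponentiating. The sketch below assumes \(\hat{\beta}_{\text{gender}}\)
denotes the fitted coefficient for gender and
\(\mathrm{SE}(\hat{\beta}_{\text{gender}})\) its standard error; neither
symbol appears in the assignment, and \(c\) stands for the appropriate
critical value.

\[
\begin{aligned}
\mathrm{OR}_{\text{gender}} &= \exp\!\left(\hat{\beta}_{\text{gender}}\right) \\
95\%\ \text{CI} &= \exp\!\left(\hat{\beta}_{\text{gender}} \pm c \cdot
  \mathrm{SE}(\hat{\beta}_{\text{gender}})\right)
\end{aligned}
\]

For the complete case model \(c\) is typically the normal quantile 1.96;
for the pooled model it is a \(t\) quantile with the pooled degrees of
freedom. An odds ratio whose interval includes 1 is consistent with no
association, which is one concrete way to check whether the inference
changes between models.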