This homework uses the modified Parental HIV data set found in Canvas
under this assignment. Use this template qmd
file.
Part 1: Exploring Imputation Techniques
- Identify the amount of missing data in the entire data set.
- Identify the amount of missing data in the 2 Parental bonding and 10
Brief Symptom Inventory scales.
- Explore and describe bivariate missing patterns between the parental
overprotection subscale and a different scale variable.
- Single impute `parent_overprotection` using the
`hotdeck(dataset, variable = "var")` function in VIM. See
`vignette("donorImp")` for more information.
- Multiply impute `parent_overprotection` using a non-mice-based
imputation method of your choice that has a random component to it.
- Calculate the point estimate \(Q\) and the variance \(U\) from each
imputation.
- Pool the estimates.
- Comparison of Estimates. Create a summary table and plot containing
the point estimate and 95% CI for the parental overprotection variable
under a) complete case analysis, b) the hot-deck single imputation, and
c) the non-mice multiple imputation from the steps above. Summarize
your findings.
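For reference, the per-imputation estimates and the pooling step can be sketched in base R as below. This is only an illustration under stated assumptions: the data are simulated (the vector `x` is a hypothetical stand-in for a scale score, not the Parental HIV data), the imputation method is a simple random hot deck, the estimand is a mean, and the degrees of freedom use the classical large-sample formula from Rubin's rules.

```r
# Toy data only: x is a made-up stand-in for a scale score.
set.seed(2024)
n <- 200
x <- rnorm(n, mean = 30, sd = 6)
x[sample(n, 50)] <- NA               # set 50 values to missing

m     <- 5                           # number of imputations
obs   <- x[!is.na(x)]                # donor pool: the observed values
n_mis <- sum(is.na(x))

Q <- numeric(m)                      # point estimate (mean) per imputation
U <- numeric(m)                      # variance of that estimate per imputation
for (i in seq_len(m)) {
  x_imp <- x
  # Random hot deck: fill each missing value with a randomly drawn donor
  x_imp[is.na(x_imp)] <- sample(obs, n_mis, replace = TRUE)
  Q[i] <- mean(x_imp)
  U[i] <- var(x_imp) / n             # squared standard error of the mean
}

# Rubin's rules
Q_bar <- mean(Q)                     # pooled point estimate
U_bar <- mean(U)                     # within-imputation variance
B     <- var(Q)                      # between-imputation variance
T_var <- U_bar + (1 + 1 / m) * B     # total variance
df    <- (m - 1) * (1 + U_bar / ((1 + 1 / m) * B))^2
ci    <- Q_bar + c(-1, 1) * qt(0.975, df) * sqrt(T_var)

pooled <- c(estimate = Q_bar, se = sqrt(T_var), lower = ci[1], upper = ci[2])
print(round(pooled, 2))
```

The `pooled` vector has the same shape as one row of the comparison table (estimate plus interval). Note that drawing donors directly from the observed values ignores uncertainty about the donor distribution, so a method of your choice may need an extra step (for example, a bootstrap of the donor pool) to be a proper multiple imputation.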
Part 2: Multiple Imputation using Chained Equations
- Build a better imputation model for `parental_overprotection`. Do
this by imputing the `pb01-pb25` items, then recreate the
`parental_overprotection` scale post-imputation. “Talk me through”
your process.
- Explore missing data patterns in other (non-scale) variables before
you build your model. Not all variables should be considered in the
imputation models but be sure to include `gender` and `hookey`. Use
tables and plots but ensure all output is discussed and don’t create
output that won’t be discussed.
- Multiply impute this data set between \(m=5\) and \(m=10\) times using MICE. Make sure the
imputation models used for each variable are shown in your final
output. Adjust any that do not make sense for their variable type.
- Update the summary table and plot from Part 1 and compare how your
new model performs relative to the others.
- After controlling for other measures, what is the effect of gender
on the odds a student will skip school? Adjust the model for
fit or stability as needed. Report your results in a nice table and/or
plot.
- Fit this model on the complete cases (no imputation).
- Fit this model on the multiply imputed data sets from the prior
problem, report the pooled estimates and intervals.
- Interpret the effect of gender on playing hookey. Did it change from
the complete case model?
- Create a plot to compare the results for all coefficients in the
model.
- What are the biggest differences you notice? Would the
inference/interpretation of the effect of any covariate on the odds of a
student skipping school change depending on what model you use?
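As a rough guide to the mechanics of the last few items, the sketch below fits a logistic model on complete cases and on multiply imputed data, then pools with `mice`. Everything about the data is an assumption: the data frame `toy` is simulated, `score` is a hypothetical covariate, and only the names `gender` and `hookey` are borrowed from the assignment. Your own model will need the covariates and imputation methods you justify in your write-up.

```r
library(mice)

# Toy data: names mirror the assignment, values are simulated.
set.seed(2024)
n <- 300
toy <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  score  = rnorm(n, mean = 30, sd = 6)   # hypothetical numeric covariate
)
lin_pred   <- -1 + 0.5 * (toy$gender == "Male") + 0.03 * (toy$score - 30)
toy$hookey <- rbinom(n, 1, plogis(lin_pred))
toy$score[sample(n, 60)]  <- NA          # introduce missing values
toy$gender[sample(n, 30)] <- NA

# Complete-case fit: glm() drops rows with any missing value
cc_fit <- glm(hookey ~ gender + score, data = toy, family = binomial)
cc_or  <- exp(cbind(OR = coef(cc_fit), confint.default(cc_fit)))

# Multiple imputation with chained equations
imp <- mice(toy, m = 5, seed = 2024, printFlag = FALSE)
print(imp$method)                        # imputation method used per variable

# Fit the same model in each imputed data set, then pool with Rubin's rules
mi_fit <- with(imp, glm(hookey ~ gender + score, family = binomial))
pooled <- summary(pool(mi_fit), conf.int = TRUE, exponentiate = TRUE)

print(cc_or)                             # odds ratios, complete cases
print(pooled)                            # pooled odds ratios with intervals
```

Printing `imp$method` is one way to show which imputation model was used for each variable; passing a modified `method` vector back to `mice()` is how a default that does not suit a variable's type can be changed.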