HW 03 - Missing Data Assignment

This homework uses the modified Parental HIV data set found in Canvas under this assignment. Use this template qmd file.

Part 1: Exploring Imputation Techniques

1. Identify the amount of missing data in the entire data set.
2. Identify the amount of missing data in the 2 Parental bonding and 10 Brief Symptom Inventory scales.
3. Explore and describe bivariate missing patterns between the parental overprotection subscale and a different scale variable.
4. Single impute parent_overprotection using the hotdeck(dataset, variable = "var") function in VIM. See vignette("donorImp") for more information.
5. Multiply impute parent_overprotection using a non-mice-based imputation method of your choice that has a random component to it.
   1. Calculate the point estimate \(Q\) and the variance \(U\) from each imputation.
   2. Pool the estimates.
6. Comparison of Estimates. Create a summary table and plot containing the point estimate and 95% CI for the parental overprotection variable under a) complete case, b) the single imputation done in #4, and c) the multiple imputation done in #5. Summarize your findings.
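
For reference, pooling the per-imputation point estimates \(Q\) and variances \(U\) is usually done with Rubin's rules. The symbols \(\hat{Q}_i\), \(U_i\), \(\bar{Q}\), \(\bar{U}\), \(B\), \(T\), and \(\nu\) below are standard notation introduced here for illustration, not notation from the assignment. With \(m\) imputed data sets, where \(\hat{Q}_i\) is the point estimate and \(U_i\) its estimated variance from imputation \(i\):

\[
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m} \left(\hat{Q}_i - \bar{Q}\right)^2
\]

\[
T = \bar{U} + \left(1 + \frac{1}{m}\right) B, \qquad
\nu = (m-1)\left[1 + \frac{\bar{U}}{\left(1 + \frac{1}{m}\right) B}\right]^2
\]

A 95% interval is then \(\bar{Q} \pm t_{\nu,\,0.975}\sqrt{T}\). If the quantity of interest is a mean, one reasonable choice is \(U_i = s_i^2 / n\), the squared standard error of the mean in imputed data set \(i\).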

Part 2: Multiple Imputation using Chained Equations

1. Build a better imputation model for parental_overprotection. Do this by imputing the items pb01-pb25, then recreating the parental_overprotection scale post-imputation. Talk me through your process.
   1. Explore missing data patterns in other (non-scale) variables before you build your model. Not all variables should be considered in the imputation models, but be sure to include gender and hookey. Use tables and plots, but ensure all output is discussed and don’t create output that won’t be discussed.
   2. Multiply impute this data set using MICE, with between \(m=5\) and \(m=10\) imputations. Make sure the imputation models used for each variable are shown in your final output. Adjust any that may not make sense for their variable type.
   3. Update the summary table and plot from Part 1 and describe how your new model performs compared to the others.
2. After controlling for other measures, what is the effect of gender on the odds a student will skip school? Adjust the model for fit or stability as needed. Report your results in a nice table and/or plot.
   1. Fit this model on the complete cases (no imputation).
   2. Fit this model on the multiply imputed data sets from the prior problem, and report the pooled estimates and intervals.
   3. Interpret the effect of gender on playing hookey. Did it change from the complete case model?
   4. Create a plot to compare the results for all coefficients in the model.
   5. What are the biggest differences you notice? Would the inference/interpretation of the effect of any covariate on the odds of a student skipping school change depending on what model you use?
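
As a rough illustration of the complete-case versus pooled-MICE workflow described above, the sketch below uses simulated toy data. The variable names (skip, gender, score), the missingness rates, and the model formula are all assumptions made up for this example; none of it comes from the Parental HIV data set, and your own model will need different variables and checks.

```r
# Toy illustration only: simulated data and hypothetical variable names
# (skip, gender, score). This is not the Parental HIV data set.
library(mice)

set.seed(2024)
n <- 200
toy <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  score  = rnorm(n, mean = 50, sd = 10)
)

# Binary outcome whose log-odds depend on gender and score
lp <- -0.5 + 0.8 * (toy$gender == "Male") - 0.02 * (toy$score - 50)
toy$skip <- factor(rbinom(n, 1, plogis(lp)),
                   levels = c(0, 1), labels = c("No", "Yes"))

# Remove 20% of score and 10% of gender completely at random
toy$score[sample(n, 40)]  <- NA
toy$gender[sample(n, 20)] <- NA

# Complete-case fit: glm() drops rows with any missing value by default
cc_fit <- glm(skip ~ gender + score, family = binomial, data = toy)
cc_or  <- exp(cbind(estimate = coef(cc_fit), confint.default(cc_fit)))

# Multiply impute, fit the same model in each data set, then pool
imp <- mice(toy, m = 5, seed = 2024, printFlag = FALSE)
print(imp$method)  # imputation method mice chose for each variable
mi_fit <- with(imp, glm(skip ~ gender + score, family = binomial))
mi_sum <- summary(pool(mi_fit), conf.int = TRUE, exponentiate = TRUE)

# Odds ratios with 95% intervals from each approach
print(cc_or)   # complete case (Wald intervals)
print(mi_sum)  # pooled across the 5 imputed data sets
```

Printing imp$method is one way to show which imputation model was used for each variable; a different method can be supplied through the method argument of mice() if a default does not suit a variable's type.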