Submission Instructions.

At this point you should be using your cleaned version of the book data.
Submit your draft PDF to the Google Drive folder Topic 02: Logistic regression and classification/Draft by the due date.
Read the question carefully. Some book questions require you to use appropriate variable selection techniques.

Part I: Logistic Regression

Playing with odds: PMA6 12.1
Logistic Regression modeling: PMA6 12.7, 12.8
Compare the models in part 2 for acute vs chronic illness.
1. Are the measures that are important in explaining the outcome similar?
2. For covariates that are the same, compare the effect of that covariate on each of the outcomes.
Northridge Earthquake: (PMA6 12.22-12.29 modified). This problem uses the Northridge earthquake data set. [See here for more info about this event.] We are interested in knowing whether homeowners (V449) were more likely than renters to report emotional injuries (W238) as a result of the Northridge earthquake, controlling for age (RAGE), gender (RSEX), and ethnicity (NEWETHN).
1. Download and use my cleaning script as a starter.
2. Fit a logistic regression model with emotional injury as the outcome, using the variables listed above as predictors. Use tbl_regression to create a nice table of results that includes odds ratios and 95% confidence intervals. See if you can figure out how to drop the intercept term from the table.
3. Interpret each predictor (except the intercept) in context of the problem.
4. Are the estimated effects of home ownership upon reporting emotional injuries different for men and women, controlling for age and ethnicity? That is, is there a significant interaction effect between gender and home ownership?
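As a reminder of the arithmetic behind an odds ratio table, here is a minimal Python sketch (the course work itself is done in R). It converts a coefficient on the log-odds scale into an odds ratio with a Wald-type 95% interval, and converts between probabilities and odds. The function names and the numbers are hypothetical and are not taken from the Northridge data; the interval your software reports may be computed differently (for example by profile likelihood) and so may differ slightly.

```python
import math


def prob_to_odds(p):
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def odds_ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a log-odds coefficient and its Wald interval.

    beta: estimated coefficient on the log-odds scale
    se:   its standard error
    z:    normal quantile (1.96 gives an approximate 95% interval)
    """
    estimate = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return estimate, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
or_est, or_lower, or_upper = odds_ratio_with_ci(beta=0.40, se=0.15)
print(f"OR = {or_est:.2f}, 95% CI ({or_lower:.2f}, {or_upper:.2f})")
```

If the interval for an odds ratio excludes 1, the corresponding coefficient is significantly different from 0 at roughly the 5% level; the same reading applies to an interaction term such as gender by home ownership.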

Playing Hookey (PMA6 12.18-12.19 modified)

Perform a binary logistic regression analysis using the Parental HIV data to model the probability of having been absent from school without a reason (variable HOOKEY). Find the variables that best predict whether an adolescent had been absent without a reason or not. Use a hefty dose of common sense here; not all variables are reasonable to use (e.g. using the # of times a student skips school to predict whether or not they will skip school).
Use the default value for the predict() function to create a vector of predictions for each student.
Explore the distribution of predictions against a few variables that you identified (via the model) as being highly predictive of skipping school.
Create a confusion matrix for these predictions and interpret: accuracy, balanced accuracy, sensitivity, specificity, PPV, NPV.
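The definitions of the metrics listed above can be sketched in a few lines of Python (the course work itself is done in R). This sketch assumes the model is a glm fit in R, where predict() by default returns values on the link (log-odds) scale, so the predictions are converted to probabilities before being classified. The 0.5 cutoff, the function names, and the data are hypothetical and chosen only for illustration.

```python
import math


def log_odds_to_prob(eta):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Standard summaries of a 2x2 confusion matrix.

    Assumes every denominator is nonzero (both classes are present
    in the actual and in the predicted labels).
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
    }


# Hypothetical log-odds predictions and observed outcomes (1 = skipped school).
log_odds = [-2.0, -0.5, 0.3, 1.2, -1.1, 0.8, -0.2, 2.0]
actual = [0, 0, 1, 1, 0, 0, 1, 1]

probs = [log_odds_to_prob(eta) for eta in log_odds]
predicted = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 cutoff

tp, fp, tn, fn = confusion_counts(actual, predicted)
metrics = classification_metrics(tp, fp, tn, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

When the outcome is rare, accuracy alone can look good even if the model rarely flags a true case, which is why balanced accuracy, sensitivity, and PPV are worth interpreting alongside it.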