- At this point you should be using your cleaned version of the book data
- Submit your draft PDF to the Google Drive folder `Topic 02: Logistic regression and classification/Draft` by the due date.
- Read the question carefully. Some book questions require you to use appropriate variable selection techniques.

- Playing with odds: PMA6 12.1
- Logistic Regression modeling: PMA6 12.7, 12.8
- Compare the models in part 2 for acute vs chronic illness.
- Are the measures that are important in explaining the outcome similar?
- For covariates that are the same, compare the effect of that covariate on each of the outcomes.

- Northridge Earthquake: (PMA6 12.22-12.29 modified). This problem uses the Northridge earthquake data set. [See here for more info about this event.] We are interested in knowing if homeowners (`V449`) were more likely than renters to report emotional injuries (`W238`) as a result of the Northridge earthquake, controlling for age (`RAGE`), gender (`RSEX`), and ethnicity (`NEWETHN`).
- Download and use my cleaning script as a starter.
- Fit a logistic regression model on emotional injury as the outcome using the variables listed above as predictors. Use `tbl_regression` to create a nice table of results that includes odds ratios and 95% confidence intervals. See if you can figure out how to drop the intercept term from the table.
- Interpret each predictor (except the intercept) in the context of the problem.
- Are the estimated effects of home ownership upon reporting emotional injuries different for men and women, controlling for age and ethnicity? That is, is there a significant interaction effect between gender and home ownership?
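
One way to write down the interaction question, as a sketch only: the symbols below are notation assumed here, not taken from PMA6 or the cleaning script, and the reference levels depend on how `V449`, `RSEX`, and `NEWETHN` are coded in your cleaned data. Here $\text{own}$ is an indicator for home ownership, $\text{sex}$ is an indicator for the non-reference gender, and $\text{ethn}_k$ are indicators for the non-reference ethnicity categories.

$$
\operatorname{logit} P(\text{emotional injury} = 1) = \beta_0 + \beta_1\,\text{own} + \beta_2\,\text{sex} + \beta_3\,(\text{own} \times \text{sex}) + \beta_4\,\text{age} + \sum_k \gamma_k\,\text{ethn}_k
$$

Under this parameterization, the odds ratio for home ownership is $e^{\beta_1}$ in the reference gender and $e^{\beta_1 + \beta_3}$ in the other, so asking whether the effect differs by gender amounts to testing $H_0: \beta_3 = 0$.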

- Playing Hookey: (PMA6 12.18-12.19 modified).

- Perform a binary logistic regression analysis using the Parental HIV data to model the probability of having been absent from school without a reason (variable `HOOKEY`). Find the variables that best predict whether an adolescent had been absent without a reason or not. Use a **hefty dose of common sense** here; not all variables are reasonable to use (e.g. using the number of times a student skips school to predict whether or not they will skip school).
- Use the default value for the `predict()` function to create a vector of predictions for each student.
- Explore the distribution of predictions against a few variables that you identified (via the model) as being highly predictive of skipping school.
- Create a confusion matrix for these predictions and interpret: accuracy, balanced accuracy, sensitivity, specificity, PPV, NPV.
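
For reference, the standard definitions of these measures are sketched below. The notation is assumed here: $TP$, $FP$, $TN$, and $FN$ are the counts of true positives, false positives, true negatives, and false negatives in the 2x2 confusion matrix, with "skipped school" treated as the positive class.

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN} &
\text{Balanced accuracy} &= \frac{\text{Sensitivity} + \text{Specificity}}{2} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} &
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{PPV} &= \frac{TP}{TP + FP} &
\text{NPV} &= \frac{TN}{TN + FN}
\end{aligned}
$$

Sensitivity and specificity condition on the true class, while PPV and NPV condition on the predicted class, so the latter two change with how common skipping school is in the sample.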