HW04 - Dimension reduction via PCA and FA
This assignment uses the depression and parental hiv data sets, and the factoextra package.
Principal Components
1. PMA6 14.1 (modified)
For the depression data set, perform a PCA on the last seven variables DRINK-CHRONILL. Use the covariance matrix, but do not center or scale the data. You should have the codebook for this data set open during this homework.
1a) Select the relevant variables and ensure their data type is numeric.
1b) Generate the PC’s using princomp, then use the loading values to write out the equations for the first two PC’s.
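As a minimal sketch of the mechanics in 1b, the block below uses R's built-in USArrests data as a stand-in (an assumption; the depression data are not loaded here). With its default cor = FALSE, princomp works from the covariance matrix, and each column of the loadings holds the coefficients of one component. The names dat, pc_cov, L, and pc1_eq are illustrative only.

```r
# Stand-in data (assumption): USArrests ships with R.
# Swap in the selected, numeric depression variables.
dat <- USArrests

# cor = FALSE (the default) bases the PCA on the covariance matrix.
pc_cov <- princomp(dat, cor = FALSE)

# Loadings as a plain matrix: rows are variables, columns are components.
L <- unclass(loadings(pc_cov))
print(round(L[, 1:2], 3))

# Write out PC1 as a linear combination of the original variables.
pc1_eq <- paste(sprintf("%+.3f*%s", L[, 1], rownames(L)), collapse = " ")
cat("PC1 =", pc1_eq, "\n")
```

The same pattern applied to column 2 of L gives the equation for the second component.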
1c) Use the output from summary and get_eigenvalue to determine the number of PC’s to retain that contain 80% of the original variance.
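The 80% rule in 1c can be checked by hand from the component standard deviations. This is a sketch on the built-in USArrests data (a stand-in, not the depression data), using base R only; summary(pc_cov) and factoextra's get_eigenvalue(pc_cov) report the same variance proportions.

```r
# Stand-in data (assumption): USArrests instead of the depression variables.
dat <- USArrests
pc_cov <- princomp(dat, cor = FALSE)

# Eigenvalues are the squared component standard deviations.
eig <- pc_cov$sdev^2
prop_var <- eig / sum(eig)
cum_var <- cumsum(prop_var)
print(round(rbind(eigenvalue = eig,
                  proportion = prop_var,
                  cumulative = cum_var), 3))

# Smallest number of components whose cumulative proportion reaches 80%.
n_keep <- which(cum_var >= 0.80)[1]
print(n_keep)
```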
1d) Create TWO biplots using biplot and/or fviz_pca_biplot: (1) of PC’s 1 & 2, and (2) of two other PC’s (e.g., 3 and 4, or 3 and 5). Use these biplots to start to interpret the correlations between the PC’s (dims) and the original variables.
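A short sketch of selecting which components a biplot shows, again on the stand-in USArrests data (an assumption). Base R's biplot method for princomp objects takes a choices argument; the factoextra equivalent is the axes argument of fviz_pca_biplot, for example fviz_pca_biplot(pc_cov, axes = c(3, 4)).

```r
# Stand-in data (assumption): USArrests instead of the depression variables.
dat <- USArrests
pc_cov <- princomp(dat, cor = FALSE)

# Biplot of components 1 and 2 (the default choice).
biplot(pc_cov, choices = c(1, 2))

# Biplot of two other components, here 3 and 4.
biplot(pc_cov, choices = c(3, 4))
```

In each plot the points are observation scores and the arrows are the original variables; an arrow lying close to an axis indicates a variable that loads mainly on that component.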
2. Repeat question #1b-d using the correlation matrix.
Compare the results and comment. (Are the same number of PC’s retained? Are the loadings different?)
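One way to line the two analyses up for comparison is sketched below on the stand-in USArrests data (an assumption). Setting cor = TRUE in princomp switches from the covariance matrix to the correlation matrix; the helper prop is a hypothetical name introduced for this sketch.

```r
# Stand-in data (assumption): USArrests instead of the depression variables.
dat <- USArrests
pc_cov <- princomp(dat, cor = FALSE)  # covariance matrix
pc_cor <- princomp(dat, cor = TRUE)   # correlation matrix

# Proportion of variance per component under each choice.
prop <- function(p) p$sdev^2 / sum(p$sdev^2)
print(round(rbind(covariance = prop(pc_cov),
                  correlation = prop(pc_cor)), 3))

# First-component loadings side by side.
print(round(cbind(covariance  = unclass(loadings(pc_cov))[, 1],
                  correlation = unclass(loadings(pc_cor))[, 1]), 3))

# With cor = TRUE the eigenvalues sum to the number of variables.
print(sum(pc_cor$sdev^2))
```

Variables measured on larger scales tend to dominate the covariance-based loadings, which is usually the first difference to look for.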
3. PMA6 14.11 (modified)
This question uses the Parental HIV data set.
3a) If you were to conduct a PCA on all items of the Parental Bonding scale (pb01-pb25), a priori how many PC’s would you expect to retain for this scale?
3b) Perform this PCA using the correlation matrix. Using all three guides (% variance retained, eigenvalues > 1, and a scree plot), choose a number of PC’s to retain.
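The three guides in 3b can be computed together. This sketch uses the built-in mtcars data as a stand-in for the pb01-pb25 items (an assumption), with base R's screeplot; the names n_80 and n_kaiser are illustrative only.

```r
# Stand-in data (assumption): mtcars instead of the Parental Bonding items.
dat <- mtcars
pc <- princomp(dat, cor = TRUE)
eig <- pc$sdev^2

# Guide 1: cumulative proportion of variance (80% used here as an example cutoff).
cum_var <- cumsum(eig) / sum(eig)
n_80 <- which(cum_var >= 0.80)[1]

# Guide 2: number of eigenvalues greater than 1.
n_kaiser <- sum(eig > 1)

# Guide 3: scree plot, with a reference line at eigenvalue = 1.
screeplot(pc, type = "lines", npcs = length(eig), main = "Scree plot")
abline(h = 1, lty = 2)

print(c(variance_80 = unname(n_80), eigen_gt_1 = n_kaiser))
```

The guides need not agree; when they differ, the choice is a judgment call that should be stated and justified.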
4. Perform a regression analysis of bsi_overall.
Include your retained PC’s from question 3 as predictors. Compare this model to one that only uses the two parental bonding scales as defined in the codebook. Compare the models using appropriate measures of model fit.
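A sketch of the comparison, using mtcars as a stand-in (an assumption): mpg plays the role of bsi_overall, and two averaged scores named size and gearing are hypothetical stand-ins for the two codebook scales. The two models are not nested, so they are compared here with adjusted R², AIC, and BIC.

```r
# Stand-in outcome and predictors (assumption): mtcars instead of Parental HIV.
y <- mtcars$mpg
X <- mtcars[, c("disp", "hp", "drat", "wt", "qsec")]

# Model A: scores on the retained components (two kept here for illustration).
pc <- princomp(X, cor = TRUE)
scores <- as.data.frame(pc$scores[, 1:2])
fit_pc <- lm(y ~ Comp.1 + Comp.2, data = scores)

# Model B: two summary scales built by averaging standardized items
# (hypothetical groupings, standing in for the codebook definitions).
Z <- scale(X)
scales <- data.frame(size    = rowMeans(Z[, c("disp", "hp", "wt")]),
                     gearing = rowMeans(Z[, c("drat", "qsec")]))
fit_sc <- lm(y ~ size + gearing, data = scales)

# Fit measures for the two non-nested models.
fit_tab <- data.frame(
  model  = c("PCs", "scales"),
  adj_r2 = c(summary(fit_pc)$adj.r.squared, summary(fit_sc)$adj.r.squared),
  AIC    = c(AIC(fit_pc), AIC(fit_sc)),
  BIC    = c(BIC(fit_pc), BIC(fit_sc))
)
print(fit_tab)
```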
Factor Analysis
1. PMA6 15.1
The CESD scale items (C1-C20) from the depression data set were used to obtain the factor loadings listed in Table 15.7. The initial factor solution was obtained from the principal components method, and a varimax rotation was performed. Analyze this same data set by using an oblique rotation such as the direct quartimin procedure. Compare the results.
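To illustrate the orthogonal versus oblique contrast, the sketch below makes two substitutions, both assumptions: it uses base R's factanal (maximum likelihood extraction, not the principal components method) and promax as the oblique rotation (not direct quartimin, which is available in add-on packages such as GPArotation). The data are the built-in ability.cov covariance matrix, not the CESD items.

```r
# Stand-in data (assumption): ability.cov ships with R.
fa_orth <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")
fa_obl  <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

# Loadings side by side: varimax (first two columns), promax (last two).
load_tab <- cbind(unclass(loadings(fa_orth)), unclass(loadings(fa_obl)))
print(round(load_tab, 2))

# Factor correlations implied by the oblique rotation, derived from its
# rotation matrix. For an orthogonal rotation this would be the identity.
tmat <- solve(fa_obl$rotmat)
factor_cor <- tmat %*% t(tmat)
print(round(factor_cor, 2))
```

A sizeable off-diagonal entry in factor_cor suggests the factors are correlated, in which case the oblique loadings may be the easier ones to interpret.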
2. PMA6 15.6
Separate the depression data set into two subgroups, men and women. Using four factors, repeat the factor analysis in Table 15.7. Compare the results of your two factor analyses to each other, and to the results in Table 15.7.
3. PMA6 15.8
Perform a factor analysis on all of the items of the Parental Bonding scale for the Parental HIV data set. Retain two factors. Rotate the factors using an orthogonal rotation. Do the items with the highest loadings for each of the factors correspond to the items of the overprotection and care scale? Interpret the findings.
4. PMA6 15.9
Repeat problem #3 (15.8) using an oblique rotation. Do the substantive conclusions change?