Ridge regression is an alternative to least squares regression when predictors are highly correlated. By effectively eliminating the collinearity, it can produce more precise, and therefore more interpretable, parameter estimates.
The basic requirement for ordinary least squares (OLS) regression is that the inverse of the matrix \(X^TX\) exists. \(X^TX\) is typically scaled so that it represents the correlation matrix of all predictors. However, multicollinearity between predictors makes \(X^TX\) nearly singular, so \((X^TX)^{-1}\) cannot be computed reliably. In this case, we need another regression method that makes \((X^TX)^{-1}\) calculable. Ridge regression modifies \(X^TX\) so that its determinant is no longer (near) 0, which ensures that \((X^TX)^{-1}\) can be computed. Modifying the matrix in this way effectively eliminates collinearity, leading to more precise, and therefore more interpretable, parameter estimates.
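In matrix form, ridge regression simply adds a constant \(k\) to the diagonal of \(X^TX\) before inverting. As a brief sketch of the estimator (with \(I\) the identity matrix and \(k \ge 0\) the ridge parameter used throughout this page):
\[ \hat \beta_{ridge}(k) = (X^TX + kI)^{-1}X^Ty, \qquad \hat \beta_{ridge}(0) = \hat \beta_{OLS}. \]
Even when \(X^TX\) is singular or nearly singular, \(X^TX + kI\) is invertible for any \(k > 0\), which is what makes the estimates calculable.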
But in statistics there is always a trade-off between variance and bias, so there is a cost to this decrease in variance: an increase in bias. However, the bias introduced by ridge regression is almost always toward the null. Thus, ridge regression is considered a “shrinkage method”, since it typically shrinks the \(\hat \beta\) toward 0.
For our data analysis below, we will use the data set seatpos, which is available in the R package faraway. The variables are age in years (Age), weight in lbs (Weight), height with shoes in cm (HtShoes), barefoot height in cm (Ht), seated height in cm (Seated), lower arm length in cm (Arm), thigh length in cm (Thigh), lower leg length in cm (Leg), and the horizontal distance of the midpoint of the hips from a fixed location in the car in mm (hipcenter). We are going to use all of the other variables to predict hipcenter.
libname mylib 'F:/mylib';
proc import datafile="F:/grouppj/seatpos.csv" out=mylib.seatpos dbms=csv replace;
getnames=yes;
run;
proc means data = mylib.seatpos;
var hipcenter Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;
The MEANS Procedure
Variable N Mean Std Dev Minimum Maximum
-------------------------------------------------------------------------------
hipcenter 38 164.8848684 59.6473848 30.9500000 279.1500000
Age 38 35.2631579 15.3687718 19.0000000 72.0000000
Weight 38 155.6315789 35.7811826 100.0000000 293.0000000
HtShoes 38 171.3894737 11.1482586 152.8000000 201.2000000
Ht 38 169.0842105 11.1733159 150.2000000 198.4000000
Seated 38 88.9526316 4.9317908 79.4000000 101.6000000
Arm 38 32.2157895 3.3714642 26.0000000 39.6000000
Thigh 38 38.6552632 3.8749854 31.0000000 45.5000000
Leg 38 36.2631579 3.4036881 30.2000000 43.1000000
-------------------------------------------------------------------------------
Before fitting a regression model, we typically standardize the variables. Standardization is necessary for ridge regression, because the ridge penalty is not scale-invariant: leaving the predictors on different scales would change the parameter estimates. Standardizing does not change the multicollinearity between the predictors, so it can be done either before or after checking for multicollinearity.
To standardize variables in SAS, you can use proc standard. The mean=0 and std=1 options tell SAS what you want the mean and standard deviation of the variables named on the var statement to be; of course, a mean of 0 and a standard deviation of 1 indicate that you want to standardize the variables. The out=mylib.zseatpos option states that the output data set with the standardized variables will be called zseatpos and stored in the mylib library.
libname mylib 'F:/mylib';
proc standard data=mylib.seatpos mean=0 std=1 out=mylib.zseatpos;
var hipcenter Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;
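As a quick sanity check (a small sketch, not part of the original analysis), you can rerun proc means on the standardized data set; every variable should now have a mean of 0 and a standard deviation of 1:
proc means data=mylib.zseatpos mean std;
var hipcenter Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;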
As mentioned above, ridge regression is meant to solve the problems in regression caused by multicollinearity, so we have to check for multicollinearity first. On this page, we check for high variance inflation factors (VIFs). The rule of thumb is that a \(VIF>10\) indicates multicollinearity. In SAS, VIFs can be obtained by using the model option /vif.
libname mylib 'F:/mylib';
proc reg data=mylib.zseatpos;
model hipcenter = Age Weight HtShoes Ht Seated Arm Thigh Leg/vif;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: hipcenter
Number of Observations Read 38
Number of Observations Used 38
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 8 25.40248 3.17531 7.94 <.0001
Error 29 11.59752 0.39991
Corrected Total 37 37.00000
Root MSE 0.63239 R-Square 0.6866
Dependent Mean -3.988E-16 Adj R-Sq 0.6001
Coeff Var -1.58571E17
Parameter Estimates
Parameter Standard Variance
Variable DF Estimate Error t Value Pr > |t| Inflation
Intercept 1 1.73091E-16 0.10259 0.00 1.0000 0
Age 1 -0.19987 0.14695 -1.36 0.1843 1.99793
Weight 1 -0.01578 0.19854 -0.08 0.9372 3.64703
HtShoes 1 0.50322 1.82287 0.28 0.7845 307.42938
Ht 1 -0.11265 1.89756 -0.06 0.9531 333.13783
Seated 1 -0.04413 0.31104 -0.14 0.8882 8.95105
Arm 1 0.07507 0.22045 0.34 0.7359 4.49637
Thigh 1 0.07426 0.17281 0.43 0.6706 2.76289
Leg 1 0.36743 0.26899 1.37 0.1824 6.69429
We notice that the VIFs for the predictors Ht and HtShoes are greater than 300, which indicates that these two terms are highly correlated. So, implementing ridge regression is suitable here.
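This is not surprising, since height with shoes and barefoot height measure almost the same quantity. If you want to see that correlation directly, a quick check (a sketch, not part of the original analysis) is proc corr:
proc corr data=mylib.zseatpos;
var Ht HtShoes;
run;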
Now let’s run our first ridge regression. The procedure for running ridge regression is proc reg: the ridge= option requests ridge estimates for a list of values of \(k\), and the statement plot / ridgeplot plots \(k\) versus \(\hat \beta\). We will choose \(k\) based on the plots.
libname mylib 'F:/mylib';
proc reg data=mylib.zseatpos outvif
outest=mylib.zseatpos_vif ridge=0 to 30 by 1;
model hipcenter=Age Weight HtShoes Ht Seated Arm Thigh Leg;
/* plot / ridgeplot; */
run;
The REG Procedure
Model: MODEL1
Dependent Variable: hipcenter
Number of Observations Read 38
Number of Observations Used 38
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 8 25.40248 3.17531 7.94 <.0001
Error 29 11.59752 0.39991
Corrected Total 37 37.00000
Root MSE 0.63239 R-Square 0.6866
Dependent Mean -3.988E-16 Adj R-Sq 0.6001
Coeff Var -1.58571E17
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 1.73091E-16 0.10259 0.00 1.0000
Age 1 -0.19987 0.14695 -1.36 0.1843
Weight 1 -0.01578 0.19854 -0.08 0.9372
HtShoes 1 0.50322 1.82287 0.28 0.7845
Ht 1 -0.11265 1.89756 -0.06 0.9531
Seated 1 -0.04413 0.31104 -0.14 0.8882
Arm 1 0.07507 0.22045 0.34 0.7359
Thigh 1 0.07426 0.17281 0.43 0.6706
Leg 1 0.36743 0.26899 1.37 0.1824
Here we get the parameters estimated by ordinary least squares (OLS) regression. The standard errors of the \(\hat \beta\) are large, which is caused by the collinearity between predictors. To get more precise estimates of \(\beta\), ridge regression is necessary here.
SAS ridge trace plots have two panels. The top panel shows the VIF for each predictor with increasing values of the ridge parameter \(k\). Each VIF should decrease toward 1 with increasing values of \(k\), as the multicollinearity is resolved. We can see here that the VIFs go down greatly as \(k\) increases toward 1.
The bottom panel shows the actual values of the ridge coefficients with increasing values of \(k\) (SAS automatically standardizes these coefficients). The \(\hat \beta\) stabilize as \(k\) increases, and all of them shrink toward the null with increasing values of \(k\); some even switch signs.
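The numbers behind these plots are stored in the outest= data set requested above: with ridge= specified, observations with _TYPE_='RIDGE' contain the ridge coefficients, and because outvif was also specified, observations with _TYPE_='RIDGEVIF' contain the corresponding VIFs; the variable _RIDGE_ holds the value of \(k\). A minimal sketch for inspecting them:
proc print data=mylib.zseatpos_vif;
where _TYPE_ in ('RIDGE','RIDGEVIF');
var _TYPE_ _RIDGE_ Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;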
The last thing we have to do is pick a proper \(k\). There are two popular ways to choose \(k\); the first was proposed by Hoerl and Kennard (1970), who suggested \(\hat k = \frac{\hat{\sigma}^{2}}{\max_i \hat\alpha^{2}_{i}}\). They proved that there is always a value of \(k>0\) such that \(MSE(\hat \beta(k))<MSE(\hat \beta_{OLS})\).
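For reference, a brief note on the notation (following Hoerl and Kennard's formulation, not anything printed above): \(\hat \sigma^{2}\) is the residual mean square from the OLS fit, and the \(\hat \alpha_{i}\) are the OLS coefficients in the canonical form of the model, in which the predictors are rotated so that the design matrix has orthogonal columns:
\[ y = Z\alpha + \varepsilon, \qquad Z = XP, \qquad \alpha = P^{T}\beta, \]
where \(P\) is the orthogonal matrix of eigenvectors of \(X^TX\). The denominator of \(\hat k\) is the largest of the squared canonical coefficients \(\hat \alpha_{i}^{2}\).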
However, determining this ideal value of \(k\) is impossible in practice, because it ultimately depends on the unknown parameters. Thus, we use a graphical means of selecting \(k\) here: the estimated coefficients and VIFs are plotted against a range of specified values of \(k\).
We choose a \(k\) at which the coefficients have stabilized at reasonable values, and we make sure that any coefficients with improper signs at \(k=0\) have switched to the proper sign. According to the ridge traces above, we could choose \(k=20\).
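To extract the fitted coefficients at the chosen value, one option (a sketch; the data set name mylib.ridge_k20 is introduced here only for illustration) is to rerun proc reg with ridge=20 alone and print the ridge estimates from the outest= data set:
libname mylib 'F:/mylib';
proc reg data=mylib.zseatpos outest=mylib.ridge_k20 ridge=20;
model hipcenter=Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;
proc print data=mylib.ridge_k20;
where _TYPE_='RIDGE';
run;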