Ridge regression is an alternative to least squares regression when predictors are highly correlated. It can effectively eliminate collinearity, leading to more precise, and therefore more interpretable, parameter estimates.

Introduction

The basic requirement for ordinary least squares (OLS) regression is that the inverse of the matrix \(X^TX\) exists. \(X^TX\) is typically scaled so that it represents the correlation matrix of all predictors. However, multicollinearity among the predictors makes \(X^TX\) singular (or nearly so), so \((X^TX)^{-1}\) cannot be computed reliably. In this case we need a method that makes \((X^TX)^{-1}\) calculable. Ridge regression modifies \(X^TX\) so that its determinant is no longer 0, which ensures that \((X^TX)^{-1}\) exists. Modifying the matrix in this way effectively eliminates the collinearity, leading to more precise, and therefore more interpretable, parameter estimates.
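In matrix form, the modification simply adds a ridge constant \(k \ge 0\) to the diagonal of \(X^TX\), so the OLS and ridge estimators are

\[
\hat \beta^{OLS} = (X^TX)^{-1}X^Ty, \qquad \hat \beta^{ridge}(k) = (X^TX + kI)^{-1}X^Ty.
\]

At \(k=0\) the two coincide; as \(k\) grows, \(X^TX + kI\) moves farther from singularity and the estimates shrink toward 0.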

But in statistics there is always a trade-off between variance and bias, so this decrease in variance comes at a cost: an increase in bias. However, the bias introduced by ridge regression is almost always toward the null. Thus, ridge regression is considered a “shrinkage” method, since it typically shrinks the \(\hat \beta\) toward 0.
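This trade-off can be made precise. For any estimator \(\hat \beta_j\) of a coefficient \(\beta_j\),

\[
MSE(\hat \beta_j) = Var(\hat \beta_j) + \left[ Bias(\hat \beta_j) \right]^2,
\]

so accepting a small bias is worthwhile whenever it buys a large reduction in variance, which is exactly the situation under severe multicollinearity.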

Description of the example data

For our data analysis below, we will use the data set seatpos, which is available in the R package faraway. The variables are age in years (Age), weight in lbs (Weight), height in shoes in cm (HtShoes), barefoot height in cm (Ht), seated height in cm (Seated), lower arm length in cm (Arm), thigh length in cm (Thigh), lower leg length in cm (Leg), and the horizontal distance of the midpoint of the hips from a fixed location in the car in mm (hipcenter). We will use all of the other variables to predict hipcenter.

libname mylib 'F:/mylib';
/* Import the seatpos data from the CSV file into the permanent library mylib */
proc import datafile="F:/grouppj/seatpos.csv" out=mylib.seatpos dbms=csv replace;
    getnames=yes;
run;
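As a quick optional check that the import worked, you can print the first few rows:

proc print data=mylib.seatpos(obs=5);
run;

We then obtain descriptive statistics for all of the variables: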

proc means data = mylib.seatpos;
    var hipcenter Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;
                                                  The MEANS Procedure

                    Variable      N            Mean         Std Dev         Minimum         Maximum
                    -------------------------------------------------------------------------------
                    hipcenter    38     164.8848684      59.6473848      30.9500000     279.1500000
                    Age          38      35.2631579      15.3687718      19.0000000      72.0000000
                    Weight       38     155.6315789      35.7811826     100.0000000     293.0000000
                    HtShoes      38     171.3894737      11.1482586     152.8000000     201.2000000
                    Ht           38     169.0842105      11.1733159     150.2000000     198.4000000
                    Seated       38      88.9526316       4.9317908      79.4000000     101.6000000
                    Arm          38      32.2157895       3.3714642      26.0000000      39.6000000
                    Thigh        38      38.6552632       3.8749854      31.0000000      45.5000000
                    Leg          38      36.2631579       3.4036881      30.2000000      43.1000000
                    -------------------------------------------------------------------------------

Standardize variables

Before we implement a regression model, we typically standardize the variables. Standardization is essential for ridge regression, because the ridge penalty is scale-dependent: rescaling a predictor would change its estimated coefficient. Standardizing does not change the multicollinearity among the predictors, so it can be done either before or after checking for multicollinearity.
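Standardizing replaces each value with its z-score,

\[
z_{ij} = \frac{x_{ij} - \bar x_j}{s_j},
\]

where \(\bar x_j\) and \(s_j\) are the sample mean and standard deviation of the \(j\)th variable.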

To standardize variables in SAS, you can use proc standard. The mean=0 and std=1 options tell SAS the mean and standard deviation you want for the variables named on the var statement; a mean of 0 and a standard deviation of 1, of course, standardize the variables. The out=mylib.zseatpos option names the output data set that will contain the standardized variables.

libname mylib 'F:/mylib';

proc standard data=mylib.seatpos mean=0 std=1 out=mylib.zseatpos;
  var hipcenter Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;
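To verify the standardization, you can rerun proc means on the new data set (an optional check); every variable should now have mean 0 and standard deviation 1:

proc means data=mylib.zseatpos mean std;
    var hipcenter Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;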

Diagnosing multicollinearity

As mentioned above, ridge regression is meant to solve the problems that multicollinearity causes in regression, so we first have to check for multicollinearity. On this page, we check for high variance inflation factors (VIFs). The rule of thumb is that \(VIF>10\) indicates multicollinearity. In SAS, VIFs can be obtained by adding the /vif option to the model statement.
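For reference, the VIF of the \(j\)th predictor is

\[
VIF_j = \frac{1}{1 - R_j^2},
\]

where \(R_j^2\) is the \(R^2\) from regressing the \(j\)th predictor on all of the others; it equals 1 when that predictor is uncorrelated with the rest and grows without bound as the correlation approaches 1.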

libname mylib 'F:/mylib';

/* OLS fit; the /vif option adds variance inflation factors to the output */
proc reg data=mylib.zseatpos;
    model hipcenter = Age Weight HtShoes Ht Seated Arm Thigh Leg/vif;
run;
                                                   The REG Procedure
                                                     Model: MODEL1
                                             Dependent Variable: hipcenter 

                                        Number of Observations Read          38
                                        Number of Observations Used          38

                                                  Analysis of Variance
 
                                                         Sum of           Mean
                     Source                   DF        Squares         Square    F Value    Pr > F

                     Model                     8       25.40248        3.17531       7.94    <.0001
                     Error                    29       11.59752        0.39991                     
                     Corrected Total          37       37.00000                                    

                                  Root MSE              0.63239    R-Square     0.6866
                                  Dependent Mean     -3.988E-16    Adj R-Sq     0.6001
                                  Coeff Var         -1.58571E17                       

                                                  Parameter Estimates
 
                                       Parameter       Standard                              Variance
                  Variable     DF       Estimate          Error    t Value    Pr > |t|      Inflation

                  Intercept     1    1.73091E-16        0.10259       0.00      1.0000              0
                  Age           1       -0.19987        0.14695      -1.36      0.1843        1.99793
                  Weight        1       -0.01578        0.19854      -0.08      0.9372        3.64703
                  HtShoes       1        0.50322        1.82287       0.28      0.7845      307.42938
                  Ht            1       -0.11265        1.89756      -0.06      0.9531      333.13783
                  Seated        1       -0.04413        0.31104      -0.14      0.8882        8.95105
                  Arm           1        0.07507        0.22045       0.34      0.7359        4.49637
                  Thigh         1        0.07426        0.17281       0.43      0.6706        2.76289
                  Leg           1        0.36743        0.26899       1.37      0.1824        6.69429

We notice that the VIFs for the predictors Ht and HtShoes are greater than 300, which indicates that these two variables are highly correlated. Implementing ridge regression is therefore suitable here.
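As a quick supplementary check, we can look at the correlation between the two height measurements directly; since barefoot height and height in shoes measure nearly the same quantity, their correlation should be very close to 1:

proc corr data=mylib.seatpos;
    var Ht HtShoes;
run;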

Ridge regression

Now let’s run our first ridge regression. Ridge regression is run with proc reg: the ridge= option requests ridge estimates over a range of \(k\) values, and the statement plot / ridgeplot plots the \(\hat \beta\) against \(k\). We will choose \(k\) based on these plots.

libname mylib 'F:/mylib';

/* Fit ridge regressions for k = 0, 1, ..., 30; outest= saves the
   coefficients at each k and outvif adds the corresponding VIFs */
proc reg data=mylib.zseatpos outvif
    outest=mylib.zseatpos_vif ridge=0 to 30 by 1;
    model hipcenter=Age Weight HtShoes Ht Seated Arm Thigh Leg;
    plot / ridgeplot;
run;
                                                   The REG Procedure
                                                     Model: MODEL1
                                             Dependent Variable: hipcenter 

                                        Number of Observations Read          38
                                        Number of Observations Used          38

                                                  Analysis of Variance
 
                                                         Sum of           Mean
                     Source                   DF        Squares         Square    F Value    Pr > F

                     Model                     8       25.40248        3.17531       7.94    <.0001
                     Error                    29       11.59752        0.39991                     
                     Corrected Total          37       37.00000                                    

                                  Root MSE              0.63239    R-Square     0.6866
                                  Dependent Mean     -3.988E-16    Adj R-Sq     0.6001
                                  Coeff Var         -1.58571E17                       

                                                  Parameter Estimates
 
                                               Parameter       Standard
                          Variable     DF       Estimate          Error    t Value    Pr > |t|

                          Intercept     1    1.73091E-16        0.10259       0.00      1.0000
                          Age           1       -0.19987        0.14695      -1.36      0.1843
                          Weight        1       -0.01578        0.19854      -0.08      0.9372
                          HtShoes       1        0.50322        1.82287       0.28      0.7845
                          Ht            1       -0.11265        1.89756      -0.06      0.9531
                          Seated        1       -0.04413        0.31104      -0.14      0.8882
                          Arm           1        0.07507        0.22045       0.34      0.7359
                          Thigh         1        0.07426        0.17281       0.43      0.6706
                          Leg           1        0.36743        0.26899       1.37      0.1824

The output above shows the parameters estimated by ordinary least squares regression (OLS). The standard errors of the \(\hat \beta\), especially for HtShoes and Ht, are large, which is caused by the collinearity between predictors. To get more precise estimates of \(\beta\), ridge regression is needed here.

SAS ridge trace plots have two panels. The top panel shows the VIF for each predictor at increasing values of the ridge parameter \(k\). Each VIF should decrease toward 1 as \(k\) increases and the multicollinearity is resolved. Here we can see the VIFs drop sharply as \(k\) increases toward 1.

The bottom panel shows the values of the ridge coefficients at increasing values of \(k\) (SAS automatically standardizes these coefficients). The \(\hat \beta\) stabilize as \(k\) increases; all of them shrink toward the null, and some even switch signs.
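The numbers behind the trace plots are also saved in the outest= data set: with the ridge= and outvif options, proc reg writes, for each value of \(k\) (stored in _RIDGE_), one observation with _TYPE_='RIDGEVIF' holding the VIFs and one with _TYPE_='RIDGE' holding the coefficients. One way to inspect them:

proc print data=mylib.zseatpos_vif;
    where _TYPE_ in ('RIDGE', 'RIDGEVIF');
    var _RIDGE_ _TYPE_ Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;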

Choosing k

The last problem to solve is picking a proper \(k\). There are two popular ways to choose \(k\). The first was proposed by Hoerl and Kennard (1970): \(\hat k = \frac{\hat\sigma^{2}}{\max_i \hat\alpha^{2}_{i}}\), where \(\hat\sigma^2\) is the residual variance from OLS and the \(\hat\alpha_i\) are the OLS coefficients in the canonical (orthogonalized) form of the model. They proved that there is always a value \(k>0\) such that \(MSE(\hat \beta(\hat k)) < MSE(\hat \beta)\).

However, determining the ideal value of \(k\) is impossible in practice, because it ultimately depends on the unknown parameters. Thus, we use a graphical means of selecting \(k\) here: the estimated coefficients and VIFs are plotted against a range of specified values of \(k\).

We choose a \(k\) at which the coefficients have stabilized at reasonable values, and ensure that any coefficients with improper signs at \(k=0\) have switched to the proper sign. Based on the ridge traces above, we choose \(k=20\).
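Having settled on \(k=20\), we can refit at just that value and save the final ridge coefficients (a minimal sketch; the output data set name seatpos_ridge is arbitrary):

proc reg data=mylib.zseatpos outest=mylib.seatpos_ridge ridge=20;
    model hipcenter=Age Weight HtShoes Ht Seated Arm Thigh Leg;
run;

proc print data=mylib.seatpos_ridge;
    where _TYPE_='RIDGE';
run;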

Things to consider

  • Choosing \(k\) using ridge trace plots is straightforward. However, these criteria are very subjective, so it is best to use another method in addition to the ridge trace plot. In this case, the \(k\) we chose is too large: the in-sample MSE of the ridge regression model is even larger than that of OLS.

References

  • Faraway, Julian J. Linear Models with R. CRC Press, 2014.
  • Hoerl, Arthur E., and Robert W. Kennard. “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics, vol. 12, no. 1, 1970, pp. 55–67.
  • “Ridge Regression.” Columbia University Mailman School of Public Health, www.mailman.columbia.edu/research/population-health-methods/ridge-regression.