Ridge regression is an alternative to least squares regression when predictors are highly correlated: it can effectively mitigate the effects of collinearity, leading to more precise, and therefore more interpretable, parameter estimates.

Introduction

Ridge regression is one of the so-called “shrinkage methods”, and it is usually applied to a regression model when there is instability resulting from collinearity among the predictors.

In ordinary least squares (OLS) regression, the estimate of the coefficient vector \(\beta\) is given by: \[\hat\beta=(X^TX)^{-1}X^Ty.\] When the predictors are collinear or nearly collinear, the matrix \(X^TX\) becomes singular (rarely the case) or nearly singular, and its inverse then responds sensitively to small errors in the data, which results in unstable estimates and predictions. To solve this problem, we may consider modifying \(X^TX\) to make it invertible (so that the inverse is well defined and stable to compute), which effectively mitigates the collinearity problem and leads to more precise, and therefore more interpretable, parameter estimates.
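To see this numerically, here is a minimal R sketch (with hypothetical, nearly collinear predictors, not the data analyzed later) that computes the OLS estimate directly from the formula and inspects the conditioning of \(X^TX\):

```r
# Hypothetical example: two nearly collinear predictors
set.seed(1)
n  <- 30
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 1e-6)    # almost an exact copy of x1
X  <- cbind(1, x1, x2)            # design matrix with an intercept column
y  <- 2 + x1 + x2 + rnorm(n)

kappa(t(X) %*% X)                 # enormous condition number: X^T X is nearly singular
solve(t(X) %*% X) %*% t(X) %*% y  # OLS estimate; numerically unstable here
```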

This is achieved by adding a penalty term to the loss function: \[(y-X\beta)^T(y-X\beta)+\lambda\sum_j\beta_j^2.\] The \(\beta\) that minimizes this new loss function is the ridge regression estimate of \(\beta\): \[\hat\beta=(X^TX+\lambda I)^{-1}X^Ty.\] It is clearly biased, since \(E(\hat\beta)=(X^TX+\lambda I)^{-1}(X^TX)\beta\) (the coefficients are shrunk towards \(0\)), but note that \((X^TX)^{-1}\) is replaced by \((X^TX+\lambda I)^{-1}\): because \(X^TX\) is positive semi-definite, \(X^TX+\lambda I\) is positive definite for any \(\lambda>0\), so the inverse always exists.
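Continuing the sketch above, computing the ridge estimate only requires adding \(\lambda I\) before inverting. The value \(\lambda = 1\) below is an arbitrary choice; note also that in practice the intercept is usually left unpenalized and the predictors are standardized first, which this simplified sketch does not do.

```r
# Ridge estimate on the same hypothetical data; lambda = 1 is arbitrary
lambda <- 1
p <- ncol(X)
solve(t(X) %*% X + lambda * diag(p)) %*% t(X) %*% y  # well-conditioned inverse
```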

In fact, ridge regression makes a trade-off between bias and variance. By accepting a relatively small bias, you can expect a large reduction in variance, and thus in the mean squared error: \[\mathrm{MSE}=E(\hat\beta-\beta)^2=(E\hat\beta-\beta)^2+E(\hat\beta-E\hat\beta)^2=\mathrm{bias}^2+\mathrm{variance}.\]
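A small simulation sketch (hypothetical data, arbitrary \(\lambda\)) illustrates the trade-off: ridge is biased, but its coefficient estimates vary far less than OLS when the predictors are nearly collinear, so its overall MSE is much smaller.

```r
# Simulate many data sets with nearly collinear predictors and compare
# the per-coefficient MSE of OLS and ridge (lambda = 5 is arbitrary)
set.seed(2)
n <- 50; beta <- c(1, 1); lambda <- 5
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # nearly collinear with x1
X  <- cbind(x1, x2)

est <- replicate(1000, {
  y <- X %*% beta + rnorm(n)
  ols   <- solve(t(X) %*% X) %*% t(X) %*% y
  ridge <- solve(t(X) %*% X + lambda * diag(2)) %*% t(X) %*% y
  c(ols, ridge)
})
# rows 1-2: OLS estimates of beta1, beta2; rows 3-4: ridge estimates
rowMeans((est - beta)^2)   # the ridge rows show a much smaller MSE
```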

Description of the example data

For our data analysis below, we will use the data set seatpos, which is available in the R package faraway. The variables are age in years (Age), weight in lbs (Weight), height in shoes in cm (HtShoes), bare-foot height in cm (Ht), seated height in cm (Seated), lower arm length in cm (Arm), thigh length in cm (Thigh), lower leg length in cm (Leg), and the horizontal distance of the midpoint of the hips from a fixed location in the car in mm (hipcenter). We will use all of the other variables to predict hipcenter.
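The data can be loaded directly from the faraway package:

```r
# install.packages("faraway")  # if not already installed
library(faraway)
data(seatpos)
head(seatpos)   # inspect the first few rows
str(seatpos)    # inspect the variables
```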

We use the ggplot2 package to visualize the correlations between the predictors.
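Here is a sketch of one way to draw such a heatmap, assuming the reshape2 package is available to melt the correlation matrix into long format:

```r
library(ggplot2)
library(reshape2)   # assumed helper for reshaping the matrix

# Correlation matrix of all predictors (everything except hipcenter)
cor_mat  <- cor(seatpos[, names(seatpos) != "hipcenter"])
cor_long <- melt(cor_mat)   # columns: Var1, Var2, value

ggplot(cor_long, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Correlation")
```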

According to the heatmap, several predictors are strongly correlated with one another, so there is a clear collinearity problem and ridge regression is appropriate here.
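One standard way to fit the model (not necessarily the implementation used in the original analysis) is lm.ridge from the MASS package, evaluating \(\lambda\) over a grid; the grid below is an arbitrary choice:

```r
library(MASS)

# Fit ridge regression for a grid of lambda values
ridge_fit <- lm.ridge(hipcenter ~ ., data = seatpos,
                      lambda = seq(0, 50, by = 0.5))

plot(ridge_fit)     # coefficient paths as lambda varies
select(ridge_fit)   # lambda suggested by the HKB, L-W, and GCV criteria
```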