Fixed Effects Models

Introduction

General Description

In statistics, a fixed effects model is a model whose parameters are fixed (non-random) quantities.

In general, data can be divided into groups according to some observed factors. The group means can be modeled as constant or as varying across groups. In a fixed effects model, as its name implies, each group mean is a group-specific fixed quantity.

Furthermore, the fixed effects approach allows the group-specific effects to be correlated with the independent variables.

Thus, if the unobserved heterogeneity is constant over time, a fixed effects model can control for it. Such heterogeneity can be removed from the data by differencing: for instance, taking a first difference removes any time-invariant component of the model.
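The differencing argument can be illustrated with a small simulation. This is a minimal sketch on simulated data (not the Cigar dataset); the variable names, the true slope of 2 and the sample sizes are all assumptions chosen for illustration. The group effect is deliberately correlated with the regressor, so pooled OLS is biased, while first differencing within each group removes the time-invariant effect.

```r
# Simulated panel: 50 groups observed over 10 periods (illustrative only)
set.seed(1)
n_groups  <- 50
n_periods <- 10
id <- rep(seq_len(n_groups), each = n_periods)

# Time-invariant group effect, deliberately correlated with the regressor
alpha <- rnorm(n_groups, sd = 3)
x <- alpha[id] + rnorm(n_groups * n_periods)
y <- 2 * x + alpha[id] + rnorm(n_groups * n_periods)  # true slope is 2

# Pooled OLS ignores the group effect and picks up its correlation with x
pooled_slope <- unname(coef(lm(y ~ x))["x"])

# First differences within each group cancel the time-invariant effect
dx <- unlist(tapply(x, id, diff))
dy <- unlist(tapply(y, id, diff))
fd_slope <- unname(coef(lm(dy ~ dx - 1))["dx"])

print(c(pooled = pooled_slope, first_difference = fd_slope))
```

Under these assumptions the first-difference slope should land close to 2, while the pooled slope is pulled upward by the omitted group effect.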

Panel Data

In this tutorial, we will focus on fixed effects models with panel data.

Panel data (also known as longitudinal or cross-sectional time-series data) is a dataset in which the behavior of entities is observed across time. The entities could be states, companies, individuals, countries, etc.

In panel data, fixed effects stand for the subject-specific means. In panel data analysis, the fixed effects estimator refers to an estimator for the coefficients in a regression model that includes those fixed effects, in other words, one time-invariant intercept for each subject.

Classical Representation

The linear unobserved effects model for \(N\) individuals and \(T\) time periods is:

\[y_{it}=X_{it}\beta+\alpha_i+\mu_{it}, \quad \text{for } t=1,\dots,T \text{ and } i=1,\dots,N\]

Where:

\(y_{it}\) is the dependent variable observed for individual \(i\) at time \(t\).

\(X_{it}\) is the time-variant \(1\times k\) regressor vector, where \(k\) is the number of independent variables.

\(\beta\) is the \(k\times 1\) vector of parameters.

\(\alpha _{i}\) is the unobserved time-invariant individual effect.

\(\mu_{it}\) is the error term.
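To make the role of \(\alpha_i\) concrete, the following standard derivation sketches how it drops out of the model. The bar notation for individual-level time averages is introduced here and is not part of the original notation. Averaging the model over time for individual \(i\) gives:

\[\bar{y}_{i}=\bar{X}_{i}\beta+\alpha_i+\bar{\mu}_{i}, \quad \text{where } \bar{y}_{i}=\frac{1}{T}\sum_{t=1}^{T}y_{it}\]

Subtracting this from the original equation (the within transformation) removes \(\alpha_i\):

\[y_{it}-\bar{y}_{i}=(X_{it}-\bar{X}_{i})\beta+(\mu_{it}-\bar{\mu}_{i})\]

Taking a first difference has the same effect:

\[y_{it}-y_{i,t-1}=(X_{it}-X_{i,t-1})\beta+(\mu_{it}-\mu_{i,t-1}), \quad t=2,\dots,T\]

In both cases \(\beta\) can be estimated by least squares on the transformed data without ever estimating \(\alpha_i\).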

Overview

In this tutorial, we will use R, SAS and STATA to fit fixed effects models and compare them with ordinary linear regression models.

The packages we use in R are base R, lfe and plm. The procedure we use in SAS is glm, and both the class and absorb statements are shown. In STATA, we use the commands areg, xtreg, and reghdfe to run the regressions.

The dataset Cigar is a built-in dataset in the plm package in R. It is clean enough for us to analyze directly.

Example Dataset: Cigar

The dataset Cigar is a panel of cigarette consumption in 46 states from 1963 to 1992.

The total number of observations is 1380.

The panel data Cigar looks like this (first 10 observations):

state	year	price	pop	pop16	cpi	ndi	sales	pimin
1	63	28.6	3383	2236.5	30.6	1558.305	93.9	26.1
1	64	29.8	3431	2276.7	31.0	1684.073	95.4	27.5
1	65	29.8	3486	2327.5	31.5	1809.842	98.5	28.9
1	66	31.5	3524	2369.7	32.4	1915.160	96.4	29.5
1	67	31.6	3533	2393.7	33.4	2023.546	95.5	29.6
1	68	35.6	3522	2405.2	34.8	2202.486	88.4	32.0
1	69	36.6	3531	2411.9	36.7	2377.335	90.1	32.8
1	70	39.6	3444	2394.6	38.8	2591.039	89.8	34.3
1	71	42.7	3481	2443.5	40.5	2785.316	95.4	35.8
1	72	42.3	3511	2484.7	41.8	3034.808	101.1	37.4

Variables:

The variables used for the regression and the fixed effects model:

Dependent variable:

sales: cigarette sales in packs per capita.

Independent variables (may be transformed):

pop: population.
                    
pop16: population above the age of 16.

price: price per pack of cigarettes.

cpi: consumer price index (1983=100).

ndi: per capita disposable income.

Fixed effects variables:

state (46 levels): state abbreviation.

year (30 levels): the year (1963 to 1992).

Why Fixed Effects Models

Heterogeneity in fixed effects models means that the means differ among categories such as states and years. When the data can be grouped by such categories and there is evidence of heterogeneity, OLS is not sufficient to control for the effects of these unobservable factors. Fixed effects models, however, can control for and estimate these effects. Moreover, if these unobservable factors are time-invariant, fixed effects regression eliminates the omitted variable bias they would otherwise cause.

Heterogeneity across year

The above graph shows that mean sales differ from year to year.

Heterogeneity across state

We can also observe heterogeneity across states from the above graph. Therefore, a fixed effects model is an appropriate choice.

Tutorial in R

Data Manipulation

Import the data:

# data: the dataset 'Cigar' is available inside the 'plm' package
library(plm)
data(Cigar)

Transform the variables:

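# Adjust price and disposable income by cpi to
# express them in 1983 dollars.
# Columns are referenced explicitly instead of using attach(),
# which can silently mask other objects.
Cigar$price_adj = (Cigar$price/Cigar$cpi)*100
Cigar$income_adj = (Cigar$ndi/Cigar$cpi)*100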
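
As a quick check of the transformation, the sketch below applies it by hand to the first three rows of the table shown earlier. The data frame cigar_head is a hypothetical stand-in typed in from that table, so the example runs without loading plm.

```r
# First three rows of Cigar (state 1, years 63 to 65), copied from the table
cigar_head <- data.frame(
  price = c(28.6, 29.8, 29.8),
  cpi   = c(30.6, 31.0, 31.5),
  ndi   = c(1558.305, 1684.073, 1809.842)
)

# Deflate by cpi (1983 = 100) to express values in 1983 dollars
cigar_head$price_adj  <- cigar_head$price / cigar_head$cpi * 100
cigar_head$income_adj <- cigar_head$ndi   / cigar_head$cpi * 100

print(round(cigar_head, 2))
```

For the first row this gives a real price of about 93.46 and a real income of 5092.5, much larger than the nominal 1963 values because the 1963 cpi is far below 100.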

OLS regression

Fit an OLS regression model with sales as the response and price_adj, pop, pop16 and income_adj as predictors:

# Run ordinary linear regression without fixed effect
ols = lm(sales ~ price_adj + pop + pop16 + income_adj, data = Cigar)
summary(ols)

## 
## Call:
## lm(formula = sales ~ price_adj + pop + pop16 + income_adj, data = Cigar)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -73.905 -12.834  -2.860   7.873 162.438 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.897e+02  5.254e+00  36.111  < 2e-16 ***
## price_adj   -1.247e+00  5.378e-02 -23.185  < 2e-16 ***
## pop          1.040e-02  2.447e-03   4.248 2.30e-05 ***
## pop16       -1.495e-02  3.276e-03  -4.564 5.46e-06 ***
## income_adj   5.278e-03  4.379e-04  12.054  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26 on 1375 degrees of freedom
## Multiple R-squared:  0.2981, Adjusted R-squared:  0.2961 
## F-statistic:   146 on 4 and 1375 DF,  p-value: < 2.2e-16

From the summary above, we see that the coefficients of price_adj, pop, pop16 and income_adj are -1.247e+00, 1.040e-02, -1.495e-02 and 5.278e-03 respectively.

Fixed Effects Models

We fit a fixed effects model with sales as the response, price_adj, pop, pop16 and income_adj as independent variables, and state and year as fixed effects variables.

There are three ways to do this in R: the regular function lm, felm in the lfe package, or plm in the plm package. They produce the same coefficient estimates.

lm generates dummy variables for state and year and then runs a linear regression. felm and plm, by contrast, absorb the fixed effects and do not report the individual fixed effects estimates.

If we just want to control for fixed effects and only care about the coefficients of interest, either felm or plm is a good choice. But if we want to know the effect of some specific groups, lm is preferred.
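The equivalence between the dummy-variable approach and absorption can be sketched in base R. The example below uses a small simulated balanced panel (an assumption made so it runs without the Cigar data), and demean2 is a hypothetical helper written for this illustration, not a function from lfe or plm. It compares the slope from lm with unit and time dummies against the slope from a regression on two-way demeaned data.

```r
# Small simulated balanced panel: 8 units observed over 6 periods
set.seed(42)
n_units <- 8
n_times <- 6
panel <- expand.grid(time = seq_len(n_times), unit = seq_len(n_units))
unit_eff <- rnorm(n_units)[panel$unit]
time_eff <- rnorm(n_times)[panel$time]
panel$x <- unit_eff + time_eff + rnorm(nrow(panel))
panel$y <- 1.5 * panel$x + 2 * unit_eff - time_eff + rnorm(nrow(panel))

# (a) Dummy-variable regression: one dummy per unit and per period
lsdv <- lm(y ~ x + factor(unit) + factor(time), data = panel)

# (b) Two-way within transformation; this simple formula is only
#     valid for balanced panels
demean2 <- function(v, unit, time) {
  v - ave(v, unit) - ave(v, time) + mean(v)
}
panel$x_w <- demean2(panel$x, panel$unit, panel$time)
panel$y_w <- demean2(panel$y, panel$unit, panel$time)
within_fit <- lm(y_w ~ x_w - 1, data = panel)

slope_lsdv   <- unname(coef(lsdv)["x"])
slope_within <- unname(coef(within_fit)["x_w"])
print(c(lsdv = slope_lsdv, within = slope_within))
```

The two slopes agree up to floating-point error. The standard errors reported by lm for the demeaned regression are not comparable, because lm does not know that degrees of freedom were used up by the demeaning; dedicated functions such as felm and plm adjust for this.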

Basic R:

In fact, the summary of lm will show individual fixed effects estimates for every year and every state. But for convenience, we only show the estimated coefficients of independent variables and first five estimated effects for years.

# Fixed effects using Least squares dummy variable model
ols_fixed = lm(sales ~ price_adj + pop + pop16 + income_adj +factor(year) + factor(state), data = Cigar)
summary(ols_fixed)$coefficients[1:10,]

##                     Estimate   Std. Error     t value      Pr(>|t|)
## (Intercept)    254.871577992 8.0047498533  31.8400428 5.576540e-165
## price_adj       -1.474957838 0.0712776661 -20.6931276  1.840630e-82
## pop              0.001908401 0.0018072543   1.0559672  2.911793e-01
## pop16           -0.002180001 0.0021201973  -1.0282068  3.040437e-01
## income_adj      -0.002320871 0.0007200866  -3.2230442  1.299815e-03
## factor(year)64  -0.675374024 2.5764491002  -0.2621337  7.932599e-01
## factor(year)65   1.071170096 2.6189757232   0.4090034  6.826045e-01
## factor(year)66   5.615730191 2.6730827193   2.1008441  3.584666e-02
## factor(year)67   5.353523777 2.7114760535   1.9743946  4.854810e-02
## factor(year)68   7.346161519 2.7803640858   2.6421581  8.336775e-03

Package lfe:

# Use lfe package, treat *state* and *year* as fixed effects variables, and fit a model 
library(lfe)
felm_fixed = felm(sales ~ price_adj + pop + pop16 + income_adj |factor(year) + factor(state), data = Cigar)
summary(felm_fixed)

## 
## Call:
##    felm(formula = sales ~ price_adj + pop + pop16 + income_adj |      factor(year) + factor(state), data = Cigar) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -63.049  -5.140  -0.117   5.525 108.515 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## price_adj  -1.4749578  0.0712777 -20.693   <2e-16 ***
## pop         0.0019084  0.0018073   1.056   0.2912    
## pop16      -0.0021800  0.0021202  -1.028   0.3040    
## income_adj -0.0023209  0.0007201  -3.223   0.0013 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.29 on 1301 degrees of freedom
## Multiple R-squared(full model): 0.8515   Adjusted R-squared: 0.8426 
## Multiple R-squared(proj model): 0.2545   Adjusted R-squared: 0.2098 
## F-statistic(full model):95.67 on 78 and 1301 DF, p-value: < 2.2e-16 
## F-statistic(proj model):   111 on 4 and 1301 DF, p-value: < 2.2e-16

Package plm:

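# Use plm package, treat *state* and *year* as fixed effects variables, and fit a model.
# Note: plm reads index as c(individual, time). Here year is listed first, so the
# output below reports n = 30, T = 46; with effect = "twoways" the estimates are unaffected.
library(plm)
plm_md = plm(sales ~ price_adj + pop + pop16 + income_adj, data = Cigar,
          index = c("year", "state"), model = "within", effect = "twoways")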
summary(plm_md)

## Twoways effects Within Model
## 
## Call:
## plm(formula = sales ~ price_adj + pop + pop16 + income_adj, data = Cigar, 
##     effect = "twoways", model = "within", index = c("year", "state"))
## 
## Balanced Panel: n = 30, T = 46, N = 1380
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -63.04920  -5.13997  -0.11695   5.52514 108.51496 
## 
## Coefficients:
##               Estimate  Std. Error  t-value Pr(>|t|)    
## price_adj  -1.47495784  0.07127767 -20.6931   <2e-16 ***
## pop         0.00190840  0.00180725   1.0560   0.2912    
## pop16      -0.00218000  0.00212020  -1.0282   0.3040    
## income_adj -0.00232087  0.00072009  -3.2230   0.0013 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    263760
## Residual Sum of Squares: 196630
## R-Squared:      0.25451
## Adj. R-Squared: 0.20982
## F-statistic: 111.043 on 4 and 1301 DF, p-value: < 2.22e-16

Summary

From the summaries, we see that the coefficients of price_adj, pop, pop16 and income_adj change after adding state and year as fixed effects. The most obvious change is that the coefficient of income_adj flips sign: it changes from 5.278e-03 to -2.321e-03. The coefficients of price_adj, pop and pop16 are -1.475e+00, 1.908e-03 and -2.180e-03 respectively. In addition, pop and pop16 become insignificant in the fixed effects models.

Tutorial in SAS

Data Manipulation

Import the data:

/* read the data file  */
proc import datafile=".\Cigar.csv" 
out=mydata dbms=csv replace; 
getnames=yes; 
run;

Transform the variables:

/*change the price, and income with cpi to get the dollar value in 1983 */
data Cigar; set mydata;
price_adj = (price/cpi)*100;
income_adj = (ndi/cpi)*100;
run;

OLS regression

proc reg data=Cigar; 
 model sales = price_adj pop pop16 income_adj;
 run;
quit;

Fixed Effects Models

In SAS, the glm procedure is used to fit fixed effects models. In glm, we can use either class or absorb to specify the fixed effects variables.

If we want to see the fixed effects estimates for every state and every year, class is the first choice. The class statement automatically generates a set of dummy variables for each level of the variables state and year.

If we only want the estimates for the independent variables of interest, we can use absorb instead of class. However, it can only absorb one variable at a time. To use absorb, we also need to suppress the intercept to avoid the dummy variable trap.

We only show the estimated coefficients of independent variables and first five estimated effects for years.

Use class:

For convenience, we only show the estimated coefficients of the independent variables and the first five estimated effects for years. In SAS, however, the individual fixed effects estimates for every state and every year are displayed.

/* Fixed effects by class, generating a set of dummy variables */
/* for each level of the variable state and year               */
proc glm data=Cigar;
 class year state; 
 model sales = price_adj pop pop16 income_adj year state/ solution; run;
quit;

Use absorb:

Because we absorb the variable state, SAS displays individual fixed effects estimates only for the years. We show only the first five estimated effects for years.

/* Absorbing the variable *state* and generating dummies of years */
proc glm data=Cigar;
 absorb state; 
 class year;
 model sales = price_adj pop pop16 income_adj year/ solution noint; run;
quit;

Summary

The estimates for the independent variables price_adj, pop, pop16 and income_adj are the same as in R.

The estimates for years differ from those in R. This is because R automatically treats one level of each factor as the reference level (here year 63 and state 1) and incorporates it into the estimated intercept, whereas SAS does not parametrize the effects in the same way.

Although the reported values differ, the estimates agree after a simple re-basing to a common reference level.
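The effect of the reference level can be sketched in R with simulated data. The group labels "63", "64" and "65" below are hypothetical stand-ins for year levels, and the data are made up for illustration. Changing the reference level changes the reported dummy coefficients and the intercept, but the slope and the implied level of each group stay the same.

```r
# Three hypothetical groups with different levels (simulated data)
set.seed(7)
grp <- factor(rep(c("63", "64", "65"), each = 20))
x <- rnorm(60)
y <- 1 + 0.5 * x + c(0, 2, 4)[as.integer(grp)] + rnorm(60)

# Same model, two parametrizations of the group effect
fit_first <- lm(y ~ x + grp)            # reference level "63" (R default)
grp_last  <- relevel(grp, ref = "65")
fit_last  <- lm(y ~ x + grp_last)       # reference level "65"

b1 <- coef(fit_first)
b2 <- coef(fit_last)

# Implied level of each group = intercept + that group's dummy coefficient
levels_first <- c(b1[["(Intercept)"]],
                  b1[["(Intercept)"]] + b1[["grp64"]],
                  b1[["(Intercept)"]] + b1[["grp65"]])
levels_last  <- c(b2[["(Intercept)"]] + b2[["grp_last63"]],
                  b2[["(Intercept)"]] + b2[["grp_last64"]],
                  b2[["(Intercept)"]])

print(rbind(levels_first, levels_last))
print(c(slope_first = b1[["x"]], slope_last = b2[["x"]]))
```

Both rows of the printed matrix match, which is the "simple calculation" needed to reconcile outputs that use different reference levels.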

Tutorial in STATA

Data Manipulation

Import the data:

import delimited Cigar.csv, clear

Transform the variables:

* adjust price and income by cpi to get the dollar value in 1983
g price_adj = (price/cpi)*100
g income_adj = (ndi/cpi)*100

OLS regression

reg sales price_adj pop pop16 income_adj

Fixed Effects Models

There are three ways to do this in STATA: the commands areg, xtreg, or reghdfe. They produce the same coefficient estimates.

areg and xtreg cannot absorb more than one fixed effect, but we can still include the other as a factor variable i.var. This can be computationally inefficient, since the coefficients for those dummy variables are actually calculated and reported. However, it can be helpful if we want to see the effect of one specific group.

If we just want to control for fixed effects and only care about the other coefficients of interest, reghdfe is the best option.

Command areg:

* Absorbing the variable state and generating dummies of years
areg sales price_adj pop pop16 income_adj i.year, absorb(state)

Command xtreg:

* Absorbing the variable state and generating dummies of years
xtset state year
xtreg sales price_adj pop pop16 income_adj i.year, fe

Command reghdfe:

Install packages:

* install the reghdfe package, and also ftools
ssc install reghdfe
ssc install ftools

Regression:

* Absorbing the variables state and year using reghdfe
reghdfe sales price_adj pop pop16 income_adj, absorb(state year)

Summary

The estimates for the independent variables price_adj, pop, pop16 and income_adj are the same as in R and SAS.

Like R, STATA takes year 63 and state 1 as reference levels by default, so the estimated effects for years equal those from R. Because STATA absorbs variables, the estimated intercepts differ, but the models are the same.

Discussion and Summary

Compare Fixed Effects Model to OLS

The results of OLS and the fixed effects model differ substantially. Specifically, with fixed effects the negative effect of price on sales is larger in magnitude than under OLS (-1.475 versus -1.247), and the coefficient on income flips sign.

Importance of Fixed Effects Model

If we fit OLS instead of fixed effects, we underestimate the effect of price on cigarette sales and even reach the wrong conclusion about the influence of income on sales. This highlights the importance of controlling for fixed effects.

Absorption or Not

When computing fixed effects estimates, we must choose whether to absorb the fixed effects. The choice depends on our aim. Absorption is computationally fast and gives concise output, but the individual fixed effects estimates are not shown. To get every individual fixed effects estimate, the preferred method is “no absorption”, which automatically generates a set of dummy variables for each level of the fixed effects variable.
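The two routes are closer than they may appear: in the one-way case, the absorbed effects can be recovered after the fact from group means. The following is a minimal base R sketch on simulated data (the variable names and the true values are assumptions for illustration), comparing recovered effects with the dummy coefficients from a regression without absorption.

```r
# Simulated one-way panel: 5 units observed over 12 periods
set.seed(3)
n_units <- 5
n_times <- 12
unit <- rep(seq_len(n_units), each = n_times)
effect <- c(-2, -1, 0, 1, 2)[unit]
x <- effect + rnorm(length(unit))
y <- 0.8 * x + effect + rnorm(length(unit))

# "Absorption": demean by unit and estimate only the slope
x_w <- x - ave(x, unit)
y_w <- y - ave(y, unit)
beta_hat <- unname(coef(lm(y_w ~ x_w - 1)))

# Recover each unit's effect from its means: mean(y) - beta_hat * mean(x)
alpha_hat <- tapply(y, unit, mean) - beta_hat * tapply(x, unit, mean)

# "No absorption": one dummy per unit and no common intercept
lsdv <- lm(y ~ x + factor(unit) - 1)
alpha_lsdv <- coef(lsdv)[-1]   # drop the slope, keep the unit dummies

print(rbind(recovered = as.numeric(alpha_hat),
            dummies   = as.numeric(alpha_lsdv)))
```

The two rows match, so absorbing the effects loses no information about them; it only skips reporting them. With two or more sets of fixed effects the recovery is less direct and requires a normalization.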

References

Wikipedia: Fixed effects model

Dataset: Cigar

R Package: plm

R Package: lfe

STATA Package: reghdfe

Notes: Panel Data using R

Notes: Fixed Effects in SAS

Group 8: Chen Xie, Yanlin Yang, Nam H Le

December 07, 2018
