Generalized linear models

In this example we will look at methods associated with the glm class of objects, which represent model fits from generalized linear models.

Just as the name implies, GLMs generalize the linear model through the use of a link function relating the expected or mean outcome to a linear predictor.

Generalized linear models have the following form relating a dependent variable \(Y\) to independent variables \(X\) and coefficients \(\beta\):

\[ h(E[Y| X]) = X\beta. \]

Here \(h(\cdot)\) is called the link function and we will often be interested in its inverse \(g(\cdot)\) so that an equivalent specification is:

\[ E[Y | X] = g(X\beta). \] The matrix product \(X\beta\) in a generalized linear model is known as the linear predictor.

Classical linear regression has this form with the identity link \(h(x) = g(x) = x\),

\[ E[Y | X] = X\beta. \] Two popular GLMs are logistic regression, for binary outcomes, and Poisson regression, for counts:

\[ \log \frac{ P(Y = 1 | X) }{P(Y = 0|X)} = X\beta, \qquad \textrm{or} \qquad E[Y | X] = \frac{e^{X\beta}}{1 + e^{X\beta}}, \]

\[ \log E[Y | X] = X\beta, \qquad \textrm{or} \qquad E[Y | X] = e^{X\beta}. \]

GLMs in R

Generalized linear models can be fit in R using the glm() function. The first argument is a formula used to specify the linear predictor in terms of one or more columns from a data frame. The other key argument is family, which determines the class of models to be fit by specifying a probability distribution for the response and a link function.

We will review the R help for glm, formula, and family together in class.
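As a quick point of reference, here is a minimal sketch of a glm() call. It is not part of the case study below: it uses the built-in mtcars data and the illustrative name fit_demo, and it shows that a family object carries both the link and its inverse.

# A minimal sketch (illustrative, not part of the case study below):
# a logistic regression using the built-in mtcars data.
fit_demo = glm(am ~ wt + hp, data = mtcars, family = binomial(link = 'logit'))

# A family object stores both the link h() and its inverse g();
# for the logit link these are qlogis() and plogis().
fam = binomial(link = 'logit')
fam$linkfun(0.5)  # h(0.5) = log(.5 / .5) = 0
fam$linkinv(0)    # g(0) = 0.5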

Case Study: Energy Star Compliance

In this first case study we will use logistic regression to explore predictors of whether a home’s primary fridge is Energy Star compliant. For simplicity, we will assume this is a simple random sample and ignore the survey weights. A Stata version of this example is also available.

First, we will load the RECS data and clean up some variables of interest.

# 80: --------------------------------------------------------------------------
# load packages: ---------------------------------------------------------------
library(tidyverse) 

# read or load data: -----------------------------------------------------------
data_file = '../data/recs2009_public.RData'
if(!file.exists(data_file)){
  recs = read_delim('./recs2009_public.csv', col_names=TRUE, delim=',')
  save(recs,file=data_file)
} else{
  foo = load(data_file)
  cat(sprintf('Loaded object(s) %s from %s.\n', foo, data_file))
}
## Loaded object(s) recs from ../data/recs2009_public.RData.
# tidy data: -------------------------------------------------------------------
# function to decode region
decode_region = function(rvec){
  sapply(rvec, function(r) switch(r, "NE", "MW", "S", "W"))
}

recs = recs %>%
  mutate(es_fridge = ifelse(ESFRIG < 0, NA, ESFRIG),
         totsqft = TOTSQFT / 100,
         region = decode_region(REGIONC)
  ) %>%
  filter(!is.na(es_fridge))

Now we are ready to fit a logistic regression. Note that R will treat “region” as a factor since it is of class character.
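As an aside, the model below uses 0 + region, so each region gets its own coefficient and there is no reference level. If we instead wanted an intercept with a chosen reference category, we could code region as a factor explicitly. A sketch (the names recs_fct and fit_ref are illustrative and not used elsewhere):

# Sketch: explicit factor coding with 'MW' as the reference level.
recs_fct = recs %>%
  mutate(region = factor(region, levels = c('MW', 'NE', 'S', 'W')))
fit_ref = glm(es_fridge ~ region + totsqft, data = recs_fct,
              family = binomial(link = 'logit'))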

fit0 = glm(es_fridge ~ 0 + region + totsqft, data=recs,
           family=binomial(link='logit'))
summary(fit0)
## 
## Call:
## glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"), 
##     data = recs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -3.094  -1.182   0.730   1.020   1.496  
## 
## Coefficients:
##           Estimate Std. Error z value Pr(>|z|)    
## regionMW -0.791512   0.073039 -10.837  < 2e-16 ***
## regionNE -0.367748   0.071932  -5.112 3.18e-07 ***
## regionS  -0.576576   0.057470 -10.033  < 2e-16 ***
## regionW  -0.315000   0.061458  -5.125 2.97e-07 ***
## totsqft   0.046881   0.002195  21.355  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10571.9  on 7626  degrees of freedom
## Residual deviance:  9612.5  on 7621  degrees of freedom
## AIC: 9622.5
## 
## Number of Fisher Scoring iterations: 4
class(fit0)
## [1] "glm" "lm"
typeof(fit0)
## [1] "list"

Notice that the glm class inherits from the lm class for linear models. Above we used the summary method to review the model fit fit0. Here are some other S3 methods we can make use of:

print(fit0)
## 
## Call:  glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"), 
##     data = recs)
## 
## Coefficients:
## regionMW  regionNE   regionS   regionW   totsqft  
## -0.79151  -0.36775  -0.57658  -0.31500   0.04688  
## 
## Degrees of Freedom: 7626 Total (i.e. Null);  7621 Residual
## Null Deviance:       10570 
## Residual Deviance: 9613  AIC: 9623
coef(fit0)
##    regionMW    regionNE     regionS     regionW     totsqft 
## -0.79151228 -0.36774783 -0.57657557 -0.31500023  0.04688135
round(vcov(fit0),4)
##          regionMW regionNE regionS regionW totsqft
## regionMW   0.0053   0.0021  0.0021  0.0019  -1e-04
## regionNE   0.0021   0.0052  0.0017  0.0016  -1e-04
## regionS    0.0021   0.0017  0.0033  0.0015  -1e-04
## regionW    0.0019   0.0016  0.0015  0.0038  -1e-04
## totsqft   -0.0001  -0.0001 -0.0001 -0.0001   0e+00
broom::tidy(fit0)
##       term    estimate   std.error  statistic       p.value
## 1 regionMW -0.79151228 0.073039161 -10.836821  2.303375e-27
## 2 regionNE -0.36774783 0.071932358  -5.112412  3.180716e-07
## 3  regionS -0.57657557 0.057470069 -10.032624  1.095660e-23
## 4  regionW -0.31500023 0.061457722  -5.125479  2.967830e-07
## 5  totsqft  0.04688135 0.002195341  21.354929 3.509039e-101
AIC(fit0)
## [1] 9622.523
-2*logLik(fit0) + 2*length(coef(fit0))
## 'log Lik.' 9622.523 (df=5)
BIC(fit0)
## [1] 9657.219
-2*logLik(fit0) + log(nrow(recs))*length(coef(fit0))
## 'log Lik.' 9657.219 (df=5)
head(fitted(fit0))
##         1         2         3         4         5         6 
## 0.8302945 0.4699826 0.5391441 0.6291620 0.6989500 0.7060929
head(resid(fit0))
##          1          2          3          4          5          6 
##  0.6098767 -1.1268056  1.1115506  0.9626697 -1.5495024  0.8342763

Be careful about inheritance if you try to use the plot method, as the diagnostic plots designed for linear models may not be the most instructive for GLMs.

methods(plot)[grep('lm', methods(plot))]
## [1] "plot.intervals.lmList" "plot.lm"               "plot.lme"             
## [4] "plot.lmList"           "plot.mlm"              "plot.ranef.lme"       
## [7] "plot.ranef.lmList"     "plot.simulate.lme"

Another useful generic is predict, which uses the fitted model to predict expected values for new data. Below, we use predict to compute “adjusted predictions at the mean” for the region and square footage variables.

Adjusted Predictions

Adjusted predictions are simply predictions from the model at specially chosen values of the independent variables.

Adjusted Predictions at the Mean(s)

Adjusted predictions for regions

Here we compute adjusted predictions (aka “predictive margins”) for regions with total square feet held at its mean.

# Set up a new data frame for predictions: ------------------------------------
recs_regions_atmean = 
  tibble(region=unique(recs$region)) %>%
  mutate(totsqft=mean(recs$totsqft))

## Use predict to estimate probabilities for the new data
m0 = predict(fit0, recs_regions_atmean, type='response', se.fit=TRUE)

## We can include these in the data frame and plot to compare
recs_regions_atmean %>%
  mutate(fit=m0$fit, se=m0$se.fit, lwr=fit - 2*se, upr=fit + 2*se) %>%
  mutate(region=factor(region,region[order(fit)])) %>% # order
  ggplot(aes(y=region, x=fit)) +
  geom_point() + 
  geom_errorbarh(aes(xmin=lwr, xmax=upr), height=.1) + 
  geom_vline( aes( xintercept = fit), lty = 'dashed', color='grey', alpha =.5 ) + 
  xlab('Predicted probability for an average-sized house') +
  ggtitle('Estimated usage of Energy Star compliant refrigerators by region.') +
  theme_bw() +
  xlim(c(0,1))

Adjusted predictions for home size

What if we wanted to examine the effect of totsqft averaged over regions?

# Compute proportions of homes in each region
df_regions =
  recs %>% 
  group_by(region) %>%
  summarize(Count=n()) %>%
  spread(region,Count) %>%
  mutate(Total=MW+NE+S+W,
         MW=MW/Total, 
         NE=NE/Total,
         S =S/Total,
         W =W/Total
  ) %>%
  gather(region, prop, MW:W)

p_regions = df_regions$prop
names(p_regions) = df_regions$region
p_regions
##        MW        NE         S         W 
## 0.2128245 0.1835825 0.3554944 0.2480986
# Pick representative values: -------------------------------------------------
with(recs, quantile(totsqft, c(.25, .5, .75) ))
##    25%    50%    75% 
## 11.160 18.735 28.320
at_totsqft = c(10, 20, 30)

# Compute linear predictors
lp_avg_region = sum(coef(fit0)[-5] * p_regions)
lp = lp_avg_region + coef(fit0)[5]*at_totsqft

# Use inverse link to compute response
pred = exp(lp)/{1 + exp(lp)}
ap_totsqft = tibble(totsqft = at_totsqft, adjusted_prediction = pred)
ap_totsqft
## # A tibble: 3 x 2
##   totsqft adjusted_prediction
##     <dbl>               <dbl>
## 1      10           0.4874346
## 2      20           0.6031341
## 3      30           0.7083440

We could also do this using predict.

## Using predict
new_data = recs %>% select(region, totsqft)

p10 = predict(fit0, new_data %>% mutate(totsqft=10), type='link')
p20 = predict(fit0, new_data %>% mutate(totsqft=20), type='link')
p30 = predict(fit0, new_data %>% mutate(totsqft=30), type='link')

## For explaining what is happening here ... 
new_data %>% mutate(totsqft=10, p10 = p10) %>% head()
## # A tibble: 6 x 3
##   region totsqft        p10
##    <chr>   <dbl>      <dbl>
## 1     MW      10 -0.3226988
## 2     NE      10  0.1010657
## 3     MW      10 -0.3226988
## 4     NE      10  0.1010657
## 5     MW      10 -0.3226988
## 6     NE      10  0.1010657
# Note the order of operations: average the linear predictors first,
# then apply the inverse link to move to the probability scale.
pred = sapply(list(p10, p20, p30), function(x) exp(mean(x)) / {1 + exp(mean(x))})

# The built-in inverse logit is the CDF of the logistic distribution, plogis()
pred_alt = sapply(list(p10, p20, p30), function(x) plogis(mean(x)))

ap_totsqft %>% mutate(adj_pred = pred, adj_pred_alt = pred_alt)
## # A tibble: 3 x 4
##   totsqft adjusted_prediction  adj_pred adj_pred_alt
##     <dbl>               <dbl>     <dbl>        <dbl>
## 1      10           0.4874346 0.4874346    0.4874346
## 2      20           0.6031341 0.6031341    0.6031341
## 3      30           0.7083440 0.7083440    0.7083440

In the above example, we computed linear predictors for each observation in the data set with totsqft fixed at each chosen value, averaged those linear predictors, and then transformed to the probability scale using the inverse link function. These are examples of adjusted predictions at the means.
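Because the linear predictor is linear in the covariates, averaging linear predictors across observations is the same as evaluating the linear predictor at the covariate means. A quick numerical check (a sketch, not part of the original code):

# Sketch: the mean of the linear predictors equals the linear predictor
# evaluated at the column means of the design matrix.
X = model.matrix(fit0)
all.equal( mean(X %*% coef(fit0)), sum(colMeans(X) * coef(fit0)) )  # should be TRUE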

Average Adjusted Predictions

An alternative to adjusted predictions at the means is average adjusted predictions, computed by averaging adjusted predictions for each observation in the data set. In the example below, we average after forming adjusted predictions on the probability scale, i.e. the scale of the response. In this case the results are rather similar because the logistic function is nearly linear in the center.

# Average adjusted predictions: -----------------------------------------------
new_data = recs %>% select(region, totsqft)

p10 = predict(fit0, new_data %>% mutate(totsqft=10), type='response')
p20 = predict(fit0, new_data %>% mutate(totsqft=20), type='response')
p30 = predict(fit0, new_data %>% mutate(totsqft=30), type='response')


ap_totsqft = ap_totsqft %>%
  mutate(
    avg_response = sapply(list(p10, p20, p30), mean),
    diff = avg_response - adjusted_prediction    
  )

ap_totsqft
## # A tibble: 3 x 4
##   totsqft adjusted_prediction avg_response          diff
##     <dbl>               <dbl>        <dbl>         <dbl>
## 1      10           0.4874346    0.4875659  0.0001313622
## 2      20           0.6031341    0.6023908 -0.0007433375
## 3      30           0.7083440    0.7070102 -0.0013337676

We can visualize the reason for this similarity by plotting the observed and fitted values of the response against the linear predictor.

# S3 method to extract linear predictors: -------------------------------------
lin_pred = function(obj){
  UseMethod("lin_pred")
}
lin_pred.glm = function(obj){
  obj$linear.predictors
}

## See also
#broom::augment_columns()
recs %>%
  select(es_fridge, region, totsqft) %>%
  mutate( lin_pred = lin_pred(fit0),
          fitted = fitted(fit0)
  ) %>%
  ggplot(aes(y=es_fridge, x = lin_pred, col = es_fridge)) +
  geom_point() +
  geom_line( aes(y = fitted), col = 'red', lwd=2 ) +
  geom_abline( aes(intercept = .5, slope = .25), col='lightgrey') + 
  theme_bw() 

Factors, Interactions, and Linked Covariates

Suppose we fit a model with an interaction between region and home size.

fit1 = glm(es_fridge ~ 0 + region*totsqft, 
           data=recs, family=binomial(link='logit')
           )
summary(fit1)
## 
## Call:
## glm(formula = es_fridge ~ 0 + region * totsqft, family = binomial(link = "logit"), 
##     data = recs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -3.383  -1.177   0.729   1.022   1.460  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## regionMW         -0.707612   0.108817  -6.503 7.89e-11 ***
## regionNE         -0.263363   0.109073  -2.415   0.0158 *  
## regionS          -0.586226   0.081669  -7.178 7.07e-13 ***
## regionW          -0.494804   0.100740  -4.912 9.03e-07 ***
## totsqft           0.043247   0.004113  10.515  < 2e-16 ***
## regionNE:totsqft -0.001749   0.006265  -0.279   0.7801    
## regionS:totsqft   0.004152   0.005608   0.740   0.4591    
## regionW:totsqft   0.014390   0.006722   2.141   0.0323 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10571.9  on 7626  degrees of freedom
## Residual deviance:  9606.2  on 7618  degrees of freedom
## AIC: 9622.2
## 
## Number of Fisher Scoring iterations: 4

When computing adjusted predictions at specific values of totsqft, it would not make sense to average over the interaction terms, which also involve totsqft. Instead, we should make sure totsqft is set to the chosen value in every term that involves it. Using predict will take care of this for us.

p10 = predict(fit1, new_data %>% mutate(totsqft=10), type='response')
p20 = predict(fit1, new_data %>% mutate(totsqft=20), type='response')
p30 = predict(fit1, new_data %>% mutate(totsqft=30), type='response')

ap_totsqft = ap_totsqft %>%
  mutate(avg_response = sapply( list(p10, p20, p30), mean) )

ap_totsqft
## # A tibble: 3 x 4
##   totsqft adjusted_prediction avg_response          diff
##     <dbl>               <dbl>        <dbl>         <dbl>
## 1      10           0.4874346    0.4874897  0.0001313622
## 2      20           0.6031341    0.6048856 -0.0007433375
## 3      30           0.7083440    0.7106078 -0.0013337676

We could also do this directly by building the design matrices by hand.

# model coefficients
beta_hat = matrix(coef(fit1), ncol=1)

# Data matrices for each level of totsqft
X10 = recs %>%
  transmute(
    MW = 1*{region=='MW'},
    NE = 1*{region=='NE'},
    S  = 1*{region=='S'},
    W  = 1*{region=='W'},
    tot = 10,
    tot_NE = tot*NE,
    tot_S  = tot*S,
    tot_W  = tot*W
  )
X10 = as.matrix(X10)
X20 = cbind(X10[,1:4], 2*X10[,5:8])
X30 = cbind(X10[,1:4], 3*X10[,5:8])

# inverse link
inv_logit = function(x) exp(x) / {1+exp(x)}

# average adjusted predictions
ap_totsqft %>% 
  mutate(
   by_hand = c( mean(inv_logit(X10 %*% beta_hat)), 
                mean(inv_logit(X20 %*% beta_hat)),
                mean(inv_logit(X30 %*% beta_hat))
              )
  )
## # A tibble: 3 x 5
##   totsqft adjusted_prediction avg_response          diff   by_hand
##     <dbl>               <dbl>        <dbl>         <dbl>     <dbl>
## 1      10           0.4874346    0.4874897  0.0001313622 0.4874897
## 2      20           0.6031341    0.6048856 -0.0007433375 0.6048856
## 3      30           0.7083440    0.7106078 -0.0013337676 0.7106078

The prediction package

The prediction package provides an S3 generic prediction and methods for computing predictive margins from various classes representing fitted models. It is “type safe” in that it always returns an object of class prediction which is essentially a copy of the original data frame with the requested fitted values plus a few additional attributes.

Below we use prediction::prediction to compute link-scale predictions from our fit0 object at several representative values of totsqft; these are the building blocks for adjusted predictions at the mean.

# Prediction package: ----------------------------------------------------------
library(prediction)
lp_fit0 = prediction(fit0, at=list(totsqft=c(10, 20, 30)), type='link')
class(lp_fit0)
## [1] "prediction" "data.frame"
names(attributes(lp_fit0))
## [1] "names"       "row.names"   "class"       "at"          "model.class"
## [6] "type"
An aside on memory management

Because we requested predictions at three values of totsqft, our data frame is three times as large.

cat(' dim(lp_fit0): ', dim(lp_fit0), '\n', 'dim(recs): ', dim(recs),'\n',
    'nrow(lp_fit0)/nrow(recs): ', nrow(lp_fit0)/nrow(recs), '\n')
##  dim(lp_fit0):  22878 945 
##  dim(recs):  7626 943 
##  nrow(lp_fit0)/nrow(recs):  3

You should keep this in mind if working with large data sets as this could quickly eat up memory unnecessarily. This is also true of model objects returned by glm itself.
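To see where the memory goes, we can look at the sizes of the individual components stored in the fitted model object. A quick sketch:

# Sketch: glm objects carry copies of the data, the model frame, the qr
# decomposition, and more; peek at the largest components.
sort( sapply(fit0, object.size), decreasing = TRUE )[1:5]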

print(object.size(recs), units='Mb')
## 29.5 Mb
print(object.size(lp_fit0), units='Mb')
## 88 Mb
object.size(lp_fit0) / object.size(recs)
## 3 bytes
print(object.size(fit0), units='Mb')
## 33.4 Mb

One way to minimize the impact of this is to fit models using data frames limited to necessary variables.

recs_glm = recs %>% select(es_fridge, region, totsqft)
fit0 = glm(es_fridge~0+region+totsqft, family=binomial(link='logit'), data=recs_glm)
fit1 = glm(es_fridge~0+region*totsqft, family=binomial(link='logit'), data=recs_glm)

print(object.size(recs_glm), units='Mb')
## 0.1 Mb
print(object.size(fit0), units='Mb')
## 4.1 Mb

Average adjusted predictions

Returning to the theme of computing adjusted predictions, here are average adjusted predictions using type = 'response' (the default).

m_totsqft = prediction(fit0, at=list(totsqft=c(10, 20, 30)), type='response')
m_totsqft
## Average predictions for 7626 observations:
##  at(totsqft)  value
##           10 0.4876
##           20 0.6024
##           30 0.7070

Question: Why doesn’t m_totsqft print the entire data frame in the last line above?
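Hint: printing dispatches on the class of the returned object, so the place to look is the S3 print method supplied by the prediction package. A sketch of how to investigate (assuming the package registers a print method for the prediction class):

# Sketch: inspect the class and the package's print method
class(m_totsqft)
getS3method('print', 'prediction')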

Summary

  • Adjusted predictions at the mean are computed by fixing some variables at representative values and setting the others to their mean values.

  • Average adjusted predictions are computed by forming adjusted predictions for each observation in the data (leaving the other variables unchanged) and then averaging the resulting predictions.

  • In a linear model these are the same (see the quick check after this list).

  • In GLMs:
    • adjusted predictions at the mean can be computed by averaging the linear predictors across observations and then applying the inverse link,
    • average adjusted predictions instead average the predictions on the response scale.
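Here is a quick check of the linear-model claim above. This is a sketch using the built-in mtcars data with illustrative names (lm_demo, apm_lm, aap_lm), not the RECS data.

# Sketch: in a linear model, the adjusted prediction at the mean equals the
# average adjusted prediction, because predictions are linear in the covariates.
lm_demo = lm(mpg ~ wt + hp, data = mtcars)
# adjusted prediction at the mean of wt, with hp fixed at 100
apm_lm = predict(lm_demo, tibble(wt = mean(mtcars$wt), hp = 100))
# average adjusted prediction: predict for every row with hp fixed at 100, then average
aap_lm = mean( predict(lm_demo, mtcars %>% mutate(hp = 100)) )
all.equal(unname(apm_lm), aap_lm)  # should be TRUE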

Marginal or Partial Effects

Marginal effects (aka partial effects) summarize the impact of a variable in the model using differences between adjusted predictions. For factor variables this means comparing each factor level to a reference level. For continuous variables we may define marginal effects as differences between adjusted predictions at reference values of interest, often based on quantiles, or by using a “derivative”.

Marginal Effects at the Mean

As you might expect, marginal effects at the mean (MEM) are differences between adjusted predictions at the mean. Below are some examples.

First, we examine simple differences of adjusted predictions at the mean for different regions.

# New data for prediction
new_data = 
  tibble(region  = unique(recs_glm$region),
         totsqft = mean(recs_glm$totsqft)
  )

# Predictions
new_data = 
  new_data %>%
  mutate(fitted = predict(fit0, new_data, type='response'))

# Compute differences
new_data %>% spread(region, fitted) %>%
  mutate(mem_NE = NE - MW,
         mem_S  = S  - MW,
         mem_W  = W  - MW
         ) %>% 
  knitr::kable(digits=2)
 totsqft    MW    NE     S     W  mem_NE  mem_S  mem_W
   21.91  0.56  0.66  0.61  0.67     0.1   0.05   0.11

Next, we compute marginal effects at the mean for totsqft.

# MEM for totsqft quantiles
q_tot = quantile(recs_glm$totsqft, c(.25, .75))

# Adjusted Predictions at "mean" region
lp_fit0 = c(sum(coef(fit0)*c(p_regions, q_tot[1])),
            sum(coef(fit0)*c(p_regions, q_tot[2]))
            )
apm = exp(lp_fit0) / {1 + exp(lp_fit0)}    
apm
## [1] 0.5010275 0.6918098
# Marginal Effect
diff(apm)
## [1] 0.1907823

From the above, for an “average” region a house at the upper quartile of size (~2,800 sq ft) is about 19 percentage points more likely to have an Energy Star compliant refrigerator than a house at the lower quartile (~1,100 sq ft).

Average Marginal Effects

Similarly, average marginal effects (AME) are differences between average adjusted predictions.

Below, we first compute predictions for each region at the chosen levels of totsqft.

# AME for totsqft quantiles
q_tot = quantile(recs_glm$totsqft, c(.25, .75))

# New data for prediction
new_data = 
  tibble(totsqft = rep(q_tot, each=4),
         region = rep(names(p_regions), 2),
         w_regions = rep(p_regions, 2)
  )

new_data = new_data %>% mutate(fitted = predict(fit0, new_data, type='response'))

new_data
## # A tibble: 8 x 4
##   totsqft region w_regions    fitted
##     <dbl>  <chr>     <dbl>     <dbl>
## 1   11.16     MW 0.2128245 0.4333205
## 2   11.16     NE 0.1835825 0.5387839
## 3   11.16      S 0.3554944 0.4866582
## 4   11.16      W 0.2480986 0.5518617
## 5   28.32     MW 0.2128245 0.6309204
## 6   28.32     NE 0.1835825 0.7231082
## 7   28.32      S 0.3554944 0.6794193
## 8   28.32      W 0.2480986 0.7335442

Now, we can compute a weighted average of the adjusted predictions at each level of totsqft; the difference between these weighted averages is the average marginal effect.

aap = new_data %>% group_by(totsqft) %>% summarize(ame = sum(w_regions*fitted))
with(aap, ame[2] - ame[1])
## [1] 0.1894935

This could also be done using predict. Note that here we make predictions for each row of the data with “totsqft” replaced by our representative values.

aap_upr = 
  mean(predict(fit0, recs_glm %>% mutate(totsqft = q_tot[2]), type='response'))
aap_lwr =
  mean(predict(fit0, recs_glm %>% mutate(totsqft = q_tot[1]), type='response'))
c('Lower Q'=aap_lwr, 'Upper Q'=aap_upr, 'AME' = aap_upr - aap_lwr)
##   Lower Q   Upper Q       AME 
## 0.5010529 0.6905464 0.1894935

The margins package

The author of the prediction package also has a margins package meant to mimic the behavior of Stata’s margins command. As you might expect, it is built on the prediction package, meaning that functions in margins call functions in prediction to compute adjusted predictions.

# Understanding margins: ------------------------------------------------------
library(margins)
mem0 = margins(fit0)
summary(mem0)
##    factor    AME     SE       z      p  lower  upper
##  regionNE 0.0944 0.0174  5.4193 0.0000 0.0603 0.1286
##   regionS 0.0486 0.0152  3.2057 0.0013 0.0189 0.0784
##   regionW 0.1057 0.0161  6.5725 0.0000 0.0742 0.1372
##   totsqft 0.0104 0.0004 23.9824 0.0000 0.0095 0.0112
# Plotting margins

summary(mem0) %>%
  ggplot( aes( x = factor, y = AME)) +
  geom_point() +
  geom_errorbar( aes(ymax=upper, ymin=lower), width = 0) +
  theme_bw() +
  ylab("Average Marginal Effect")

# Something's wrong with the plot method: --------------------------------------
plot(mem0)

## If time permits, we'll debug this together: --------------------------------
#getS3method('plot', 'margins')
#getS3method('summary', 'margins')

What happens when we call margins? Below is a call graph for functions in the margins package.

mvbutils::foodweb(where=asNamespace("margins"))

Computing marginal effects

We can call margins directly using the “change” parameter to specify we want continuous variables summarized using differences between adjusted predictions at the upper and lower quartiles.

margins(fit0, change='iqr')
## Warning in warn_for_weights(model): 'weights' used in model estimation are
## currently ignored!
## Average marginal effects
## glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"),     data = recs_glm)
##  totsqft regionNE regionS regionW
##   0.1895  0.09445 0.04864  0.1057

For factor variables, “dydx” is the first difference between the predicted response at a particular level of the factor and the predicted response at the baseline level. For numeric variables, the default behavior is to use a numeric derivative at the observed value for each observation. Below we investigate this for our earlier call creating the margins data frame mem0.

dydx_W = 
  predict(fit0, recs_glm %>% mutate(region='W'), type='response') - 
  predict(fit0, recs_glm %>% mutate(region='MW'), type='response')
head(cbind(dydx_W, dydx_regionW = mem0$dydx_regionW))
##       dydx_W dydx_regionW
## 1 0.05708288   0.05708288
## 2 0.11587721   0.11587721
## 3 0.11411806   0.11411806
## 4 0.11519206   0.11519206
## 5 0.09003447   0.09003447
## 6 0.10563376   0.10563376

The overall (average) marginal effect is the mean of these differences.

mean(dydx_W)
## [1] 0.1057027
mem0
## Average marginal effects
## glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"),     data = recs_glm)
##  totsqft regionNE regionS regionW
##  0.01036  0.09445 0.04864  0.1057

In the next code block, we look at the computation of the ‘numeric’ derivatives for continuous variables like totsqft.

# Step size
setstep = function(x){
  max(abs(x), 1, na.rm=TRUE)*sqrt(1e-7)
}
recs_glm = recs_glm %>% mutate(h = sapply(totsqft, setstep))

# Numeric differentiation
dydx_totsqft = 
  {
    predict(fit0, 
            recs_glm %>% mutate(totsqft = totsqft + .5*h),
            type='response'
    ) - 
    predict(fit0,
          recs_glm %>% mutate(totsqft = totsqft - .5*h),
          type='response'
    )
  } / {recs_glm$h}

dydx_compare = 
  cbind("mem0"=mem0$dydx_totsqft, "dydx"=dydx(recs_glm, fit0, "totsqft"),
           "hand"=dydx_totsqft)
head(dydx_compare)
##          mem0 dydx_totsqft        hand
## 1 0.006605842  0.006605842 0.006605841
## 2 0.011678090  0.011678090 0.011678095
## 3 0.011648498  0.011648498 0.011648503
## 4 0.010938220  0.010938220 0.010938224
## 5 0.009864720  0.009864720 0.009864722
## 6 0.009729083  0.009729083 0.009729085

The average marginal effect of totsqft is the average of these values.

apply(dydx_compare, 2, mean)
##         mem0 dydx_totsqft         hand 
##   0.01036292   0.01036292   0.01036293
mem0
## Average marginal effects
## glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"),     data = recs_glm)
##  totsqft regionNE regionS regionW
##  0.01036  0.09445 0.04864  0.1057

We can also use “at” to select representative values at which we wish to see marginal effects. Using our interaction model, we can summarize the marginal effect of region at different levels of totsqft.

mem1 = margins(fit1, at=list(totsqft=q_tot))
## Warning in warn_for_weights(model): 'weights' used in model estimation are
## currently ignored!
mem1
## Average marginal effects at specified values
## glm(formula = es_fridge ~ 0 + region * totsqft, family = binomial(link = "logit"),     data = recs_glm)
##  at(totsqft) totsqft regionNE regionS regionW
##        11.16 0.01192   0.1058 0.04170 0.09304
##        28.32 0.01000   0.0869 0.05403 0.13073

Here are some plots to help us visualize the interactions.

mem1a = margins(fit1)
## Warning in warn_for_weights(model): 'weights' used in model estimation are
## currently ignored!
mem1a %>%
  ggplot(aes(x=totsqft, y=fitted, group=region, color=region)) +
  geom_line() +
  theme_bw()

mem1a %>%
  ggplot(aes(x=totsqft, y=fitted, group=region, color=region)) +
  geom_line() + 
  geom_point(aes(y=es_fridge)) +
  facet_wrap(~region) +
  theme_bw()

Let’s test contrasts for whether the slope for totsqft differs between the S and NE regions and between the S and W regions.

# Create a matrix of contrasts
b = coef(fit1)
contrasts = matrix(0, 2, length(b))
colnames(contrasts) = names(b)

contrasts[1, c('regionNE:totsqft', 'regionS:totsqft')] = c(-1, 1)
contrasts[2, c('regionS:totsqft', 'regionW:totsqft')] = c(-1, 1)

# Estimate the contrasts
est = contrasts %*% matrix(b, ncol=1)

# Get the standard error of the contrasts
se = sqrt( diag(contrasts %*% vcov(fit1) %*% t(contrasts)) )

# Wald test
z = est / se
p = 2*{1-pnorm(abs(z))}
out = round(cbind(est, se, z, p), 3)
colnames(out) = c('est', 'se', 'z', 'p')
out
##        est    se     z     p
## [1,] 0.006 0.006 0.972 0.331
## [2,] 0.010 0.007 1.565 0.118
# Here's a package with a method for the estimation piece
round(doBy::esticon(fit1, contrasts), 3)
##   beta0 Estimate Std.Error X2.value DF Pr(>|X^2|)  Lower Upper
## 1     0    0.006     0.006    0.945  1      0.331 -0.006 0.018
## 2     0    0.010     0.007    2.449  1      0.118 -0.003 0.023
# They use a chi-squared test
round(cbind( z^2, 1 - pchisq(z^2, 1) ), 3)
##       [,1]  [,2]
## [1,] 0.945 0.331
## [2,] 2.449 0.118

Summary

  • Marginal effects at the mean represent differences in adjusted predictions at the mean.

  • Average marginal effects represent differences in average adjusted predictions. This is equivalent to the mean of the marginal effects computed from adjusted predictions for each observation in the data, since averaging and differencing are linear operations (see the quick check after this list).

  • For continuous variables we can compute marginal effects as differences in adjusted predictions at representative values or use (possibly numeric) derivatives in place of differences.
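As a quick check of the equivalence noted above, we can compare differencing-then-averaging with averaging-then-differencing using the response-scale predictions p10 and p30 computed earlier for fit1.

# Sketch: the mean of the per-observation differences equals the difference of
# the average adjusted predictions, because averaging and differencing commute.
all.equal( mean(p30 - p10), mean(p30) - mean(p10) )  # should be TRUE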