In this example we will look at methods associated with the glm class of objects, which represent model fits from generalized linear models.
Just as the name implies, GLMs generalize the linear model through the use of a link function relating the expected or mean outcome to a linear predictor.
Generalized linear models have the following form relating a dependent variable \(Y\) to independent variables \(X\) and coefficients \(\beta\):
\[ h(E[Y| X]) = X\beta. \]
Here \(h(\cdot)\) is called the link function and we will often be interested in its inverse \(g(\cdot)\) so that an equivalent specification is:
\[ E[Y | X] = g(X\beta). \] The matrix product \(X\beta\) in a generalized linear model is known as the linear predictor.
Classical linear regression has this form with the identity link \(h(x) = g(x) = x\),
\[ E[Y | X] = X\beta. \] Two popular GLMs are logistic regression, for a binary response,
\[ \log \frac{ P(Y = 1 | X) }{P(Y = 0|X)} = X\beta, \qquad \textrm{or} \qquad E[Y | X] = \frac{e^{X\beta}}{1 + e^{X\beta}}, \]
and Poisson regression, for a count response,
\[ \log E[Y | X] = X\beta, \qquad \textrm{or} \qquad E[Y | X] = e^{X\beta}. \]
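In R, these links and their inverses are stored in the family objects we will pass to glm() below; here is a quick sketch using the built-in accessors (the comments note what each call should return):
# link and inverse link from family objects: ------------------------------------
binomial(link = 'logit')$linkfun(0.5) # h(p) = log{p/(1-p)}, returns 0
binomial(link = 'logit')$linkinv(0)   # g(eta) = e^eta/(1+e^eta), returns 0.5
poisson(link = 'log')$linkinv(1)      # g(eta) = e^eta, returns exp(1)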
Generalized linear models can be fit in R using the glm() function. The first argument is a formula used to specify the linear predictor in terms of one or more columns from a data frame. The other key argument is family, which determines the class of models to be fit by specifying a probability distribution for the response and a link function. We will review the R help for glm, formula, and family together in class.
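You can pull these help pages up yourself with:
# R help pages: -----------------------------------------------------------------
?glm
?formula
?family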
In this first case study we will use logistic regression to explore predictors of whether a home’s primary fridge is Energy Star compliant. For simplicity, we will assume this is a simple random sample and ignore the survey weights. You can find a version of this example in Stata here.
First, we will load the RECS data and clean up some variables of interest.
# 80: --------------------------------------------------------------------------
# load packages: ---------------------------------------------------------------
library(tidyverse)
# read or load data: -----------------------------------------------------------
data_file = '../data/recs2009_public.RData'
if ( !file.exists(data_file) ) {
  recs = read_delim('./recs2009_public.csv', col_names = TRUE, delim = ',')
  save(recs, file = data_file)
} else {
  foo = load(data_file)
  cat(sprintf('Loaded object(s) %s from %s.\n', foo, data_file))
}
## Loaded object(s) recs from ../data/recs2009_public.RData.
# tidy data: -------------------------------------------------------------------
# function to decode region
decode_region = function(rvec){
  sapply(rvec, function(r) switch(r, "NE", "MW", "S", "W"))
}
recs = recs %>%
  mutate(es_fridge = ifelse(ESFRIG < 0, NA, ESFRIG),
         totsqft = TOTSQFT / 100,
         region = decode_region(REGIONC)
  ) %>%
  filter(!is.na(es_fridge))
Now we are ready to fit a logistic regression. Note that R will treat “region” as a factor since it is of class character.
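As a quick check of that claim (a sketch; model.matrix shows the dummy coding glm will build from the character column):
# character predictors are treated as factors: ----------------------------------
class(recs$region)
head(model.matrix(~ 0 + region, data = recs), n = 2)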
fit0 = glm(es_fridge ~ 0 + region + totsqft, data=recs,
family=binomial(link='logit'))
summary(fit0)
##
## Call:
## glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"),
## data = recs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.094 -1.182 0.730 1.020 1.496
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## regionMW -0.791512 0.073039 -10.837 < 2e-16 ***
## regionNE -0.367748 0.071932 -5.112 3.18e-07 ***
## regionS -0.576576 0.057470 -10.033 < 2e-16 ***
## regionW -0.315000 0.061458 -5.125 2.97e-07 ***
## totsqft 0.046881 0.002195 21.355 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10571.9 on 7626 degrees of freedom
## Residual deviance: 9612.5 on 7621 degrees of freedom
## AIC: 9622.5
##
## Number of Fisher Scoring iterations: 4
class(fit0)
## [1] "glm" "lm"
typeof(fit0)
## [1] "list"
Notice that the glm class inherits from the lm class for linear models. Above we used the summary method to review the model fit fit0. Here are some other S3 methods we can make use of:
print(fit0)
##
## Call: glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"),
## data = recs)
##
## Coefficients:
## regionMW regionNE regionS regionW totsqft
## -0.79151 -0.36775 -0.57658 -0.31500 0.04688
##
## Degrees of Freedom: 7626 Total (i.e. Null); 7621 Residual
## Null Deviance: 10570
## Residual Deviance: 9613 AIC: 9623
coef(fit0)
## regionMW regionNE regionS regionW totsqft
## -0.79151228 -0.36774783 -0.57657557 -0.31500023 0.04688135
round(vcov(fit0),4)
## regionMW regionNE regionS regionW totsqft
## regionMW 0.0053 0.0021 0.0021 0.0019 -1e-04
## regionNE 0.0021 0.0052 0.0017 0.0016 -1e-04
## regionS 0.0021 0.0017 0.0033 0.0015 -1e-04
## regionW 0.0019 0.0016 0.0015 0.0038 -1e-04
## totsqft -0.0001 -0.0001 -0.0001 -0.0001 0e+00
broom::tidy(fit0)
## term estimate std.error statistic p.value
## 1 regionMW -0.79151228 0.073039161 -10.836821 2.303375e-27
## 2 regionNE -0.36774783 0.071932358 -5.112412 3.180716e-07
## 3 regionS -0.57657557 0.057470069 -10.032624 1.095660e-23
## 4 regionW -0.31500023 0.061457722 -5.125479 2.967830e-07
## 5 totsqft 0.04688135 0.002195341 21.354929 3.509039e-101
AIC(fit0)
## [1] 9622.523
-2*logLik(fit0) + 2*length(coef(fit0))
## 'log Lik.' 9622.523 (df=5)
BIC(fit0)
## [1] 9657.219
-2*logLik(fit0) + log(nrow(recs))*length(coef(fit0))
## 'log Lik.' 9657.219 (df=5)
head(fitted(fit0))
## 1 2 3 4 5 6
## 0.8302945 0.4699826 0.5391441 0.6291620 0.6989500 0.7060929
head(resid(fit0))
## 1 2 3 4 5 6
## 0.6098767 -1.1268056 1.1115506 0.9626697 -1.5495024 0.8342763
Be careful about inheritance if you try to use the plot method, as plots for linear models may not be the most instructive for GLMs.
methods(plot)[grep('lm', methods(plot))]
## [1] "plot.intervals.lmList" "plot.lm" "plot.lme"
## [4] "plot.lmList" "plot.mlm" "plot.ranef.lme"
## [7] "plot.ranef.lmList" "plot.simulate.lme"
Another useful generic is predict, for using the fitted model to compute expected values for new data. Below, we use predict to compute “adjusted predictions at the mean” for the region and square footage variables. Adjusted predictions are simply predictions from the model at specially chosen values of the independent variables. Here we compute adjusted predictions (aka “predictive margins”) for each region with total square feet held at its mean.
# Set up a new data frame for predictions: ------------------------------------
recs_regions_atmean =
  tibble(region = unique(recs$region)) %>%
  mutate(totsqft = mean(recs$totsqft))
## Use predict to estimate probabilities for the new data
m0 = predict(fit0, recs_regions_atmean, type='response', se=TRUE)
## We can include these in the data frame and plot to compare
recs_regions_atmean %>%
  mutate(fit = m0$fit, se = m0$se.fit, lwr = fit - 2*se, upr = fit + 2*se) %>%
  mutate(region = factor(region, region[order(fit)])) %>% # order
  ggplot(aes(y = region, x = fit)) +
  geom_point() +
  geom_errorbarh(aes(xmin = lwr, xmax = upr), height = .1) +
  geom_vline(aes(xintercept = fit), lty = 'dashed', color = 'grey', alpha = .5) +
  xlab('Predicted probability for an average-sized house') +
  ggtitle('Estimated usage of Energy Star compliant refrigerators by region.') +
  theme_bw() +
  xlim(c(0, 1))
What if we wanted to examine the effect of totsqft averaged over regions?
# Compute proportions of homes in each region
df_regions =
  recs %>%
  group_by(region) %>%
  summarize(Count = n()) %>%
  spread(region, Count) %>%
  mutate(Total = MW + NE + S + W,
         MW = MW / Total,
         NE = NE / Total,
         S  = S / Total,
         W  = W / Total
  ) %>%
  gather(region, prop, MW:W)
p_regions = df_regions$prop
names(p_regions) = df_regions$region
p_regions
## MW NE S W
## 0.2128245 0.1835825 0.3554944 0.2480986
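The same proportions could be computed more directly in base R:
# equivalent base R computation: ------------------------------------------------
p_regions_alt = prop.table(table(recs$region))
p_regions_alt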
# Pick representative values: -------------------------------------------------
with(recs, quantile(totsqft, c(.25, .5, .75) ))
## 25% 50% 75%
## 11.160 18.735 28.320
at_totsqft = c(10, 20, 30)
# Compute linear predictors
lp_avg_region = sum(coef(fit0)[-5] * p_regions)
lp = lp_avg_region + coef(fit0)[5]*at_totsqft
# Use inverse link to compute response
pred = exp(lp)/{1 + exp(lp)}
ap_totsqft = tibble(totsqft = at_totsqft, adjusted_prediction = pred)
ap_totsqft
## # A tibble: 3 x 2
## totsqft adjusted_prediction
## <dbl> <dbl>
## 1 10 0.4874346
## 2 20 0.6031341
## 3 30 0.7083440
We could also do this using predict.
## Using predict
new_data = recs %>% select(region, totsqft)
p10 = predict(fit0, new_data %>% mutate(totsqft=10), type='link')
p20 = predict(fit0, new_data %>% mutate(totsqft=20), type='link')
p30 = predict(fit0, new_data %>% mutate(totsqft=30), type='link')
## For explaining what is happening here ...
new_data %>% mutate(totsqft=10, p10 = p10) %>% head()
## # A tibble: 6 x 3
## region totsqft p10
## <chr> <dbl> <dbl>
## 1 MW 10 -0.3226988
## 2 NE 10 0.1010657
## 3 MW 10 -0.3226988
## 4 NE 10 0.1010657
## 5 MW 10 -0.3226988
## 6 NE 10 0.1010657
# Note the order of operations: average the linear predictors first,
# then apply the inverse link.
pred = sapply(list(p10, p20, p30), function(x) exp(mean(x)) / {1 + exp(mean(x))})
# Built in inverse logit is CDF of logistic distribution
pred_alt = sapply(list(p10, p20, p30), function(x) plogis(mean(x)))
ap_totsqft %>% mutate(adj_pred = pred, adj_pred_alt = pred_alt)
## # A tibble: 3 x 4
## totsqft adjusted_prediction adj_pred adj_pred_alt
## <dbl> <dbl> <dbl> <dbl>
## 1 10 0.4874346 0.4874346 0.4874346
## 2 20 0.6031341 0.6031341 0.6031341
## 3 30 0.7083440 0.7083440 0.7083440
In the above example, we computed predictions for each observation in the data set with totsqft fixed at chosen values, and averaged the linear predictors before transforming to the probability scale using the inverse link function. These are examples of adjusted predictions at the means.
An alternative to adjusted predictions at the means is average adjusted predictions, computed by averaging adjusted predictions for each observation in the data set. In the example below, we average after forming adjusted predictions on the probability scale, i.e., the scale of the response. In this case the results are rather similar because the logistic function is nearly linear in the center.
# Average adjusted predictions: -----------------------------------------------
new_data = recs %>% select(region, totsqft)
p10 = predict(fit0, new_data %>% mutate(totsqft=10), type='response')
p20 = predict(fit0, new_data %>% mutate(totsqft=20), type='response')
p30 = predict(fit0, new_data %>% mutate(totsqft=30), type='response')
ap_totsqft = ap_totsqft %>%
  mutate(
    avg_response = sapply(list(p10, p20, p30), mean),
    diff = avg_response - adjusted_prediction
  )
ap_totsqft
## # A tibble: 3 x 4
## totsqft adjusted_prediction avg_response diff
## <dbl> <dbl> <dbl> <dbl>
## 1 10 0.4874346 0.4875659 0.0001313622
## 2 20 0.6031341 0.6023908 -0.0007433375
## 3 30 0.7083440 0.7070102 -0.0013337676
We can visualize the reasons for this by plotting the observed and fitted values of the response against the linear predictor.
# S3 method to extract linear predictors: -------------------------------------
lin_pred = function(obj){
  UseMethod("lin_pred")
}
lin_pred.glm = function(obj){
  obj$linear.predictors
}
## See also
#broom::augment_columns()
recs %>%
  select(es_fridge, region, totsqft) %>%
  mutate(lin_pred = lin_pred(fit0),
         fitted = fitted(fit0)
  ) %>%
  ggplot(aes(y = es_fridge, x = lin_pred, col = es_fridge)) +
  geom_point() +
  geom_line(aes(y = fitted), col = 'red', lwd = 2) +
  geom_abline(aes(intercept = .5, slope = .25), col = 'lightgrey') +
  theme_bw()
Suppose we fit a model with an interaction between region and home size.
fit1 = glm(es_fridge ~ 0 + region*totsqft,
           data = recs, family = binomial(link = 'logit')
)
summary(fit1)
##
## Call:
## glm(formula = es_fridge ~ 0 + region * totsqft, family = binomial(link = "logit"),
## data = recs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.383 -1.177 0.729 1.022 1.460
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## regionMW -0.707612 0.108817 -6.503 7.89e-11 ***
## regionNE -0.263363 0.109073 -2.415 0.0158 *
## regionS -0.586226 0.081669 -7.178 7.07e-13 ***
## regionW -0.494804 0.100740 -4.912 9.03e-07 ***
## totsqft 0.043247 0.004113 10.515 < 2e-16 ***
## regionNE:totsqft -0.001749 0.006265 -0.279 0.7801
## regionS:totsqft 0.004152 0.005608 0.740 0.4591
## regionW:totsqft 0.014390 0.006722 2.141 0.0323 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10571.9 on 7626 degrees of freedom
## Residual deviance: 9606.2 on 7618 degrees of freedom
## AIC: 9622.2
##
## Number of Fisher Scoring iterations: 4
When computing adjusted predictions at specific values of totsqft, it would not make sense to average over the interaction terms, which also involve totsqft. Instead, we should ensure that totsqft enters the model as expected for all terms involving it. Using predict will take care of this for us.
p10 = predict(fit1, new_data %>% mutate(totsqft=10), type='response')
p20 = predict(fit1, new_data %>% mutate(totsqft=20), type='response')
p30 = predict(fit1, new_data %>% mutate(totsqft=30), type='response')
ap_totsqft = ap_totsqft %>%
  mutate(avg_response = sapply(list(p10, p20, p30), mean))
ap_totsqft
## # A tibble: 3 x 4
## totsqft adjusted_prediction avg_response diff
## <dbl> <dbl> <dbl> <dbl>
## 1 10 0.4874346 0.4874897 0.0001313622
## 2 20 0.6031341 0.6048856 -0.0007433375
## 3 30 0.7083440 0.7106078 -0.0013337676
We could also do this directly.
# model coefficients
beta_hat = matrix(coef(fit1), ncol=1)
# Data matrices for each level of totsqft
X10 = recs %>%
  transmute(
    MW = 1*{region == 'MW'},
    NE = 1*{region == 'NE'},
    S  = 1*{region == 'S'},
    W  = 1*{region == 'W'},
    tot = 10,
    tot_NE = tot*NE,
    tot_S  = tot*S,
    tot_W  = tot*W
  )
X10 = as.matrix(X10)
X20 = cbind(X10[,1:4], 2*X10[,5:8])
X30 = cbind(X10[,1:4], 3*X10[,5:8])
# inverse link
inv_logit = function(x) exp(x) / {1+exp(x)}
# average adjusted predictions
ap_totsqft %>%
  mutate(
    by_hand = c( mean(inv_logit(X10 %*% beta_hat)),
                 mean(inv_logit(X20 %*% beta_hat)),
                 mean(inv_logit(X30 %*% beta_hat))
    )
  )
## # A tibble: 3 x 5
## totsqft adjusted_prediction avg_response diff by_hand
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 10 0.4874346 0.4874897 0.0001313622 0.4874897
## 2 20 0.6031341 0.6048856 -0.0007433375 0.6048856
## 3 30 0.7083440 0.7106078 -0.0013337676 0.7106078
The prediction package
The prediction package provides an S3 generic prediction and methods for computing predictive margins from various classes representing fitted models. It is “type safe” in that it always returns an object of class prediction, which is essentially a copy of the original data frame with the requested fitted values plus a few additional attributes.
Below we use prediction::prediction to compute adjusted predictions at the mean for our fit0 object.
# Prediction package: ----------------------------------------------------------
library(prediction)
lp_fit0 = prediction(fit0, at=list(totsqft=c(10, 20, 30)), type='link')
class(lp_fit0)
## [1] "prediction" "data.frame"
names(attributes(lp_fit0))
## [1] "names" "row.names" "class" "at" "model.class"
## [6] "type"
Because we requested predictions at three values of totsqft, our data frame is three times as large.
cat(' dim(lp_fit0): ', dim(lp_fit0), '\n', 'dim(recs): ', dim(recs), '\n',
    'nrow(lp_fit0)/nrow(recs): ', nrow(lp_fit0)/nrow(recs), '\n')
## dim(lp_fit0): 22878 945
## dim(recs): 7626 943
## nrow(lp_fit0)/nrow(recs): 3
You should keep this in mind when working with large data sets, as this could quickly eat up memory unnecessarily. This is also true of model objects returned by glm itself.
print(object.size(recs), units='Mb')
## 29.5 Mb
print(object.size(lp_fit0), units='Mb')
## 88 Mb
object.size(lp_fit0) / object.size(recs)
## 3 bytes
print(object.size(fit0), units='Mb')
## 33.4 Mb
One way to minimize the impact of this is to fit models using data frames limited to necessary variables.
recs_glm = recs %>% select(es_fridge, region, totsqft)
fit0 = glm(es_fridge~0+region+totsqft, family=binomial(link='logit'), data=recs_glm)
fit1 = glm(es_fridge~0+region*totsqft, family=binomial(link='logit'), data=recs_glm)
print(object.size(recs_glm), units='Mb')
## 0.1 Mb
print(object.size(fit0), units='Mb')
## 4.1 Mb
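Another option, if you do not need the stored model frame for downstream methods, is to ask glm not to keep it. Here is a sketch using glm's model and y arguments (fit0_light is a hypothetical name; note that some downstream methods expect these components, so this is a trade-off):
# avoid storing the model frame in the fit: --------------------------------------
fit0_light = glm(es_fridge ~ 0 + region + totsqft, family = binomial(link = 'logit'),
                 data = recs_glm, model = FALSE, y = FALSE)
print(object.size(fit0_light), units = 'Mb')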
Returning to the theme of computing adjusted predictions, here are average adjusted predictions using type = 'response' (the default).
m_totsqft = prediction(fit0, at=list(totsqft=c(10, 20, 30)), type='response')
m_totsqft
## Average predictions for 7626 observations:
## at(totsqft) value
## 10 0.4876
## 20 0.6024
## 30 0.7070
Question: Why doesn’t m_totsqft print the entire data frame in the last line above?
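Hint: the answer lies in S3 dispatch; a sketch of where to look:
# the prediction class supplies its own print method: ----------------------------
class(m_totsqft)
getS3method('print', 'prediction')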
Adjusted predictions at the mean are computed by fixing the values of some variables at representative values and setting the others to their mean values. Average adjusted predictions are computed by forming adjusted predictions for each observation in the data, leaving the other values unchanged, and then averaging the resulting predictions. In a linear model these are the same.
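Here is a minimal sketch of that last claim, using a linear probability model with the same variables (fit_lm is a hypothetical fit introduced only for this check); because an lm prediction is linear in the inputs, the prediction at the mean equals the mean of the predictions:
# in a linear model, APM and AAP agree: ------------------------------------------
fit_lm = lm(es_fridge ~ 0 + region + totsqft, data = recs_glm)
# adjusted prediction at the mean, for the MW region
apm = predict(fit_lm, tibble(region = 'MW', totsqft = mean(recs_glm$totsqft)))
# average adjusted prediction, for the MW region
aap = mean(predict(fit_lm, recs_glm %>% mutate(region = 'MW')))
all.equal(unname(apm), aap)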
Marginal effects (aka partial effects) summarize the impact of a model coefficient using differences between adjusted predictions. For factor variables this means comparing each factor level to a reference level. For continuous variables we may define marginal effects between reference levels of interest, often based on quantiles, or using a “derivative”.
As you might expect, marginal effects at the mean (MEM) are differences between adjusted predictions at the mean. Below are some examples.
First, we examine simple differences of adjusted predictions at the mean for different regions.
# New data for prediction
new_data =
  tibble(region = unique(recs_glm$region),
         totsqft = mean(recs_glm$totsqft)
  )
# Predictions
new_data =
  new_data %>%
  mutate(fitted = predict(fit0, new_data, type = 'response'))
# Compute differences
new_data %>% spread(region, fitted) %>%
  mutate(mem_NE = NE - MW,
         mem_S = S - MW,
         mem_W = W - MW
  ) %>%
  knitr::kable(digits = 2)
totsqft | MW | NE | S | W | mem_NE | mem_S | mem_W |
---|---|---|---|---|---|---|---|
21.91 | 0.56 | 0.66 | 0.61 | 0.67 | 0.1 | 0.05 | 0.11 |
Next, we compute marginal effects at the mean for totsqft.
# MEM for totsqft quantiles
q_tot = quantile(recs_glm$totsqft, c(.25, .75))
# Adjusted Predictions at "mean" region
lp_fit0 = c( sum(coef(fit0)*c(p_regions, q_tot[1])),
             sum(coef(fit0)*c(p_regions, q_tot[2]))
)
apm = exp(lp_fit0) / {1 + exp(lp_fit0)}
apm
## [1] 0.5010275 0.6918098
# Marginal Effect
diff(apm)
## [1] 0.1907823
From the above, for an “average” region a house at the upper quartile (~2,800 sq ft) is about 19 percentage points more likely to have an Energy Star compliant refrigerator than a house at the lower quartile (~1,100 sq ft).
Similarly, average marginal effects (AME) are differences between average adjusted predictions. Below we first compute predictions for each region at the chosen levels of totsqft.
# AME for totsqft quantiles
q_tot = quantile(recs_glm$totsqft, c(.25, .75))
# New data for prediction
new_data =
  tibble(totsqft = rep(q_tot, each = 4),
         region = rep(names(p_regions), 2),
         w_regions = rep(p_regions, 2)
  )
new_data = new_data %>% mutate(fitted = predict(fit0, new_data, type='response'))
new_data
## # A tibble: 8 x 4
## totsqft region w_regions fitted
## <dbl> <chr> <dbl> <dbl>
## 1 11.16 MW 0.2128245 0.4333205
## 2 11.16 NE 0.1835825 0.5387839
## 3 11.16 S 0.3554944 0.4866582
## 4 11.16 W 0.2480986 0.5518617
## 5 28.32 MW 0.2128245 0.6309204
## 6 28.32 NE 0.1835825 0.7231082
## 7 28.32 S 0.3554944 0.6794193
## 8 28.32 W 0.2480986 0.7335442
Now, we can compute a weighted average of the marginal effects.
aap = new_data %>% group_by(totsqft) %>% summarize(ame = sum(w_regions*fitted))
with(aap, ame[2] - ame[1])
## [1] 0.1894935
This could also be done using predict. Note that here we make predictions for each row of the data with “totsqft” replaced by our representative values.
aap_upr =
  mean(predict(fit0, recs_glm %>% mutate(totsqft = q_tot[2]), type = 'response'))
aap_lwr =
  mean(predict(fit0, recs_glm %>% mutate(totsqft = q_tot[1]), type = 'response'))
c('Lower Q'=aap_lwr, 'Upper Q'=aap_upr, 'AME' = aap_upr - aap_lwr)
## Lower Q Upper Q AME
## 0.5010529 0.6905464 0.1894935
The margins package
The author of the prediction package also has a margins package meant to mimic the behavior of Stata’s margins command. As you might expect, it is built on the prediction package, meaning that functions in margins call functions in prediction to compute adjusted predictions.
# Understanding margins: -------------------------------------------------------
library(margins)
mem0 = margins(fit0)
summary(mem0)
## factor AME SE z p lower upper
## regionNE 0.0944 0.0174 5.4193 0.0000 0.0603 0.1286
## regionS 0.0486 0.0152 3.2057 0.0013 0.0189 0.0784
## regionW 0.1057 0.0161 6.5725 0.0000 0.0742 0.1372
## totsqft 0.0104 0.0004 23.9824 0.0000 0.0095 0.0112
# Plotting margins
summary(mem0) %>%
  ggplot(aes(x = factor, y = AME)) +
  geom_point() +
  geom_errorbar(aes(ymax = upper, ymin = lower), width = 0) +
  theme_bw() +
  ylab("Average Marginal Effect")
# Something's wrong with the method: -------------------------------------------
plot(mem0)
## If time permits, we'll debug this together: --------------------------------
#getS3method('plot', 'margins')
#getS3method('summary', 'margins')
What happens when we call margins? Below is a call graph for functions in the margins package.
mvbutils::foodweb(where=asNamespace("margins"))
We can call margins directly using the “change” parameter to specify we want continuous variables summarized using differences between adjusted predictions at the upper and lower quartiles.
margins(fit0, change='iqr')
## Warning in warn_for_weights(model): 'weights' used in model estimation are
## currently ignored!
## Average marginal effects
## glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"), data = recs_glm)
## totsqft regionNE regionS regionW
## 0.1895 0.09445 0.04864 0.1057
For factor variables, “dydx” is the first difference between the predicted response at a particular level of the factor and the predicted response at the baseline level. For numeric variables, the default behavior is to use a numeric derivative at the observed value for each observation. Below we investigate this for our earlier call creating the margins data frame mem0.
dydx_W =
  predict(fit0, recs_glm %>% mutate(region = 'W'), type = 'response') -
  predict(fit0, recs_glm %>% mutate(region = 'MW'), type = 'response')
head(cbind(dydx_W, dydx_regionW = mem0$dydx_regionW))
## dydx_W dydx_regionW
## 1 0.05708288 0.05708288
## 2 0.11587721 0.11587721
## 3 0.11411806 0.11411806
## 4 0.11519206 0.11519206
## 5 0.09003447 0.09003447
## 6 0.10563376 0.10563376
The overall marginal effect is the average adjusted difference.
mean(dydx_W)
## [1] 0.1057027
mem0
## Average marginal effects
## glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"), data = recs_glm)
## totsqft regionNE regionS regionW
## 0.01036 0.09445 0.04864 0.1057
In the next code block, we look at the computation of the “numeric” derivatives for continuous variables like totsqft.
# Step size: proportional to |x| but bounded away from zero
setstep = function(x){
  max(abs(x), 1, na.rm = TRUE) * sqrt(1e-7)
}
recs_glm = recs_glm %>% mutate(h = sapply(totsqft, setstep))
# Numeric differentiation (a central difference with step h)
dydx_totsqft =
  {
    predict(fit0,
            recs_glm %>% mutate(totsqft = totsqft + .5*h),
            type = 'response'
    ) -
    predict(fit0,
            recs_glm %>% mutate(totsqft = totsqft - .5*h),
            type = 'response'
    )
  } / {recs_glm$h}
dydx_compare =
  cbind("mem0" = mem0$dydx_totsqft, "dydx" = dydx(recs_glm, fit0, "totsqft"),
        "hand" = dydx_totsqft)
head(dydx_compare)
## mem0 dydx_totsqft hand
## 1 0.006605842 0.006605842 0.006605841
## 2 0.011678090 0.011678090 0.011678095
## 3 0.011648498 0.011648498 0.011648503
## 4 0.010938220 0.010938220 0.010938224
## 5 0.009864720 0.009864720 0.009864722
## 6 0.009729083 0.009729083 0.009729085
The average marginal effect of totsqft is the average of these values.
apply(dydx_compare, 2, mean)
## mem0 dydx_totsqft hand
## 0.01036292 0.01036292 0.01036293
mem0
## Average marginal effects
## glm(formula = es_fridge ~ 0 + region + totsqft, family = binomial(link = "logit"), data = recs_glm)
## totsqft regionNE regionS regionW
## 0.01036 0.09445 0.04864 0.1057
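For the logit link we can also check this against the analytic derivative: since \(E[Y|X] = g(X\beta)\) with \(g\) the inverse logit, the derivative with respect to totsqft is \(\beta_{totsqft} \, p(1-p)\). Averaging this over the data should closely match the numeric AME (a sketch):
# analytic derivative check for the logit link: ----------------------------------
p_hat = fitted(fit0)
mean( coef(fit0)['totsqft'] * p_hat * (1 - p_hat) )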
We can also use “at” to select representative values at which we wish to see marginal effects. Using our interaction model, we can summarize the marginal effect of region at different levels of totsqft.
mem1 = margins(fit1, at=list(totsqft=q_tot))
## Warning in warn_for_weights(model): 'weights' used in model estimation are
## currently ignored!
mem1
## Average marginal effects at specified values
## glm(formula = es_fridge ~ 0 + region * totsqft, family = binomial(link = "logit"), data = recs_glm)
## at(totsqft) totsqft regionNE regionS regionW
## 11.16 0.01192 0.1058 0.04170 0.09304
## 28.32 0.01000 0.0869 0.05403 0.13073
Here are some plots to help us visualize the interactions.
mem1a = margins(fit1)
## Warning in warn_for_weights(model): 'weights' used in model estimation are
## currently ignored!
mem1a %>%
  ggplot(aes(x = totsqft, y = fitted, group = region, color = region)) +
  geom_line() +
  theme_bw()
mem1a %>%
  ggplot(aes(x = totsqft, y = fitted, group = region, color = region)) +
  geom_line() +
  geom_point(aes(y = es_fridge)) +
  facet_wrap(~region) +
  theme_bw()
Let’s test contrasts for whether the slope for totsqft differs between the S and NE regions and between the S and W regions.
# Create a matrix of contrasts
b = coef(fit1)
contrasts = matrix(0, 2, length(b))
colnames(contrasts) = names(b)
contrasts[1, c('regionNE:totsqft', 'regionS:totsqft')] = c(-1, 1)
contrasts[2, c('regionS:totsqft', 'regionW:totsqft')] = c(-1, 1)
# Estimate the contrasts
est = contrasts %*% matrix(b, ncol=1)
# Get the standard error of the contrasts
se = sqrt( diag(contrasts %*% vcov(fit1) %*% t(contrasts)) )
# Wald test
z = est / se
p = 2*{1-pnorm(abs(z))}
out = round(cbind(est, se, z, p), 3)
colnames(out) = c('est', 'se', 'z', 'p')
out
## est se z p
## [1,] 0.006 0.006 0.972 0.331
## [2,] 0.010 0.007 1.565 0.118
# Here's a package with a method for the estimation piece
round(doBy::esticon(fit1, contrasts), 3)
## beta0 Estimate Std.Error X2.value DF Pr(>|X^2|) Lower Upper
## 1 0 0.006 0.006 0.945 1 0.331 -0.006 0.018
## 2 0 0.010 0.007 2.449 1 0.118 -0.003 0.023
# They use a chi-squared test
round(cbind( z^2, 1 - pchisq(z^2, 1) ), 3)
## [,1] [,2]
## [1,] 0.945 0.331
## [2,] 2.449 0.118
Marginal effects at the mean represent differences in adjusted predictions at the mean.
Average marginal effects represent differences in average adjusted predictions. This is equivalent to the mean of the marginal effects computed from adjusted predictions for each observation in the data, since averaging and differencing are linear operations.
For continuous variables, we can compute marginal effects as differences in adjusted predictions at representative values, or use (possibly numeric) derivatives in place of differences.