Instructions

Questions

Question 1 [60 points]

This question is a modified version of question 1 from problem set 2. It is worth fewer points because the grouping structure is simplified and you should be able to adapt code for creating figures and tables from the earlier assignment.

Use the 2009 and 2015 Residential Energy Consumption Survey (RECS) data to profile the quantities and types of televisions in US homes, by Census Region.

  a. [30 points] Compare the average number of televisions (TVCOLOR) in US homes in 2009 and 2015 by Census Region.
    1. Compute point estimates and 95% confidence intervals for both years (in SAS) and produce a figure (in R) to display the results.
    2. Compute point estimates and 95% confidence intervals for the 2015 less 2009 differences (in SAS) and produce a figure (in R) to display the results.
    3. Combine the estimates for 2009, 2015, and their difference into a nicely formatted table.
  b. [30 points] Repeat part “a” for the proportion of primary (most-used) televisions by display type (TVTYPE1).

Notes:

  • Remember, to compute the variance of the differences, you should assume the 2009 and 2015 estimates are independent. That is, if \((\hat \theta_1, \hat v_1)\) and \((\hat \theta_2, \hat v_2)\) are the estimates and variances for 2009 and 2015, respectively, then the differences and their variances are \((\hat \theta_2 - \hat \theta_1, \hat v_1 + \hat v_2)\).

  • You may adapt either your own code or the solutions for problem set 2 for the figures and tables. In either case, please provide a clear citation in your write-up, e.g., “A portion of this solution is adapted from (my or Dr. Henderson’s) solution to PS2 Question 1.”

  • Use proc export or a similar procedure to export your results from SAS so they can be read into your write-up document.
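
As an illustration of the variance rule in the first note above, the arithmetic can be sketched in a few lines. This is only a minimal sketch in Python, not the SAS code the question asks for. The function name `diff_ci`, the input numbers, and the normal-approximation multiplier 1.96 are all assumptions made for the example; your own analysis may call for a different critical value.

```python
import math

def diff_ci(theta1, v1, theta2, v2, z=1.96):
    """Return the 2015-less-2009 difference and an approximate 95% CI.

    theta1, v1: 2009 estimate and its variance
    theta2, v2: 2015 estimate and its variance
    z: critical value (1.96 is an assumed normal approximation)
    """
    diff = theta2 - theta1
    # Under independence, the variance of the difference is v1 + v2.
    se = math.sqrt(v1 + v2)
    return diff, diff - z * se, diff + z * se

# Hypothetical inputs for illustration only; these are not RECS estimates.
diff, lower, upper = diff_ci(theta1=2.5, v1=0.0016, theta2=2.3, v2=0.0009)
print(f"difference = {diff:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The same computation applies to each Census Region, and to each display type in the second part, once the per-year estimates and variances are available.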

Question 2 [25 points]

In this question you will use the NHANES dentition and demographics data from PS3.

  a. [10 points] Pick a single tooth (OHXxxTC) and model the probability that a permanent tooth is present as a function of age using logistic regression. For this part (“a”), assume the data are iid and ignore the survey weights. You should consider non-linear transforms of age but only need to document your final model in the write-up. Control for other demographics included in the data as warranted.

  b. [10 points] Refit your model from part “a” using proc surveylogistic to account for the weights. See the notes below for links to example code.

  c. [5 points] In your write-up, provide a side-by-side comparison of the results when using or ignoring the survey weights. This could be either a figure or a table (one will suffice).

Notes: