Problem Set 2

Instructions

Submit the assignment by the due date via Canvas. Assignments may be submitted up to 72 hours late for a 5-point reduction.
All files read, sourced, or referred to within scripts should be assumed to be in the same working directory (./).
Your code should be clearly written and it should be possible to assess it by reading it. Use appropriate variable names and comments. Your style will be graded using the style rubric [15 points].
Some of these exercises may require you to use commands or techniques that were not covered in class or in the course notes. You can use the web as needed to identify appropriate approaches. Part of the purpose of these exercises is for you to learn to be resourceful and self-sufficient. Questions are welcome at all times, but please make an attempt to locate relevant information yourself first.
Please use the provided templates.
This assignment should be done entirely in R, though you may use the Linux shell for data preparation and download documentation. You should use the tidyverse for data manipulation (tidyr and dplyr) and plotting (ggplot2).
Your submission should include a write-up as a PDF or HTML document and all scripts needed to reproduce it. In your document, describe how the submitted files relate to one another and be sure to answer the questions.
For this assignment you should submit: your R scripts (.R), an R Markdown file (.Rmd, or .R with spin) for the write-up, the write-up itself (.pdf or .html), and (optionally) a shell script, ps2_make.sh, to build the assignment.

Questions

Question 1 [85 points]

This is the only question for this problem set. In this question, you will use the 2009 and 2015 Residential Energy Consumption Survey (RECS) data to profile the quantities and types of televisions in US homes.

a. [30 points] Using the 2009 RECS data, estimate the mean or proportion, as appropriate, for the following variables by census division (DIVISION) and urban/rural (UR or UATYP10) status:
1. number of televisions (TVCOLOR),
2. display type for most used television (TVTYPE1).
b. [30 points] Repeat part “a” using the 2015 RECS data.
c. [25 points] For each set of estimates from the prior two parts, estimate the change from 2009 to 2015. To compute the variance of each change, assume the 2009 and 2015 estimates are independent. That is, if \((\hat \theta_1, \hat v_1)\) and \((\hat \theta_2, \hat v_2)\) are the estimates and variances for 2009 and 2015, respectively, then the differences and their variances are: \((\hat \theta_2 - \hat \theta_1, \hat v_1 + \hat v_2)\).
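The difference and its variance under the independence assumption can be illustrated with a minimal sketch. The tibbles `recs09_est` and `recs15_est`, their column names, and all numbers below are hypothetical placeholders, not RECS results.

```r
library(dplyr)

# Hypothetical estimates: one row per group with an estimate and its variance.
recs09_est = tibble(
  division = c("New England", "Pacific"),
  est = c(2.40, 2.50),
  var = c(0.0025, 0.0016)
)
recs15_est = tibble(
  division = c("New England", "Pacific"),
  est = c(2.20, 2.35),
  var = c(0.0036, 0.0020)
)

z = qnorm(0.975)

# Match groups across years, then difference the estimates and add the
# variances, which is valid only if the two estimates are independent.
change = recs09_est %>%
  inner_join(recs15_est, by = "division", suffix = c("_2009", "_2015")) %>%
  mutate(
    diff = est_2015 - est_2009,
    var_diff = var_2009 + var_2015,
    lwr = diff - z * sqrt(var_diff),
    upr = diff + z * sqrt(var_diff)
  )

print(change)
```

Keeping both years in one joined table like this also makes it straightforward to present the 2009 estimate, the 2015 estimate, and the change side by side.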

For full credit, your solution should:

Provide 95% confidence intervals using the replicate weights for all point estimates presented in tables or figures. Estimate the variance \(\hat{v}(\hat{\theta})\) of your estimates as indicated in the documentation and use \(\hat\theta \pm \Phi^{-1}(.975) \sqrt{\hat v(\hat \theta)}\) as your interval.
Present all three parts together in a cohesive fashion. You may choose to organize this either by variable, by estimate (2009 estimate, 2015 estimate, difference), or by census division. Choose an organization that emphasizes what you see as the most interesting findings. Use aesthetics such as color and/or facets to help with organization.
Provide both figures and tables of the final results.
Write your code in a manner that avoids excessive repetition.
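One possible way to meet these requirements with little repetition is a single helper that computes a point estimate, a replicate-weight variance, and the interval for any variable and grouping. The sketch below uses toy data and assumes a Fay-type balanced repeated replication variance, \(\hat v(\hat\theta) = \frac{1}{R(1-\epsilon)^2}\sum_{r=1}^{R}(\hat\theta_r - \hat\theta)^2\), with \(\epsilon = 0.5\); confirm both the formula and the coefficient against the RECS documentation. The function `brr_mean`, the column names `w`, `rw`, and `repl`, and all values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy data: 6 households, a final weight `w`, and 4 replicate weights.
# All names and values are hypothetical, not RECS values.
toy = tibble(
  id = 1:6,
  division = rep(c("A", "B"), each = 3),
  tv = c(1, 2, 3, 2, 2, 4),
  w = c(10, 20, 10, 15, 15, 10),
  rw_1 = c(15, 10, 10, 20, 10, 10),
  rw_2 = c(5, 30, 10, 10, 20, 10),
  rw_3 = c(10, 20, 15, 15, 15, 5),
  rw_4 = c(10, 20, 5, 15, 15, 15)
)

# Long format: one row per household and replicate.
toy_long = toy %>%
  pivot_longer(starts_with("rw_"), names_to = "repl", values_to = "rw")

# Weighted mean of `x` by the grouping variables in `...`, with an assumed
# Fay-type replicate variance and a normal-theory confidence interval.
# Expects columns `w` in `df` and `rw`, `repl` in `df_long`.
brr_mean = function(df, df_long, x, ..., fay = 0.5, level = 0.95) {
  z = qnorm(1 - (1 - level) / 2)

  # Point estimate using the final weight.
  pt = df %>%
    group_by(...) %>%
    summarize(est = sum(w * {{ x }}) / sum(w), .groups = "drop")

  # The same estimate recomputed once per replicate weight.
  rep_est = df_long %>%
    group_by(..., repl) %>%
    summarize(est_r = sum(rw * {{ x }}) / sum(rw), .groups = "drop")

  # Mean squared deviation of replicate estimates, scaled by (1 - fay)^2.
  rep_est %>%
    left_join(pt, by = setdiff(names(pt), "est")) %>%
    group_by(...) %>%
    summarize(
      v = mean((est_r - est)^2) / (1 - fay)^2,
      est = first(est),
      .groups = "drop"
    ) %>%
    mutate(lwr = est - z * sqrt(v), upr = est + z * sqrt(v))
}

tv_by_division = brr_mean(toy, toy_long, tv, division)
print(tv_by_division)
```

The same helper can estimate a proportion by passing a 0/1 indicator column as `x`, for example an indicator for one display type.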

Notes:

The replicate weights for the 2015 data are included in the data file.
The replicate weights for the 2009 data are distributed separately. You can find a link to the weights on the same page as the 2009 data.
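Because the 2009 weights arrive in a separate file, they need to be matched to the survey records before use. A minimal sketch of one approach follows, using small stand-in tibbles rather than the real files. The household identifier `DOEID`, the final weight `NWEIGHT`, and the `brr_weight_` column prefix are assumptions about the file layout; check the actual column names in the downloaded data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for the 2009 data file and the separate weights file.
recs09 = tibble(
  DOEID = c(1, 2, 3),
  DIVISION = c(1, 1, 2),
  TVCOLOR = c(2, 3, 1),
  NWEIGHT = c(100, 150, 120)
)
weights09 = tibble(
  DOEID = c(1, 2, 3),
  brr_weight_1 = c(50, 225, 60),
  brr_weight_2 = c(150, 75, 180)
)

# Reshape to one row per household and replicate.
weights09_long = weights09 %>%
  pivot_longer(
    cols = starts_with("brr_weight_"),
    names_to = "repl",
    names_prefix = "brr_weight_",
    values_to = "rw"
  ) %>%
  mutate(repl = as.integer(repl))

# Attach the replicate weights to the survey records by household id.
recs09_long = recs09 %>%
  left_join(weights09_long, by = "DOEID")

print(recs09_long)
```

Reshaping the 2015 replicate weights into the same long layout would let both years pass through identical estimation code.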

Statistics 506, Fall 2020

Due: Friday October 9, by 7pm
