Final Project

Overview

Your final project will take the form of a short (~2 page) report on data analyses you design to answer a substantive question of your choosing. You may choose from three options for posing your substantive question.

  1. Pose a question that can be answered using the 2012 US Commercial Building Energy Consumption Survey (CBECS).

  2. Pose a question that can be answered using NHANES.

  3. Pose a question about how a statistical method performs in an atypical situation. Answer this question using a Monte Carlo study.

Questions from options 1 and 2 should be similar in scope to a homework question and should be phrased as inferential questions, as described in problem set 5, question 1. For instance, if you are using NHANES,

Are people in the US more likely to drink water on a weekday than on a weekend day?

is a better question than,

What fraction of people reported drinking water on weekends and weekdays in the 2006-2008 NHANES sample?

Questions posed under option 3 should be specific and relevant to actual statistical practice. Here is an example for option 3:

When comparing the means of two groups with paired data, it is standard to use a paired t-test. Sometimes, however, researchers use a two-sample t-test instead, either by mistake or for some other reason, such as the pairings being unknown. How does this affect the type I error rate and power?
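As an illustration of what a Monte Carlo design for this example might look like, here is a minimal sketch in Python using NumPy and SciPy. It assumes the paired observations are bivariate normal with unit variances, within-pair correlation `rho`, and a true mean difference `effect`. The function name and all parameter values are hypothetical choices for illustration, not a required design. Setting `effect=0` estimates the type I error rate of each test; setting `effect` to a nonzero value estimates power.

```python
import numpy as np
from scipy import stats


def simulate_rejection_rates(n_pairs=20, effect=0.0, rho=0.5,
                             n_reps=5000, alpha=0.05, seed=42):
    """Hypothetical helper: estimate how often the paired and the
    two-sample t-tests reject H0 (equal means) at level alpha.

    Assumption: each replicate draws n_pairs of (x, y) from a bivariate
    normal with unit variances, correlation rho, and mean(y) - mean(x) = effect.
    """
    rng = np.random.default_rng(seed)
    mean = np.array([0.0, effect])
    cov = np.array([[1.0, rho], [rho, 1.0]])

    # Draw all replicates at once: shape (n_reps, n_pairs, 2).
    xy = rng.multivariate_normal(mean, cov, size=(n_reps, n_pairs))
    x, y = xy[..., 0], xy[..., 1]

    # Correct analysis: paired t-test within each replicate (row).
    p_paired = stats.ttest_rel(x, y, axis=1).pvalue
    # Mistaken analysis: pooled two-sample t-test that ignores the pairing.
    p_two_sample = stats.ttest_ind(x, y, axis=1).pvalue

    # Monte Carlo estimate of each test's rejection rate.
    return float(np.mean(p_paired < alpha)), float(np.mean(p_two_sample < alpha))


if __name__ == "__main__":
    for effect in (0.0, 0.5):
        paired, two_sample = simulate_rejection_rates(effect=effect)
        label = "type I error" if effect == 0.0 else "power"
        print(f"effect={effect}: {label} paired={paired:.3f}, "
              f"two-sample={two_sample:.3f}")
```

A full study would repeat this over a grid of sample sizes, correlations, and effect sizes, and report Monte Carlo standard errors alongside the estimated rates. With a positive within-pair correlation, one would expect the two-sample test to reject less often than the paired test under both the null and the alternative, but the size of that gap is exactly what the study is meant to measure.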

Project Proposal

Before beginning your project, submit a proposal to me by email detailing:

  • your substantive question,
  • the specific data sets and variables you will use to answer that question, or an outline of your Monte Carlo design for option 3,
  • the analysis you plan to do,
  • the statistical software you will use (one of the options from this course).

Please submit an initial proposal to me by Tuesday, December 1 at 5pm and aim to have your proposal approved by Friday, December 4 at 5pm. I welcome proposals over the break. I will provide feedback on your proposal to help you design an interesting but feasible question.

Guidelines

Your report should be approximately two pages long if printed and no more than three pages. You should include at least one but no more than three tabular or graphical elements. The report should be between 200 and 600 words long.

Organize your report into the following sections:

  • Introduction: Approximately 2-3 paragraphs explaining what your question is, why it is interesting, and ending with a high level description of the analysis you did (not the results).
  • Data / Methods: Describe your data source and the methods you used. There should be enough detail here that I could repeat your analysis. Focus on what you did, not how you did it. Include a sentence with a link to a GitHub repository containing your code.
  • Results: What did you find? This should be the largest section and is where all of your tabular/graphical elements go.
  • Conclusion / Discussion: What do your results allow us to conclude about the question you posed? What are the strengths and limitations of your analysis?

Timeline

  1. Initial proposal due: Tuesday December 1, 5pm.

  2. Approved proposal by: Friday December 4, 5pm.

  3. Draft due: Monday December 7, 5pm.

  4. Peer review due: Thursday December 10, 5pm.

  5. Final project due: Tuesday December 15, 6pm.

Submit drafts to Canvas as an HTML or PDF document with an embedded link to a Git repository containing your analysis files.

Notes

If you are using the CBECS data, be aware that the replicate weights work slightly differently than those in the RECS data. Refer to this document.