Instructions

Questions

Question 1 [25 points]

In this question, you will construct a demographics balance table similar to the one constructed in the week 6 class activity. Specifically, you should construct a balance table comparing demographics for those who are and are not missing the oral health examination in the NHANES data prepared in problem set 1. The data are available in the Stats506_F20 repository under problem_sets/data/. You will need the following two files:

  • nhanes_demo.csv
  • nhanes_ohxden.csv

Documentation for the variables included can be found at:

Organize your table to show the number and percent of participants with a complete dentition examination for each of the demographics identified below. Separate the presentation into those 20 or older and those younger than 20. As in the activity, you can exclude those who did not participate in any of the medical examinations (MEC) as identified by the RIDSTATR variable. For all others, consider the dentition examination non-missing only when OHDDESTS from the oral health examination data is non-missing and takes the value “complete”.

Your table or tables should compare balance for the demographics below. Use a chi-squared test to assess whether there are marginal associations between categorical variables and whether the dentition exam is complete. Use t-tests to compare continuous variables (e.g. age) and present such variables as mean (IQR) rather than n (%).

For those under 20 years old:

  • age (present as mean (IQR), use a t-test to compare)
  • gender
  • race/ethnicity from RIDRETH3

For those 20 or older:

  • age (present as mean (IQR), use a t-test to compare)
  • gender
  • race/ethnicity from RIDRETH3
  • college - with two levels, ‘some college/college graduate’ or ‘No college/Unknown’ where the latter category includes “Don’t Know” and all levels less than ‘some college’.

Question 2 [20 points]

Use logistic regression to model the probability that an NHANES participant has a complete dentition exam conditional on the demographics included in the table above. You may wish to use (e.g. quadratic) functions of age and to consider interactions. Use AIC for model selection. Present your final model as a regression table. In a separate table, report the AIC for each model considered.
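
For reference, a sketch of the quantities involved, assuming the standard definition of AIC. The symbols \(Y\), \(x\), \(\beta\), \(\gamma\), \(k\), and \(\hat{L}\) are introduced here for illustration only, and the quadratic age term is just one of the candidate forms the question mentions, not a required specification:

\[
\log \frac{P(Y = 1 \mid \text{age}, x)}{1 - P(Y = 1 \mid \text{age}, x)} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + x^\top \gamma,
\qquad
\text{AIC} = 2k - 2 \log \hat{L}
\]

Here \(Y = 1\) indicates a complete dentition exam, \(x\) collects the remaining demographic indicators, \(k\) is the number of estimated parameters, and \(\hat{L}\) is the maximized likelihood. Among the models considered, the one with the smallest AIC is preferred.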

Question 3 [40 points]

In this question you will compute descriptive statistics to investigate the relationship between a tooth’s status as captured by the ‘tooth count’ variables OHXxxTC and age. The goal is to quantify how teeth progress as we age.

First, divide participants into regularly spaced age groups of six consecutive years, e.g. \([0, 6), [6, 12), [12, 18), \cdots\). Then, separately for each (numbered) tooth, compute proportions for each level of the tooth count variable by age group. Although there are survey weights present, you can treat the data as iid for this analysis and compute confidence intervals for each proportion using the standard error \(\sqrt{\hat{p}(1 - \hat{p})/n}\). Your Stata code for this question should output a .csv or .xlsx file with the estimated proportions and standard errors for each tooth and age group.
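
As a worked illustration of the grouping and the interval formula above: the numbers below are invented for the example and are not taken from the NHANES data, and the 1.96 multiplier assumes a 95% normal-approximation interval, which is one common choice rather than a stated requirement. A participant aged 13 falls in the group starting at \(6 \lfloor 13/6 \rfloor = 12\), i.e. \([12, 18)\). If that group had \(n = 400\) participants and \(\hat{p} = 0.20\) for one level of a given tooth's status, then

\[
\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.20 \times 0.80}{400}} = 0.02,
\qquad
\hat{p} \pm 1.96 \times 0.02 = (0.1608,\ 0.2392).
\]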

Read this file into R and construct a figure or figures presenting the results.

Notes:

  • Elements of style we will emphasize in your Stata code include:

    • headers
    • line length
    • use of comments
    • use of macros, globs (*), or built-in Stata commands to avoid excessive repetition (e.g. repeating the same block of code for each of the 32 teeth)
  • For question 1, you should use some combination of putexcel and/or export delimited to output your results in Stata to be read into your write-up document.

  • Consider using the user-written command outreg2 or the built-in putexcel to transfer results for question 2 to your Rmd report in a reproducible way. In a pinch, a screenshot will suffice.