The group project will be completed in groups of three. Groups have been assigned randomly and will be posted to Canvas. There may be 1-2 groups of four.
Each group will choose a data management, analysis, or visualization technique and produce a tutorial on the selected topic. The tutorial should include:
A short description or explanation of the topic;
A real data set on which to demonstrate the technique;
Three approximate translations of the selected example (one per group member). These translations must use at least two software languages (Stata, R, SAS, etc.), and at least one version must be in R. Up to two versions may be in R, provided they use different packages for the core technique (e.g., one could use base R and another data.table or dplyr).
Topics should be of no greater scope than those at the UCLA page here. Please keep other time demands in mind when deciding the scope of your project. A polished project of limited scope is preferable to a less-polished project on a broader topic.
You can find completed projects from last year’s course here.
Each group should write a short proposal containing:
Each member of the group should assume primary responsibility for an example script and these responsibilities should be included in your proposal.
A group liaison should submit the group’s proposal to me via email with the subject header “Stats 506 Group Project Proposal”.
I plan to post your tutorials to this page – you may include or omit your name from the author information at your own discretion.
Your introduction and overview should be approximately 3-5 paragraphs explaining:
All three examples should follow a common outline to the extent possible. Deviations should be due to limitations or stylistic differences in the languages chosen rather than a lack of coordination among group members. Where deviations do occur, please explain and justify them in the text surrounding the example.
It is permissible for one or more examples to extend beyond the scope of others provided that all three examples share a common core set of tasks.
You may use a language not taught in this course (e.g., Python or Matlab) for one or more examples as long as I approve it in your proposal.
Groups should use git to coordinate their work. Each member of the group should create an account at github.com. One group member should create a public repository for the project with others submitting pull requests to them. Your git repo is considered part of the final submission and should include at minimum:
Excluding extraordinary circumstances, all group members will receive the same grade. However, I reserve the right to modify this policy in cases where one or more group members clearly put in less effort than the others.
Topic Proposals due Friday November 16 at 4pm: Your group must have your proposal approved by me prior to this time. Groups are required to select unique topics so it is to your benefit to submit early.
Draft Due Date: Tuesday November 27 at 4pm. Drafts should be mostly complete and contain a concise to-do list of outstanding items. Submit drafts to Canvas as a zip archive and include a link to the official git repo.
Peer Review: Due Monday December 3 at 5pm. You will be asked to provide constructive feedback to another group. Guidelines for how to structure this feedback will be provided.
Final Due Date: Friday December 7 at 5pm. This is the deadline to submit the final version of your tutorial. Please submit it to Canvas as a zip archive with the HTML and Rmd files and (possibly) scripts or data files included.
Please make edits in response to peer feedback.
I will post group proposals here as they are approved. Two groups may not choose the same or closely related topics. The two topics below are also reserved.
If you are repeating a topic from last year, please be clear how your tutorial will add value to what was done previously.
Reshaping data between long and wide formats
The “split-apply-combine” pattern
Group 1 -
Group 11 -