Group Project

The group project will be completed in groups of three. Groups have been assigned randomly and will be posted to Canvas. There may be 1-2 groups of four.

Each group will choose a data management, analysis, or visualization technique and produce a tutorial on the selected topic. The tutorial should include:

Topics should be of no greater scope than those on the UCLA page here. Please keep other time demands in mind when deciding the scope of your project. A well-polished project of limited scope is preferable to a less-polished project on a broader topic.

You can find completed projects from last year’s course here.

Each group should write a short proposal containing:

Each member of the group should assume primary responsibility for one example script, and these responsibilities should be stated in your proposal.

A group liaison should submit the group’s proposal to me via email with the subject header “Stats 506 Group Project Proposal”.

I plan to post your tutorials to this page – you may include or omit your name from the author information at your own discretion.

Guidelines

Introduction and overview

Your introduction and overview should be approximately 3-5 paragraphs explaining:

  • what your topic is
  • when it is useful and/or why it is important
  • important information about the topic
  • sources for the information you provide and additional resources for learning more about the topic
  • a link to and brief description of the data used in the tutorial
  • the scope of your tutorial
  • the languages used in your tutorial
  • reasons, if any, you could not obtain the same results from all three language examples.

Examples

All three examples should follow a common outline to the extent possible. Deviations should be due to limitations or stylistic differences in the languages chosen rather than a lack of coordination among group members. Where deviations do occur, please explain and justify them in the text surrounding the example.

It is permissible for one or more examples to extend beyond the scope of others provided that all three examples share a common core set of tasks.

You may use a language not taught in this course (e.g. Python, Matlab) for one or more examples as long as I approve it in your proposal.

Git

Groups should use git to coordinate their work. Each member of the group should create an account at github.com. One group member should create a public repository for the project, and the other members should contribute by submitting pull requests to that repository. Your git repo is considered part of the final submission and should include at minimum:

  • source files for your tutorial including all examples
  • Rmd files should be included
  • scripts should be included
  • an html page should be included
  • a readme
  • a minimum of two commits per team member.
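As one way to check the commit requirement above before submitting, the following minimal Python sketch parses text in the format printed by `git shortlog -s -n` and lists the members who fall short. The function names, the sample author names, and the sample counts are all hypothetical and chosen only for illustration; the sketch also assumes each member commits under a single, consistent author name.

```python
def commits_per_author(shortlog_text):
    """Parse `git shortlog -s -n` style text into {author: commit_count}.

    Each non-empty line is assumed to look like "<count><whitespace><author>".
    Lines that do not match that shape are skipped.
    """
    counts = {}
    for line in shortlog_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        author = parts[1]
        counts[author] = counts.get(author, 0) + int(parts[0])
    return counts


def members_below_minimum(counts, members, minimum=2):
    """Return, in sorted order, the members with fewer than `minimum` commits."""
    return sorted(m for m in members if counts.get(m, 0) < minimum)


# Hypothetical sample text standing in for real shortlog output.
sample = "     5\tAlice Example\n     1\tBob Example\n"
team = ["Alice Example", "Bob Example", "Carol Example"]
print(members_below_minimum(commits_per_author(sample), team))
# prints ['Bob Example', 'Carol Example']
```

In practice the input text could come from running `git shortlog -s -n HEAD` inside the group's repository and pasting or piping the result into the first function.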

Excluding extraordinary circumstances, all group members will receive the same grade. However, I reserve the right to modify this policy in cases where one or more group members clearly put in less effort than the others.

Timeline

  1. Topic Proposals due Friday November 16 at 4pm: Your group must have your proposal approved by me prior to this time. Groups are required to select unique topics so it is to your benefit to submit early.

  2. Draft Due Date: Tuesday November 27 at 4pm. Drafts should be mostly complete and contain a concise to-do list of outstanding items. Submit drafts to Canvas as a zip archive and link to the official git repo.

  3. Peer Review: Due Monday December 3 at 5pm. You will be asked to provide constructive feedback to another group. Guidelines for how to structure this feedback will be provided.

  4. Final Due Date: Friday December 7 at 5pm. This is the deadline to submit the final version of your tutorial. Please submit as a zip archive to Canvas with html, Rmd, and (possibly) scripts or data files included.
    Please make edits in response to peer feedback.

Approved Group Proposals

I will post group proposals here as they are approved. Two groups may not choose the same or closely related topics. The two topics below are also reserved.

If you are repeating a topic from last year, please be clear how your tutorial will add value to what was done previously.

Reserved

  1. Reshaping data between long and wide formats

  2. The “split-apply-combine” pattern

Approved

  1. Group 1 -

  2. Group 2 -
    • Topic: Factor Analysis;
    • Data: bfi;
    • Languages: Python (sklearn), R (psych), Stata
    • Tutorial
  3. Group 3 -
    • Topic: Parametric and Non-Parametric ANOVA;
    • Data: Rat Survival;
    • Languages: Matlab, R, Stata
    • Tutorial
  4. Group 4 -
    • Topic: Ordinal Logistic Regression;
    • Data: Soup;
    • Languages: R (ordinal), SAS, Stata
    • Tutorial
  5. Group 5 -
    • Topic: Logistic Regression with Model Diagnostics;
    • Data: Pima;
    • Languages: Python (sklearn, pysal), R, Stata
    • Tutorial
  6. Group 6 -
    • Topic: Truncated negative binomial regression;
    • Data: Abalone;
    • Languages: R, SAS, Stata
    • Tutorial
  7. Group 7 -
    • Topic: Clustering using Finite Mixture Models;
    • Data: Wine;
    • Languages: Python, R, Stata
    • Tutorial
  8. Group 8 -
    • Topic: Fixed effects models;
    • Data: Cigar;
    • Languages: R (plm), SAS (proc reg), Stata (xtreg, areg, reghdfe)
  9. Group 9 -
    • Topic: Cubic splines regression;
    • Data: uswages;
    • Languages: Python (statsmodels), R (splines), Stata (mkspline)
    • Tutorial
  10. Group 10 -
    • Topic: (Divisive) Hierarchical Clustering;
    • Data: black-friday;
    • Languages: Python (sklearn.cluster), R (hclust), Stata (cluster)
    • Tutorial
  11. Group 11 -
    • Topic: Linear Discriminant Analysis (LDA);
    • Data: Seeds;
    • Languages: R, SAS, Stata
    • Tutorial
  12. Group 12 -
    • Topic: Multinomial Logistic Regression (all vs reference);
    • Data: iris;
    • Languages: R (multinom), SAS (proc logistic), Stata (mlogit)
    • Tutorial
  13. Group 13 -
  14. Group 14 -
    • Topic: Probit Regression;
    • Data: Mroz;
    • Languages: Python (statsmodels), R (glm), SAS, Stata
    • Tutorial
  15. Group 15 -
    • Topic: (Agglomerative) Hierarchical Clustering using Average Linkage;
    • Data: USArrests;
    • Languages: Python (sklearn, scipy), R (cluster), Stata (cluster)
    • Tutorial
  16. Group 16 -
    • Topic: Linear Mixed Effects Models;
    • Data: PM2.5;
    • Languages: Python (statsmodels), R (lme4 and glmmADMB)
    • Tutorial
  17. Group 17 -
    • Topic: Multidimensional Scaling;
    • Data: 93cars;
    • Languages: Stata, R (dplyr + cmdscale, data.table + smacof)
    • Tutorial
  18. Group 18 -
  19. Group 19 -
    • Topic: Model Selection and Diagnostics in Linear Regression;
    • Data: Insurance;
    • Languages: Matlab, Python, R
    • Tutorial
  20. Group 20 -
  21. Group 21 -
    • Topic: Monte-Carlo simulation of Portfolio Stock Returns;
    • Data: Stock prices for Apple, Facebook, and Google 11/14/2017 - 11/14/2018;
    • Languages: Matlab, Python, R
    • Tutorial