The course project is worth 20% of the course grade. It will be graded out of 200 points evenly split between group and individual components. Read more about each part below.
The open cities data for this question can be found here.
In this component, you should pose two clearly-defined research questions relating to the open city data sets above. You will then answer each question yourself, using one or more data sets to support your arguments.
You must use at least three data sets in your solutions. You may either pose two questions related to the same group of three or more of these data sets, or pose one question based on a single data set and then a second utilizing a group of two or more data sources.
Your questions should be limited in their overall scope, similar to a single question from one of the first three problem sets. Your answer to each question - including any text, tables, and graphs - should be a single typed page, submitted as a PDF via Canvas. You should also submit the code you write to answer your questions. Your code can use any of the software packages we have learned this semester. Code should follow the style guidelines and be reasonably concise and efficient. Submit code as plain text (with a .txt extension) with the software used clearly labeled in the header.
There are three due dates for this component:
Project Proposals Due: Tuesday November 21 at 2pm. You must have the questions you pose approved by me. Your proposal should include two well-posed questions, each followed by a brief (1-3 sentence) description of the data sets, approach, and software you will use to answer it. Your proposal should be sent to me in the body of an email with subject header: “Stats 506 Individual Project Proposal”. I reserve the right to deny proposed questions that are too similar to those already submitted; it is to your benefit to submit your proposal in advance of the deadline.
Draft Due: Tuesday December 12 at 9am via Canvas. We will engage in a peer-review process in class this day. Please bring three printed versions of your submissions with you. You will have an opportunity to revise your submission based on the feedback you receive.
If your draft is incomplete in any way, please include placeholders or an outline with a “to do” list.
The group component will be completed in groups of three. Groups have been assigned randomly and will be posted to Canvas.
Each group will choose a data management, analysis, or visualization technique and produce a tutorial on the selected topic. The tutorial should include:
A short description or explanation of the topic;
A real data set on which to demonstrate the technique;
Three approximate translations of the selected example. These translations must use at least two software languages (Stata, R, SAS, etc.). At least one version must be in R. Up to two versions may be in R, provided they make use of different packages (e.g. one could use base R and another data.table or dplyr).
Topics should be of similar scope to the UCLA page here.
Each group should write a short proposal containing: the topic for the tutorial and the languages/packages they will use for the examples. Each member of the group should assume primary responsibility for an example script and these responsibilities should be included in your proposal.
The group liaison should submit the group’s proposal to me via email with the subject header “Stats 506 Group Project Proposal”.
The final tutorial should also be submitted to me via email as a standalone HTML page, such as one generated by R Markdown.
I plan to post your submissions to this page – you may include or omit your name from the author information at your own discretion.
Topic Proposals due November 22 at 4pm: Your group must have your proposal approved by me prior to this time. Groups are required to select unique topics so it is again to your benefit to submit early.
Draft Due Date: Friday December 8 at 4pm. Drafts should be mostly complete and contain a concise to-do list of outstanding items.
Final Due Date: Wednesday December 20 at 7am. This is the deadline to submit the final version of your tutorial. Please submit as a webpage to me via email.
I will post group proposals here as they are approved. Two groups may not choose the same or related topics. The first two topics below are also reserved:
Reshaping Data between long and wide formats
The “split-apply-combine” pattern
… (MASS::rlm), Stata, Matlab
… faraway::seatpos (R); Languages: R, SAS, and Python
… (MASS::Boston); Languages: Matlab (‘Neural Net Clustering toolbox’), Python (‘Sklearn’), R (‘Neuralnet’)
… (bcp & changepoint) and SAS (proc mcmc)
… (princomp()), Stata