Complete all questions of the assignment below and submit to Canvas by the due date. Remember, if you are using late days you should submit a draft of the assignment by the due date and leave a comment indicating how many late days you want to use.
For this problem set, you should submit source code as a plain text markdown script (with extension .md) and an associated Jupyter notebook (with extension .ipynb). Use Jupytext to associate the two files.
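As a rough illustration of the pairing step, the sketch below builds the Jupytext command-line calls that pair a notebook with a markdown script and later re-sync the two. The file name `notebook.ipynb` and the helper names are placeholders invented for this example. The commands are only printed unless `dry_run=False` is passed, which assumes the `jupytext` command is installed.

```python
import shlex
import subprocess


def jupytext_commands(notebook="notebook.ipynb"):
    """Return the Jupytext CLI calls that pair and sync a notebook.

    `notebook` is a placeholder file name chosen for this example.
    """
    return [
        # Pair the notebook with a markdown script of the same base name.
        ["jupytext", "--set-formats", "ipynb,md", notebook],
        # After editing either file, bring the paired files back in sync.
        ["jupytext", "--sync", notebook],
    ]


def run(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run: show the commands without executing them.
run(jupytext_commands())
```

Printing the commands first makes it easy to copy them into a terminal and to record them in your write-up.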
Questions on this and future problem sets may ask you to use concepts or ideas that have not been discussed in class. One of the goals of these assignments is to help you learn to be independent, read documentation, and otherwise make reasonable decisions about how to analyze and present data, code, or other data science material.
You may discuss the problem set and its solution with your peers, but you are required to work independently on the files to be submitted and to submit your own original work. If you use or closely follow patterns or code from sources other than the course notes or texts, you should cite the source.
In addition to the content of your submission, you will be graded on the quality of your source code and the professionalism of your notebook file.
Maintain a consistent and literate style in your markdown script and work to make your notebook look professional and well polished. Follow all style rules from previous problem sets.
In this question you will create a GitHub account and create a repository for the work you’ve done in the course.
If you don’t already have one, create a GitHub account. Make sure your username is professional and appropriate – I recommend new users use their umich unique name as their username.
Create a repository Stats507 (or similar if you have one of that name already). It should be a public repository to facilitate grading.
All you need to include in your notebook is a link to this repository.
For each part in this question, please succinctly record the steps you take – through a web browser or at the command line – to complete each part.
Provide enough detail that someone without prior knowledge of git could emulate the steps for themselves, but otherwise be concise and omit details that are only relevant to (e.g.) how your local file tree is set up.
Extract your code from PS2, Question 3 into a standalone script or notebook and add it to the repo.
Create a README that briefly documents the purpose of the repo created in the warmup. In the README, briefly document the script you included for the previous part - state what it does and for what purpose. The README should include a link to this file. Hint: Use a local file path for the link.
Commit the changes from the previous step and push them to the remote. Include a direct link to the commit from the remote’s history in your write up.
Create a branch named “ps4”. Check out that branch and edit the file from step 3 to include “Gender” as you did for PS4 Q1. Commit these changes to the branch and create an upstream branch on GitHub to track this branch. Don’t delete the branch (at least until after the assignment is graded).
Merge the “ps4” branch into the “main” branch. Include a direct link to the commit from the remote’s history in your write up.
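One possible shape for the steps above is sketched below: a README link that uses a path relative to the README, followed by a git command sequence for committing, branching, publishing the branch, and merging. The script name `ps2_q3.py`, the commit messages, and the helper names are hypothetical. The sketch also assumes the GitHub remote is called `origin` and the default branch is `main`. The commands are printed rather than executed.

```python
import shlex


def readme_link(label, relative_path):
    """Format a markdown link whose target is relative to the README."""
    return f"[{label}](./{relative_path})"


def git_steps(script="ps2_q3.py", branch="ps4", main="main"):
    """Return an illustrative git command sequence for the steps above.

    The script name and commit messages are made up for this example.
    """
    return [
        # Stage the script and README, commit, and push to the remote.
        ["git", "add", script, "README.md"],
        ["git", "commit", "-m", "Add PS2 Q3 script and README"],
        ["git", "push", "origin", main],
        # Create and check out the new branch, then commit the edit there.
        ["git", "checkout", "-b", branch],
        ["git", "commit", "-a", "-m", "Include gender in PS2 Q3 script"],
        # Publish the branch and set it as the upstream to track.
        ["git", "push", "--set-upstream", "origin", branch],
        # Merge the branch into main and push the result.
        ["git", "checkout", main],
        ["git", "merge", branch],
        ["git", "push", "origin", main],
    ]


print(readme_link("ps2_q3.py", "ps2_q3.py"))
for cmd in git_steps():
    print(shlex.join(cmd))
```

The edit to the script happens between `git checkout -b` and the second commit. Your own sequence may differ, for example if you create the branch through the GitHub web interface.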
In this question you will extract your notes on a pandas topic from PS4, Question 0 to a script of their own. Then, you will collaborate to aggregate these into a single document for the course.
In your Stats507 repo make a folder called “pandas_notes”.
Extract your PS4, Question 0 topic tutorial and copy it into a script called “pd_topic_XYZ.py” replacing XYZ with your UM unique name. Include your name and UM email on a title “slide” (markdown cell) if you don’t have one already. Include a link in your writeup to this file.
It’s okay if this next step takes longer than the November 12 due date. There are due dates for each stage of the tree to help us finish before the end of the term. Due dates are based on your “level” in the tree:
- Level 4 (root) - Tuesday, November 16
- Level 3 - Friday, November 19
- Level 2 - Tuesday, November 23
- Level 1 - Tuesday, November 30
- Level 0 - Friday, December 3.
Try to complete this question by your due date, so you don’t hold up the fan out. Please be professional and respectful in your communications with one another. If either person (“person1” or “person2”) above you on the tree has not completed their part by their due date, please reach out to those above them on the list.
Note - those at the roots (level 4) of each tree (group) have a slightly different assignment; they should clone the starter script and add their topic notes to it.
Note to those at level 0 – please email Dr. Henderson with a link when you’ve completed your portion.
This question is optional. If you lost points for style, lack of professional “polish” in your notebook, or any other easily correctable mistake on any of the graded problem sets, you may correct them here.
You may correct these types of mistakes on (up to) any 2 assignments. You may not, however, completely redo a question or a major part of a question. For example, you will not receive credit for replacing your solution with the official solution or a peer’s solution. The instructors have final say on whether corrections are in the spirit of the question.
To receive credit for your corrections please follow the steps below.
Create a commit in your “Stats507” repository (from question 0) with the original version of your source files for the problem set(s) you are correcting.
Copy, paste, and clearly label the GSI comments from Canvas into your solution for this question. Clearly describe which comments you are making corrections for.
Make your corrections and commit them to your Stats507 repo. Include a direct link to the commit (showing the diff) in your submission. All changes should be in a single commit to facilitate this – you may wish to work in a branch if you need to make intermediate commits.
Do not include the corrections in the submission – just link to the commit showing the corrections.
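For the direct link to a commit, GitHub serves each commit, with its diff, at a URL built from the account name, repository name, and commit hash. The helper below is a small sketch of that pattern. The function name and the example account, repository, and hash values are placeholders. The hash of your latest commit can be read locally with `git rev-parse HEAD`.

```python
def commit_url(user, repo, sha):
    """Build the GitHub URL of the page showing one commit and its diff."""
    return f"https://github.com/{user}/{repo}/commit/{sha}"


# Placeholder values: substitute your own account, repository, and hash.
print(commit_url("uniqname", "Stats507", "0123abc"))
```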