A version control system is a tool for managing changes to files over time by efficiently tracking all modifications. One of the most popular version control tools is “git”. Another, Subversion (SVN), has also been widely used, but git has become the de facto standard. Git is generally already installed on most Linux-like systems.
Version control is essential for projects with multiple people editing the same collection of code. However, it can also help individual users more effectively and efficiently manage and document their scripts and programs.
Read more about version control and git here. In particular, you should read:
Read this post on writing better commit messages.
Optional reading:
To begin tracking an existing project, move to the top folder in the project tree and run git init.
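A minimal sketch of what that looks like, run in a throwaway folder so it is safe to try. The folder name my_project and the file analysis.R are hypothetical stand-ins for your own project.

```shell
# Scratch folder standing in for the top folder of your real project
PROJECT="$(mktemp -d)/my_project"
mkdir -p "$PROJECT"
cd "$PROJECT" || exit 1

# Turn the folder into a git repository (creates a hidden .git folder)
git init

# Nothing is tracked yet: new files are "untracked" until you git add them
echo 'x <- 1' > analysis.R
git status --short   # ?? analysis.R
```

git init only creates the .git folder where the history will live; your files are recorded once you git add and git commit them.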
To create a local copy of an existing repository, run git clone followed by the repository's URL, for example the URL of the “Stats506_F20” repository on github.com.
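When cloning repositories to which you have push access, I suggest you include your username in the URL, in the form https://<username>@github.com/<owner>/<repo>.git. This makes git authenticate as your account, so your push permissions apply, and git will prompt you for a password (GitHub now requires a personal access token in place of your account password). This is not necessary when using ssh keys (see below).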
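A sketch of git clone that runs without a network connection: a local folder named upstream (hypothetical) plays the part of the remote, since git clone accepts a path as well as a URL. The commented lines show the general shape of hosted HTTPS URLs, with placeholders rather than an actual repository.

```shell
SANDBOX="$(mktemp -d)"

# Stand-in for a remote: an ordinary repository with one commit
# (the identity passed with -c is a placeholder)
git init --quiet "$SANDBOX/upstream"
cd "$SANDBOX/upstream" || exit 1
echo "# Demo" > README.md
git add README.md
git -c user.name="Example User" -c user.email="user@example.com" commit --quiet -m "Add README"

# git clone <url> [<directory>]; a local path is accepted as the URL
git clone --quiet "$SANDBOX/upstream" "$SANDBOX/my_copy"
ls "$SANDBOX/my_copy"          # README.md

# With a hosted repository the URL looks like this (placeholders, not a real repo):
#   git clone https://github.com/<owner>/<repo>.git
#   git clone https://<your-username>@github.com/<owner>/<repo>.git
```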
Git is a “distributed version control system”, meaning that every copy of a repository is a complete, “local” repository. When you begin a version-controlled project using git, the project folder is itself a repository. You do not need a remote repository to use git.
However, git is used most effectively with a remote repository. Remote git repository services such as GitHub or Bitbucket are effective means for both backing up and sharing the code you write. They are also a great way to manage work across multiple computers, such as a personal laptop and one or more servers.
The following commands are useful for working with a remote repository.
To link a local and remote repository, use git remote add, giving the remote a short name (conventionally origin) and its URL.
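A sketch of linking a local repository to a remote, assuming an empty bare repository in a temporary folder stands in for a new, empty GitHub or Bitbucket repository. The name origin is only a convention, and the identity passed with -c is a placeholder.

```shell
SANDBOX="$(mktemp -d)"

# An empty bare repository stands in for a new, empty remote
git init --quiet --bare "$SANDBOX/remote.git"

# A local repository with one commit
git init --quiet "$SANDBOX/local"
cd "$SANDBOX/local" || exit 1
echo "# 506" > README.md
git add README.md
git -c user.name="Example User" -c user.email="user@example.com" commit --quiet -m "First commit"

# Link the two: git remote add <name> <url>
git remote add origin "$SANDBOX/remote.git"
git remote -v                  # shows origin's fetch and push URLs

# Send the current branch to origin and record origin as its upstream
git push --quiet -u origin HEAD
```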
If you haven’t made any local commits and want to bring your local copy up to date with the remote repository, use git pull. To retrieve remote changes without merging them into your current branch, use git fetch, followed by git status or a git diff against the remote-tracking branch (for example, git diff HEAD origin/main if your branch is named main) to see what has changed. You can then use git merge or git pull to finish merging.
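A sketch of the fetch, inspect, merge cycle, simulated entirely on one machine: a bare repository plays the remote, a repository called theirs plays a collaborator, and mine is your copy. It assumes git 2.28 or newer (for git init -b main), and the exported author variables are placeholders so the commits work anywhere.

```shell
SANDBOX="$(mktemp -d)"
# Placeholder identity so the sandbox commits work on any machine
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# The "remote", plus a collaborator's repository that publishes a first commit
git init --quiet --bare -b main "$SANDBOX/remote.git"
git init --quiet -b main "$SANDBOX/theirs"
cd "$SANDBOX/theirs" || exit 1
echo "v1" > notes.txt
git add notes.txt
git commit --quiet -m "v1"
git remote add origin "$SANDBOX/remote.git"
git push --quiet -u origin main

# Your copy starts out up to date
git clone --quiet "$SANDBOX/remote.git" "$SANDBOX/mine"

# The collaborator pushes another commit
echo "v2" > notes.txt
git commit --quiet -am "v2"
git push --quiet

# In your copy: download without merging, inspect, then merge
cd "$SANDBOX/mine" || exit 1
git fetch --quiet
git status                     # reports the branch is behind 'origin/main' by 1 commit
git diff HEAD origin/main      # shows notes.txt changing from v1 to v2
git merge --quiet origin/main  # fast-forward; git pull does fetch + merge in one step
cat notes.txt                  # v2
```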
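You can also set up a git remote in your AFS space, using a bare repository (one that stores only git's history, with no working files) as the copy you push to and pull from.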
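A hedged sketch of the idea: create a bare repository in a folder every machine can reach, then clone it. The AFS path in the comment is a hypothetical placeholder; by default the runnable lines use a temporary folder instead, and they assume git 2.28 or newer for -b main.

```shell
# On AFS you might use something like this instead (hypothetical path):
#   REMOTE_DIR=/afs/<cell>/user/<you>/git
REMOTE_DIR="${REMOTE_DIR:-$(mktemp -d)}"

# A bare repository holds only git's history, which is what git expects
# for a repository that you push to
git init --quiet --bare -b main "$REMOTE_DIR/506.git"

# Clone it where you work. A plain path works on a machine that mounts the folder;
# from elsewhere, an ssh URL of the form <you>@<server>:<path>/506.git is typical.
WORK_DIR="$(mktemp -d)"
git clone "$REMOTE_DIR/506.git" "$WORK_DIR/506"   # warns that the repository is empty
```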
Try setting up such a remote on your own. Use this repository to practice version control while doing your assignments.
Add, commit, and push the R and R Markdown templates from https://github.com/jbhender/Stats506_20 to your 506 repo.

When using a remote code repository such as GitHub, Bitbucket, or even AFS, be careful not to submit any protected or sensitive data. When working with such data, it is especially important to run git status before git commit or git push so you know exactly what will be uploaded to a remote repository.
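A sketch of that habit in a scratch repository, assuming a hypothetical private_data.csv stands in for protected data: stage carelessly, check with git status, and unstage the sensitive file before committing.

```shell
SANDBOX="$(mktemp -d)"
git init --quiet "$SANDBOX/506"
cd "$SANDBOX/506" || exit 1
echo 'summary(df)' > analysis.R
echo 'id,income' > private_data.csv   # hypothetical stand-in for protected data

git add .                     # careless: stages everything, data included
git status --short            # A  analysis.R
                              # A  private_data.csv

# Unstage the file that must not leave your machine (it stays on disk)
git rm --cached --quiet private_data.csv
git diff --cached --name-only # analysis.R: the only file the next commit would include

# Before pushing, list the commits the remote does not have yet (needs an upstream):
#   git log --oneline @{u}..HEAD
```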
Similarly, when working on a project with someone else, be sure to check whether there is any concern about sharing the code in a public or private repository.
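As with most programs, you can set preferences in git to improve your workflow. This is done from the command line using git config. Here are a few options you may want to set everywhere you use git:

- user.name : the name recorded as the author of your commits
- user.email : the email address recorded with your commits (use the one associated with your remote repository account so the host can link your commits to you)
- core.editor : your preferred text editor (if it is unset, git falls back to the VISUAL or EDITOR environment variables and then to a default, usually vi)

Setting your name and email lets git label each commit you make; without them, git tries to guess them from your system account and may refuse to commit. They are not used to log in when you git push or git pull to your remote repository; that is handled by your password, token, or ssh key.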
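A sketch of setting the identity options with --global, which writes them to your per-user configuration file. The name and email are placeholders, and the first line (pointing HOME at a scratch folder) exists only so the example cannot overwrite your real settings.

```shell
# Sandbox only: point HOME at a scratch folder so your real ~/.gitconfig is untouched.
# Leave this line out when configuring your own machine.
export HOME="$(mktemp -d)"

# Placeholders: use your own name and the email tied to your remote account
git config --global user.name "Jane Doe"
git config --global user.email "jdoe@example.com"

# Confirm what will be recorded on your commits
git config --global --get user.name    # Jane Doe
git config --global --get user.email   # jdoe@example.com
```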
To set your default editor to emacs (substitute your choice of editor), use git config --global core.editor emacs.
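A sketch of the editor setting with a scratch HOME, so it leaves your real configuration alone. git var GIT_EDITOR prints the editor git will actually launch, which is a quick way to confirm the change.

```shell
# Sandbox only: a scratch HOME keeps your real ~/.gitconfig untouched; omit on your machine
export HOME="$(mktemp -d)"

# Use emacs for commit messages and other text git asks you to edit
git config --global core.editor emacs
# Terminal-only alternative: git config --global core.editor "emacs -nw"

git config --global --get core.editor   # emacs
git var GIT_EDITOR                      # the editor git will actually launch
```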
This could also be done in the shell by exporting the EDITOR environment variable, but doing so will affect all programs that rely on the EDITOR variable.
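A sketch contrasting the two approaches, assuming a bash-like shell. It uses a scratch HOME and clears GIT_EDITOR and VISUAL so the result is predictable; according to the git var documentation, git prefers GIT_EDITOR, then core.editor, then VISUAL, then EDITOR, and finally a built-in default (usually vi).

```shell
# Sandbox only: scratch HOME and a clean environment; omit these two lines on your machine
export HOME="$(mktemp -d)"
unset GIT_EDITOR VISUAL

# Shell-wide: every program that reads EDITOR will now start emacs
export EDITOR=emacs     # add this line to ~/.bashrc to keep it in new shells
git var GIT_EDITOR      # emacs

# Git-only: core.editor takes precedence over VISUAL and EDITOR
git config --global core.editor vim
git var GIT_EDITOR      # vim
```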
When making these changes, be aware that they will only apply within the environments that share the same .gitconfig file. You may need to repeat these steps separately on the SCS servers, your personal computer(s), and the Great Lakes HPC environment.
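A sketch for checking which configuration files git is reading on a given machine, using a scratch HOME and a placeholder name so it can be run safely anywhere.

```shell
# Sandbox only: scratch HOME so nothing touches your real settings; omit on your machine
export HOME="$(mktemp -d)"
git config --global user.name "Jane Doe"   # placeholder value

# Every setting git sees from here, each prefixed with the file it came from
git config --list --show-origin

# Just the per-user file, i.e. what you would recreate on each server or laptop
git config --global --list                 # user.name=Jane Doe
```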
.gitignore
At times you may find you have certain files that you want to keep in your local directory but not the remote repository for privacy or other reasons.
You can tell git to ignore such files by listing patterns in a .gitignore file at the top of your repository's file tree. For example, a .gitignore file in My_Repo containing the line *.csv tells git to ignore all files with the .csv extension.
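A sketch of the My_Repo example in a temporary folder (the data.csv and analysis.R files are hypothetical). git check-ignore -v reports which rule ignores a path, which helps when a file unexpectedly disappears from git status.

```shell
SANDBOX="$(mktemp -d)"
git init --quiet "$SANDBOX/My_Repo"
cd "$SANDBOX/My_Repo" || exit 1

# .gitignore holds one pattern per line; >> appends, creating the file if needed
echo '*.csv' >> .gitignore

echo 'a,b' > data.csv
echo 'x <- 1' > analysis.R
git status --short             # ?? .gitignore
                               # ?? analysis.R   (data.csv is not listed)
git check-ignore -v data.csv   # .gitignore:1:*.csv followed by the path
```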
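I’d recommend including the following in your .gitignore at a minimum:

- .* : all hidden files (this pattern also matches .gitignore itself, so add the line !.gitignore if you want git to keep tracking it)
- *~ : temporary backups created by Emacs
- .DS_Store : if you ever work on a Mac
- *.log
- *.out
- *.Rout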
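A sketch that writes the recommended patterns into a scratch repository with a here-document and checks the effect. The !.gitignore line re-includes the .gitignore file itself, which the .* pattern would otherwise hide from git; the test files are hypothetical.

```shell
SANDBOX="$(mktemp -d)"
git init --quiet "$SANDBOX/506"
cd "$SANDBOX/506" || exit 1

# Write the patterns; !.gitignore undoes the .* match for .gitignore itself
cat > .gitignore <<'EOF'
.*
!.gitignore
*~
.DS_Store
*.log
*.out
*.Rout
EOF

# Hypothetical files: only analysis.R (and .gitignore) should remain visible
touch .Rhistory notes.R~ job.Rout make.log analysis.R
git status --short    # ?? .gitignore
                      # ?? analysis.R
```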
To track a file that git would usually ignore, use git add -f.
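A sketch in a scratch repository, assuming a hypothetical small_example.csv that is ignored by a *.csv rule but that you do want to share.

```shell
SANDBOX="$(mktemp -d)"
git init --quiet "$SANDBOX/506"
cd "$SANDBOX/506" || exit 1
echo '*.csv' > .gitignore
echo 'x,y' > small_example.csv   # hypothetical file you do want to share

git add small_example.csv || true   # refused (nonzero exit) with a hint to use -f
git add -f small_example.csv        # -f (--force) stages it despite .gitignore
git diff --cached --name-only       # small_example.csv
```

Once a file is tracked, .gitignore no longer applies to it, so later edits to small_example.csv show up in git status as usual.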
1. Create a .gitignore file in your 506 repo with the extensions above.
2. Create a file called “test.tmp” in your 506 repo.
3. Run git status. Where does “test.tmp” appear?
4. Add *.tmp to your .gitignore for 506.
5. Run git status again. Why does “test.tmp” no longer show up?

Read more about configuring git here.
For routine use of git from your personal computer, you may want to connect to your remote repository using ssh. With ssh keys you can communicate with remotes without supplying your password for each interaction. Read more here.
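A sketch of the usual steps, with placeholders throughout. Key generation and the connection test are left as comments because they are interactive and touch your real ~/.ssh folder; the runnable part switches an existing remote from an HTTPS URL to the equivalent SSH URL in a scratch repository, using a made-up your-username/506 repository.

```shell
# Once per computer (interactive: asks where to save the key and for a passphrase):
#   ssh-keygen -t ed25519 -C "you@example.com"
# Then paste the contents of ~/.ssh/id_ed25519.pub into your GitHub or Bitbucket
# account settings, and check the connection with:  ssh -T git@github.com

# Switch an existing clone from HTTPS to SSH (owner/repo names are placeholders)
SANDBOX="$(mktemp -d)"
git init --quiet "$SANDBOX/506"
cd "$SANDBOX/506" || exit 1
git remote add origin https://github.com/your-username/506.git
git remote set-url origin git@github.com:your-username/506.git
git remote get-url origin      # git@github.com:your-username/506.git
```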
Those working in a Windows environment may be interested in the Git for Windows program which also includes a BASH shell.
There is also a GUI application, GitHub Desktop, that some may find useful.
The RStudio IDE also has support for Git. See the Git tab near the Environment tab.
I encourage you to use git for your work in this class. However, please refrain from posting solutions to problem sets to public repositories created at GitHub or Bitbucket for this course prior to the due date. Instead, set up a private repo.
After the solutions are released, I suggest you do post your (possibly corrected) code to a public repo and include a concise description of what you’ve learned in a README.md written in Markdown. Your repository can serve as a sort of “digital portfolio” for your computing and analysis skills that you can share with potential employers.