About this page
This page collects tables of commonly used commands in R (base R and popular packages) and Stata. The tables are organized according to Hadley Wickham’s model of a data science program: import, tidy, transform, visualize, model, and communicate.
This is a working document that will be updated throughout the course. If any of these functions are unfamiliar, read their documentation in the corresponding language.
Importing Data
| Task | R (base) | R (packages) | Stata |
|---|---|---|---|
| import data in native formats | load() (.RData), data() | - | use (.dta), webuse, sysuse |
| save data in native formats | save() (.RData) | - | save (.dta) |
| import delimited data (e.g. csv, tsv) | read.csv(), read.table() | readr: read_delim() | import delimited |
| write/export delimited data | write.csv(), write.table() | readr: write_delim() | export delimited |
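As an illustration of the import and export commands, here is a minimal base-R sketch of a round trip through both a CSV file and an .RData file. The data frame `df` and the file names `scores.csv` and `scores.RData` are hypothetical placeholders written to a temporary directory. The readr functions in the table would work similarly, but the sketch sticks to base R so it has no package dependencies.

```r
# Hypothetical example data; the file names below are placeholders.
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Delimited text: write a CSV without row names, then read it back.
csv_path <- file.path(tempdir(), "scores.csv")
write.csv(df, csv_path, row.names = FALSE)  # row.names = FALSE avoids an extra index column
df_csv <- read.csv(csv_path)

# Native format: save() stores the object under its own name ("df").
rdata_path <- file.path(tempdir(), "scores.RData")
save(df, file = rdata_path)
rm(df)

# load() restores "df" into the current environment and returns the names it loaded.
restored <- load(rdata_path)
print(restored)
str(df_csv)
```

Note that `load()` silently overwrites any existing object with the same name, which is one reason some users prefer `saveRDS()` and `readRDS()` for single objects.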
Tidy
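| Task | R (base) | R (packages) | Stata |
|---|---|---|---|
| reshape data between wide and long formats | reshape() | tidyr: pivot_longer(), pivot_wider() (superseded: gather(), spread()) | reshape long / reshape wide |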
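To illustrate reshaping, here is a base-R sketch using `reshape()` on a small hypothetical data set with one row per `id` and one column per year. The column names (`score_2020`, `score_2021`) and the new `year` variable are assumptions made for the example. In tidyr the same step would use `pivot_longer()`, and in Stata `reshape long`.

```r
# Hypothetical wide data: one row per id, one score column per year.
wide <- data.frame(id = 1:2, score_2020 = c(10, 20), score_2021 = c(11, 21))

# Wide -> long: stack the two score columns into "score",
# recording which column each value came from in "year".
long <- reshape(wide,
                direction = "long",
                varying   = c("score_2020", "score_2021"),
                v.names   = "score",
                timevar   = "year",
                times     = c(2020, 2021),
                idvar     = "id")
print(long)

# Long -> wide: reshape() stores the original layout as an attribute,
# so calling it again without arguments reverses the transformation.
wide_again <- reshape(long)
print(wide_again)
```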
Model
| Task | R (base) | R (packages) | Stata |
|---|---|---|---|
| linear models | lm() | - | regress |
| generalized linear models | glm(), family() | - | logit/logistic, poisson, etc. |
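To illustrate the modeling commands, here is a base-R sketch fitting a linear model, a logistic regression, and a Poisson regression to simulated data. The data frame `sim`, its variables, and the true coefficients are made up for the example. In `glm()`, the `family` argument chooses the model, much as choosing between `logit` and `poisson` does in Stata (for instance `regress y x`, `logit z x`, `poisson counts x`).

```r
set.seed(1)
n <- 200

# Simulated (hypothetical) data with known coefficients.
sim <- data.frame(x = rnorm(n))
sim$y      <- 1 + 2 * sim$x + rnorm(n)                            # continuous outcome
sim$z      <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * sim$x))  # binary outcome
sim$counts <- rpois(n, lambda = exp(0.3 + 0.4 * sim$x))             # count outcome

# Linear model (analogous to Stata's regress).
fit_lm <- lm(y ~ x, data = sim)

# Generalized linear models: the family sets the distribution and link.
fit_logit <- glm(z ~ x, family = binomial(link = "logit"), data = sim)
fit_pois  <- glm(counts ~ x, family = poisson(link = "log"), data = sim)

summary(fit_lm)
coef(fit_logit)
coef(fit_pois)
```

Estimates will differ from the true values because of sampling noise, but with 200 observations they should be close.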
Visualize
| Task | R (base) | R (packages) | Stata |
|---|---|---|---|
| scatter plots | plot(x, y) | ggplot2: ggplot(df, aes(x = x, y = y)) + geom_point() | twoway scatter y x |
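As an example of a scatter plot in both styles, here is a short sketch. The data frame `df` and the output file names are hypothetical. Plots are written to PDF files in a temporary directory so the code also runs without a screen. The ggplot2 part runs only if that package is installed. Note that `ggplot()` takes the data frame as its first argument and refers to columns by name inside `aes()`, whereas base `plot()` takes the vectors directly.

```r
set.seed(42)
df <- data.frame(x = runif(50), y = rnorm(50))  # hypothetical data

# Base R: open a PDF device, draw, then close the device.
base_file <- file.path(tempdir(), "scatter_base.pdf")
pdf(base_file)
plot(df$x, df$y, xlab = "x", ylab = "y", main = "Base R scatter plot")
invisible(dev.off())

# ggplot2 version, skipped if the package is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x, y = y)) + geom_point()
  ggsave(file.path(tempdir(), "scatter_gg.pdf"), plot = p, width = 5, height = 4)
}
```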