Consider the 2014 flights data used for the data.table
examples. For each code snippent first provide a single-setence summary of the value(s) being computed. Make your answer as concise and descriptive as possible.
data.table
expression:tab1 =
nyc14[ , .(n = .N), by=.(origin, dest)] %>%
.[, .(origin, n = n, pct = n / sum(n)), by=dest] %>%
.[pct > .75] %>%
.[order(-pct, dest)]
This code finds all destination cities for which more than 75% of the flights are from a single origin airport.
dplyr
syntax.nyc14 %>%
group_by(dest, origin) %>%
summarize(n=n()) %>%
mutate(pct = n/sum(n)) %>%
filter(pct > .75) %>%
arrange(-pct, dest)
## # A tibble: 47 x 4
## # Groups: dest [47]
## dest origin n pct
## <chr> <chr> <int> <dbl>
## 1 ABQ JFK 278 1
## 2 ACK JFK 277 1
## 3 AGS LGA 3 1
## 4 ALB EWR 169 1
## 5 ANC EWR 13 1
## 6 AVP EWR 1 1
## 7 BDL EWR 210 1
## 8 BGR LGA 274 1
## 9 BHM LGA 124 1
## 10 BUR JFK 289 1
## # ... with 37 more rows
dplyr
code snippet below.tab2 =
nyc14 %>%
group_by(origin, dest, carrier) %>%
summarize(n = n()) %>%
filter(n >= 80) %>%
group_by(origin, carrier) %>%
summarize(n = n()) %>%
arrange(origin,-n)
Provide a one-sentence summary of what is being computed. For each carrier and origin airport, this code counts the number of destination cities with at least 80 observed flights.
Provide a translation using data.table
syntax.
nyc14[, .(n=.N), by = .(origin, dest, carrier)
][n>=80
][, .(n=.N), by = .(origin, carrier)
][order(origin,-n)
]
## origin carrier n
## 1: EWR EV 47
## 2: EWR UA 29
## 3: EWR WN 8
## 4: EWR B6 7
## 5: EWR DL 4
## 6: EWR AA 3
## 7: EWR US 3
## 8: EWR VX 2
## 9: EWR AS 1
## 10: EWR MQ 1
## 11: JFK B6 42
## 12: JFK DL 22
## 13: JFK AA 15
## 14: JFK MQ 11
## 15: JFK VX 3
## 16: JFK UA 2
## 17: JFK US 2
## 18: JFK HA 1
## 19: JFK EV 1
## 20: LGA EV 26
## 21: LGA MQ 20
## 22: LGA DL 13
## 23: LGA WN 8
## 24: LGA B6 6
## 25: LGA AA 4
## 26: LGA US 4
## 27: LGA UA 3
## 28: LGA FL 2
## 29: LGA F9 1
## 30: LGA OO 1
## origin carrier n
nyc14_df = as.data.frame(nyc14)
tab3 =
with(
with(nyc14_df,nyc14_df[grepl('HOU',dest),]),
{
keys = paste(carrier,month,sep=':')
u = unique(keys)
n = sapply(u, function(key) sum(key==keys))
tmp = strsplit(u,':')
carrier = sapply(tmp,function(x) x[1])
month = sapply(tmp,function(x) x[2])
cr = unique(carrier)
ind = sapply(cr, function(x) grep(x,carrier, fixed=TRUE))
data.frame(carrier,month,n)[ind,]
}
)
This code computes the number of flights from NYC to HOU each month by carrier.
data.table
syntax.setkey(nyc14, dest)
nyc14['HOU', .(n = .N), by=.(carrier, month)][order(carrier,month)]
## carrier month n
## 1: B6 1 50
## 2: B6 2 47
## 3: B6 3 55
## 4: B6 4 54
## 5: B6 5 60
## 6: B6 6 59
## 7: B6 7 61
## 8: B6 8 62
## 9: B6 9 60
## 10: B6 10 62
## 11: WN 1 130
## 12: WN 2 110
## 13: WN 3 144
## 14: WN 4 144
## 15: WN 5 185
## 16: WN 6 199
## 17: WN 7 205
## 18: WN 8 194
## 19: WN 9 186
## 20: WN 10 192
dplyr
syntax.tab3c = nyc14 %>%
filter(dest=='HOU') %>%
group_by(carrier, month) %>%
summarize(n = n())