x = rnrom(n = 5e3)
y = sample(LETTERS, size = 5e3, replace = TRUE)
z = factor(y)
mat = matrix( 0L, nrow = 1e4, ncol = 5e3 )
libname
statements and two part file names (active)%let var = 'text';
and "&var."
(passive)proc summary
(passive)proc sql
(active)proc transpose
(passive)The script below reads in data related to hospital stays among Medicare patients in 2014 and 2016 and then limits to rows related to pneumonia or other respiratory infections. After sub-setting, each row represents a hospital (“hosp_id”) and a particular type of hospital stay (“drg”). The script computes z-scores for each hospital comparing the difference in proportions for one type of stay (“drg” = 178) to the others between the two years.
Note: This is based on question 4 from the 2018 final exam.
List all tables written to disk (in non temporary locations) as sas7bdat files after running the script below. For each table, also give the number of columns it contains.
List all tables in the work
library after running this script.
Rewrite the chunks labeled “merge” and “Compute totals, remerge, and create pooled estimates” using proc sql
.
Find all missing comments indicated as [Comment X]
and provide an appropriate, descriptive comment.
/* libraries: ------------------------------------------- */
%let datpath = '~/path/to/project/data';
libname mylib "&datpath.";
%let respath = '~/path/to/project/results';
libname reslib "&respath.";
/* Import the data: ------------------------------------- */
proc import datafile='./data2014.csv' out=ip14;
proc import datafile='./data2016.csv' out=ip16;
run;
/* [Comment 1]: ----------------------------------------- */
data mylib.p14;
set ip14;
where drg in (178, 179, 185);
n14 = total_discharges;
keep drg hosp_id n14;
data mylib.p16;
set ip16;
where drg in (178, 179, 185);
n16 = total_discharges;
keep drg hosp_id n16;
run;
/* Merge: ----------------------------------------------- */
data pneum;
merge mylib.p14 mylib.p16;
by drg hosp_id;
/* Compute totals, remerge, and create pooled estimates:- */
proc summary data = pneum;
by hosp_id;
output out = totals
sum(n14) = tot14
sum(n16) = tot16;
/* Remerge and create pooled estimate*/
data pneum;
merge pneum totals;
by hosp_id;
p_pool = (n14 + n16) / (tot14 + tot16);
run;
/* Compute z-scores: ------------------------------------ */
data result;
set pneum;
where drg = 178 and p_pool lt 1; /* lt = < */
p14 = n14/tot14;
p16 = n16/tot16;
z = (p16 - p14)/(p_pool*(1-p_pool)*(1/n14 + 1/n16));
run;
/* [Comment 2]: ----------------------------------------- */
proc sort data=reslib.result;
by descending z;
run;
/* 60: -------------------------------------------------- */
A solution to the proc sql
question:
/* merge: ----------------------------------------------- */
proc sql;
create table pneum as
select *
from mylib.p14
left join mylib.p16
on p14.hosp_id = p16.hosp_id and p14.drg = p16.drg;
quit;
run;
/* compute totals, remerge, and create pooled estimates:- */
proc sql;
/* Compute totals */
create table totals as
select hosp_id, sum(n14) as tot14, sum(n16) as tot16
from pneum
group by hosp_id;
/* Remerge and create pooled estimate*/
create table pneum as
select *, (n14 + n16) / (tot14 + tot16) as p_pool
from pneum p
left join totals t
on p.hosp_id = t.hosp_id;
quit;
run;
generate
, replace
, keep
, etc.local myvars "a b c"
; `myvar'
(active)preserve
/ restore
(passive)reshape
(passive)See question 6 from the Fall 2018 midterm
Question 2 from the Fall 2018 final.
The example SAS question above.
filter
, mutate
, select
, transmute
, group_by
, summarize
, left_join
(active)See above or any prior midterm/final exam.
dt[i, j, by]
(active)by
vs keyby
(passive)j
is/returns a list, the result is a data.table. (concept)dt[ , new := comp(old) ]
(active).N
, .SD
, and use of .SDcols
argument (passive)dcast
(wider), melt
(longer)mclapply
and the role of mc.preschedule
(active)future
and its value
, blocking (active)