SAS is closed-source commercial software widely used in industry. For instance, SAS promotional materials claim 83,000 installations, including most of the top 100 companies in the Fortune 500. It is also quite popular in biostatistics and the healthcare industry.
Our examples will largely be based on Professor Shedden’s 2016 course notes and case studies. Please use his course notes as your primary resource and treat the examples below as supplementary.
You have several options for accessing SAS for learning and assignments. All examples shown in class will use SAS in batch mode, as that is how I primarily use it.
You can use SAS in batch mode on the SCS servers:
ssh luigi.dsc.umich.edu
sas Example0.SAS -log Example0.log
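For reference, a minimal program of the kind you might save as Example0.SAS is sketched below; its contents are hypothetical, not the actual course file. It prints the first five rows of SASHELP.CLASS, a small sample table that ships with SAS, so it runs without downloading any data. In batch mode the log goes to the file named by -log, and printed output goes to a listing file (typically Example0.lst).

```sas
/* Example0.SAS: hypothetical contents for a first batch job.  */
/* SASHELP.CLASS is a small sample table that ships with SAS.  */
title 'First five rows of SASHELP.CLASS';

proc print data=sashelp.class(obs=5);
run;
```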
Several versions of SAS are also available on Flux (module load SAS).
SAS offers a free “University Edition” for academic use.
You can also access SAS using midesktop through the UM computing service. You will need to figure out the details yourself if you choose this route.
You can download all of the examples and data shown below as a tarball from www-personal.umich.edu/~jbhender/sas_examples.tgz.
From the SCS servers, you can also copy this directly from my public AFS space:
cp /afs/umich.edu/user/j/b/jbhender/Public/html/sas_examples.tgz ~/
To extract the files from the archive use:
tar xvfz sas_examples.tgz
This creates a folder named ‘SAS’, so make sure the directory you extract into does not already contain a folder with that name.
This video explains the basics of a SAS program and how to write one using SAS Studio.
Here are some key points to keep in mind:
This script uses proc import to import a comma-delimited copy of the RECS data. We then use proc print and proc contents to explore it.
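A minimal sketch of that import-and-explore pattern follows. The file path and name (./data/recs2009_public.csv) are assumptions; point datafile= at wherever your copy of the CSV lives.

```sas
/* Hypothetical path: adjust to where you saved the RECS CSV file. */
proc import datafile="./data/recs2009_public.csv"
    out=work.recs
    dbms=csv
    replace;
    getnames=yes;   /* first row holds the variable names */
run;

/* Look at the first 10 rows. */
proc print data=work.recs(obs=10);
run;

/* List variable names, types, lengths and formats. */
proc contents data=work.recs;
run;
```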
SAS uses a binary format, sas7bdat, for native data storage on disk. SAS also uses the concept of ‘libraries’, similar to how schemas are used in SQL. The default library is WORK, which is set up in a temporary directory and deleted when the SAS session ends. You create handles for libraries using a libname statement.
In example 1, we create a library handle mylib and save the RECS data to it after importing.
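As a sketch of the same idea, assuming a hypothetical existing directory ./data and the same hypothetical CSV name, importing straight into the mylib library (instead of WORK) makes the table permanent.

```sas
/* Hypothetical directory; it must already exist. */
libname mylib './data';

/* Writing to mylib.recs (rather than work.recs) stores the table
   permanently as ./data/recs.sas7bdat. */
proc import datafile="./data/recs2009_public.csv"
    out=mylib.recs
    dbms=csv
    replace;
    getnames=yes;
run;
```

In a later session you only need the libname statement again to use mylib.recs; no re-import is required.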
In example 2, we create a data table recs referencing the RECS data in sas7bdat format downloaded from the EIA site. Note the additional metadata it contains relative to the version imported from CSV.
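A sketch of referencing an existing sas7bdat file follows. It assumes the downloaded file was saved as ./data/recs2009_public.sas7bdat; both the directory and the member name are hypothetical.

```sas
/* Hypothetical directory holding the downloaded sas7bdat file. */
libname mylib './data';

/* A sas7bdat file is referenced as libref.member: no import step. */
data recs;
    set mylib.recs2009_public;
run;

/* Compare this listing with the one for the CSV import. */
proc contents data=recs;
run;
```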
In example 3, we create rural and urban subsets of the RECS data and save them to our library using data steps.
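One data step can write both subsets in a single pass. This sketch assumes the table mylib.recs from above and a character variable UR coded 'U' (urban) and 'R' (rural); check the RECS codebook before relying on those codes.

```sas
libname mylib './data';   /* hypothetical library location */

/* Each row is written to the table named in the matching OUTPUT
   statement. Assumes UR is coded 'U' (urban) / 'R' (rural). */
data mylib.recs_urban mylib.recs_rural;
    set mylib.recs;
    if UR = 'U' then output mylib.recs_urban;
    else if UR = 'R' then output mylib.recs_rural;
run;
```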
There are several procedures useful for obtaining descriptive statistics.
In example 4, we explore proc tabulate.
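For a quick, self-contained look at proc tabulate, the sketch below uses the SASHELP.CLASS sample table rather than RECS. The row dimension lists each sex plus an overall total (all); the column dimension shows the count and the mean height and weight.

```sas
/* Rows: each sex plus a total row. Columns: count, mean height,
   mean weight. */
proc tabulate data=sashelp.class;
    class sex;
    var height weight;
    table sex all, n (height weight)*mean;
run;
```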
In example 5, we explore proc means, proc summary, and proc freq.
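The sketch below runs all three procedures on SASHELP.CLASS (not the RECS data used in class). proc summary is given the PRINT option here because, unlike proc means, it prints nothing by default.

```sas
/* proc means prints its results by default. */
proc means data=sashelp.class n mean std min max;
    class sex;
    var height weight;
run;

/* proc summary accepts the same statements but prints only
   when asked to with the PRINT option. */
proc summary data=sashelp.class print n mean;
    class sex;
    var height weight;
run;

/* proc freq: a one-way table of sex and a two-way table of sex by age. */
proc freq data=sashelp.class;
    tables sex sex*age;
run;
```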
An important difference between proc means and proc summary is that, by default, the former prints its results to the listing file, while the latter prints nothing and is instead used to write a data set of summary statistics through an output statement. The latter is thus useful for implementing the “split, apply, combine” pattern. We revisit the “roof types” problem in example 6.
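A minimal split-apply-combine sketch with proc summary, again on SASHELP.CLASS: the class statement splits by sex, mean= applies the statistic, and the output statement combines the results into one table. The nway option keeps only the per-sex rows and drops the overall row.

```sas
/* Split by sex, apply mean, combine into one output table. */
proc summary data=sashelp.class nway;
    class sex;
    var height weight;
    output out=work.class_means mean=mean_height mean_weight;
run;

/* The result is an ordinary data set, so print it explicitly. */
proc print data=work.class_means;
run;
```

The output table also carries the automatic variables _TYPE_ and _FREQ_; drop them with a (drop=_type_ _freq_) data set option on out= if you do not need them.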
You can download all of the examples and data shown below as a tarball from www-personal.umich.edu/~jbhender/sas_day2.tgz.
Compressed data can be read using a filename statement with a pipe, as in this example.
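A sketch of the pipe approach follows. The gzip-compressed file name is hypothetical, and the three variables read are assumed to be the first three columns of the file. gunzip -c writes the uncompressed text to standard output, and SAS reads that stream as if it were a plain file; this requires that your SAS session allow shell commands (the XCMD option).

```sas
/* Hypothetical compressed file; gunzip -c decompresses to stdout. */
filename recsgz pipe 'gunzip -c ./data/recs2009_public.csv.gz';

data recs_small;
    infile recsgz dsd dlm=',' firstobs=2;   /* skip the header row */
    /* Assumed first three columns; remaining fields are ignored. */
    input DOEID :8. REGIONC :8. DIVISION :8.;
run;
```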
In example 7 above, we used “modified list input” to import (compressed) delimited data. The colon “:” is placed between a variable name and an optional informat; it tells SAS to apply the informat but to stop reading the value at the next delimiter. Note that the informats end in a period “.” as before.
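The small sketch below isolates the effect of the colon, using made-up example values read from inline datalines. With the colon, :$12. reads station_a and stops at the comma; without it, $12. would try to read exactly 12 characters (station_a,MI) and run past the delimiter.

```sas
/* Made-up example values, for illustration only. */
data stations;
    infile datalines dsd dlm=',';
    input station :$12. state :$2. elev :8.;
    datalines;
station_a,MI,280
stn_b,OH,195
;
run;

proc print data=stations;
run;
```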
A fixed-width file uses position rather than delimiters to separate variables into columns. Fixed-width files can be read into SAS using an infile statement followed by an input statement that specifies the columns and variable names.
In the example below, we use “column input” to import weather station data in fixed-width format.
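A sketch of column input follows. The file name and the column layout are hypothetical; take the real positions from the data file's documentation. In column input each variable is followed by its start and end columns, and a $ marks character variables.

```sas
/* Hypothetical layout: station ID in columns 1-11, latitude in 13-20,
   longitude in 22-30, elevation in 32-37. */
data stations;
    infile './data/stations.txt' truncover;   /* hypothetical file */
    input id $ 1-11 latitude 13-20 longitude 22-30 elevation 32-37;
run;
```

The truncover option keeps SAS from jumping to the next line when a record is shorter than the last column requested.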
Please review Professor Shedden’s example. You will need this for the problem set.
Complex analyses can be carried out using multiple data steps. In the example below, we use the RECS data to find, for each census region, the percent of single-family homes whose electricity usage is more than one standard deviation above the mean.
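The outline below is one possible multi-step sketch, not the course solution. It assumes mylib.recs with variables REGIONC (census region), TYPEHUQ (housing type, with codes 2 and 3 taken to mean single-family detached and attached) and KWH (annual electricity use), compares each home to its own region's mean and standard deviation, and ignores the sampling weights.

```sas
libname mylib './data';   /* hypothetical library location */

/* Step 1: single-family homes only (assumed TYPEHUQ codes 2 and 3). */
data sf;
    set mylib.recs(keep=REGIONC TYPEHUQ KWH);
    where TYPEHUQ in (2, 3) and KWH is not missing;
run;

proc sort data=sf;
    by REGIONC;
run;

/* Step 2: mean and standard deviation of KWH within each region. */
proc summary data=sf nway;
    class REGIONC;
    var KWH;
    output out=region_stats(drop=_type_ _freq_) mean=kwh_mean std=kwh_sd;
run;

/* Step 3: attach the regional statistics and flag high users (1/0). */
data flagged;
    merge sf region_stats;
    by REGIONC;
    high = (KWH > kwh_mean + kwh_sd);
run;

/* Step 4: the mean of a 0/1 flag is a proportion. */
proc summary data=flagged nway;
    class REGIONC;
    var high;
    output out=region_pct(drop=_type_ _freq_) mean=prop_high;
run;

data region_pct;
    set region_pct;
    pct_high = 100 * prop_high;
run;

proc print data=region_pct;
run;
```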
SAS has a procedure, proc sql, which allows you to write SQL-like queries within SAS. This can be more efficient than equivalent programs built from multiple proc and data steps.
The example below from Professor Shedden’s notes uses proc sql to find all single-family homes with ‘heating degree days’ above 2000.
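For comparison, here is a sketch of such a query (not Professor Shedden’s code). It assumes mylib.recs, single-family homes coded as TYPEHUQ 2 or 3, and heating degree days stored in a variable named HDD65; verify these names and codes against the codebook. Note that proc sql ends with quit rather than run.

```sas
libname mylib './data';   /* hypothetical library location */

/* Assumed: TYPEHUQ 2 or 3 = single-family, HDD65 = heating degree days. */
proc sql;
    create table cold_sf as
    select DOEID, REGIONC, HDD65
    from mylib.recs
    where TYPEHUQ in (2, 3) and HDD65 > 2000;
quit;
```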
In the next example, we use proc sql to repeat the analysis of finding the “States” with the highest proportion of wood-shingled roofs.
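A sketch of one way to write that query follows. Several details are assumptions to check against the codebook: reporting domains are taken from REPORTABLE_DOMAIN, ROOFTYPE code 2 is taken to mean wood shingles, non-positive codes are treated as “not applicable”, and proportions are weighted by NWEIGHT.

```sas
libname mylib './data';   /* hypothetical library location */

/* Weighted share of wood-shingled roofs per reporting domain. */
proc sql;
    create table wood_share as
    select REPORTABLE_DOMAIN,
           sum(case when ROOFTYPE = 2 then NWEIGHT else 0 end)
               / sum(NWEIGHT) as p_wood
    from mylib.recs
    where ROOFTYPE > 0          /* drop 'not applicable' codes */
    group by REPORTABLE_DOMAIN
    order by p_wood desc;
quit;

/* Top five domains by share of wood-shingled roofs. */
proc print data=wood_share(obs=5);
run;
```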
Exercise: Use proc sql to carry out the analysis from example 9.