About SAS

SAS is both a programming language and a collection of data analysis routines. It is closed-source commercial software widely used in industry. For instance, SAS promotional materials claim 83,000 installations, including most of the top 100 companies from the Fortune 500. It is also quite popular in biostatistics and in the healthcare industry.

SAS is primarily a declarative rather than an imperative language. In other words, you tell SAS what you wish to accomplish and let the program figure out how to accomplish it.

Another feature of SAS is that it is designed to work efficiently with data on disk rather than in RAM, unlike R or Stata.

SAS programs

A SAS program typically consists of two types of code blocks:

  • data steps create and manipulate data tables
  • proc steps carry out some analytic or data management procedure.

SAS also lets you define macros and macro variables. As in Stata, macros in SAS work through string substitution.
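
As a minimal sketch of this structure, the program below pairs one data step with one proc step. The table name work.toy and its values are made up for illustration and do not come from the course examples.

```sas
/* data step: build a small table from inline data (values are invented) */
data work.toy;
  input id group $ score;
  datalines;
1 a 3.5
2 a 4.0
3 b 2.5
;
run;

/* proc step: print the table to the listing output */
proc print data=work.toy;
run;
```

Each step ends with run;, and each statement within a step ends with a semicolon.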

Accessing SAS

You have several options for accessing SAS for learning and assignments.
All examples shown in class will use SAS in batch mode as that is the way I primarily use it.

Batch Mode

You can use SAS in batch mode on GreatLakes (from within the campus network):

ssh greatlakes.arc-ts.umich.edu 
module load sas
sas example0.sas -log example0.log

When we run SAS in batch mode, the SAS program has extension .sas. After running this file, a .log file will be created with the code run and messages from the SAS program, and any displayed or printed output will end up in a file with extension .lst. These are all plain text files, so you can view them with a pager such as less.

Command Line Mode

You can also use SAS in an interactive “line” mode. To do this on the SCS servers, invoke SAS with the -nodms option:

sas -nodms

Some procedures, such as proc import, attempt to create an additional window. This will cause an error if graphical forwarding is not set up. To prevent this you can add the -noterminal option when invoking SAS at the command line.

sas -nodms -noterminal

To exit this command line interface, use the statement endsas;.

Graphical User Interfaces

SAS offers a free “University Edition” for academic use.

You can also access SAS using midesktop through the UM computing service.

Resources

Examples

Several of our examples are based on Professor Shedden’s 2016 course notes.

All of the examples discussed below can be found at the git repo Stats506_F20 under examples/sas. To run the examples, you will need to download data to the examples/sas/data folder yourself.

Writing A Basic SAS program

This video explains the basics of a SAS program and how to write one using SAS Studio.

Here are some key points to keep in mind:

  • Most SAS programs are composed of data and proc steps.
  • SAS statements are delimited by a semicolon ;.
  • A run; statement tells SAS to execute a block of code.
  • After code is run, a log file contains information about its execution, including any errors. You may wish to think of this as containing “messages” and “warnings” as we think of them in R.
  • The role of the “data output” window in the video is played by a “listing” (.lst) file in batch mode.
  • SAS statements are not case sensitive.
  • SAS is primarily a declarative language.

Importing Data

Delimited data

In example 0 we import delimited data using a data step with an infile statement to parse a file and an input statement to specify the formats. We then run the contents and print procedures to examine the data set created.

Next, we use proc import to import a comma delimited copy of the 2009 RECS data and again explore it using proc print and proc contents.

You can read more about formats for SAS variables here.
Note that character formats are preceded by $ and that every format name ends with a period (.). For numeric formats, the period can be followed by an integer d giving the number of decimal places.
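
The following is a rough sketch of the data step approach, not a copy of example 0. It reads comma-delimited records from inline datalines rather than an external file so that it is self-contained. The variable names and values are invented.

```sas
data work.homes;
  /* dsd treats consecutive commas as a missing value */
  infile datalines delimiter=',' dsd;
  /* the colon modifier applies an informat (how a value is read)
     while still stopping at the delimiter */
  input id state :$2. sqft built :mmddyy10.;
  /* a format (how a value is displayed) for the date variable */
  format built date9.;
  datalines;
1,MI,1850,06/15/1998
2,OH,2400,11/02/2005
3,MI,,03/20/2011
;
run;

/* examine variable types and formats, then the rows themselves */
proc contents data=work.homes;
run;

proc print data=work.homes;
run;
```

To read from a file instead, the infile statement would name a path in quotes in place of datalines.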

Fixed-width files

Another file format frequently used with SAS is a “fixed-width” file.
Here, rather than using a delimiter to separate columns, each column has a standard or fixed width. In example 1 we read a fixed-width file using a data step with an input and an infile statement.

The example, as posted, has several messages about invalid data in the log file.
Can you figure out how to resolve these?
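
For orientation, a fixed-width read can be sketched with column input, where each variable is followed by the column range it occupies. This is an illustration with invented data and column positions, not the layout used in example 1.

```sas
data work.fixed;
  infile datalines;
  /* id in columns 1-3, state (character) in 4-5, sqft in 6-10 */
  input id 1-3 state $ 4-5 sqft 6-10;
  datalines;
001MI 1850
002OH 2400
003MI  975
;
run;

proc print data=work.fixed;
run;
```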

File name pipes

In Professor Shedden’s notes, you can find a filename statement which uses a “pipe” to read data in a compressed format.

SAS export format

You have previously encountered the open XPT format for NHANES data.
Please see Professor Shedden’s notes for how to reference this file type within SAS.

Libraries

SAS uses a binary format, sas7bdat, for native data storage on disk. SAS also uses the concept of "libraries", similar to how schemas are used in SQL. The default library is named WORK and is set up in a temporary directory.
You create handles for libraries using a libname statement.

In example 2, we create a library handle mylib and save the RECS data to it after importing.

In example 3, we create a data table recs referencing the RECS data in .sas7bdat format downloaded from the EIA site. Note the additional metadata it contains relative to the version imported from CSV (by comparing the outputs from proc contents).
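
A minimal sketch of the library mechanism follows. It assumes a writable directory ./data relative to where SAS is started; both the path and the table mylib.toy are placeholders rather than the names used in examples 2 and 3.

```sas
/* assumption: ./data exists and is writable; adjust the path as needed */
libname mylib './data';

/* a two-level name (library.table) stores the table in that library,
   here as ./data/toy.sas7bdat */
data mylib.toy;
  input id score;
  datalines;
1 3.5
2 4.0
;
run;

/* the stored table can be referenced the same way in later programs */
proc contents data=mylib.toy;
run;
```

Unlike tables in WORK, a table saved this way persists after the SAS session ends.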

Subsetting data

In example 4 we create rural and urban subsets of the RECS data and save them to our library using data steps.

We then use a data step to find the last 5 rows of the recs data as imported from csv or read natively from sas7bdat to compare.
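
One way to write such steps is sketched below; example 4 may do this differently. The table work.homes, the variable ur, and its codes U and R are assumptions made for illustration.

```sas
data work.homes;
  input id ur $ sqft;
  datalines;
1 U 1850
2 R 2400
3 U 975
4 R 1600
5 U 2100
;
run;

/* one pass over the data, routing each row to one of two output tables */
data work.urban work.rural;
  set work.homes;
  if ur = 'U' then output work.urban;
  else if ur = 'R' then output work.rural;
run;

/* keep the last 2 rows: nobs= stores the row count in n,
   and _n_ counts iterations of the data step */
data work.tail;
  set work.homes nobs=n;
  if _n_ > n - 2;
run;

proc print data=work.tail;
run;
```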

Descriptive Statistics

There are several procedures useful for obtaining descriptive statistics.

In example 5 we explore proc tabulate.

In example 6 we explore proc means, proc summary, and proc freq.

Split, apply, combine

An important difference between proc means and proc summary is that the former computes output to be printed to the listing file while the latter constructs a table of summary statistics. The latter is thus useful for implementing the “split, apply, combine” pattern of grouped aggregation. (There is an output statement in proc means that can be used to produce both.)

In example 7 we look for the state(s) with the highest proportion of wood-shingle roofs among single-family homes using proc summary.

Some notes about proc summary:

  • when we use a class statement observe that we see both group totals and an overall total differentiated by _TYPE_;
  • to use a by statement, the data must be sorted first;
  • when we use a by statement, we do not get an overall total.
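
The notes above can be illustrated with a small sketch on invented data (the table work.homes and its variables are not from the RECS examples).

```sas
data work.homes;
  input id state $ sqft;
  datalines;
1 MI 1850
2 OH 2400
3 MI 975
4 OH 1600
5 MI 2100
;
run;

/* class: the output has an overall row (_TYPE_ = 0)
   and one row per state (_TYPE_ = 1) */
proc summary data=work.homes;
  class state;
  var sqft;
  output out=work.by_class mean=mean_sqft;
run;

proc print data=work.by_class;
run;

/* by: the data must be sorted first, and there is no overall row */
proc sort data=work.homes out=work.homes_sorted;
  by state;
run;

proc summary data=work.homes_sorted;
  by state;
  var sqft;
  output out=work.by_by mean=mean_sqft;
run;

proc print data=work.by_by;
run;
```

Adding the nway option to the proc summary statement in the class version keeps only the rows for the finest grouping, which drops the overall total.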

In example 7, we also make use of proc format to create a format state that we later use to print nice values for the REPORTABLE_DOMAIN variable. By specifying a format library, we add those values to a SAS format catalog, a file with extension .sas7bcat. In a later example we will reuse those formats after setting the fmtsearch option to include this format library.
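
A sketch of this workflow, under the assumption that ./data is a writable directory, is shown here. The codes and labels are placeholders and are not the actual RECS reportable domains.

```sas
/* assumption: ./data exists and is writable */
libname mylib './data';

/* store the format in a catalog in mylib (formats.sas7bcat) */
proc format library=mylib;
  value state
    1 = 'Domain A'
    2 = 'Domain B';
run;

/* tell SAS to also look in mylib when resolving format names */
options fmtsearch=(mylib);

data work.toy;
  input reportable_domain n_homes;
  datalines;
1 10
2 20
;
run;

/* the format changes how values are displayed, not how they are stored */
proc print data=work.toy;
  format reportable_domain state.;
run;
```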

Using SQL in SAS

(Note: SAS is being introduced before SQL this year. You may wish to review this section after the SQL notes rather than now.)

SAS has a procedure, proc sql, which allows you to write SQL-like queries within SAS. This can be more efficient than similar programs constructed using multiple proc and data steps.

In example 8, adapted from Professor Shedden’s notes, we use proc sql to find all single family homes in the RECS data with mean ‘heating degree days’ above 2000.

Then in example 9, we use proc sql to repeat the analysis of finding the “States” with the highest proportion of wood-shingled roofs. Following the analysis, we use proc export to write the resulting table to csv.
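
As a rough illustration of the pattern (not the code from examples 8 or 9), the sketch below aggregates invented data by group, filters on the aggregate, and writes the result to csv. The output path ./state_means.csv is an assumption.

```sas
data work.homes;
  input id state $ hdd;
  datalines;
1 MI 6500
2 MI 7000
3 FL 600
4 FL 900
5 OH 5500
;
run;

proc sql;
  /* group means, keeping only groups whose mean exceeds 2000 */
  create table work.state_means as
    select state,
           mean(hdd) as mean_hdd,
           count(*) as n_homes
    from work.homes
    group by state
    having mean(hdd) > 2000
    order by mean_hdd desc;
quit;

/* assumption: the current directory is writable */
proc export data=work.state_means
  outfile='./state_means.csv'
  dbms=csv
  replace;
run;
```

Note that proc sql ends with quit; rather than run;.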

Data step programming

While there are many useful procedures in SAS, custom analyses often involve data manipulations done using multiple data steps. This is often called “data step programming”.

In example 10, we use data step programming along with proc sort and proc summary to find the percent of single family homes within each census region more than one standard deviation above the mean electrical usage for that region using the RECS data.

This example makes use of a technique called “re-merging” to add group-level summary statistics as variables in our data. The basic idea of “re-merging” is the following:

  1. first, compute group-level summary statistics and store in a new table;

  2. then, merge this table (e.g. left join) back into the table it came from using the grouping variable from step 1 to identify common rows.

To merge, note that we use a data step with a merge statement identifying the data sets to be merged and a by statement identifying the variable(s) to join on.
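
The two steps can be sketched as follows on invented data. The table and variable names are assumptions, and example 10 itself may be organized differently.

```sas
data work.homes;
  input id region $ kwh;
  datalines;
1 north 9000
2 north 11000
3 north 16000
4 south 7000
5 south 8000
6 south 15000
;
run;

/* step 1: group-level mean and standard deviation in a new table */
proc summary data=work.homes nway;
  class region;
  var kwh;
  output out=work.region_stats(drop=_type_ _freq_)
    mean=mean_kwh std=sd_kwh;
run;

/* both tables must be sorted by the merge key */
proc sort data=work.homes;
  by region;
run;

/* step 2: re-merge, so every home carries its region's statistics */
data work.flagged;
  merge work.homes work.region_stats;
  by region;
  /* 1 if usage is more than one sd above the regional mean, else 0 */
  high = (kwh > mean_kwh + sd_kwh);
run;

/* the mean of a 0/1 indicator is the proportion of flagged homes */
proc means data=work.flagged mean;
  class region;
  var high;
run;
```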

Example 12 repeats example 10 using the RECS sample weights and a better programming style.

Exercise

Repeat example 10 using proc sql. You can find a solution to this exercise on the course repo (as example11.sas).

Case Study

In the case studies folder, you will find a short case study fitting a linear mixed model to the sleepstudy data from R’s lme4 package. This case study illustrates:

  1. use of proc mixed,
  2. using the ods system to create SAS tables with components from models fit using proc mixed,
  3. use of macro variables using the %let construction,
  4. use of SAS macros, which are similar to user-defined functions in R with the exception that they work by text substitution.
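
To make the last two items concrete, here is a small sketch of a macro variable and a macro. It is unrelated to the case study code and assumes the sashelp.class sample table that ships with SAS is available. The macro name summarize is made up.

```sas
/* a macro variable: &dsn is replaced by the text sashelp.class */
%let dsn = sashelp.class;

/* a macro with one parameter; &var is substituted as text */
%macro summarize(var);
  proc means data=&dsn mean std;
    var &var;
  run;
%mend summarize;

/* each call expands to a complete proc means step */
%summarize(height)
%summarize(weight)
```

Because substitution happens before the code runs, the macro generates SAS code rather than returning a value, which is the main difference from a function in R.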