About SAS

SAS is both a programming language and a collection of data analysis routines. It is closed-source commercial software widely used in industry. For instance, SAS promotional materials claim 83,000 installations including most of the top 100 companies from the Fortune 500. It is also quite popular in biostatistics and the healthcare industry.

As with SQL, SAS is primarily a declarative rather than imperative language. In other words, you tell SAS what you wish to accomplish and let the program figure out how to accomplish it.

Another feature of SAS is that it is designed to work efficiently with data on disk rather than in RAM, unlike R or Stata.

SAS programs

A SAS program typically consists of two types of code blocks:

  • data steps create and manipulate data tables
  • proc steps carry out some analytic or data management procedure.

SAS also has capabilities for defining macros and macro variables, though we will not cover these in detail here.

Examples

Our examples are largely based on Professor Shedden’s 2016 course notes and case studies.

Accessing SAS

You have several options for accessing SAS for learning and assignments.
All examples shown in class will use SAS in batch mode as that is the way I primarily use it.

Batch Mode

You can use SAS in batch mode on the scs servers by passing the program file to the sas command.

Several versions of SAS are also available on GreatLakes (module load SAS).

When we run SAS in batch mode, the SAS program has the extension .sas. Running this file creates a .log file containing the code that was run along with messages from SAS, and any displayed or printed output ends up in a file with the extension .lst. These are all plain text files, so you can view them with a pager such as less.

Command Line Mode

You can also use SAS in an interactive “line” mode. To do this on the scs servers, invoke SAS with the -nodms option, e.g. sas -nodms.

Some procedures, such as proc import, attempt to create an additional window. This will cause an error if graphical forwarding is not set up. To prevent this you can add the -noterminal option when invoking SAS at the command line.

To exit this command line interface, use the statement endsas;.

Graphical User Interfaces

SAS offers a free “University Edition” for academic use.

You can also access SAS using midesktop through the UM computing service. You will need to figure out the details yourself if you choose this route.

If you visit GreatLakes from a web browser, you can use a graphical SAS interface there.

Resources

Examples

All of the examples shown below can be found at the git repo Stats506_F18 under examples/sas. To run the examples, you will need to download data to the examples/sas/data folder yourself.

Writing A Basic SAS program

This video explains the basics of a SAS program and how to write one using SAS studio.

Here are some key points to keep in mind:

  • Most SAS programs are composed of “data” and “proc” steps.
  • SAS statements are delimited by a semicolon “;”.
  • A “run;” statement tells SAS to execute a block of code.
  • After code is run, a log file contains information about its execution including any errors. You may wish to think of this as containing “messages” and “warnings” as we think of them in R.
  • The role of the “data output” window in the video is played by a “listing” (.lst) file in batch mode.
  • SAS statements are not case sensitive.
  • SAS is primarily a declarative language.
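
As a minimal sketch of these points, the program below has one data step and one proc step. The table name heights and its values are made up for illustration.

```sas
/* A data step: creates the table work.heights from in-line data. */
data heights;
  input name $ height_cm;   /* $ marks name as a character variable */
  datalines;
Ann 162
Bob 175
Cat 158
;
run;

/* A proc step: prints the table (to the .lst file in batch mode). */
proc print data=heights;
run;

/* Statements are not case sensitive, so this step is equivalent. */
PROC PRINT DATA=HEIGHTS;
RUN;
```

Each statement ends with a semicolon, and each run; marks the end of a step to be executed.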

Importing Data

Delimited data

In example 0 we import delimited data using a data step with an infile statement to parse a file and an input statement to specify the formats. We then run the contents and print procedures to examine the dataset created.
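
The following is a small sketch of this pattern, not the course's example 0. To stay self-contained it reads in-line data with infile datalines; for a real file you would replace datalines with a quoted path. The table cars and its values are invented.

```sas
data cars;
  /* dlm sets the delimiter; dsd treats two consecutive delimiters
     as a missing value instead of collapsing them */
  infile datalines dlm=',' dsd;
  /* :$12. reads a character value of up to 12 characters;
     mpg has no informat, so it is read as numeric */
  input make :$12. model :$12. mpg;
  datalines;
Honda,Civic,36
Toyota,Corolla,34
Ford,,28
;
run;

/* Examine the metadata and the rows of the new table. */
proc contents data=cars; run;
proc print data=cars; run;
```

When reading a file that starts with a header row, the infile option firstobs=2 skips that row.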

We then use proc import to import a comma delimited copy of the 2009 RECS data and again explore it using proc print and proc contents.
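
A sketch of such an import is shown below. The file path is an assumption about where you saved the CSV file, so adjust it to your own copy.

```sas
/* Hypothetical path: point datafile= at your own copy of the data. */
proc import datafile='./data/recs2009_public.csv'
    out=recs        /* name of the table to create in WORK */
    dbms=csv        /* comma-delimited input */
    replace;        /* overwrite work.recs if it already exists */
  getnames=yes;     /* take variable names from the first row */
run;

proc contents data=recs; run;

/* obs=5 limits the printout to the first five rows. */
proc print data=recs(obs=5); run;
```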

You can read more about formats for SAS variables here. Note that character formats are preceded by $ and that every format name ends with a period (.). For numeric formats, the period can be followed by an integer d giving the number of decimal places.
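
To see this notation in action, the sketch below writes made-up values to the log using a few formats. A data step named _null_ runs without creating a table.

```sas
data _null_;
  x = 1234.5678;
  s = 'SAS';
  put x 8.2;     /* numeric: total width 8 with 2 decimal places */
  put x 8.;      /* numeric: total width 8 with no decimal places */
  put s $10.;    /* character: width 10 */
run;
```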

Fixed-width files

Another file format frequently used with SAS is the “fixed-width” file.
Here, rather than using a delimiter to separate columns, each column has a standard or fixed width. In example 10 we read a fixed-width file using a data step with infile and input statements.
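
As an illustration separate from example 10, the sketch below reads in-line fixed-width records with column input. The layout (columns 1-4, 5-7 and 8-12) and the values are invented.

```sas
data patients;
  infile datalines;
  /* column input: each variable is read from a fixed range of columns */
  input id $ 1-4 age 5-7 weight 8-12;
  datalines;
A001034070.5
B002071062.0
;
run;

proc print data=patients; run;
```

There is no delimiter in the records; the column ranges alone determine where each value begins and ends.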

The example, as posted, has several error messages about invalid data in the log file. Can you figure out how to resolve these?

File name pipes

In Professor Shedden’s notes, you can find a filename statement which uses a “pipe” to read data in a compressed format.
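
One possible form of such a statement is sketched below; it is not taken from the notes. It assumes a Unix host with gunzip installed, a SAS session permitted to run shell commands, and a hypothetical compressed file path.

```sas
/* Hypothetical path. The quoted command is run by the shell and
   its standard output is read as if it were a file. */
filename recsgz pipe 'gunzip -c ./data/recs2009_public.csv.gz';

/* Copy the first five decompressed lines to the log as a quick check. */
data _null_;
  infile recsgz obs=5 lrecl=32767;
  input;           /* read a line into the input buffer */
  put _infile_;    /* write the raw line to the log */
run;
```

The fileref recsgz can then be used in an infile statement exactly like a fileref for an ordinary file.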

SAS export format

You have previously worked with the open XPT format in Stata. Please see Professor Shedden’s notes for how to reference this file type within SAS.
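
For orientation only, one way to reference a transport file is a libname statement with the xport engine, sketched below. The file name is hypothetical, and the notes may take a different approach.

```sas
/* Hypothetical file name; the xport engine reads XPT transport files. */
libname xptin xport './data/demo.xpt';

/* Copy every table in the transport file into the WORK library. */
proc copy in=xptin out=work;
run;

/* List the tables now in WORK without printing their details. */
proc contents data=work._all_ nods;
run;
```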

Libraries

SAS uses a binary format, sas7bdat, for native data storage on disk. SAS also uses the concept of “libraries”, similar to how schemas are used in SQL. The default library is WORK, which is set up in a temporary directory. You create handles for libraries using a libname statement.

In example 1, we create a library handle mylib and save the RECS data to it after importing.

In example 2, we create a data table recs referencing the RECS data in sas7bdat format downloaded from the EIA site. Note the additional metadata it contains relative to the version imported from CSV.
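
The sketch below shows the general mechanics with a toy table rather than the RECS data. The directory ./data is an assumption and must already exist.

```sas
/* Create a library handle pointing at an existing directory. */
libname mylib './data';

data scores;
  input id score;
  datalines;
1 90
2 85
;
run;

/* A two-level name libref.table writes ./data/scores.sas7bdat. */
data mylib.scores;
  set scores;
run;

/* Read the stored table back into WORK and inspect its metadata. */
data scores_copy;
  set mylib.scores;
run;

proc contents data=mylib.scores; run;
```

Tables written with a one-level name, such as scores, go to the temporary WORK library and are removed when the session ends.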

Subsetting data

In example 3 we create rural and urban subsets of the RECS data and save them to our library using “data” steps.
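
Two common ways to subset in a data step are sketched below on a toy table. The variable ur and its codes 'U' and 'R' are assumptions made for this illustration.

```sas
data homes;
  input id ur $;
  datalines;
1 U
2 R
3 U
4 R
;
run;

/* where filters rows as they are read from the input table. */
data urban;
  set homes;
  where ur = 'U';
run;

/* A subsetting if keeps a row only when the condition is true. */
data rural;
  set homes;
  if ur = 'R';
run;
```

To save a subset permanently, use a two-level name such as mylib.urban in the data statement.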

We then use a data step to extract the last 5 rows of the RECS data, both as imported from CSV and as read natively from sas7bdat, so that we can compare the two.
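
One way to select trailing rows, which may differ from the course example, uses the nobs= option of the set statement. It is sketched here on a generated table of 20 rows.

```sas
/* Toy table with rows i = 1, ..., 20. */
data numbers;
  do i = 1 to 20;
    output;
  end;
run;

data last5;
  set numbers nobs=n;   /* n holds the number of rows in numbers */
  if _n_ > n - 5;       /* _n_ counts iterations of the data step */
run;

proc print data=last5; run;
```

Here last5 contains the rows with i from 16 to 20.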

Descriptive Statistics

There are several procedures useful for obtaining descriptive statistics.

In example 4 we explore proc tabulate.
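
For a flavor of the syntax, here is a short sketch using sashelp.class, a small table that ships with SAS, rather than the RECS data.

```sas
proc tabulate data=sashelp.class;
  class sex;             /* grouping variable */
  var height weight;     /* analysis variables */
  /* rows: levels of sex; columns: mean and std of each variable */
  table sex, (height weight)*(mean std);
run;
```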

In example 5 we explore proc means, proc summary and proc freq.
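
The sketch below runs each of these procedures on sashelp.class, a small table that ships with SAS. The output table name class_stats is arbitrary.

```sas
/* proc means: prints the requested statistics for each group. */
proc means data=sashelp.class n mean std min max;
  class sex;
  var height weight;
run;

/* proc summary: stores statistics in a table instead of printing.
   autoname builds output names such as Height_Mean. */
proc summary data=sashelp.class;
  var height weight;
  output out=class_stats mean= std= / autoname;
run;

proc print data=class_stats; run;

/* proc freq: cross-tabulates two categorical variables. */
proc freq data=sashelp.class;
  tables sex*age / nocol norow;
run;
```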

Split, apply, combine

An important difference between proc means and proc summary is that the former computes output to be printed to the listing file while the latter constructs a table of summary statistics. The latter is thus useful for implementing the “split, apply, combine” pattern of grouped aggregation. In example 6 we look for the state with the highest proportion of wood-shingle roofs using proc summary.

Some notes about proc summary:

  • when we use a class statement observe that we see both group totals and an overall total;
  • to use a by statement, the data must be sorted first;
  • when we use a by statement, we do not get an overall total.
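
These notes can be checked with a short sketch on sashelp.class, a small table that ships with SAS. The output table names are arbitrary.

```sas
/* class: no sorting needed. The output has an overall row
   (_TYPE_ = 0) followed by one row per level of sex (_TYPE_ = 1). */
proc summary data=sashelp.class;
  class sex;
  var height;
  output out=stats_class mean=mean_height;
run;

proc print data=stats_class; run;

/* by: the input must be sorted by the by variable first. */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

/* The output has one row per level of sex and no overall row. */
proc summary data=class_sorted;
  by sex;
  var height;
  output out=stats_by mean=mean_height;
run;

proc print data=stats_by; run;
```

Adding the nway option to the proc summary statement keeps only the group rows when a class statement is used.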

In example 6, we also make use of proc format to create a format state that we later use to print nice values for the REPORTABLE_DOMAIN variable. By specifying a format library, we add those values to a SAS format catalog with extension .sas7bcat. Later, in example 8, we will reuse those formats after setting the fmtsearch option to include this format library.
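
The mechanics can be sketched as follows. The directory ./data and the value labels are placeholders rather than the real REPORTABLE_DOMAIN labels.

```sas
libname mylib './data';    /* hypothetical directory; must exist */

/* library=mylib stores the format in the catalog mylib.formats. */
proc format library=mylib;
  value state
    1 = 'Domain 1 (placeholder label)'
    2 = 'Domain 2 (placeholder label)'
    other = 'Other';
run;

/* Tell SAS to search mylib for formats, then WORK. */
options fmtsearch=(mylib work);

data demo;
  input reportable_domain;
  format reportable_domain state.;   /* attach the format */
  datalines;
1
2
3
;
run;

/* Prints the labels instead of the underlying codes. */
proc print data=demo; run;
```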

Using SQL in SAS

SAS has a procedure, proc sql, which allows you to write SQL-like queries within SAS. This can be more efficient than similar programs constructed using multiple proc and data steps.

In example 7, taken from Professor Shedden’s notes, we use proc sql to find all single family homes in the RECS data with mean ‘heating degree days’ above 2000.
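
The general shape of a filtered, grouped query is sketched below. It uses sashelp.class, a small table that ships with SAS, and is an analogous query rather than the RECS query itself.

```sas
proc sql;
  select sex,
         count(*)     as n,
         mean(height) as mean_height
  from sashelp.class
  where age >= 12                 /* filters rows before grouping */
  group by sex
  having mean(height) > 60;       /* filters groups after aggregation */
quit;
```

Note that proc sql ends with quit; rather than run;, and each query executes as soon as its semicolon is reached.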

Then in example 8, we use proc sql to repeat the analysis of finding the “States” with the highest proportion of wood-shingled roofs. Following the analysis, we use proc export to write the resulting table to csv.
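
A sketch of the same pattern on sashelp.class follows: compute a proportion per group, sort it in descending order, and export the result. The output path is hypothetical.

```sas
proc sql;
  /* The comparison height > 60 evaluates to 1 or 0, so its mean
     is the proportion of rows in the group meeting the condition. */
  create table age_tall as
  select age,
         mean(height > 60) as p_tall
  from sashelp.class
  group by age
  order by p_tall desc;
quit;

/* Write the table to a CSV file (hypothetical output path). */
proc export data=age_tall
    outfile='./age_tall.csv'
    dbms=csv
    replace;
run;
```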

Data step programming

While there are many useful “procs” in SAS, custom analyses often involve data manipulations done using multiple data steps. This is often called “data step programming”. In example 9, we use data step programming along with proc sort and proc summary on the RECS data to find, within each census region, the percent of single family homes whose electrical usage is more than one standard deviation above the mean for that region.

In example 9, we make use of a technique called “re-merging” to add group-level summary statistics as variables in our data. The basic idea of “re-merging” is the following:

  1. first, compute group-level summary statistics and store in a new table;

  2. then, merge this table (e.g. left join) back into the table it came from using the grouping variable from step 1 to identify common rows.

To merge, note that we use a data step with a “merge” statement identifying the datasets to be merged and a “by” statement identifying the variable(s) to join on.
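
The re-merging steps can be sketched on sashelp.class, a small table that ships with SAS, with sex standing in for the grouping variable. This illustrates the technique and is not the code of example 9.

```sas
/* Both the summary and the merge below need data sorted by sex. */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

/* Step 1: group-level statistics stored in a new table. */
proc summary data=class_sorted;
  by sex;
  var height;
  output out=sex_stats(drop=_type_ _freq_)
         mean=mean_height std=sd_height;
run;

/* Step 2: merge the statistics back onto the original rows. */
data class_merged;
  merge class_sorted sex_stats;
  by sex;
  /* 1 if more than one sd above the group mean, else 0 */
  tall = (height > mean_height + sd_height);
run;

/* The mean of the 0/1 indicator is the proportion per group. */
proc means data=class_merged mean;
  class sex;
  var tall;
run;
```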

Exercise

Repeat example 9 using proc sql. You can find a solution to this exercise on the course repo.

Case Study

In the case studies folder, you will find a short case study fitting a linear mixed model to the sleepstudy data from R’s lme4 package. In that case study, we illustrate:

  1. use of proc mixed,
  2. using the ods system to create sas tables with components from models fit using proc mixed,
  3. use of macro variables using the %let construction,
  4. use of sas macros, which are similar to user-defined functions in R.
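
These four pieces might fit together as in the sketch below, which is not the case study code. It assumes a table named sleepstudy already exists in WORK with variables reaction, days and subject; the macro name fit_lmm and the output table names are hypothetical.

```sas
/* A macro variable: &dataset is replaced by its text value. */
%let dataset = sleepstudy;    /* assumed to exist in WORK */

/* A macro: a parameterized block of SAS code. */
%macro fit_lmm(data, response, time, id);
  /* ods output saves the named result tables as SAS tables */
  ods output SolutionF=fixed_effects CovParms=var_components;
  proc mixed data=&data;
    class &id;
    /* the solution option requests the fixed-effects table */
    model &response = &time / solution;
    /* correlated random intercept and slope for each subject */
    random intercept &time / subject=&id type=un;
  run;
%mend fit_lmm;

%fit_lmm(&dataset, reaction, days, subject)

proc print data=fixed_effects; run;
proc print data=var_components; run;
```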