Scientifically Proven Cheatsheet to Becoming a Badass SAS Programmer

Quick Tips before we get started:

When you name a process flow “Autoexec”, SAS Enterprise Guide prompts you to run the process flow when you open the project.
To make it easier to see the log messages associated with the code you submit, you can set an option to hide the generated wrapper code.

A SAS Program is a sequence of steps that perform a specific task. There are two kinds of steps that make up a SAS Program.

DATA Steps

Typically reads data from an input source, processes it, and creates a SAS data set.
Can be used to create new variables

PROC Steps

Generates reports & graphs
Manages data
Sorts data

How to Add Comments in your SAS Programs:

Block Comment: / * */

Can be any length
Can contain internal semi-colons
Cannot be nested
Avoid placing in 1st and 2nd columns

Comment Statement: * … ;

Can begin in columns 1 and 2
Cannot contain internal semi-colons

Three Major Types of Files used in SAS:

Raw data files
SAS data sets
SAS Program files

SAS Libraries

A SAS library is the highest level of organization for information.

You refer to a SAS library by a logical name called a library reference name, or libref.

The libref references a particular physical location that the operating environment recognizes. So, think of the libref as a folder on your computer. Instead of your computer, it’s just a folder on the server.

Defining a Library

LIBNAME libref 'SAS-library' <options>;

Example:

LIBNAME BGlynn '/ServerName/FolderName/SubfolderName/';

It won’t be automatically available in the SAS session. You must assign a libref to a user-created library to make it available.
But once you do, BGlynn will be available for the duration of the current SAS session.

A valid libref must follow several rules:

Must have a length of 1-8 characters
Must begin with a letter or underscore
Must specify an existing folder because the LIBNAME statement does not create a new folder.

Accessing a SAS Library

%LET BellaFolder = /ServerName/FolderName/SubfolderName/;
LIBNAME BGlynn "&BellaFolder";

Any time you reference a macro variable within quotation marks, such as in the LIBNAME statement above, you MUST use double quotation marks.

If you want to create a report that displays the contents of a SAS library, use:

PROC CONTENTS DATA= BGlynn._ALL_;
RUN;

By default, a PROC CONTENTS report includes information about each individual data set in the library called the descriptor portion which can be very long. To remove it, just add nods at the end like:

PROC CONTENTS DATA= BGlynn._ALL_ nods;
RUN;

Databases	SAS	SQL
table	dataset	table
column	variable	attribute
row	observation	tuple

Comparison of Common Terminology used across platforms and programming languages

Data Values

Character	Numeric
Left-aligned	Right-aligned
Missing values: blank	Missing values: period
32,767 bytes is the largest possible length	Default Length: 8 bytes
	*A numeric variable of the default length can hold 16 or 17 significant digits

Symbol(s)	Mnemonic	Definition
=	EQ	equal to
^= -= ~=	NE	not equal to
> <	GT LT	greater than / less than
>= <=	GE LE	greater than or equal to & less than or equal to
	IN	equal to one of a list

Either the symbols or the mnemonic can be used
For IN, you must enclose the list in parentheses () and have quotations for character variables. Either a comma or a space can be used to separate the elements in a list.

Special Where Operators

Mnemonic	Definition
BETWEEN – AND	an inclusive range
WHERE SAME AND	augment a where expression
IS NULL	a missing value
IS MISSING	a missing value
LIKE	matches a pattern

If you are pulling data from a SQL database table, then IS NULL and IS MISSING will produce different results. IS NULL should be used for SQL.

Symbol	Replaces	Ex WHERE Clause	Returns
%	any number of characters	WHERE Name LIKE ‘T%’	Both Tom and Tommy
_	one character	WHERE Name LIKE ‘T_m’	Only Tom

Only Names in the table: Tom, Tommy

Formatting Data Values

FORMAT variable(s) format;

A FORMAT is an instruction that tells SAS how to display data values.
SAS Format Form: <$> format <w>.<d>
- $: indicates a character format and precedes the name of the SAS format
- <w>: where you specify the total format width, including the decimal places and special characters.
- . : the period is required syntax
- <d>: where you specify the number of decimal places in numeric formats

Format	Definition
$w.	writes standard character data
w.d	writes standard numeric data
COMMAw.d	writes numeric values with a comma that separates every 3 digits and a period that separates the decimal fraction
DOLLARw.d	same as above but with a leading $ sign
COMMAXw.d	writes numeric values with a period that separates every 3 digits and a comma that separates the decimal fraction
EUROXw.d	same as above but with a leading € sign

SAS Date Values are a special category of numeric values.
SAS stores date values as the number of days between January 1, 1960 and a specific date.

Format	Stored Value	Displayed Value
MMDDYY6.	11870	070192
MMDDYY8.	11870	07/01/92
MMDDYY10.	11870	07/01/1992
DDMMYY6.	11870	010792
DDMMYY8.	11870	01/07/92
DDMMYY10.	11870	01/07/1992
DATE7.	11870	01JUL92
DATE9.	11870	01JUL1992
WORDDATE.	11870	July 1, 1992
WEEKDATE.	11870	Wednesday, July 1, 1992
MONYY7.	11870	JUL1992
YEAR4.	11870	1992

Creating and Applying User-Defined Formats

To create and apply your own formats, you must use 2 PROC steps:

You use PROC FORMAT to create the user-defined format.
Then you run a FORMAT statement in a PROC PRINT step to apply the format to a specific variable.

When you create a user-defined format, you don’t associate it with a particular variable or data set. Instead, you create it based on values that you want to display differently.

PROC FORMAT;
    VALUE format-name value-or-range1 = 'formatted-value1'
                      value-or-range2 = 'formatted-value2'
                      ...;
RUN;

You can create multiple user-defined formats in the same PROC FORMAT step by simply specifying multiple VALUE statements.

Rules for Constructing a Format Name

Type	Format-name
character	$CTRYFMT
character	$_ST3FMT_
numeric	SALRANGE_FMT
numeric	_SALRANGE

A format name can have a maximum of 32 characters.
The name of a format that applies to character values must start with a $, followed by a letter or underscore.
The name of a format that applies to numeric values must begin with a letter or underscore.
A format name cannot end in a number.
All remaining characters can be letters, underscores, or numbers.
A user-defined format name cannot be the name of a SAS format.
The format name does not end with a period in the VALUE statement. However, when you refer to the format in a FORMAT statement, you will specify the period.

Specifying Ranges of Values Using the Tiers Format

PROC FORMAT;
    VALUE TIERS 20000 - 49999 = 'Tier 1'
                50000 - 99999 = 'Tier 2'
               100000 - 250000 = 'Tier 3';
RUN;

The TIERS format applies only to numeric values.

The value 99,999.87 falls outside of all ranges specified above. The ranges defined assume that the values of salary are stored as whole numbers. To create a set of ranges to include non-whole numbers, use: <-, -<, or <-<

	First Value	Symbol(s)	Last Value
exclude –>	50,000	<-	100,000
	50,000	-<	100,000	<– exclude
exclude –>	50,000	<-<	100,000	<– exclude

You can only use the < symbol to define a non-inclusive range.

Defining a Continuous Range

If you don’t know the highest and/or the lowest values in a range, use the keyword(s): LOW and/or HIGH

PROC FORMAT;
    VALUE TIERS LOW -< 50000 = 'Tier 1'
                50000 -< 100000 = 'Tier 2'
               100000 - HIGH = 'Tier 3';
RUN;

The LOW keyword can be used to define ranges that apply also to character values.

For character values, LOW treats missing values as the lowest possible value. However, this is not true for numeric values.

To Format a Date Like ‘YYYY-MM-DD’

PROC FORMAT;
    PICTURE mydate low-high = '%Y-%0m-%d'
              (datatype = date);
RUN;

Reading SAS Data Sets

Types of files that a DATA Step can read as input data:

SAS data sets
Excel files
Raw data files

To create a new data set from an existing SAS data set, just use a DATA step.

DATA output-SAS-data-set;  

    SET input-SAS-data-set;

RUN;

* Creates *

* Reads *

By default, a SET statement reads all observations and all variables from the input data set sequentially.

Using DROP and KEEP Statements in a DATA step

DATA WORK.subset1;
    SET BGlynn.sales;
    WHERE Country = 'USA;
    DROP Gender DateOfBirth;
    KEEP FirstName LastName Salary;
RUN;

If you use a KEEP statement, you have to write every single variable that you want to keep in the dataset.
Neither the DROP or the KEEP statement will have any effect on the input data set, i.e. BGlynn.sales

How SAS Processes the DATA Step

The Compilation Phase
- SAS scans each DATA step statement for syntax errors.
- Then it converts the program to machine code if no syntax errors are found
- SAS flags the variables to be dropped
- Creates the program data vector (PDV) to hold the current observation
  - PDV is an area of memory where SAS builds one observation
  - Contains 2 automatic variables that can be used for processing but that are not written to the data set as part of an observation
    1. _N_ : is the iteration number of the DATA step
    2. _ERROR_: signals the occurrence of an error that is caused by the data during execution. The default value is 0, which just means there were no errors. When 1 or more error(s) occur, the value will be 1.
- When SAS compiles the SET statement, a slot is added to the PDV for each variable in the data set
The Execution Phase
- SAS creates the descriptor portion of the new data set once the compilation phase is complete
- The descriptor portion supplies: the variable names and attributes such as the type and length

Reading Excel Spreadsheet Data in SAS

The SAS/ACCESS interfaces are each licensed separately but allows your SAS programs to read data from and write data to a third party data source in the same way it reads and writes to a SAS library.

LIBNAME libref <engine> "workbook-name" <options>;
Ex: LIBNAME ExcelData excel "&BellaFolder/sales.xls";

However, if you’re using a client application, such as SAS Enterprise Guide or SAS Studio – the above method will not work. It also will not work if your SAS and Microsoft Office products have non-matching number of bits, where one is 32-bit and the other is 64-bit.
Instead, you can either use the SAS windowing environment or a PROC IMPORT / EXPORT statement.

PROC EXPORT DATA=BGlynn.sales OUTFILE="&BellaFolder.SALES.xlsx" DBMS=XLSX REPLACE; RUN;

It’s important to note that in order for you to open the Excel file, you’ll need to disassociate the libref when you’re finished using it.

LIBNAME ExcelData clear;

Reading Database Data in SAS

LIBNAME libref engine <SAS/ACCESS options>
    USER =
    PASSWORD =
    PATH =
    SCHEMA =;
Ex: LIBNAME oralib oracle user= edu101 pw=pa$$w0rd path=dbms_srv schema=educ;

Reading Raw Data Files in SAS

Specification	Types of Data	Arrangement
list input	standard and/or non-standard	separated by delimited
column input	standard	in columns
formatted input	standard and/or non-standard	in columns

Standard Data	Non-standard Data
data that SAS can read without any special instructions	Dates & Numeric values that include special characters like $

Method 1 for Reading in Raw Data in a Data Step

DATA output-SAS-data-set;
    SET input-SAS-data-set;
RUN;

Method 2 for Reading in Raw Data in a Data Step

DATA output-SAS-data-set;
    INFILE 'raw-data-file-name';
    INPUT specifications;
RUN;

The INFILE statement identifies the physical name and location of the raw data file. You should specify the full path and filename, including the extension.
The INPUT statement describes the arrangement of values in the raw data file and assigns input values to the corresponding SAS variables.

DATA WORK.SALES1;
    INFILE "&BELLAFolder.Sales.csv" dlm=',';
    INPUT EmployeeID FirstName $ LastName $ Gender $ Salary JobTitle $;
RUN;

SAS considers a space or blank to be the default delimiter between values in a delimited raw data file. If your file uses any other delimiter, you’ll need to indicate what it is in the INFILE statement by using the DLM= option as shown above.

Using the Length Statement

Can be used in a DATA step to explicitly define the length of a character variable
8 bytes is the default length

DATA output-SAS-data-set;
     LENGTH variable1 <$> length;
RUN;

PROC SQL	DATA Step
Join tables & produce a report in one step without creating a SAS data set Join tables without presorting the data Specify complex matching criteria	Create multiple data sets
	Direct output to data sets based on which input data set contributed to the observation
	Use FIRST. and LAST. processing as well as DO loops, arrays, and the Hash object
	Perform complex data manipulation

Elements of the PROC SQL Step

There are some differences between the syntax of a PROC SQL step and the syntax of other PROC steps.
For example, when SAS executes a PROC SQL statement, the SQL procedure continues to run until SAS encounters a STEP boundary.
The STEP boundary stops PROC SQL from running and removes it from memory.
A QUIT statement at the end of a PROC SQL step is an explicit boundary.
The beginning of a PROC Step or a DATA Step is also a step boundary.
In a single PROC SQL step, you can include one SELECT or multiple SELECT statements.

PROC SQL;
   SELECT column-1 <, column-2 ...>
   FROM table-1 ...
   <WHERE expression>
   <GROUP BY column-1 <, column-2 ...>>
   <HAVING expression>
   <ORDER BY column-1 <, column-2 ...>>;
   SELECT column-1 <, column-2 ...>
   FROM table-2 ...;
QUIT;

Each SELECT statement creates a separate report.
SAS executes the PROC SQL query as soon as it reaches the semicolon at the end of the SELECT statement.

Scientifically Proven Cheatsheet to Becoming a Badass SAS Programmer

Published by bella.belanich on 05/14/202205/14/2022

DATA Steps

PROC Steps

How to Add Comments in your SAS Programs:

Block Comment: / * */

Comment Statement: * … ;

Three Major Types of Files used in SAS:

SAS Libraries

Defining a Library

Accessing a SAS Library

Data Values

Special Where Operators

Formatting Data Values

Creating and Applying User-Defined Formats

Rules for Constructing a Format Name

Specifying Ranges of Values Using the Tiers Format

Defining a Continuous Range

Reading SAS Data Sets

Using DROP and KEEP Statements in a DATA step

How SAS Processes the DATA Step

Reading Excel Spreadsheet Data in SAS

Reading Database Data in SAS

Reading Raw Data Files in SAS

Method 1 for Reading in Raw Data in a Data Step

Method 2 for Reading in Raw Data in a Data Step

Using the Length Statement

Elements of the PROC SQL Step

0 Comments

Leave a Reply

Career Get Loaded, You're Now a Nerd -- SQL like a Treacle

Teradata SQL – Tip Sheet

Scientifically Proven Cheatsheet to Becoming a Badass SAS Programmer

Published by bella.belanich on 05/14/202205/14/2022

DATA Steps

PROC Steps

How to Add Comments in your SAS Programs:

Block Comment: / * */

Comment Statement: * … ;

Three Major Types of Files used in SAS:

SAS Libraries

Defining a Library

Accessing a SAS Library

Data Values

Special Where Operators

Formatting Data Values

Creating and Applying User-Defined Formats

Rules for Constructing a Format Name

Specifying Ranges of Values Using the Tiers Format

Defining a Continuous Range

Reading SAS Data Sets

Using DROP and KEEP Statements in a DATA step

How SAS Processes the DATA Step

Reading Excel Spreadsheet Data in SAS

Reading Database Data in SAS

Reading Raw Data Files in SAS

Method 1 for Reading in Raw Data in a Data Step

Method 2 for Reading in Raw Data in a Data Step

Using the Length Statement

Elements of the PROC SQL Step

0 Comments

Leave a Reply

Related Posts

Career Get Loaded, You're Now a Nerd -- SQL like a Treacle

Teradata SQL – Tip Sheet