Quick Tips before we get started:
When you name a process flow “Autoexec”, SAS Enterprise Guide prompts you to run the process flow when you open the project.
To make it easier to see the log messages associated with the code you submit, you can set an option to hide the generated wrapper code.
A SAS Program is a sequence of steps that perform a specific task. There are two kinds of steps that make up a SAS Program.
- Typically reads data from an input source, processes it, and creates a SAS data set.
- Can be used to create new variables
- Generates reports & graphs
- Manages data
- Sorts data
How to Add Comments in your SAS Programs:
Block Comment: / * */
- Can be any length
- Can contain internal semi-colons
- Cannot be nested
- Avoid placing in 1st and 2nd columns
Comment Statement: * … ;
- Can begin in columns 1 and 2
- Cannot contain internal semi-colons
Three Major Types of Files used in SAS:
- Raw data files
- SAS data sets
- SAS Program files
A SAS library is the highest level of organization for information.
You refer to a SAS library by a logical name called a library reference name, or libref.
The libref references a particular physical location that the operating environment recognizes. So, think of the libref as a folder on your computer. Instead of your computer, it’s just a folder on the server.
Defining a Library
LIBNAME libref 'SAS-library' <options>;
LIBNAME BGlynn '/ServerName/FolderName/SubfolderName/';
- It won’t be automatically available in the SAS session. You must assign a libref to a user-created library to make it available.
- But once you do, BGlynn will be available for the duration of the current SAS session.
- Must have a length of 1-8 characters
- Must begin with a letter or underscore
- Must specify an existing folder because the LIBNAME statement does not create a new folder.
Accessing a SAS Library
%LET BellaFolder = /ServerName/FolderName/SubfolderName/; LIBNAME BGlynn "&BellaFolder";
- Any time you reference a macro variable within quotation marks, such as in the LIBNAME statement above, you MUST use double quotation marks.
If you want to create a report that displays the contents of a SAS library, use:
PROC CONTENTS DATA= BGlynn._ALL_; RUN;
- By default, a PROC CONTENTS report includes information about each individual data set in the library called the descriptor portion which can be very long. To remove it, just add nods at the end like:
PROC CONTENTS DATA= BGlynn._ALL_ nods; RUN;
|Missing values: blank||Missing values: period|
|32,767 bytes is the largest possible length||Default Length: 8 bytes|
|*A numeric variable of the default length can hold 16 or 17 significant digits|
|^= -= ~=||NE||not equal to|
|> <||GT LT||greater than / less than|
|>= <=||GE LE||greater than or equal to & |
less than or equal to
|IN||equal to one of a list|
- Either the symbols or the mnemonic can be used
- For IN, you must enclose the list in parentheses () and have quotations for character variables. Either a comma or a space can be used to separate the elements in a list.
Special Where Operators
|BETWEEN – AND||an inclusive range|
|WHERE SAME AND||augment a where expression|
|IS NULL||a missing value|
|IS MISSING||a missing value|
|LIKE||matches a pattern|
If you are pulling data from a SQL database table, then IS NULL and IS MISSING will produce different results. IS NULL should be used for SQL.
|Symbol||Replaces||Ex WHERE Clause||Returns|
|%||any number of characters||WHERE Name LIKE ‘T%’||Both Tom and Tommy|
|_||one character||WHERE Name LIKE ‘T_m’||Only Tom|
Formatting Data Values
FORMAT variable(s) format;
- A FORMAT is an instruction that tells SAS how to display data values.
- SAS Format Form: <$> format <w>.<d>
- $: indicates a character format and precedes the name of the SAS format
- <w>: where you specify the total format width, including the decimal places and special characters.
- . : the period is required syntax
- <d>: where you specify the number of decimal places in numeric formats
|$w.||writes standard character data|
|w.d||writes standard numeric data|
|COMMAw.d||writes numeric values with a comma that separates every 3 digits and a period that separates the decimal fraction|
|DOLLARw.d||same as above but with a leading $ sign|
|COMMAXw.d||writes numeric values with a period that separates every 3 digits and a comma that separates the decimal fraction|
|EUROXw.d||same as above but with a leading € sign|
SAS Date Values are a special category of numeric values.
SAS stores date values as the number of days between January 1, 1960 and a specific date.
|Format||Stored Value||Displayed Value|
|WORDDATE.||11870||July 1, 1992|
|WEEKDATE.||11870||Wednesday, July 1, 1992|
Creating and Applying User-Defined Formats
To create and apply your own formats, you must use 2 PROC steps:
- You use PROC FORMAT to create the user-defined format.
- Then you run a FORMAT statement in a PROC PRINT step to apply the format to a specific variable.
When you create a user-defined format, you don’t associate it with a particular variable or data set. Instead, you create it based on values that you want to display differently.
PROC FORMAT; VALUE format-name value-or-range1 = 'formatted-value1' value-or-range2 = 'formatted-value2' ...; RUN;
- You can create multiple user-defined formats in the same PROC FORMAT step by simply specifying multiple VALUE statements.
Rules for Constructing a Format Name
- A format name can have a maximum of 32 characters.
- The name of a format that applies to character values must start with a $, followed by a letter or underscore.
- The name of a format that applies to numeric values must begin with a letter or underscore.
- A format name cannot end in a number.
- All remaining characters can be letters, underscores, or numbers.
- A user-defined format name cannot be the name of a SAS format.
- The format name does not end with a period in the VALUE statement. However, when you refer to the format in a FORMAT statement, you will specify the period.
Specifying Ranges of Values Using the Tiers Format
PROC FORMAT; VALUE TIERS 20000 - 49999 = 'Tier 1' 50000 - 99999 = 'Tier 2' 100000 - 250000 = 'Tier 3'; RUN;
The TIERS format applies only to numeric values.
The value 99,999.87 falls outside of all ranges specified above. The ranges defined assume that the values of salary are stored as whole numbers. To create a set of ranges to include non-whole numbers, use: <-, -<, or <-<
|First Value||Symbol(s)||Last Value|
|exclude –>||50,000||<-<||100,000||<– exclude|
You can only use the < symbol to define a non-inclusive range.
Defining a Continuous Range
If you don’t know the highest and/or the lowest values in a range, use the keyword(s): LOW and/or HIGH
PROC FORMAT; VALUE TIERS LOW -< 50000 = 'Tier 1' 50000 -< 100000 = 'Tier 2' 100000 - HIGH = 'Tier 3'; RUN;
The LOW keyword can be used to define ranges that apply also to character values.
For character values, LOW treats missing values as the lowest possible value. However, this is not true for numeric values.
To Format a Date Like ‘YYYY-MM-DD’
PROC FORMAT; PICTURE mydate low-high = '%Y-%0m-%d' (datatype = date); RUN;
Reading SAS Data Sets
Types of files that a DATA Step can read as input data:
- SAS data sets
- Excel files
- Raw data files
To create a new data set from an existing SAS data set, just use a DATA step.
DATA output-SAS-data-set; SET input-SAS-data-set; RUN;
* Creates *
* Reads *
By default, a SET statement reads all observations and all variables from the input data set sequentially.
Using DROP and KEEP Statements in a DATA step
DATA WORK.subset1; SET BGlynn.sales; WHERE Country = 'USA; DROP Gender DateOfBirth; KEEP FirstName LastName Salary; RUN;
- If you use a KEEP statement, you have to write every single variable that you want to keep in the dataset.
- Neither the DROP or the KEEP statement will have any effect on the input data set, i.e. BGlynn.sales
How SAS Processes the DATA Step
- The Compilation Phase
- SAS scans each DATA step statement for syntax errors.
- Then it converts the program to machine code if no syntax errors are found
- SAS flags the variables to be dropped
- Creates the program data vector (PDV) to hold the current observation
- PDV is an area of memory where SAS builds one observation
- Contains 2 automatic variables that can be used for processing but that are not written to the data set as part of an observation
- _N_ : is the iteration number of the DATA step
- _ERROR_: signals the occurrence of an error that is caused by the data during execution. The default value is 0, which just means there were no errors. When 1 or more error(s) occur, the value will be 1.
- When SAS compiles the SET statement, a slot is added to the PDV for each variable in the data set
- The Execution Phase
- SAS creates the descriptor portion of the new data set once the compilation phase is complete
- The descriptor portion supplies: the variable names and attributes such as the type and length
Reading Excel Spreadsheet Data in SAS
- The SAS/ACCESS interfaces are each licensed separately but allows your SAS programs to read data from and write data to a third party data source in the same way it reads and writes to a SAS library.
LIBNAME libref <engine> "workbook-name" <options>; Ex: LIBNAME ExcelData excel "&BellaFolder/sales.xls";
- However, if you’re using a client application, such as SAS Enterprise Guide or SAS Studio – the above method will not work. It also will not work if your SAS and Microsoft Office products have non-matching number of bits, where one is 32-bit and the other is 64-bit.
- Instead, you can either use the SAS windowing environment or a PROC IMPORT / EXPORT statement.
PROC EXPORT DATA=BGlynn.sales OUTFILE="&BellaFolder.SALES.xlsx" DBMS=XLSX REPLACE; RUN;
It’s important to note that in order for you to open the Excel file, you’ll need to disassociate the libref when you’re finished using it.
LIBNAME ExcelData clear;
Reading Database Data in SAS
LIBNAME libref engine <SAS/ACCESS options> USER = PASSWORD = PATH = SCHEMA =; Ex: LIBNAME oralib oracle user= edu101 pw=pa$$w0rd path=dbms_srv schema=educ;
Reading Raw Data Files in SAS
|Specification||Types of Data||Arrangement|
|list input||standard and/or non-standard||separated by delimited|
|column input||standard||in columns|
|formatted input||standard and/or non-standard||in columns|
|Standard Data||Non-standard Data|
|data that SAS can read without any special instructions||Dates & Numeric values that include special characters like $|
Method 1 for Reading in Raw Data in a Data Step
DATA output-SAS-data-set; SET input-SAS-data-set; RUN;
Method 2 for Reading in Raw Data in a Data Step
DATA output-SAS-data-set; INFILE 'raw-data-file-name'; INPUT specifications; RUN;
- The INFILE statement identifies the physical name and location of the raw data file. You should specify the full path and filename, including the extension.
- The INPUT statement describes the arrangement of values in the raw data file and assigns input values to the corresponding SAS variables.
DATA WORK.SALES1; INFILE "&BELLAFolder.Sales.csv" dlm=','; INPUT EmployeeID FirstName $ LastName $ Gender $ Salary JobTitle $; RUN;
- SAS considers a space or blank to be the default delimiter between values in a delimited raw data file. If your file uses any other delimiter, you’ll need to indicate what it is in the INFILE statement by using the DLM= option as shown above.
Using the Length Statement
- Can be used in a DATA step to explicitly define the length of a character variable
- 8 bytes is the default length
DATA output-SAS-data-set; LENGTH variable1 <$> length; RUN;
|PROC SQL||DATA Step|
|Join tables & produce a report in one step without creating a SAS data set|
Join tables without presorting the data
Specify complex matching criteria
|Create multiple data sets|
|Direct output to data sets based on which input data set contributed to the observation|
|Use FIRST. and LAST. processing as well as DO loops, arrays, and the Hash object|
|Perform complex data manipulation|
Elements of the PROC SQL Step
- There are some differences between the syntax of a PROC SQL step and the syntax of other PROC steps.
- For example, when SAS executes a PROC SQL statement, the SQL procedure continues to run until SAS encounters a STEP boundary.
- The STEP boundary stops PROC SQL from running and removes it from memory.
- A QUIT statement at the end of a PROC SQL step is an explicit boundary.
- The beginning of a PROC Step or a DATA Step is also a step boundary.
- In a single PROC SQL step, you can include one SELECT or multiple SELECT statements.
PROC SQL; SELECT column-1 <, column-2 ...> FROM table-1 ... <WHERE expression> <GROUP BY column-1 <, column-2 ...>> <HAVING expression> <ORDER BY column-1 <, column-2 ...>>; SELECT column-1 <, column-2 ...> FROM table-2 ...; QUIT;
- Each SELECT statement creates a separate report.
- SAS executes the PROC SQL query as soon as it reaches the semicolon at the end of the SELECT statement.