Quick Tips before we get started:

When you name a process flow “Autoexec”, SAS Enterprise Guide prompts you to run the process flow when you open the project.

To make it easier to see the log messages associated with the code you submit, you can set an option to hide the generated wrapper code.

A SAS Program is a sequence of steps that perform a specific task. There are two kinds of steps that make up a SAS Program.

DATA Steps

  • Typically reads data from an input source, processes it, and creates a SAS data set.
  • Can be used to create new variables

PROC Steps

  • Generates reports & graphs
  • Manages data
  • Sorts data

How to Add Comments in your SAS Programs:

Block Comment: / * */

  • Can be any length
  • Can contain internal semi-colons
  • Cannot be nested
  • Avoid placing in 1st and 2nd columns

Comment Statement: * … ;

  • Can begin in columns 1 and 2
  • Cannot contain internal semi-colons

Three Major Types of Files used in SAS:

  1. Raw data files
  2. SAS data sets
  3. SAS Program files

SAS Libraries

A SAS library is the highest level of organization for information.

You refer to a SAS library by a logical name called a library reference name, or libref.

The libref references a particular physical location that the operating environment recognizes. So, think of the libref as a folder on your computer. Instead of your computer, it’s just a folder on the server.

Defining a Library

LIBNAME libref 'SAS-library' <options>;

Example:

LIBNAME BGlynn '/ServerName/FolderName/SubfolderName/';
  • It won’t be automatically available in the SAS session. You must assign a libref to a user-created library to make it available.
  • But once you do, BGlynn will be available for the duration of the current SAS session.
  • Must have a length of 1-8 characters
  • Must begin with a letter or underscore
  • Must specify an existing folder because the LIBNAME statement does not create a new folder.

Accessing a SAS Library

%LET BellaFolder = /ServerName/FolderName/SubfolderName/;
LIBNAME BGlynn "&BellaFolder";
  • Any time you reference a macro variable within quotation marks, such as in the LIBNAME statement above, you MUST use double quotation marks.

If you want to create a report that displays the contents of a SAS library, use:

PROC CONTENTS DATA= BGlynn._ALL_;
RUN;
  • By default, a PROC CONTENTS report includes information about each individual data set in the library called the descriptor portion which can be very long. To remove it, just add nods at the end like:
PROC CONTENTS DATA= BGlynn._ALL_ nods;
RUN;

DatabasesSASSQL
tabledatasettable
columnvariableattribute
rowobservationtuple
Comparison of Common Terminology used across platforms and programming languages

Data Values

CharacterNumeric
Left-alignedRight-aligned
Missing values: blankMissing values: period
32,767 bytes is the largest possible lengthDefault Length: 8 bytes
*A numeric variable of the default length can hold 16 or 17 significant digits
Symbol(s)MnemonicDefinition
=EQequal to
^= -= ~=NEnot equal to
> <GT LTgreater than / less than
>= <=GE LEgreater than or equal to &
less than or equal to
INequal to one of a list
  • Either the symbols or the mnemonic can be used
  • For IN, you must enclose the list in parentheses () and have quotations for character variables. Either a comma or a space can be used to separate the elements in a list.

Special Where Operators

MnemonicDefinition
BETWEEN – ANDan inclusive range
WHERE SAME ANDaugment a where expression
IS NULLa missing value
IS MISSINGa missing value
LIKEmatches a pattern

If you are pulling data from a SQL database table, then IS NULL and IS MISSING will produce different results. IS NULL should be used for SQL.

SymbolReplacesEx WHERE ClauseReturns
%any number of charactersWHERE Name LIKE ‘T%’ Both Tom and Tommy
_one characterWHERE Name LIKE ‘T_m’ Only Tom
Only Names in the table: Tom, Tommy

Formatting Data Values

FORMAT variable(s) format;
  • A FORMAT is an instruction that tells SAS how to display data values.
  • SAS Format Form: <$> format <w>.<d>
    • $: indicates a character format and precedes the name of the SAS format
    • <w>: where you specify the total format width, including the decimal places and special characters.
    • . : the period is required syntax
    • <d>: where you specify the number of decimal places in numeric formats
FormatDefinition
$w.writes standard character data
w.dwrites standard numeric data
COMMAw.dwrites numeric values with a comma that separates every 3 digits and a period that separates the decimal fraction
DOLLARw.dsame as above but with a leading $ sign
COMMAXw.dwrites numeric values with a period that separates every 3 digits and a comma that separates the decimal fraction
EUROXw.dsame as above but with a leading € sign

SAS Date Values are a special category of numeric values.

SAS stores date values as the number of days between January 1, 1960 and a specific date.

FormatStored ValueDisplayed Value
MMDDYY6.11870070192
MMDDYY8.1187007/01/92
MMDDYY10.1187007/01/1992
DDMMYY6.11870010792
DDMMYY8.1187001/07/92
DDMMYY10.1187001/07/1992
DATE7.1187001JUL92
DATE9.1187001JUL1992
WORDDATE.11870July 1, 1992
WEEKDATE.11870Wednesday, July 1, 1992
MONYY7.11870JUL1992
YEAR4.118701992

Creating and Applying User-Defined Formats

To create and apply your own formats, you must use 2 PROC steps:

  1. You use PROC FORMAT to create the user-defined format.
  2. Then you run a FORMAT statement in a PROC PRINT step to apply the format to a specific variable.

When you create a user-defined format, you don’t associate it with a particular variable or data set. Instead, you create it based on values that you want to display differently.

PROC FORMAT;
    VALUE format-name value-or-range1 = 'formatted-value1'
                      value-or-range2 = 'formatted-value2'
                      ...;
RUN;
  • You can create multiple user-defined formats in the same PROC FORMAT step by simply specifying multiple VALUE statements.

Rules for Constructing a Format Name

TypeFormat-name
character$CTRYFMT
character$_ST3FMT_
numericSALRANGE_FMT
numeric_SALRANGE
  • A format name can have a maximum of 32 characters.
  • The name of a format that applies to character values must start with a $, followed by a letter or underscore.
  • The name of a format that applies to numeric values must begin with a letter or underscore.
  • A format name cannot end in a number.
  • All remaining characters can be letters, underscores, or numbers.
  • A user-defined format name cannot be the name of a SAS format.
  • The format name does not end with a period in the VALUE statement. However, when you refer to the format in a FORMAT statement, you will specify the period.

Specifying Ranges of Values Using the Tiers Format

PROC FORMAT;
    VALUE TIERS 20000 - 49999 = 'Tier 1'
                50000 - 99999 = 'Tier 2'
               100000 - 250000 = 'Tier 3';
RUN;

The TIERS format applies only to numeric values.

The value 99,999.87 falls outside of all ranges specified above. The ranges defined assume that the values of salary are stored as whole numbers. To create a set of ranges to include non-whole numbers, use: <-, -<, or <-<

First ValueSymbol(s)Last Value
exclude –>50,000<-100,000
50,000-<100,000<– exclude
exclude –>50,000<-<100,000<– exclude

You can only use the < symbol to define a non-inclusive range.

Defining a Continuous Range

If you don’t know the highest and/or the lowest values in a range, use the keyword(s): LOW and/or HIGH

PROC FORMAT;
    VALUE TIERS LOW -< 50000 = 'Tier 1'
                50000 -< 100000 = 'Tier 2'
               100000 - HIGH = 'Tier 3';
RUN;

The LOW keyword can be used to define ranges that apply also to character values.

For character values, LOW treats missing values as the lowest possible value. However, this is not true for numeric values.

To Format a Date Like ‘YYYY-MM-DD’

PROC FORMAT;
    PICTURE mydate low-high = '%Y-%0m-%d'
              (datatype = date);
RUN;

Reading SAS Data Sets

Types of files that a DATA Step can read as input data:

  1. SAS data sets
  2. Excel files
  3. Raw data files

To create a new data set from an existing SAS data set, just use a DATA step.

DATA output-SAS-data-set;  

    SET input-SAS-data-set;

RUN;

* Creates *

* Reads *

By default, a SET statement reads all observations and all variables from the input data set sequentially.

Using DROP and KEEP Statements in a DATA step

DATA WORK.subset1;
    SET BGlynn.sales;
    WHERE Country = 'USA;
    DROP Gender DateOfBirth;
    KEEP FirstName LastName Salary;
RUN;
  • If you use a KEEP statement, you have to write every single variable that you want to keep in the dataset.
  • Neither the DROP or the KEEP statement will have any effect on the input data set, i.e. BGlynn.sales

How SAS Processes the DATA Step

  1. The Compilation Phase
    • SAS scans each DATA step statement for syntax errors.
    • Then it converts the program to machine code if no syntax errors are found
    • SAS flags the variables to be dropped
    • Creates the program data vector (PDV) to hold the current observation
      • PDV is an area of memory where SAS builds one observation
      • Contains 2 automatic variables that can be used for processing but that are not written to the data set as part of an observation
        1. _N_ : is the iteration number of the DATA step
        2. _ERROR_: signals the occurrence of an error that is caused by the data during execution. The default value is 0, which just means there were no errors. When 1 or more error(s) occur, the value will be 1.
    • When SAS compiles the SET statement, a slot is added to the PDV for each variable in the data set
  2. The Execution Phase
    • SAS creates the descriptor portion of the new data set once the compilation phase is complete
    • The descriptor portion supplies: the variable names and attributes such as the type and length

Reading Excel Spreadsheet Data in SAS

  • The SAS/ACCESS interfaces are each licensed separately but allows your SAS programs to read data from and write data to a third party data source in the same way it reads and writes to a SAS library.
LIBNAME libref <engine> "workbook-name" <options>;
Ex: LIBNAME ExcelData excel "&BellaFolder/sales.xls";
  • However, if you’re using a client application, such as SAS Enterprise Guide or SAS Studio – the above method will not work. It also will not work if your SAS and Microsoft Office products have non-matching number of bits, where one is 32-bit and the other is 64-bit.
  • Instead, you can either use the SAS windowing environment or a PROC IMPORT / EXPORT statement.
PROC EXPORT DATA=BGlynn.sales OUTFILE="&BellaFolder.SALES.xlsx" DBMS=XLSX REPLACE; RUN;

It’s important to note that in order for you to open the Excel file, you’ll need to disassociate the libref when you’re finished using it.

LIBNAME ExcelData clear;

Reading Database Data in SAS

LIBNAME libref engine <SAS/ACCESS options>
    USER =
    PASSWORD =
    PATH =
    SCHEMA =;
Ex: LIBNAME oralib oracle user= edu101 pw=pa$$w0rd path=dbms_srv schema=educ;

Reading Raw Data Files in SAS

SpecificationTypes of DataArrangement
list inputstandard and/or non-standardseparated by delimited
column inputstandardin columns
formatted inputstandard and/or non-standardin columns
Standard DataNon-standard Data
data that SAS can read without any special instructionsDates & Numeric values that include special characters like $

Method 1 for Reading in Raw Data in a Data Step

DATA output-SAS-data-set;
    SET input-SAS-data-set;
RUN;

Method 2 for Reading in Raw Data in a Data Step

DATA output-SAS-data-set;
    INFILE 'raw-data-file-name';
    INPUT specifications;
RUN;
  • The INFILE statement identifies the physical name and location of the raw data file. You should specify the full path and filename, including the extension.
  • The INPUT statement describes the arrangement of values in the raw data file and assigns input values to the corresponding SAS variables.
DATA WORK.SALES1;
    INFILE "&BELLAFolder.Sales.csv" dlm=',';
    INPUT EmployeeID FirstName $ LastName $ Gender $ Salary JobTitle $;
RUN;
  • SAS considers a space or blank to be the default delimiter between values in a delimited raw data file. If your file uses any other delimiter, you’ll need to indicate what it is in the INFILE statement by using the DLM= option as shown above.

Using the Length Statement

  • Can be used in a DATA step to explicitly define the length of a character variable
  • 8 bytes is the default length
DATA output-SAS-data-set;
     LENGTH variable1 <$> length;
RUN;
PROC SQLDATA Step
Join tables & produce a report in one step without creating a SAS data set
Join tables without presorting the data
Specify complex matching criteria
Create multiple data sets
Direct output to data sets based on which input data set contributed to the observation
Use FIRST. and LAST. processing as well as DO loops, arrays, and the Hash object
Perform complex data manipulation

Elements of the PROC SQL Step

  • There are some differences between the syntax of a PROC SQL step and the syntax of other PROC steps.
  • For example, when SAS executes a PROC SQL statement, the SQL procedure continues to run until SAS encounters a STEP boundary.
  • The STEP boundary stops PROC SQL from running and removes it from memory.
  • A QUIT statement at the end of a PROC SQL step is an explicit boundary.
  • The beginning of a PROC Step or a DATA Step is also a step boundary.
  • In a single PROC SQL step, you can include one SELECT or multiple SELECT statements.
PROC SQL;
   SELECT column-1 <, column-2 ...>
   FROM table-1 ...
   <WHERE expression>
   <GROUP BY column-1 <, column-2 ...>>
   <HAVING expression>
   <ORDER BY column-1 <, column-2 ...>>;
   SELECT column-1 <, column-2 ...>
   FROM table-2 ...;
QUIT;
  • Each SELECT statement creates a separate report.
  • SAS executes the PROC SQL query as soon as it reaches the semicolon at the end of the SELECT statement.

0 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

x  Powerful Protection for WordPress, from Shield Security
This Site Is Protected By
Shield Security