Google SAS Search

Add to Google

Tuesday, March 29, 2011

Data Step Hooks

Here is something to keep in mind when using the END= option on the set statement: There is no guarantee you will hit the end of file.

Simple example to illustrate:

data test;
do i = 1 to 10;

data _null_;
set test(where=(i > 10)) end= eof;
if eof
then put "It set EOF for end of file";

In the SAS documentation this is stated cryptically:
Restriction: END= cannot be used with POINT=. When random access is used, the END= variable is never set to 1.

If only it were true that END= can not be used. It can be used. It just might not work as you assumed it would. Consider the above data step where i > 5. In that case it does set eof to 1 as expected. Try where i < 5. Strangely, it does set eof to 1 even though we have never reached the end of file. SAS just "knows" that is has reached the logical end of the file. Which makes you think that reading 0 obs from the file should set eof to 1. But as we saw, it doesn't.

What actually happens is the data step finishes executing as soon as the set statement fails to read another record. So even if it did set eof to 1, it would never reach the if statement to execute it.

Unfortuately, there is no good way (that I know of) to run a bit of code at the end of file, even if 0 obs are read. You could toss the where clause and use a subsetting if statement. But then you are doing a lot of useless data step IO.

What would be be sweet is if SAS provided hooks into the data step. Two useful ones would be post-compile/pre-execute and post-execute. Maybe use special named labels?

Something like:

data _null_;
set test(where=(i > 10)) end= eof;

* In a super awesome world, the code in this label
would ALWAYS execute no matter if the set
statement reads anything or not;
* This would eliminate a lot of the IF _N_ = 1 silliness;

* In a doubly super awesome world, the code in this label
would execute at the end of the data step's life;
* No matter how many observations were read.;