Data Steps: 2008

Thursday, August 21, 2008

Thanks San Diego

Thank you to everybody at the San Diego SAS User's Group. I had a great time meeting everyone and presenting.

The paper and code can be found on my site at:
http://www.pelicanprogramming.com/sgf08/sgf08.html

I will also put the power point up when I get a chance.

Thanks again for hosting me as a speaker! And thanks to Kirk Paul Lafler of Software Intelligence Corp for letting me use your laptop so I didn't have to do any projector fiddling.

Wednesday, August 06, 2008

A Great Blog From A SAS User

I just checked out Ann Maria's Blog from the Julia Group and was so impressed with her writing that I had to share it.

I particulary liked The Dangers and Wonders of Statistics Using SAS post.

Go check it out.

Now.

Wednesday, July 30, 2008

JavaScript Object Notation

I use Javascript quite a bit for my i-Doc site and some of the projects I work on. I really like the language because it's syntax is comfortable and you can do a bunch with it without having to know a lot about it. And getting stuff done quickly is pretty much the whole reason for programming.

But the "looseness" of Javascript can also tempt you to fall into programming habits that don't scale well.

When I first started learning Javascript I approached it as a purely function based language. Probably because I was already familiar with SCL (Screen Control Language) in SAS/AF and that's what I likened Javascript to. But anyways, all my JS code looked like this:


function doSomething(someVal) {
  var someLocal;
  // do some stuff
  return rValue;
}

function doSomethingElse(someVal) {
  var someLocal;
  // do some other stuff
  return rValue;
}

And I would store it in a file and include the file as a link in my header tags.
This works perfectly well and so I had no incentive to change it. Until I started getting lots of functions in lots of files. It doesn't scale well. But by changing the coding style just a little bit, I am able to write my JS code so it is much easier to maintain. Using JavaScript Object Notation (or JSON) I can fake namespaces. This lets me take more control over the design of my JS code. Using JSON the above would be rewritten:


myNameSpace = {
  doSomething function(someVal) {
     var someLocal;
     // do some stuff
     return rValue;
  } ,
  doSomethingElse function(someVal) {
     var someLocal;
     // do some other stuff
     return rValue;
  }
};

Then when I want to use one of the functions, I just preface it with the object name ( myNameSpace.doSomething(withThis); )
I usually choose the object name to be the same as the name of the javascript file. That way I avoid name collisions, and I can quickly find where a function is defined if I need to look at the source code.

Certainly, this is not a great leap forward in web programming. But I still see so much function-style javascript online that I thought it would be useful to pass it along.

Thursday, July 24, 2008

SAS Macro Nesting

I'd like to share a nifty SAS option that will help tremendously with debugging SAS macros. The mprintNest system option will show nesting information in your log. This is a big improvement over mprint that showed which macro you were in, but made it nearly impossible to tell which macros may have contained the macro call.

With mprintNest you can see exactly where you are in the executing macro stack.

You must use mprintNest with mprint. It cannot be set on it's own.

Wednesday, June 04, 2008

Data Set Sorted By Information

#UPDATE# Please see the comments for a better way to get the sorted by information from a data set! #UPDATE#

The other day a colleague mentioned that it was not possible to get the sort information for a data set from the dictionary tables. Intrigued, I
took up the challenge. While it's true that you cannot get the sorted by information directly, it is possible to get the necessary information
and put it together.

First of all, a little test data set:


data myData;
  input key1 nonKey keyB key3;
cards; 
1 2 6 3
1 5 5 7
2 7 4 4
3 3 5 9
4 5 3 2
1 5 4 6
3 3 9 8
5 5 7 9
6 6 4 3
3 3 3 6
;

I chose the names key1, nonKey, keyB, and key3 to make sure I wasn't getting the variables in alphabetical order
and mistaking it for the sort order.

Now sort the data set:


proc sort data = myData;
  by key3 descending keyB key1;
run;

It is important to make sure this works with descending sorts. Also notice the
variables are in a different order than in the data step so we don't confuse any artifact of creation order with sort order.

And now finally the code that will report the sort order. I used SQL and a data _null_ step to get the information then I just write it to the log. Originally I was just trying to see if it could be done, but just writing it to the log is not the most useful. Now that I've seen it works, I can rewrite it as all macro code using the vtables and %sysfunc() calls. That would allow me to make it a "function" style macro which returns a value to be used within code. Maybe tomorrow or next week...


%macro getSortedByVars(lib=,mem=);
%* This macro will write the sort order of a data set into the log;
%* If there is no sort on the data set it returns a blank;
%* it takes two parameters: the library and the name of the data set;
%* June 2008, Stephen Philp datasteps.blogspot.com/pelicanprogramming.com;
%let lib = %upcase(&lib);
%let mem = %upcase(&mem);

proc sql;
  create table keys
  as select name, sortedBy, case 
                    when (sortedBy<0) then 'DESCENDING'
     when (sortedBy>1) then ' '
      end as prefix,
      case 
                    when (sortedBy<0) then abs(sortedBy)
     else sortedBy
      end as sortOrder
from dictionary.columns
  where libname ="&LIB"    and
        memname = "&MEM" and
  sortedBy ne 0
  order by sortOrder;
quit;

%* now pack up those values from the keys table;
data _null_;
  length value $32767; 
  if 0 then set keys nobs=n;
  do i = 1 to n;
    set keys point=i;
 value = catX(' ', value, prefix, name);
  end;
  call symput('sortedBy',trim(value));
  stop;
run;

%put &sortedBy;
%mend getSortedByVars;

%getSortedByVars(lib=work,mem=mydata);

Friday, May 09, 2008

i-Doc Interactive SAS Documentation

After much hard work, I am happy to announce the arrival of i-Doc interactive SAS documentation. The idea behind i-Doc is to generate SAS documentation from users all over the world. I have started with SAS functions and hope to continue with formats, informats, macro, system options, etc. Eventually I'd like to provide copies in book format for people to keep on their desks.

Please check it out, tell your friends if you find it useful, etc. Currently i-Doc is beta and only works with internet explorer.

i-Doc Interactive SAS Documentation

Thursday, April 17, 2008

SAS SQL Join

One of the most frequent uses of SQL is to join tables. And since SAS data sets are tables, there is good reason to learn SQL. But a lot of SAS programmers will shy away from learning SQL because they are already familiar with merging in the data step. Here are three additional reasons to use a SQL join instead of a data step merge. But first two little sample data sets to use for the examples:


data sales;
  length var $5;
  do id_num = 1 to 10;
    var = 'left';
    output;
  end;
run;

data contracts;
  length var $5;
  input id_no var $;
cards;
3 right
2 right
5 right
6 right
1 right
;

Okay, now here's the three reasons!

1) No need to sort the data beforehand.
This one is pretty self explanatory. If the tables are not sorted by the variables you are joining on, SQL will take care of it.

2) You can join on different variables names.
In a SAS data step merge, you have to merge by a variable or variables that are identical in each data set. With SQL, you can join on variables with different names as long as the values match up. So instead of something like this:


proc sort data = contracts;
  by id_no;
run;
data together;
  merge sales( in=a )
        contracts( in=b rename=(id_no = id_num) );
  by id_num;
  if a;
run;

You can use sql:


proc sql;
  create table together
  as select * from sales as a 
              left join contracts as b
              on a.id_num = b.id_no;
quit;

3) You are warned if there are overlapping variables.
In a traditional data step merge, you have to be very mindful of overlapping variables-- or variables that are shared by the data sets but are not part of the by statetement. If there are overlapping variables, the last data set named on the merge statement to contribute to the observation gets to deliver the resulting value. But there is no warning in the log letting you know values have been overwritten. SQL will let you know that the variable already exists, but it uses the first value, not the last.

If you are already pretty familiar with merging in the data step then you may find some of the SQL syntax a little strange. Most merges are of the "if a;" and "if a and b;" variety. Those are the best starting points for getting used to the equivalent SQL syntax.
The left join we used in the above example is equivalent to "if a;" data step syntax. This is an "outer" join since we are asking for values that don't belong to both sets. For the more restrictive "if a and b;" merge, use an inner join in SQL:


proc sql;
  create table together
  as select * from sales as a 
              inner join contracts as b
              on a.id_num = b.id_no;
quit;

It is called an inner join because you only want values that belong to both sets. For the data step syntax "if not (a and b);" you use a full outer join in SQL. It is called a full outer join because you want everything that is not contained in both sets.


proc sql;
  create table together
  as select * from sales as a 
              full join contracts as b
              on a.id_num = b.id_no;
quit;

Hopefully that is enough to get you started if you are interested in SQL. If I get some time I will put together and post a little cheat sheet of Venn diagrams to illustrate the inner/outer join concepts.

Update: I just got some Venn diagrams up in a new post. Hope you find it useful!
www.sascoders.com/2010/06/how-to-get-what-you-want-out-of-data.html

Wednesday, March 26, 2008

The Cats() Function

One of my favorite new functions is the cats() function available in SAS v9. It is a compress() like function in that it removes leading and
trailing blanks, but it also concatenates the results. CATS() stands for concatenate and strip. Basically the cats() function takes this
type of assignment statement:


key = trim(left(firstName)) || trim(left(lastName)) || 
      trim(left(phone)) || trim(left(zip)) ;

and changes it to this:


key = cats(firstName, lastName, phone, zip);

Generally, the cats() function will return a value with a length of 32767 in the data step. So as always, it's a good idea to use
a length statement on the variable you are assigning to. In this example I might use something like:


length key $200;

The cats() function belongs to a family of cat() functions each doing it's own version of concatenate: cat(), cats(), catt(), catx().

Coming up in version 10, the dog() family of functions! :)

Tuesday, March 18, 2008

Blogging At the SAS Global Forum

So far so good at The SAS Global Forum.

I haven't had too much time to go to a lot of talks, but I attended a really informative one yesterday morning. Judy Loren from Health Dialog Analytic Solutions gave a good talk on using data step hash objects. One little code tidbit that caught my eye was this loop construct:

do i = 1 by 1 until( some criteria or i > 1000 );

It is a nice shorthand way to create a loop sentinel variable and update it instead of:

i = 0;
do until( some criteria or i > 1000 );
i + 1;

Nifty! You never know what little gems you can pick up from the Global Forum. Thanks Judy Loren.

Mostly I have been busy preparing for my own talk tomorrow morning (and enjoying St Paddy's Day on the riverwalk :).

I uploaded the paper and some accompanying code files in case you are interested.

More later!

Thursday, March 13, 2008

SAS Global Forum 2008

Howdy Ya'll!

I will be travelling to San Antonio, TX for SAS Global Forum 2008. I am not really there to "blog" the conference, but I will do my best to take some pictures and share my thoughts on different things I see there. Mostly I plan on wandering around, getting some free swag, meeting as many people as I can and having a good time. Oh, and I am presenting a paper "SAS Macros: Beyond the Basics" Wednesday morning at 10am :)

If you happen to see me there, please step forward and introduce yourself!

Thursday, February 07, 2008

Excel Text Wrap

I often work with data sets that have large text columns. Invariably, the long text fields almost always are related to customer complaint data. It seems we are at our most eloquent when we have something to complain about!

Anyways, I often end up pushing these data sets with large text columns into Excel. And Excel takes the text column and sizes it to only show some of the text. To remedy this, I will first set the width of the column to something manageable, then highlight the column in Excel and choose Format->Cells. Next choose the alignment tab and put a check mark next to Wrap Text. This will tell Excel to auto-wrap the text in the column so it can all be shown.

Like most SAS programmers, I end up doing at least some of my data/presentation work in Excel. I know enough to get most things I need done, however I certainly don't know Excel as well as I know SAS. In an effort to increase my productivity in Excel I ask you to send me any tips you have. After a few weeks, I will put them all together and share them on this blog.

Please send your Excel Productivity Tips For SAS Programmers to stephen at pelicanprogramming dot com. Please include your name and the city where
you live.

Thanks!

Tuesday, January 29, 2008

New Macro IN Operator

Has anyone gotten the new IN Macro operator to work? I just want to test it out and SAS Macro keeps coughing up an error that a character operand is found where a numeric operand is required.

According to the documentation, you can use the # character or the mnenomic IN. Their example is A#B C D E.

I am trying to test it with:


%macro test;
  %if A#B C D E A %then %put it works;
%mend test;
%test;

Pretty straightforward as far as I can tell. Am I overlooking something obvious?

Data Steps

Google SAS Search