Tuesday, December 20, 2011
Fat Fingers
I just looked at some code I had written a few months ago. It categorized things based on a bunch of criteria. At the end of the data step I output the categories that did not fit nicely into a definition bucket:
if bucket in(12,11,10,7,6,5,4) then output junk;
But then I remembered that I also wanted to see if any of them went past all the logic and came up with a missing bucket:
if bucket in(12,11,10,7,6,5,4.) then output junk;
Of course, YOU can see that I forgot the last comma between the 4 and the period. SAS read 4. as the number 4, so missings weren't actually included. I didn't see it till today. Doh!
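For reference, the intended statement with the missing value as its own list item might look like the sketch below. The buckets data set and its values are made up for illustration; only the IN list mirrors the code in this post.

```sas
/* hypothetical test data: a few bucket values plus one missing */
data buckets;
   do bucket = 4, 8, 12, .;
      output;
   end;
run;

data junk;
   set buckets;
   /* the comma before the period makes missing a separate list item */
   if bucket in (12, 11, 10, 7, 6, 5, 4, .) then output junk;
run;
```

With the comma in place, the row with the missing bucket should land in junk along with the listed values, while bucket 8 is left out.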
Tuesday, December 06, 2011
Thursday, December 01, 2011
Little Utility Macro
A lot of times when I am working with interactive SAS, I find myself staring at a SAS date that has not been formatted. The quickest way to see the actual date was to go to my "scratch" enhanced editor and write a quick data _null_ to put it to the log. That was before I realized that macros can be invoked from the command line.
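%macro date(d);
   %* keep r local so it cannot overwrite a macro variable of the same name elsewhere;
   %local r;
   %* putN applies the mmddyy10. format to the raw SAS date value;
   %let r = %sysfunc( putN(&d,mmddyy10.) );
   %put &r;
%mend date;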
Now I just put this little guy into my autoexec, and voila! a full 40 seconds of my life saved!
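A quick usage sketch, assuming the %date macro above has already been compiled in the session. The first call builds its input with mdy() so the month, day and year are known up front; the second passes the same date as the bare number you would see in an unformatted column.

```sas
/* 01DEC2011 as a raw SAS date value */
%let d = %sysfunc(mdy(12,1,2011));
%date(&d)

/* the same date typed in as the unformatted number */
%date(18962)
```

Both calls should write 12/01/2011 to the log.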
Speaking of saving time, did you know that you can follow my job site on twitter? If you had, you would have seen the newest posting for a SAS BI Developer needed in Maryland $120K/yr.
Just go to www.sasCoders.com and click the twitter button underneath the banner.
Wednesday, November 16, 2011
Remove Formatting From Variables
What do you do if you have a SAS data set with formatted variables, but you don't have access to the format? You have to remove the format from the variables if you want to work with the data set.
The easiest way to remove formats from variables in a data set is to use proc datasets.
Assume I have a data set named Responses in which some variables carry formats that no longer exist:
proc datasets library = myLibrary memtype=data;
   modify responses;
   attrib _all_ format=;
run;
quit;
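If the goal is only to read the data without changing the stored attributes, the NOFMTERR system option is another route: SAS then falls back to default formats instead of stopping with an error. A minimal sketch, reusing the myLibrary and responses names from above (both assumed to exist on your system):

```sas
/* do not treat a missing format as an error */
options nofmterr;

proc print data = myLibrary.responses(obs=5);
run;

/* check which formats are still attached to the variables */
proc contents data = myLibrary.responses;
run;
```

Once the proc datasets step has been run, proc contents should no longer list the missing formats.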
Wednesday, November 09, 2011
I have been following some of the recent talk going on the blogosphere about R and SAS.
R vs SAS/SPSS in Corporations: A view from the other side
She is correct that it is nearly impossible to get large organizations to give up their proprietary software. And she does a great job explaining why. So should the SAS Institute be worried about R?
First let's take a moment to point out something that is often overlooked when people compare open-source to proprietary software. Open-source projects on their own are usually relatively lame imitations of their proprietary cousins (let's be honest here!). However(!!) open-source becomes incredibly powerful when it is combined with other open-source projects.
Consider Linux: nice little OS. MySQL: fun hobby data base. Apache: a very patchy server. PHP: Easy to code, a nightmare to maintain. On their own, none of them are world-changing. But combine them into a LAMP stack and Boom! You are a web developer powerhouse:
PHP 77% of web sites!
Apache serving 80% of the internet!
What this tells me is that R on its own is probably not a big deal for the SAS Institute to compete with. But R combined with the right open-source projects could potentially be explosive.
I've been thinking that a lot of the SAS I do could be replaced with:
Ruby/Ruby On Rails
R
Git
PostgreSQL
And then on my (shameless plug!)
Description:
What you’ll do:
Help to build out data warehouse.
Identify and analyze trends that are important to the business.
Build tools to surface key metrics and reports for the company.
Write ETL scripts to funnel data from various sources.
Layout, optimize and maintain distributed schemas.
What you’ll need to be familiar with:
SQL
Ruby
Vertica or Other Distributed Databases
Other things we use:
Scala
Ruby on Rails
RESTful Services
PostgreSQL
R
Git
Apparently others are thinking the same. So should the SAS Institute be worried? I honestly don't know. But as a consultant I do know what I will be doing this weekend: installing Ubuntu on an old laptop and setting up a Git, Ruby, Rails, R (GRRR!) development environment to play around with.
Wednesday, November 02, 2011
SAS Research Programmer/Analyst
SAS Research Programmer/Analyst job in Worcester, Massachusetts. Research Analysts/Programmers provide meaningful contributions by creating, managing, and analyzing large and complex data files on health care utilization. The Institute’s faculty conducts local and multi-site research studies on important public health problems with funding from the National Institutes of Health, the Agency for Healthcare Research and Quality, the Centers for Disease Control, and private foundations such as the Robert Wood Johnson Foundation.
sasCoders.com
Tuesday, November 01, 2011
New SAS Job Site
Well, I finally got my new site up and running. It is a job site specifically for SAS programmers looking for SAS jobs in the US.
I switched the www.sasCoders.com URL from this blog to the new site yesterday. Hopefully it won't be too confusing to people while Google updates their index. This blog can always be reached at http://datasteps.blogspot.com
If you are a recruiter, please hop on over to www.sasCoders.com and create some job listings. The first three are free!
One of the hardest parts of getting a web site going is bootstrapping the content (job postings in this case). A lot of times what sites do is host content from affiliate sites to make it look like they have online traction. But this is frustrating for the users because when they click on the job it will send them off to jobs-network, or career-network, or jobs-ring, or some other affiliate site. You end up feeling like they're less interested in showing you the job than they are in harvesting your email address.
I thought about using affiliate content, but one of the things that I hope will set www.sasCoders.com apart from the others, is no affiliate listings. Cheap affiliate listings make it too easy for someone, anyone to create a job (that may or may not actually exist) and then pay .02 for every person that gets fooled into providing their email address when they apply. No thanks.
So I decided to launch the site "empty" and rely on time/perseverance/luck to get content. Did I mention the first 3 are free? :)
Please do check out the site, and any feedback is always appreciated. Either in the comments here or stephen at sas coders dot com.
Wednesday, October 12, 2011
Efficiently Drop/Keep SAS Data Set Columns
What is the most efficient way to drop/keep columns (variables) in sas tables (data sets)?
For the most part, we would correctly say "using a keep= option on the data set as it is being read into the current step." A quick example to illustrate:
data someData;
   set myData(keep= x y z);
run;
proc sort data = myData(keep= x y z) out= someData;
   by x y;
run;
In fact, I even wrote a whole paper on this for SUGI a few years back:
Programming with the KEEP, RENAME and DROP Data Set Options
However, what if you didn't have access to modify the code? You need to create a separate step just to keep/drop variables. What's the most efficient way to do it? You could create another data step and use the keep= option on the set statement:
data myData;
   * just dropping some variables....;
   set myData(keep=x y z);
run;
That approach will work, but it's not very efficient. In fact, it's pretty horribly inefficient. The data set is read one observation at a time, which is usually IO intense. And IO is a big efficiency suck. Plus the single data step actually creates a copy of myData as it's being processed. After the step is finished, the temporary copy replaces the original data set.
As a general rule, the most efficient way to move SAS data sets around is to copy them. Copying is usually more efficient than reading them one observation at a time because the copy can use a better copy buffer resulting in less IO. This hints that we should avoid the data step and use a procedure that can copy the data.
Here is what I came up with:
* a little test data set;
data myData;
   do i = 1 to 5;
      a = 'blah blah blah';
      b = 'foo';
      c = 'bar';
      x = 'some data ';
      y = 'lovely data';
      z = 4;
      output;
   end;
run;
proc datasets;
   change myData= tempData;
run;
   append base=work.myData
          data = tempData(keep = x y z)
          force;
run;
   delete tempData;
quit;
As you can see, we are using proc datasets instead of a data step. First we are renaming the data set to something temporary (the CHANGE statement). This is a super cheap operation since only the data set header (metadata) is being modified. Then we append (copy) the temporary data set onto the non-existent original data set. Along the way we tell SAS exactly which variables to keep/drop. Because the base data set no longer exists after the rename, the append simply creates it; I include the FORCE option as a safeguard, though it should not strictly be needed here. And then finally we delete the temporary data set. Whew!
I haven't run any benchmarks to see how much time this method would save because I, uh, don't have time. But I'm pretty sure it would be much faster than an extra data step.
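For anyone who does have the time, a rough way to compare the two methods is to turn on the FULLSTIMER system option and read the step timings in the log. This is only a sketch: the copy1 and copy2 table names and the row count are made up, and the numbers will depend heavily on your disks and on how wide the dropped variables are.

```sas
/* write real time, CPU time and memory for each step to the log */
options fullstimer;

/* two identical hypothetical test tables, one per method */
data work.copy1 work.copy2;
   length a b c x y $20;
   do i = 1 to 1000000;
      a = 'blah blah blah'; b = 'foo'; c = 'bar';
      x = 'some data'; y = 'lovely data'; z = 4;
      output;
   end;
run;

/* method 1: the extra data step */
data work.copy1;
   set work.copy1(keep = x y z);
run;

/* method 2: rename, append, delete */
proc datasets library = work nolist;
   change copy2 = tempData;
run;
   append base = work.copy2
          data = work.tempData(keep = x y z)
          force;
run;
   delete tempData;
quit;
```

Compare the real time and CPU time notes for the data step against those for the proc datasets step.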
If you know of another way to accomplish this that is more efficient than a data step, please share!
Thursday, October 06, 2011
Thank You Steve Jobs
Today I am going to dedicate my little slice of this giant internet to Steve Jobs. Like a lot of people around the world, I found out Steve Jobs passed away through an Apple product. His vision and creativity touched all of us.
Rather than me clumsily listing all the inspiring things he accomplished with his short time, I thought it would be better to share some of his own words. If you haven't seen or read his Commencement speech to Stanford from 2005, please take a few moments to do so now. Below is the text of his speech from:
http://news.stanford.edu/news/2005/june15/jobs-061505.html
I am honored to be with you today at your commencement from one of the finest universities in the world. I never graduated from college. Truth be told, this is the closest I've ever gotten to a college graduation. Today I want to tell you three stories from my life. That's it. No big deal. Just three stories.
The first story is about connecting the dots.
I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit. So why did I drop out?
It started before I was born. My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minute that they really wanted a girl. So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him?" They said: "Of course." My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school. She refused to sign the final adoption papers. She only relented a few months later when my parents promised that I would someday go to college.
And 17 years later I did go to college. But I naively chose a college that was almost as expensive as Stanford, and all of my working-class parents' savings were being spent on my college tuition. After six months, I couldn't see the value in it. I had no idea what I wanted to do with my life and no idea how college was going to help me figure it out. And here I was spending all of the money my parents had saved their entire life. So I decided to drop out and trust that it would all work out OK. It was pretty scary at the time, but looking back it was one of the best decisions I ever made. The minute I dropped out I could stop taking the required classes that didn't interest me, and begin dropping in on the ones that looked interesting.
It wasn't all romantic. I didn't have a dorm room, so I slept on the floor in friends' rooms, I returned coke bottles for the 5¢ deposits to buy food with, and I would walk the 7 miles across town every Sunday night to get one good meal a week at the Hare Krishna temple. I loved it. And much of what I stumbled into by following my curiosity and intuition turned out to be priceless later on. Let me give you one example:
Reed College at that time offered perhaps the best calligraphy instruction in the country. Throughout the campus every poster, every label on every drawer, was beautifully hand calligraphed. Because I had dropped out and didn't have to take the normal classes, I decided to take a calligraphy class to learn how to do this. I learned about serif and san serif typefaces, about varying the amount of space between different letter combinations, about what makes great typography great. It was beautiful, historical, artistically subtle in a way that science can't capture, and I found it fascinating.
None of this had even a hope of any practical application in my life. But ten years later, when we were designing the first Macintosh computer, it all came back to me. And we designed it all into the Mac. It was the first computer with beautiful typography. If I had never dropped in on that single course in college, the Mac would have never had multiple typefaces or proportionally spaced fonts. And since Windows just copied the Mac, it's likely that no personal computer would have them. If I had never dropped out, I would have never dropped in on this calligraphy class, and personal computers might not have the wonderful typography that they do. Of course it was impossible to connect the dots looking forward when I was in college. But it was very, very clear looking backwards ten years later.
Again, you can't connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future. You have to trust in something — your gut, destiny, life, karma, whatever. This approach has never let me down, and it has made all the difference in my life.
My second story is about love and loss.
I was lucky — I found what I loved to do early in life. Woz and I started Apple in my parents garage when I was 20. We worked hard, and in 10 years Apple had grown from just the two of us in a garage into a $2 billion company with over 4000 employees. We had just released our finest creation — the Macintosh — a year earlier, and I had just turned 30. And then I got fired. How can you get fired from a company you started? Well, as Apple grew we hired someone who I thought was very talented to run the company with me, and for the first year or so things went well. But then our visions of the future began to diverge and eventually we had a falling out. When we did, our Board of Directors sided with him. So at 30 I was out. And very publicly out. What had been the focus of my entire adult life was gone, and it was devastating.
I really didn't know what to do for a few months. I felt that I had let the previous generation of entrepreneurs down - that I had dropped the baton as it was being passed to me. I met with David Packard and Bob Noyce and tried to apologize for screwing up so badly. I was a very public failure, and I even thought about running away from the valley. But something slowly began to dawn on me — I still loved what I did. The turn of events at Apple had not changed that one bit. I had been rejected, but I was still in love. And so I decided to start over.
I didn't see it then, but it turned out that getting fired from Apple was the best thing that could have ever happened to me. The heaviness of being successful was replaced by the lightness of being a beginner again, less sure about everything. It freed me to enter one of the most creative periods of my life.
During the next five years, I started a company named NeXT, another company named Pixar, and fell in love with an amazing woman who would become my wife. Pixar went on to create the worlds first computer animated feature film, Toy Story, and is now the most successful animation studio in the world. In a remarkable turn of events, Apple bought NeXT, I returned to Apple, and the technology we developed at NeXT is at the heart of Apple's current renaissance. And Laurene and I have a wonderful family together.
I'm pretty sure none of this would have happened if I hadn't been fired from Apple. It was awful tasting medicine, but I guess the patient needed it. Sometimes life hits you in the head with a brick. Don't lose faith. I'm convinced that the only thing that kept me going was that I loved what I did. You've got to find what you love. And that is as true for your work as it is for your lovers. Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven't found it yet, keep looking. Don't settle. As with all matters of the heart, you'll know when you find it. And, like any great relationship, it just gets better and better as the years roll on. So keep looking until you find it. Don't settle.
My third story is about death.
When I was 17, I read a quote that went something like: "If you live each day as if it was your last, someday you'll most certainly be right." It made an impression on me, and since then, for the past 33 years, I have looked in the mirror every morning and asked myself: "If today were the last day of my life, would I want to do what I am about to do today?" And whenever the answer has been "No" for too many days in a row, I know I need to change something.
Remembering that I'll be dead soon is the most important tool I've ever encountered to help me make the big choices in life. Because almost everything — all external expectations, all pride, all fear of embarrassment or failure - these things just fall away in the face of death, leaving only what is truly important. Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose. You are already naked. There is no reason not to follow your heart.
About a year ago I was diagnosed with cancer. I had a scan at 7:30 in the morning, and it clearly showed a tumor on my pancreas. I didn't even know what a pancreas was. The doctors told me this was almost certainly a type of cancer that is incurable, and that I should expect to live no longer than three to six months. My doctor advised me to go home and get my affairs in order, which is doctor's code for prepare to die. It means to try to tell your kids everything you thought you'd have the next 10 years to tell them in just a few months. It means to make sure everything is buttoned up so that it will be as easy as possible for your family. It means to say your goodbyes.
I lived with that diagnosis all day. Later that evening I had a biopsy, where they stuck an endoscope down my throat, through my stomach and into my intestines, put a needle into my pancreas and got a few cells from the tumor. I was sedated, but my wife, who was there, told me that when they viewed the cells under a microscope the doctors started crying because it turned out to be a very rare form of pancreatic cancer that is curable with surgery. I had the surgery and I'm fine now.
This was the closest I've been to facing death, and I hope it's the closest I get for a few more decades. Having lived through it, I can now say this to you with a bit more certainty than when death was a useful but purely intellectual concept:
No one wants to die. Even people who want to go to heaven don't want to die to get there. And yet death is the destination we all share. No one has ever escaped it. And that is as it should be, because Death is very likely the single best invention of Life. It is Life's change agent. It clears out the old to make way for the new. Right now the new is you, but someday not too long from now, you will gradually become the old and be cleared away. Sorry to be so dramatic, but it is quite true.
Your time is limited, so don't waste it living someone else's life. Don't be trapped by dogma — which is living with the results of other people's thinking. Don't let the noise of others' opinions drown out your own inner voice. And most important, have the courage to follow your heart and intuition. They somehow already know what you truly want to become. Everything else is secondary.
When I was young, there was an amazing publication called The Whole Earth Catalog, which was one of the bibles of my generation. It was created by a fellow named Stewart Brand not far from here in Menlo Park, and he brought it to life with his poetic touch. This was in the late 1960's, before personal computers and desktop publishing, so it was all made with typewriters, scissors, and polaroid cameras. It was sort of like Google in paperback form, 35 years before Google came along: it was idealistic, and overflowing with neat tools and great notions.
Stewart and his team put out several issues of The Whole Earth Catalog, and then when it had run its course, they put out a final issue. It was the mid-1970s, and I was your age. On the back cover of their final issue was a photograph of an early morning country road, the kind you might find yourself hitchhiking on if you were so adventurous. Beneath it were the words: "Stay Hungry. Stay Foolish." It was their farewell message as they signed off. Stay Hungry. Stay Foolish. And I have always wished that for myself. And now, as you graduate to begin anew, I wish that for you.
Stay Hungry. Stay Foolish.
Thank you all very much.
Tuesday, August 09, 2011
First One In Gets the Win
Yikes, it's been a while since the last update! So I will try to keep this one short and useful. Most everybody knows there are essentially two ways for tables to be merged in SAS: using the merge statement in the data step and using a join in SQL. Programmers tend to prefer one way over the other, and generally they are interchangeable. However, there are some minor differences that you should keep in mind. One such difference is in how overlapping variables are handled.
Here is a very basic one-to-one merge and its SQL equivalent:
data left;
do i = 1 to 10;
output;
end;
run;
data right;
do i = 1 to 10;
output;
end;
run;
data merged;
merge left(in=a) right(in=b);
by i;
run;
proc sql noprint;
create table joined as
select a.*,b.*
from left a inner join right b
on a.i = b.i;
quit;
Now let's add another variable that is the same on both data sets:
data left;
length overlap $8;
do i = 1 to 10;
overlap = 'left';
output;
end;
run;
data right;
length overlap $8;
do i = 1 to 10;
overlap = 'right';
output;
end;
run;
Now when the two data sets are merged, what value will be in the rows for the overlap variable?
It depends on the order you specify the data sets on the merge statement.
The value comes from the last data set to contribute a record to the merge.
data merged;
merge left(in=a) right(in=b);
by i;
run;
The resulting value for overlap will be 'right' because it is the last one named on the merge statement and each row in left has a match in right.
Would you expect it to work the same way in proc SQL? Of course not! You are a SAS programmer. These types of inconsistencies keep you employed.
proc sql noprint;
create table joined as
select a.*,b.*
from left a inner join right b
on a.i = b.i;
quit;
The value for overlap is 'left' in the joined data set. Opposite of how the data step merge works. I like to remember the SQL rule as: First one in gets the win.
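If you would rather not depend on either rule, one option is to keep both values under different names. The sketch below rebuilds the same left and right data sets so it runs on its own; the overlap_left and overlap_right names, and the merged_both and joined_both outputs, are invented for illustration.

```sas
/* same test data as in the post */
data left;
   length overlap $8;
   do i = 1 to 10; overlap = 'left'; output; end;
run;
data right;
   length overlap $8;
   do i = 1 to 10; overlap = 'right'; output; end;
run;

/* data step: rename on the way in so neither value is overwritten */
data merged_both;
   merge left (rename=(overlap=overlap_left))
         right(rename=(overlap=overlap_right));
   by i;
run;

/* SQL: list the columns explicitly instead of a.*, b.* */
proc sql noprint;
   create table joined_both as
   select a.i,
          a.overlap as overlap_left,
          b.overlap as overlap_right
   from work.left as a inner join work.right as b
     on a.i = b.i;
quit;
```

Both outputs then carry the two values side by side, and you can pick the winner explicitly in a later step.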
Tuesday, March 29, 2011
Data Step Hooks
Here is something to keep in mind when using the END= option on the set statement: There is no guarantee you will hit the end of file.
Simple example to illustrate:
data test;
do i = 1 to 10;
output;
end;
run;
data _null_;
set test(where=(i > 10)) end= eof;
if eof
then put "It set EOF for end of file";
run;
In the SAS documentation this is stated cryptically:
Restriction: END= cannot be used with POINT=. When random access is used, the END= variable is never set to 1.
If only it were true that END= cannot be used. It can be used. It just might not work as you assumed it would. Consider the above data step with where i > 5. In that case it does set eof to 1 as expected. Try where i < 5. Strangely, it does set eof to 1 even though we have never reached the end of file. SAS just "knows" that it has reached the logical end of the file. Which makes you think that reading 0 obs from the file should set eof to 1. But as we saw, it doesn't.
What actually happens is the data step finishes executing as soon as the set statement fails to read another record. So even if it did set eof to 1, it would never reach the if statement to execute it.
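A small sketch of that behavior, using the same test data: a statement placed above the SET runs on the first pass, because the step only stops once the SET tries to read and finds nothing. The message text is made up.

```sas
data test;
   do i = 1 to 10;
      output;
   end;
run;

data _null_;
   /* statements above the SET run on the first pass, before any read is attempted */
   if _n_ = 1 then put "Before the first read";
   set test(where=(i > 10)) end = eof;
   /* not reached when the SET finds no qualifying row */
   if eof then put "It set EOF for end of file";
run;
```

The log should show only the first message.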
Unfortunately, there is no good way (that I know of) to run a bit of code at the end of file, even if 0 obs are read. You could toss the where clause and use a subsetting if statement. But then you are doing a lot of useless data step IO.
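A sketch of that workaround, assuming test itself has at least one row. It uses IF-THEN in place of a true subsetting IF, since a subsetting IF that rejects the last row would skip the eof check as well. The matched counter is made up for illustration.

```sas
data test;
   do i = 1 to 10;
      output;
   end;
run;

data _null_;
   /* read every row and filter inside the step */
   set test end = eof;
   /* per-row work for qualifying rows; the sum statement starts matched at 0 */
   if i > 10 then matched + 1;
   /* reached on the last row whether or not any row qualified */
   if eof then put "Reached end of file: " matched=;
run;
```

The cost is exactly the one mentioned above: every row is read just to reach the end.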
What would be sweet is if SAS provided hooks into the data step. Two useful ones would be post-compile/pre-execute and post-execute. Maybe use special named labels?
Something like:
data _null_;
set test(where=(i > 10)) end= eof;
PRE_EXEC:
* In a super awesome world, the code in this label
would ALWAYS execute no matter if the set
statement reads anything or not;
* This would eliminate a lot of the IF _N_ = 1 silliness;
return;
POST_EXEC:
* In a doubly super awesome world, the code in this label
would execute at the end of the data step's life;
* No matter how many observations were read.;
return;
run;