Data Steps: 2009

Thursday, December 17, 2009

Design

You are at the gym running on a treadmill listening to your iPhone through the earbuds. You're having one of those good, easy runs. Maybe you're listening to an interesting podcast like WNYC's Radio Lab, or some good running music. You're focused. You've got a nice rhythm going and your body feels relaxed. You push the pace a little more than normal. Sweat is dripping off you. Endorphins begin flooding your system. You have declared war on holiday snacks and Christmas dinners. And now on this treadmill you are winning the war. Suddenly your iPhone stops playing. You are pulled out of your running reverie and glance down at your now-beeping phone. Voice control has been activated. You fumble for the controls. You stumble on the treadmill. The pace is too fast for multi-tasking. You slow the pace down on the treadmill and flip the phone back to iPod. The phone takes on a life of its own, pausing and clicking and going back to voice control again. You try to run without, but it's just not the same. Your running rhythm won't return. Everything hurts. You're exhausted. You hit stop on the treadmill and step off defeated.

Apparently there is a design flaw on the iPhone earbuds where moisture causes the little clicker to send random clicker signals to the iPhone.

A design flaw on an Apple product! And Apple is good at design.

So now that I've gotten my preliminary decisions out of the way in the previous post, I can start working on designing my app/site. Luckily, Rails encourages agile development, which is all about procrastinating on difficult design decisions. Rather than try to design your entire project at the beginning, you take a very minimalist approach and assume the details will reveal themselves as you go.

My minimalist design will consist of two pieces: a short paragraph describing the site, and a piece of paper with drawings of the layout of the pages.

The description of the site:
The site will display SAS jobs. Job seekers will be able to search the jobs by zip code. Recruiters will be able to post jobs. The jobs will expire after a certain amount of time. There will also be some admin pages to control Recruiters and the Jobs they post.

Now I'm taking out a piece of paper and drawing some rough sketches of the main pages.... Done.

From the description I see there are three types of users with three main actions: job seekers find, recruiters post and admins admin. I also see that there are only two models that I need in my database: recruiters and job postings.

That's pretty much it for the design. Now I can begin coding.

Friday, December 11, 2009

Some Preliminaries

As I mentioned in an earlier post, I am going to build a new site dedicated to SAS jobs. Before I actually start coding the site, there are some preliminary decisions that need to be made. I have already decided to use Ruby on Rails for the framework. I have just started learning it and so far I like what I have seen. I have also decided on my host already. Although nearly every host says they support hosting RoR, most don't do a very good job. I have test driven heroku.com with another small site I built and they do a great job. They will essentially host everything for free while you mock it up and you can buy more resources if you ever need to. They are also an exclusive RoR host so they have tailored the develop, test, deploy environment to Rails' unique agile nature. Choosing heroku.com as my provider also forces me to use git as my version control. Heroku seamlessly uses git as part of the workflow and it works very nicely.

Now that I have gotten the preliminary decisions out of the way I will be able to develop some simple use cases and models and start coding my app.

Oh, and I will be doing all of the development on my MacBook running OS X with Xcode 3 and using MySQL as the development database.

Below are the books that I am using to help guide me along the learning path. I have all three books on my desk and I can say with confidence that all three are very good.

Books:

Pragmatic Version Control Using Git (Pragmatic Starter Kit)

Agile Web Development with Rails, Third Edition

Programming Ruby 1.9: The Pragmatic Programmers' Guide (Facets of Ruby)

Wednesday, December 09, 2009

Some Fun with SAS and Perl Regular Expressions

This post assumes you have a little understanding of how regular expressions work and specifically how SAS implements regular expressions. I recently did something like this and thought it would be good to share. Suppose you have a program that searches through a big text field for a specific word. That's pretty easy to code and you can even get away with just using a simple indexW() function. The problem is that when you look at the text field on your report, your eyes glaze over as you scan for the word to make sure you are capturing the correct output. If only there were some easy way to make the word stand out from its neighbors.

I used the prxchange() function to search for a pattern and then replace it with another pattern. In this case, I am outputting HTML so I can wrap my search word in <b> tags. First I will give a little example code, then I will break down what the code is doing and finally show some easy improvements. For the sake of clarity and brevity, I am only showing the code that highlights the search word. I am not showing the code that subsets the data based on the search term.

Example 1:


data _null_;
  input text $80.;
  put "The text before matching " text=  ;
  text = prxchange('s/(battery)/<b>$1<\/b>/', -1, text);
  put "The text after matching " text= //;
datalines;
This battery is dead.
Batteries are in the box.
;

Output in the log:
The text before matching text=This battery is dead.
The text after matching text=This <b>battery</b> is dead.


The text before matching text=Batteries are in the box.
The text after matching text=Batteries are in the box.

Looking at the code above, you can see that the only interesting thing happening is the prxchange() function. The prxchange function takes a regular expression as its first argument. The regular expression uses a substitution syntax with a generic look of

s/(something to look for)/numbered capture buffers/.

So in my example above, the word (or pattern really) I am looking for is battery. I put () around it to specify that it's the first capture buffer: $1. Then I wrap $1 with bold tags. You can see I had to escape the / in the closing tag because it is a special regular expression character. So my regular expression is:

s/(battery)/<b>$1<\/b>/

and reads as: look for the pattern 'battery', store it in $1 and substitute it with <b>$1</b>.

The second parameter to the prxchange() function is -1 and just tells the function to keep searching the source, finding and replacing every occurrence till you get to the end of source. The third parameter 'text' just tells the function what text source to search.
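To see what that second parameter does, here is a minimal sketch that runs the same substitution twice on one string, once with 1 and once with -1. The variable names first_only and all_hits and the sample text are just made up for illustration.

data _null_;
  text = 'battery one, battery two';
  /* times = 1: stop after the first replacement */
  first_only = prxchange('s/(battery)/<b>$1<\/b>/', 1, text);
  /* times = -1: keep replacing until the end of the source */
  all_hits = prxchange('s/(battery)/<b>$1<\/b>/', -1, text);
  put first_only= / all_hits=;
run;

With 1, only the first battery should come back wrapped in bold tags; with -1, both should.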

Make sense?

Now there are a couple of things that can easily be added to the regular expression to make the code a little more robust and efficient. First of all, a regular expression that is built from a variable or expression gets recompiled on every loop of the data step. A constant pattern like ours is already compiled only once, but adding the /o option to the end of the regular expression makes that explicit and keeps it true if the pattern later moves into a variable:

s/(battery)/<b>$1<\/b>/o

Also, our regular expression is caSe SensiTive. We can tell it to ignore case by adding the ignore case option (/i) to the end of the regular expression:

s/(battery)/<b>$1<\/b>/oi

Now it will match battery, Battery, BATTERY, etc.

But wait! We also want to match Batteries. What to do? We could shorten our regular expression to:

s/(batter)/<b>$1<\/b>/oi

But that would match batter and batter is a liquid mixture, usually based on one or more flours combined with liquids such as water, milk or beer. That's definitely not what we are looking for. We want to search for batter followed by one or more [a-z] characters:

s/(batter[a-z]+)/<b>$1<\/b>/oi

Now our example code looks like:


data _null_;
  input text $80.;
  put "The text before matching " text=  ;
  text = prxchange('s/(batter[a-z]+)/<b>$1<\/b>/oi', -1, text);
  put "The text after matching " text= //;
datalines;
This battery is dead.
Batteries are in the box.
Do not eat the cookie batter before it is cooked.
;

Output in the log:
The text before matching text=This battery is dead.
The text after matching text=This <b>battery</b> is dead.


The text before matching text=Batteries are in the box.
The text after matching text=<b>Batteries</b> are in the box.


The text before matching text=Do not eat the cookie batter before it is cooked.
The text after matching text=Do not eat the cookie batter before it is cooked.

And finally, you sharp SAS coders probably don't want to hardcode the search term. More likely it would be stored in a variable and then you could construct the regular expression like you would any other text variable:


mySearch = 'batter';
rx = "s/(" ||
      mySearch ||
      "[a-z]+)/<b>$1<\/b>/oi";

Or something like that. Also, you can search for more than one thing. Just enclose each pattern in () and refer to them as $1, $2, etc. Play around with it. Have fun. Thanks for reading!
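Here is a minimal sketch of how that fragment might sit inside a complete data step. The names mySearch and rx come from the snippet above; the length statement, cats() and trim() are my own additions, there so that blank padding in the variables does not end up inside the pattern.

data _null_;
  length mySearch $20 rx $200;
  input text $80.;
  mySearch = 'batter';
  /* cats() strips the trailing blanks from mySearch before gluing the pieces together */
  rx = cats('s/(', mySearch, '[a-z]+)/<b>$1<\/b>/oi');
  /* trim() drops the padding after the closing /oi */
  text = prxchange(trim(rx), -1, text);
  put text=;
datalines;
This battery is dead.
Batteries are in the box.
;

Because of the /o option the pattern is compiled on the first pass only, so changing mySearch in a later iteration would have no effect. Drop the o if the search term really needs to vary from row to row.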

Wednesday, December 02, 2009

Side Projects

A few years ago I created a site that lets users donate their SAS expertise by creating online documentation. Users could put in their own example code, explain potential pitfalls, share tips, etc. I coded it all by hand in Perl with MySQL, JavaScript and HTML. I even included some nifty AJAX for logging in, etc. And it had a trendy name too: iDoc. As in "I document" verb, or "Internet Documentation" noun. After a lot of programming and evenings with O'Reilly books, I felt that it was ready to be released to the world. I tentatively exposed the URL and wrote an introductory email to SAS-L. The response was....

virtual silence.

Ho Hum. Crickets chirping. Nothing. Well, there was one person who railed against my decision to not be cross-browser compatible. Specifically, the site worked well with IE, not so well with others. As all of you who have developed anything more complex than the most generic HTML page know, cross-browser compatibility is a nightmare. To describe it as a pain in the ass is a disservice to donkeys. I digress...

Anyways, the _site_ was a failure. But the _project_ was a success in that it taught me a bunch of stuff that I was able to incorporate into my day-to-day programming. And the things I learned have dovetailed into other side projects.

Fast forward to today. Today I am starting another side project. It will be written in Ruby on Rails. I bought a Rails book two months ago and have written only one little site so far. But I like it. It's clean, it's fast to develop with and it's easy to learn. I don't even know Ruby. I'll be learning that along the way too. I'll try to share what I learn as I go. I'll tag the posts with something like "Side Project" to differentiate them from the normal SAS postings.

The side project I am starting today is a SAS specific job site. I know there are others out there and some good ones too, but I think there might be enough room for one more. And even if the site fails, the time will be well spent learning a new language and platform.

If you could, pop over to the comments and let me know if hearing about a side project written in Ruby on Rails is the least bit interesting to you. As much as I like the idea of sharing what I learn, I don't want to fill a SAS programming blog with a bunch of posts about a topic nobody cares about. Thanks! -s

Thursday, November 19, 2009

Keep and Rename Data Set Options

I always forget which gets applied first when using the keep= and rename= data set options on the same data set. So I thought I'd just put it here so I will remember:

Keep happens before the rename.

Keep happens before the rename.

Keep happens before the rename.

There. Now I won't forget.

A little test code to prove it:


data test;
  length x y z $5;
run;

data test;
  set test(keep=x y z rename=(x=x2));
run;
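
The test above runs cleanly but has nothing to look at in the log. Here is a slightly expanded sketch, assuming some made-up values for x, y and z, that prints the renamed variable so you can see it worked.

/* One observation with real values so the result shows up in the log. */
data test;
  length x y z $5;
  x = 'a'; y = 'b'; z = 'c';
run;

data _null_;
  /* keep= runs first, so it has to use the old name x; rename= then turns x into x2. */
  set test(keep=x y rename=(x=x2));
  put x2= y=;
run;

If you write keep=x2 instead, SAS should complain, because at the time keep= is applied there is no variable called x2 yet.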

Monday, November 16, 2009

{ old=>'datasteps.blogspot.com' , new=>'www.sasCoders.com' }

A few months ago I got an email from my service provider vaguely stating they would no longer exist as a company and I better find an alternative. Yikes! I am still trying to sort everything out, but so far being forced to move has been a good thing. It has given me the impetus to focus, evaluate and expand my little web presence.

One of the tasks I put off for too long was to link this blog to its own unique URL. Apparently having its own URL gives your blog a feeling of gravitas and professionalism. Now I can finally cross that off my checklist. While the new URL is www.sasCoders.com, the old datasteps.blogspot.com will still continue to work. And the name of the blog will still stay the same. Eventually as I sort things out I will probably move it to its own subdomain (datasteps.sasCoders.com). Would you believe datasteps.com is already taken?

Dear Mark Fitzgerald, er, I mean CodePanther: I am assuming you probably registered a bunch of URLs based on existing blogspot.com blogs? In the hopes that I will eventually want that name and contact you to see how much you will sell it for? What a great strategy! Hope it works well for you. Oh wait! Maybe you specifically chose data steps because you wanted to be associated with this super successful blog? Woohoo! Rock on CodePanther!

Anyways, hopefully the switch is seamless and I won't lose any readers or google juice. :)

Tuesday, October 27, 2009

SAS Job Searching

Where do you search for SAS jobs? I have been lucky during the downturn and stayed employed, but I know there are a lot of others out there who haven't been as fortunate. When I did look for a job I generally used dice.com, but that was mainly out of habit more than anything else. Do you have any specific sites that you use? Any tips to share with other SAS programmers looking for work?

Tuesday, September 08, 2009

Telecommuting

If you work in a large city there is a good chance you have to commute to work. It is also likely that your commute time is non-trivial. Here in Los Angeles it is not uncommon for people to spend over an hour in their car every day. No fun!

If you search dice.com for SAS jobs, 1115 jobs are returned. If you tell dice to restrict to telecommuting jobs, a whopping 0 are returned(!). So why isn't telecommuting more of an option? We have laptops, cell phones, secure VPN, high speed internet, Skype, etc. Why hasn't the distributed work force become the norm? As a SAS programmer, do you telecommute? If not, why not?

If you do telecommute, could you share your setup with us? What has worked for you and what hasn't? The more specific you can be the better. If I can get enough feedback I will put something together along the lines of A SAS Programmer Telecommuting/Home Office Best Practices.

Some questions I am thinking of:
Do you have a plan for backing up data? Is it local? There's a lot of really good online backup that is really cheap.
How do you protect your data? Encryption tools?
Do you use a revision control system to track your source code (Git, Subversion)?
How do you connect to servers? VPN, SSH?

Any other issues I am not thinking of?

Wednesday, August 26, 2009

SAS Orphaned Workspace

Sometimes SAS doesn't shut down correctly and you get stuck with orphaned workspace. These orphaned workspace directories should be cleaned out now and again. You can run this code in SAS to find out where your workspace is and then go to the directory and delete any old ones.

data _null_;
  w = getoption('work');
  put w=;
run;

On my windows machine this gets me something like:

w=C:\DOCUME~1\STEPHE~1\LOCALS~1\Temp\SAS Temporary Files\_TD3516

Now I can go to C:\DOCUME~1\STEPHE~1\LOCALS~1\Temp\SAS Temporary Files\ and delete all the old ones.
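
Before deleting anything, it can help to see what is actually sitting in that parent directory. Here is a rough sketch that lists its contents from inside SAS. It only lists, it does not delete. The fileref wkparent and the variable names are made up, and it assumes the work path ends in a directory name rather than a trailing slash.

data _null_;
  length w parent name $260;
  w = getoption('work');
  /* parent directory = the work path minus its last piece and the separator before it */
  parent = substr(w, 1, length(w) - length(scan(w, -1, '/\')) - 1);
  rc = filename('wkparent', parent);
  did = dopen('wkparent');
  if did > 0 then do;
    /* one line in the log per entry in the parent directory */
    do i = 1 to dnum(did);
      name = dread(did, i);
      put name=;
    end;
    rc = dclose(did);
  end;
  else put 'Could not open ' parent=;
run;

Anything in that list other than the last piece of your current work path is a candidate for cleanup, as long as no other SAS session is still using it.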

Note, this is straightforward on a single user machine like my windows laptop, but you have to be more careful in multi-user environments where you don't want to delete active workspaces that others are using.

For multi-user environments like unix there is a utility that SAS provides called cleanwork.

Thursday, July 23, 2009

Gliffy

Occasionally I like to sketch out data extraction flows, database schemas, use cases, etc. I prefer pencil and paper to the industrial strength tools like Visio. But sometimes I need to share my little pictures and mailing my scratch paper just won't cut it. I recently found a great free online site that provides a diagram editor that is easy to use and lets me share with people over the internet.

www.gliffy.com

So far I have only used the free account, but it has worked great.

Wednesday, April 01, 2009

Proc FCMP

A long time ago I figured out how to write functions for SAS using SAS/Toolkit. It was not a straightforward process and not very useful. In fact, someone recently asked me about it and I was unable to share much useful info. However, I did find proc FCMP which does look to be very useful-- if you're into writing your own functions, that is!

Here is some online info:
http://support.sas.com/rnd/base/datastep/user-written-functions.pdf
http://support.sas.com/documentation/onlinedoc/base/91/fcmp.pdf

If you have SAS 9.1 you can use it to write functions that can be used in some procedures, but not within the data step. However, if you have SAS 9.2 you can use them in a data step. And it uses data step syntax so you don't have to know C!
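
To give a flavor of it, here is a minimal sketch of a user-written function, assuming SAS 9.2 or later so it can be called from a data step. The function name c_to_f and the library location work.funcs.demo are made up for the example.

/* Define the function and store it in a package inside a data set. */
proc fcmp outlib=work.funcs.demo;
  function c_to_f(celsius);
    return (celsius * 9 / 5 + 32);
  endsub;
run;

/* Tell SAS where to look for compiled functions. */
options cmplib=work.funcs;

data _null_;
  f = c_to_f(100);
  put f=;
run;

The log should show f=212. Storing the functions in a permanent library instead of work would make them available across sessions.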
