Thursday, June 03, 2010

How To Get What You Want Out of a Data Step Merge

This fall my daughter will be going to kindergarten. So, like all other hyper-attentive parents, we have started introducing her to the concept of homework. The other night I got out her crayons and introduced her to set theory. After about an hour I finally got her to draw her Venn diagrams with the corresponding SAS data step code for a merge. I was so proud I decided to post it here.

In her drawing, data set A is red and B is blue. The shaded area is what's kept.
Disregard the part where she drew herself building a sand castle on the beach. It has nothing to do with Venn diagrams or SAS data step merging code.

What? You don't believe my five year old daughter drew it? :)

Just a quick refresher: the A and B business refers to the temporary variables that are created when you use the IN= data set option. Essentially, the variable A will be "true" (1) whenever the data set it is attached to contributes to the current observation of the merge (or join), and "false" (0) otherwise. Also, I don't know why everyone uses A and B; you can name these variables anything (left|right, paid|due, etc.), but I've always seen A and B, so I will stick with convention.
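If you want to see those flags for yourself, here is a minimal sketch. The sample rows, the variables x and y, and the names flagged, inA, and inB are all made up for illustration; only someData, otherData, and someKey match the names used in this post. Because the IN= variables are temporary and are never written to the output data set, the sketch copies them into ordinary variables so they show up in the printout.

```sas
/* Hypothetical sample data: the values and the x/y variables are invented. */
data someData;
  input someKey x;
  datalines;
1 10
2 20
3 30
;
run;

data otherData;
  input someKey y;
  datalines;
2 200
3 300
4 400
;
run;

/* Both inputs must be sorted by the BY variable (these already are). */
data flagged;
  merge someData(in=A) otherData(in=B);
  by someKey;
  /* IN= variables are dropped from the output automatically, */
  /* so copy them into regular variables to inspect them.     */
  inA = A;
  inB = B;
run;

proc print data=flagged;
run;
```

With this sample data, key 1 should come out with inA=1 and inB=0, keys 2 and 3 with both flags set to 1, and key 4 with inA=0 and inB=1. Those three combinations are the three regions of the Venn diagram.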

data merged;
merge someData(in=A) otherData(in=B);
by someKey;
if A and (not B); * just keep the observations in A that do not match anything in B;