Google SAS Search

Add to Google

Tuesday, October 16, 2007

Saving Steps With SQL

Often we need to create some simple statistics for a set of data and then associate those stats with each observation of the original set. As a simple example
consider a table with only three rows:
N
3
6
4

We want to get the mean of the variable N and stripe it down all the observations:
N Mean
3 4.333
6 4.333
4 4.333

The first way I learned to do this was with a proc summary and a merge. A better way to do it is with proc sql.

Here is a little test data:
data myData;
input x level $;
cards;
11 a
31 a
51 a
2 b
61 a
8 b
21 a
71 a
91 a
4 b
61 a
21 a
5 b
7 b
5 b
31 a
1 b
61 a
8 a
9 b
3 a
2 b
5 a
7 b
7 a
3 b
;
* in that data set we have two variables X and LEVEL. We can get the stats on X for each level by summarizing and merging...;
proc sort data = myData;
by level;
run;

proc summary data = myData;
by level;
var x;
output out= tempStats(drop=_type_ _freq_) mean=mean max=max min=min;
run;

data sumStats;
merge myData tempStats;
by level;
run;

* or better yet, we can collapse the whole thing into one nifty proc sql step!;
proc sql;
create table stats as select *,
min(x) as min,
max(x) as max,
mean(x) as mean
from myData group by level;
quit;

2 comments:

  1. SQL is by far my favourite way of calculating summary stats like that. The major downside (at the moment, at least) is that SQL doesn't have a "median()" function. Or maybe it does in version 9?

    ReplyDelete
  2. Hi Chris,

    Indeed, there is no "real" median() function in SQL-- even in version 9. There is a median function but it works across columns on the same row (ie, median(col1,col2,col3,col4) ) NOT median(row1...rowN). Confusing to be sure!

    ReplyDelete