Google SAS Search

Add to Google

Tuesday, November 29, 2005

fEqual() Compares Floats For Equality

Here's a SAS function I wrote in C using SAS/TOOLKT that addresses the floating point equality problem previously discussed here.

It is pretty straightforward and uses the algorithm based on this SAS TS Note.

%MACRO FUZZCOMP(X,Y,EPS=1E-12);
(ABS(&X-&Y) LE &EPS*MAX(ABS(&X),ABS(&Y)))
%MEND;


You can find it and a couple other functions I have written here.

As an aside, is there any interest out there in knowing how to write user-written functions in C for the SAS System using SAS/TOOLKT? I know SAS/TOOLKT isn't the *sexiest* SAS product (that would be JMP), but it can be quite useful writing specific functions in a lower-level language like C. If there is any interest I could put something formal together for a paper or even just put a tutorial up on the web.

Thursday, November 10, 2005

De-dupe In Excel

A lot of SAS programmers have to deliver data in Excel every once in a while. Sometimes, after you've gotten the data into Excel you find that you need to get rid of duplicates. Here's how to do it in Excel:

1) Be sure your columns are named.

2) Highlight the columns you want to use as your sort key (the ones you would use in your BY statement for PROC SORT).

3) From the drop-down menu go to Data->Filter->Advanced Filter.

4) Excel will automatically select the range you have highlighted. You should see a little checkbox that says "Unique Records Only." Check that.

5) Hit OK.

Wednesday, November 02, 2005

What Every Computer Scientist Should Know About Floating-Point Arithmetic

So I've been thinking about floating point numbers recently. Mostly I've been thinking about comparing floating numbers for _relative_ equality. I know this certainly isn't a new issue for most, especially if you have worked with a "lower" level language like C/C++, but for the average SAS programmer it may come as a surprise that 7.4 may not = 7.4! In fact the rules of real numbers dictate that 7.4 can never = 7.4 since they are both approximations ( or shorthand ) for an infinetly precise number with decimal places stretching from here to Mars and back again ad infinitum.

In the general world we don't really care that much about floating point inequality because our precision, or more specifically lack-of-precision, makes it a moot point.

But in the world of computers and real numbers precision is always an issue. As anyone who has been unfortunate enough to write code such as this has (painfully) learned:


// add .10 cents rebate to the customers account till they have reached
// the rebate maximum
// called by perVisit() function

const float REBATE_MAXIMUM = 2.5; // $2.50 rebate max

void addRebate( Customer &c)
{
if ( c.accumulatedRebate == REBATE_MAXIMUM ) return;
else
{
// add the ten cents to their account and update their accumulated rebate
// so they do not go over
c.account += .1;
c.accumulatedRebate += .1;
}
}

Now who's going to explain to the CEO why all the 3rd quarter revenue got eliminated in massive rebates? GULP.

Hopefully you recognize the error in the above code? Since the values being compared are floats they are not really 2.50 but really something closer to 2.50000000007 or 2.5000000000001 or well _anything_ once you get past the signifigant digits of 2.50.

But the above code is C++ and as a SAS programmer you don't have to worry about those kinds of hairy details right? Try this code from Data Savant Consulting(which has a nice page discussing this very issue):


data _null_;
x = 7.3;
x = x+ 0.1;
y = 7.4;
if x = y then put "Duh! of course they are equal.";
else put "Doooh! " x " and " y " are not equal!!";
run;

Then go read this!
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Or if you don't have the time to grind through that, just remember comparing floats for equality is not usually a good idea.