Google SAS Search

Add to Google

Wednesday, December 21, 2005

Quantum missing

The other day I was pulling into one of my local surfing spots to watch one of the larger swells of the season roll through and I got a call on the cell phone from my wife. After exchanging the usual pleasentries she asked me if I knew how to get the maximum missing value out of a set of missing values. WTF?
For reasons that only really really smart people can comprehend, her organization sometimes uses different "values" of missing to enumerate different reasons for the missing value. Are you with me? Something along the lines of:


select( q1a )
when('ONE') q1s = 1 ;
when('TWO') q1s = 2 ;
when('DK') q1s = .d ;
when('RF') q1s = .r ;
otherwise q1s = . ;
end;


Ostensibly this is used for creating the variables that will eventually be used in statistics.
Cause as missings, they are automatically excluded by proc calculations; but you can still see "why" it's missing. I have to say, I don't know if it's a neat-o trick or a classic abuse of language potential (just cause you CAN do it doesn't mean it's a GOOD idea), but my wife is much more pragmatic than I and didn't care to hear my theoretical musings on storing multiple values in what most of us would consider a single value-- missing. She just wanted to know: did I have an answer?

Well, the max() function returns a missing value if any of it's arguments are missing so that was no good. But as luck would have it the max operator will treat a missing value as a real value. (Which seems quite opposite of the sum() function and + operator. . .) And not only that, but the SAS documentation actually includes an order of missing values. Going smallest to largest:
._
.
.A - .Z

So the simple answer to her complicated question was:

maxMissingValue = m1 <> m2 <> m3 ; 


Of course, I did not know the answer when she called and at the time I was much more interested in the overhead bombs that were breaking outside. I mean, these were some seriously mind numbing big waves. . . So I made some lame joke about how you can't know the missing value until you collapse the missing value potential wave into a singularity event through observation, blah blah blah, uh I have no idea honey.

1 comment:

  1. The example you provide works when the missing values are all on the same obversation. When they are on multiple records you can get the max missing value for each level of your sorting variable by doing the following:
    PROC SORT Data=_inDsn_ Out=_outDsn_ NoDupKey ;
    BY _sortVar_ mainVar ;
    WHERE Missing( mainVar ) ;
    RUN ;

    DATA _outDsn2_ ;
    SET _outDsn_ ;
    BY sortVar ;
    IF ( Last.sortVar ) ;
    RUN ;

    I am a big fan of the Missing() function because of the fact that good old . is not the largest missing numeric value. A dangerous old standby for eliminating missing values is:
    WHERE ( numVar > . ) ;
    Run this on your wife's special missings and they will sneak through. Good if she wants that to happen, bad if she doesn't.
    As far as the value of using the special missing values, I think it has to be "appropriate" in that they exist and I can't think of any other reason for it. A neat thing is that you can assign format strings to each special missing value-I would say it's a big mistake not to. I'd hate to try to figure out what they mean on my own.
    I think that creating a system for these values, on the other hand, is probably tricky. If the ordinal relationships have meaning, as implied by your wife's request, I'd hate to think what she'd have to do if she needed to insert a new level between .A and .B.
    I'd love to see a PROC that would summarize the non-missing values like MEANS and the missing values like FREQ. This would be very useful for missing lab observations which could be assigned formatted values that display as 'Sample not received' 'Sample contaminated' 'Not enough material' 'BLQ'.
    Yeah, I could write a macro but c'mon, SAS, help me out here...

    ReplyDelete