hi ...
data _null_ wrote ... "While I think CMISS will be very handy, I don't think it adds anything to
the solution to this particular problem. There really is no reason to transpose the data when the
MISSING function will suffice to examine each variable at each observation."
well ... I think that it might add one thing, parsimony ... the code comprises only TRANSPOSE+a
short data step ... I suppose one could argue that parsimony+TRANSPOSE is an oxymoron, but if you
limit the definition of parsimony in this instance to a small amount of SAS code, I think this
does add something to the solution (though the posted solution using a couple of formats, ODS
OUTPUT, and PROC FREQ is pretty neat)
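for comparison, the no-TRANSPOSE idea in the quoted note (MISSING applied to each variable at each observation) might look roughly like the sketch below ... this is not the code that was posted, just one way to do it, and it assumes the CLASS data set created in the quoted post further down ... the 100-variable cap on the temporary arrays and the names _VAR, _NMISS, and _N are made up for illustration

```sas
* sketch only: count missing values per variable with MISSING, no TRANSPOSE;
* assumes CLASS has at most 100 numeric and 100 character variables;
data _null_;
  set class end=last;
  array nums{*}  _numeric_;             * numeric variables read from CLASS;
  array chars{*} _character_;           * character variables read from CLASS;
  array nm{100} _temporary_ (100*0);    * running missing counts, numeric;
  array cm{100} _temporary_ (100*0);    * running missing counts, character;
  do i = 1 to dim(nums);
    nm{i} = nm{i} + missing(nums{i});   * MISSING returns 1 or 0;
  end;
  do i = 1 to dim(chars);
    cm{i} = cm{i} + missing(chars{i});
  end;
  if last then do;                      * _N_ is the observation count here;
    do i = 1 to dim(nums);
      _var = vname(nums{i});
      _nmiss = nm{i};
      _n = _n_ - _nmiss;
      put _var= _nmiss= _n=;
    end;
    do i = 1 to dim(chars);
      _var = vname(chars{i});
      _nmiss = cm{i};
      _n = _n_ - _nmiss;
      put _var= _nmiss= _n=;
    end;
  end;
run;
```

the counts are written to the log rather than to a data set, and they should agree with the NMISS and N columns in the quoted output ... it is clearly more code than TRANSPOSE+CMISS, which is the parsimony point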
also ... I forgot that as of V9, the limit on the number of variables is no longer 32,767
(making the TRANSPOSE limit one of time, not data set size)...
http://support.sas.com/kb/8/213.html
so, with ~600,000 observations and 5 variables (Windows XP, SAS 9.2) ...
TRANSPOSE+data step+CMISS approach: REAL TIME about 16 seconds
DATA_NULL_ posted hash table approach: REAL TIME about 4 seconds
DATA_NULL_ posted formats+ODS OUTPUT+PROC FREQ approach: REAL TIME less than 1 second
so, with a large data set, if the definition of parsimony also includes a consideration of
efficiency, I think that we have a clear winner here (no experiments with the definition of
"large" expanded to many variables)
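for anyone who missed that post, the general shape of a formats+ODS OUTPUT+PROC FREQ approach is sketched below ... this is a reconstruction of the technique, not the posted code, and the format names MISSFMT and $MISSFMT and the output data set name FREQS are my own choices

```sas
* sketch only: collapse every variable to Missing / Not Missing with formats;
* MISSFMT, $MISSFMT, and FREQS are illustrative names, not from the posted code;
proc format;
  value  missfmt  .  = 'Missing' other = 'Not Missing';
  value $missfmt ' ' = 'Missing' other = 'Not Missing';
run;

* capture the one-way frequency tables in a data set;
ods output onewayfreqs=freqs;

proc freq data=class;
  * the MISSING option keeps the missing level in each table;
  tables _all_ / missing;
  format _numeric_ missfmt. _character_ $missfmt.;
run;
```

the data is read once and PROC FREQ only has to keep two levels per variable, which is probably why it is so fast ... note that special numeric missing values (.A, ._, etc.) would land in 'Not Missing' with the numeric format as written, and FREQS still needs a short step to tidy it into one row per variable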
I think that the various solutions fit what I call the Kenny Rogers Rule: "...You got to know
when to hold 'em, know when to fold 'em,..."
ps I searched the SAS-L archives and this is the first time the CMISS function has been
mentioned, so maybe making folks with V9.2 aware of the function also "adds something"
--
Mike Zdeb
U@Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
>> On 8/5/08, Mike Zdeb <msz03@albany.edu> wrote:
>> hi ... ps ... since TRANSPOSE is involved, the V9.2 CMISS solution has an observation limit --
>> Mike Zdeb
>>> hi ... if you already have V9.2, the CMISS function will count missing character values (as
>>> NMISS does for numeric), so that reduces the effort ...
>>> * create some missing values;
>>> data class;
>>>   set sashelp.class end=last nobs=obs;
>>>   if _n_ in (3,5,7) then call missing(of _character_);
>>>   if _n_ in (4,8) then call missing(of _numeric_);
>>> run;
>>>
>>> * how many observations;
>>> data _null_;
>>>   call symputx('nv',obs);
>>>   set class (obs=0) nobs=obs;
>>> run;
>>>
>>> *
>>> everything ends up as character in TCLASS
>>> so you need a BLANK for missing
>>> to make CMISS work properly
>>> ;
>>> options missing=' ';
>>>
>>> proc transpose data=class out=tclass;
>>>   var _all_;
>>> run;
>>>
>>> options missing='.';
>>>
>>> data counts;
>>>   set tclass;
>>>   nmiss = cmiss(of col:);
>>>   n = &nv - nmiss;
>>>   drop col:;
>>> run;
>>>
>>> proc print data=counts noobs;
>>> run;
>>>
>>> _NAME_    nmiss     n
>>> Name          3    16
>>> Sex           3    16
>>> Age           2    17
>>> Height        2    17
>>> Weight        2    17
>>>
>>> Mike Zdeb