Re: missing numerical values = - infinity? #4 367832

The main problem revealed in this thread is the non-intuitive
results of applying a comparison operator to missing values
for numeric variables.  There seems to be no major complaint
about how missing values are treated in an arithmetic
expression.

So, what if SAS introduced a new class of comparison
operators, like the below?

  X := Y        X:^= Y
  X :< Y        X:^< Y
  X :> Y        X:^> Y
  X :>=
  X :<=

Let's say all of these comparison operators would generate a
TRUE or FALSE only when both X and Y are not missing.
Otherwise they would result in a missing value.  That is, in
the assignment statements below, Z could be ., 0, or 1.

   Z = X:=Y ;
   Z = X:^=Y ;

This would have no deleterious effect on extant programs, but
would provide a welcome flexibility in dealing with a long-
standing problem.  There would be no need for

   if nmiss(x,y)=0 then result = (X>Y);

Instead you could have:

    Result = (x:>y);
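
For comparison, the three-valued result can already be approximated
with existing functions.  A minimal sketch, assuming a hypothetical
dataset DEMO with numeric X and Y (all names and data values here are
illustrative, not from the thread):

```sas
/* Illustrative data; DEMO, X, Y and Z are hypothetical names */
data demo;
  input x y;
  /* NMISS returns how many of its numeric arguments are missing */
  if nmiss( x, y ) = 0 then z = ( x > y );  /* both present: 0 or 1 */
  else z = .;                               /* either side missing  */
  datalines;
1 2
3 2
. 2
3 .
;
run;

proc print data = demo;
run;
```

Z should come out as 0, 1, ., . for the four rows, which is the
behavior the proposed X :> Y would give in a single expression.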

Introducing these operators into the OP's example, as in

   if birth_weight :< 2500 then low_birth_weight=1;
   else if birth_weight :>= 2500 then low_birth_weight=0;

would result in low_birth_weight = . if birth_weight is missing.
Just what the OP wanted.
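
With today's operators a missing birth_weight compares as smaller
than any number, so a plain < test flags it as low.  A sketch of the
guard that gives the result described above, using made-up data
values (the dataset name BW and the variable NAIVE are assumptions
for illustration):

```sas
data bw;
  input birth_weight;
  /* ordinary comparison: a missing weight satisfies < 2500 */
  naive = ( birth_weight < 2500 );
  /* guarded version: leave the flag missing when the weight is missing */
  if not missing( birth_weight ) then
    low_birth_weight = ( birth_weight < 2500 );
  datalines;
2100
3400
.
;
run;
```

For the three rows, NAIVE should be 1, 0, 1 while low_birth_weight
should be 1, 0, . (missing).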

Is this notion of enough interest to be thought through
more thoroughly, with the possibility of adding it to the
SASware Ballot?


> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Peter Flom
> Sent: Tuesday, December 29, 2009 4:08 PM
> Subject: Re: missing numerical values = - infinity?
> Dale McLerran <stringplayer_2@YAHOO.COM> wrote
> >Peter,
> >
> >I don't like creating additional variables to contain information
> >about the missing values.  It increases file size and complexity.
> >I much prefer having the information about different kinds of
> >missing values coded directly with the variable when the response
> >is missing.
> >
> >Dale
> >
> I can see the point of that.
> I guess it's partly a matter of taste, partly what you are used to
> doing, and partly a question of how often you need these different
> missing values
> Peter
> Peter L. Flom, PhD
> Statistical Consultant
> Website: http://www DOT statisticalanalysisconsulting DOT com/
> Writing;
> Twitter:   @peterflom
Reply mkeintz 12/30/2009 12:10:42 AM

Hi, all,

Thought this code might be of interest, especially with regard to Dan
Nordlund's post with a reference about floating point reps:

------------------------------------------- Code

options noovp noerrorabend nocenter mprint mprintnest ls = 80 ps = 54
        compress = yes spool nosymbolgen nomlogic source source2;
title "SAS v&sysvlong on &sysscpl";

data missings ( keep = x xhex xaddrl );
  /* the 28 missing values: ._ then . then .A through .Z */
  do ade = rank( '_' ), rank( ' ' ), rank( 'A' ) to rank( 'Z' );
    x = input( '.' || byte( ade ), ??best. );
    link output;
  end;
  /* ordinary numbers, most negative to most positive;
     0 is listed because the output below shows it as obs 32 */
  do a = -constant( 'BIG' ), -1, -constant( 'SMALL' ), 0,
         constant( 'SMALL' ), .1, .2, .3, .4, .5, .6, .7, .8, .9, 1,
         2, 5, 9, 10, constant( 'BIG' );
    x = a;
    link output;
  end;
  return;

output:  /* LINK target: capture the 8 bytes stored for x as hex */
  xaddrl = addrlong( x );
  format xaddrl $hex16.;
  xhex = put( peekclong( xaddrl ), $hex16. );
  output;
return;
run;

proc print width = min data = missings;
  var xhex x;
  format x best32.;
run;

---------------------------------- Output

SAS v9.01.01M3P020206 on XP_PRO           01:04 Wednesday, December 30, 2009   1

Obs          xhex                              x

  1    0000000000D2FFFF                        _
  2    0000000000D1FFFF                        .
  3    0000000000BEFFFF                        A
  4    0000000000BDFFFF                        B
  5    0000000000BCFFFF                        C
  6    0000000000BBFFFF                        D
  7    0000000000BAFFFF                        E
  8    0000000000B9FFFF                        F
  9    0000000000B8FFFF                        G
 10    0000000000B7FFFF                        H
 11    0000000000B6FFFF                        I
 12    0000000000B5FFFF                        J
 13    0000000000B4FFFF                        K
 14    0000000000B3FFFF                        L
 15    0000000000B2FFFF                        M
 16    0000000000B1FFFF                        N
 17    0000000000B0FFFF                        O
 18    0000000000AFFFFF                        P
 19    0000000000AEFFFF                        Q
 20    0000000000ADFFFF                        R
 21    0000000000ACFFFF                        S
 22    0000000000ABFFFF                        T
 23    0000000000AAFFFF                        U
 24    0000000000A9FFFF                        V
 25    0000000000A8FFFF                        W
 26    0000000000A7FFFF                        X
 27    0000000000A6FFFF                        Y
 28    0000000000A5FFFF                        Z
 29    FFFFFFFFFFFFEFFF     -1.7976931348623E308
 30    000000000000F0BF                       -1
 31    0000000000001080    -2.2250738585072E-308
 32    0000000000000000                        0
 33    0000000000001000     2.2250738585072E-308
 34    9A9999999999B93F                      0.1
 35    9A9999999999C93F                      0.2
 36    333333333333D33F                      0.3
 37    9A9999999999D93F                      0.4
 38    000000000000E03F                      0.5
 39    333333333333E33F                      0.6
 40    666666666666E63F                      0.7
 41    9A9999999999E93F                      0.8
 42    CDCCCCCCCCCCEC3F                      0.9
 43    000000000000F03F                        1
 44    0000000000000040                        2
 45    0000000000001440                        5
 46    0000000000002240                        9
 47    0000000000002440                       10
 48    FFFFFFFFFFFFEF7F      1.7976931348623E308
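
The rows above run from ._ through . and .A to .Z and then on to the
ordinary numbers, which matches the order SAS uses when it compares
numeric values.  A quick check of that ordering, as a sketch that
relies only on ordinary comparison operators (the variable name
ORDERED is just for illustration):

```sas
data _null_;
  /* ._ sorts lowest, then ., then .A through .Z, then every number */
  ordered = ( ._ < . ) and ( . < .A ) and ( .A < .Z )
            and ( .Z < -constant( 'BIG' ) );
  put ordered=;
run;
```

This should write ordered=1 to the log: every missing value compares
below even the most negative representable number.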
Reply droide 12/30/2009 6:06:53 AM

"Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU> wrote in message
> The main problem revealed in this thread is the non-intuitive
> results of applying a comparison operator to missing values
> for numeric variables.  There seems to be no major complaint
> about how missing values are treated in an arithmetic
> expression.
> So, what if SAS introduced a new class of comparison
> operators, like the below?
>  X := Y        X:^= Y
>  X :< Y        X:^< Y
>  X :> Y        X:^> Y
>  X :>=
>  X :<=
> Let's say all of these comparison operators would generate a
> TRUE or FALSE only when both X and Y are not missing.
> Otherwise they would result in a missing value.  That is, in
> the assignment statements below, Z could be ., 0, or 1.

Not sure I see the point - at the moment, missing values and zeroes are 
FALSE, positive and negative numbers are TRUE.  If your new comparison 
operators resulted in a missing value, the comparison would evaluate to 
FALSE, and SAS would presumably slide in a zero.  I couldn't guess how many 
programs would be affected if this behavior were changed.
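
The truth rule described above can be checked directly.  A minimal
sketch (the array name and test values are illustrative only):

```sas
data _null_;
  /* one missing value, zero, and a few nonzero numbers */
  array vals[5] _temporary_ ( . 0 -1 2 0.5 );
  do i = 1 to dim( vals );
    v = vals[i];
    /* a bare numeric in IF is false when it is 0 or missing */
    if v then put v= 'is treated as TRUE';
    else put v= 'is treated as FALSE';
  end;
run;
```

Only the missing value and 0 should take the FALSE branch, so a
comparison that returned missing would behave like FALSE in an IF
statement.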

I don't think it's possible to make a language "intuitive" to everyone.  I 
worked with a guy once who got angrily agitated when it turned out that the 
NODUP operand of PROC SORT did not give him what he wanted.  He should have 
used NODUPKEY.  He couldn't be bothered to look it up, just assumed that SAS 
would do what he wanted regardless of what he typed.

The lesson here is to RTFM.  Missing values are pretty thoroughly 
documented, and have been for upwards of 30 years.

Reply Lou 12/30/2009 2:15:05 PM
