creating cross-tab for multiple (38) variables

  • Follow


Hello -

I have a dataset that includes a series of ethnicity variables with 
binary responses  - that is a respondent can say "yes" to as many ethnic 
codes as apply. Looks like:

var name	response	label
ethnic_a 	1 or 0		thai
ethnic_b	1 or 0		cambodian

etc, thru to the 38 total choices.

What I'd like to do is create an output that displays a fequency count 
of all the possible combinations for those who selected multiple 
ethnicities. Short of writing a confusing and time-consuming series of 
individual cross tabs, what are the options?

thanks,

-Greg

0
Reply Greg 1/22/2010 5:20:39 PM

proc transpose the data so selections are in columns

proc freq data=?/
   tables a*b*c*d / out=combinations ;
run;

add e,f,g etc for the number of columns
0
Reply montura 1/22/2010 5:31:55 PM


montura wrote:
> proc transpose the data so selections are in columns
> 
> proc freq data=?/
>    tables a*b*c*d / out=combinations ;
> run;
> 
> add e,f,g etc for the number of columns

Thanks, this is a good suggestion, and in theory this would work  -- but 
with 38 variables and all those possible combinations, the table is too 
large for SAS to process.

*****
proc freq data=*** noprint;
tables 
ethnic_3*ethnic_4*ethnic_5*ethnic_6*ethnic_7*ethnic_8*ethnic_9*ethnic_51*ethnic_52*
ethnic_53*ethnic_a*ethnic_a1*ethnic_a2*ethnic_a3*ethnic_b*ethnic_c*ethnic_d*ethnic_e*
ethnic_f*ethnic_f1*ethnic_f2*ethnic_f3*ethnic_h*ethnic_i*ethnic_j*ethnic_k*ethnic_l*
ethnic_n*ethnic_p*ethnic_q*ethnic_s*ethnic_t*ethnic_u*ethnic_v*ethnic_w*ethnic_x*
ethnic_y*ethnic_z /out=combination;
run;

ERROR: The requested table is too large to process.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 50314 observations read from the data set 
ADM10.ADM10JANFIX.
       WHERE freshapp=1;
WARNING: The data set WORK.COMBINATION may be incomplete.  When this 
step was stopped there
          were 0 observations and 40 variables.
NOTE: PROCEDURE FREQ used (Total process time):
       real time           17.12 seconds
       cpu time            0.76 seconds
**********


Unless I'm misunderstanding something about your suggestion. If I break 
it into chunks, I risk losing combinations from groups not chunked together.

thanks,

-Greg
0
Reply Greg 1/22/2010 5:48:35 PM

On Jan 22, 11:48=A0am, Greg Dubrow
<dubrowg___@___b*e*rk***el*e****y.e&d*u> wrote:
> montura wrote:
> > proc transpose the data so selections are in columns
>
> > proc freq data=3D?/
> > =A0 =A0tables a*b*c*d / out=3Dcombinations ;
> > run;
>
> > add e,f,g etc for the number of columns
>
> Thanks, this is a good suggestion, and in theory this would work =A0-- bu=
t
> with 38 variables and all those possible combinations, the table is too
> large for SAS to process.
>
> *****
> proc freq data=3D*** noprint;
> tables
> ethnic_3*ethnic_4*ethnic_5*ethnic_6*ethnic_7*ethnic_8*ethnic_9*ethnic_51*=
et=ADhnic_52*
> ethnic_53*ethnic_a*ethnic_a1*ethnic_a2*ethnic_a3*ethnic_b*ethnic_c*ethnic=
_d=AD*ethnic_e*
> ethnic_f*ethnic_f1*ethnic_f2*ethnic_f3*ethnic_h*ethnic_i*ethnic_j*ethnic_=
k*=ADethnic_l*
> ethnic_n*ethnic_p*ethnic_q*ethnic_s*ethnic_t*ethnic_u*ethnic_v*ethnic_w*e=
th=ADnic_x*
> ethnic_y*ethnic_z /out=3Dcombination;
> run;
>
> ERROR: The requested table is too large to process.
> NOTE: The SAS System stopped processing this step because of errors.
> NOTE: There were 50314 observations read from the data set
> ADM10.ADM10JANFIX.
> =A0 =A0 =A0 =A0WHERE freshapp=3D1;
> WARNING: The data set WORK.COMBINATION may be incomplete. =A0When this
> step was stopped there
> =A0 =A0 =A0 =A0 =A0 were 0 observations and 40 variables.
> NOTE: PROCEDURE FREQ used (Total process time):
> =A0 =A0 =A0 =A0real time =A0 =A0 =A0 =A0 =A0 17.12 seconds
> =A0 =A0 =A0 =A0cpu time =A0 =A0 =A0 =A0 =A0 =A00.76 seconds
> **********
>
> Unless I'm misunderstanding something about your suggestion. If I break
> it into chunks, I risk losing combinations from groups not chunked togeth=
er.
>
> thanks,
>
> -Greg

It is not clear to me what combinations you want.  This example
program, does not model the data very will.  It does count all
combinations that exist in the responses.  A more detail example of
the desired output would be helpful.

data test;
   array ethnic[38];
   length resp $38 respL $4096;
   do id =3D 1 to 1000;
      do _n_ =3D 1 to dim(ethnic);
         ethnic[_n_] =3D rantbl(1234,1-2/dim(ethnic)) - 1;
         end;
      resp =3D cats(of ethnic[*]);
      do _n_ =3D 1 to dim(ethnic);
         if ethnic[_n_] eq 1 then respL =3D catx(',',respL,vlabel(ethnic
[_n_]));
         end;
      output;
      call missing(respL);
      end;
   run;
proc summary data=3Dtest chartype nway;
   class resp respL;
   output out=3Dcounts;
   run;
proc print;
   run;


0
Reply data 1/22/2010 6:01:37 PM

data _null_; wrote:

> It is not clear to me what combinations you want.  This example
> program, does not model the data very will.  It does count all
> combinations that exist in the responses.  A more detail example of
> the desired output would be helpful.

What I'd love to get is a data set that produces all of the reported 
combinations and keeps it attached to the student id. e.g. -

id		combo
12345		thai cambodian
12346		thai cambodian vietnamese
12347		chicano korean
12348		white black
12349		black
12350		latino

etc.

If this isn't possible then I'd be happy producing output of each of the 
possible combos (and if possible put the label names in place of the 
variable names) that looks like:

Combo 				count
ethnic_a ethnic_b 		25
ethnic_ ethnic_b ethnic_c	3
ethnic_c ethnic_d		8
(or have the labels put in place of names)

I'll try what you suggested, see what happens.

thanks,

-Greg

0
Reply Greg 1/22/2010 6:19:40 PM

4 Replies
574 Views

(page loaded in 0.049 seconds)

Similiar Articles:













7/22/2012 8:46:38 AM


Reply: