|
|
creating cross-tab for multiple (38) variables
Hello -
I have a dataset that includes a series of ethnicity variables with
binary responses - that is a respondent can say "yes" to as many ethnic
codes as apply. Looks like:
var name response label
ethnic_a 1 or 0 thai
ethnic_b 1 or 0 cambodian
etc, thru to the 38 total choices.
What I'd like to do is create an output that displays a fequency count
of all the possible combinations for those who selected multiple
ethnicities. Short of writing a confusing and time-consuming series of
individual cross tabs, what are the options?
thanks,
-Greg
|
|
0
|
|
|
|
Reply
|
Greg
|
1/22/2010 5:20:39 PM |
|
proc transpose the data so selections are in columns
proc freq data=?/
tables a*b*c*d / out=combinations ;
run;
add e,f,g etc for the number of columns
|
|
0
|
|
|
|
Reply
|
montura
|
1/22/2010 5:31:55 PM
|
|
montura wrote:
> proc transpose the data so selections are in columns
>
> proc freq data=?/
> tables a*b*c*d / out=combinations ;
> run;
>
> add e,f,g etc for the number of columns
Thanks, this is a good suggestion, and in theory this would work -- but
with 38 variables and all those possible combinations, the table is too
large for SAS to process.
*****
proc freq data=*** noprint;
tables
ethnic_3*ethnic_4*ethnic_5*ethnic_6*ethnic_7*ethnic_8*ethnic_9*ethnic_51*ethnic_52*
ethnic_53*ethnic_a*ethnic_a1*ethnic_a2*ethnic_a3*ethnic_b*ethnic_c*ethnic_d*ethnic_e*
ethnic_f*ethnic_f1*ethnic_f2*ethnic_f3*ethnic_h*ethnic_i*ethnic_j*ethnic_k*ethnic_l*
ethnic_n*ethnic_p*ethnic_q*ethnic_s*ethnic_t*ethnic_u*ethnic_v*ethnic_w*ethnic_x*
ethnic_y*ethnic_z /out=combination;
run;
ERROR: The requested table is too large to process.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 50314 observations read from the data set
ADM10.ADM10JANFIX.
WHERE freshapp=1;
WARNING: The data set WORK.COMBINATION may be incomplete. When this
step was stopped there
were 0 observations and 40 variables.
NOTE: PROCEDURE FREQ used (Total process time):
real time 17.12 seconds
cpu time 0.76 seconds
**********
Unless I'm misunderstanding something about your suggestion. If I break
it into chunks, I risk losing combinations from groups not chunked together.
thanks,
-Greg
|
|
0
|
|
|
|
Reply
|
Greg
|
1/22/2010 5:48:35 PM
|
|
On Jan 22, 11:48=A0am, Greg Dubrow
<dubrowg___@___b*e*rk***el*e****y.e&d*u> wrote:
> montura wrote:
> > proc transpose the data so selections are in columns
>
> > proc freq data=3D?/
> > =A0 =A0tables a*b*c*d / out=3Dcombinations ;
> > run;
>
> > add e,f,g etc for the number of columns
>
> Thanks, this is a good suggestion, and in theory this would work =A0-- bu=
t
> with 38 variables and all those possible combinations, the table is too
> large for SAS to process.
>
> *****
> proc freq data=3D*** noprint;
> tables
> ethnic_3*ethnic_4*ethnic_5*ethnic_6*ethnic_7*ethnic_8*ethnic_9*ethnic_51*=
et=ADhnic_52*
> ethnic_53*ethnic_a*ethnic_a1*ethnic_a2*ethnic_a3*ethnic_b*ethnic_c*ethnic=
_d=AD*ethnic_e*
> ethnic_f*ethnic_f1*ethnic_f2*ethnic_f3*ethnic_h*ethnic_i*ethnic_j*ethnic_=
k*=ADethnic_l*
> ethnic_n*ethnic_p*ethnic_q*ethnic_s*ethnic_t*ethnic_u*ethnic_v*ethnic_w*e=
th=ADnic_x*
> ethnic_y*ethnic_z /out=3Dcombination;
> run;
>
> ERROR: The requested table is too large to process.
> NOTE: The SAS System stopped processing this step because of errors.
> NOTE: There were 50314 observations read from the data set
> ADM10.ADM10JANFIX.
> =A0 =A0 =A0 =A0WHERE freshapp=3D1;
> WARNING: The data set WORK.COMBINATION may be incomplete. =A0When this
> step was stopped there
> =A0 =A0 =A0 =A0 =A0 were 0 observations and 40 variables.
> NOTE: PROCEDURE FREQ used (Total process time):
> =A0 =A0 =A0 =A0real time =A0 =A0 =A0 =A0 =A0 17.12 seconds
> =A0 =A0 =A0 =A0cpu time =A0 =A0 =A0 =A0 =A0 =A00.76 seconds
> **********
>
> Unless I'm misunderstanding something about your suggestion. If I break
> it into chunks, I risk losing combinations from groups not chunked togeth=
er.
>
> thanks,
>
> -Greg
It is not clear to me what combinations you want. This example
program, does not model the data very will. It does count all
combinations that exist in the responses. A more detail example of
the desired output would be helpful.
data test;
array ethnic[38];
length resp $38 respL $4096;
do id =3D 1 to 1000;
do _n_ =3D 1 to dim(ethnic);
ethnic[_n_] =3D rantbl(1234,1-2/dim(ethnic)) - 1;
end;
resp =3D cats(of ethnic[*]);
do _n_ =3D 1 to dim(ethnic);
if ethnic[_n_] eq 1 then respL =3D catx(',',respL,vlabel(ethnic
[_n_]));
end;
output;
call missing(respL);
end;
run;
proc summary data=3Dtest chartype nway;
class resp respL;
output out=3Dcounts;
run;
proc print;
run;
|
|
0
|
|
|
|
Reply
|
data
|
1/22/2010 6:01:37 PM
|
|
data _null_; wrote:
> It is not clear to me what combinations you want. This example
> program, does not model the data very will. It does count all
> combinations that exist in the responses. A more detail example of
> the desired output would be helpful.
What I'd love to get is a data set that produces all of the reported
combinations and keeps it attached to the student id. e.g. -
id combo
12345 thai cambodian
12346 thai cambodian vietnamese
12347 chicano korean
12348 white black
12349 black
12350 latino
etc.
If this isn't possible then I'd be happy producing output of each of the
possible combos (and if possible put the label names in place of the
variable names) that looks like:
Combo count
ethnic_a ethnic_b 25
ethnic_ ethnic_b ethnic_c 3
ethnic_c ethnic_d 8
(or have the labels put in place of names)
I'll try what you suggested, see what happens.
thanks,
-Greg
|
|
0
|
|
|
|
Reply
|
Greg
|
1/22/2010 6:19:40 PM
|
|
|
4 Replies
574 Views
(page loaded in 0.049 seconds)
|
|
|
|
|
|
|
|
|