f



Creating new variable based on the name of another variable & compressing a variable

I'm having a whole heap of trouble trying to create some syntax for 2 variables and hoping someone might help.
I have previously done this in excel but now have a sample of 200,000 + so it is no longer possible.

I have 30 visits- data e.g.s only show 5.

Ill do my best to explain this!
I have one variable that I want to create two variables from:
Current variable looks like this:

WB_1	WB_2	WB_3	WB_4	WB_5
40	.	.	.	.
20	.	30	.	.
30	.	20	.	20
30	.	.	.	.
10	.	10	.	20
10	.	.	.	.
20	.	10	.	.
50	.	40	.	.



So this is 8 peoples Wellbeing scores across 5 visits (of 30 in the dataset).
Person 1 only has data from visit 1.
Person 2 has data from visit 1 and 3.
Person 3 has data from visit 1 3 and 5.

The visit number is in the variable name (_1,_2 etc).

I want two variables. 
One is a compressed version of the scores to remove the blanks:

WB_1	WB_2	WB_3	WB_4	WB_5
40	.	.	.	.
20	30		.	.
30	20	20	.	.
30	.		.	.
10	10	20	.	.
10	.		.	.
20	10	30	.	.
50	40	40	.	.


e.g. if there are no scores from visit two, shift the whole lot back. This then becomes Time1 Time2, rather than Visit 1, Visit 2 etc.

The other variable I want to tell me what visit number the data was collected at:


WB_1_Visit	WB_2_Visit	WB_3_Visit	WB_4_Visit	WB_5_Visit
1				
1	        3			
1	        3	        5		
1				
1	        3	        5		
1				
1	        3			
1	        3			

e.g. 
person 1's score from time 1 was collected at visit 1
person 2's scores for time 1 and time 2 was collected at visit 1 and 3


I hope this makes some sense. It has taken me a long time to get my head around.
Any ideas or suggestions would be gratefully received!!
0
erin
12/21/2016 11:12:07 PM
comp.soft-sys.stat.spss 5679 articles. 0 followers. Post Follow

1 Replies
477 Views

Similar Articles

[PageSpeed] 10

Here is a solution.  FWIW: You might consider leaving the data in long format rather than wide.  You don't indicate your downstream requirements so I'm just dropping that as a suggestion (normalized data is usually easier to build general code against).

DATA LIST FREE /WB_1        WB_2        WB_3        WB_4        WB_5 .
BEGIN DATA
40        .        .       .        . 
20        30       .       .        . 
30        20       20      .        . 
30        .        .       .        . 
10        10       20      .        . 
10        .        .       .        . 
20        10       30      .        . 
50        40       40      .        . 
END DATA.
COMPUTE CaseID=$CASENUM.
FORMATS  WB_1 TO WB_5 (F2.0).
VARSTOCASES /MAKE WB FROM WB_1 TO WB_5 /INDEX=Visit_Num(WB).
COMPUTE Visit_Num=CHAR.SUBSTR(Visit_Num,4).
ALTER TYPE Visit_Num (F2.0).
CASESTOVARS ID=CaseID.
LIST.

 
 
 
  CaseID Visit_Num.1 Visit_Num.2 Visit_Num.3 WB.1 WB.2 WB.3 
 
    1.00       1           .           .      40    .    . 
    2.00       1           2           .      20   30    . 
    3.00       1           2           3      30   20   20 
    4.00       1           .           .      30    .    . 
    5.00       1           2           3      10   10   20 
    6.00       1           .           .      10    .    . 
    7.00       1           2           3      20   10   30 
    8.00       1           2           3      50   40   40 
 
 
Number of cases read:  8    Number of cases listed:  8

On Wednesday, December 21, 2016 at 6:12:10 PM UTC-5, erin.ps...@gmail.com wrote:
> I'm having a whole heap of trouble trying to create some syntax for 2 variables and hoping someone might help.
> I have previously done this in excel but now have a sample of 200,000 + so it is no longer possible.
> 
> I have 30 visits- data e.g.s only show 5.
> 
> Ill do my best to explain this!
> I have one variable that I want to create two variables from:
> Current variable looks like this:
> 
> WB_1	WB_2	WB_3	WB_4	WB_5
> 40	.	.	.	.
> 20	.	30	.	.
> 30	.	20	.	20
> 30	.	.	.	.
> 10	.	10	.	20
> 10	.	.	.	.
> 20	.	10	.	.
> 50	.	40	.	.
> 
> 
> 
> So this is 8 peoples Wellbeing scores across 5 visits (of 30 in the dataset).
> Person 1 only has data from visit 1.
> Person 2 has data from visit 1 and 3.
> Person 3 has data from visit 1 3 and 5.
> 
> The visit number is in the variable name (_1,_2 etc).
> 
> I want two variables. 
> One is a compressed version of the scores to remove the blanks:
> 
> WB_1	WB_2	WB_3	WB_4	WB_5
> 40	.	.	.	.
> 20	30		.	.
> 30	20	20	.	.
> 30	.		.	.
> 10	10	20	.	.
> 10	.		.	.
> 20	10	30	.	.
> 50	40	40	.	.
> 
> 
> e.g. if there are no scores from visit two, shift the whole lot back. This then becomes Time1 Time2, rather than Visit 1, Visit 2 etc.
> 
> The other variable I want to tell me what visit number the data was collected at:
> 
> 
> WB_1_Visit	WB_2_Visit	WB_3_Visit	WB_4_Visit	WB_5_Visit
> 1				
> 1	        3			
> 1	        3	        5		
> 1				
> 1	        3	        5		
> 1				
> 1	        3			
> 1	        3			
> 
> e.g. 
> person 1's score from time 1 was collected at visit 1
> person 2's scores for time 1 and time 2 was collected at visit 1 and 3
> 
> 
> I hope this makes some sense. It has taken me a long time to get my head around.
> Any ideas or suggestions would be gratefully received!!

0
David
12/22/2016 5:45:35 PM
Reply: