Combining multiple rows of data into a single row for each unique ID

I am a rank beginner using SPSS and need some step-by-step help.  I
have a very large (over one million records) file which contains
detailed data some of which is in multiple rows for the same ID.  I
have a unique ID variable and I have a Line Number variable.  What I
need to be able to do is to bring all of the lines into a single line
for each ID and then match it to a header file which also contains the
unique ID number.  I have been trying for two days using the
casestovars and varstocases commands as well as trying to split the
file by layering the id numbers, etc. and can not get anything to
work.  I have taken only one class in SPSS and we did not really
address how to use syntax - everything is point and click.  I am a
previous SAS user and find this extremely frustrating and would
appreciate any help anyone can give me.

0
Jan
7/20/2007 5:26:26 PM
comp.soft-sys.stat.spss 5604 articles. 0 followers. Post Follow

5 Replies
244 Views

Similar Articles

[PageSpeed] 20
Jan wrote:
> I am a rank beginner using SPSS and need some step-by-step help.  I
> have a very large (over one million records) file which contains
> detailed data some of which is in multiple rows for the same ID.  I
> have a unique ID variable and I have a Line Number variable.  What I
> need to be able to do is to bring all of the lines into a single line
> for each ID and then match it to a header file which also contains the
> unique ID number.  I have been trying for two days using the
> casestovars and varstocases commands as well as trying to split the
> file by layering the id numbers, etc. and can not get anything to
> work.  I have taken only one class in SPSS and we did not really
> address how to use syntax - everything is point and click.  I am a
> previous SAS user and find this extremely frustrating and would
> appreciate any help anyone can give me.
> 


It would be helpful if you posted a small sample of the current data 
(e.g., 3 or 4 IDs with 4 or 5 variables) along with how you want it to 
look after restructuring.

-- 
Bruce Weaver
bweaver@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."
0
Bruce
7/20/2007 6:45:33 PM
On Jul 20, 1:45 pm, Bruce Weaver <bwea...@lakeheadu.ca> wrote:
> Jan wrote:
> > I am a rank beginner using SPSS and need some step-by-step help.  I
> > have a very large (over one million records) file which contains
> > detailed data some of which is in multiple rows for the same ID.  I
> > have a unique ID variable and I have a Line Number variable.  What I
> > need to be able to do is to bring all of the lines into a single line
> > for each ID and then match it to a header file which also contains the
> > unique ID number.  I have been trying for two days using the
> > casestovars and varstocases commands as well as trying to split the
> > file by layering the id numbers, etc. and can not get anything to
> > work.  I have taken only one class in SPSS and we did not really
> > address how to use syntax - everything is point and click.  I am a
> > previous SAS user and find this extremely frustrating and would
> > appreciate any help anyone can give me.
>
> It would be helpful if you posted a small sample of the current data
> (e.g., 3 or 4 IDs with 4 or 5 variables) along with how you want it to
> look after restructuring.
>
> --
> Bruce Weaver
> bwea...@lakeheadu.cawww.angelfire.com/wv/bwhomedir
> "When all else fails, RTFM."



Thanks, Bruce.  Here is an example of what the data looks like except
there are multiple variables that I have not included to the right of
HPC_3.

Prov__ID            Line_No         HPC_1          HPC_2
HPC_3
5142                             1          6391
1785            1452
9865                             1          7784
5241             6999
9865                             2          6302
8574             1111

This is what I'd like it to end up looking like:

Prov_ID              N_Count        HPC_1          HPC_2
HPC_3        HPC_xxx        HPC_xxx        HPC_xxx
9865                             2            7784
5241            6999           6302           8574             1111

This would tell me how many lines make up the group and move the 3
values from the 2nd line to the same row as the values from the first
row.  Another factor is that some of the cases (by Prov_ID) have
various numbers of lines - some
have 2 lines and some have 9 lines so I'm not sure how to tell SPSS to
look for all that may be there and name a new
variable accordingly. I don't quite know where to start and I'm the
only SPSS person here.  Thanks so much for any guidance you can give
me.




0
Jan
7/20/2007 7:08:52 PM
There are 2 steps.

1. You can use AGGREGATE and N_Break command (Data...Aggregate). Use
Prov_ID as a Break Variable and click N_Break.  It will generate a new
variable counting the number of rows within Prov_ID.

***************************************************************************
This is what you got before.

> Prov__ID   Line_No   HPC_1   HPC_2   HPC_3
> 5142            1          6391      1785      1452
> 9865            1          7784      5241      6999
> 9865            2          6302      8574      1111

This is what you will get after.

> Prov__ID   Line_No   HPC_1   HPC_2   HPC_3   N_Break
> 5142            1          6391      1785      1452        1
> 9865            1          7784      5241      6999        2
> 9865            2          6302      8574      1111        2
**************************************************************************

2. Use Restructure command (from Data...Restructure).  Select
"Restructure Cases into Variables", use Prov_ID as an identifier.

The new data file will look like this, which is not exactly the same
as the one you want because of the order of the variable.  However,
you can tell SPSS arrange the variables in any order that you want.

> Prov_ID  N_Count   HPC_1.1   HPC_1.2   HPC2.1   HPC2.2   HPC_3.1   HPC3.2
> 9865      2              7784         6302         5241       8574        6999        1111
> 5142      1              6391                         1785                      1452


> Prov_ID  N_Count   HPC_1   HPC_2    HPC_3   HPC_xxx   HPC_xxx   HPC_xxx
> 9865      2              7784      5241       6999      6302          8574         1111
> 5142      1              6391      1785      1452

0
TASTAT
7/20/2007 8:11:23 PM
Jan wrote:
> On Jul 20, 1:45 pm, Bruce Weaver <bwea...@lakeheadu.ca> wrote:
>> Jan wrote:
>>> I am a rank beginner using SPSS and need some step-by-step help.  I
>>> have a very large (over one million records) file which contains
>>> detailed data some of which is in multiple rows for the same ID.  I
>>> have a unique ID variable and I have a Line Number variable.  What I
>>> need to be able to do is to bring all of the lines into a single line
>>> for each ID and then match it to a header file which also contains the
>>> unique ID number.  I have been trying for two days using the
>>> casestovars and varstocases commands as well as trying to split the
>>> file by layering the id numbers, etc. and can not get anything to
>>> work.  I have taken only one class in SPSS and we did not really
>>> address how to use syntax - everything is point and click.  I am a
>>> previous SAS user and find this extremely frustrating and would
>>> appreciate any help anyone can give me.
>> It would be helpful if you posted a small sample of the current data
>> (e.g., 3 or 4 IDs with 4 or 5 variables) along with how you want it to
>> look after restructuring.
>>
>> --
>> Bruce Weaver
>> bwea...@lakeheadu.cawww.angelfire.com/wv/bwhomedir
>> "When all else fails, RTFM."
> 
> 
> 
> Thanks, Bruce.  Here is an example of what the data looks like except
> there are multiple variables that I have not included to the right of
> HPC_3.
> 
> Prov__ID            Line_No         HPC_1          HPC_2
> HPC_3
> 5142                             1          6391
> 1785            1452
> 9865                             1          7784
> 5241             6999
> 9865                             2          6302
> 8574             1111
> 
> This is what I'd like it to end up looking like:
> 
> Prov_ID              N_Count        HPC_1          HPC_2
> HPC_3        HPC_xxx        HPC_xxx        HPC_xxx
> 9865                             2            7784
> 5241            6999           6302           8574             1111
> 
> This would tell me how many lines make up the group and move the 3
> values from the 2nd line to the same row as the values from the first
> row.  Another factor is that some of the cases (by Prov_ID) have
> various numbers of lines - some
> have 2 lines and some have 9 lines so I'm not sure how to tell SPSS to
> look for all that may be there and name a new
> variable accordingly. I don't quite know where to start and I'm the
> only SPSS person here.  Thanks so much for any guidance you can give
> me.
> 

I think the syntax shown below gives the result you ask for.  Copy it 
into a new syntax window (Open-->Syntax--New), then highlight it and run.

* ------------------------- .
data list list / Prov__ID  Line_No  HPC_1   HPC_2  HPC_3 (5f5.0).
begin data
5142      1        6391    1785   1452
9865      1        7784    5241   6999
9865      2        6302    8574   1111
end data.
list.

* Restructure the data from Cases to Vars .

SORT CASES BY Prov__ID Line_No .
CASESTOVARS
  /ID = Prov__ID
  /INDEX = Line_No
  /GROUPBY = INDEX
  /COUNT = N_count .

list.
* ------------------------- .

Here's the output:

Prov__ID N_count HPC_1.1 HPC_2.1 HPC_3.1 HPC_1.2 HPC_2.2 HPC_3.2

    5142       1    6391    1785    1452       .       .       .
    9865       2    7784    5241    6999    6302    8574    1111

Number of cases read:  2    Number of cases listed:  2


HTH.

-- 
Bruce Weaver
bweaver@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."
0
Bruce
7/20/2007 8:26:49 PM
On Jul 20, 3:26 pm, Bruce Weaver <bwea...@lakeheadu.ca> wrote:
> Jan wrote:
> > On Jul 20, 1:45 pm, Bruce Weaver <bwea...@lakeheadu.ca> wrote:
> >> Jan wrote:
> >>> I am a rank beginner using SPSS and need some step-by-step help.  I
> >>> have a very large (over one million records) file which contains
> >>> detailed data some of which is in multiple rows for the same ID.  I
> >>> have a unique ID variable and I have a Line Number variable.  What I
> >>> need to be able to do is to bring all of the lines into a single line
> >>> for each ID and then match it to a header file which also contains the
> >>> unique ID number.  I have been trying for two days using the
> >>> casestovars and varstocases commands as well as trying to split the
> >>> file by layering the id numbers, etc. and can not get anything to
> >>> work.  I have taken only one class in SPSS and we did not really
> >>> address how to use syntax - everything is point and click.  I am a
> >>> previous SAS user and find this extremely frustrating and would
> >>> appreciate any help anyone can give me.
> >> It would be helpful if you posted a small sample of the current data
> >> (e.g., 3 or 4 IDs with 4 or 5 variables) along with how you want it to
> >> look after restructuring.
>
> >> --
> >> Bruce Weaver
> >> bwea...@lakeheadu.cawww.angelfire.com/wv/bwhomedir
> >> "When all else fails, RTFM."
>
> > Thanks, Bruce.  Here is an example of what the data looks like except
> > there are multiple variables that I have not included to the right of
> > HPC_3.
>
> > Prov__ID            Line_No         HPC_1          HPC_2
> > HPC_3
> > 5142                             1          6391
> > 1785            1452
> > 9865                             1          7784
> > 5241             6999
> > 9865                             2          6302
> > 8574             1111
>
> > This is what I'd like it to end up looking like:
>
> > Prov_ID              N_Count        HPC_1          HPC_2
> > HPC_3        HPC_xxx        HPC_xxx        HPC_xxx
> > 9865                             2            7784
> > 5241            6999           6302           8574             1111
>
> > This would tell me how many lines make up the group and move the 3
> > values from the 2nd line to the same row as the values from the first
> > row.  Another factor is that some of the cases (by Prov_ID) have
> > various numbers of lines - some
> > have 2 lines and some have 9 lines so I'm not sure how to tell SPSS to
> > look for all that may be there and name a new
> > variable accordingly. I don't quite know where to start and I'm the
> > only SPSS person here.  Thanks so much for any guidance you can give
> > me.
>
> I think the syntax shown below gives the result you ask for.  Copy it
> into a new syntax window (Open-->Syntax--New), then highlight it and run.
>
> * ------------------------- .
> data list list / Prov__ID  Line_No  HPC_1   HPC_2  HPC_3 (5f5.0).
> begin data
> 5142      1        6391    1785   1452
> 9865      1        7784    5241   6999
> 9865      2        6302    8574   1111
> end data.
> list.
>
> * Restructure the data from Cases to Vars .
>
> SORT CASES BY Prov__ID Line_No .
> CASESTOVARS
>   /ID = Prov__ID
>   /INDEX = Line_No
>   /GROUPBY = INDEX
>   /COUNT = N_count .
>
> list.
> * ------------------------- .
>
> Here's the output:
>
> Prov__ID N_count HPC_1.1 HPC_2.1 HPC_3.1 HPC_1.2 HPC_2.2 HPC_3.2
>
>     5142       1    6391    1785    1452       .       .       .
>     9865       2    7784    5241    6999    6302    8574    1111
>
> Number of cases read:  2    Number of cases listed:  2
>
> HTH.
>
> --
> Bruce Weaver
> bwea...@lakeheadu.cawww.angelfire.com/wv/bwhomedir
> "When all else fails, RTFM."- Hide quoted text -
>
> - Show quoted text -

THANK YOU SO MUCH!!!!!

0
Jan
7/20/2007 8:35:43 PM
Reply:
Similar artilces about - Combining multiple rows of data into a single row for each unique ID: