Rename variable if...



Hello

I guess this could be done rather smoothly in the syntax editor, but I do
not know how.

My problem is that I have a lot of cases where each case should have a
unique id, but some of the cases have identical ids.

Like this:
1234
3445
5645
5645
7001

I have asked questions in this group before about restructuring data.
This time I would like to know whether it is possible to write syntax
that checks for identical ids and recodes them when it finds any.

I think I should use something like
sort cases ascending
check if id=lag(id)
and if that is the case, rename the id to something else, for example
from 5645 to 56455645.

Does that make sense, and is it possible to solve the issue this way?


Thanks again for any help.
Reply MG 3/25/2011 1:04:08 PM

Is your ID variable numeric or string?  Are all IDs 4 digits?  Why do
you want to recode when identical IDs are discovered?  Are those
records not for the same person?  If they are, why not number the
records within ID?  Even if the duplicate records are for different
people, you could then use the record number within ID when creating a
new ID variable.  Here's an example that assumes ID is numeric.

data list list / ID (f8.0).
begin data
1234
3445
5645
5645
7001
1234
1234
end data.

* Use MATCH FILES to flag first record per ID .
* Then use that new variable to number records within ID.
sort cases by ID.
match files
 file = * /
 by ID /
 first = Record .
if (record EQ 0) record = lag(record) + 1.
list.

* If you REALLY want to recode duplicate IDs,
* you could incorporate the record number in the new ID.

compute NewID = ID*10 + record.
format NewID (f8.0).
list.
* -------- end of example -------- .
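
One caveat with the recode above: the single extra digit assumes that no ID has more than 9 records. A quick way to confirm that NewID really is unique is to count the cases per NewID. This is only a sketch, and nSame is a made-up variable name, not something from the example above.

```spss
* Sketch: count how many cases share each NewID (nSame is a hypothetical name).
aggregate
  /outfile = * mode = addvariables
  /break = NewID
  /nSame = n.
* If NewID is unique, nSame equals 1 for every case.
frequencies variables = nSame.
```

If any case shows nSame greater than 1, the multiplier in the COMPUTE would need to be larger than 10.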

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."
Reply Bruce 3/25/2011 1:25:26 PM


I suggest that you use Data > Identify Duplicate Cases, click Paste to get
the syntax, and then use the variables created when you run that syntax.
The syntax below will give you 2 new IDs.  The first preserves the
original sort order and tries to avoid common problems; the second does
what you mentioned.
Open a new instance of SPSS, then paste and run the syntax below.  I put in
2 variables.  Change the indicated lines to match your situation.

Art Kendall
Social Research Consultants

new file.
data list list/id (f4)x1 x2.
begin data
1234 1 1
3445 1 1
5645 1 1
5645 1 1
7001 1 1
7001 2 2
7001 3 3
end data.
* Identify Duplicate Cases.
SORT CASES BY id(A).
MATCH FILES
   /FILE=*
   /BY id
   /FIRST=PrimaryFirst
   /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMATS  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
   /FILE=*
   /DROP=PrimaryLast InDupGrp.
VARIABLE LABELS  PrimaryFirst 'Indicator of each first matching case as Primary'
     MatchSequence 'Sequential count of matching cases'.
VALUE LABELS  PrimaryFirst 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryFirst (ORDINAL) /MatchSequence (SCALE).
FREQUENCIES VARIABLES=PrimaryFirst MatchSequence.
EXECUTE.
numeric new_id (f5) varmatch (f3).
compute new_id = (id*10)+ matchsequence.
do if $casenum ne 1 and matchsequence ne 1.
compute varmatch = 0.
*change the line below to match your actual data.
do repeat v = x1 to x2.
if v eq lag(v) varmatch = varmatch +1.
end repeat.
end if.
var labels varmatch 'number of other variables that match'.
*change the line below to match your actual data.
value labels varmatch 2 'complete match'.
crosstabs tables = matchsequence by varmatch.
* to get literally the new ID you mentioned.
numeric new_id2 (f8).
compute new_id2 = ID.
if matchsequence gt 1 new_id2 = (10000*ID) + ID.
execute.



Reply Art 3/25/2011 2:29:12 PM

Not tested, but...

COMPUTE ORDER=$CASENUM.
SORT CASES BY ID.
IF $CASENUM=1 OR ID <> LAG(ID) #SUBID=0.
IF (ID EQ LAG(ID)) #SUBID=#SUBID+1.
COMPUTE NEWID=ID + #SUBID/10.
SORT CASES BY ORDER.
---
Result (NEWID):
1234
3445
5645
5645.1
7001
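
For anyone who wants to try the LAG approach on the sample data from the original question, here is a self-contained sketch. It is one possible arrangement and has not been run against real data. The DO IF form and the names order, #subid and newid are for illustration only, and the tenths trick assumes no id has more than 9 duplicates after its first case.

```spss
* Sketch only: sample data copied from the original question.
data list list / id (f8.0).
begin data
1234
3445
5645
5645
7001
end data.

* Remember the original file order so it can be restored later.
compute order = $casenum.
sort cases by id order.

* #subid is a scratch variable, so it keeps its value from case to case.
* It restarts at 0 for the first case of each id and counts up for duplicates.
do if ($casenum = 1 or id <> lag(id)).
  compute #subid = 0.
else.
  compute #subid = #subid + 1.
end if.

* Assumption: at most 9 duplicates after the first case of any id.
compute newid = id + #subid / 10.
formats newid (f8.1).
sort cases by order.
list.
```

With the sample data, the second 5645 case should come out as 5645.1 and every other newid should equal its id.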

Reply David 3/26/2011 2:14:28 PM
