Jeli,
I'm not sure how this will help with your data problem, but you can
accomplish what you asked with something like:
data want (drop=count);
set have;
by id;
if first.id then do;
count=0;
new_id=id;
end;
else do;
count+1;
new_id=catx('-',id,count);
end;
run;
HTH,
Art
-----------
On Feb 2, 5:23 am, Jeli <jeli0...@hotmail.co.uk> wrote:
> I have a dataset with thousands of records and variables.
>
> I have tried to select only destinct records, however some of then
> might have a different industry or different spelling of the name.
> This is clearly a data quality issue but what I need to do is have an
> intrim solution which will fix this until I can sort out the source
> data.
>
> i.e.
>
> ID NAME PD Industry Country
> 123456 Record 1 0.25 Medical USA
> 123456 Record 1 0.25 Cable USA
> 999403 Record 99 0.25 Medical USA
> 999403 Record 97 0.25 Medical USA
> 999403 Record 0.25 Medical USA
>
> What I would like to do is if a record is duplicated keep the first
> record as it is, but change the duplicate ID so that I know this is
> the first duplicate or second etc so the new data would look like:
>
> NEW ID ID NAME PD Industry Country
> 123456 123456 Record 1 0.25 Medical USA
> 123456-1 123456 Record 1 0.25 Cable USA
> 999403 999403 Record 99 0.25 Medical USA
> 999403-1 999403 Record 97 0.25 Medical USA
> 999403-2 999403 Record 0.25 Medical USA
>
> can this be done?
|
|
0
|
|
|
|
Reply
|
art297 (4237)
|
2/2/2010 12:55:05 PM |
|
Hi,
Try this,Hope this helps..
data want;
set test;
by id;
if first.id then count=0;
else count+1;
if count ne 0 then new_id=compress(id||'-'||count);
else new_id=id;
drop count;
run;
Thanks,
-Shiva
|
|
0
|
|
|
|
Reply
|
shiva.saidala (86)
|
2/2/2010 1:12:55 PM
|
|