Re: duplicates

  • Follow


Jeli,

I'm not sure how this will help with your data problem, but you can
accomplish what you asked with something like:

data want (drop=count);
  set have;
  by id;
  if first.id then do;
    count=0;
    new_id=id;
  end;
  else do;
    count+1;
    new_id=catx('-',id,count);
  end;
run;

HTH,
Art
-----------
On Feb 2, 5:23 am, Jeli <jeli0...@hotmail.co.uk> wrote:
> I have a dataset with thousands of records and variables.
>
> I have tried to select only destinct records, however some of then
> might have a different industry or different spelling of the name.
> This is clearly a data quality issue but what I need to do is have an
> intrim solution which will fix this until I can sort out the source
> data.
>
> i.e.
>
> ID      NAME    PD      Industry        Country
> 123456  Record 1        0.25    Medical USA
> 123456  Record 1        0.25    Cable   USA
> 999403  Record 99       0.25    Medical USA
> 999403  Record 97       0.25    Medical USA
> 999403  Record  0.25    Medical USA
>
> What I would like to do is if a record is duplicated keep the first
> record as it is, but change the duplicate ID so that I know this is
> the first duplicate or second etc so the new data would look like:
>
> NEW ID  ID      NAME    PD      Industry        Country
> 123456  123456  Record 1        0.25    Medical USA
> 123456-1        123456  Record 1        0.25    Cable   USA
> 999403  999403  Record 99       0.25    Medical USA
> 999403-1        999403  Record 97       0.25    Medical USA
> 999403-2        999403  Record  0.25    Medical USA
>
> can this be done?
0
Reply art297 (4237) 2/2/2010 12:55:05 PM

Hi,

Try this,Hope this helps..

data want;
set test;
by id;
if first.id then count=0;
else count+1;
if count ne 0 then new_id=compress(id||'-'||count);
else new_id=id;
drop count;
run;

Thanks,
-Shiva
0
Reply shiva.saidala (86) 2/2/2010 1:12:55 PM


1 Replies
31 Views

(page loaded in 0.097 seconds)


Reply: