Group by will only return summarized rows UNLESS you also include a
non-group-by field in the select clause. If that's the case you'll receive
"NOTE: The query requires remerging summary statistics back with the
original data" and all your "detail" rows that satisfy the having condition
will be returned. e.g. in the event mytable contained a variable "d" and
the select read "select *, d ..."
I'm not sure why the grouped query result presents a problem to you. If the
table contains only 3 variables, and you're returning only combinations of
variables that are duplicated ( by virtue of the having clause ), you DO
have enough information to go back to the original table. i.e. having ALL
rows returned of duplicate a,b,c values will not result in any a,b,c
combinations not returned by the summarized query.
If you're really looking for the number of duplicates for each a,b,c
combination, add the count(*) as cnt to your select clause.
proc sql;
create table mydupes1 as
select *, count(*) as cnt
from mytable
group by a,b,c
having calculated cnt > 1
;
/* Want the total number of duplicated records? */
select sum(cnt) as total_duplicated_records
from mydupes1
;
quit;
In any case, the following detail to summary join does return what you're
looking for:
proc sql;
create table mydupes1 as
select all.*
from mytable all,
( select *
from mytable
group by a, b, c
having count(*) > 1
) summ
where all.a = summ.a
and all.b = summ.b
and all.c = summ.c
order by a, b, c;
;
quit;
-----Original Message-----
From: Talbot Michael Katz [mailto:topkatz@MSN.COM]
Sent: Friday, October 29, 2004 9:43 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: PROC SQL Duplicate Record Selection
Hi.
I have a data set with three fields that I pulled from a larger table.
There were some duplicate records, and I wanted to see a few of them so
that I could go back to the original table and investigate. I issued the
following query:
proc sql;
create table mydupes1 as
select *
from mytable
group by a, b, c
having count(*) > 1
order by a, b, c;
quit;
I thought this would return all instances of the duplicate records, but it
returned only a single instance of each record that has duplicates. I
thought that's what I would get if I put "select distinct" and, indeed, I
tried that and got identical results.
I even tried the following:
proc sql;
create table mydupes2 as
select *
from mytable
except
select *
from mytable
group by a, b, c
having count(*) = 1
order by a, b, c;
quit;
and, to my surprise, got the same thing (only one representative for each
duplicate). I could get all the duplicates if I matched mytable and
mydupes1, and I could also use a DATA step and SET BY with FIRST and LAST
syntax if the data set MYTABLE were already sorted, but can it be done in
a "single step" PROC SQL without matching?
Thanks!
-- TMK --
------------------------------------------------------------
This e-mail may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this e-mail or the information it contains by other than an intended recipient is unauthorized. If you received this e-mail in error, please advise me (by return e-mail or otherwise) immediately.
Ce courrier �lectronique est confidentiel et prot�g�. L'exp�diteur ne renonce pas aux droits et obligations qui s'y rapportent. Toute diffusion, utilisation ou copie de ce message ou des renseignements qu'il contient par une personne autre que le (les) destinataire(s) d�sign�(s) est interdite. Si vous recevez ce courrier �lectronique par erreur, veuillez m'en aviser imm�diatement, par retour de courrier �lectronique ou par un autre moyen.
============================================================
|
|
0
|
|
|
|
Reply
|
harry.droogendyk1 (872)
|
10/30/2004 2:09:08 AM |
|
Droogendyk, Harry wrote:
> Group by will only return summarized rows UNLESS you also include a
> non-group-by field in the select clause.
Excellent explanation.
> In any case, the following detail to summary join does return what
> you're
> looking for:
>
> proc sql;
> create table mydupes1 as
> select all.*
> from mytable all,
> ( select *
> from mytable
> group by a, b, c
> having count(*) > 1
> ) summ
> where all.a = summ.a
> and all.b = summ.b
> and all.c = summ.c
> order by a, b, c;
> ;
> quit;
Prior thread about SPDE indicates this is probably the best form for large
data.
For robustness I show two forms that select via IN . The second form is
invalid but demonstrates a (concatenated) IN construct that would be
desirable.
%let seed=1234;
data foo;
do a = 1 to 5;
do b = 1 to 5;
do c = 1 to 5;
do _n_ = 1 to 3 * ranuni(&seed);
output;
end;
end; end; end;
run;
* today this is valid;
* most robust way to concatenate numeric fields is to use RB8.;
proc sql;
create table dups as
select *
from foo
where put(a,rb8.)||put(b,rb8.)||put(c,rb8.)
in
(select put(a,rb8.)||put(b,rb8.)||put(c,rb8.)
from foo
group by a,b,c
having count(*) > 1
)
order by monotonic()
;
quit;
* this will be valid someday?;
* (v1,...vN) in (select a1,...,aN from ...)
* - return true if N values match N values of a row in select set of another
query;
proc sql;
create table dups as
select *
from foo
where (a,b,c) %* comma separated list indicating implicit concatenation;
in
(select a,b,c
from foo
group by a,b,c
having count(*) > 1
)
;
quit;
--
Richard A. DeVenezia
http://www.devenezia.com/
|
|
0
|
|
|
|
Reply
|
radevenz (1536)
|
10/30/2004 5:12:18 AM
|
|
|
1 Replies
62 Views
(page loaded in 0.102 seconds)
Similiar Articles: Saving the last observation of duplicate records - comp.soft-sys ...Re: Printing the last column in a record - comp.lang ... be safe other records ... How to find duplicate rows with SQL ... now Peter’s duplicate record has been deleted. PROC ... Common Programming Mistake with Proc Sort NODUPRECS - comp.soft ...For the procedure to actually remove the duplicates, the duplicate records should be brought ... Re: Equivalent of NODUPKEY in PROC SQL? - comp.soft-sys.sas ... Common ... How best to detect duplicate values in a column? - comp.databases ...I don't want to simply re-declare the table as ... have it correct, I then clean up whatever duplicate records the ... match first (this will require a stored procedure ... File is full or damaged - comp.soft-sys.sasI need to remove duplicate records from it and storing the ... option in PROC SORT, this will slow down the procedure ... Once they're back (Eudora may ... UTF-8 / The UTF ... Many-to-many merge - comp.soft-sys.sas... in1) dataset2; By ID; if in1; Run; OR PROC SQL ... net> Sent: Sun, Mar 7, 2010 11:52 am Subject: Re ... How to merge with duplicate observation on both sides ... How to delete rows in a sas dataset - comp.soft-sys.sasduplicate records data into single row - comp.soft-sys.sas ... Proc IML: how to delete a row in a matrix depending on values in ... Re: How to output the file names under a ... Query with NULL - comp.databases.oracle.server... further and look at the plans: SQL ... you need to be aware that this re ... can be used to rewrite the bad SQL even if there is no access to the source. The procedure ... [comp.publish.cdrom] CD-Recordable FAQ, Part 1/4 - comp.publish ...Archive-name: cdrom/cd-recordable/part1 Posting-Frequency: monthly Last-modified: 2008/10/09 Version: 2.71 Send corrections and updates to And... remove any commas/data from mysql? - comp.databases.mysql ...... out of a mysql table, $result = mysql_query($sql ... Also, since it looks like you're using php, look at ... Efficient way to delete a large amount of records from a ... improve strlen - comp.lang.asm.x86For the record, I'm not a big fan of writing software ... of the mill desktop application, think Google, SQL, ..) ... am creating something new, when I am merely reverse engineering ... Re: PROC SQL Duplicate Record Selection - RhinocerusGroup by will only return summarized rows UNLESS you also include a non-group-by field in the select clause. If that's the case you'll receive "NO PROC SQL Duplicate Record Selection - sas - Mofeel Groupssas, PROC SQL Duplicate Record Selection . comp.soft-sys.sas - The SAS statistics ... 29, 2004 10:12 PM To: XXXX@XXXXX.COM Subject: Re: PROC SQL Duplicate Record Selection ... 7/25/2012 7:53:22 AM
|