### Removing duplicate records from a list using Java

Sir/Madam,

We have a list of records with a unique key such as account no and sequence no, and the rest of the fields are exactly the same. Is there any way in Java to remove those duplicated records?

Thanks
timcons1 (42) 9/18/2007 2:02:34 AM

>We have a list of Record with the unqiue key like account no, and sequence
>no, and the rest of fields are exactly the same.
>Any way for java to remove those duplicated records?

For a canned solution, see http://mindprod.com/products2.html#SORTED
The Java Glossary
http://mindprod.com

timothy ma and constance lee wrote:

>We have a list of Record with the unqiue key like account no, and sequence no, ..

If every record has a unique key formed from account and sequence number, how can any two records be identical, or duplicates?

>..and the rest of fields are exactly the same.
>Any way for java to remove those duplicated records?

I do not fully understand the question.

The way you describe the records, I guess it might be something like (fixed width font required for proper viewing)..

Acc. #  | Seq. # | Field1  | Field2 | Field3
121045     2       cat       dog      fish
415386     3       giraffe   dog      fish
848345     7       cat       dog      fish
900277     4       frog      cow      whale

..and you are saying you want to remove duplicates in Fields 1 through 3. So the first and third records are 'duplicates', but the second (with giraffe) and the 4th are not?

Am I on track so far?

* If that is the case, and records 1 and 3 are considered 'duplicates', which one should be dumped?

-- Andrew Thompson
http://www.athompson.info/andrew/
Andrew

Something like that:

Account No  Seq No  Name
123456      001     abc
123456      001     abc
123234      001     xyz
123234      002     abd1
123421      002     ijk

The messages may come from a mainframe, and we don't want to fix it at the mainframe level. We simply want to use Java to remove the duplicated one:

123456    001    abc

Thanks
timothy ma and constance lee wrote:

Please refrain from top-posting; I find it most confusing. Notice how Roedy and I both put our comments directly after anything worth replying to? Ideally, we would then also trim the parts of earlier messages that we are not commenting on. That technique is known as 'in-line with trim' posting, and it is much easier to follow.

>Something like that
>
>Account No  Seq No  Name
>123456      001     abc
>123456      001     abc
>123234      001     xyz
>123234      002     abd1
>123421      002     ijk

OK. So I was wrong in guessing that the Acc./Seq. # was unique in all cases; they can also be duplicated.

>The message may be from some Mainframe that we dont want fix it Mainframe level. Simply using java to remove the duplicated one:
>
>123456    001    abc

I suspect (without looking at the link Roedy posted) that sorting the records is one technique that might identify duplicates, but there are also other ways.
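A minimal sketch of the sorting approach, assuming each record can be modeled by a hypothetical `AccountRecord` class with three `String` fields (`accountNo`, `seqNo`, `name`) taken from the sample data. Once the list is sorted on every field, identical records sit next to each other, so a single pass that compares each record with the last one kept is enough. This is an illustration only, not the code behind Roedy's link.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical record type; the field names are assumptions based on the sample data.
final class AccountRecord {
    final String accountNo;
    final String seqNo;
    final String name;

    AccountRecord(String accountNo, String seqNo, String name) {
        this.accountNo = accountNo;
        this.seqNo = seqNo;
        this.name = name;
    }

    @Override
    public String toString() {
        return accountNo + " " + seqNo + " " + name;
    }
}

final class SortDedupe {
    // Compares every field, so identical records sort next to each other.
    static int compareAllFields(AccountRecord a, AccountRecord b) {
        int c = a.accountNo.compareTo(b.accountNo);
        if (c != 0) return c;
        c = a.seqNo.compareTo(b.seqNo);
        if (c != 0) return c;
        return a.name.compareTo(b.name);
    }

    static List<AccountRecord> removeDuplicates(List<AccountRecord> input) {
        List<AccountRecord> sorted = new ArrayList<>(input); // sort a copy, not the caller's list
        sorted.sort(SortDedupe::compareAllFields);
        List<AccountRecord> result = new ArrayList<>();
        for (AccountRecord r : sorted) {
            // In sorted order, a duplicate can only match the record kept just before it.
            if (result.isEmpty()
                    || compareAllFields(result.get(result.size() - 1), r) != 0) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<AccountRecord> records = Arrays.asList(
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123234", "002", "abd1"),
                new AccountRecord("123421", "002", "ijk"));
        for (AccountRecord r : removeDuplicates(records)) {
            System.out.println(r);
        }
    }
}
```

Sorting costs O(n log n) and returns the records in key order rather than their original order; the fields are compared as strings, which works here because the sample numbers are fixed width.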

For example, you might iterate over the entire original list and, on each iteration of the loop:
- Make an object that uses all the fields as a 'key'.
- Use that key to check whether a record with that key already exists in a HashMap.
- If not, add the object to the HashMap.
- Otherwise, discard it as a duplicate.

At the end of the loop, the HashMap should contain only the unique records.
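The loop described above might look like this in Java. It is a sketch, assuming a hypothetical `AccountRecord` class with three `String` fields (`accountNo`, `seqNo`, `name`) from the sample data. The key object is a `List` of all the field values, which works because `List.equals` and `List.hashCode` compare element by element. It uses a `LinkedHashMap` (a `HashMap` subclass) instead of a plain `HashMap` so the surviving records keep their original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type; the field names are assumptions based on the sample data.
final class AccountRecord {
    final String accountNo;
    final String seqNo;
    final String name;

    AccountRecord(String accountNo, String seqNo, String name) {
        this.accountNo = accountNo;
        this.seqNo = seqNo;
        this.name = name;
    }

    // The 'key' object: two keys are equal exactly when every field is equal.
    List<String> key() {
        return Arrays.asList(accountNo, seqNo, name);
    }

    @Override
    public String toString() {
        return accountNo + " " + seqNo + " " + name;
    }
}

final class MapDedupe {
    static List<AccountRecord> removeDuplicates(List<AccountRecord> input) {
        // LinkedHashMap rather than HashMap, so iteration follows insertion order.
        Map<List<String>, AccountRecord> unique = new LinkedHashMap<>();
        for (AccountRecord r : input) {
            List<String> key = r.key();
            if (!unique.containsKey(key)) {
                unique.put(key, r); // first time this key is seen: keep the record
            }
            // otherwise r is discarded as a duplicate
        }
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) {
        List<AccountRecord> records = Arrays.asList(
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123234", "002", "abd1"),
                new AccountRecord("123421", "002", "ijk"));
        for (AccountRecord r : removeDuplicates(records)) {
            System.out.println(r);
        }
    }
}
```

Each map lookup is expected constant time, so the whole pass should be roughly linear in the number of records.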

-- Andrew Thompson
http://www.athompson.info/andrew/
timothy ma and constance lee wrote:
> The message may be from some Mainframe that we dont want fix it Mainframe level.

How do you receive the message?

If your records come from an SQL database, you may be able to achieve your goal simply by using "SELECT DISTINCT ..." instead of a plain "SELECT ..." statement.

If some additional data is being read by an SQL query, it is usually also possible to read the rows in an order that is consistent (at least partially) with your uniqueness key. Because database keys are usually already indexed, putting those key columns in the ORDER BY clause should add little or no cost. With even partially sorted records on the Java side, you may be able to speed up the processing significantly (if, of course, it is really worth the effort).
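The partially sorted case can be sketched as follows, assuming the rows arrive ordered by account number only (for example through an ORDER BY on the account column) and are mapped to a hypothetical `AccountRecord` class with `String` fields `accountNo`, `seqNo` and `name`. Because all copies of a record then fall in the same run of rows, only the keys seen for the current account need to be remembered, and that set can be cleared whenever the account number changes. If the input is not actually grouped by account, duplicates separated by another account would slip through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical record type; the field names are assumptions based on the sample data.
final class AccountRecord {
    final String accountNo;
    final String seqNo;
    final String name;

    AccountRecord(String accountNo, String seqNo, String name) {
        this.accountNo = accountNo;
        this.seqNo = seqNo;
        this.name = name;
    }

    @Override
    public String toString() {
        return accountNo + " " + seqNo + " " + name;
    }
}

final class GroupedDedupe {
    // Assumes the input is grouped by accountNo; order within one account does not matter.
    static List<AccountRecord> removeDuplicates(Iterable<AccountRecord> groupedByAccount) {
        List<AccountRecord> result = new ArrayList<>();
        Set<List<String>> seenInGroup = new HashSet<>();
        String currentAccount = null;
        for (AccountRecord r : groupedByAccount) {
            if (!r.accountNo.equals(currentAccount)) {
                // A new account starts: the previous account's records cannot reappear.
                currentAccount = r.accountNo;
                seenInGroup.clear();
            }
            // Set.add returns false when an equal key is already present.
            if (seenInGroup.add(Arrays.asList(r.accountNo, r.seqNo, r.name))) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<AccountRecord> rows = Arrays.asList(
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123234", "002", "abd1"),
                new AccountRecord("123421", "002", "ijk"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"));
        for (AccountRecord r : removeDuplicates(rows)) {
            System.out.println(r);
        }
    }
}
```

Memory use is bounded by the largest single account rather than by the whole result set.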

Otherwise, just follow some of the already suggested solutions.

piotr
timothy ma and constance lee wrote:
> Account No  Seq No  Name
> 123456      001     abc
> 123456      001     abc
> 123234      001     xyz
> 123234      002     abd1
> 123421      002     ijk

Andrew Thompson wrote:
> OK.  So I was wrong in guessing that the Acc./Seq. # was unique in all cases - they can also be duplicate.

That wasn't a guess:

timothy ma and constance lee wrote:
> We have a list of Record with the unqiue [sic] key like account no, and sequence no,

They actually said so, then contradicted it with the data example.

-- Lew
"timothy ma and constance lee" <timcons1@shaw.ca> wrote:

> Andrew
> 
> Something like that
> 
> Account No  Seq No  Name
> 123456      001     abc
> 123456      001     abc
> 123234      001     xyz
> 123234      002     abd1
> 123421      002     ijk
> 
> The message may be from some Mainframe that we dont want fix it Mainframe level. Simply using java to remove the duplicated one:
> 
> 123456 001 abc

Questions:

1. Do you already have a Java object that encapsulates a record of data? If not, can you implement one?

2. Does this Java object implement java.lang.Comparable? If not, can it be made to do so?

3. If you have two duplicate records in the sequence of records, would you rather end up (after processing to remove duplicates) with the first record or with the last record of the duplicates?
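Question 3 matters when records share account no and sequence no but differ in other fields. A minimal sketch, assuming a hypothetical `AccountRecord` class with `String` fields `accountNo`, `seqNo` and `name`, and deduplicating on account no plus sequence no only: with a `LinkedHashMap`, `putIfAbsent` keeps the first record for a key, while `put` replaces it with the last one. The sample names "old" and "new" are made up purely to make the difference visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type; the field names are assumptions based on the sample data.
final class AccountRecord {
    final String accountNo;
    final String seqNo;
    final String name;

    AccountRecord(String accountNo, String seqNo, String name) {
        this.accountNo = accountNo;
        this.seqNo = seqNo;
        this.name = name;
    }

    @Override
    public String toString() {
        return accountNo + " " + seqNo + " " + name;
    }
}

final class KeepFirstOrLast {
    // Keyed on account no + sequence no only, so records sharing that key collapse
    // into one even when their names differ.
    static List<AccountRecord> dedupe(List<AccountRecord> input, boolean keepLast) {
        Map<List<String>, AccountRecord> byKey = new LinkedHashMap<>();
        for (AccountRecord r : input) {
            List<String> key = Arrays.asList(r.accountNo, r.seqNo);
            if (keepLast) {
                byKey.put(key, r);         // a later record overwrites the earlier one
            } else {
                byKey.putIfAbsent(key, r); // the earliest record for a key is kept
            }
        }
        // LinkedHashMap keeps each key at its first insertion position even if its value is replaced.
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<AccountRecord> records = Arrays.asList(
                new AccountRecord("123456", "001", "old"),
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123456", "001", "new"));
        System.out.println("keep first: " + dedupe(records, false));
        System.out.println("keep last:  " + dedupe(records, true));
    }
}
```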

Suggestion:

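use a java.util.Set to eliminate duplicates. With a java.util.TreeSet the objects must implement Comparable (or you supply a Comparator); with a java.util.HashSet they must instead override equals() and hashCode(). When you've added all of your record objects to the Set, you will end up with a collection with no duplicates.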
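A minimal sketch of the Set approach, assuming a hypothetical `AccountRecord` class whose `compareTo` compares all three fields (`accountNo`, `seqNo`, `name`). `TreeSet` treats two elements as duplicates when `compareTo` returns 0, and `add` leaves the set unchanged when an equal element is already present, so the first copy is the one kept and the result comes back sorted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical record type; its natural ordering compares every field.
final class AccountRecord implements Comparable<AccountRecord> {
    final String accountNo;
    final String seqNo;
    final String name;

    AccountRecord(String accountNo, String seqNo, String name) {
        this.accountNo = accountNo;
        this.seqNo = seqNo;
        this.name = name;
    }

    // Returns 0 only when all three fields are equal; TreeSet uses this to spot duplicates.
    @Override
    public int compareTo(AccountRecord other) {
        int c = accountNo.compareTo(other.accountNo);
        if (c != 0) return c;
        c = seqNo.compareTo(other.seqNo);
        if (c != 0) return c;
        return name.compareTo(other.name);
    }

    @Override
    public String toString() {
        return accountNo + " " + seqNo + " " + name;
    }
}

final class SetDedupe {
    static SortedSet<AccountRecord> removeDuplicates(List<AccountRecord> input) {
        SortedSet<AccountRecord> unique = new TreeSet<>();
        for (AccountRecord r : input) {
            unique.add(r); // returns false, and changes nothing, for a duplicate
        }
        return unique;
    }

    public static void main(String[] args) {
        List<AccountRecord> records = Arrays.asList(
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123234", "002", "abd1"),
                new AccountRecord("123421", "002", "ijk"));
        for (AccountRecord r : removeDuplicates(records)) {
            System.out.println(r);
        }
    }
}
```

To keep the last copy instead (question 3), one option is to call `remove` before `add` for each record. If the same class is also stored in a `HashSet` or `HashMap`, it should override `equals` and `hashCode` consistently with `compareTo`.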

Regards
GRB 