Removing duplicate records from a list using Java



Sir/Madam,

We have a list of Records with a unique key such as account no. and sequence no., and the rest of the fields are exactly the same. Is there any way in Java to remove those duplicated records?

Thanks
Reply timcons1 (40) 9/18/2007 2:02:34 AM

On Tue, 18 Sep 2007 02:02:34 GMT, "timothy ma and constance lee"
<timcons1@shaw.ca> wrote, quoted or indirectly quoted someone who said
:

>We have a list of Record with the unqiue key like account no, and sequence 
>no, and the rest of fields are exactly the same.
>Any way for java to remove those duplicated records?

For a canned solution, see http://mindprod.com/products2.html#SORTED
-- 
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Reply Roedy 9/18/2007 2:07:50 AM


timothy ma and constance lee wrote:
..
>We have a list of Record with the unqiue key like account no, and sequence
>no, ..

If every record has a unique key formed from account
& sequence number, how can any two records be
identical, or duplicate?*

>..and the rest of fields are exactly the same.
>Any way for java to remove those duplicated records?

I do not fully understand the question.

The way you describe the records, I guess it
might be something like (fixed width font
required for proper viewing)..

Acc. #  | Seq. # | Field1  | Field2 | Field3
121045     2       cat       dog      fish
415386     3       giraffe   dog      fish
848345     7       cat       dog      fish
900277     4       frog      cow      whale

..and you are saying you want to remove duplicates
in Fields1 through 3.  So the first and third records are
'duplicates' but the second (with giraffe) and 4th are
not?

Am I on track so far?

* If that is the case, and records 1 and 3 are
considered 'duplicates', which one should be
dumped?

-- 
Andrew Thompson
http://www.athompson.info/andrew/

Message posted via JavaKB.com
http://www.javakb.com/Uwe/Forums.aspx/java-general/200709/1
Reply Andrew 9/18/2007 3:27:48 AM

Andrew

Something like that

Account No  Seq No    Name
123456        001          abc
123456        001          abc
123234        001           xyz
123234        002           abd1
123421        002           ijk

The message may come from a mainframe, and we don't want to fix it at the mainframe level. We simply want to use Java to remove the duplicated one:

123456    001    abc

Thanks

"Andrew Thompson" <u32984@uwe> wrote in message news:7861e17ba18ec@uwe...
> timothy ma and constance lee wrote:
> ..
>>We have a list of Record with the unqiue key like account no, and sequence
>>no, ..
>
> If every record has a unique key formed from account
> & seqence number, how cany any two records be
> identical, or duplicate?*
>
>>..and the rest of fields are exactly the same.
>>Any way for java to remove those duplicated records?
>
> I do not fully understand the question.
>
> The way you describe the records, I guess it
> it might be something like (fixed width font
> required for proper viewing)..
>
> Acc. #  | Seq. # | Field1  | Field2 | Field3
> 121045     2       cat       dog      fish
> 415386     3       giraffe   dog      fish
> 848345     7       cat       dog      fish
> 900277     4       frog      cow      whale
>
> .and you are saying you want to remove duplicates
> in Fields1 through 3.  So the first and third record are
> 'duplicates' but the second (with Giraffe) and 4th are
> not?
>
> Am I on track so far?
>
> * If that is the case, and records 1 and 3 are
> considered 'duplicates' which one should be
> dumped?
>
> -- 
> Andrew Thompson
> http://www.athompson.info/andrew/
>
> Message posted via JavaKB.com
> http://www.javakb.com/Uwe/Forums.aspx/java-general/200709/1
Reply timothy 9/18/2007 3:35:22 AM

On Sep 17, 8:35 pm, "timothy ma and constance lee" <timco...@shaw.ca>
wrote:
> "Andrew Thompson" <u32984@uwe> wrote in messagenews:7861e17ba18ec@uwe...
> > timothy ma and constance lee wrote:
> > ..
> >>We have a list of Record with the unqiue key like account no, and sequence
> >>no, ..
>
> > If every record has a unique key formed from account
> > & seqence number, how cany any two records be
> > identical, or duplicate?*
>
> >>..and the rest of fields are exactly the same.
> >>Any way for java to remove those duplicated records?
>
> > I do not fully understand the question.
>
> > The way you describe the records, I guess it
> > it might be something like (fixed width font
> > required for proper viewing)..
>
> > Acc. #  | Seq. # | Field1  | Field2 | Field3
> > 121045     2       cat       dog      fish
> > 415386     3       giraffe   dog      fish
> > 848345     7       cat       dog      fish
> > 900277     4       frog      cow      whale
>
> > .and you are saying you want to remove duplicates
> > in Fields1 through 3.  So the first and third record are
> > 'duplicates' but the second (with Giraffe) and 4th are
> > not?
>
> > Am I on track so far?
>
> > * If that is the case, and records 1 and 3 are
> > considered 'duplicates' which one should be
> > dumped?
>
> > --
> > Andrew Thompson
> >http://www.athompson.info/andrew/
>
> > Message posted via JavaKB.com
> >http://www.javakb.com/Uwe/Forums.aspx/java-general/200709/1
> Andrew
>
> Something like that
>
> Account No  Seq No    Name
> 123456        001          abc
> 123456        001          abc
> 123234        001           xyz
> 123234        002           abd1
> 123421        002           ijk
>
> The message may be from some Mainframe that we dont want fix it Mainframe
> level. SImply using java to remove the duplicated one:
>
> 123456    001    abc
>
> Thanks
>

Try using a Set (probably HashSet or LinkedHashSet depending).  You'll
have to make sure that your object properly implements hashCode() and
equals(), but that shouldn't be too hard...
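
A minimal sketch of the Set idea, under some assumptions: the class name AccountRecord and its three String fields are hypothetical, chosen only to match the sample data in this thread. LinkedHashSet is used here so that the first occurrence of each record is kept and the original order is preserved; a plain HashSet would also remove the duplicates but gives no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

public class DedupWithSet {

    /** Hypothetical record type; the field names are assumptions based on the sample data. */
    static final class AccountRecord {
        final String accountNo;
        final String seqNo;
        final String name;

        AccountRecord(String accountNo, String seqNo, String name) {
            this.accountNo = accountNo;
            this.seqNo = seqNo;
            this.name = name;
        }

        // Two records are equal only when every field matches.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AccountRecord)) return false;
            AccountRecord other = (AccountRecord) o;
            return accountNo.equals(other.accountNo)
                    && seqNo.equals(other.seqNo)
                    && name.equals(other.name);
        }

        // Must agree with equals(): equal records produce equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hash(accountNo, seqNo, name);
        }

        @Override
        public String toString() {
            return accountNo + " " + seqNo + " " + name;
        }
    }

    /** Drops duplicates, keeping the first occurrence of each record in input order. */
    static List<AccountRecord> dedupe(List<AccountRecord> records) {
        return new ArrayList<>(new LinkedHashSet<>(records));
    }

    public static void main(String[] args) {
        List<AccountRecord> input = Arrays.asList(
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123234", "002", "abd1"),
                new AccountRecord("123421", "002", "ijk"));
        for (AccountRecord r : dedupe(input)) {
            System.out.println(r);
        }
    }
}
```

With the five sample rows from the thread, dedupe returns four records; the second "123456 001 abc" row is the one dropped.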

Also, please don't top post.


4. It makes it hard to follow the conversation.
3. Why is top-posting bad?
2. Please don't top post.
1. I like to top post.



Good luck,
Daniel.

Reply Daniel 9/18/2007 4:40:54 AM

timothy ma and constance lee wrote:
..

Please refrain from top-posting.  I find it most confusing.

Notice how both Roedy and myself put our comments
directly after anything worth replying to?  In best situations,
we would then trim other parts of earlier messages that
we are not commenting on.  That technique is known as
'in-line with trim' posting - and is much easier to follow.

>Something like that
>
>Account No  Seq No    Name
>123456        001          abc
>123456        001          abc
>123234        001           xyz
>123234        002           abd1
>123421        002           ijk

OK.  So I was wrong in guessing that the Acc./Seq. #
was unique in all cases - they can also be duplicate.

>The message may be from some Mainframe that we dont want fix it Mainframe
>level. SImply using java to remove the duplicated one:
>
>123456    001    abc

I suspect (without looking at the link Roedy posted)
that sorting the records is one technique that might
identify duplicates, but there are also other ways.

For example, you might iterate the entire original list
and on each iteration of the loop.
- Make an object that uses all the fields as a 'key'
- Use that key to check if a record with that key
  already exists in a HashMap.
- If not..
   - add the object to the HashMap,
  ..else..
   - discard it as a duplicate.

At the end of the loop, the HashMap should contain
only the unique records.

-- 
Andrew Thompson
http://www.athompson.info/andrew/

Message posted via http://www.javakb.com
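
The loop described above might be sketched as follows. This is only an illustration under stated assumptions: each record is represented as a String[] of its fields (a hypothetical layout), and the 'key' object is a List of those fields, since List.equals and List.hashCode compare element by element. LinkedHashMap, a HashMap subclass, is swapped in so the surviving records come back in their original order; a plain HashMap would work the same way without that guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupWithMap {

    /** Keeps the first row seen for each distinct combination of field values. */
    static List<String[]> dedupe(List<String[]> rows) {
        Map<List<String>, String[]> unique = new LinkedHashMap<>();
        for (String[] row : rows) {
            // Key built from all fields; copied so later changes to the array
            // cannot alter a key that is already stored in the map.
            List<String> key = new ArrayList<>(Arrays.asList(row));
            if (!unique.containsKey(key)) {
                unique.put(key, row);   // first time this combination is seen
            }
            // otherwise the row is a duplicate and is simply skipped
        }
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"123456", "001", "abc"});
        rows.add(new String[] {"123456", "001", "abc"});
        rows.add(new String[] {"123234", "001", "xyz"});
        rows.add(new String[] {"123234", "002", "abd1"});
        rows.add(new String[] {"123421", "002", "ijk"});
        for (String[] row : dedupe(rows)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Because only the first row for each key is stored, this variant answers the "which one should be dumped?" question by discarding the later copies.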
Reply Andrew 9/18/2007 4:55:59 AM

timothy ma and constance lee wrote:
> The message may be from some Mainframe that we dont want fix it Mainframe
> level.

How do you receive the message?

If your records come from an SQL database, you can simply achieve your goal by using "SELECT DISTINCT ..." instead of a regular "SELECT ..." statement.

If additional data is being read with the SQL query, it is usually also possible to read the rows in an order that is consistent (at least partially) with your uniqueness key.  Because database keys are usually already indexed, it should cost nothing to use your database keys in the ORDER BY clause to get the right order. With even partially sorted records on the Java side, you may significantly speed up your process (if it is really worth the effort, of course).

Otherwise, just follow one of the solutions already suggested.

piotr
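
To illustrate why sorted input helps, here is a small sketch that assumes the records arrive fully sorted, so identical records always sit next to each other. Each record is represented as a single String line, which is a simplification for the example, and null entries are assumed not to occur. Under those assumptions one pass that compares each record with the previous one is enough, with no hash table needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DedupSorted {

    /**
     * Removes duplicates from a list in which equal records are adjacent
     * (for example, rows read with an ORDER BY over all compared columns).
     */
    static List<String> dedupeSorted(List<String> sortedRecords) {
        List<String> result = new ArrayList<>();
        String previous = null;
        for (String current : sortedRecords) {
            // Only keep a record that differs from the one just before it.
            if (!current.equals(previous)) {
                result.add(current);
            }
            previous = current;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList(
                "123234 001 xyz",
                "123234 002 abd1",
                "123421 002 ijk",
                "123456 001 abc",
                "123456 001 abc");
        for (String record : dedupeSorted(sorted)) {
            System.out.println(record);
        }
    }
}
```

If the input is only partially sorted, equal records may not be adjacent and this single-pass comparison would miss some duplicates; one of the Set or Map approaches would then be the safer choice.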
Reply Piotr 9/18/2007 3:05:10 PM

timothy ma and constance lee wrote:
>> Account No  Seq No    Name
>> 123456        001          abc
>> 123456        001          abc
>> 123234        001           xyz
>> 123234        002           abd1
>> 123421        002           ijk

Andrew Thompson wrote:
> OK.  So I was wrong in guessing that the Acc./Seq. #
> was unique in all cases - they can also be duplicate.

That wasn't a guess:

timothy ma and constance lee wrote:
> We have a list of Record with the unqiue [sic] key
> like account no, and sequence no,

They actually said so, then contradicted it with the data example.

-- 
Lew
Reply Lew 9/18/2007 9:45:54 PM

"timothy ma and constance lee" <timcons1@shaw.ca> wrote in news:_HHHi.194784$fJ5.28279@pd7urf1no:> Andrew> > Something like that> > Account No  Seq No    Name> 123456        001          abc> 123456        001          abc> 123234        001           xyz> 123234        002           abd1> 123421        002           ijk> > The message may be from some Mainframe that we dont want fix it Mainframe > level. SImply using java to remove the duplicated one:> > 123456    001    abcQuestions:1.  do you already have a Java object that encapsulates a record of data?  If not, can you implement one?2.  does this Java object implement java.lang.Comparable?  If not, can it be made to do so?3.  if you have two duplicate records in the sequence of records, would you rather end up (after you do your processing to remove duplicates) wiht the first record, or would you rather end up with the last record of the duplicates?Suggestion:use a java.util.Set of objects (that must implement Comparable) to eliminate duplicates.  When you've added all of your collection of record objects to the Set, you will end up with a collection with no duplicates.RegardsGRB-- ---------------------------------------------------------------------Greg R. Broderick                  usenet200709@blackholio.dyndns.orgA. Top posters.Q. What is the most annoying thing on Usenet?---------------------------------------------------------------------
Reply Greg 9/18/2007 10:50:41 PM

Greg R. Broderick wrote:
> 3.  if you have two duplicate records in the sequence of records, would you
> rather end up (after you do your processing to remove duplicates) wiht the
> first record, or would you rather end up with the last record of the
> duplicates?

A meaningless distinction in many data systems, such as SQL-based ones.

For example, SQL queries make no promises about order of records absent an ORDER BY clause, and even then, none about ordering of equal values.

-- 
Lew
Reply Lew 9/18/2007 11:10:11 PM
