Sir/Madam,

We have a list of records with a unique key made up of account no. and sequence no., and the rest of the fields are exactly the same. Is there any way in Java to remove those duplicated records?

Thanks
timcons1 (40)
9/18/2007 2:02:34 AM
On Tue, 18 Sep 2007 02:02:34 GMT, "timothy ma and constance lee"
<timcons1@shaw.ca> wrote, quoted or indirectly quoted someone who said:
>We have a list of Record with the unqiue key like account no, and sequence
>no, and the rest of fields are exactly the same.
>Any way for java to remove those duplicated records?
For a canned solution, see http://mindprod.com/products2.html#SORTED
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Roedy
9/18/2007 2:07:50 AM
timothy ma and constance lee wrote:
...
>We have a list of Record with the unqiue key like account no, and sequence
>no, ..

If every record has a unique key formed from account
& sequence number, how can any two records be
identical, or duplicate?*

>..and the rest of fields are exactly the same.
>Any way for java to remove those duplicated records?

I do not fully understand the question.

The way you describe the records, I guess it
might be something like (fixed width font
required for proper viewing)..

Acc. # | Seq. # | Field1  | Field2 | Field3
121045 | 2      | cat     | dog    | fish
415386 | 3      | giraffe | dog    | fish
848345 | 7      | cat     | dog    | fish
900277 | 4      | frog    | cow    | whale

..and you are saying you want to remove duplicates
in Field1 through Field3. So the first and third records are
'duplicates', but the second (with giraffe) and fourth are
not?

Am I on track so far?

* If that is the case, and records 1 and 3 are
considered 'duplicates', which one should be
dumped?

-- 
Andrew Thompson
http://www.athompson.info/andrew/

Message posted via JavaKB.com
http://www.javakb.com/Uwe/Forums.aspx/java-general/200709/1
Andrew
9/18/2007 3:27:48 AM
Andrew

Something like that:

Account No  Seq No  Name
123456      001     abc
123456      001     abc
123234      001     xyz
123234      002     abd1
123421      002     ijk

The messages may come from a mainframe, and we don't want to fix it at the mainframe level. We simply want to use Java to remove the duplicated one:

123456 001 abc

Thanks

"Andrew Thompson" <u32984@uwe> wrote in message news:7861e17ba18ec@uwe...
> timothy ma and constance lee wrote:
> ..
>>We have a list of Record with the unqiue key like account no, and sequence
>>no, ..
>
> If every record has a unique key formed from account
> & seqence number, how cany any two records be
> identical, or duplicate?*
>
>>..and the rest of fields are exactly the same.
>>Any way for java to remove those duplicated records?
>
> I do not fully understand the question.
>
> The way you describe the records, I guess it
> it might be something like (fixed width font
> required for proper viewing)..
>
> Acc. # | Seq. # | Field1 | Field2 | Field3
> 121045 2 cat dog fish
> 415386 3 giraffe dog fish
> 848345 7 cat dog fish
> 900277 4 frog cow whale
>
> .and you are saying you want to remove duplicates
> in Fields1 through 3. So the first and third record are
> 'duplicates' but the second (with Giraffe) and 4th are
> not?
>
> Am I on track so far?
>
> * If that is the case, and records 1 and 3 are
> considered 'duplicates' which one should be
> dumped?
>
> -- 
> Andrew Thompson
> http://www.athompson.info/andrew/
>
> Message posted via JavaKB.com
> http://www.javakb.com/Uwe/Forums.aspx/java-general/200709/1
>
timothy
9/18/2007 3:35:22 AM
On Sep 17, 8:35 pm, "timothy ma and constance lee" <timco...@shaw.ca>
wrote:
> "Andrew Thompson" <u32984@uwe> wrote in message news:7861e17ba18ec@uwe...
> > timothy ma and constance lee wrote:
> > ..
> >>We have a list of Record with the unqiue key like account no, and sequence
> >>no, ..
>
> > If every record has a unique key formed from account
> > & seqence number, how cany any two records be
> > identical, or duplicate?*
>
> >>..and the rest of fields are exactly the same.
> >>Any way for java to remove those duplicated records?
>
> > I do not fully understand the question.
>
> > The way you describe the records, I guess it
> > it might be something like (fixed width font
> > required for proper viewing)..
>
> > Acc. # | Seq. # | Field1 | Field2 | Field3
> > 121045 2 cat dog fish
> > 415386 3 giraffe dog fish
> > 848345 7 cat dog fish
> > 900277 4 frog cow whale
>
> > .and you are saying you want to remove duplicates
> > in Fields1 through 3. So the first and third record are
> > 'duplicates' but the second (with Giraffe) and 4th are
> > not?
>
> > Am I on track so far?
>
> > * If that is the case, and records 1 and 3 are
> > considered 'duplicates' which one should be
> > dumped?
>
> > --
> > Andrew Thompson
> >http://www.athompson.info/andrew/
>
> > Message posted via JavaKB.com
> >http://www.javakb.com/Uwe/Forums.aspx/java-general/200709/1
> Andrew
>
> Something like that
>
> Account No Seq No Name
> 123456 001 abc
> 123456 001 abc
> 123234 001 xyz
> 123234 002 abd1
> 123421 002 ijk
>
> The message may be from some Mainframe that we dont want fix it Mainframe
> level. SImply using java to remove the duplicated one:
>
> 123456 001 abc
>
> Thanks
>
Try using a Set (probably a HashSet, or a LinkedHashSet if you want to
keep the original order). You'll have to make sure that your object
properly implements hashCode() and equals(), but that shouldn't be too hard...
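A minimal sketch of the Set approach, assuming the records live in a hypothetical `AccountRecord` class (the thread never shows the poster's real class) with the three fields from timothy's example. Both `equals()` and `hashCode()` look at every field, so records with identical contents collapse to one entry; a `LinkedHashSet` keeps the first occurrence of each in input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

public class SetDedupe {

    // Hypothetical record type; field names are assumptions based on the sample data.
    static final class AccountRecord {
        final String accountNo;
        final String seqNo;
        final String name;

        AccountRecord(String accountNo, String seqNo, String name) {
            this.accountNo = accountNo;
            this.seqNo = seqNo;
            this.name = name;
        }

        // Two records are equal only when every field matches.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AccountRecord)) return false;
            AccountRecord other = (AccountRecord) o;
            return Objects.equals(accountNo, other.accountNo)
                    && Objects.equals(seqNo, other.seqNo)
                    && Objects.equals(name, other.name);
        }

        // Consistent with equals(): equal records always produce equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hash(accountNo, seqNo, name);
        }

        @Override
        public String toString() {
            return accountNo + " " + seqNo + " " + name;
        }
    }

    // A LinkedHashSet ignores add() of an element it already contains,
    // so the first occurrence of each record survives, in input order.
    static List<AccountRecord> dedupe(List<AccountRecord> records) {
        return new ArrayList<>(new LinkedHashSet<>(records));
    }

    public static void main(String[] args) {
        List<AccountRecord> input = Arrays.asList(
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123234", "002", "abd1"),
                new AccountRecord("123421", "002", "ijk"));
        for (AccountRecord r : dedupe(input)) {
            System.out.println(r);
        }
    }
}
```

On the sample data this prints four records, with `123456 001 abc` appearing only once. A plain `HashSet` would do the same job if the output order does not matter.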
Also, please don't top post.
4. It makes it hard to follow the conversation.
3. Why is top-posting bad?
2. Please don't top post.
1. I like to top post.
Good luck,
Daniel.
Daniel
9/18/2007 4:40:54 AM
timothy ma and constance lee wrote:
...

Please refrain from top-posting. I find it most confusing.

Notice how both Roedy and I put our comments directly
after anything worth replying to? In the best situations,
we would then trim other parts of earlier messages that
we are not commenting on. That technique is known as
'in-line with trim' posting, and is much easier to follow.

>Something like that
>
>Account No Seq No Name
>123456 001 abc
>123456 001 abc
>123234 001 xyz
>123234 002 abd1
>123421 002 ijk

OK. So I was wrong in guessing that the Acc./Seq. #
was unique in all cases - they can also be duplicate.

>The message may be from some Mainframe that we dont want fix it Mainframe
>level. SImply using java to remove the duplicated one:
>
>123456 001 abc

I suspect (without looking at the link Roedy posted)
that sorting the records is one technique that might
identify duplicates, but there are also other ways.

For example, you might iterate the entire original list,
and on each iteration of the loop:
- Make an object that uses all the fields as a 'key'.
- Use that key to check if a record with that key
  already exists in a HashMap.
- If not..
  - add the object to the HashMap,
  ..else..
  - discard it as a duplicate.

At the end of the loop, the HashMap should contain
only the unique records.

-- 
Andrew Thompson
http://www.athompson.info/andrew/

Message posted via http://www.javakb.com
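Andrew's loop can be sketched roughly as follows. This assumes, hypothetically, that each record has already been split into a `String[]` of its fields; the key is a `List<String>` holding all the fields, because `List.equals()` and `hashCode()` compare element by element. A `LinkedHashMap` (a `HashMap` subclass) is swapped in for the plain `HashMap` only so that the survivors keep their input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapDedupe {

    // Each record is assumed (hypothetically) to be a String[] of field values,
    // e.g. {"123456", "001", "abc"}.
    static List<String[]> dedupe(List<String[]> records) {
        // Behaves like a HashMap but also remembers insertion order.
        Map<List<String>, String[]> seen = new LinkedHashMap<>();
        for (String[] record : records) {
            // Key built from ALL the fields; copied so that later changes to the
            // array cannot alter a key already stored in the map.
            List<String> key = new ArrayList<>(Arrays.asList(record));
            if (!seen.containsKey(key)) {
                seen.put(key, record);   // not seen before: keep it
            }
            // else: discard it as a duplicate
        }
        // At the end of the loop the map holds only the unique records.
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
                new String[] {"123456", "001", "abc"},
                new String[] {"123456", "001", "abc"},
                new String[] {"123234", "001", "xyz"},
                new String[] {"123234", "002", "abd1"},
                new String[] {"123421", "002", "ijk"});
        for (String[] r : dedupe(input)) {
            System.out.println(String.join(" ", r));
        }
    }
}
```

Using a `List` as the key avoids writing a custom key class; a dedicated class with its own `equals()`/`hashCode()` would work just as well.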
Andrew
9/18/2007 4:55:59 AM
timothy ma and constance lee wrote:
> The message may be from some Mainframe that we dont want fix it Mainframe
> level.

How do you receive the message?

If your records come from an SQL database, you may simply achieve your goal using "SELECT DISTINCT ..." instead of a regular "SELECT ..." statement.

If some additional data is being read by the SQL query, it is usually also possible to read the rows in an order that is consistent (at least partially) with your uniqueness key. Because database keys are usually already indexed, it should cost little or nothing to put your key columns in the ORDER BY clause to get the right order. Having even partially sorted records on the Java side may significantly speed up your process (if it is really worth it, of course).

Otherwise, just follow one of the already suggested solutions.

piotr
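When the rows already arrive ordered by all of their columns (for example, via an ORDER BY over every selected column), duplicates end up next to each other, so one pass comparing each row with the previous one is enough and no hash table of every row is needed. A minimal sketch, assuming each row arrives as a single `String` (a hypothetical stand-in for a fixed-width line); if the ordering is only partially consistent with the key, non-adjacent duplicates will slip through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SortedDedupe {

    // Assumes rows are already sorted on ALL fields, so duplicates are adjacent.
    static List<String> dedupeSorted(Iterator<String> rows) {
        List<String> result = new ArrayList<>();
        String previous = null;
        while (rows.hasNext()) {
            String row = rows.next();
            // Keep a row only if it differs from the one just before it.
            if (!row.equals(previous)) {
                result.add(row);
            }
            previous = row;
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample rows from the thread, in the order a full-column ORDER BY would give.
        List<String> sorted = Arrays.asList(
                "123234 001 xyz",
                "123234 002 abd1",
                "123421 002 ijk",
                "123456 001 abc",
                "123456 001 abc");
        for (String row : dedupeSorted(sorted.iterator())) {
            System.out.println(row);
        }
    }
}
```

If SELECT DISTINCT is available, it removes the need for any Java-side step at all; this sketch only covers the ORDER BY route.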
Piotr
9/18/2007 3:05:10 PM
timothy ma and constance lee wrote:
>> Account No Seq No Name
>> 123456 001 abc
>> 123456 001 abc
>> 123234 001 xyz
>> 123234 002 abd1
>> 123421 002 ijk

Andrew Thompson wrote:
> OK. So I was wrong in guessing that the Acc./Seq. #
> was unique in all cases - they can also be duplicate.

That wasn't a guess:

timothy ma and constance lee wrote:
> We have a list of Record with the unqiue [sic] key
> like account no, and sequence no,

They actually said so, then contradicted it with the data example.

-- 
Lew
Lew
9/18/2007 9:45:54 PM
"timothy ma and constance lee" <timcons1@shaw.ca> wrote in
news:_HHHi.194784$fJ5.28279@pd7urf1no:

> Andrew
> 
> Something like that
> 
> Account No Seq No Name
> 123456 001 abc
> 123456 001 abc
> 123234 001 xyz
> 123234 002 abd1
> 123421 002 ijk
> 
> The message may be from some Mainframe that we dont want fix it Mainframe 
> level. SImply using java to remove the duplicated one:
> 
> 123456 001 abc

Questions:

1. Do you already have a Java object that encapsulates a record of data? If not, can you implement one?

2. Does this Java object implement java.lang.Comparable? If not, can it be made to do so?

3. If you have two duplicate records in the sequence of records, would you rather end up (after you do your processing to remove duplicates) with the first record, or would you rather end up with the last record of the duplicates?

Suggestion:

Use a sorted java.util.Set such as java.util.TreeSet, whose elements must implement Comparable (a HashSet relies on equals() and hashCode() instead), to eliminate duplicates. When you've added all of your collection of record objects to the Set, you will end up with a collection with no duplicates.

Regards
GRB

-- 
---------------------------------------------------------------------
Greg R. Broderick                 usenet200709@blackholio.dyndns.org

A. Top posters.
Q. What is the most annoying thing on Usenet?
---------------------------------------------------------------------
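A minimal sketch of the Comparable-based Set, using `java.util.TreeSet`, which treats two elements as duplicates when `compareTo()` returns 0. The `AccountRecord` class is hypothetical; it compares account no., then sequence no., then name, and assumes no field is null. Since `TreeSet.add()` leaves an already-present equal element in place, this keeps the first of any duplicates (Greg's question 3), and the result comes back sorted rather than in input order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.SortedSet;
import java.util.TreeSet;

public class TreeSetDedupe {

    // Hypothetical record type; compareTo() covers every field (fields assumed non-null).
    static final class AccountRecord implements Comparable<AccountRecord> {
        final String accountNo;
        final String seqNo;
        final String name;

        AccountRecord(String accountNo, String seqNo, String name) {
            this.accountNo = accountNo;
            this.seqNo = seqNo;
            this.name = name;
        }

        // A result of 0 is what TreeSet treats as a duplicate.
        @Override
        public int compareTo(AccountRecord o) {
            int c = accountNo.compareTo(o.accountNo);
            if (c != 0) return c;
            c = seqNo.compareTo(o.seqNo);
            if (c != 0) return c;
            return name.compareTo(o.name);
        }

        // Kept consistent with compareTo(), as the Comparable documentation recommends.
        @Override
        public boolean equals(Object o) {
            return o instanceof AccountRecord && compareTo((AccountRecord) o) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(accountNo, seqNo, name);
        }

        @Override
        public String toString() {
            return accountNo + " " + seqNo + " " + name;
        }
    }

    static SortedSet<AccountRecord> dedupe(List<AccountRecord> records) {
        SortedSet<AccountRecord> unique = new TreeSet<>();
        for (AccountRecord r : records) {
            // add() returns false and keeps the existing element if an equal one is present.
            unique.add(r);
        }
        return unique;
    }

    public static void main(String[] args) {
        List<AccountRecord> input = Arrays.asList(
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123456", "001", "abc"),
                new AccountRecord("123234", "001", "xyz"),
                new AccountRecord("123234", "002", "abd1"),
                new AccountRecord("123421", "002", "ijk"));
        for (AccountRecord r : dedupe(input)) {
            System.out.println(r);
        }
    }
}
```

A `Comparator` passed to the `TreeSet` constructor would serve equally well if the record class cannot be changed to implement `Comparable`.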
Greg
9/18/2007 10:50:41 PM
Greg R. Broderick wrote:
> 3. if you have two duplicate records in the sequence of records, would you
> rather end up (after you do your processing to remove duplicates) wiht the
> first record, or would you rather end up with the last record of the
> duplicates?

A meaningless distinction in many data systems, such as SQL-based ones.

For example, SQL queries make no promises about the order of records absent an ORDER BY clause, and even then, none about the ordering of equal values.

-- 
Lew
Lew
9/18/2007 11:10:11 PM
9 Replies