The javadoc says the formula to calcuate the hashCode for String is: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]this can result in negative numbers which is not desirable for me.>From the formula this is not clear why as always positive numbershould be generated.Any idea folks ?Second question:I am counting on hashCode to help me with generating a unique Integerkey using two concatenated strings the combination of which isgauranteed to be unique. .e.g.("LASTNAME"+"."+"EMPLOYEE-CATEGORY") will always be unique.so("LASTNAME"+"."+"EMPLOYEE-CATEGORY") .hashCode() is what I wanted touse.will above hashCode() always be unique given that the stringcombination used to generate it is unique ?help???k.
|
|
0
|
|
|
|
Reply
|
jlukar (5)
|
3/8/2007 12:31:37 AM |
|
jlukar@gmail.com wrote:> The javadoc says the formula to calcuate the hashCode for String is:> > s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]> > this can result in negative numbers which is not desirable for me.>>From the formula this is not clear why as always positive number> should be generated.> Any idea folks ?Java int arithmetic is 2's complement signed, and can wrap around frompositive to negative. For example, Integer.MAX_VALUE+1 is negative.> Second question:> > I am counting on hashCode to help me with generating a unique Integer> key using two concatenated strings the combination of which is> gauranteed to be unique. .....String hash codes are not unique. If a type has more than 2**32different values, its hash code cannot possibly be unique.A hash that is unique for all values in its domain is called a "perfecthash". There are algorithms that will generate a perfect hash for asmall fixed set of strings. Would that help? If not, you need to resortto the conventional solution - use hashing to reduce the number ofpotential collisions, but then search for the exact one you want.Presumably, you are doing this to solve a higher level problem. If youwere to describe that problem, someone may suggest a better, moreJava-ish, solution.Patricia
|
|
0
|
|
|
|
Reply
|
Patricia
|
3/8/2007 1:06:18 AM
|
|
jlukar@gmail.com wrote:
>> The javadoc says the formula to calcuate the hashCode for String is:
>>
>> s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
(N.b., the symbol ^ in this context refers to exponentiation, not XOR.)
>> this can result in negative numbers which is not desirable for me.
Why are negative numbers undesirable?
Incidentally, this hash code is not arbitrary. It is difficult to make a
well-designed hash function and this one happens to suit well for 32-bit
twos-complement arithmetic.
See Knuth for more on this.
<http://www-cs-faculty.stanford.edu/~knuth/>
<http://en.wikipedia.org/wiki/Donald_Knuth>
<http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>
I believe it's /Volume 2 - Seminumerical Algorithms/ that covers this topic.
Beware: changing hashCode() for a class mandates that you also override equals().
-- Lew
|
|
0
|
|
|
|
Reply
|
Lew
|
3/8/2007 2:38:54 AM
|
|
Lew <lew@nospam.lewscanon.com> wrote:> Beware: changing hashCode() for a class mandates that you also override equals().> The necessity is the other way around. You may choose any hashCode implementation you like, so long as it is consistent with equals. Any hash code implementation that is repeatable is consistent with Object's implementation of equals, so hashCode can literally do practically anything.It's when you override equals that maintaining that consistency (namely, any two objects that are equal also have the same hash code) requires some effort.-- Chris Smith
|
|
0
|
|
|
|
Reply
|
Chris
|
3/8/2007 5:09:52 AM
|
|
jlukar@gmail.com <jlukar@gmail.com> wrote:>The javadoc says the formula to calcuate the hashCode for String is:> s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]>this can result in negative numbers which is not desirable for me.Don't use hashCode for something it's not suitable for. If negative numbersare undesirable, you're using it wrong, and probably want a different method.>>From the formula this is not clear why as always positive number>>should be generated.Think overflow.>Second question:>I am counting on hashCode to help me with generating a unique Integer>key using two concatenated strings the combination of which is>gauranteed to be unique. .Then you're using it wrong again. hashCode() has no guarantee of uniqueness,and for many data types CANNOT guarantee uniqueness.>("LASTNAME"+"."+"EMPLOYEE-CATEGORY") will always be unique.So equals() should be based on that. great.>("LASTNAME"+"."+"EMPLOYEE-CATEGORY") .hashCode() is what I wanted to>use.Not unique. There are only 2^32 possible hashCode() return values, and thereare FAR more possible strings than that, so duplicate hash for differentvalues are alwas an issue. This is called "hash collision".>will above hashCode() always be unique given that the string>combination used to generate it is unique ?No. Not even close. There's no possible hashing function that willguarantee uniqueness. ALWAYS account for collisions. For some purposes, you can get away with using a bigger hash and deciding thatyou'll live with the astronomically low chance of collision. You could usea 128-bit hash (say, MD5), which makes accidental collisions very veryunlikely, but you can't call it hashCode(), because it's not an int. --Mark Rafn dagon@dagon.net <http://www.dagon.net/>
|
|
0
|
|
|
|
Reply
|
dagon
|
3/8/2007 6:31:48 PM
|
|
Mark Rafn wrote:....> No. Not even close. There's no possible hashing function that will> guarantee uniqueness. ALWAYS account for collisions. I think you are overstating the situation. Given a limited set ofpossible key strings, it is possible to construct a perfect hash thatwill never map two strings in the set to the same code.Of course, the Java String hash is not, and cannot be, a perfect hashbecause there are more String values than there are hash codes. On theother hand, the hashCode function for Integer is perfect.Patricia
|
|
0
|
|
|
|
Reply
|
Patricia
|
3/8/2007 9:06:01 PM
|
|
>Mark Rafn wrote:>...>> No. Not even close. There's no possible hashing function that will>> guarantee uniqueness. ALWAYS account for collisions. Patricia Shanahan <pats@acm.org> wrote:>I think you are overstating the situation. Given a limited set of>possible key strings, it is possible to construct a perfect hash that>will never map two strings in the set to the same code.As long as there is an enumerated list of less than 2^32 of them, and you'rewilling to do the work, sure. But the OP had something called "lastname" inhis example, of which there are an unlimited number of possibilities.--Mark Rafn dagon@dagon.net <http://www.dagon.net/>
|
|
0
|
|
|
|
Reply
|
dagon
|
3/8/2007 9:50:04 PM
|
|
On Mar 7, 8:06 pm, Patricia Shanahan <p...@acm.org> wrote:> jlu...@gmail.com wrote:> > The javadoc says the formula to calcuate the hashCode for String is:>> > s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]>> > this can result in negative numbers which is not desirable for me.> >>From the formula this is not clear why as always positive number> > should be generated.> > Any idea folks ?>> Java int arithmetic is 2's complement signed, and can wrap around from> positive to negative. For example, Integer.MAX_VALUE+1 is negative.thank you for that. it makes sense now.>> > Second question:>> > I am counting on hashCode to help me with generating a unique Integer> > key using two concatenated strings the combination of which is> > gauranteed to be unique. .>> ...>> String hash codes are not unique. If a type has more than 2**32> different values, its hash code cannot possibly be unique.>> A hash that is unique for all values in its domain is called a "perfect> hash". There are algorithms that will generate a perfect hash for a> small fixed set of strings. Would that help? If not, you need to resortIf by "small fixed set" you mean something like only "A","F","d" willappear then that wont work.I mean that the lengght of string is fixed an no more than 11 charsoverall. You see I know that LASTNAME can not be more than 5characters. I also know that EMPLOYEE-CATEGORY can not be bigger than5 characters. So the combination is LASTNAME+.+EMPLOYEE-CATEGORY= 11 characters total.I used LASTNAME for clarity sake. The field is actually something elseand can't be more than 5 chars. Same with EMPLOYEE-CATEGORY.> to the conventional solution - use hashing to reduce the number of> potential collisions, but then search for the exact one you want.>> Presumably, you are doing this to solve a higher level problem. If you> were to describe that problem, someone may suggest a better, more> Java-ish, solution.The problem is that I want to generate uniqueue integer keys for thecombonation of "String1"+.+"String2" so that I can use it as a key ina HashMap type of cache.I wanted to do it this way so that I can alway derive the key from"String1" and "String2" that exist in my domain object and thereforedo a cache lookup based on a key derived from "String1" and "String2".thanks much for your comments already.>> Patricia
|
|
0
|
|
|
|
Reply
|
jlukar
|
3/8/2007 10:40:54 PM
|
|
On Mar 7, 9:38 pm, Lew <l...@nospam.lewscanon.com> wrote:> jlu...@gmail.com wrote:> >> The javadoc says the formula to calcuate the hashCode for String is:>> >> s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]>> (N.b., the symbol ^ in this context refers to exponentiation, not XOR.)>> >> this can result in negative numbers which is not desirable for me.>> Why are negative numbers undesirable?Hi.I guess to think about it within the context of how I am using it anegative number is ok. I am using the generated number as a key intothe HashMap for the object. But the comments from all implies thatthis is futile because the ("String1"+"String2").hashCode() will notguarntee a unique key anyways.>> Incidentally, this hash code is not arbitrary. It is difficult to make a> well-designed hash function and this one happens to suit well for 32-bit> twos-complement arithmetic.>> See Knuth for more on this.>> <http://www-cs-faculty.stanford.edu/~knuth/>> <http://en.wikipedia.org/wiki/Donald_Knuth>> <http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>thank you for the links. I was hoping for a JAVA API for this withoutthe use of a algorithm.>> I believe it's /Volume 2 - Seminumerical Algorithms/ that covers this topic.>> Beware: changing hashCode() for a class mandates that you also override equals().>> -- Lew
|
|
0
|
|
|
|
Reply
|
jlukar
|
3/8/2007 10:47:36 PM
|
|
jlukar@gmail.com wrote:....> The problem is that I want to generate uniqueue integer keys for the> combonation of "String1"+.+"String2" so that I can use it as a key in> a HashMap type of cache.Why not use a HashMap with the concatenated string as key? HashMap knowsall about potential collisions, and deals with them correctly.Patricia
|
|
0
|
|
|
|
Reply
|
Patricia
|
3/8/2007 10:47:44 PM
|
|
On Mar 8, 5:47 pm, Patricia Shanahan <p...@acm.org> wrote:> jlu...@gmail.com wrote:>> ...>> > The problem is that I want to generate uniqueue integer keys for the> > combonation of "String1"+.+"String2" so that I can use it as a key in> > a HashMap type of cache.>> Why not use a HashMap with the concatenated string as key? HashMap knows> all about potential collisions, and deals with them correctly.>> PatriciaYou mean instead of using the int I simply use the"String1"."String2" for the key ?The only reason I was trying to get a unique integer out of it wasbecause the domain Object I was using (lets say Employee) has aemployee ID as a key. Some employees from one building (building A)all have employee ID's. Employees from another building (building B)do not have employee ID's yet the legacy system I am working on usesemployee ID as the key into a HashMap.I was trying to use the same interface to generate integer unique ID'sfor the employess from building B.thanks
|
|
0
|
|
|
|
Reply
|
jlukar
|
3/8/2007 10:56:25 PM
|
|
On Mar 8, 4:50 pm, d...@dagon.net (Mark Rafn) wrote:> >Mark Rafn wrote:> >...> >> No. Not even close. There's no possible hashing function that will> >> guarantee uniqueness. ALWAYS account for collisions.>> Patricia Shanahan <p...@acm.org> wrote:>> >I think you are overstating the situation. Given a limited set of> >possible key strings, it is possible to construct a perfect hash that> >will never map two strings in the set to the same code.>> As long as there is an enumerated list of less than 2^32 of them, and you're> willing to do the work, sure. But the OP had something called "lastname" in> his example, of which there are an unlimited number of possibilities.Hi. Let me clarify. The Lastname is used to oversimplify my example.In actuality the "lastname" is a business domain field that representsa string of 5 characters. So it could be "ABCDE" or anycombination of. That combined with "EMPLOYEE-CATEGORY" which is also5 characters gives a combined number of 10 characters. So somethinglike this "ABCDE"."XIEPE" is a posiblity. But any permutation ofcharacters could be in those two fields.thanks> --> Mark Rafn d...@dagon.net <http://www.dagon.net/>
|
|
0
|
|
|
|
Reply
|
jlukar
|
3/8/2007 11:00:38 PM
|
|
jlukar@gmail.com wrote:> On Mar 7, 8:06 pm, Patricia Shanahan <p...@acm.org> wrote:>> jlu...@gmail.com wrote:>>> The javadoc says the formula to calcuate the hashCode for String is:>>> s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]>>> this can result in negative numbers which is not desirable for me.>>> >From the formula this is not clear why as always positive number>>> should be generated.>>> Any idea folks ?>> Java int arithmetic is 2's complement signed, and can wrap around from>> positive to negative. For example, Integer.MAX_VALUE+1 is negative.> > thank you for that. it makes sense now.> >>> Second question:>>> I am counting on hashCode to help me with generating a unique Integer>>> key using two concatenated strings the combination of which is>>> gauranteed to be unique. .>> ...>>>> String hash codes are not unique. If a type has more than 2**32>> different values, its hash code cannot possibly be unique.>>>> A hash that is unique for all values in its domain is called a "perfect>> hash". There are algorithms that will generate a perfect hash for a>> small fixed set of strings. Would that help? If not, you need to resort> > If by "small fixed set" you mean something like only "A","F","d" will> appear then that wont work.> > I mean that the lengght of string is fixed an no more than 11 chars> overall. You see I know that LASTNAME can not be more than 5> characters. I also know that EMPLOYEE-CATEGORY can not be bigger than> 5 characters. So the combination is LASTNAME+.+EMPLOYEE-CATEGORY> = 11 characters total.Can't be unique. Each character has 2^16 possible values and, not including the period, there are 10 values, so there are 2^16^10 = 2^160 ~ 10^48 different possible combinations. Of course, I assume that you are using only ~128 = 2^7 values of the characters (7-bit ASCII), but even so, a hash would produce 2^7^10 = 2^70 ~ 10^21 combinations.> > I used LASTNAME for clarity sake. The field is actually something else> and can't be more than 5 chars. Same with EMPLOYEE-CATEGORY.> > >> to the conventional solution - use hashing to reduce the number of>> potential collisions, but then search for the exact one you want.>>>> Presumably, you are doing this to solve a higher level problem. If you>> were to describe that problem, someone may suggest a better, more>> Java-ish, solution.> > The problem is that I want to generate uniqueue integer keys for the> combonation of "String1"+.+"String2" so that I can use it as a key in> a HashMap type of cache.> > I wanted to do it this way so that I can alway derive the key from> "String1" and "String2" that exist in my domain object and therefore> do a cache lookup based on a key derived from "String1" and "String2".Most HashMap types of cache implement bucketed algorithms or such.> > thanks much for your comments already.> > >> Patricia> >
|
|
0
|
|
|
|
Reply
|
Joshua
|
3/8/2007 11:06:47 PM
|
|
jlukar@gmail.com wrote:> On Mar 8, 4:50 pm, d...@dagon.net (Mark Rafn) wrote:>>> Mark Rafn wrote:>>> ...>>>> No. Not even close. There's no possible hashing function that will>>>> guarantee uniqueness. ALWAYS account for collisions.>> Patricia Shanahan <p...@acm.org> wrote:>>>>> I think you are overstating the situation. Given a limited set of>>> possible key strings, it is possible to construct a perfect hash that>>> will never map two strings in the set to the same code.>> As long as there is an enumerated list of less than 2^32 of them, and you're>> willing to do the work, sure. But the OP had something called "lastname" in>> his example, of which there are an unlimited number of possibilities.> > Hi. Let me clarify. The Lastname is used to oversimplify my example.> In actuality the "lastname" is a business domain field that represents> a string of 5 characters. So it could be "ABCDE" or any> combination of. That combined with "EMPLOYEE-CATEGORY" which is also> 5 characters gives a combined number of 10 characters. So something> like this "ABCDE"."XIEPE" is a posiblity. But any permutation of> characters could be in those two fields.> > thanks> > > >> -->> Mark Rafn d...@dagon.net <http://www.dagon.net/>> > Just saw this. There are 5^5*26^5 (I assume) = (130)^5, 5 * lg (130) ~ 35.1, so there are too many to make into a unique int. That said, if you only had 10 characters with 5 possible entries each, a potential would be to treat it as a 10-digit base 5 number: ((a[0]*5+a[1])*5+a[2])*5+...
|
|
0
|
|
|
|
Reply
|
Joshua
|
3/8/2007 11:20:11 PM
|
|
On Mar 8, 6:20 pm, Joshua Cranmer <Pidgeo...@epenguin.zzn.com> wrote:> jlu...@gmail.com wrote:> > On Mar 8, 4:50 pm, d...@dagon.net (Mark Rafn) wrote:> >>> Mark Rafn wrote:> >>> ...> >>>> No. Not even close. There's no possible hashing function that will> >>>> guarantee uniqueness. ALWAYS account for collisions.> >> Patricia Shanahan <p...@acm.org> wrote:>> >>> I think you are overstating the situation. Given a limited set of> >>> possible key strings, it is possible to construct a perfect hash that> >>> will never map two strings in the set to the same code.> >> As long as there is an enumerated list of less than 2^32 of them, and you're> >> willing to do the work, sure. But the OP had something called "lastname" in> >> his example, of which there are an unlimited number of possibilities.>> > Hi. Let me clarify. The Lastname is used to oversimplify my example.> > In actuality the "lastname" is a business domain field that represents> > a string of 5 characters. So it could be "ABCDE" or any> > combination of. That combined with "EMPLOYEE-CATEGORY" which is also> > 5 characters gives a combined number of 10 characters. So something> > like this "ABCDE"."XIEPE" is a posiblity. But any permutation of> > characters could be in those two fields.>> > thanks>> >> --> >> Mark Rafn d...@dagon.net <http://www.dagon.net/>>> Just saw this. There are 5^5*26^5 (I assume) = (130)^5, 5 * lg (130)> ~ 35.1, so there are too many to make into a unique int. That said, if> you only had 10 characters with 5 possible entries each, a potential> would be to treat it as a 10-digit base 5 number:> ((a[0]*5+a[1])*5+a[2])*5+...Thats more than I'd like to chew on. I really needed a simple uniqueinteger key derived from two strings concat of which is guaruanteed tobe unique within my business application.looks like it might not be possible in my particular situation using asimple Java API. I might have to write my own key generator derivedfrom the string concat.anything in apache commons library that might help me there if anyoneknows about.thanks much for all input.
|
|
0
|
|
|
|
Reply
|
jlukar
|
3/9/2007 12:31:26 AM
|
|
jlukar@gmail.com <jlukar@gmail.com> wrote:>Thats more than I'd like to chew on. I really needed a simple unique>integer key derived from two strings concat of which is guaruanteed to>be unique within my business application.One possible way is to use a database. Store the string with asequence-defined identifier, and use that identifier as your ID, guaranteedunique. Or just use the string as the key.You haven't really explained why you need it to be unique, though. Java'sHashMap implementation (and every other implementation of hash I've seen)handles collisions just fine, and you can too. --Mark Rafn dagon@dagon.net <http://www.dagon.net/>
|
|
0
|
|
|
|
Reply
|
dagon
|
3/9/2007 1:12:01 AM
|
|
On Mar 8, 4:31 pm, "jlu...@gmail.com" <jlu...@gmail.com> wrote:
> On Mar 8, 6:20 pm, Joshua Cranmer <Pidgeo...@epenguin.zzn.com> wrote:
>
>
>
> > jlu...@gmail.com wrote:
> > > On Mar 8, 4:50 pm, d...@dagon.net (Mark Rafn) wrote:
> > >>> Mark Rafn wrote:
> > >>> ...
> > >>>> No. Not even close. There's no possible hashing function that will
> > >>>> guarantee uniqueness. ALWAYS account for collisions.
> > >> Patricia Shanahan <p...@acm.org> wrote:
>
> > >>> I think you are overstating the situation. Given a limited set of
> > >>> possible key strings, it is possible to construct a perfect hash that
> > >>> will never map two strings in the set to the same code.
> > >> As long as there is an enumerated list of less than 2^32 of them, and you're
> > >> willing to do the work, sure. But the OP had something called "lastname" in
> > >> his example, of which there are an unlimited number of possibilities.
>
> > > Hi. Let me clarify. The Lastname is used to oversimplify my example.
> > > In actuality the "lastname" is a business domain field that represents
> > > a string of 5 characters. So it could be "ABCDE" or any
> > > combination of. That combined with "EMPLOYEE-CATEGORY" which is also
> > > 5 characters gives a combined number of 10 characters. So something
> > > like this "ABCDE"."XIEPE" is a posiblity. But any permutation of
> > > characters could be in those two fields.
>
> > > thanks
>
> > >> --
> > >> Mark Rafn d...@dagon.net <http://www.dagon.net/>
>
> > Just saw this. There are 5^5*26^5 (I assume) = (130)^5, 5 * lg (130)
> > ~ 35.1, so there are too many to make into a unique int. That said, if
> > you only had 10 characters with 5 possible entries each, a potential
> > would be to treat it as a 10-digit base 5 number:
> > ((a[0]*5+a[1])*5+a[2])*5+...
>
> Thats more than I'd like to chew on. I really needed a simple unique
> integer key derived from two strings concat of which is guaruanteed to
> be unique within my business application.
>
> looks like it might not be possible in my particular situation using a
> simple Java API. I might have to write my own key generator derived
> from the string concat.
>
> anything in apache commons library that might help me there if anyone
> knows about.
>
> thanks much for all input.
It is very unlikely that you will find a hash code that is unique. As
a matter of fact, you may have misunderstood the use of hashes.
A hash code helps you decide which "bucket" to look in for the key
that you want. If you have 5 buckets and 30 unique keys, you aren't
going to have 30 unique hash codes, but 5. This lets you reduce your
average look up by a factor of 5. You only have to search (on
average) a bucket with 6 items in it.
So, to reiterate. A hash isn't the same as a key. The hash is a
property of the key, and is defined so that if keyA is equal to keyB,
then keyA's hash is equal to keyB's hash. The converse isn't
necessarily true.
This lets you know that if hashA != hashB, then A != B. if hashA ==
hashB, A might = B.
Having said all that...
If you restrict the range of values an object can take, then you
*might* be able to construct a hash function that results in unique
hashs for unique objects.
If your business field is exactly 5 uppercase letters, then that gives
you a possibility of 26^5 combinations, which fits within a 24 bit
number. You can get your hash value by treating the string of 5
uppercase letters as a base 26 number, A = 0, B = 1, ..., Z = 25.
The string AZWAA would be the number 0 * 26^0 + 25 * 26^1 + 22 * 26^2
+ 0 * 26^3 + 0 * 26^4
Notice that each term is a factor of a power of 26.
Hope this helps.
|
|
0
|
|
|
|
Reply
|
Daniel
|
3/9/2007 1:12:28 AM
|
|
Patricia Shanahan wrote:>> Why not use a HashMap with the concatenated string as key? HashMap knows>> all about potential collisions, and deals with them correctly.jlukar@gmail.com wrote:> You mean instead of using the int I simply use the> "String1"."String2" for the key ?Yes.-- Lew
|
|
0
|
|
|
|
Reply
|
Lew
|
3/9/2007 2:29:45 AM
|
|
jlukar@gmail.com <jlukar@gmail.com> wrote:>> Thats more than I'd like to chew on. I really needed a simple unique>> integer key derived from two strings concat of which is guaruanteed to>> be unique within my business application.> Mark Rafn wrote:> One possible way is to use a database. Store the string with a> sequence-defined identifier, and use that identifier as your ID, guaranteed> unique. Or just use the string as the key.> > You haven't really explained why you need it to be unique, though. Java's> HashMap implementation (and every other implementation of hash I've seen)> handles collisions just fine, and you can too. Really he should just use the concatenated String as the key and forget all about using an int.-- Lew
|
|
0
|
|
|
|
Reply
|
Lew
|
3/9/2007 2:33:06 AM
|
|
jlukar@gmail.com wrote:> [concerning non-uniqueness of hash codes]> I mean that the lengght of string is fixed an no more than 11 chars> overall. You see I know that LASTNAME can not be more than 5> characters. I also know that EMPLOYEE-CATEGORY can not be bigger than> 5 characters. So the combination is LASTNAME+.+EMPLOYEE-CATEGORY> = 11 characters total. Let's do some arithmetic. There is exactly one String of length zero. There are 65536 different Strings of length one. There are 65536^2 = 4,294,967,296 Strings of length two (using ^ to indicate exponentiation). ... There are 65536^11 Strings of length eleven. All told, there are (65536^12 - 1) / 65535 Strings oflength eleven or less. That's just a little under 1e53 (the11-character Strings make up the vast majority). Meanwhile, there are 2^32 = 4,294,967,296 distinct hashCode()values, because that's how many different `int' values exist. 4,294,967,296 may seem like a lot of hashCode() values, butnext to 95,782,432,828,056,470,036,404,813,049,000,000,000,000,-000,000,000,000+ it is as nothing. If you try to marry theStrings and the hashCode()s, you will commit bigamy on a trulygrand scale. And that is why your hope for a universal and unique hashcode for Strings -- even for short Strings -- is in vain.-- Eric Sosmanesosman@acm-dot-org.invalid
|
|
0
|
|
|
|
Reply
|
Eric
|
3/9/2007 3:37:23 AM
|
|
On Mar 8, 8:12 pm, "Daniel Pitts" <googlegrou...@coloraura.com> wrote:> On Mar 8, 4:31 pm, "jlu...@gmail.com" <jlu...@gmail.com> wrote:>>>>> It is very unlikely that you will find a hash code that is unique. As> a matter of fact, you may have misunderstood the use of hashes.Yes I believe so. I thought it can be used in place of a key as itimplied in my mind a unique key.> This lets you know that if hashA != hashB, then A != B. if hashA ==> hashB, A might = B.aha!>>> If your business field is exactly 5 uppercase letters, then that gives> you a possibility of 26^5 combinations, which fits within a 24 bit> number. You can get your hash value by treating the string of 5> uppercase letters as a base 26 number, A = 0, B = 1, ..., Z = 25.> The string AZWAA would be the number 0 * 26^0 + 25 * 26^1 + 22 * 26^2> + 0 * 26^3 + 0 * 26^4> Notice that each term is a factor of a power of 26.business field will be a 11 chars combined. The calculation anotherposter did, makes this impossible for me.>> Hope this helps.very much so. thank you.
|
|
0
|
|
|
|
Reply
|
jlukar
|
3/9/2007 7:19:36 PM
|
|
On Mar 8, 8:12 pm, d...@dagon.net (Mark Rafn) wrote:> jlu...@gmail.com <jlu...@gmail.com> wrote:> >Thats more than I'd like to chew on. I really needed a simple unique> >integer key derived from two strings concat of which is guaruanteed to> >be unique within my business application.>> One possible way is to use a database. Store the string with a> sequence-defined identifier, and use that identifier as your ID, guaranteed> unique. Or just use the string as the key.can't do DB. It is a real-time high throughput system which means noDB in between a transaction.>> You haven't really explained why you need it to be unique, though. Java's> HashMap implementation (and every other implementation of hash I've seen)> handles collisions just fine, and you can too.I want to be able to retrieve a HashMap value using a key. So I don'tunderstand your comment about HashMap handling collisions fine. (whichcouple of other posters referred to as well).e.g.if I have "xyz123" and "abc345" both happen to get the same keyvalue of 124444 and I search the HashMap for key 124444, I will gettwo objects returned. That would not be desirable.Do you mean I should then do another check and see that I should pick"xyz123" or "abc345" ??thanks much.> --> Mark Rafn d...@dagon.net <http://www.dagon.net/>
|
|
0
|
|
|
|
Reply
|
jlukar
|
3/9/2007 7:25:51 PM
|
|
that was funny. I also now know what bigamy is which I had to lookup.Thanks very much for this detail explanation. It is very clear why mystring hashCode can not possibly result in a unique key fitting intoan integer.I have since settled for using the concatentated string object as asecondary key, still utilizing my original domain business objectinterface but just filling bogus int. for the records primary key.thanks.k.On Mar 8, 10:37 pm, Eric Sosman <esos...@acm-dot-org.invalid> wrote:> jlu...@gmail.com wrote:> > [concerning non-uniqueness of hash codes]> > I mean that the lengght of string is fixed an no more than 11 chars> > overall. You see I know that LASTNAME can not be more than 5> > characters. I also know that EMPLOYEE-CATEGORY can not be bigger than> > 5 characters. So the combination is LASTNAME+.+EMPLOYEE-CATEGORY> > = 11 characters total.>> Let's do some arithmetic.>> There is exactly one String of length zero.>> There are 65536 different Strings of length one.>> There are 65536^2 = 4,294,967,296 Strings of length two> (using ^ to indicate exponentiation).>> ...>> There are 65536^11 Strings of length eleven.>> All told, there are (65536^12 - 1) / 65535 Strings of> length eleven or less. That's just a little under 1e53 (the> 11-character Strings make up the vast majority).>> Meanwhile, there are 2^32 = 4,294,967,296 distinct hashCode()> values, because that's how many different `int' values exist.>> 4,294,967,296 may seem like a lot of hashCode() values, but> next to 95,782,432,828,056,470,036,404,813,049,000,000,000,000,-> 000,000,000,000+ it is as nothing. If you try to marry the> Strings and the hashCode()s, you will commit bigamy on a truly> grand scale.>> And that is why your hope for a universal and unique hash> code for Strings -- even for short Strings -- is in vain.>> --> Eric Sosman> esos...@acm-dot-org.invalid
|
|
0
|
|
|
|
Reply
|
jlukar
|
3/9/2007 7:30:43 PM
|
|
jlukar@gmail.com wrote:> On Mar 8, 8:12 pm, d...@dagon.net (Mark Rafn) wrote:....>>You haven't really explained why you need it to be unique, though. Java's>>HashMap implementation (and every other implementation of hash I've seen)>>handles collisions just fine, and you can too.> > > I want to be able to retrieve a HashMap value using a key. So I don't> understand your comment about HashMap handling collisions fine. (which> couple of other posters referred to as well).> > e.g.> > if I have "xyz123" and "abc345" both happen to get the same key> value of 124444 and I search the HashMap for key 124444, I will get> two objects returned. That would not be desirable.> > Do you mean I should then do another check and see that I should pick> "xyz123" or "abc345" ??No, we mean that the HashMap keys should be the strings "xyz123" and"abc345". It is possible, but unlikely, that they have the same hashcode. HashMap knows that, and deals with it.If you do:myMap.put("xyz123",emp1);myMap.put("abc345",emp2);where emp1 and emp2 are different Employee references, thenmyMap.get("xyz123") will definitely return emp1, not emp2, regardless ofhashCode collisions.Patricia
|
|
0
|
|
|
|
Reply
|
Patricia
|
3/9/2007 8:10:16 PM
|
|
On Mar 9, 1:25 pm, "jlu...@gmail.com" <jlu...@gmail.com> wrote:> I want to be able to retrieve a HashMap value using a key. So I don't> understand your comment about HashMap handling collisions fine. (which> couple of other posters referred to as well).>> e.g.>> if I have "xyz123" and "abc345" both happen to get the same key> value of 124444 and I search the HashMap for key 124444, I will get> two objects returned. That would not be desirable.>> Do you mean I should then do another check and see that I should pick> "xyz123" or "abc345" ??>> thanks much.A HashMap is a reliable mapping between a set of keys and a collectionof values, and any object in java can be used as a key or stored as avalue.If you do myMap.put("xyz123", objectA) and myMap.put("abc345",objectB), then myMap.get("xyz123") will always return objectA andmyMap.get("abc345") will always return objectB (provided that youdon't overwrite either entry by using the same key with a differentobject). It doesn't matter what the hashCode result is for eitherstring, HashMap knows how to keep your objects distinct even if theirkeys have the same hashCode as long as the keys themselves are notequal.If all of the objects you are storing are represented uniquely by the11 character string you have described, then you should be able to usethat 11 character string as the direct key into the HashMap withoutworrying about the details of the implementation.
|
|
0
|
|
|
|
Reply
|
djthomp
|
3/9/2007 8:25:43 PM
|
|
djthomp wrote:....> A HashMap is a reliable mapping between a set of keys and a collection> of values, and any object in java can be used as a key or stored as a> value.The only limitation on this is that the key object's classes MUST haveequals and hashCode implemented in accordance with the contractdescribed in the Object API documentation.Of course, String does have correct equals and hashCode implementations.Patricia
|
|
0
|
|
|
|
Reply
|
Patricia
|
3/9/2007 8:56:33 PM
|
|
Patricia Shanahan wrote:> djthomp wrote:> ...>> A HashMap is a reliable mapping between a set of keys and a collection>> of values, and any object in java can be used as a key or stored as a>> value.> > The only limitation on this is that the key object's classes MUST have> equals and hashCode implemented in accordance with the contract> described in the Object API documentation.Point of pedantry: And it mustn't do anything silly with recursion. For instance, using a HashMap as a key for a HashMap it uses as a key (if you follow). It's also possible to break deserialisation dependent upon the order (and nesting) of the object.> Of course, String does have correct equals and hashCode implementations.More pedantry: Way back in JLS1 String was defined to have an unimplementable hashCode.Tom Hawtin
|
|
0
|
|
|
|
Reply
|
Tom
|
3/9/2007 10:30:46 PM
|
|
>> One possible way is to use a database. Store the string with a
>> sequence-defined identifier, and use that identifier as your ID, guaranteed
>> unique. Or just use the string as the key.
jlukar@gmail.com <jlukar@gmail.com> wrote:
>can't do DB. It is a real-time high throughput system which means no
>DB in between a transaction.
Using a sequence in a DB doesn't mean you need to query it every time. You
can cache dozens or hundreds of them, use them as needed, and throw away the
ones you don't use.
If you need a truly unique number (say, for a primary key), this is really
your best bet.
>> You haven't really explained why you need it to be unique, though. Java's
>> HashMap implementation (and every other implementation of hash I've seen)
>> handles collisions just fine, and you can too.
>
>I want to be able to retrieve a HashMap value using a key. So I don't
>understand your comment about HashMap handling collisions fine. (which
>couple of other posters referred to as well).
If you're using a java.util.HashMap, why were you worried about generating
integers, and why did you care if they were negative? It's keyed on
Object, so your string ID is fine.
>if I have "xyz123" and "abc345" both happen to get the same key
>value of 124444 and I search the HashMap for key 124444, I will get
>two objects returned. That would not be desirable.
Right, so search on your ACTUAL key - either a Key object you define, or a
string concatenation with some separator that's not part of the strings. Let
the Map implementation deal with how the hashing works.
>Do you mean I should then do another check and see that I should pick
>"xyz123" or "abc345" ??
No. If "xyz123abc345" is your business key, use it as your Map key. There's
no reason to mess with picking an integer for a key.
--
Mark Rafn dagon@dagon.net <http://www.dagon.net/>
|
|
0
|
|
|
|
Reply
|
dagon
|
3/9/2007 10:59:18 PM
|
|
jlukar@gmail.com wrote:> that was funny. I also now know what bigamy is which I had to lookup. Obligatory warning: "Kids: Don't try this at home!"-- Eric Sosmanesosman@acm-dot-org.invalid
|
|
0
|
|
|
|
Reply
|
Eric
|
3/10/2007 2:31:06 AM
|
|
Tom Hawtin wrote:> More pedantry: Way back in JLS1 String was defined to have an> unimplementable hashCode.While that is primarily of academic interest, I don't think thatmentioning it is /pedantry/, as such, is it ?;-) -- chris
|
|
0
|
|
|
|
Reply
|
Chris
|
3/10/2007 4:46:51 PM
|
|
On 7 Mar 2007 16:31:37 -0800, "jlukar@gmail.com" <jlukar@gmail.com> wrote:>>The javadoc says the formula to calcuate the hashCode for String is:>> s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]>>this can result in negative numbers which is not desirable for me.>>From the formula this is not clear why as always positive number>should be generated.>Any idea folks ?>>>Second question:>>I am counting on hashCode to help me with generating a unique Integer>key using two concatenated strings the combination of which is>gauranteed to be unique. .>>e.g.>("LASTNAME"+"."+"EMPLOYEE-CATEGORY") will always be unique.>>so>>("LASTNAME"+"."+"EMPLOYEE-CATEGORY") .hashCode() is what I wanted to>use.>>will above hashCode() always be unique given that the string>combination used to generate it is unique ?>>help???>k.Why do you believe that ("LASTNAME"+"."+"EMPLOYEE-CATEGORY") will alwaysunique? Ben
|
|
0
|
|
|
|
Reply
|
Ben
|
3/10/2007 11:02:38 PM
|
|
|
30 Replies
249 Views
(page loaded in 0.283 seconds)
|