Do I have to handle hash collisions in a hash_set myself?
I did a test in which I use find() to look for objects in a hash_set.
These objects are definitely not contained, but find() sometimes finds
them anyway. See this code:
<code>
#include <iostream>
#include <stdexcept>
#include <ctime>
#include <set>
// also include header that defines gnu_namespace and includes
hash_set
class MyContainer {
std::vector<int> v;
public:
MyContainer(){}
std::size_t hashcode() const {
std::size_t hash = 0;
for(unsigned i=0; i<v.size(); ++i){ // sdbm function gives
collisions sometimes
hash = v[i] + (hash << 6) + (hash << 16) - hash;
}
return hash;
}
void add(int i){v.push_back(i);}
};
struct eqPtrs {
bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
return h1->hashcode() == h2->hashcode(); // problematic b/c of
collisions?
}
};
namespace gnu_namespace {
template<> struct hash<const MyContainer*> {
inline size_t operator()(const MyContainer* d) const {
return d->hashcode();
}
};
}
int getRand(int min, int max){
return ((rand() % (max-min+1)) + min);
}
int main(int argc, char** argv){
srand(time(0));
typedef hash_set<const MyContainer*, hash<const MyContainer*>,
eqPtrs> MyMap;
MyMap myMap;
int repeat = 100000;
int size = 10;
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(0, 1000));
}
myMap.insert(h);
}
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(2000, 3000));
}
MyMap::const_iterator found = myMap.find(h);
assert(found == myMap.end()); // aborts!
}
// TODO: finally delete elements in myMap
return EXIT_SUCCESS;
}
</code>
The solution seems to be to adapt the equality condition in eqPtrs to
also test for actual equality of the members, not just equality of the
hash codes:
struct eqPtrs {
bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
return h1->hashcode() == h2->hashcode() && haveSameElements(*h1,
*h2); // added condition
}
};
Is that the solution, or am I doing something wrong in general?
General comments on my code are welcome!
Markus
|
|
0
|
|
|
|
Reply
|
markus.dehmann (142)
|
7/11/2008 1:20:27 AM |
|
On Jul 10, 9:20=A0pm, Markus Dehmann <markus.dehm...@gmail.com> wrote:
> Do I have to handle hash collisions in a hash_set myself?
>
> I did a test in which I use find() to look for objects in a hash_set.
> These objects are definitely not contained, but find() sometimes finds
> them anyway. See this code:
>
> <code>
> #include <iostream>
> #include <stdexcept>
> #include <ctime>
> #include <set>
> // also include header that defines gnu_namespace and includes
> hash_set
>
> class MyContainer {
> =A0 std::vector<int> v;
> public:
> =A0 MyContainer(){}
> =A0 std::size_t hashcode() const {
> =A0 =A0 std::size_t hash =3D 0;
> =A0 =A0 for(unsigned i=3D0; i<v.size(); ++i){ // sdbm function gives
> collisions sometimes
> =A0 =A0 =A0 hash =3D v[i] + (hash << 6) + (hash << 16) - hash;
> =A0 =A0 }
> =A0 =A0 return hash;
> =A0 }
> =A0 void add(int i){v.push_back(i);}
>
> };
>
> struct eqPtrs {
> =A0 bool operator()(const MyContainer* h1, const MyContainer* h2) const
> {
> =A0 =A0 return h1->hashcode() =3D=3D h2->hashcode(); // problematic b/c o=
f
> collisions?
> =A0 }
>
> };
>
> namespace gnu_namespace {
> =A0 template<> struct hash<const MyContainer*> {
> =A0 =A0 inline size_t operator()(const MyContainer* d) const {
> =A0 =A0 =A0 return d->hashcode();
> =A0 =A0 }
> =A0 };
>
> }
>
> int getRand(int min, int max){
> =A0 return ((rand() % (max-min+1)) + min);
>
> }
>
> int main(int argc, char** argv){
> =A0 srand(time(0));
> =A0 typedef hash_set<const MyContainer*, hash<const MyContainer*>,
> eqPtrs> MyMap;
> =A0 MyMap myMap;
> =A0 int repeat =3D 100000;
> =A0 int size =3D 10;
> =A0 for(int i=3D0; i<repeat; ++i){
> =A0 =A0 MyContainer* h =3D new MyContainer();
> =A0 =A0 for(int j=3D0; j<size; ++j){
> =A0 =A0 =A0 h->add(getRand(0, 1000));
> =A0 =A0 }
> =A0 =A0 myMap.insert(h);
> =A0 }
> =A0 for(int i=3D0; i<repeat; ++i){
> =A0 =A0 MyContainer* h =3D new MyContainer();
> =A0 =A0 for(int j=3D0; j<size; ++j){
> =A0 =A0 =A0 h->add(getRand(2000, 3000));
> =A0 =A0 }
> =A0 =A0 MyMap::const_iterator found =3D myMap.find(h);
> =A0 =A0 assert(found =3D=3D myMap.end()); // aborts!
> =A0 }
> =A0 // TODO: finally delete elements in myMap
> =A0 return EXIT_SUCCESS;}
>
> </code>
>
> The solution seems to be to adapt the equality condition in eqPtrs to
> also test for actual equality of the members, not just equality of the
> hash codes:
>
> struct eqPtrs {
> =A0 bool operator()(const MyContainer* h1, const MyContainer* h2) const
> {
> =A0 =A0 return h1->hashcode() =3D=3D h2->hashcode() && haveSameElements(*=
h1,
> *h2); // added condition
> =A0 }
>
> };
>
> Is that the solution, or am I doing something wrong in general?
> General comments on my code are welcome!
>
> Markus
You actually have already figured out. eqPrts should return true only
when the two myContainer's have the same elements.
Buy the way, doesn't having the same elements imply having the same
hashcode? Comparing hashcode in eqPrts is redundant.
On a different note, try use Boost.Pointer_container library, it's a
lot safer that what you have posted.
|
|
0
|
|
|
|
Reply
|
huili80 (29)
|
7/11/2008 1:44:21 AM
|
|
On Jul 10, 9:44=A0pm, huil...@gmail.com wrote:
> On Jul 10, 9:20=A0pm, Markus Dehmann <markus.dehm...@gmail.com> wrote:
>
>
>
> > Do I have to handle hash collisions in a hash_set myself?
>
> > I did a test in which I use find() to look for objects in a hash_set.
> > These objects are definitely not contained, but find() sometimes finds
> > them anyway. See this code:
>
> > <code>
> > #include <iostream>
> > #include <stdexcept>
> > #include <ctime>
> > #include <set>
> > // also include header that defines gnu_namespace and includes
> > hash_set
>
> > class MyContainer {
> > =A0 std::vector<int> v;
> > public:
> > =A0 MyContainer(){}
> > =A0 std::size_t hashcode() const {
> > =A0 =A0 std::size_t hash =3D 0;
> > =A0 =A0 for(unsigned i=3D0; i<v.size(); ++i){ // sdbm function gives
> > collisions sometimes
> > =A0 =A0 =A0 hash =3D v[i] + (hash << 6) + (hash << 16) - hash;
> > =A0 =A0 }
> > =A0 =A0 return hash;
> > =A0 }
> > =A0 void add(int i){v.push_back(i);}
>
> > };
>
> > struct eqPtrs {
> > =A0 bool operator()(const MyContainer* h1, const MyContainer* h2) const
> > {
> > =A0 =A0 return h1->hashcode() =3D=3D h2->hashcode(); // problematic b/c=
of
> > collisions?
> > =A0 }
>
> > };
>
> > namespace gnu_namespace {
> > =A0 template<> struct hash<const MyContainer*> {
> > =A0 =A0 inline size_t operator()(const MyContainer* d) const {
> > =A0 =A0 =A0 return d->hashcode();
> > =A0 =A0 }
> > =A0 };
>
> > }
>
> > int getRand(int min, int max){
> > =A0 return ((rand() % (max-min+1)) + min);
>
> > }
>
> > int main(int argc, char** argv){
> > =A0 srand(time(0));
> > =A0 typedef hash_set<const MyContainer*, hash<const MyContainer*>,
> > eqPtrs> MyMap;
> > =A0 MyMap myMap;
> > =A0 int repeat =3D 100000;
> > =A0 int size =3D 10;
> > =A0 for(int i=3D0; i<repeat; ++i){
> > =A0 =A0 MyContainer* h =3D new MyContainer();
> > =A0 =A0 for(int j=3D0; j<size; ++j){
> > =A0 =A0 =A0 h->add(getRand(0, 1000));
> > =A0 =A0 }
> > =A0 =A0 myMap.insert(h);
> > =A0 }
> > =A0 for(int i=3D0; i<repeat; ++i){
> > =A0 =A0 MyContainer* h =3D new MyContainer();
> > =A0 =A0 for(int j=3D0; j<size; ++j){
> > =A0 =A0 =A0 h->add(getRand(2000, 3000));
> > =A0 =A0 }
> > =A0 =A0 MyMap::const_iterator found =3D myMap.find(h);
> > =A0 =A0 assert(found =3D=3D myMap.end()); // aborts!
> > =A0 }
> > =A0 // TODO: finally delete elements in myMap
> > =A0 return EXIT_SUCCESS;}
>
> > </code>
>
> > The solution seems to be to adapt the equality condition in eqPtrs to
> > also test for actual equality of the members, not just equality of the
> > hash codes:
>
> > struct eqPtrs {
> > =A0 bool operator()(const MyContainer* h1, const MyContainer* h2) const
> > {
> > =A0 =A0 return h1->hashcode() =3D=3D h2->hashcode() && haveSameElements=
(*h1,
> > *h2); // added condition
> > =A0 }
>
> > };
>
> > Is that the solution, or am I doing something wrong in general?
> > General comments on my code are welcome!
>
> > Markus
>
> You actually have already figured out. eqPrts should return true only
> when the two myContainer's have the same elements.
> Buy the way, doesn't having the same elements imply having the same
> hashcode? Comparing hashcode in eqPrts is redundant.
> On a different note, try use Boost.Pointer_container library, it's a
> lot safer that what you have posted.
Oops... Boost.Pointer_container doesn't seem to have hash_set or
hash_map ...
|
|
0
|
|
|
|
Reply
|
huili80 (29)
|
7/11/2008 2:08:22 AM
|
|
On Jul 11, 3:20 am, Markus Dehmann <markus.dehm...@gmail.com> wrote:
> Do I have to handle hash collisions in a hash_set myself?
Which hash_set? There is not, and almost certainly never will
be, a hash_set in C++.
Which is, of course, a technicality. There will be an
unordered_set, and most existing hash_set are only slightly
different (but in different and incompatible ways).
> I did a test in which I use find() to look for objects in a
> hash_set. These objects are definitely not contained, but
> find() sometimes finds them anyway. See this code:
> <code>
> #include <iostream>
> #include <stdexcept>
> #include <ctime>
> #include <set>
> // also include header that defines gnu_namespace and includes
> // hash_set
If you're asking specifics about the GNU implementation, you
should probably ask an implementation specific group. But
supposing that the issue is more general, and concerns what
unordered_set will guarantee in the end.
> class MyContainer {
> std::vector<int> v;
> public:
> MyContainer(){}
> std::size_t hashcode() const {
> std::size_t hash =3D 0;
> for(unsigned i=3D0; i<v.size(); ++i){ // sdbm function gives
> collisions sometimes
> hash =3D v[i] + (hash << 6) + (hash << 16) - hash;
> }
> return hash;
> }
> void add(int i){v.push_back(i);}
> };
> struct eqPtrs {
> bool operator()(const MyContainer* h1, const MyContainer* h2) const
> {
> return h1->hashcode() =3D=3D h2->hashcode(); // problematic b/c of
> collisions?
> }
> };
You've defined two objects to be equal if their hash codes are
equal. Is that really what you want?
> namespace gnu_namespace {
> template<> struct hash<const MyContainer*> {
> inline size_t operator()(const MyContainer* d) const {
> return d->hashcode();
> }
> };
> }
> int getRand(int min, int max){
> return ((rand() % (max-min+1)) + min);
> }
> int main(int argc, char** argv){
> srand(time(0));
> typedef hash_set<const MyContainer*, hash<const MyContainer*>,
> eqPtrs> MyMap;
> MyMap myMap;
> int repeat =3D 100000;
> int size =3D 10;
> for(int i=3D0; i<repeat; ++i){
> MyContainer* h =3D new MyContainer();
> for(int j=3D0; j<size; ++j){
> h->add(getRand(0, 1000));
> }
> myMap.insert(h);
> }
> for(int i=3D0; i<repeat; ++i){
> MyContainer* h =3D new MyContainer();
> for(int j=3D0; j<size; ++j){
> h->add(getRand(2000, 3000));
> }
> MyMap::const_iterator found =3D myMap.find(h);
> assert(found =3D=3D myMap.end()); // aborts!
I'm not sure I understand. You create a random object, and
assert that it is in the container. Why should you expect it to
be in the container?
> }
> // TODO: finally delete elements in myMap
> return EXIT_SUCCESS;}
> </code>
> The solution seems to be to adapt the equality condition in
> eqPtrs to also test for actual equality of the members, not
> just equality of the hash codes:
> struct eqPtrs {
> bool operator()(const MyContainer* h1, const MyContainer* h2) const
> {
> return h1->hashcode() =3D=3D h2->hashcode() && haveSameElements(*h1,
> *h2); // added condition
> }
> };
> Is that the solution, or am I doing something wrong in general?
I don't know. It's up to you to define what equality should
mean. The only requirement is that if two elements are equal,
they have the same hash code.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34
|
|
0
|
|
|
|
Reply
|
james.kanze (9590)
|
7/11/2008 9:54:57 AM
|
|
On Jul 11, 5:54=A0am, James Kanze <james.ka...@gmail.com> wrote:
> On Jul 11, 3:20 am, Markus Dehmann <markus.dehm...@gmail.com> wrote:
> > =A0 for(int i=3D0; i<repeat; ++i){
> > =A0 =A0 MyContainer* h =3D new MyContainer();
> > =A0 =A0 for(int j=3D0; j<size; ++j){
> > =A0 =A0 =A0 h->add(getRand(0, 1000));
> > =A0 =A0 }
> > =A0 =A0 myMap.insert(h);
> > =A0 }
> > =A0 for(int i=3D0; i<repeat; ++i){
> > =A0 =A0 MyContainer* h =3D new MyContainer();
> > =A0 =A0 for(int j=3D0; j<size; ++j){
> > =A0 =A0 =A0 h->add(getRand(2000, 3000));
> > =A0 =A0 }
> > =A0 =A0 MyMap::const_iterator found =3D myMap.find(h);
> > =A0 =A0 assert(found =3D=3D myMap.end()); // aborts!
>
> I'm not sure I understand. =A0You create a random object, and
> assert that it is in the container. =A0Why should you expect it to
> be in the container?
No, I define an object that is guaranteed *not* to be in the container
and ask the container to find it. The assertion says that the object
should not be found.
Markus
|
|
0
|
|
|
|
Reply
|
markus.dehmann (142)
|
7/12/2008 5:47:53 PM
|
|
On Jul 12, 7:47 pm, Markus Dehmann <markus.dehm...@gmail.com> wrote:
> On Jul 11, 5:54 am, James Kanze <james.ka...@gmail.com> wrote:
> > On Jul 11, 3:20 am, Markus Dehmann <markus.dehm...@gmail.com> wrote:
> > > for(int i=3D0; i<repeat; ++i){
> > > MyContainer* h =3D new MyContainer();
> > > for(int j=3D0; j<size; ++j){
> > > h->add(getRand(0, 1000));
> > > }
> > > myMap.insert(h);
> > > }
> > > for(int i=3D0; i<repeat; ++i){
> > > MyContainer* h =3D new MyContainer();
> > > for(int j=3D0; j<size; ++j){
> > > h->add(getRand(2000, 3000));
> > > }
> > > MyMap::const_iterator found =3D myMap.find(h);
> > > assert(found =3D=3D myMap.end()); // aborts!
> > I'm not sure I understand. You create a random object, and
> > assert that it is in the container. Why should you expect it to
> > be in the container?
> No, I define an object that is guaranteed *not* to be in the
> container and ask the container to find it. The assertion says
> that the object should not be found.
Yes, I misread this part. Except that given the way you defined
equality, you didn't guarantee that an equal object wouldn't be
in the container.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34
|
|
0
|
|
|
|
Reply
|
james.kanze (9590)
|
7/12/2008 7:28:53 PM
|
|
|
5 Replies
29 Views
(page loaded in 0.099 seconds)
Similiar Articles: In coarse level parallelisation, how to handle output? - comp ...For instance, if all 100 processes are trying to find members of a set, just output ... the levels of residual ... on the long-term life of the materials used to handle ... How to set a password in gui form??? - comp.soft-sys.matlab ...How to handle the Cancel button on JOptionPane.showInputDialog ... I have a input ... Download 35 Complete GUI Examples - This is a set of MATLAB GUI ... 35 Complete GUI ... saving a figure at a set resolution - comp.soft-sys.matlab ...... figures: rez=1200; %resolution (dpi) of final graphic f=gcf; %f is the handle of ... why does execCommand('SaveAs') not work - comp.lang ... saving a figure at a set ... solve non linear equations with constraint - comp.soft-sys.matlab ...Hi, I want to solve a set of non linear equations f[X]= Y where Y is a given ... constraint on X such that x1+x2+...+Xn=1 > > I am not sure if fsolve can handle ... calculate local minimum in a large matrix - comp.soft-sys.matlab ...how to handle large linear inequality constraints ( about 100 ... how to handle large ... sys.matlab... in overland flow modeling and when fluid depth, h, goes below a set ... FWVW (gtwvw) - license freeware - download - comp.lang.xharbour ...Hi friends ! The FWVW.LIB contains a set of instructions (Classes, functions, commands) to handle with more ease gtwvw graphics features. requi... How to declare variable - comp.soft-sys.sasEither can handle single or double quotes if you quote things properly, though the ... procedure with the DECLARE statement and are assigned values by using either a SET ... Fokker-Plank matlab code - comp.soft-sys.matlabAbout Function Handle 1 141 Shiyuan ... lookup table which output a set of data 4 117 alvin veritas filesystem and directories with large number of files ...The more hash collisions you have, the more lookups (CPU) is required to look up a name. ... and directories with large number of files ..... layouts are said to handle many ... constraints in system of non-linear equations - comp.soft-sys ...how to handle large linear inequality constraints ( about 100 ... Solving linear ... Hi, I want to solve a set of non linear equations ... > I am not sure if fsolve can ... hash_set: how to handle collisions? - C / C++hash_set: how to handle collisions?. C / C++ Forums on Bytes. Hash table - Wikipedia, the free encyclopediaTherefore, most hash table implementations have some collision resolution strategy to handle such ... a structure that implements an enclosing approximation of a set ... 7/2/2012 3:30:09 AM
|