iterator invalidation guarantees in tr1::unordered_map

  • Follow


Hello-

Can anyone please clarify for me, does tr1::unordered_map guarantee
that iterators remain valid always (except iterators to items that
have been removed?). This was definitely the case with the old SGI
implementation. I am trying to transition from gcc's
__gnu_cxx::hash_map to std::tr1::unordered_map in gcc 4.5, and I am
getting crashes that seem like it's not safe to store an iterator to
objects and use it later (after many other insertions and erasures of
other objects) to delete that object. Before I look deeper into the
gcc implementation I just wanted to make sure it's the case, that
according to the standard the iterators should remain valid. Thanks!

-Lewis

-- 
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]

0
Reply ldh 7/10/2010 7:07:53 AM

On 10 jul, 11:07, ldh <lhy...@gmail.com> wrote:

> Can anyone please clarify for me, does tr1::unordered_map guarantee
> that iterators remain valid always (except iterators to items that
> have been removed?). This was definitely the case with the old SGI
> implementation. I am trying to transition from gcc's
> __gnu_cxx::hash_map to std::tr1::unordered_map in gcc 4.5, and I am
> getting crashes that seem like it's not safe to store an iterator to
> objects and use it later (after many other insertions and erasures of
> other objects) to delete that object. Before I look deeper into the
> gcc implementation I just wanted to make sure it's the case, that
> according to the standard the iterators should remain valid. Thanks!

I'm not sure about TR1, but the draft for ISO C++0x dated 2010-03-26
says in section [unord.req], paragraphs 13 and 14:

13. The insert members shall not affect the validity of references to
container elements, but may invalidate all iterators to the container.
The erase members shall invalidate only iterators and references to
the erased elements.

14. The insert members shall not affect the validity of iterators if (N
+n) < z * B, where N is the number of elements in the container prior
to the insert operation, n is the number of elements inserted, B is
the container�s bucket count, and z is the container�s maximum load
factor.

Note this section is actually about models of Unordered Associative
Containers, which includes unordered_map, unordered_set etc.

GCC's implementation for TR1 and C++0x is probably the same.

--
P.


-- 
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]

0
Reply ISO 7/11/2010 12:47:25 PM


> > Can anyone please clarify for me, does tr1::unordered_map guarantee
> > that iterators remain valid always (except iterators to items that
> > have been removed?). This was definitely the case with the old SGI
> > implementation.

I posted exactly the same question two years ago and unfortunatelly
did not get any satisfactory answer. Yes, TR1 iterators get
invalidated also on inserts, but I do not see why it has to be so. SGI
style hash (or the ones from Dinkumware) containers do not have these
restrictions and, consequently, provide the same guarantees on
iterators like std::map which is really convenient.

-Marek

-- 
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]

0
Reply Marek 7/12/2010 2:52:45 AM

On Jul 12, 5:52 am, Marek Vondrak <marek.vond...@gmail.com> wrote:
> > > Can anyone please clarify for me, does tr1::unordered_map guarantee
> > > that iterators remain valid always (except iterators to items that
> > > have been removed?). This was definitely the case with the old SGI
> > > implementation.
>
> I posted exactly the same question two years ago and unfortunatelly
> did not get any satisfactory answer. Yes, TR1 iterators get
> invalidated also on inserts, but I do not see why it has to be so. SGI
> style hash (or the ones from Dinkumware) containers do not have these
> restrictions and, consequently, provide the same guarantees on
> iterators like std::map which is really convenient.

OK thanks for the responses. I think I understand the reason, it is
much more flexible and perhaps more efficient not to require the
iterator to remain always valid; in order to guarantee validity of
existing iterators in the presence of rehashing, one must avoid
storing a pointer to the element's bucket inside the iterator. The SGI
implementation of an iterator just contains a pointer to the node, and
a pointer to the hash table; when you increment it and it hits the end
of the bucket, it has to re-calculate the hash value and look up the
correct bucket so it can find the next one. This makes calling the
form of erase() that takes an iterator less efficient than you might
think -- you still have to hash the key to find out what bucket you
are erasing from, so that you can re-link the other nodes in that
bucket correctly, and you still have to iterate through the whole list
unless it is doubly linked. But the main reason to force always-valid
iterators is presumably to enable calling erase() with that iterator
later, so it may end up giving you very little benefit for the price
you pay.

In any case, it seems clear what the newly standardized behavior will
be, so that's what I needed. Thanks.

-Lewis


-- 
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]

0
Reply ldh 7/13/2010 1:33:13 AM

> The SGI
> implementation of an iterator just contains a pointer to the node, and
> a pointer to the hash table; when you increment it and it hits the end
> of the bucket, it has to re-calculate the hash value and look up the
> correct bucket so it can find the next one.

As I argued last year, it does not have to be done this way. Each
iterator only needs to point to a node object. Node objects can be
spliced into buckets using an intrusive list and at the same time they
can be spliced to a global list of all nodes in the container. At any
time instant, the global list equals some concatenation of bucket
lists (in some random order). The global list is used to enumerate all
objects in the container, bucket lists are used to find ranges of
objects with the same key (subsets of bucket lists). Using the global
list, one can hop from a bucket to bucket without doing any additional
searches or evaluations of node's hash values and using the node's
bucket lists, one can efficiently find places to insert new nodes to.
I used to have an implementation of hash maps and multimaps that
worked this way. The time complexities were as follows:
insert: time to find a bucket O( 1 ) + time to find a place within
bucket \approx number_of_items_in_container / number_of_buckets (when
using a hash function that distributes elements uniformly)
erase:  O( 1 ), invalidates only iterators and pointers pointing to
the deleted object
find: time to find a bucket O( 1 ) + time to find an element within
bucket \approx number_of_items_in_container / number_of_buckets
find in multimap: as above
iterator::operator++() and operator--(): O( 1 )

-Marek

-- 
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]

0
Reply Marek 7/13/2010 7:51:26 PM

On Jul 13, 10:51 pm, Marek Vondrak <marek.vond...@gmail.com> wrote:

> As I argued last year, it does not have to be done this way. Each
> iterator only needs to point to a node object. Node objects can be
> spliced into buckets using an intrusive list and at the same time they
> can be spliced to a global list of all nodes in the container. At any
> time instant, the global list equals some concatenation of bucket
> lists (in some random order). The global list is used to enumerate all
> objects in the container, bucket lists are used to find ranges of
> objects with the same key (subsets of bucket lists). Using the global
> list, one can hop from a bucket to bucket without doing any additional
> searches or evaluations of node's hash values and using the node's
> bucket lists, one can efficiently find places to insert new nodes to.

Yep... I think it's because of so many issues like this that hash maps
did not make it into the original standard, there are a lot of
tradeoffs to consider... I think it's probably correct that the more
common use cases need to optimize lookup times rather than iteration,
so the SGI implementation is reasonable as a general-purpose tool. I
would argue that your suggestion, which has another doubly linked list
on top of the hash map structure, is sufficiently different in terms
of performance that it's really a different container altogether; and
it too certainly has its uses.

-Lewis


-- 
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]

0
Reply ldh 7/15/2010 8:52:48 AM

5 Replies
555 Views

(page loaded in 0.003 seconds)

Similiar Articles:











7/20/2012 12:48:53 PM


Reply: