Hi people,
I have some piece of code that takes way too much time to do what it
does (which is irrelevant here) and while searching for the cause I
stumbled upon the following surprising results:
The code uses a priority queue and so have lines of the form:
PQ.push(item);
and
Item item = PQ.top() ; PQ.pop();
So I added some manual profiling code around those two operations:
Timer t ; t.start();
PQ.push(item)
t.stop(); total_insert_time += t.time();
Timer t ; t.start();
Item item = PQ.top() ; PQ.pop();
t.stop(); total_remove_time += t.time();
During the particularly long run, about 150000 items are inserted in
total and each and every one of them removed. (the main loop finishes
when the PQ is empty)
But here is the puzzing (to me at least) result:
total_insert_time : 34 seconds
total_remove_time: 240 seconds
And this ratio is more or less consistent, not only among several runs
of the same algorithm with the same input in my mahcine but among
several runs on different platforms with different STL implementations
and machines.
FWIW, in all the different paltforms the code is compiled with the
default options, which in many cases means a range checked STL.
What I found very odd here is that pop_heap takes 8-10 times longer
than push_heap in an escenario where total the number of items are
balanced (the PQ is consumed until it's empty)
And there's more... it turns out that 95% of the items are insterted in
a initialization phase before the first is ever removed..... after the
first remove, a run of items may be inserted for each remove, but only
5% of the total is inserted in this phase.
I tried putting all the items from the initialization phase (95% of the
total) in a vector, with storage reserved in advance for 200000 items
to avoid additional allocations, then constructing the PQ with that
sequence (which calls make_heap up front), but the results are roughly
the same... removing all of the items takes 8-10 times longer than
inserting them.
Is this expected?
What can I do?
Given that 95% of the items are inserted up front, I tried things like
inserting the items lineraly in a list<>, sorting after the initialization
phase, then inserting in order for the remaining 5%, but then the sort along
takes around 250 secs, so this is roughly the same.
TIA
Fernando Cacciola
SciSoft
http://fcacciola.50webs.com
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
fernando_cacciola (3)
|
6/27/2006 11:35:14 AM |
|
Fernando Cacciola wrote:
> Hi people,
>
>
> I have some piece of code that takes way too much time to do what it
> does (which is irrelevant here) and while searching for the cause I
> stumbled upon the following surprising results:
>
> The code uses a priority queue and so have lines of the form:
>
> PQ.push(item);
>
> and
>
> Item item = PQ.top() ; PQ.pop();
>
> So I added some manual profiling code around those two operations:
>
> Timer t ; t.start();
> PQ.push(item)
> t.stop(); total_insert_time += t.time();
>
> Timer t ; t.start();
> Item item = PQ.top() ; PQ.pop();
> t.stop(); total_remove_time += t.time();
>
> During the particularly long run, about 150000 items are inserted in
> total and each and every one of them removed. (the main loop finishes
> when the PQ is empty)
>
> But here is the puzzing (to me at least) result:
>
> total_insert_time : 34 seconds
> total_remove_time: 240 seconds
The results should not be puzzling at all. These timings merely confirm
that that it is faster to add items at the back of an array (actually,
a std::vector in this case) than it is to remove items from the front
(or conversely, that it is faster to remove items from the back of a
std::vector than to add items at its front).
The reason for the difference should be clear: adding (or "pushing")
items at the back of the vector (serving here as the priority queue) is
a constant time operation; whereas removing even just one item from the
front necessitates shifting all of the remaining items forward to fill
the space just vacated. Worse, removing items from the front requires
more and more work as the number of items in the queue increases.
One obvious way to avoid this kind of slowdown would be to use a
different kind of container to implement a priority queue. Ideally this
replacement container would not favor either the front or back when it
came to inserting or removing items. In other words, either operation
would take the same amount of time at either end of the container.
Fortunately, the Standard anticipated that a container with exactly
that property would be useful - which is why a std::deque is a standard
C++ class of container - and the one that will solve this performance
issue.
Greg
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Greg
|
6/28/2006 10:40:55 AM
|
|
Greg Herlihy wrote:
> Fernando Cacciola wrote:
> > Hi people,
> >
> >
> > I have some piece of code that takes way too much time to do what it
> > does (which is irrelevant here) and while searching for the cause I
> > stumbled upon the following surprising results:
> >
> > The code uses a priority queue and so have lines of the form:
> >
> > PQ.push(item);
> >
> > and
> >
> > Item item = PQ.top() ; PQ.pop();
> >
> > So I added some manual profiling code around those two operations:
> >
> > Timer t ; t.start();
> > PQ.push(item)
> > t.stop(); total_insert_time += t.time();
> >
> > Timer t ; t.start();
> > Item item = PQ.top() ; PQ.pop();
> > t.stop(); total_remove_time += t.time();
> >
> > During the particularly long run, about 150000 items are inserted in
> > total and each and every one of them removed. (the main loop finishes
> > when the PQ is empty)
> >
> > But here is the puzzing (to me at least) result:
> >
> > total_insert_time : 34 seconds
> > total_remove_time: 240 seconds
>
> The results should not be puzzling at all. These timings merely confirm
> that that it is faster to add items at the back of an array (actually,
> a std::vector in this case) than it is to remove items from the front
> (or conversely, that it is faster to remove items from the back of a
> std::vector than to add items at its front).
>
> The reason for the difference should be clear: adding (or "pushing")
> items at the back of the vector (serving here as the priority queue) is
> a constant time operation; whereas removing even just one item from the
> front necessitates shifting all of the remaining items forward to fill
> the space just vacated. Worse, removing items from the front requires
> more and more work as the number of items in the queue increases.
This would be true if std::priority_queue was implemented in such a
naive way.
In GNU C++ library for example, push() is container's push_back()
followed by push_heap(). The latter shifts the new element up the tree
until it takes a right position. pop() is pop_heap() followed by
container's pop_back(). pop_heap() exchanges the first (topmost)
element with the last one and then shifts the first element down the
tree. The point to note here is that no container's pop_front() is
involved.
I would expect any other C++ library implementation do the same since
this optimal version of algorithm is described in most algorithm
textbooks. http://en.wikipedia.org/wiki/Binary_heap
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Maxim
|
6/29/2006 8:51:20 AM
|
|
Greg Herlihy wrote:
> Fernando Cacciola wrote:
> > Hi people,
> >
> >
> > I have some piece of code that takes way too much time to do what it
> > does (which is irrelevant here) and while searching for the cause I
> > stumbled upon the following surprising results:
> >
> > The code uses a priority queue and so have lines of the form:
> >
> > PQ.push(item);
> >
> > and
> >
> > Item item = PQ.top() ; PQ.pop();
> >
> > So I added some manual profiling code around those two operations:
> >
> > Timer t ; t.start();
> > PQ.push(item)
> > t.stop(); total_insert_time += t.time();
> >
> > Timer t ; t.start();
> > Item item = PQ.top() ; PQ.pop();
> > t.stop(); total_remove_time += t.time();
> >
> > During the particularly long run, about 150000 items are inserted in
> > total and each and every one of them removed. (the main loop finishes
> > when the PQ is empty)
> >
> > But here is the puzzing (to me at least) result:
> >
> > total_insert_time : 34 seconds
> > total_remove_time: 240 seconds
>
> The results should not be puzzling at all. These timings merely confirm
> that that it is faster to add items at the back of an array (actually,
> a std::vector in this case) than it is to remove items from the front
> (or conversely, that it is faster to remove items from the back of a
> std::vector than to add items at its front).
>
> The reason for the difference should be clear: adding (or "pushing")
> items at the back of the vector (serving here as the priority queue) is
> a constant time operation; whereas removing even just one item from the
> front necessitates shifting all of the remaining items forward to fill
> the space just vacated. Worse, removing items from the front requires
> more and more work as the number of items in the queue increases.
>
> One obvious way to avoid this kind of slowdown would be to use a
> different kind of container to implement a priority queue. Ideally this
> replacement container would not favor either the front or back when it
> came to inserting or removing items. In other words, either operation
> would take the same amount of time at either end of the container.
> Fortunately, the Standard anticipated that a container with exactly
> that property would be useful - which is why a std::deque is a standard
> C++ class of container - and the one that will solve this performance
> issue.
>
No, that's not it.
The timings only confirm what is already known: that the pop_heap
operation is slower than the push_heap operation by a constant factor.
Take a look at your favorite book on algorithms and you will notice
that the pop operation does more per loop iteration than the push (heap
insertion uses an "up-heap" operation, while heap removal uses a
"down-heap" operation").
I implemented a simple version of heap insert/remove based on the code
from Sedgewick's "Algorithms in C" using a pre-allocated vector of the
desired size, and the results show about the same relation between
pop/push timings (using VC++ 2003 Free download and the STL that comes
with it). Actually, my implementation of pop_heap was usually a little
slower that the STL one.
BTW, looking at the source code for pop_heap() on my library, I noticed
that it works a little differently than I am used to see. I don't know
if it is reasonable to ask for an explanation on this, considering that
it is copywrited code, but knowning that the guy who wrote the lib that
comes with VC reads this group :), why it that? (or perhaps I could not
understand the code correctly...)
Regards,
TM
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
TiagoMartinez
|
6/29/2006 8:56:10 AM
|
|
Fernando Cacciola wrote:
> Hi people,
>
>
> I have some piece of code that takes way too much time to do what it
> does (which is irrelevant here) and while searching for the cause I
> stumbled upon the following surprising results:
>
> During the particularly long run, about 150000 items are inserted in
> total and each and every one of them removed. (the main loop finishes
> when the PQ is empty)
>
> But here is the puzzing (to me at least) result:
>
> total_insert_time : 34 seconds
> total_remove_time: 240 seconds
>
> And this ratio is more or less consistent, not only among several runs
> of the same algorithm with the same input in my mahcine but among
> several runs on different platforms with different STL implementations
> and machines.
>
> FWIW, in all the different paltforms the code is compiled with the
> default options, which in many cases means a range checked STL.
>
> What I found very odd here is that pop_heap takes 8-10 times longer
> than push_heap in an escenario where total the number of items are
> balanced (the PQ is consumed until it's empty)
>
> And there's more... it turns out that 95% of the items are insterted in
> a initialization phase before the first is ever removed..... after the
> first remove, a run of items may be inserted for each remove, but only
> 5% of the total is inserted in this phase.
> I tried putting all the items from the initialization phase (95% of the
> total) in a vector, with storage reserved in advance for 200000 items
> to avoid additional allocations, then constructing the PQ with that
> sequence (which calls make_heap up front), but the results are roughly
> the same... removing all of the items takes 8-10 times longer than
> inserting them.
>
> Is this expected?
>
> What can I do?
>
> Given that 95% of the items are inserted up front, I tried things like
> inserting the items lineraly in a list<>, sorting after the initialization
> phase, then inserting in order for the remaining 5%, but then the sort along
> takes around 250 secs, so this is roughly the same.
You can use a priority queue for sorting: insert all elements into it
and pop them one by one. So, you cannot expect to take less time than
sorting takes (in this case, where almost all elements are in the
priority queue at the same time).
I find it logical that the removal phase takes longer than the insertion
phase. You have to know something about the implementation, of course.
If the priority queue uses a heap, my argument goes as follows.
Suppose we have a heap which is implemented by means of a vector v. That
means that every element v[i] is not less than the its two child
elements v[2*i+1] and v[2*i+2] (if they exist). When we insert element,
we initially do a push_back and then perform swaps with its parent as
long as its parent is less than the newly inserted element. If we insert
elements in random order, we normally do not have much swapping to do:
most elements belong to the lower levels of the heap.
We remove elements by calling pop(). Now, this does not remove an
arbitrary element, but the largest element. This hole in the heap has to
be filled. This can be done as follows:
- decide which of the two children is largest
- move that child up
- repeat this with the new hole created (one level deeper) until you
reach the bottom level
This leaves a hole at the bottom level. Move v.back() in this hole and,
if necessary, swap it up in the heap, just as in a regular insertion.
There are alternative ways of doing the removal, but the important thing
is that you always do some work on every level of the heap (so log(N) work).
Geert-Jan
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Geert
|
6/29/2006 8:59:27 AM
|
|
Fernando Cacciola wrote:
> Hi people,
>
>
> I have some piece of code that takes way too much time to do what it
> does (which is irrelevant here) and while searching for the cause I
> stumbled upon the following surprising results:
>
> The code uses a priority queue and so have lines of the form:
>
> PQ.push(item);
>
> and
>
> Item item = PQ.top() ; PQ.pop();
>
> So I added some manual profiling code around those two operations:
> [...]
> But here is the puzzing (to me at least) result:
>
> total_insert_time : 34 seconds
> total_remove_time: 240 seconds
> [...]
It shouldn't be that puzzling if you understand how the heap algoritms
work. The C++ standard doesn't specify how to implement the heap,
but the common implementation of a heap in the range [first,last) is to
consider it a binary tree where first[k] is the parent of first[2*k+1]
and first[2*k+2] and the parent is no smaller (given the supplied
strict weak ordering) than any of its children. Then the push_heap just
compares the new element to the parent and moves it up in the tree
until the parent is no smaller than the new element. On the other hand
pop_heap puts the element last[-1] in the place of the top one and
moves it down the tree as appropriate. Not only it has to compare both
children in each step but there are more steps on average than in the
insertion case because the element last[-1] was originally on the
bottom of the heap.
Btw., the computational complexity of make_heap for n elements is O(n)
while the removal of all n elements from the heap takes O(n log n).
This should be a hint that pop_heap is the heap operation which does
the actual work, so you can't expect it to be much faster than it is.
Cheers
Vladimir Marko
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Vladimir
|
6/29/2006 9:00:23 AM
|
|
Fernando Cacciola wrote:
> The code uses a priority queue ...
> What I found very odd here is that pop_heap takes 8-10 times longer
> than push_heap in an escenario where total the number of items are
> balanced (the PQ is consumed until it's empty)
Expected average time for binary heap push is O(1). Expected time for
pop is O(log(n)). Since your log2(n) is about 18, the results aren't
surprising.
See http://en.wikipedia.org/wiki/Binary_heap
There isn't a comparison-based structure that can do both insert and
delete-min in O(1). If there was, you could do comparison-based sorting
in O(n).
It is possible that your data has a particular structure that would
allow speedups (for instance bucket-based arrangements).
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
wade
|
6/29/2006 11:21:21 AM
|
|
In article <1151436038.261656.33310@x69g2000cwx.googlegroups.com>, Greg
Herlihy <greghe@pacbell.net> wrote:
> Fernando Cacciola wrote:
> > Hi people,
> >
> >
> > I have some piece of code that takes way too much time to do what it
> > does (which is irrelevant here) and while searching for the cause I
> > stumbled upon the following surprising results:
> >
> > The code uses a priority queue and so have lines of the form:
> >
> > PQ.push(item);
> >
> > and
> >
> > Item item = PQ.top() ; PQ.pop();
> > But here is the puzzing (to me at least) result:
> >
> > total_insert_time : 34 seconds
> > total_remove_time: 240 seconds
>
> The results should not be puzzling at all.
Well my compiler's implementation of std::priority_queue shows
the opposite.:) It all depends on the implementation of this
priority_queue. It has the same API as std::priority_queue, so I used
std::priority queue and it uses push_heap(),pop_heap() to do
pushing/popping, so vector vs deque should probably favor vector,
> These timings merely confirm
> that that it is faster to add items at the back of an array (actually,
> a std::vector in this case) than it is to remove items from the front
> (or conversely, that it is faster to remove items from the back of a
> std::vector than to add items at its front).
>
std::push_heap() wants the data to add to be the last element of the
sequence and pop_heap() swaps the first element and reheaps the rest
of the sequnce. The result is the last element of the sequence. All
inserts/erasures occur at the end of the sequence so the container
should not be the problem, it might be a poor implementation of
std::vector but perhaps if the queue is fixed in size some special
container requires empty(), front(), push_back(),pop_back() ,begin(),
end() methods with the usual meanings. The iterators returned by
begin(), end() must be random access iterators. Also requires some
of the standard container typedefs.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Carl
|
6/29/2006 11:30:52 AM
|
|
Greg Herlihy wrote:
>
> The results should not be puzzling at all. These timings merely confirm
> that that it is faster to add items at the back of an array (actually,
> a std::vector in this case) than it is to remove items from the front
> (or conversely, that it is faster to remove items from the back of a
> std::vector than to add items at its front).
>
> The reason for the difference should be clear: adding (or "pushing")
> items at the back of the vector (serving here as the priority queue) is
> a constant time operation; whereas removing even just one item from the
> front necessitates shifting all of the remaining items forward to fill
> the space just vacated. Worse, removing items from the front requires
> more and more work as the number of items in the queue increases.
The OP didn't mention explicitly whether the priority queue he used was
std::priority_queue<> or anything else, but std::priority_queue<>::pop()
doesn't remove items from the front. (Actually, std::vector<> doesn't
even support pop_front() operation! So, if std::priority_queue<>::pop()
used pop_front(), it would not even compile.)
Instead, it uses std::pop_heap() and C::pop_back(), where C is the
container type. 23.2.3.2.2 says, essentially:
void push(const value_type& x)
{
c.push_back(x);
push_heap(c.begin(), c.end(), comp);
}
void pop()
{
pop_heap(c.begin(), c.end(), comp);
c.pop_back();
}
25.3.6.1-2 specifies that std::push_heap() should take at most log(N)
comparisons, and std::pop_heap() at most 2 * log(N) comparisons. This
may explain why std::priority_queue<>::pop() takes more time than std::
priority_queue<>::push(), but the empirical ratio of 1:7 is more than
what can be naturally expected from the specification. Maybe std::push_
heap() is implemented much more efficiently than std::pop_heap()? Or
the difference could be largely due to the nature of the data sequence.
> One obvious way to avoid this kind of slowdown would be to use a
> different kind of container to implement a priority queue. Ideally this
> replacement container would not favor either the front or back when it
> came to inserting or removing items. In other words, either operation
> would take the same amount of time at either end of the container.
> Fortunately, the Standard anticipated that a container with exactly
> that property would be useful - which is why a std::deque is a standard
> C++ class of container - and the one that will solve this performance
> issue.
If a different kind of container could have solved the problem, the
default second template argument of std::priority_queue<> would have
been std::deque<> instead of std::vector<> (which is indeed the case
for std::queue<>), but it is not the case.
--
Seungbeom Kim
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Seungbeom
|
6/29/2006 11:34:33 AM
|
|
|
8 Replies
350 Views
(page loaded in 0.198 seconds)
|