Binary tree search vs Binary search

  • Follow


Hi

   I would like to check this result:
I have tested binary tree search API from Linux (search.h) by
searching into a binary tree generated from an unsorted dictionary a
given word. The second program uses the same dictionary, from which an
array is initialized which is sorted with qsort() then the same word
is searched with binary search fct (bsearch()).
  Running both programs shows that the binary search (the second
program) is almost twice as fast as the binary tree search.
  I have used directly the sorted distionary (so qsort() should have
the highest complexity), but still the second program is the fastest.
 Is this the correct result ? What advantage has binary tree search
wrt qsort() followed by binary search ?

thanks
Bogdan
0
Reply cristeab (60) 10/18/2010 5:14:44 PM

On Oct 18, 1:14=A0pm, Bogdan <crist...@gmail.com> wrote:
> Hi
>
> =A0 =A0I would like to check this result:
> I have tested binary tree search API from Linux (search.h) by
> searching into a binary tree generated from an unsorted dictionary a
> given word. The second program uses the same dictionary, from which an
> array is initialized which is sorted with qsort() then the same word
> is searched with binary search fct (bsearch()).
> =A0 Running both programs shows that the binary search (the second
> program) is almost twice as fast as the binary tree search.
> =A0 I have used directly the sorted distionary (so qsort() should have
> the highest complexity), but still the second program is the fastest.
> =A0Is this the correct result ? What advantage has binary tree search
> wrt qsort() followed by binary search ?

Binary search (e.g. ordered array) is O(log(n)) search but O(n)
insert.  Binary tree is O(log(n)) search *and* O(log(n)) insert.

The overhead you're finding by a constant is acceptable (and falls
below the noise in big-Oh notation).  They should in theory take
roughly the same time as n =3D> oo, but for small values they'll skew.

Tom
0
Reply tom236 (284) 10/18/2010 5:43:04 PM


On Oct 19, 12:43=A0am, Tom St Denis <t...@iahu.ca> wrote:
> On Oct 18, 1:14=A0pm, Bogdan <crist...@gmail.com> wrote:
> > =A0 Running both programs shows that the binary search (the second
> > program) is almost twice as fast as the binary tree search.

> Binary search (e.g. ordered array) is O(log(n)) search but O(n)
> insert. =A0Binary tree is O(log(n)) search *and* O(log(n)) insert.

Correct.

> The overhead you're finding by a constant is acceptable

How in heaven's name do you know that??  I was once called
in to fix a poorly designed program that took 10 hours
to process data from an 8-hour shift.  Customer was running
two shifts per day OK, but couldn't start a graveyard
shift till the software was sped up!  Perhaps I should have
just told them "No problem!  Some guy on Usenet says
this is 'acceptable'."    :-)   :-)

> (and falls below the noise in big-Oh notation).

Meaningless gibberish.

> =A0They should in theory take
> roughly the same time as n =3D> oo, but for small values they'll skew.

Wrong again.  They take respective approx. times of
     a_1 + b_1 log N   and   a_2 + b_2 log N
What *is* correct is that you want to choose large N to
measure (b_1 / b_2) accurately.  To assume a priori this ratio
will be unity shows much confusion.

Hope this helps.
James Dow Allen
0
Reply jdallen2000 (489) 10/18/2010 7:04:29 PM

Bogdan <cristeab@gmail.com> writes:

>    I would like to check this result:
> I have tested binary tree search API from Linux (search.h) by
> searching into a binary tree generated from an unsorted dictionary a
> given word. The second program uses the same dictionary, from which an
> array is initialized which is sorted with qsort() then the same word
> is searched with binary search fct (bsearch()).
>   Running both programs shows that the binary search (the second
> program) is almost twice as fast as the binary tree search.
>   I have used directly the sorted distionary (so qsort() should have
> the highest complexity), but still the second program is the fastest.
>  Is this the correct result ? What advantage has binary tree search
> wrt qsort() followed by binary search ?

I won't repeat what's been said already, but I will add that to get a
better understanding of what's happening you'll probably have to post
the code.  It may be that matters unrelated to the abstract algorithms
such as IO or memory allocation are having a significant impact.

-- 
Ben.
0
Reply Ben 10/18/2010 9:49:56 PM

On Oct 18, 12:14=A0pm, Bogdan <crist...@gmail.com> wrote:
> Hi
>
> =A0 =A0I would like to check this result:
> I have tested binary tree search API from Linux (search.h) by
> searching into a binary tree generated from an unsorted dictionary a
> given word. The second program uses the same dictionary, from which an
> array is initialized which is sorted with qsort() then the same word
> is searched with binary search fct (bsearch()).
> =A0 Running both programs shows that the binary search (the second
> program) is almost twice as fast as the binary tree search.
> =A0 I have used directly the sorted distionary (so qsort() should have
> the highest complexity), but still the second program is the fastest.
> =A0Is this the correct result ? What advantage has binary tree search
> wrt qsort() followed by binary search ?


In additional to what other have said, the pointer computation for
binary search may be a bit quicker than the pointer chasing you get
for a binary tree.  The binary search may be a bit more cache friendly
as well.  Although that's all implementation dependent, and there are
few absolutes.

A significant difference is that a binary tree can be unbalanced,
which may make certain leaf items considerably deeper in the tree than
the best case lg(n) depth.  A worst case is often generated by
inserting items in the binary tree in ascending or descending order,
which may reduce the binary tree to a sequential linked list.  There
are tree structures that remain balanced (FSVO "balance") during
insertion and deletion, but they tend to be more complex than a simple
binary tree.

0
Reply robertwessel2 (1339) 10/18/2010 10:26:22 PM

In article <0d015b21-7bd6-4600-bf8c-4aae400b3b35@p26g2000yqb.googlegroups.com>,
 Bogdan <cristeab@gmail.com> wrote:

>   I have used directly the sorted distionary (so qsort() should have
> the highest complexity), but still the second program is the fastest.
>  Is this the correct result ? What advantage has binary tree search
> wrt qsort() followed by binary search ?

First you would have to verify your binary trees are balanced. Then you would 
have to measure where your program is spending time. Is it paging tree nodes? 
Etc.

Binary trees can be modified more quickly than an array.

-- 
Damn the living - It's a lovely life.           I'm whoever you want me to be.
Silver silverware - Where is the love?       At least I can stay in character.
Oval swimming pool - Where is the love?    Annoying Usenet one post at a time.
Damn the living - It's a lovely life.     May your creator bless and keep you.
0
Reply China 10/18/2010 10:51:07 PM

On Oct 18, 1:14=A0pm, Bogdan <crist...@gmail.com> wrote:
> Hi
>
> =A0 =A0I would like to check this result:
> I have tested binary tree search API from Linux (search.h) by
> searching into a binary tree generated from an unsorted dictionary a
> given word. The second program uses the same dictionary, from which an
> array is initialized which is sorted with qsort() then the same word
> is searched with binary search fct (bsearch()).
> =A0 Running both programs shows that the binary search (the second
> program) is almost twice as fast as the binary tree search.
> =A0 I have used directly the sorted distionary (so qsort() should have
> the highest complexity), but still the second program is the fastest.
> =A0Is this the correct result ? What advantage has binary tree search
> wrt qsort() followed by binary search ?

As has been said, pre-qsorting is an average case O(n log n) operation
and then looking up O(n) entries with binary search will again require
O(n log n) time.  Inserting O(n) items in e.g. a red-black or similar
balanced tree requires O(n log n) time, and looking up O(n) items is
again O(n log n).  So asymptotically the solutions have comparable
asymptotic performance as long as the dictionary isn't modified once
sorted.  If modifications can occur, the tree works better because
each update requires only O(log n) time.  The array requires O(n) time
to update as has been explained.

Or of course if the tree implementation is not self-balancing, then
search length is a random phenomenon.

You said you're looking up "a given word."  If you are looking up only
a single word, then chance difference between path length in the
balanced tree and number of probes in the binary search table alone
could account for a factor of 2.

Otherwise, it's that constant factors related to the implementations
are different.  Inserting a string into a tree requires allocating a
node in addition to the string.  There is space overhead for pointers
in the tree that isn't there in the sorted list.  Extra space and the
way it falls in memory can affect cache and paging performance as has
been mentioned.  Rebalancing the tree is overhead that isn't present
in the array, etc.  A factor of 2 or 3 difference is not hard to
believe.

FWIW, I just happened to read this
http://ieeexplore.ieee.org/Xplore/login.jsp?url=3Dhttp%3A%2F%2Fieeexplore.i=
eee.org%2Fiel5%2F2%2F5569041%2F05530308.pdf%3Farnumber%3D5530308&authDecisi=
on=3D-203
, a nicely done article about how merely realigning critical code
segments with respect to page / cache line boundaries (by e.g.
changing the size of the OS shell environment when starting the
program) can in some cases affect run time by 20%.

0
Reply gene.ressler (527) 10/19/2010 1:44:52 AM

On 10/18/2010 9:44 PM, Gene wrote:
> On Oct 18, 1:14 pm, Bogdan<crist...@gmail.com>  wrote:
>> Hi
>>
>>     I would like to check this result:
>> I have tested binary tree search API from Linux (search.h) by
>> searching into a binary tree generated from an unsorted dictionary a
>> given word. The second program uses the same dictionary, from which an
>> array is initialized which is sorted with qsort() then the same word
>> is searched with binary search fct (bsearch()).
>>    Running both programs shows that the binary search (the second
>> program) is almost twice as fast as the binary tree search.
>>    I have used directly the sorted distionary (so qsort() should have
>> the highest complexity), but still the second program is the fastest.
>>   Is this the correct result ? What advantage has binary tree search
>> wrt qsort() followed by binary search ?
>
> As has been said, pre-qsorting is an average case O(n log n) operation
> and then looking up O(n) entries with binary search will again require
> O(n log n) time.  Inserting O(n) items in e.g. a red-black or similar
> balanced tree requires O(n log n) time, and looking up O(n) items is
> again O(n log n).  So asymptotically the solutions have comparable
> asymptotic performance as long as the dictionary isn't modified once
> sorted.

     If by "comparable" you mean "essentially the same except for
negligible fiddly bits," you are repeating Tom St Denis' error: The
fact that two processes have the same order-of-magnitude asymptotic
performance does not mean they have "comparable" performance.  For
example the time (or memory or comparisons or whatever) used two
processes A and B, both of O(1), can differ by an arbitrarily large
amount.

     Brutally simplistic example: Take an O(N log N) sorting method S
that uses 100 nsec for each comparison and negligible time for other
operations.  Add a 2.7-month delay loop to each comparison to arrive
at a new sorting method S2.  Assertion: Both S and S2 have O(N log N)
performance, even though S can sort 1000 items in a millisecond while
S2 takes 2243 years.  A millisecond is "comparable" to two-plus
millennia only for people who have way too much time on their hands.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
0
Reply esosman2 (2945) 10/19/2010 2:13:21 AM

On Oct 18, 3:04=A0pm, James Dow Allen <jdallen2...@yahoo.com> wrote:
> > The overhead you're finding by a constant is acceptable
>
> How in heaven's name do you know that?? =A0I was once called
> in to fix a poorly designed program that took 10 hours
> to process data from an 8-hour shift. =A0Customer was running
> two shifts per day OK, but couldn't start a graveyard
> shift till the software was sped up! =A0Perhaps I should have
> just told them "No problem! =A0Some guy on Usenet says
> this is 'acceptable'." =A0 =A0:-) =A0 :-)

Well I didn't say it was optimal, I just said it's not a big deal.  If
he's running into a timing hazard then it is a problem.  See how that
works [read on...]

> > (and falls below the noise in big-Oh notation).
>
> Meaningless gibberish.

No, it's not.  See the operation is log(n) time which is greater than
a constant multiple c.  O(c*log(n)) is equiv to O(log(n)) therefore a
small constant factor difference is "below the noise" or irrelevant.

> > =A0They should in theory take
> > roughly the same time as n =3D> oo, but for small values they'll skew.
>
> Wrong again. =A0They take respective approx. times of
> =A0 =A0 =A0a_1 + b_1 log N =A0 and =A0 a_2 + b_2 log N
> What *is* correct is that you want to choose large N to
> measure (b_1 / b_2) accurately. =A0To assume a priori this ratio
> will be unity shows much confusion.

You really don't understand big-Oh notation do you? ...

Tom
0
Reply tom236 (284) 10/19/2010 12:23:56 PM

Tom St Denis <tom@iahu.ca> writes:

> On Oct 18, 3:04 pm, James Dow Allen <jdallen2...@yahoo.com> wrote:
>> Tom St Denis <tom@iahu.ca> writes: [attribution restored]

<snip talking about two O(log n) algorithms>

>> >  They should in theory take
>> > roughly the same time as n => oo, but for small values they'll skew.
>>
>> Wrong again.  They take respective approx. times of
>>      a_1 + b_1 log N   and   a_2 + b_2 log N
>> What *is* correct is that you want to choose large N to
>> measure (b_1 / b_2) accurately.  To assume a priori this ratio
>> will be unity shows much confusion.
>
> You really don't understand big-Oh notation do you? ...

Which bit makes you think that?

Maybe you should define what you mean by "they should in theory take
roughly the same time as n => oo"?

James Dow Allen is making the reasonable assumption that this means

  lim [n->oo] T1(n)/T2(n) = 1

where T1 and T2 are functions that describe the two algorithms running
times.  How do you define "taking roughly the same time" so that it
covers the example given by James?

-- 
Ben.
0
Reply Ben 10/19/2010 2:07:25 PM

On Oct 19, 12:43 am, Tom St Denis <t...@iahu.ca> wrote:
> [OP reported on a speed comparison]:
>>    [bsearch presents twice the speed of bin tree search]
> The overhead you're finding by a constant is acceptable (and falls
> below the noise in big-Oh notation).  They should in theory take
> roughly the same time as n => oo, but for small values they'll skew.

And now, he tries to defend this foolishness!  :-)

First note that OP presents a speed-doubling and Tom implies
that it will go away for large n.  :-)  :-)
Tom:  Do you still believe that, or did Googling "big-Oh" help?

Tom St Denis <tom@iahu.ca> wrote, in
news:11995ff4-8575-4327-bfc3-eb348322a142@l8g2000yql.googlegroups.com: 

> On Oct 18, 3:04�pm, James Dow Allen <jdallen2...@yahoo.com> wrote:
>> > The overhead you're finding by a constant is acceptable

Tom, did your newsreader remove the attribution on the above
line?  It's by Tom St Denis.  If you now understand it was a foolish
remark, good for you.  But don't remove the attribution, please.

>>
>> How in heaven's name do you know that?? ... Perhaps I should have
>> just told them "No problem! �Some guy on Usenet says
>> this is 'acceptable'." � �:-) � :-)
> 
> Well I didn't say it was optimal, I just said it's not a big deal.  If
> he's running into a timing hazard then it is a problem.

No.  It's true that you don't say it's "optimal".  But it's
false that you said "it's no big deal."  What you wrote is
that it's "acceptable" and my question remains:
How do you know that???

If you meant it "might be acceptable" or "is probably fast enough"
you should have written that.  I'm not a stickler for "dotting every i",
but in this case the way you phrased your answer was, to be blunt,
inane.

It's not unusual for such a search to dominate speed performance.
Speedup by a factor of two is, as a matter of fact, not unlikely to be
important!  True, it's "often (or even usually) unimportant" but that's
not what you wrote, is it? BTW, since you're wrong on *every single
point* in your message here, I doubt you'll respond again, but if you
do, kindly have the courtesy to quote and correctly attribute your own
incorrect remarks which are under review.

>> > (and falls below the noise in big-Oh notation).
>>
>> Meaningless gibberish.
> 
> No, it's not.

Yes it is.  There is no "noise" in big-Oh notation for
it to "fall below."  If this isn't clear, please present
an example which falls *above* your alleged "noise" in
big-Oh notation.  :-)

>  O(c*log(n)) is equiv to O(log(n))

So you *did* know this much after all!
(Or did you Google for it just now?)

 
>> > �They should in theory take
>> > roughly the same time as n => oo, but for small values they'll
>> > skew. 
>>
>> Wrong again. �They take respective approx. times of
>> � � �a_1 + b_1 log N � and � a_2 + b_2 log N
>> What *is* correct is that you want to choose large N to
>> measure (b_1 / b_2) accurately. �To assume a priori this ratio
>> will be unity shows much confusion.
> 
> You really don't understand big-Oh notation do you? ...

:-)  :-).   We don't mind when ignorant newbies post here.
Even when they act like insolent jerks, they're amusing.

James Dow Allen
0
Reply gmail1217 (37) 10/19/2010 2:23:24 PM

On Mon, 18 Oct 2010 10:14:44 -0700 (PDT), Bogdan
<cristeab@gmail.com> wrote:

>Hi
>
>   I would like to check this result:
>I have tested binary tree search API from Linux (search.h) by
>searching into a binary tree generated from an unsorted dictionary a
>given word. The second program uses the same dictionary, from which an
>array is initialized which is sorted with qsort() then the same word
>is searched with binary search fct (bsearch()).
>  Running both programs shows that the binary search (the second
>program) is almost twice as fast as the binary tree search.
>  I have used directly the sorted distionary (so qsort() should have
>the highest complexity), but still the second program is the fastest.
> Is this the correct result ? What advantage has binary tree search
>wrt qsort() followed by binary search ?

The short answer is fewer pointers.  A slightly longer answer is
that at the compiled code level the tree search is more complex.
The key factor is that in the binary search one address offset
per value access must be computed, whereas in the tree search two
are needed (one for the pointer, one for the value).  A second
factor is arrays have better locality than trees.

   

0
Reply cri 10/19/2010 2:35:15 PM

James Dow Allen <gmail@jamesdowallen.nospam> might have writ, in
news:Xns9E16D98DFA353jamesdowallen@78.46.73.112: 
>> You really don't understand big-Oh notation do you? ...
> 
>:-)  :-).   We don't mind when ignorant newbies post here.
> Even when they act like insolent jerks, they're amusing.

Since I tend to be *more* patient than others
with newbies in Usenet, readers may wonder why I went out of
my way to criticise this pompous imbecile.  It's in part because
of another pompous and imbecilic post he made recently:

On Sep 7, 3:31 am, Tom St Denis <t...@iahu.ca> wrote:
> > In article
> > <201d0119-1080-41b4-bdf7-bd6355533...@b4g2000pra.googlegroups.com>, 
> > James Dow Allen  <jdallen2...@yahoo.com> wrote:
> > >What I AM asking is: Why aren't the rest of you doing this?
> > >  :-)
> 
> usually when a newb comes into a group [any group] and says things
> like "why not we do everything while standing on our heads?" it's
> because they don't know what they're talking about.
> 
> This doesn't mean we don't listen to the newbs, it just means we don't
> follow them.  It's not our fault you can't tell the difference.
> 
> As to the specifics here, the solution to the OPs problem is called a
> "processor cache" and "stack opcode optimizations" [something modern
> high performance processors have] making the need to inline your
> compare function rather moot.

What was especially amusing here is that his final paragraph, perhaps
intended to be helpful. was a complete non sequitur regardless of
how OP was misconstrued.  With a little more Googling, Denis may
soon be up-to-speed on the simpler facts about big-Oh notation,
but I think he needs to take up Remedial Reading for Comprehension.

Hope this helps.
James Dow Allen
0
Reply James 10/19/2010 2:43:52 PM

On Oct 19, 10:23=A0am, James Dow Allen <gm...@jamesdowallen.nospam>
wrote:

<snip>

This is my last reply to you since you're intent on being openly
hostile.

> First note that OP presents a speed-doubling and Tom implies
> that it will go away for large n. =A0:-) =A0:-)
> Tom: =A0Do you still believe that, or did Googling "big-Oh" help?

Do you know what asymptotic convergence is?  Maybe a 2x speedup when
n=3D10 is present but disappears when n=3D2^40?

> Tom, did your newsreader remove the attribution on the above
> line? =A0It's by Tom St Denis. =A0If you now understand it was a foolish
> remark, good for you. =A0But don't remove the attribution, please.

I removed the excess quotation to trim the reply down.  People can see
the attribution in the headers.

> No. =A0It's true that you don't say it's "optimal". =A0But it's
> false that you said "it's no big deal." =A0What you wrote is
> that it's "acceptable" and my question remains:
> How do you know that???

It does depend on the requirements whether it's "acceptable" or not.
But premature optimization is the root of all stupid.  If the existing
code uses a tree, porting it all over to a linear array might not be
worth the 2x improvement if the current performance isn't a problem.

You don't know that.  I don't know that.

> It's not unusual for such a search to dominate speed performance.
> Speedup by a factor of two is, as a matter of fact, not unlikely to be
> important! =A0True, it's "often (or even usually) unimportant" but that's
> not what you wrote, is it? BTW, since you're wrong on *every single
> point* in your message here, I doubt you'll respond again, but if you
> do, kindly have the courtesy to quote and correctly attribute your own
> incorrect remarks which are under review.

You're being pedantic to basically have something to argue over.  I
maintain I don't care one way or the other.

> Yes it is. =A0There is no "noise" in big-Oh notation for
> it to "fall below." =A0If this isn't clear, please present
> an example which falls *above* your alleged "noise" in
> big-Oh notation. =A0:-)

Um ... you clearly don't understand big-Oh notation.  I'm sorry but if
you write something like this you just don't get what big-Oh
accomplishes.

> > =A0O(c*log(n)) is equiv to O(log(n))
>
> So you *did* know this much after all!
> (Or did you Google for it just now?)

I didn't google it.  I went to college.

> > You really don't understand big-Oh notation do you? ...
>
> :-) =A0:-). =A0 We don't mind when ignorant newbies post here.
> Even when they act like insolent jerks, they're amusing.

How am I acting like an insolent jerk?  Just because you disagree with
what I wrote doesn't mean I'm a jerk.

Anyways, please don't reply to this post, I won't reply to you in
kind.  You're clearly trying to pick a fight, and truth be told I
don't really care about the outcome of this thread.

Tom
0
Reply tom236 (284) 10/19/2010 2:58:00 PM

On Oct 19, 9:58=A0pm, Tom St Denis <t...@iahu.ca> wrote:
> Do you know what asymptotic convergence is? =A0Maybe a 2x speedup when
> n=3D10 is present but disappears when n=3D2^40?

It's comments like this that make us wonder whether you
indeed understand big-Oh or not.  You claim to have Googled
to learn O(c log N) =3D O(log N), but the above comment
implies you think c =3D 1 ???   I'll admit to having been
somewhat ironic in my replies -- I suspect you've understood
the rudiments of big-Oh all along -- but why this strange insistence
that c should be 1 ?

> > Tom, did your newsreader remove the attribution on the above
> > line? =A0It's by Tom St Denis. =A0If you now understand it was a foolis=
h
> > remark, good for you. =A0But don't remove the attribution, please.
>
> I removed the excess quotation to trim the reply down. =A0People can see
> the attribution in the headers.

False.  The Reference list does not show clearly who wrote
what.  You unnecessarily snipped a single line; the most logical
reason for that was to obfuscate the attribution your own
incorrect comment.

> > There is no "noise" in big-Oh notation for
> > it to "fall below." =A0If this isn't clear, please present
> > an example which falls *above* your alleged "noise" in
> > big-Oh notation. =A0:-)
>
> Um ... you clearly don't understand big-Oh notation. =A0I'm sorry but if
> you write something like this you just don't get what big-Oh
> accomplishes.

Despite my sarcasm, I actually suspect you are more-or-less
familiar with big-Oh.  I'm mildly curious whether you're just being
nasty, or honestly think I'm not familiar with it.

And, if you insist that your nonsense about "noise" was
meaningful after all, why didn't you do what you were told and
"present an example which falls *above* your alleged noise."
  :-)

> Anyways, please don't reply to this post,

*You* pick a fight, post more insulting and useless drivel
and then ask *me* not to respond!! :-)  :-)

I don't give a shit if you respond to this one or not.

James
0
Reply jdallen2000 (489) 10/19/2010 3:13:37 PM

On 18 Okt., 19:43, Tom St Denis wrote:
>
> Binary search (e.g. ordered array) is O(log(n)) search but O(n)
> insert. =A0Binary tree is O(log(n)) search *and* O(log(n)) insert.
>
> The overhead you're finding by a constant is acceptable (and falls
> below the noise in big-Oh notation). =A0They should in theory take
> roughly the same time as n =3D> oo, but for small values they'll skew.

Tom, I know you are a smart guy.  But this time you are wrong.

Here's a simple counter example:

  Algorithm A requires log2(n)+8 "costly operations".
  Algorithm B requires 2*log2(n) "costly operations".
  Both are O(log n).
  For small n B is faster than A.
  But if you let n approach infinity, A will be faster than B.
  The ratio between the two will aproach 2, not 1.
  So, they won't require "roughly the same time".

Cheers!
SG
0
Reply Sebastian 10/20/2010 10:02:30 AM

On Oct 20, 6:02=A0am, Sebastian <s.gesem...@gmail.com> wrote:
> Tom, I know you are a smart guy. =A0But this time you are wrong.
>
> Here's a simple counter example:
>
> =A0 Algorithm A requires log2(n)+8 "costly operations".
> =A0 Algorithm B requires 2*log2(n) "costly operations".
> =A0 Both are O(log n).
> =A0 For small n B is faster than A.
> =A0 But if you let n approach infinity, A will be faster than B.
> =A0 The ratio between the two will aproach 2, not 1.
> =A0 So, they won't require "roughly the same time".

Except for all we [likely] know the ratio 2:1 appears at small sample
set sizes.  But that a comment of cache performance and not big-Oh.
For example, when the sample set size gets so big it doesn't fit in
memory, disk access will essentially destroy any advantage and they'll
both be for all intents and purposes the same speed.

In the big-Oh there is no such thing as O(2log n) as a "final
answer."  So in both cases I stand by my guess [which is all we're
doing here] that the 2x difference is probably nothing to write home
about.

I suspect a binary search to be faster overall [with the advantage
going down as size goes up], but the more important thing I said in my
post was that a) it's in the noise and b) Insert is slower than with a
tree.  Granted I should have said "it's *probably* in the noise" but
the general essence of the thought remains mostly valid.

All we know from the OP is he searched a dictionary and one was
"almost" twice as fast as the other.  Does the table have 50
elements?  50,000 elements?  50M elements?  What sort of architecture
is it?  etc...

Now why this turned into a flamewar between another poster and I I
don't know.   I never said anything derogatory to them.  Alas, this is
USENET ...

Tom
0
Reply tom236 (284) 10/20/2010 10:48:40 AM

On 10/20/2010 6:48 AM, Tom St Denis wrote:
> On Oct 20, 6:02 am, Sebastian<s.gesem...@gmail.com>  wrote:
>> Tom, I know you are a smart guy.  But this time you are wrong.
>>
>> Here's a simple counter example:
>>
>>    Algorithm A requires log2(n)+8 "costly operations".
>>    Algorithm B requires 2*log2(n) "costly operations".
>>    Both are O(log n).
>>    For small n B is faster than A.
>>    But if you let n approach infinity, A will be faster than B.
>>    The ratio between the two will aproach 2, not 1.
>>    So, they won't require "roughly the same time".
>
> Except for all we [likely] know the ratio 2:1 appears at small sample
> set sizes.  But that a comment of cache performance and not big-Oh.
> For example, when the sample set size gets so big it doesn't fit in
> memory, disk access will essentially destroy any advantage and they'll
> both be for all intents and purposes the same speed.

     All you're saying is that real computers are finite, meaning
that we'll always have N <= Nmax.  That, in turn, implies that all
real implementations of all algorithms, even bogosort, have O(1)
performance because f(N) <= f(Nmax) == constant == O(1).

     So, yes: You can, sort of, defend your misunderstanding of O()
notation, at the cost of making O() notation useless.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
0
Reply Eric 10/20/2010 11:32:13 AM

On Oct 20, 7:32=A0am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> On 10/20/2010 6:48 AM, Tom St Denis wrote:
>
>
>
>
>
>
>
>
>
> > On Oct 20, 6:02 am, Sebastian<s.gesem...@gmail.com> =A0wrote:
> >> Tom, I know you are a smart guy. =A0But this time you are wrong.
>
> >> Here's a simple counter example:
>
> >> =A0 =A0Algorithm A requires log2(n)+8 "costly operations".
> >> =A0 =A0Algorithm B requires 2*log2(n) "costly operations".
> >> =A0 =A0Both are O(log n).
> >> =A0 =A0For small n B is faster than A.
> >> =A0 =A0But if you let n approach infinity, A will be faster than B.
> >> =A0 =A0The ratio between the two will aproach 2, not 1.
> >> =A0 =A0So, they won't require "roughly the same time".
>
> > Except for all we [likely] know the ratio 2:1 appears at small sample
> > set sizes. =A0But that a comment of cache performance and not big-Oh.
> > For example, when the sample set size gets so big it doesn't fit in
> > memory, disk access will essentially destroy any advantage and they'll
> > both be for all intents and purposes the same speed.
>
> =A0 =A0 =A0All you're saying is that real computers are finite, meaning
> that we'll always have N <=3D Nmax. =A0That, in turn, implies that all
> real implementations of all algorithms, even bogosort, have O(1)
> performance because f(N) <=3D f(Nmax) =3D=3D constant =3D=3D O(1).
>
> =A0 =A0 =A0So, yes: You can, sort of, defend your misunderstanding of O()
> notation, at the cost of making O() notation useless.

It's not useless though...

Compare Comba vs. Karatsuba multiplication...

Comba is basically n^2 * (work of MULADD) + n * (work of carry
prop) ... or O(n^2)

Where Karatsuba is

3x * comba of n/2 size + various 2n-length additions [with carry
prop], etc.. or O(n^log_2(3))

So even though Comba is FASTER at small values of n we still say
"Karatsuba is more efficient." [even though at small sizes we'd use
Comba instead].  Now if there was another technique that was
2*Karatsuba work you'd still write it as O(n^log_2(3)) even though
you'd know that at most reasonable sizes it'd be slower.  I didn't
invent big-Oh notation, so stop harping on me.  Nor am I saying "all
algorithms of equal big-Oh work are equivalent."

But we don't know for a fact that the 2:1 ratio caries through larger
[possibly practical] sizes.  All I was saying is a small constant
factor isn't an eye opener.  If the OP had wrote that it was 10x
faster then yes, clearly something is afoot, but 2x faster on say a
500 word dictionary doesn't exactly scream FUNDAMENTAL FLAW to me.  It
could just be that there are more cache hits with a dataset that
small.  Which would mean if your application solely deals with
searches [and no inserts] of 500 word datasets that a binary search is
ideal.  Nobody is arguing that [certainly I'm not].  But as to the
concept of broad generalizations I'd have a hard time extrapolating
whether a 2x speed difference FROM =3D=3D=3DONE=3D=3D=3D SAMPLE SET is sign=
ificant
or not since it IS below the asymptotic work level threshold.

IOW, stop putting words into my mouth and stop assuming that you know
more about the OPs problem than anyone else does (since the OP hasn't
explained their problem in enough detail).

Tom
0
Reply tom236 (284) 10/20/2010 11:46:57 AM

On 20 Okt., 13:46, Tom St Denis wrote:
> On Oct 20, 7:32=A0am, Eric Sosman wrote:
> > On 10/20/2010 6:48 AM, Tom St Denis wrote:
> > [...]
> > =A0 =A0 =A0All you're saying is that real computers are finite, meaning
> > that we'll always have N <=3D Nmax. =A0That, in turn, implies that all
> > real implementations of all algorithms, even bogosort, have O(1)
> > performance because f(N) <=3D f(Nmax) =3D=3D constant =3D=3D O(1).
>
> > =A0 =A0 =A0So, yes: You can, sort of, defend your misunderstanding of O=
()
> > notation, at the cost of making O() notation useless.
>
> It's not useless though...
>
> Compare Comba vs. Karatsuba multiplication...
>
> Comba is basically n^2 * (work of MULADD) + n * (work of carry
> prop) ... or O(n^2)
>
> Where Karatsuba is
>
> 3x * comba of n/2 size + various 2n-length additions [with carry
> prop], etc.. or O(n^log_2(3))
>
> So even though Comba is FASTER at small values of n we still say
> "Karatsuba is more efficient."

....in terms of the asymptotic runtime behaviour, yes.

> Now if there was another technique that was
> 2*Karatsuba work you'd still write it as O(n^log_2(3)) even though
> you'd know that at most reasonable sizes it'd be slower. =A0I didn't
> invent big-Oh notation, so stop harping on me. =A0Nor am I saying "all
> algorithms of equal big-Oh work are equivalent."

No, you just suggested that algorithms of the same time complexity
class "should in theory take roughly the same time" as n approaches
infinity.  For some definition of "take roughly the same time" this
might be the case.  But IMHO the most sensible interpretation of this
claim was -- as Ben pointed out -- that the runtime ratio between both
algorithms will approach 1 as n approaches infinity.  As pointed out,
this is not necessarily the case.  If you meant to say that the
algorithms have the same asymptotic runtime behaviour, you should have
said just that.

> But we don't know for a fact that the 2:1 ratio caries through larger
> [possibly practical] sizes.

Right. But we also don't know that it doesn't. ;-)

Cheers!
SG
0
Reply Sebastian 10/20/2010 2:27:34 PM

On Oct 20, 10:27=A0am, Sebastian <s.gesem...@gmail.com> wrote:
> On 20 Okt., 13:46, Tom St Denis wrote:
>
>
>
>
>
>
>
>
>
> > On Oct 20, 7:32=A0am, Eric Sosman wrote:
> > > On 10/20/2010 6:48 AM, Tom St Denis wrote:
> > > [...]
> > > =A0 =A0 =A0All you're saying is that real computers are finite, meani=
ng
> > > that we'll always have N <=3D Nmax. =A0That, in turn, implies that al=
l
> > > real implementations of all algorithms, even bogosort, have O(1)
> > > performance because f(N) <=3D f(Nmax) =3D=3D constant =3D=3D O(1).
>
> > > =A0 =A0 =A0So, yes: You can, sort of, defend your misunderstanding of=
 O()
> > > notation, at the cost of making O() notation useless.
>
> > It's not useless though...
>
> > Compare Comba vs. Karatsuba multiplication...
>
> > Comba is basically n^2 * (work of MULADD) + n * (work of carry
> > prop) ... or O(n^2)
>
> > Where Karatsuba is
>
> > 3x * comba of n/2 size + various 2n-length additions [with carry
> > prop], etc.. or O(n^log_2(3))
>
> > So even though Comba is FASTER at small values of n we still say
> > "Karatsuba is more efficient."
>
> ...in terms of the asymptotic runtime behaviour, yes.
>
> > Now if there was another technique that was
> > 2*Karatsuba work you'd still write it as O(n^log_2(3)) even though
> > you'd know that at most reasonable sizes it'd be slower. =A0I didn't
> > invent big-Oh notation, so stop harping on me. =A0Nor am I saying "all
> > algorithms of equal big-Oh work are equivalent."
>
> No, you just suggested that algorithms of the same time complexity
> class "should in theory take roughly the same time" as n approaches
> infinity. =A0For some definition of "take roughly the same time" this
> might be the case. =A0But IMHO the most sensible interpretation of this
> claim was -- as Ben pointed out -- that the runtime ratio between both
> algorithms will approach 1 as n approaches infinity. =A0As pointed out,
> this is not necessarily the case. =A0If you meant to say that the
> algorithms have the same asymptotic runtime behaviour, you should have
> said just that.

I'm saying it's lost in the noise because for a single sample set with
a difference BELOW the asymptotic rate it's lost.

It's like saying

strlen("hello") and strlen_mmx_super_optimized("hello") each take 15
and 20 cycles.  Therefore, the latter is always slower than the
former.  In reality, the setup time for short strings might eat into
the performance gains.  So we can say "it's lost in the noise."

You don't know that array search is ALWAYS 2X faster than a tree
search.  You know that for a SINGLE test case it was 2x faster.  In
single test case I can show that qsort is no better than bubble sort.
What does that prove?

> > But we don't know for a fact that the 2:1 ratio caries through larger
> > [possibly practical] sizes.
>
> Right. But we also don't know that it doesn't. ;-)

Hence my comment thrice now that I should have said "probably lost in
the noise."  But since you diehard pedantics won't let it be just stop
replying.  I acknowledged the mistake that you're pointing out.  There
is no more level of correction needed.

Tom
0
Reply tom236 (284) 10/20/2010 3:09:02 PM

On 10/20/2010 7:46 AM, Tom St Denis wrote:
> [...]
> IOW, stop putting words into my mouth and stop assuming that you know
> more about the OPs problem than anyone else does (since the OP hasn't
> explained their problem in enough detail).

     Nobody knows enough about the O.P.'s problem to say anything
useful about it; I don't think anyone disputes that.  We won't know
enough until (and unless) he reveals a whole lot more than he has
thus far.

     As for the words I have put into your mouth, they are

 > The overhead you're finding by a constant is acceptable (and falls
 > below the noise in big-Oh notation).  They should in theory take
 > roughly the same time as n => oo, but for small values they'll skew.

.... and if they are not your words, you are being impersonated by
someone who does not understand big-Oh.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
0
Reply Eric 10/21/2010 2:22:46 AM

On Oct 18, 7:14=A0pm, Bogdan <crist...@gmail.com> wrote:
> Hi
>
> =A0 =A0I would like to check this result:
> I have tested binary tree search API from Linux (search.h) by
> searching into a binary tree generated from an unsorted dictionary a
> given word. The second program uses the same dictionary, from which an
> array is initialized which is sorted with qsort() then the same word
> is searched with binary search fct (bsearch()).
> =A0 Running both programs shows that the binary search (the second
> program) is almost twice as fast as the binary tree search.
> =A0 I have used directly the sorted distionary (so qsort() should have
> the highest complexity), but still the second program is the fastest.
> =A0Is this the correct result ? What advantage has binary tree search
> wrt qsort() followed by binary search ?
>
> thanks
> Bogdan

Chances are that the binary tree search generated a degenerate tree --
that is, the tree is not balanced.  Each lookup _could_ take as long
as a linear search in the worst case.

Quicksort will take O(n log n)to sort, but the lookup is a guaranteed
O(log n) operation.

This is the reason for self-balancing trees, such as red-black trees.

Matt
0
Reply robothor (2) 10/21/2010 6:53:14 PM

Matt Long <robothor@gmail.com> might have writ, in news:e685bde5-5a77-
499c-a70e-7988598b38f2@s4g2000yql.googlegroups.com:

> Chances are that the binary tree search generated a degenerate tree --
> that is, the tree is not balanced.  Each lookup _could_ take as long
> as a linear search in the worst case....
> 
> This is the reason for self-balancing trees, such as red-black trees.

OP specified "Linux search.h."  Just as qsort() doesn't promise to use
Quick Sort, so 'man tsearch' may not inform about implementation 
details, but I *think* the gnu tsearch() *does* use red-black trees.

The inner loops of tsearch() and bsearch() are small enough that
the actual Intel opcodes could be posted and scrutinized but it seems
more fun to speculate.  Or to assume (on religious grounds?) that
the index calculation in bsearch() must take *exactly* the same
time as the pointer manipulations in tsearch().   :-)

James Dow Allen
0
Reply James 10/21/2010 7:50:00 PM

On Oct 18, 10:14=A0am, Bogdan <crist...@gmail.com> wrote:
> Hi
>
> =A0 =A0I would like to check this result:
> I have tested binary tree search API from Linux (search.h) by
> searching into a binary tree generated from an unsorted dictionary a
> given word. The second program uses the same dictionary, from which an
> array is initialized which is sorted with qsort() then the same word
> is searched with binary search fct (bsearch()).
> =A0 Running both programs shows that the binary search (the second
> program) is almost twice as fast as the binary tree search.
> =A0 I have used directly the sorted distionary (so qsort() should have
> the highest complexity), but still the second program is the fastest.
> =A0Is this the correct result ? What advantage has binary tree search
> wrt qsort() followed by binary search ?

It's not surprising that searching a sorted array goes faster in a
binary fashion goes faster than a randomly inserted and not
necessarily perfectly balanced binary tree.

A binary search, on average, requires log2(N)-1 comparisons whereas
the binary tree can degnerate into a linear search if all of the keys
are inserted forwards or backwards or in runs.

I'll summarize that, if your data are either fairly static or fairly
small, and you can afford to resort it everytime something is added,
you can expect to do pretty nearly optimal by a binary search. If your
data are large and dynamic (and pretty randomly generated), a binary
tree search will pay off because you don't have to sort every time.

Basically, you didn't time the part that would have given the tree the
advantage. Under the circumstances, I'm actually a little surprised
that the binary search of a sorted array didn't beat the tree search
by *more* than the factor of two.

0
Reply maravera (53) 10/21/2010 9:46:39 PM

24 Replies
262 Views

(page loaded in 0.688 seconds)

Similiar Articles:


















7/14/2012 10:06:26 PM


Reply: