Efficient data structures

  • Follow


Hi,

I'm looking for a data structure where I can store
arbitrary elements and than efficiently check if an
element (of same type as the stored elements) is contained 
in the list. The latter operation is performed pretty
frequently, so the data structure I require must handle 
it with low komplexity.

My idea was to use a STL set and then do something like

myset.find( element2Search ) != myset.end()

For sorted associative containers the "find" function
has a logarithmic complexity. Are there any approaches
with linear complexity?

Regards,
Chris
0
Reply plfriko (499) 8/15/2006 1:56:54 PM

Christian Christmann wrote:
> Hi,
>
> I'm looking for a data structure where I can store
> arbitrary elements and than efficiently check if an
> element (of same type as the stored elements) is contained
> in the list. The latter operation is performed pretty
> frequently, so the data structure I require must handle
> it with low komplexity.
>
> My idea was to use a STL set and then do something like
>
> myset.find( element2Search ) != myset.end()
>
> For sorted associative containers the "find" function
> has a logarithmic complexity. Are there any approaches
> with linear complexity?

A heterogenous container calls for boost::any. You would have to write
your own compare function, however, for ordering in the set (do you
really want uniqueness and if so, in what sense -- per type or per
value? if not, you might consider a sorted std::vector<boost::any> and
then the standard search functions).

Cheers! --M

0
Reply mlimber (2146) 8/15/2006 2:22:36 PM


[changed followup to c.l.c++]

Christian Christmann wrote:
> Hi,
>
> I'm looking for a data structure where I can store
> arbitrary elements and than efficiently check if an
> element (of same type as the stored elements) is contained
> in the list. The latter operation is performed pretty
> frequently, so the data structure I require must handle
> it with low komplexity.
>
> My idea was to use a STL set and then do something like
>
> myset.find( element2Search ) != myset.end()
>
> For sorted associative containers the "find" function
> has a logarithmic complexity. Are there any approaches
> with linear complexity?

A heterogenous container calls for boost::any. You would have to write
your own compare function, however, for ordering in the set (do you
really want uniqueness and if so, in what sense -- per type or per
value? if not, you might consider a sorted std::vector<boost::any> and
then the standard search functions).

Cheers! --M

0
Reply mlimber (2146) 8/15/2006 2:24:13 PM

In article <pan.2006.08.15.13.56.54.191468@yahoo.de>, plfriko@yahoo.de 
says...

[ ... ]

> For sorted associative containers the "find" function
> has a logarithmic complexity. Are there any approaches
> with linear complexity?

If you really wanted linear complexity, you could use std::find instead.

You almost certainly do NOT want to do that though. Just for example, 
let's assume your container holds about a million objects. Linear 
complexity means that on average it has to look at about half a million 
objects to find the one you want. Logarithmic complexity means it needs 
to look at about twenty objects to find the one you want.

In theory this can't be translated directly to speed -- but 
realistically, unless you're looking at quite small collections of 
objects, you can expect a logarithmic search to be quite a bit faster.

My advice would be to try out the set. Chances are it'll be fast enough, 
and you can leave it at that. If it's not, you can look into a couple of 
alternatives, such as doing a binary search in a sorted vector, or 
perhaps switching to a hash table of some sort.

-- 
    Later,
    Jerry.

The universe is a figment of its own imagination.
0
Reply jcoffin (2240) 8/15/2006 2:25:43 PM

On Tue, 15 Aug 2006 07:24:13 -0700, mlimber wrote:

>>
>> My idea was to use a STL set and then do something like
>>
>> myset.find( element2Search ) != myset.end()
>>
>> For sorted associative containers the "find" function
>> has a logarithmic complexity. Are there any approaches
>> with linear complexity?
> 
> A heterogenous container calls for boost::any. You would have to write
> your own compare function, however, for ordering in the set (do you
> really want uniqueness and if so, in what sense -- per type or per
> value? if not, you might consider a sorted std::vector<boost::any> and
> then the standard search functions).

Maybe I didn't specify my requirements too precise. What I need is
a "template-based" data structure where all stored elements are of the
same type. The order of the elements in the data structure is
irrelevant.

0
Reply plfriko (499) 8/15/2006 3:18:40 PM

Christian Christmann wrote:
> I'm looking for a data structure where I can store
> arbitrary elements and than efficiently check if an
> element (of same type as the stored elements) is contained
> in the list. The latter operation is performed pretty
> frequently, so the data structure I require must handle
> it with low komplexity.
>
> My idea was to use a STL set and then do something like
>
> myset.find( element2Search ) != myset.end()
>
> For sorted associative containers the "find" function
> has a logarithmic complexity. Are there any approaches
> with linear complexity?

Unsorted containers, obviously. But I fail to see why you'd
prefer linear over logarithmic complexity. 

HTH,
Michiel Salters

0
Reply Michiel.Salters (271) 8/15/2006 3:22:10 PM

Christian Christmann wrote:
> On Tue, 15 Aug 2006 07:24:13 -0700, mlimber wrote:
>
>>>
>>> My idea was to use a STL set and then do something like
>>>
>>> myset.find( element2Search ) != myset.end()
>>>
>>> For sorted associative containers the "find" function
>>> has a logarithmic complexity. Are there any approaches
>>> with linear complexity?
>>
>> A heterogenous container calls for boost::any. You would have to
>> write your own compare function, however, for ordering in the set
>> (do you really want uniqueness and if so, in what sense -- per type
>> or per value? if not, you might consider a sorted
>> std::vector<boost::any> and then the standard search functions).
>
> Maybe I didn't specify my requirements too precise. What I need is
> a "template-based" data structure where all stored elements are of the
> same type. The order of the elements in the data structure is
> irrelevant.

If you don't use the member notation (.find), you can use 'std::find'
with any container:

     std::blah<mytype> mycontainer; // replace 'blah' with 'vector'
                               // or 'list' or 'set' or 'deque' or ...
     std::find(mycontainer.begin(), mycontainer.end(), myvalue);

V
-- 
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask 


0
Reply v.Abazarov (13255) 8/15/2006 3:29:04 PM

Christian Christmann wrote:
> Maybe I didn't specify my requirements too precise. What I need is
> a "template-based" data structure where all stored elements are of the
> same type. The order of the elements in the data structure is
> irrelevant.

You could use any of the containers in the STL. Which one you choose
will depend on how you will use it (e.g., will there be a lot of
insertions or deletions, or just one populating on startup). If
uniqueness is necessary, you probably want std::set. If not, you might
consider std::multiset or std::vector (which you'd have to keep sorted
yourself). You can't beat logarithmic complexity in searching (unless
you already know the index into the data array, but then it's not
really a "search").

Cheers! --M

0
Reply mlimber (2146) 8/15/2006 3:37:27 PM

Christian Christmann wrote:
> For sorted associative containers the "find" function
> has a logarithmic complexity. Are there any approaches
> with linear complexity?

Are you seriously after a slower container? If you want linear time
O(n) then use a std::list and std::find. If you want a faster
container, O(log(n)), then use std::set. You wont find anything faster
than a std::set for general use.

Maybe you meant constant complexity, O(1)? You won't find a general
purpose datastructure that achieves this, but there are possibilities
depending on how you can represent the key. For that though you would
need to carefully describe the problem domain and exactly what the key
is and represents.


K

0
Reply kirit.saelensminde (164) 8/15/2006 3:39:15 PM

"Christian Christmann" <plfriko@yahoo.de> wrote in message 
news:pan.2006.08.15.13.56.54.191468@yahoo.de...
> Hi,
>
> I'm looking for a data structure where I can store
> arbitrary elements

What do you mean by "arbitrary"?  Sounds like you're talking about data of 
different types, but below you suggest they're the same type.

>  and than efficiently check if an
> element (of same type as the stored elements) is contained
> in the list. The latter operation is performed pretty
> frequently, so the data structure I require must handle
> it with low komplexity.
>
> My idea was to use a STL set and then do something like
>
> myset.find( element2Search ) != myset.end()
>
> For sorted associative containers the "find" function
> has a logarithmic complexity. Are there any approaches
> with linear complexity?
>

Linear complexity is O(N), while logarithmic complexity os O(log N), which 
is faster!  Why would you want a _slower_ solution?

But maybe you're not using a sorted container?  If not, why not?  If seek 
time is that important, then you can gain speed by taking advantage of the 
ordering of the data.

Perhaps if you state more explicitly what your needs are (as far as 
inserting, deleting, and seeking are concerned), then someone can suggest 
the best STL container.

-Howard


0
Reply alicebt (1862) 8/15/2006 3:46:02 PM

Christian-

You're correct that the sorted associative containers have logarithmic
complexity operationally.  Why then, would you want to use a search
mechanism with linear complexity (which is slower on average)?

Christian Christmann wrote:
> Hi,
>
> I'm looking for a data structure where I can store
> arbitrary elements and than efficiently check if an
> element (of same type as the stored elements) is contained
> in the list. The latter operation is performed pretty
> frequently, so the data structure I require must handle
> it with low komplexity.
>
> My idea was to use a STL set and then do something like
>
> myset.find( element2Search ) != myset.end()
>
> For sorted associative containers the "find" function
> has a logarithmic complexity. Are there any approaches
> with linear complexity?
> 
> Regards,
> Chris

0
Reply matt.sawyer (1) 8/15/2006 4:19:13 PM

In article <1155656246.940215.326140@b28g2000cwb.googlegroups.com>, 
mlimber@gmail.com says...

[ ... ]

> You could use any of the containers in the STL. Which one you choose
> will depend on how you will use it (e.g., will there be a lot of
> insertions or deletions, or just one populating on startup). If
> uniqueness is necessary, you probably want std::set. If not, you might
> consider std::multiset or std::vector (which you'd have to keep sorted
> yourself). You can't beat logarithmic complexity in searching (unless
> you already know the index into the data array, but then it's not
> really a "search").

Actually, you often can beat logarithmic. Hash tables have constant 
expected complexity. You can also use an interpolating search, which 
typically has substantially lower complexity than logarithmic as well.

A binary search ignores much of the information that's really available 
-- it blindly assumes that the best guess it can make at the location is 
the middle of the available range.

An interpolating search is much closer to the way most people would (for 
example) look something up in a dictionary. If you're looking up 'cat', 
you know you can start fairly close to the beginning. If you're looking 
up 'victory', you know you can start fairly close to the end. Your first 
attempt usually won't be the right page, but you can (again) usually 
make quite a bit better of a guess than simply the middle of the range.

Obviously, this _can_ backfire -- its worst-case complexity is quite 
poor. You can, however, do something like an Introsort, and switch to a 
plain binary search if you find that it's not working well for a 
particular search.

Note that an interpolating search requires a random acces iterator -- it 
generally doesn't work in something like a tree structure.

-- 
    Later,
    Jerry.

The universe is a figment of its own imagination.
0
Reply jcoffin (2240) 8/15/2006 4:36:18 PM

Besides hashes, you can beat logaritmic complexity in
searches depending on specifics of your data.
radix sort, bucket sort, counting sort are all linear. 

Tolga Ceylan

0
Reply tolgaceylanus (23) 8/15/2006 5:01:06 PM

Christian Christmann wrote:
> Hi,
>
> I'm looking for a data structure where I can store
> arbitrary elements and than efficiently check if an
> element (of same type as the stored elements) is contained
> in the list. The latter operation is performed pretty
> frequently, so the data structure I require must handle
> it with low komplexity.
>
> My idea was to use a STL set and then do something like
>
> myset.find( element2Search ) != myset.end()
>
> For sorted associative containers the "find" function
> has a logarithmic complexity. Are there any approaches
> with linear complexity?
>
> Regards,
> Chris

I assume you meant constant complexity, rather than linear (logarithmic
is superior to linear).

It sounds like, from your description, the number of lookups
significantly dominates the number of insertions/deletions.  In that
sort of situation, using std::vector to hold your items and then either
upper_bound or binary_search (depending on whether you want to test for
existence or actually get an iterator to the item) will usually
outperform std::set/std::map by a good margin.

If logarithmic complexity really isn't good enough for you, then try
for tr1::unordered_set/tr1::unordered_map.  Or if your implementation
doesn't include tr1 yet, it may have some containers called hash_set or
hash_map.  All of these should give you expected constant time lookups.

-- 
Alan Johnson

0
Reply awjcs (194) 8/15/2006 5:15:25 PM

<tolgaceylanus@yahoo.com> wrote in message 
news:1155661266.310375.134820@m73g2000cwd.googlegroups.com...
>
> Besides hashes, you can beat logaritmic complexity in
> searches depending on specifics of your data.
> radix sort, bucket sort, counting sort are all linear.
>

Linear (O(N)) is worse that logarithmic (O(log N)) on average.  Perhaps 
you're thinking of O(N log N) when you say "logarithmic"?  Or perhaps you 
mean constant time (O(1)) when you say "linear"?

-Howard


0
Reply alicebt (1862) 8/15/2006 5:54:07 PM

Jerry Coffin wrote:
> In article <1155656246.940215.326140@b28g2000cwb.googlegroups.com>,
> mlimber@gmail.com says...
>
> [ ... ]
>
> > You could use any of the containers in the STL. Which one you choose
> > will depend on how you will use it (e.g., will there be a lot of
> > insertions or deletions, or just one populating on startup). If
> > uniqueness is necessary, you probably want std::set. If not, you might
> > consider std::multiset or std::vector (which you'd have to keep sorted
> > yourself). You can't beat logarithmic complexity in searching (unless
> > you already know the index into the data array, but then it's not
> > really a "search").
>
> Actually, you often can beat logarithmic. Hash tables have constant
> expected complexity.

Ok, I should have said: "You can't beat logarithmic complexity with the
current standard library." (Hashes are part of TR1, however.) In any
case, while hashes can certainly be helpful, their worst-case guarantee
O(n) is obviously worse than logarithmic performance. In addition,
creating a good hash function isn't a trivial task (a bad one can
severly degrade performance), and the computations required to evaluate
the hash function can be slow. Less theoretically, hashes can decrease
locality of reference, which may degrade performance on particular
systems. Suffice it to say, there are trade-offs involved in choosing
data structures and algorithms.

> You can also use an interpolating search, which
> typically has substantially lower complexity than logarithmic as well.
>
> A binary search ignores much of the information that's really available
> -- it blindly assumes that the best guess it can make at the location is
> the middle of the available range.
>
> An interpolating search is much closer to the way most people would (for
> example) look something up in a dictionary. If you're looking up 'cat',
> you know you can start fairly close to the beginning. If you're looking
> up 'victory', you know you can start fairly close to the end. Your first
> attempt usually won't be the right page, but you can (again) usually
> make quite a bit better of a guess than simply the middle of the range.
>
> Obviously, this _can_ backfire -- its worst-case complexity is quite
> poor. You can, however, do something like an Introsort, and switch to a
> plain binary search if you find that it's not working well for a
> particular search.
>
> Note that an interpolating search requires a random acces iterator -- it
> generally doesn't work in something like a tree structure.

You can beat logarithmic complexity in average complexity but, as far
as I know, not in worst-case complexity or with standard library
functions.

Cheers! --M

0
Reply mlimber (2146) 8/15/2006 6:20:49 PM


Oppps... "logarithmic" is the wrong word here. It should have been
N log2 N

By linear, I mean O(N).

Howard wrote:
>
> Linear (O(N)) is worse that logarithmic (O(log N)) on average.  Perhaps
> you're thinking of O(N log N) when you say "logarithmic"?  Or perhaps you
> mean constant time (O(1)) when you say "linear"?
> 
> -Howard

Thanks for the fix.

Tolga Ceylan

0
Reply tolgaceylanus (23) 8/15/2006 6:35:14 PM

Also, I meant "in sorting" not "in searching". My posting is not very
related
with the original question. I guess in this case hashes O(1) can beat
O(log N)
complexity of the sets for the 'lookup' operation.

(assuming sets are typically implemented with red-black trees.)

(Shouldn't have posted at all.. :-}} )

Tolga Ceylan

Howard wrote:
>
> Linear (O(N)) is worse that logarithmic (O(log N)) on average.  Perhaps
> you're thinking of O(N log N) when you say "logarithmic"?  Or perhaps you
> mean constant time (O(1)) when you say "linear"?
> 
> -Howard

0
Reply tolgaceylanus (23) 8/15/2006 6:41:12 PM

Christian Christmann wrote:
> Hi,
> 
> I'm looking for a data structure where I can store
> arbitrary elements and than efficiently check if an
> element (of same type as the stored elements) is contained 
> in the list. The latter operation is performed pretty
> frequently, so the data structure I require must handle 
> it with low komplexity.
> 
> My idea was to use a STL set and then do something like
> 
> myset.find( element2Search ) != myset.end()
> 
> For sorted associative containers the "find" function
> has a logarithmic complexity. Are there any approaches
> with linear complexity?
> 

Why would you want linear complexity when log is clearly faster?  Or did 
you mean constant time complexity?  The canonical data structure for 
this sort of problem is a hash table.  Unfortunately this is not (yet) 
part of standard C++ but it is a commonly provided extension.
0
Reply usenet134 (507) 8/15/2006 8:09:59 PM

tolgaceylanus@yahoo.com wrote:
> Howard wrote:
>> Linear (O(N)) is worse that logarithmic (O(log N)) on average.  Perhaps
>> you're thinking of O(N log N) when you say "logarithmic"?  Or perhaps you
>> mean constant time (O(1)) when you say "linear"?
> 
> Oppps... "logarithmic" is the wrong word here. It should have been
> N log2 N

Just a small thing:  O(N log N) == O(N log2 N).  This is because Big-O
does not care about constant factors.

    N log2 N == N (log N / log 2) == (1/log 2) * N log N

    1/log 2 is a constant so it can be dropped.

-- 
Marcus Kwok
Replace 'invalid' with 'net' to reply
0
Reply ricecake6398 (573) 8/15/2006 9:17:32 PM

In article <1155666048.971928.6030@m73g2000cwd.googlegroups.com>, 
mlimber@gmail.com says...

[ ... ]

> Ok, I should have said: "You can't beat logarithmic complexity with the
> current standard library." (Hashes are part of TR1, however.) In any
> case, while hashes can certainly be helpful, their worst-case guarantee
> O(n) is obviously worse than logarithmic performance.

While the hash tables included in TR1 may have O(N) complexity in the 
worst case, a hash table can be designed to provide logarithmic worst-
case complexity.

[ ... ]

> You can beat logarithmic complexity in average complexity but, as far
> as I know, not in worst-case complexity or with standard library
> functions.

That sounds reasonable to me. 

-- 
    Later,
    Jerry.

The universe is a figment of its own imagination.
0
Reply jcoffin (2240) 8/16/2006 12:44:00 AM

Marcus Kwok wrote:
>
> Just a small thing:  O(N log N) == O(N log2 N).  This is because Big-O
> does not care about constant factors.
> 
>     N log2 N == N (log N / log 2) == (1/log 2) * N log N
> 
>     1/log 2 is a constant so it can be dropped.

I suspect that O(N log2 N) actually means O(N log^2 N)
0
Reply no.spam9 (2339) 8/16/2006 5:40:13 AM

Jerry Coffin wrote:
> While the hash tables included in TR1 may have O(N) complexity in the
> worst case, a hash table can be designed to provide logarithmic worst-
> case complexity.

Right you are, though of course it comes as another trade-off.

Cheers! --M

0
Reply mlimber (2146) 8/16/2006 12:31:25 PM

red floyd <no.spam@here.dude> wrote:
> Marcus Kwok wrote:
>> Just a small thing:  O(N log N) == O(N log2 N).  This is because Big-O
>> does not care about constant factors.
>> 
>>     N log2 N == N (log N / log 2) == (1/log 2) * N log N
>> 
>>     1/log 2 is a constant so it can be dropped.
> 
> I suspect that O(N log2 N) actually means O(N log^2 N)

You could be right, I was just using the common convention that people
use to indicate the base.  E.g., log10 is base-10 logarithm, so log2
would be base-2 logarithm.

-- 
Marcus Kwok
Replace 'invalid' with 'net' to reply
0
Reply ricecake6398 (573) 8/16/2006 3:34:02 PM

23 Replies
26 Views

(page loaded in 0.657 seconds)


Reply: