Looking things up by string prefix

  • Follow


Hi all,

I want to store some things, each filed under a string key. I then want to 
look those things up with other strings, finding the thing whose key is 
the longest available prefix of the lookup string.

So, if i store:

/products -> foo
/products/fruit -> bar

Then lookups go like:

/products/furniture/chairs -> foo
/products/fruit/bananas -> bar

Am i right in thinking that there's nothing in the standard library that 
does this? I'm writing something that uses the NavigableMap methods on 
TreeMap to do it, but it's a bit grim, and not guaranteed to be as 
efficient as it could be.

Am i further right in thinking that there's nothing in Apache's Commons 
Collections or Google's Guava Collections that does this?

Is there any reasonable open-source library that does?

If i write this myself, is there a data structure better than a Patricia 
tree for it?

Cheers,
tom

-- 
Rapid oxidation is the new black. -- some Mike
0
Reply twic (2083) 7/2/2010 3:42:31 AM

In article <alpine.DEB.1.10.1007020423470.29330@urchin.earth.li>,
 Tom Anderson <twic@urchin.earth.li> wrote:

> I want to store some things, each filed under a string key. I then want to 
> look those things up with other strings, finding the thing whose key is 
> the longest available prefix of the lookup string.
> 
> So, if i store:
> 
> /products -> foo
> /products/fruit -> bar
> 
> Then lookups go like:
> 
> /products/furniture/chairs -> foo
> /products/fruit/bananas -> bar
> 
> Am i right in thinking that there's nothing in the standard library that 
> does this? I'm writing something that uses the NavigableMap methods on 
> TreeMap to do it, but it's a bit grim, and not guaranteed to be as 
> efficient as it could be.
> 
> Am i further right in thinking that there's nothing in Apache's Commons 
> Collections or Google's Guava Collections that does this?
> 
> Is there any reasonable open-source library that does?

Have you seen this?

<http://code.google.com/p/patricia-trie/>

> If i write this myself, is there a data structure better than a 
> Patricia tree for it?

-- 
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
0
Reply John 7/2/2010 4:25:12 AM


Tom Anderson wrote:

> Is there any reasonable open-source library that does?


I think this Radix Tree does what you want (it's the same thing as a 
Patricia tree):

http://code.google.com/p/radixtree/
0
Reply markspace 7/2/2010 4:27:46 AM

On 7/1/2010 11:42 PM, Tom Anderson wrote:
> Hi all,
>
> I want to store some things, each filed under a string key. I then want
> to look those things up with other strings, finding the thing whose key
> is the longest available prefix of the lookup string.
>
> So, if i store:
>
> /products -> foo
> /products/fruit -> bar
>
> Then lookups go like:
>
> /products/furniture/chairs -> foo
> /products/fruit/bananas -> bar

     If you add "/products/fuzz -> baz" to the collection, what
should a lookup on "/products/f" return?

> If i write this myself, is there a data structure better than a Patricia
> tree for it?

     The scale of the problem depends on whether a key's "words" must
match in their entirety or only partially.  If it's a "word-by-word"
discipline (so /products/f yields foo), I think an ordinary multi-way
tree would be fine; no need for anything as fancy as Patricia.  If
/products/f matches both /products/fruit and /products/fuzz, something
trickier may be needed (at the least, you've got to decide what to do
about the ambiguity).

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
0
Reply Eric 7/2/2010 12:49:41 PM

On Fri, 2 Jul 2010 04:42:31 +0100, Tom Anderson <twic@urchin.earth.li>
wrote, quoted or indirectly quoted someone who said :

>I want to store some things, each filed under a string key. I then want to 
>look those things up with other strings, finding the thing whose key is 
>the longest available prefix of the lookup string.

here are three approaches:

1. use SQL

2. use a TreeMap. Find entry before and after the lookup and analyse.

3. use a HashMap.  Keep chopping the key till you get a match on
lookup.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com

There is no harm in being sometimes wrong especially if one is promptly found out. 
~ John Maynard Keynes (born: 1883-06-05 died: 1946-04-21 at age: 62)
0
Reply Roedy 7/4/2010 10:16:13 AM

Tom Anderson <twic@urchin.earth.li> wrote:
> I want to store some things, each filed under a string key. I then want to 
> look those things up with other strings, finding the thing whose key is 
> the longest available prefix of the lookup string.
> So, if i store:
> /products -> foo
> /products/fruit -> bar
> Then lookups go like:
> /products/furniture/chairs -> foo
> /products/fruit/bananas -> bar

My approach would be, to partition the keys by length, first,
and for each length maintain a separate HashMap.

The structure to hold the key-thing pairs:
ArrayList<HashMap<String,Thing>> 

To add a pair: put it into the hashmap indexed by key.length()
 in the ArrayList (extending the ArrayList and creating a new
 HashMap as necessary)

To lookup a string:
 reverse-iterate the ArrayList from .size()-1 downto 1: for each
 index i that has a non-empty HashMap stored in the ArrayList,
 obtain  lookupString.substring(0,i) and look it up in that
 non-empty HashMap stored at i.

If the key lengths of your keys are very sparsely distributed,
replace the ArrayList<?> with TreeMap<Integer,HashMap<String,Thing>>

PS: I think, there is no such Collection ready made in Java standard
library. I have no idea whether it exists in apache's or other class
libraries.  I don't know Patricia Trees, either, so cannot answer
those direct questions.

0
Reply Andreas 7/4/2010 1:06:16 PM

On Fri, 2 Jul 2010, John B. Matthews wrote:

> In article <alpine.DEB.1.10.1007020423470.29330@urchin.earth.li>,
> Tom Anderson <twic@urchin.earth.li> wrote:
>
>> I want to store some things, each filed under a string key. I then want to
>> look those things up with other strings, finding the thing whose key is
>> the longest available prefix of the lookup string.
>>
>> So, if i store:
>>
>> /products -> foo
>> /products/fruit -> bar
>>
>> Then lookups go like:
>>
>> /products/furniture/chairs -> foo
>> /products/fruit/bananas -> bar
>>
>> Am i right in thinking that there's nothing in the standard library that
>> does this? I'm writing something that uses the NavigableMap methods on
>> TreeMap to do it, but it's a bit grim, and not guaranteed to be as
>> efficient as it could be.
>>
>> Am i further right in thinking that there's nothing in Apache's Commons
>> Collections or Google's Guava Collections that does this?
>>
>> Is there any reasonable open-source library that does?
>
> Have you seen this?
>
> <http://code.google.com/p/patricia-trie/>

At the point at which i posted, i had not. I stumbled across it a bit 
later and kicked myself. I think it should be pretty easy to add the 
method i want to that.

tom

-- 
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners
0
Reply Tom 7/4/2010 4:47:27 PM

On Thu, 1 Jul 2010, markspace wrote:

> Tom Anderson wrote:
>
>> Is there any reasonable open-source library that does?
>
> I think this Radix Tree does what you want (it's the same thing as a 
> Patricia tree):
>
> http://code.google.com/p/radixtree/

It could certainly be modified to do what i want, yes. Thanks for finding 
that.

It's a shame his RadixTree doesn't implement the standard Map interface; i 
don't particularly need it to, but it would be a more well-rounded class 
if it did.

tom

-- 
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners
0
Reply Tom 7/4/2010 4:55:29 PM

On Fri, 2 Jul 2010, Eric Sosman wrote:

> On 7/1/2010 11:42 PM, Tom Anderson wrote:
>
>> I want to store some things, each filed under a string key. I then want
>> to look those things up with other strings, finding the thing whose key
>> is the longest available prefix of the lookup string.
>> 
>> So, if i store:
>> 
>> /products -> foo
>> /products/fruit -> bar
>> 
>> Then lookups go like:
>> 
>> /products/furniture/chairs -> foo
>> /products/fruit/bananas -> bar
>
>    If you add "/products/fuzz -> baz" to the collection, what
> should a lookup on "/products/f" return?

foo. Neither of the other keys is a prefix of the search term.

>> If i write this myself, is there a data structure better than a Patricia
>> tree for it?
>
>    The scale of the problem depends on whether a key's "words" must 
> match in their entirety or only partially.  If it's a "word-by-word" 
> discipline (so /products/f yields foo), I think an ordinary multi-way 
> tree would be fine; no need for anything as fancy as Patricia.  If 
> /products/f matches both /products/fruit and /products/fuzz, something 
> trickier may be needed (at the least, you've got to decide what to do 
> about the ambiguity).

I evidently haven't explained this clearly. A lookup with /products/f 
would never match either /products/fruit or /products/fuzz as entries, 
because neither is a prefix of the lookup term.

What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i 
set up a mapping for /user/editProfile/*, then i want to catch requests 
for /user/editProfile/jim, /user/editProfile/bob, etc.

tom

-- 
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners
0
Reply Tom 7/4/2010 4:59:03 PM

On Sun, 4 Jul 2010, Andreas Leitgeb wrote:

> Tom Anderson <twic@urchin.earth.li> wrote:
>
>> I want to store some things, each filed under a string key. I then want to
>> look those things up with other strings, finding the thing whose key is
>> the longest available prefix of the lookup string.
>> So, if i store:
>> /products -> foo
>> /products/fruit -> bar
>> Then lookups go like:
>> /products/furniture/chairs -> foo
>> /products/fruit/bananas -> bar
>
> My approach would be, to partition the keys by length, first,
> and for each length maintain a separate HashMap.
>
> The structure to hold the key-thing pairs:
> ArrayList<HashMap<String,Thing>>
>
> To add a pair: put it into the hashmap indexed by key.length()
> in the ArrayList (extending the ArrayList and creating a new
> HashMap as necessary)
>
> To lookup a string:
> reverse-iterate the ArrayList from .size()-1 downto 1: for each
> index i that has a non-empty HashMap stored in the ArrayList,
> obtain  lookupString.substring(0,i) and look it up in that
> non-empty HashMap stored at i.

Interesting, thanks. That would certainly work. At first glance, it seems 
a bit clunky, but you only need to to do lookups for the lengths that have 
keys, and each one is a hashtable lookup, so it should be quite quick.

For large sets of keys, or rather for sets of keys with many different 
lengths, i suspect it wouldn't be very fast, but my key set is likely to 
be quite small.

tom

-- 
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners
0
Reply Tom 7/4/2010 5:01:39 PM

On 7/4/2010 12:59 PM, Tom Anderson wrote:
> On Fri, 2 Jul 2010, Eric Sosman wrote:
>
>> On 7/1/2010 11:42 PM, Tom Anderson wrote:
>>
>>> I want to store some things, each filed under a string key. I then want
>>> to look those things up with other strings, finding the thing whose key
>>> is the longest available prefix of the lookup string.
>>>
>>> So, if i store:
>>>
>>> /products -> foo
>>> /products/fruit -> bar
>>>
>>> Then lookups go like:
>>>
>>> /products/furniture/chairs -> foo
>>> /products/fruit/bananas -> bar
>>
>> If you add "/products/fuzz -> baz" to the collection, what
>> should a lookup on "/products/f" return?
>
> foo. Neither of the other keys is a prefix of the search term.
>
>>> If i write this myself, is there a data structure better than a Patricia
>>> tree for it?
>>
>> The scale of the problem depends on whether a key's "words" must match
>> in their entirety or only partially. If it's a "word-by-word"
>> discipline (so /products/f yields foo), I think an ordinary multi-way
>> tree would be fine; no need for anything as fancy as Patricia. If
>> /products/f matches both /products/fruit and /products/fuzz, something
>> trickier may be needed (at the least, you've got to decide what to do
>> about the ambiguity).
>
> I evidently haven't explained this clearly. A lookup with /products/f
> would never match either /products/fruit or /products/fuzz as entries,
> because neither is a prefix of the lookup term.
>
> What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i
> set up a mapping for /user/editProfile/*, then i want to catch requests
> for /user/editProfile/jim, /user/editProfile/bob, etc.

     Okay.  I don't know what's already out there and ready-written,
but if I were going to write such a thing from scratch I'd probably
come up with something like (just typed in; not compiled or tested)

	class Node {

	    /** Mapping if we match this far and no further */
	    private final Target target;

	    /** Links to deeper Nodes. */
	    private final Map<String,Node> links;

	    Node(Target target) {
	        this.target = target;
	        this.links = new HashMap<String,Node>();
	    }

	    void setLink(String key, Node dest) {
	        links.put(key, dest);
	    }

	    Node getLink(String key) {
	        return links.get(key);
	    }

	    Target match(Iterator<String> keys) {
	        if (! keys.hasNext())
	            return null;
	        Node next = getLink(keys.next());
	        Target t = (next == null) ? target : next.match(keys);
	        return (t == null) ? target : t;
	    }
	}

The internal Map might be something lighter-weight if the "degrees"
of the Nodes are expected to be small, but that's the idea.  It's a
lot like traversing a directory tree, with the added fillip of a
default value for incomplete matches.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
0
Reply Eric 7/4/2010 5:45:20 PM

Tom Anderson <twic@urchin.earth.li> wrote:
>> To lookup a string:
>> reverse-iterate the ArrayList from .size()-1 downto 1: for each
>> index i that has a non-empty HashMap stored in the ArrayList,
>> obtain  lookupString.substring(0,i) and look it up in that
>> non-empty HashMap stored at i.
>
> Interesting, thanks. That would certainly work. At first glance, it seems 
> a bit clunky, but you only need to to do lookups for the lengths that have 
> keys, and each one is a hashtable lookup, so it should be quite quick.

If your set of keys is going to be small enough, you can get by with a
more lightweight variant:
  maintain an int[longestKey.length()] to count each length's key-count
and keep all the key-value pairs together in one HashMap.

Only obtain lookupString.substring(0,i) for each i whose array-element
is non-zero. For those, however, you'll search the complete HashMap, 
which may be slightly less performant than the original.

0
Reply Andreas 7/4/2010 8:04:48 PM

On 04.07.2010 22:04, Andreas Leitgeb wrote:
> Tom Anderson<twic@urchin.earth.li>  wrote:
>>> To lookup a string:
>>> reverse-iterate the ArrayList from .size()-1 downto 1: for each
>>> index i that has a non-empty HashMap stored in the ArrayList,
>>> obtain  lookupString.substring(0,i) and look it up in that
>>> non-empty HashMap stored at i.
>>
>> Interesting, thanks. That would certainly work. At first glance, it seems
>> a bit clunky, but you only need to to do lookups for the lengths that have
>> keys, and each one is a hashtable lookup, so it should be quite quick.
>
> If your set of keys is going to be small enough, you can get by with a
> more lightweight variant:

Hmm, if the data set is small enough you can get by with a wide range of 
solutions - even linear search in a list or array may be fast enough. 
This problem however is naturally implemented with some Trie variant. 
That's the data structure that fits the problem best IMHO.

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
0
Reply Robert 7/4/2010 8:34:33 PM

12 Replies
108 Views

(page loaded in 0.206 seconds)

Similiar Articles:


















7/23/2012 6:24:31 PM


Reply: