Hi all,
I want to store some things, each filed under a string key. I then want to
look those things up with other strings, finding the thing whose key is
the longest available prefix of the lookup string.
So, if i store:
/products -> foo
/products/fruit -> bar
Then lookups go like:
/products/furniture/chairs -> foo
/products/fruit/bananas -> bar
Am i right in thinking that there's nothing in the standard library that
does this? I'm writing something that uses the NavigableMap methods on
TreeMap to do it, but it's a bit grim, and not guaranteed to be as
efficient as it could be.
Am i further right in thinking that there's nothing in Apache's Commons
Collections or Google's Guava Collections that does this?
Is there any reasonable open-source library that does?
If i write this myself, is there a data structure better than a Patricia
tree for it?
Cheers,
tom
--
Rapid oxidation is the new black. -- some Mike
|
|
0
|
|
|
|
Reply
|
twic (2083)
|
7/2/2010 3:42:31 AM |
|
In article <alpine.DEB.1.10.1007020423470.29330@urchin.earth.li>,
Tom Anderson <twic@urchin.earth.li> wrote:
> I want to store some things, each filed under a string key. I then want to
> look those things up with other strings, finding the thing whose key is
> the longest available prefix of the lookup string.
>
> So, if i store:
>
> /products -> foo
> /products/fruit -> bar
>
> Then lookups go like:
>
> /products/furniture/chairs -> foo
> /products/fruit/bananas -> bar
>
> Am i right in thinking that there's nothing in the standard library that
> does this? I'm writing something that uses the NavigableMap methods on
> TreeMap to do it, but it's a bit grim, and not guaranteed to be as
> efficient as it could be.
>
> Am i further right in thinking that there's nothing in Apache's Commons
> Collections or Google's Guava Collections that does this?
>
> Is there any reasonable open-source library that does?
Have you seen this?
<http://code.google.com/p/patricia-trie/>
> If i write this myself, is there a data structure better than a
> Patricia tree for it?
--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
|
|
0
|
|
|
|
Reply
|
John
|
7/2/2010 4:25:12 AM
|
|
Tom Anderson wrote:
> Is there any reasonable open-source library that does?
I think this Radix Tree does what you want (it's the same thing as a
Patricia tree):
http://code.google.com/p/radixtree/
|
|
0
|
|
|
|
Reply
|
markspace
|
7/2/2010 4:27:46 AM
|
|
On 7/1/2010 11:42 PM, Tom Anderson wrote:
> Hi all,
>
> I want to store some things, each filed under a string key. I then want
> to look those things up with other strings, finding the thing whose key
> is the longest available prefix of the lookup string.
>
> So, if i store:
>
> /products -> foo
> /products/fruit -> bar
>
> Then lookups go like:
>
> /products/furniture/chairs -> foo
> /products/fruit/bananas -> bar
If you add "/products/fuzz -> baz" to the collection, what
should a lookup on "/products/f" return?
> If i write this myself, is there a data structure better than a Patricia
> tree for it?
The scale of the problem depends on whether a key's "words" must
match in their entirety or only partially. If it's a "word-by-word"
discipline (so /products/f yields foo), I think an ordinary multi-way
tree would be fine; no need for anything as fancy as Patricia. If
/products/f matches both /products/fruit and /products/fuzz, something
trickier may be needed (at the least, you've got to decide what to do
about the ambiguity).
--
Eric Sosman
esosman@ieee-dot-org.invalid
|
|
0
|
|
|
|
Reply
|
Eric
|
7/2/2010 12:49:41 PM
|
|
On Fri, 2 Jul 2010 04:42:31 +0100, Tom Anderson <twic@urchin.earth.li>
wrote, quoted or indirectly quoted someone who said :
>I want to store some things, each filed under a string key. I then want to
>look those things up with other strings, finding the thing whose key is
>the longest available prefix of the lookup string.
here are three approaches:
1. use SQL
2. use a TreeMap. Find entry before and after the lookup and analyse.
3. use a HashMap. Keep chopping the key till you get a match on
lookup.
--
Roedy Green Canadian Mind Products
http://mindprod.com
There is no harm in being sometimes wrong especially if one is promptly found out.
~ John Maynard Keynes (born: 1883-06-05 died: 1946-04-21 at age: 62)
|
|
0
|
|
|
|
Reply
|
Roedy
|
7/4/2010 10:16:13 AM
|
|
Tom Anderson <twic@urchin.earth.li> wrote:
> I want to store some things, each filed under a string key. I then want to
> look those things up with other strings, finding the thing whose key is
> the longest available prefix of the lookup string.
> So, if i store:
> /products -> foo
> /products/fruit -> bar
> Then lookups go like:
> /products/furniture/chairs -> foo
> /products/fruit/bananas -> bar
My approach would be, to partition the keys by length, first,
and for each length maintain a separate HashMap.
The structure to hold the key-thing pairs:
ArrayList<HashMap<String,Thing>>
To add a pair: put it into the hashmap indexed by key.length()
in the ArrayList (extending the ArrayList and creating a new
HashMap as necessary)
To lookup a string:
reverse-iterate the ArrayList from .size()-1 downto 1: for each
index i that has a non-empty HashMap stored in the ArrayList,
obtain lookupString.substring(0,i) and look it up in that
non-empty HashMap stored at i.
If the key lengths of your keys are very sparsely distributed,
replace the ArrayList<?> with TreeMap<Integer,HashMap<String,Thing>>
PS: I think, there is no such Collection ready made in Java standard
library. I have no idea whether it exists in apache's or other class
libraries. I don't know Patricia Trees, either, so cannot answer
those direct questions.
|
|
0
|
|
|
|
Reply
|
Andreas
|
7/4/2010 1:06:16 PM
|
|
On Fri, 2 Jul 2010, John B. Matthews wrote:
> In article <alpine.DEB.1.10.1007020423470.29330@urchin.earth.li>,
> Tom Anderson <twic@urchin.earth.li> wrote:
>
>> I want to store some things, each filed under a string key. I then want to
>> look those things up with other strings, finding the thing whose key is
>> the longest available prefix of the lookup string.
>>
>> So, if i store:
>>
>> /products -> foo
>> /products/fruit -> bar
>>
>> Then lookups go like:
>>
>> /products/furniture/chairs -> foo
>> /products/fruit/bananas -> bar
>>
>> Am i right in thinking that there's nothing in the standard library that
>> does this? I'm writing something that uses the NavigableMap methods on
>> TreeMap to do it, but it's a bit grim, and not guaranteed to be as
>> efficient as it could be.
>>
>> Am i further right in thinking that there's nothing in Apache's Commons
>> Collections or Google's Guava Collections that does this?
>>
>> Is there any reasonable open-source library that does?
>
> Have you seen this?
>
> <http://code.google.com/p/patricia-trie/>
At the point at which i posted, i had not. I stumbled across it a bit
later and kicked myself. I think it should be pretty easy to add the
method i want to that.
tom
--
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners
|
|
0
|
|
|
|
Reply
|
Tom
|
7/4/2010 4:47:27 PM
|
|
On Thu, 1 Jul 2010, markspace wrote:
> Tom Anderson wrote:
>
>> Is there any reasonable open-source library that does?
>
> I think this Radix Tree does what you want (it's the same thing as a
> Patricia tree):
>
> http://code.google.com/p/radixtree/
It could certainly be modified to do what i want, yes. Thanks for finding
that.
It's a shame his RadixTree doesn't implement the standard Map interface; i
don't particularly need it to, but it would be a more well-rounded class
if it did.
tom
--
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners
|
|
0
|
|
|
|
Reply
|
Tom
|
7/4/2010 4:55:29 PM
|
|
On Fri, 2 Jul 2010, Eric Sosman wrote:
> On 7/1/2010 11:42 PM, Tom Anderson wrote:
>
>> I want to store some things, each filed under a string key. I then want
>> to look those things up with other strings, finding the thing whose key
>> is the longest available prefix of the lookup string.
>>
>> So, if i store:
>>
>> /products -> foo
>> /products/fruit -> bar
>>
>> Then lookups go like:
>>
>> /products/furniture/chairs -> foo
>> /products/fruit/bananas -> bar
>
> If you add "/products/fuzz -> baz" to the collection, what
> should a lookup on "/products/f" return?
foo. Neither of the other keys is a prefix of the search term.
>> If i write this myself, is there a data structure better than a Patricia
>> tree for it?
>
> The scale of the problem depends on whether a key's "words" must
> match in their entirety or only partially. If it's a "word-by-word"
> discipline (so /products/f yields foo), I think an ordinary multi-way
> tree would be fine; no need for anything as fancy as Patricia. If
> /products/f matches both /products/fruit and /products/fuzz, something
> trickier may be needed (at the least, you've got to decide what to do
> about the ambiguity).
I evidently haven't explained this clearly. A lookup with /products/f
would never match either /products/fruit or /products/fuzz as entries,
because neither is a prefix of the lookup term.
What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i
set up a mapping for /user/editProfile/*, then i want to catch requests
for /user/editProfile/jim, /user/editProfile/bob, etc.
tom
--
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners
|
|
0
|
|
|
|
Reply
|
Tom
|
7/4/2010 4:59:03 PM
|
|
On Sun, 4 Jul 2010, Andreas Leitgeb wrote:
> Tom Anderson <twic@urchin.earth.li> wrote:
>
>> I want to store some things, each filed under a string key. I then want to
>> look those things up with other strings, finding the thing whose key is
>> the longest available prefix of the lookup string.
>> So, if i store:
>> /products -> foo
>> /products/fruit -> bar
>> Then lookups go like:
>> /products/furniture/chairs -> foo
>> /products/fruit/bananas -> bar
>
> My approach would be, to partition the keys by length, first,
> and for each length maintain a separate HashMap.
>
> The structure to hold the key-thing pairs:
> ArrayList<HashMap<String,Thing>>
>
> To add a pair: put it into the hashmap indexed by key.length()
> in the ArrayList (extending the ArrayList and creating a new
> HashMap as necessary)
>
> To lookup a string:
> reverse-iterate the ArrayList from .size()-1 downto 1: for each
> index i that has a non-empty HashMap stored in the ArrayList,
> obtain lookupString.substring(0,i) and look it up in that
> non-empty HashMap stored at i.
Interesting, thanks. That would certainly work. At first glance, it seems
a bit clunky, but you only need to to do lookups for the lengths that have
keys, and each one is a hashtable lookup, so it should be quite quick.
For large sets of keys, or rather for sets of keys with many different
lengths, i suspect it wouldn't be very fast, but my key set is likely to
be quite small.
tom
--
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners
|
|
0
|
|
|
|
Reply
|
Tom
|
7/4/2010 5:01:39 PM
|
|
On 7/4/2010 12:59 PM, Tom Anderson wrote:
> On Fri, 2 Jul 2010, Eric Sosman wrote:
>
>> On 7/1/2010 11:42 PM, Tom Anderson wrote:
>>
>>> I want to store some things, each filed under a string key. I then want
>>> to look those things up with other strings, finding the thing whose key
>>> is the longest available prefix of the lookup string.
>>>
>>> So, if i store:
>>>
>>> /products -> foo
>>> /products/fruit -> bar
>>>
>>> Then lookups go like:
>>>
>>> /products/furniture/chairs -> foo
>>> /products/fruit/bananas -> bar
>>
>> If you add "/products/fuzz -> baz" to the collection, what
>> should a lookup on "/products/f" return?
>
> foo. Neither of the other keys is a prefix of the search term.
>
>>> If i write this myself, is there a data structure better than a Patricia
>>> tree for it?
>>
>> The scale of the problem depends on whether a key's "words" must match
>> in their entirety or only partially. If it's a "word-by-word"
>> discipline (so /products/f yields foo), I think an ordinary multi-way
>> tree would be fine; no need for anything as fancy as Patricia. If
>> /products/f matches both /products/fruit and /products/fuzz, something
>> trickier may be needed (at the least, you've got to decide what to do
>> about the ambiguity).
>
> I evidently haven't explained this clearly. A lookup with /products/f
> would never match either /products/fruit or /products/fuzz as entries,
> because neither is a prefix of the lookup term.
>
> What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i
> set up a mapping for /user/editProfile/*, then i want to catch requests
> for /user/editProfile/jim, /user/editProfile/bob, etc.
Okay. I don't know what's already out there and ready-written,
but if I were going to write such a thing from scratch I'd probably
come up with something like (just typed in; not compiled or tested)
class Node {
/** Mapping if we match this far and no further */
private final Target target;
/** Links to deeper Nodes. */
private final Map<String,Node> links;
Node(Target target) {
this.target = target;
this.links = new HashMap<String,Node>();
}
void setLink(String key, Node dest) {
links.put(key, dest);
}
Node getLink(String key) {
return links.get(key);
}
Target match(Iterator<String> keys) {
if (! keys.hasNext())
return null;
Node next = getLink(keys.next());
Target t = (next == null) ? target : next.match(keys);
return (t == null) ? target : t;
}
}
The internal Map might be something lighter-weight if the "degrees"
of the Nodes are expected to be small, but that's the idea. It's a
lot like traversing a directory tree, with the added fillip of a
default value for incomplete matches.
--
Eric Sosman
esosman@ieee-dot-org.invalid
|
|
0
|
|
|
|
Reply
|
Eric
|
7/4/2010 5:45:20 PM
|
|
Tom Anderson <twic@urchin.earth.li> wrote:
>> To lookup a string:
>> reverse-iterate the ArrayList from .size()-1 downto 1: for each
>> index i that has a non-empty HashMap stored in the ArrayList,
>> obtain lookupString.substring(0,i) and look it up in that
>> non-empty HashMap stored at i.
>
> Interesting, thanks. That would certainly work. At first glance, it seems
> a bit clunky, but you only need to to do lookups for the lengths that have
> keys, and each one is a hashtable lookup, so it should be quite quick.
If your set of keys is going to be small enough, you can get by with a
more lightweight variant:
maintain an int[longestKey.length()] to count each length's key-count
and keep all the key-value pairs together in one HashMap.
Only obtain lookupString.substring(0,i) for each i whose array-element
is non-zero. For those, however, you'll search the complete HashMap,
which may be slightly less performant than the original.
|
|
0
|
|
|
|
Reply
|
Andreas
|
7/4/2010 8:04:48 PM
|
|
On 04.07.2010 22:04, Andreas Leitgeb wrote:
> Tom Anderson<twic@urchin.earth.li> wrote:
>>> To lookup a string:
>>> reverse-iterate the ArrayList from .size()-1 downto 1: for each
>>> index i that has a non-empty HashMap stored in the ArrayList,
>>> obtain lookupString.substring(0,i) and look it up in that
>>> non-empty HashMap stored at i.
>>
>> Interesting, thanks. That would certainly work. At first glance, it seems
>> a bit clunky, but you only need to to do lookups for the lengths that have
>> keys, and each one is a hashtable lookup, so it should be quite quick.
>
> If your set of keys is going to be small enough, you can get by with a
> more lightweight variant:
Hmm, if the data set is small enough you can get by with a wide range of
solutions - even linear search in a list or array may be fast enough.
This problem however is naturally implemented with some Trie variant.
That's the data structure that fits the problem best IMHO.
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
|
|
0
|
|
|
|
Reply
|
Robert
|
7/4/2010 8:34:33 PM
|
|
|
12 Replies
108 Views
(page loaded in 0.206 seconds)
Similiar Articles: whats the use of "data16 nop" instruction generated by gas on x86 ...While looking at disassembled code generated by gas, found something like ... The AMD decoder can swallow up to N prefix bytes as part of each instruction, with no ... Some text processing questions - comp.lang.vhdlNote that write will allows you to build multiple things up into the string and that you need ... All of the aboe might all be off the beaten path of what you were looking ... Something bad happened to my gfortran installation - comp.lang ...... type(CPUID_TYPE) result character(16) string ... expression that is not a constant expression" Looking ... that reason it looked like I had really messed things up ... Bad use of stringstream temporary? - comp.lang.c++... ss<<x; return *this;} operator std::string() const ... Thank you; that clears things up nicely. On Mar 26, 2 ... I should say, I am very much looking forward to the new ... values for password field in /etc/shadow - comp.unix.solaris ...*NP* = No Password I am looking specifically for *SN* ... ... only one which is really special is "*LK*" (as prefix ... So the field is either: 1) An encrypted string 2) *LK* 3 ... PROC REPORT events for use with tagsets - comp.soft-sys.sas ...... tagset no longer picks it up correctly. To make things ... tagset- there are several things we need to do to format the strings ... > >Looking at the output from ... WordStar or CP/M patches for VT100 terminal - comp.os.cpm ...> So is there no possibility to set up a VTxxx terminal to use hardware ... VT100 will interpret any ESC as a command string prefix, so should never receive 1BH ... Get BIOS vendor/version/date and Machine serial/model? - comp.lang ...... else. > >Lots of thanks. >j Just scan the bios for ascii strings. Lots of interesting things show up. ... SMBIOS/DMI give me what I am looking for. Thank you a lot ... Automatically resize font when component size changes? - comp.lang ...> What I'm looking for is a way to drastically speed up this process so > I have the user resize the ... If you draw the time as one single string (e.g. "12:34:56") as ... VHDL-2002 vs VHDL-93 vs VHDL-87? - comp.lang.vhdlI've looked up google, but I can't seem to find a ... j) A predefined attribute that is a value and whose prefix ... lang.vhdl The "implicitly" declared function TO_STRING has ... how to put my pdf "file" (actually a byte array) into iText PDF ...I tried creating a String from my byte ... need iText's capability for a lot of things, not just this one item I'm coding up. ... Looking forward to getting my paws on the ... Automatically generating password for passwd - comp.sys.hp.hpux ...I'm now looking for one of the following: - a tool, which I can pass the password (or any string) and which will return the ... If you're having trouble coming up with a ... need help parsing PDF documents - comp.text.pdf... to extract text parts from within a PDF to build up ... I imagine that the text you are looking at is encoded with ... Each string is encoded according to the font in use for ... standar command line argument parser - comp.unix.programmer ...Anyways, the parser im looking for should be able to ... if the second value itself began w/ the option prefix (e ... apps.paint ... do I need to do to get it to bring up ... canvas, [font measure], and vertical text - comp.lang.tcl ...However, if I rotate the text by 90 degrees so that the string is vertical (using 8.6 code), the string now takes up about 70 pixels in height. Prefix - Wikipedia, the free encyclopediaThe word prefix is itself made up of the stem fix (meaning attach, in this case), and the prefix pre-(meaning "before"), both of which are derived from Latin roots. how to do IP prefix matching efficiently in Python?: pythonFor example if I prefix a string with @ then that string has a special meaning. ... This program is optimized by using dictionaries and lists for looking things up ... 7/23/2012 6:24:31 PM
|