Jim,
Since i spouted off on cljp about how key-value stores were better than
RDBMs, i thought i ought to put my money where my mouth is. I'm going to
put together a simple demo/benchmark for storing key-value data,
implemented on top of Tokyo Cabinet and JDBC, which will hopefully
demonstrate that TC is easier to use and faster than a database (i can try
Derby, H2 and a leading commercial database whose license prohibits the
posting of performance comparisons, and so which will have to go nameless,
lest the bearded, megalomaniacal CEO of its manufacturer get upset).
So, it would be good to know what sort of problem i'm actually trying to
show that KV stores are good for. Jim, you said:
> I need to maintain a data base of small text snippets keyed by arbitrary
> strings, without the overhead of a full SQL relational database. We
> will have several people putting data into it so it needs to support
> concurrent access over a network.
Could i trouble you to expand a little on sizing? In particular:
- How many entries are there?
- How big are the keys? What sort of things are they?
- How big are the values?
- What's the workload mix? Read vs write vs delete?
- Are writes mostly of new entries, or overwrites of existing ones?
- How skewed is the access towards a few hot entries?
- How many users are using it in parallel?
- What is the request rate like?
- How much RAM can be used? How much disk space?
Answers to any of those, no matter how rough, would be useful.
I'll post to cljp if/when i get this done.
tom
--
limited to concepts that are meta, generic, abstract and philosophical --
IEEE Standard Upper Ontology Working Group
|
|
0
|
|
|
|
Reply
|
twic (2083)
|
4/2/2010 6:27:45 PM |
|
Tom Anderson <twic@urchin.earth.li> writes:
> Jim,
>
> Since i spouted off on cljp about how key-value stores were better
> than RDBMs, i thought i ought to put my money where my mouth is. I'm
> going to put together a simple demo/benchmark for storing key-value
> data, implemented on top of Tokyo Cabinet and JDBC, which will
> hopefully demonstrate that TC is easier to use and faster than a
> database (i can try Derby, H2 and a leading commercial database whose
> license prohibits the posting of performance comparisons, and so which
> will have to go nameless, lest the bearded, megalomaniacal CEO of its
> manufacturer get upset).
>
> So, it would be good to know what sort of problem i'm actually trying
> to show that KV stores are good for. Jim, you said:
>
>> I need to maintain a data base of small text snippets keyed by arbitrary
>> strings, without the overhead of a full SQL relational database. We
>> will have several people putting data into it so it needs to support
>> concurrent access over a network.
>
> Could i trouble you to expand a little on sizing? In particular:
>
> - How many entries are there?
> - How big are the keys? What sort of things are they?
> - How big are the values?
> - What's the workload mix? Read vs write vs delete?
> - Are writes mostly of new entries, or overwrites of existing ones?
> - How skewed is the access towards a few hot entries?
> - How many users are using it in parallel?
> - What is the request rate like?
> - How much RAM can be used? How much disk space?
>
> Answers to any of those, no matter how rough, would be useful.
>
> I'll post to cljp if/when i get this done.
>
> tom
We want our customers to be able to add their own annotations to
fields on our application's screens, sort of like sticky notes. (This
is why I was fooling with tooltips earlier). Which means that every
time the app displays a screen it needs to check for an annotation on
pretty much every field on the screen. Most of the time there won't
be any, or only very few, but it has to check anyway, and we have some
very complicated screens with nested tabbed panes. Many fields appear
in more than one screen. For fields tied directly to a column in a
database table the key is the table name plus the column name. For
derived fields we assign a more or less arbitrary key; these can't be
allowed to change or the sticky notes get mixed up.
The app is in Java, started through JNLP, communicates with an AS/400
through a LAN (each customer has its own). Some customers have
branches attached through slow connections.
So:
Number of entries probably < 1000 (but who knows what people will do?)
Keys are strings, 20 to 30 characters.
Values are strings, maybe 50 to 500 characters.
Frequent reads, occasional writes, very few deletes.
Maybe 5 to 20 users.
Every time someone displays a screen, 20 to 100 reads need to happen
in well under a second. Most of these will be misses.
As I said in another message, the real problem here is probably
network latency, and switching databases won't help that. I think
I'll have to abandon the brute force approach and try something more
clever.
--
Jim Janney
|
|
0
|
|
|
|
Reply
|
Jim
|
4/3/2010 3:20:26 AM
|
|
On Fri, 2 Apr 2010, Jim Janney wrote:
> So:
> Number of entries probably < 1000 (but who knows what people will do?)
> Keys are strings, 20 to 30 characters.
> Values are strings, maybe 50 to 500 characters.
> Frequent reads, occasional writes, very few deletes.
> Maybe 5 to 20 users.
> Every time someone displays a screen, 20 to 100 reads need to happen
> in well under a second. Most of these will be misses.
My gut feeling is that this is not a big ask, and that an RDBMS will be
able to deal with this without any trouble. The first performance-related
thing i'd do (possibly even upfront, before i had hard data that
performance was actually a problem) would be to batch the reads: collect
together all the keys you're going to need, send them over the wire all at
once, and have the far end do a query like:
SELECT key, value
FROM annotation
WHERE key IN (?, ?, ?, ?)
Plugging each key into one of the parameters in the IN-set.
There's a question over how many parameters you put in that IN-set, and
what you do if your number of keys is different. I think i'd construct the
PreparedStatement on the fly with exactly the right number of parameters,
so the question is moot. Since there will only be a small number of
different numbers of parameters (not more than one per form in your app),
the database's PreparedStatement cache should happily hold all of them,
and you won't be incurring a lot of preparation overhead.
> As I said in another message, the real problem here is probably network
> latency, and switching databases won't help that. I think I'll have to
> abandon the brute force approach and try something more clever.
Probably. Robert Klemme's suggestion of building a toy version to get some
data on the performance numbers is an excellent one; in my part of the
world, we call this a 'spike':
http://www.extremeprogramming.org/rules/spike.html
tom
--
It's never too late to change the future.
|
|
0
|
|
|
|
Reply
|
Tom
|
4/3/2010 2:00:08 PM
|
|
Tom Anderson <twic@urchin.earth.li> writes:
> On Fri, 2 Apr 2010, Jim Janney wrote:
>
>> So:
>> Number of entries probably < 1000 (but who knows what people will do?)
>> Keys are strings, 20 to 30 characters.
>> Values are strings, maybe 50 to 500 characters.
>> Frequent reads, occasional writes, very few deletes.
>> Maybe 5 to 20 users.
>> Every time someone displays a screen, 20 to 100 reads need to happen
>> in well under a second. Most of these will be misses.
>
> My gut feeling is that this is not a big ask, and that an RDBMS will
> be able to deal with this without any trouble. The first
> performance-related thing i'd do (possibly even upfront, before i had
> hard data that performance was actually a problem) would be to batch
> the reads: collect together all the keys you're going to need, send
> them over the wire all at once, and have the far end do a query like:
>
> SELECT key, value
> FROM annotation
> WHERE key IN (?, ?, ?, ?)
>
> Plugging each key into one of the parameters in the IN-set.
>
> There's a question over how many parameters you put in that IN-set,
> and what you do if your number of keys is different. I think i'd
> construct the PreparedStatement on the fly with exactly the right
> number of parameters, so the question is moot. Since there will only
> be a small number of different numbers of parameters (not more than
> one per form in your app), the database's PreparedStatement cache
> should happily hold all of them, and you won't be incurring a lot of
> preparation overhead.
>
>> As I said in another message, the real problem here is probably
>> network latency, and switching databases won't help that. I think
>> I'll have to abandon the brute force approach and try something more
>> clever.
>
> Probably. Robert Klemme's suggestion of building a toy version to get
> some data on the performance numbers is an excellent one; in my part
> of the world, we call this a 'spike':
>
> http://www.extremeprogramming.org/rules/spike.html
It occurred to me this morning that if I put a timestamp in each
record, I could
SELECT * FROM annotation WHERE timestamp > ?
to find any changes not in the local cache. Most of the time that
would return nothing, presumably fairly quickly. It would miss
deletes, but that's probably acceptable.
--
Jim Janney
|
|
0
|
|
|
|
Reply
|
Jim
|
4/3/2010 2:50:01 PM
|
|
On Sat, 3 Apr 2010, Jim Janney wrote:
> Tom Anderson <twic@urchin.earth.li> writes:
>
>> On Fri, 2 Apr 2010, Jim Janney wrote:
>>
>>> So:
>>> Number of entries probably < 1000 (but who knows what people will do?)
>>> Keys are strings, 20 to 30 characters.
>>> Values are strings, maybe 50 to 500 characters.
>>> Frequent reads, occasional writes, very few deletes.
>>> Maybe 5 to 20 users.
>>> Every time someone displays a screen, 20 to 100 reads need to happen
>>> in well under a second. Most of these will be misses.
>
> It occurred to me this morning that if I put a timestamp in each
> record, I could
>
> SELECT * FROM annotation WHERE timestamp > ?
>
> to find any changes not in the local cache. Most of the time that would
> return nothing, presumably fairly quickly. It would miss deletes, but
> that's probably acceptable.
Good idea. And you could do deletes by setting the value to NULL and
leaving the record in the database.
This starts to sound like what you really want is some sort of file
replication, though.
tom
--
Civis Britannicus sum.
|
|
0
|
|
|
|
Reply
|
Tom
|
4/5/2010 12:05:11 AM
|
|
|
4 Replies
162 Views
(page loaded in 0.135 seconds)
Similiar Articles:7/27/2012 12:12:23 AM
|