Hi
Here is an implementation of CLOS-based file abstraction and an example
of its use to implement persistent binary search trees:
https://gist.github.com/2823562
I've implemented this in a couple of hours, so it's just a proof of
concept. But it demonstrates the idea:
I never liked to work with file streams, neither in C/C++ nor in CL:
they tend to bite you in all kinds of ways if you do something non-trivial.
I want to use file just like with array -- read element at certain
position, write element and so on.
SO here's an API I want to have:
(defgeneric read-elt (file position))
(defgeneric write-elt (file position elt))
Implementing it for files which consist of bytes is, of course, trivial.
Now having this API, we can implement more high-level kinds of files
which consists of records, not just bytes. (Some programming languages,
e.g. Pascal, implement files-of-records natively.)
When we can files which consist of records we can write more complex
data structures into them, like binary search trees. (A very basic
version is included as example of use.)
....
What's next? Now having persistent, updatable binary search trees we can
implement BDB-like database. And use it as underlying storage for
Elephant, for example. Or use it to implement a triple store.
Note that as file abstraction is based on CLOS objects, it's pretty much
trivial to add
* cache
* bufferization, read-ahead
* transactions
* replication
* storage which works over network
* distributed storage
Just make a class which implements file API while doing one or more
things above, and higher level abstractions would use it automatically.
....
I wonder why it wasn't done before. Or it was?
|
|
0
|
|
|
|
Reply
|
alex.mizrahi (228)
|
5/29/2012 10:34:24 AM |
|
* Alex Mizrahi <4fc4a632$0$287$14726298@news.sunsite.dk> :
Wrote on Tue, 29 May 2012 13:34:24 +0300:
| I wonder why it wasn't done before. Or it was?
I imagine the devil is in the details: handling variable size records;
then database issues of garbage collection, vacuuming, compaction; then
you'd want to mmap the file...
[I believe the first time I started rolling this out on my own, I
decided to use ffi to dbm instead, to "outsource" the db drudge work.
Most Recently I've seen `manardb' which provides the full mmapped
persistent CLOS database, ala allegrocache, (but I did not run it
because it depends on showstopper libraries.) Incidentally the README
indicated that allegrocache runs on SBCL, which came as a shock, re
Franz's strategies]
--- Madhu
|
|
0
|
|
|
|
Reply
|
enometh (826)
|
5/29/2012 4:22:19 PM
|
|
> | I wonder why it wasn't done before. Or it was?
> I imagine the devil is in the details: handling variable size records;
> then database issues of garbage collection, vacuuming, compaction;
I believe there is a simple sub-optimal solution to each of these issues.
Some people use in-memory, transaction-logging approaches like
prevalence and are happy with it. I really don't see how a binary
serialization based one can be inferior.
> then
> you'd want to mmap the file...
It's an optimization. It isn't necessary. Good thing about file
abstraction is that you can add use of mmap() later on implementation
which support it while still keeping pure CL version working.
(I don't think there is a big need for use of mmap() -- overhead of
doing a system call is minuscule compared to communicating with a SQL
server over a socket _which is a norm_, and moreover you can make it
even less via caching. If you absolutely want this, you can allocate a
really large cache -- then OS will page out things which are
infrequently used thus doing exactly same thing which mmap() does, but
portably.)
> [I believe the first time I started rolling this out on my own, I
> decided to use ffi to dbm instead, to "outsource" the db drudge work.
Well, we have a BDB backend in Elephant, but some people want a pure
Lisp backend.
|
|
0
|
|
|
|
Reply
|
alex.mizrahi (228)
|
5/29/2012 6:11:06 PM
|
|
Alex Mizrahi <alex.mizrahi@gmail.com> writes:
>> | I wonder why it wasn't done before. Or it was?
>
>> I imagine the devil is in the details: handling variable size records;
>> then database issues of garbage collection, vacuuming, compaction;
>
> I believe there is a simple sub-optimal solution to each of these issues.
> Some people use in-memory, transaction-logging approaches like
> prevalence and are happy with it. I really don't see how a binary
> serialization based one can be inferior.
Binary serialization can be more expensive in space.
Eg. if you serialize fixnums, you will use 4 bytes per fixnum. But most
fixnums are 0 and 1, so that'd use only two bytes in a textual
serialization.
You can compensate by using a compressing scheme, but this increments
complexity.
Another example: characters use up 24-bit or 32-bit nowadays. But most
characters are in the ASCII character set, and would require only one
byte in a textual (UTF-8) serialization.
--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
|
|
0
|
|
|
|
Reply
|
pjb (7667)
|
5/29/2012 6:27:10 PM
|
|
> Binary serialization can be more expensive in space.
Well, yes. It depends on what data you work with.
> Eg. if you serialize fixnums, you will use 4 bytes per fixnum. But most
> fixnums are 0 and 1, so that'd use only two bytes in a textual
> serialization.
This isn't true. 0 and 1 might be a tiny bit more likely than other
numbers, but it's unlikely that majority of your dataset consists of
ones and zeros. (If this is the case, you should serialize it as a bitmap.)
Integers are often used to identify objects, i.e. each object is
assigned an integer id. If you work with 1000 objects median length of
decimal representation of an identifier is 3 digits, for 10000 objects
it is 4 digits and so on.
So if we compare 32-bit binary integers vs. ASCII, at 1000 objects you
have (approximately) break-even, at 10000 binary representation clearly
wins.
If you work with less than 1000 objects it probably doesn't matter what
you use for serialization, PRINT might be good enough.
Another typical use is a serialization of a graph, and you have exactly
same picture with it.
What else can you use integers for? Fixed point representation of
rational numbers, often used in finance. Well, I guess you'll have a lot
of zeros, but 1 isn't any more likely than other numbers.
Some databases might allow you to optimize storage of zeros by just not
storing them. E.g. if you use triple store you can wipe corresponding
triples.
If you actually need to store ones and zeros (but that's more like
boolean rather than integer) it's better to use bitmap, it will be 16
times more space-efficient than ASCII.
> Another example: characters use up 24-bit or 32-bit nowadays. But most
> characters are in the ASCII character set, and would require only one
> byte in a textual (UTF-8) serialization.
Storing individual characters is incredibly rare, IMO. And storing
strings in UTF-8 just makes sense no matter whether it is binary format
or plain text format.
|
|
0
|
|
|
|
Reply
|
alex.mizrahi (228)
|
5/30/2012 7:14:38 AM
|
|
>> I believe there is a simple sub-optimal solution to each of these issues.
>> Some people use in-memory, transaction-logging approaches like
>> prevalence and are happy with it. I really don't see how a binary
>> serialization based one can be inferior.
> Binary serialization can be more expensive in space.
It isn't really about space efficiency: I think a primary reason to use
a 'binary' database is to get reasonable indexing without a need to load
whole dataset into memory.
OTOH textual representation is usually good enough for stream
processing: it allows you to use simple and robust tools (sed, awk) and
simple compression (gzip).
|
|
0
|
|
|
|
Reply
|
alex.mizrahi (228)
|
5/30/2012 7:33:54 AM
|
|
> then you'd want to mmap the file...
Here's a very simple benchmark for non-mmap data access:
(defun ccc ()
(time
(loop with stream = (stream-of *ff*)
with seq = (make-array '(4)
:element-type '(unsigned-byte 32))
repeat 1000000
do (file-position stream 0)
sum (read-sequence seq stream))))
$ (ccc)
Evaluation took:
3.269 seconds of real time
3.250000 seconds of total run time (2.490000 user, 0.760000 system)
[ Run times consist of 0.010 seconds GC time, and 3.240 seconds
non-GC time. ]
99.42% CPU
5,231,154,728 processor cycles
79,986,688 bytes consed
4000000
So, 4000000*4/3.25 = 4923076 octets per second. For many application
that's probably enough.
So mmap is over-rated. People who actually need mmap probably work with
really huge scientific datasets.
For typical database workloads it's a better idea to implement
bufferization, read-ahead and caching, and this can be done portably.
|
|
0
|
|
|
|
Reply
|
alex.mizrahi (228)
|
5/30/2012 8:34:07 AM
|
|
* Alex Mizrahi <4fc5db80$0$287$14726298@news.sunsite.dk> :
Wrote on Wed, 30 May 2012 11:34:07 +0300:
| So mmap is over-rated. People who actually need mmap probably work
| with really huge scientific datasets.
Speed is not the reason to want to use mmap (I didn't understand it that
way at least).
In the context (persistent database for lisp objects), lisp objects have
to be serialized and deserialized, whereas one could just coerce a blob
of mmapped memory into a C++ object, by casting it, and then manipulate
the object in place. IMO The big win would be to avoid the binary
serialization layers, (the same principle wants one to avoid serializing
strings to utf-8), but I apparently do not possess enough-fu to
successfully hack gencgc to add support for this.
[I believe discussing swap/mmap issues with file are invariably address
a fake issue, because of the way Virtual Memory is implemented across
Linux (and other consumer OS) --- which swap file pages in and out
regardless of mmap/swap being used or turned off...]
| For typical database workloads it's a better idea to implement
| bufferization, read-ahead and caching, and this can be done portably.
In any case the seductive thinking process that suggests using mmap
follows the same parrern as when advocating kernel scheduled threads,
over other varieties of concurrency: "Do not think, thinking will only
hurt your [small] brain, just rely on the clever OS programmers to do
the right thing and just use our layers, entrench ... "
--- Madhu
|
|
0
|
|
|
|
Reply
|
enometh (826)
|
5/30/2012 2:39:08 PM
|
|
Madhu <enometh@meer.net> writes:
> [I believe discussing swap/mmap issues with file are invariably address
> a fake issue, because of the way Virtual Memory is implemented across
> Linux (and other consumer OS) --- which swap file pages in and out
> regardless of mmap/swap being used or turned off...]
The difference is that if you rely on the swap, you are doing twice the
I/O than if you use mmap:
swap:
- read the file into RAM.
- swap out the RAM into the swap partition
- swap in the swap partition into the RAM
- write the file from the RAM.
mmap:
- read the file into RAM (mmap).
- swap out the RAM into the file.
(of course, depending on mmap options, accesses and memory load).
--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
|
|
0
|
|
|
|
Reply
|
pjb (7667)
|
5/30/2012 7:44:10 PM
|
|
> Speed is not the reason to want to use mmap (I didn't understand it that
> way at least).
I believe it's the main reason.
> In the context (persistent database for lisp objects), lisp objects have
> to be serialized and deserialized, whereas one could just coerce a blob
> of mmapped memory into a C++ object, by casting it, and then manipulate
> the object in place.
You can as well read blob using fread, cast it into C++ object,
manipulate it and fwrite it back. Only extra things you need is
fread/fwrite pair, and it is pretty much trivial.
Actually I once worked on a project where we needed to implement a
custom database in C++. First version used mmap as we saw it as a
simplest way to do it. But then it turned out that it's unreliable: we
could not allocate large block of continuous address space on some
machines, and using windowing would complicate everything.
It took maybe a couple of days to re-implement it using fread/fwrite, we
only had to change I/O related parts keeping interfaces and
datastructures the same. (But I guess our interface was
check-in/check-out friendly.) If we used fread/fwrite from the start it
would barely cost us anything.
> IMO The big win would be to avoid the binary
> serialization layers,
In C/C++ there is no difference -- you can always grab bytes of
structure object no matter whether you're going to use it with mmap or
file I/O.
In CL you, perhaps, can speed up development a bit by dumping raw data
structures, but then you lose portability; it's very well possible that
your 'database' will be readable only by a certain version of CL
implementation; and most likely all dev. time savings you gain by not
writing a serialization layer will be spent on reading documentation on
raw data formats and mmap-related operations. So it would be in fact
harder to implement, and also worse from compatibility standpoint. The
only potential gain is speed.
Even if you're afraid of serialization for some reason, you can try
using struct (:type vector) -- you can dump those things directly into
byte files. Need different base data types? Use different files. Simple, eh?
|
|
0
|
|
|
|
Reply
|
alex.mizrahi (228)
|
5/31/2012 1:51:02 AM
|
|
>> [I believe discussing swap/mmap issues with file are invariably address
>> a fake issue, because of the way Virtual Memory is implemented across
>> Linux (and other consumer OS) --- which swap file pages in and out
>> regardless of mmap/swap being used or turned off...]
> The difference is that if you rely on the swap, you are doing twice the
> I/O than if you use mmap:
Presumably data is accessed multiple times. You do "twice the I/O" only
during initial load. So it's not so much N vs 2*N as it is N vs N+1. For
large N difference is negligible :)
If application reads data only once use of swap makes no sense, I think.
|
|
0
|
|
|
|
Reply
|
alex.mizrahi (228)
|
5/31/2012 1:55:56 AM
|
|
> I imagine the devil is in the details: handling variable size records;
I've added an example of BST with string keys:
https://gist.github.com/2823562
Took me about a hour to implement.
I'm planning to do triple store next.
> then database issues of garbage collection, vacuuming, compaction;
All of this can be done using a copy operation which copies only live
data. Also works as a backup tool.
Online compaction is a luxury.
|
|
0
|
|
|
|
Reply
|
alex.mizrahi (228)
|
6/2/2012 1:56:36 PM
|
|
W dniu 2012-05-29 12:34, Alex Mizrahi pisze:
> (...)
> Note that as file abstraction is based on CLOS objects, it's pretty
> much trivial to add
>
> * cache
> * bufferization, read-ahead
> * transactions
> * replication
> * storage which works over network
> * distributed storage
>
> Just make a class which implements file API while doing one or more
> things above, and higher level abstractions would use it automatically.
> I wonder why it wasn't done before. Or it was?
From Python world... ZODB with FileStorage backend
seems similar in concept, but after addition of cache,
transactions, network protocol its not simple anymore.
http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html
|
|
0
|
|
|
|
Reply
|
piotr_chamera (24)
|
6/4/2012 5:20:44 PM
|
|
|
12 Replies
23 Views
(page loaded in 0.147 seconds)
|