generic files + persistent binary search trees

  • Follow


Hi

Here is an implementation of CLOS-based file abstraction and an example 
of its use to implement persistent binary search trees:

https://gist.github.com/2823562

I've implemented this in a couple of hours, so it's just a proof of 
concept. But it demonstrates the idea:

I never liked to work with file streams, neither in C/C++ nor in CL: 
they tend to bite you in all kinds of ways if you do something non-trivial.
I want to use file just like with array -- read element at certain 
position, write element and so on.
SO here's an API I want to have:

(defgeneric read-elt (file position))
(defgeneric write-elt (file position elt))

Implementing it for files which consist of bytes is, of course, trivial.

Now having this API, we can implement more high-level kinds of files 
which consists of records, not just bytes. (Some programming languages, 
e.g. Pascal, implement files-of-records natively.)

When we can files which consist of records we can write more complex 
data structures into them, like binary search trees. (A very basic 
version is included as example of use.)

....

What's next? Now having persistent, updatable binary search trees we can 
implement BDB-like database. And use it as underlying storage for 
Elephant, for example. Or use it to implement a triple store.

Note that as file abstraction is based on CLOS objects, it's pretty much 
trivial to add

  * cache
  * bufferization, read-ahead
  * transactions
  * replication
  * storage which works over network
  * distributed storage

Just make a class which implements file API while doing one or more 
things above, and higher level abstractions would use it automatically.

....

I wonder why it wasn't done before. Or it was?
0
Reply alex.mizrahi (228) 5/29/2012 10:34:24 AM

* Alex Mizrahi <4fc4a632$0$287$14726298@news.sunsite.dk> :
Wrote on Tue, 29 May 2012 13:34:24 +0300:

| I wonder why it wasn't done before. Or it was?

I imagine the devil is in the details: handling variable size records;
then database issues of garbage collection, vacuuming, compaction; then
you'd want to mmap the file...

[I believe the first time I started rolling this out on my own, I
 decided to use ffi to dbm instead, to "outsource" the db drudge work.

 Most Recently I've seen `manardb' which provides the full mmapped
 persistent CLOS database, ala allegrocache, (but I did not run it
 because it depends on showstopper libraries.)  Incidentally the README
 indicated that allegrocache runs on SBCL, which came as a shock, re
 Franz's strategies]

--- Madhu
0
Reply enometh (826) 5/29/2012 4:22:19 PM


> | I wonder why it wasn't done before. Or it was?

> I imagine the devil is in the details: handling variable size records;
> then database issues of garbage collection, vacuuming, compaction;

I believe there is a simple sub-optimal solution to each of these issues.
Some people use in-memory, transaction-logging approaches like 
prevalence and are happy with it. I really don't see how a binary 
serialization based one can be inferior.

> then
> you'd want to mmap the file...

It's an optimization. It isn't necessary. Good thing about file 
abstraction is that you can add use of mmap() later on implementation 
which support it while still keeping pure CL version working.

(I don't think there is a big need for use of mmap() -- overhead of 
doing a system call is minuscule compared to communicating with a SQL 
server over a socket _which is a norm_, and moreover you can make it 
even less via caching. If you absolutely want this, you can allocate a 
really large cache -- then OS will page out things which are 
infrequently used thus doing exactly same thing which mmap() does, but 
portably.)

> [I believe the first time I started rolling this out on my own, I
>   decided to use ffi to dbm instead, to "outsource" the db drudge work.

Well, we have a BDB backend in Elephant, but some people want a pure 
Lisp backend.
0
Reply alex.mizrahi (228) 5/29/2012 6:11:06 PM

Alex Mizrahi <alex.mizrahi@gmail.com> writes:

>> | I wonder why it wasn't done before. Or it was?
>
>> I imagine the devil is in the details: handling variable size records;
>> then database issues of garbage collection, vacuuming, compaction;
>
> I believe there is a simple sub-optimal solution to each of these issues.
> Some people use in-memory, transaction-logging approaches like
> prevalence and are happy with it. I really don't see how a binary
> serialization based one can be inferior.

Binary serialization can be more expensive in space.

Eg. if you serialize fixnums, you will use 4 bytes per fixnum.  But most
fixnums are 0 and 1, so that'd use only two bytes in a textual
serialization.

You can compensate by using a compressing scheme, but this increments
complexity.

Another example: characters use up 24-bit or 32-bit nowadays.  But most
characters are in the ASCII character set, and would require only one
byte in a textual (UTF-8) serialization.  


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.
0
Reply pjb (7667) 5/29/2012 6:27:10 PM

> Binary serialization can be more expensive in space.

Well, yes. It depends on what data you work with.

> Eg. if you serialize fixnums, you will use 4 bytes per fixnum.  But most
> fixnums are 0 and 1, so that'd use only two bytes in a textual
> serialization.

This isn't true. 0 and 1 might be a tiny bit more likely than other 
numbers, but it's unlikely that majority of your dataset consists of 
ones and zeros. (If this is the case, you should serialize it as a bitmap.)

Integers are often used to identify objects, i.e. each object is 
assigned an integer id. If you work with 1000 objects median length of 
decimal representation of an identifier is 3 digits, for 10000 objects 
it is 4 digits and so on.

So if we compare 32-bit binary integers vs. ASCII, at 1000 objects you 
have (approximately) break-even, at 10000 binary representation clearly 
wins.

If you work with less than 1000 objects it probably doesn't matter what 
you use for serialization, PRINT might be good enough.

Another typical use is a serialization of a graph, and you have exactly 
same picture with it.

What else can you use integers for? Fixed point representation of 
rational numbers, often used in finance. Well, I guess you'll have a lot 
of zeros, but 1 isn't any more likely than other numbers.
Some databases might allow you to optimize storage of zeros by just not 
storing them. E.g. if you use triple store you can wipe corresponding 
triples.

If you actually need to store ones and zeros (but that's more like 
boolean rather than integer) it's better to use bitmap,  it will be 16 
times more space-efficient than ASCII.

> Another example: characters use up 24-bit or 32-bit nowadays.  But most
> characters are in the ASCII character set, and would require only one
> byte in a textual (UTF-8) serialization.

Storing individual characters is incredibly rare, IMO. And storing 
strings in UTF-8 just makes sense no matter whether it is binary format 
or plain text format.

0
Reply alex.mizrahi (228) 5/30/2012 7:14:38 AM

>> I believe there is a simple sub-optimal solution to each of these issues.
>> Some people use in-memory, transaction-logging approaches like
>> prevalence and are happy with it. I really don't see how a binary
>> serialization based one can be inferior.

> Binary serialization can be more expensive in space.

It isn't really about space efficiency: I think a primary reason to use 
a 'binary' database is to get reasonable indexing without a need to load 
whole dataset into memory.
OTOH textual representation is usually good enough for stream 
processing: it allows you to use simple and robust tools (sed, awk) and 
simple compression (gzip).
0
Reply alex.mizrahi (228) 5/30/2012 7:33:54 AM

> then you'd want to mmap the file...

Here's a very simple benchmark for non-mmap data access:

(defun ccc ()
          (time
           (loop with stream = (stream-of *ff*)
                 with seq = (make-array '(4)
                              :element-type '(unsigned-byte 32))
                 repeat 1000000
                 do (file-position stream 0)
                 sum (read-sequence seq stream))))

$ (ccc)
Evaluation took:
   3.269 seconds of real time
   3.250000 seconds of total run time (2.490000 user, 0.760000 system)
   [ Run times consist of 0.010 seconds GC time, and 3.240 seconds 
non-GC time. ]
   99.42% CPU
   5,231,154,728 processor cycles
   79,986,688 bytes consed

4000000

So, 4000000*4/3.25 = 4923076 octets per second. For many application 
that's probably enough.

So mmap is over-rated. People who actually need mmap probably work with 
really huge scientific datasets.

For typical database workloads it's a better idea to implement 
bufferization, read-ahead and caching, and this can be done portably.
0
Reply alex.mizrahi (228) 5/30/2012 8:34:07 AM

* Alex Mizrahi <4fc5db80$0$287$14726298@news.sunsite.dk> :
Wrote on Wed, 30 May 2012 11:34:07 +0300:

| So mmap is over-rated. People who actually need mmap probably work
| with really huge scientific datasets.

Speed is not the reason to want to use mmap (I didn't understand it that
way at least).

In the context (persistent database for lisp objects), lisp objects have
to be serialized and deserialized, whereas one could just coerce a blob
of mmapped memory into a C++ object, by casting it, and then manipulate
the object in place.  IMO The big win would be to avoid the binary
serialization layers, (the same principle wants one to avoid serializing
strings to utf-8), but I apparently do not possess enough-fu to
successfully hack gencgc to add support for this.

[I believe discussing swap/mmap issues with file are invariably address
 a fake issue, because of the way Virtual Memory is implemented across
 Linux (and other consumer OS) --- which swap file pages in and out
 regardless of mmap/swap being used or turned off...]

| For typical database workloads it's a better idea to implement
| bufferization, read-ahead and caching, and this can be done portably.

In any case the seductive thinking process that suggests using mmap
follows the same parrern as when advocating kernel scheduled threads,
over other varieties of concurrency: "Do not think, thinking will only
hurt your [small] brain, just rely on the clever OS programmers to do
the right thing and just use our layers, entrench ... "

--- Madhu
0
Reply enometh (826) 5/30/2012 2:39:08 PM

Madhu <enometh@meer.net> writes:

> [I believe discussing swap/mmap issues with file are invariably address
>  a fake issue, because of the way Virtual Memory is implemented across
>  Linux (and other consumer OS) --- which swap file pages in and out
>  regardless of mmap/swap being used or turned off...]

The difference is that if you rely on the swap, you are doing twice the
I/O than if you use mmap:

swap:

    - read the file into RAM.
    - swap out the RAM into the swap partition
    - swap in the swap partition into the RAM
    - write the file from the RAM.

mmap:

    - read the file into RAM (mmap).
    - swap out the RAM into the file.


(of course, depending on mmap options, accesses and memory load).


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.
0
Reply pjb (7667) 5/30/2012 7:44:10 PM

> Speed is not the reason to want to use mmap (I didn't understand it that
> way at least).

I believe it's the main reason.

> In the context (persistent database for lisp objects), lisp objects have
> to be serialized and deserialized, whereas one could just coerce a blob
> of mmapped memory into a C++ object, by casting it, and then manipulate
> the object in place.

You can as well read blob using fread, cast it into C++ object, 
manipulate it and fwrite it back. Only extra things you need is 
fread/fwrite pair, and it is pretty much trivial.

Actually I once worked on a project where we needed to implement a 
custom database in C++. First version used mmap as we saw it as a 
simplest way to do it. But then it turned out that it's unreliable: we 
could not allocate large block of continuous address space on some 
machines, and using windowing would complicate everything.
It took maybe a couple of days to re-implement it using fread/fwrite, we 
only had to change I/O related parts keeping interfaces and 
datastructures the same. (But I guess our interface was 
check-in/check-out friendly.) If we used fread/fwrite from the start it 
would barely cost us anything.

>  IMO The big win would be to avoid the binary
> serialization layers,

In C/C++ there is no difference -- you can always grab bytes of 
structure object no matter whether you're going to use it with mmap or 
file I/O.

In CL you, perhaps, can speed up development a bit by dumping raw data 
structures, but then you lose portability; it's very well possible that 
your 'database' will be readable only by a certain version of CL 
implementation; and most likely all dev. time savings you gain by not 
writing a serialization layer will be spent on reading documentation on 
raw data formats and mmap-related operations. So it would be in fact 
harder to implement, and also worse from compatibility standpoint. The 
only potential gain is speed.

Even if you're afraid of serialization for some reason, you can try 
using struct (:type vector) -- you can dump those things directly into 
byte files. Need different base data types? Use different files. Simple, eh?
0
Reply alex.mizrahi (228) 5/31/2012 1:51:02 AM

>> [I believe discussing swap/mmap issues with file are invariably address
>>   a fake issue, because of the way Virtual Memory is implemented across
>>   Linux (and other consumer OS) --- which swap file pages in and out
>>   regardless of mmap/swap being used or turned off...]

> The difference is that if you rely on the swap, you are doing twice the
> I/O than if you use mmap:

Presumably data is accessed multiple times. You do "twice the I/O" only 
during initial load. So it's not so much N vs 2*N as it is N vs N+1. For 
large N difference is negligible :)

If application reads data only once use of swap makes no sense, I think.
0
Reply alex.mizrahi (228) 5/31/2012 1:55:56 AM

> I imagine the devil is in the details: handling variable size records;

I've added an example of BST with string keys:

https://gist.github.com/2823562

Took me about a hour to implement.

I'm planning to do triple store next.

> then database issues of garbage collection, vacuuming, compaction;

All of this can be done using a copy operation which copies only live 
data. Also works as a backup tool.

Online compaction is a luxury.
0
Reply alex.mizrahi (228) 6/2/2012 1:56:36 PM

W dniu 2012-05-29 12:34, Alex Mizrahi pisze:
> (...)
> Note that as file abstraction is based on CLOS objects, it's pretty
> much trivial to add
>
> * cache
> * bufferization, read-ahead
> * transactions
> * replication
> * storage which works over network
> * distributed storage
>
> Just make a class which implements file API while doing one or more
> things above, and higher level abstractions would use it automatically.

> I wonder why it wasn't done before. Or it was?

 From Python world... ZODB with FileStorage backend
seems similar in concept, but after addition of cache,
transactions, network protocol its not simple anymore.

http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html
0
Reply piotr_chamera (24) 6/4/2012 5:20:44 PM

12 Replies
23 Views

(page loaded in 0.147 seconds)

6/20/2013 12:22:23 PM


Reply: