I have been wondering some exactly what is huffman.
is huffman a specific algo (eg: like deflate is)?;
or is huffman just a general class of algorithms (like rle)?
some descriptions describe ways of constructing the nodes, which seem
different than mine:
gather statistics;
sort from highest to lowest probability;
predict a good number of bits for the code (ceil(log2(total/stat)));
pack however many codes will fit in the codelength into that code;
go to the next level (total having all the lower stats subtracted).
(some other descriptions sounded like they would just form a plain binary
tree, but I am unsure, that doesn't seem as good).
my results seem to be comprable (even if about 1/3-1/5 larger) to what I am
getting from an arithmatic coder, and the trees look sane for the
statisrtics (plus, the encoder/decoder is a whole lot faster).
but, then I found a description of shannon-fano codes, which seem to really
only differ in that the tree construction algo is different.
now, I wonder, is my scheme really even still huffman?...
some descriptions of huffman describe an algo similar to mine, so I guess it
is ok.
also, most things I had seen appeared to be using high-low bit ordering, but
I figured I would use low-high. to me this seems to make more sense, given
the positive direction for the bytestream is rightward why does bit
progression have to be leftward? (apart from maybe making a little more
sense in hex dumps, I see little other reason). or do most people see the
positive direction of the bytestream as leftward, or just live with the
flip?...
one allways has to worry about bit order it seems though...
I felt that it was necissary to include the tables at the start of the
bitstream (I couldn't come up with another obvious way to allow decoding).
some descriptions said it was only necessary to include code lengths (for
what exactly is the question...).
I dumped the trees in terms of each level, namely the code length and
((1<<b)-1) bytes giving each leaf value. this can cost up to about 385 bytes
as I have encoded it. this seems reasonable though.
the coding seems almost like, in itself, it is insufficient for fully
managing itself (issues like run packing, stream termination, ... aren't
terribly obvious). an idea here involved overlaying a simple rle scheme on
top of the encoder, which allows run packing and also makes termination
simple. the decoder reads codes and unpacks any data it encounters, and the
terminal is given as a 0 length run (fairly elegant in my rle scheme).
some gain is achieved from the rle coding it seems, but not that
signifigant.
I also found and was messing with something referred to as the "burrows
wheeler transform". I am not sure if I implemented it right, but it seems to
have little effect of the data I am compressing (graphical deltas), and
actually seems to make the probability worse in some cases, and as
implemented seems a little slow.
my guess is that since the data is allready fairly skewed the transform is
not that effective, and is maybe similarly not very effective given there is
little in the way of byte level correlations, but more general low frequency
patterns (gradiants, ...), noise (which is presumably fairly random), and
occasional harsh edges.
an order-2 statistical context apparently can't really do much of anything
useful, besides eat up lots of memory and be slow...
(there are probably better and faster ways of doing it, I don't know...).
I guess it could probably work better on more general data though.
the simple filter given in a few other posts seems to work much better for
this kind of data.
why is this filter apparently effective? I don't know. for each value I just
subtract the previous value, or do it for each scanline and subtract the
previous scanline.
it seems to work pretty well I guess, but would probably suck for the kinds
of bwt works well on...
but I am at a limit, I have data that codes to about 40kB with my coding,
and about 30kB with arithmatic coding. I am still battling with issues,
either slightly above or slightly below the 5Mbps mark for some crap
video...
maybe I can try again for arithmatic coding, and hopefully achieve
an encoder/decoder near or faster than my huffman coder (that actually
works, which was a limit with my previous attempts). I figure, I would
probably just use order-0 modeling, but would have to decide between fixed
or adaptive statistics...
right now I am leaning for fixed statistics, guessing they could probably
work a little better and also simplify things a little (and I also don't
have to worry so much about low probability codes falling out of
existance...).
the main problems with the existing one is:
it was not written by me;
I don't understand it that well;
it is horribly slow;
....
I also want the experience of myself achieving one that works.
why am I so limited, is there something that I am missing to possibly allow
losslessly squezing out more bytes, or am I approaching the limit of the
data (a point where basic entropy is constituting most of the data, and
movements are not that far above entropy wrt encoded data sizes)?
I can't really seem to really get much more out of the entropy (or at least
not maybe without plunging into more complex or slower algos).
is the only other option to go lossy?...
but then I have to wonder if there is any point. why should I really do
anything?... I feel alone and coding stuff doesn't really help.
or something...
|
|
0
|
|
|
|
Reply
|
cr88192 (24)
|
10/7/2004 5:19:55 PM |
|
> I felt that it was necissary to include the tables at the start of the
> bitstream (I couldn't come up with another obvious way to allow decoding).
> some descriptions said it was only necessary to include code lengths (for
> what exactly is the question...).
> I dumped the trees in terms of each level, namely the code length and
> ((1<<b)-1) bytes giving each leaf value. this can cost up to about 385 bytes
> as I have encoded it. this seems reasonable though.
I have likewise used huffman codes to compress graphical deltas in my
lossless video codec. At the time of the construction of my huffman
tree algorithm ( i had not created one before) i did not know how to
implement canonical huffman trees. So to reduce the overhead of
including the tree with the data i developed a simple algorithm to
reduce number of bits needed:
quoting from my honours thesis:
>>>>>>>>>>>>>>>
The Huffman Tree is a special case of a binary tree where each
non-leaf node, has both children nodes. That is, the nodes will never
have only one child node. There can also be many different equvilent
(in terms of bits per symbol) Huffman Trees. These two properties
allow the manipluation of the tree such that the left children nodes
are popluated before the right. This makes a 'left' lopsided tree
(fig). Since trees can be thought of as rows of nodes at each level,
then to efficently store the tree all that is needed is to firstly
record how many symbols there are in the tree. The first symbol is
then recorded. This first symbol then becomes a 'level indicator
symbol'. The level indicator symbol determines the end of the current
row of the tree. For each row of the tree the symbols are recorded
plus the level indicator symbol. This scheme stores the binary tree in
b((n[r]-1)+n[s]) bits, where b is the number bits requried to
represent a symbol, n[r] is the number of rows and n[s] is the number
of symbols. Compared to the intuitive way of recording the symbols of
the children of each node, giving an equation of SUM(1 to n(s)) of
b*(number of chilren)+c where c is b if number of children is> 0 else
c=0, the savings for the general tree are apparent. The new scheme
peforms best when there are many symbols with few rows. However even
in the worst case of the number of rows equalling half the number of
symbols, the new scheme is only b bits more than the intuitive scheme.
Reconstruction of the tree from the stored row format is a simple
matter of connecting each symbol of the current row with the next two
symbols in the next row and incrementing the positional pointer into
the next row by two, (fig)
There is some delay introduced by the manipulation of the generated
huffman tree into the final "left-lopsided" form, however the delay
this introduces is neglible and is peformed only once.
<<<<<<<<<<<<<<<<<<<
this may or may not help you.
> but I am at a limit, I have data that codes to about 40kB with my coding,
> and about 30kB with arithmatic coding. I am still battling with issues,
> either slightly above or slightly below the 5Mbps mark for some crap
> video...
In my codec the bit stream from compressing by huffman codes was then
further compressed using either a gzip or a bzip2 stream as i found
that there was redundancy which the huffman coding could not reduce.
To help the gzip and bzip compression i localized the different
bitstreams in the file. i.e. i put the control stream first then i put
one data stream then i put another data stream, etc.
>> but then I have to wonder if there is any point. why should I
really do
> anything?... I feel alone and coding stuff doesn't really help.
>
> or something...
go out with your girl/boy friend, if you do not have one, then get one
:)
|
|
0
|
|
|
|
Reply
|
budgetanime
|
10/7/2004 11:30:16 PM
|
|
"moogie" <budgetanime@mystarship.com> wrote in message
news:e353ade.0410071530.2bc49d5d@posting.google.com...
>> I felt that it was necissary to include the tables at the start of the
>> bitstream (I couldn't come up with another obvious way to allow
>> decoding).
>> some descriptions said it was only necessary to include code lengths (for
>> what exactly is the question...).
>> I dumped the trees in terms of each level, namely the code length and
>> ((1<<b)-1) bytes giving each leaf value. this can cost up to about 385
>> bytes
>> as I have encoded it. this seems reasonable though.
>
> I have likewise used huffman codes to compress graphical deltas in my
> lossless video codec. At the time of the construction of my huffman
> tree algorithm ( i had not created one before) i did not know how to
> implement canonical huffman trees. So to reduce the overhead of
> including the tree with the data i developed a simple algorithm to
> reduce number of bits needed:
>
> quoting from my honours thesis:
>
>>>>>>>>>>>>>>>>
>
> The Huffman Tree is a special case of a binary tree where each
> non-leaf node, has both children nodes. That is, the nodes will never
> have only one child node. There can also be many different equvilent
> (in terms of bits per symbol) Huffman Trees. These two properties
> allow the manipluation of the tree such that the left children nodes
> are popluated before the right. This makes a 'left' lopsided tree
> (fig). Since trees can be thought of as rows of nodes at each level,
> then to efficently store the tree all that is needed is to firstly
> record how many symbols there are in the tree. The first symbol is
> then recorded. This first symbol then becomes a 'level indicator
> symbol'. The level indicator symbol determines the end of the current
> row of the tree. For each row of the tree the symbols are recorded
> plus the level indicator symbol. This scheme stores the binary tree in
> b((n[r]-1)+n[s]) bits, where b is the number bits requried to
> represent a symbol, n[r] is the number of rows and n[s] is the number
> of symbols. Compared to the intuitive way of recording the symbols of
> the children of each node, giving an equation of SUM(1 to n(s)) of
> b*(number of chilren)+c where c is b if number of children is> 0 else
> c=0, the savings for the general tree are apparent. The new scheme
> peforms best when there are many symbols with few rows. However even
> in the worst case of the number of rows equalling half the number of
> symbols, the new scheme is only b bits more than the intuitive scheme.
>
> Reconstruction of the tree from the stored row format is a simple
> matter of connecting each symbol of the current row with the next two
> symbols in the next row and incrementing the positional pointer into
> the next row by two, (fig)
>
> There is some delay introduced by the manipulation of the generated
> huffman tree into the final "left-lopsided" form, however the delay
> this introduces is neglible and is peformed only once.
>
> <<<<<<<<<<<<<<<<<<<
>
> this may or may not help you.
>
ok.
I have found that for the most part the trees require about 160 bytes.
the statistics are heavily skewed twards 0 and values near 0, and many of
the values in the middle range (near 128) are missing.
as for the fragment, I don't get it that well...
an idea was vaguely considered:
all leaf values are sorted numerically;
in this way, I could get by basically telling the construction of the tree,
and which level each byte value is on. the number of bits for the level
could be based on the depth of the tree.
eg, 4 bit codes:
1, 2, 2, 3, 4, 5, 6, 6, 4, 0.
5 bits are thus chosen for each level:
0, 1, 1, 2, 3, ..., 3, 2, 2, 1
this example would cost about 165 bytes, thus not that much change. however,
the distribution is more regular, so it could be possible to use the level
lengths to construct a new huffman tree to encode/decode the level numbers
for each leaf.
this would somewhat complicate loading/saving the trees though.
>
>> but I am at a limit, I have data that codes to about 40kB with my coding,
>> and about 30kB with arithmatic coding. I am still battling with issues,
>> either slightly above or slightly below the 5Mbps mark for some crap
>> video...
>
> In my codec the bit stream from compressing by huffman codes was then
> further compressed using either a gzip or a bzip2 stream as i found
> that there was redundancy which the huffman coding could not reduce.
> To help the gzip and bzip compression i localized the different
> bitstreams in the file. i.e. i put the control stream first then i put
> one data stream then i put another data stream, etc.
>
dunno, maybe deflating the frames could make sense as a benchmark.
at present, each frame is compressed seperately, but I don't think this is
all that signifigant. I am not actually saving out files, but instead mostly
just encoding and decoding frames in memory and watching results, and
printing out various statistics and other data...
oh well, I have doubts that deflate would do much, byte pattern redundancy
is unlikely, and things like edges are handled by another filter (albeit
edges are rare in the data, mostly I am just seeing noise and the error from
the motion predictor).
>>> but then I have to wonder if there is any point. why should I
> really do
>> anything?... I feel alone and coding stuff doesn't really help.
>>
>> or something...
>
> go out with your girl/boy friend, if you do not have one, then get one
> :)
I have been alone for a bit over 2 years, I doubt that is going to change
anytime soon...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/8/2004 2:40:53 AM
|
|
>
> I have found that for the most part the trees require about 160 bytes.
> the statistics are heavily skewed twards 0 and values near 0, and many of
> the values in the middle range (near 128) are missing.
>
> as for the fragment, I don't get it that well...
>
> an idea was vaguely considered:
> all leaf values are sorted numerically;
> in this way, I could get by basically telling the construction of the tree,
> and which level each byte value is on. the number of bits for the level
> could be based on the depth of the tree.
>
> eg, 4 bit codes:
> 1, 2, 2, 3, 4, 5, 6, 6, 4, 0.
>
> 5 bits are thus chosen for each level:
> 0, 1, 1, 2, 3, ..., 3, 2, 2, 1
>
> this example would cost about 165 bytes, thus not that much change. however,
> the distribution is more regular, so it could be possible to use the level
> lengths to construct a new huffman tree to encode/decode the level numbers
> for each leaf.
>
> this would somewhat complicate loading/saving the trees though.
Maybe a diagram will help
Say we generate the following huffman tree:
/ \
/ \
A
/ \ / \
D E
/ \ / \
B C F G
My algorithm needs to re-arrange the tree such that the left node of
each node is popluated first. thus the resultant equivilent tree is
generated:
/ \
/ \
A
/ \ / \
D E
/ \ / \
B C F G
so we have a tree of depth 3 consiting of
level 1: A
level 2: CD
level 3: BCFG
a temporary row indicator symbol is chosen (it can be anything but the
first symbol of the first row, i.e. not 'A' ) in this case i choose
'G'. This symbol is then written. This step is necessary as some trees
do not have any symbols in the first row (or next rows either) so we
need to be able to record this.
The tree header now consits of "G"
The first symbol is then recorded. The row indicator is now set to
this symbol.
The tree header now consits of "GA"
since there are no other symbols on the first row write the row
indicator symbol.
The tree header now consits of "GAA"
We are now on the second row.
write each symbol on the second row and then write the row indicator
symbol.
The tree header now consits of "GAADEA"
For the third line repeat the same step as for the second row. however
since it is the last row, the row indicator symbol is not needed
The tree header now consits of "GAADEABCFG"
each symbol is 3 bits
so "GAADEABCFG" comes to 30 bits
adding 3 bits to record the number of symbols in the tree gives a
total of
33 bits to record the huffman tree.
in your scheme how many bits will be required?
some other trees:
/ \
/ \
/ \
/ \
/ \ / \
/ \ / A
/ \ / \ / \
B C D E F G
would result in a header of ("GGAABCDEFG") and thus 33 bit are
required also.
> > In my codec the bit stream from compressing by huffman codes was then
> > further compressed using either a gzip or a bzip2 stream as i found
> > that there was redundancy which the huffman coding could not reduce.
> > To help the gzip and bzip compression i localized the different
> > bitstreams in the file. i.e. i put the control stream first then i put
> > one data stream then i put another data stream, etc.
> >
> dunno, maybe deflating the frames could make sense as a benchmark.
>
> at present, each frame is compressed seperately, but I don't think this is
> all that signifigant. I am not actually saving out files, but instead mostly
> just encoding and decoding frames in memory and watching results, and
> printing out various statistics and other data...
My video codec also compresses each frame sperately. Memory or File it
makes no difference... it is all data.
> oh well, I have doubts that deflate would do much, byte pattern redundancy
> is unlikely, and things like edges are handled by another filter (albeit
> edges are rare in the data, mostly I am just seeing noise and the error from
> the motion predictor).
You might be surprised. I know i was. in certian situations an extra
50% compression was achievable (however those were rare) and i am
probably using the huffman codes to perform compression differently to
you.
>
> >>> but then I have to wonder if there is any point. why should I
> really do
> >> anything?... I feel alone and coding stuff doesn't really help.
> >>
> >> or something...
> >
> > go out with your girl/boy friend, if you do not have one, then get one
> > :)
>
> I have been alone for a bit over 2 years, I doubt that is going to change
> anytime soon...
well, up until recently i was without a partner for 5 years, so there
is hope!
|
|
0
|
|
|
|
Reply
|
budgetanime
|
10/8/2004 8:53:39 AM
|
|
"moogie" <budgetanime@mystarship.com> wrote in message
news:e353ade.0410080053.5ac6c2a5@posting.google.com...
>>
>> I have found that for the most part the trees require about 160 bytes.
>> the statistics are heavily skewed twards 0 and values near 0, and many of
>> the values in the middle range (near 128) are missing.
>>
>> as for the fragment, I don't get it that well...
>>
>> an idea was vaguely considered:
>> all leaf values are sorted numerically;
>> in this way, I could get by basically telling the construction of the
>> tree,
>> and which level each byte value is on. the number of bits for the level
>> could be based on the depth of the tree.
>>
>> eg, 4 bit codes:
>> 1, 2, 2, 3, 4, 5, 6, 6, 4, 0.
>>
>> 5 bits are thus chosen for each level:
>> 0, 1, 1, 2, 3, ..., 3, 2, 2, 1
>>
>> this example would cost about 165 bytes, thus not that much change.
>> however,
>> the distribution is more regular, so it could be possible to use the
>> level
>> lengths to construct a new huffman tree to encode/decode the level
>> numbers
>> for each leaf.
>>
>> this would somewhat complicate loading/saving the trees though.
>
>
> Maybe a diagram will help
>
> Say we generate the following huffman tree:
>
>
> / \
> / \
> A
> / \ / \
> D E
> / \ / \
> B C F G
>
> My algorithm needs to re-arrange the tree such that the left node of
> each node is popluated first. thus the resultant equivilent tree is
> generated:
>
>
> / \
> / \
> A
> / \ / \
> D E
> / \ / \
> B C F G
>
>
> so we have a tree of depth 3 consiting of
> level 1: A
> level 2: CD
> level 3: BCFG
>
> a temporary row indicator symbol is chosen (it can be anything but the
> first symbol of the first row, i.e. not 'A' ) in this case i choose
> 'G'. This symbol is then written. This step is necessary as some trees
> do not have any symbols in the first row (or next rows either) so we
> need to be able to record this.
>
> The tree header now consits of "G"
>
> The first symbol is then recorded. The row indicator is now set to
> this symbol.
>
> The tree header now consits of "GA"
>
> since there are no other symbols on the first row write the row
> indicator symbol.
>
> The tree header now consits of "GAA"
>
> We are now on the second row.
>
> write each symbol on the second row and then write the row indicator
> symbol.
>
> The tree header now consits of "GAADEA"
>
> For the third line repeat the same step as for the second row. however
> since it is the last row, the row indicator symbol is not needed
>
> The tree header now consits of "GAADEABCFG"
>
> each symbol is 3 bits
>
> so "GAADEABCFG" comes to 30 bits
>
> adding 3 bits to record the number of symbols in the tree gives a
> total of
>
> 33 bits to record the huffman tree.
>
ok.
my trees are structured different typically (as far as I can tell from the
diagram).
>
> in your scheme how many bits will be required?
>
the one I had imagined:
difficult to predict exactly, would need to do math and such...
>
> some other trees:
>
> / \
> / \
> / \
> / \
> / \ / \
> / \ / A
> / \ / \ / \
> B C D E F G
>
> would result in a header of ("GGAABCDEFG") and thus 33 bit are
> required also.
>
ok. this seems like a fairly dense coding.
misc:
I had initially not viewed my trees as binary. I preferred to think of them
as a set of levels of given code lengths (thus arbitrary binary trees are
invalid in my scheme).
my current trees depend on the number of symbols and how they are
distributed.
A
DE
BCFG
thus, with my current scheme this would be 72 bits (however, the previous
tree would not be generated).
1: A
2: DEB
2: CFG
would though, and would still be 72 bits (N<b>):
1<4> 'A'<8> 2<4> 'DEB'<24> 2<4> 'CFG'<24> 0<4>
>
>
>> > In my codec the bit stream from compressing by huffman codes was then
>> > further compressed using either a gzip or a bzip2 stream as i found
>> > that there was redundancy which the huffman coding could not reduce.
>> > To help the gzip and bzip compression i localized the different
>> > bitstreams in the file. i.e. i put the control stream first then i put
>> > one data stream then i put another data stream, etc.
>> >
>> dunno, maybe deflating the frames could make sense as a benchmark.
>>
>> at present, each frame is compressed seperately, but I don't think this
>> is
>> all that signifigant. I am not actually saving out files, but instead
>> mostly
>> just encoding and decoding frames in memory and watching results, and
>> printing out various statistics and other data...
>
> My video codec also compresses each frame sperately. Memory or File it
> makes no difference... it is all data.
>
ok.
>> oh well, I have doubts that deflate would do much, byte pattern
>> redundancy
>> is unlikely, and things like edges are handled by another filter (albeit
>> edges are rare in the data, mostly I am just seeing noise and the error
>> from
>> the motion predictor).
>
> You might be surprised. I know i was. in certian situations an extra
> 50% compression was achievable (however those were rare) and i am
> probably using the huffman codes to perform compression differently to
> you.
>
maybe.
I am now fiddling with trying to write a new arithmatic coder. I am thinking
"crap, I am doing single bit io, this is probably going to be slow". ways to
work in multi-bit chunks or such are being rare. oh well, if I can write one
that works I can probably mess with getting it faster, or stick to huffman
coding (not that much bigger than arithmatic coding).
another idea that comes up could be "lumpy" arithmatic coding, the idea here
is:
I blow off the whole idea of a bitstream, and forget about issues like
underflow;
instead, I just code values into a dword until no more will fit;
at this point I go onto the next dword.
this would probably get crappier compression, but could probably be
reasonably fast. values that do not fit in a dword could be a problem
though, and would need some fix, along with possibly determining exactly
when all the items have been puled out of the box (a 'boxmark' is possible,
but ugly).
>>
>> >>> but then I have to wonder if there is any point. why should I
>> really do
>> >> anything?... I feel alone and coding stuff doesn't really help.
>> >>
>> >> or something...
>> >
>> > go out with your girl/boy friend, if you do not have one, then get one
>> > :)
>>
>> I have been alone for a bit over 2 years, I doubt that is going to change
>> anytime soon...
>
> well, up until recently i was without a partner for 5 years, so there
> is hope!
yes.
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/8/2004 10:29:01 AM
|
|
- snip -
> I am now fiddling with trying to write a new arithmatic coder. I am thinking
I have had luck with the extremely fast arithmetic encoder mentioned
in the
DCC 96 submission #1 on fast context modeling and arithmetic coding
on the page
http://www.cbloom.com/papers/index.html
- some simplifications (cumul.freq. a power of two, very simple
renormalization)
avoids divisions completely, your only problem would be how to
estimate frequencies, so they always add up to a power of two,
dynamically or statically.
If you have an alphabet of only two symbols (one bit at a time), it
becomes even simpler.
Just a suggestion - about as fast as Huffman and much simpler; no tree
to initialize etc.
Rasmus M�ller
Those people who think they know everything are a great annoyance to
those of us who do - Isaac Asimov
|
|
0
|
|
|
|
Reply
|
AERasmus
|
10/8/2004 2:12:09 PM
|
|
"Rasmus M?ller" <AERasmus@volcanomail.com> wrote in message
news:8dbb24e7.0410080612.fef7427@posting.google.com...
>- snip -
>
>> I am now fiddling with trying to write a new arithmatic coder. I am
>> thinking
>
>
> I have had luck with the extremely fast arithmetic encoder mentioned
> in the
>
> DCC 96 submission #1 on fast context modeling and arithmetic coding
>
> on the page
>
> http://www.cbloom.com/papers/index.html
>
ok.
this may be interesting to look at. I beat a lot of one together, but I was
feeling pessimistic of it being much faster than the one I was intending to
replace...
> - some simplifications (cumul.freq. a power of two, very simple
> renormalization)
> avoids divisions completely, your only problem would be how to
> estimate frequencies, so they always add up to a power of two,
> dynamically or statically.
> If you have an alphabet of only two symbols (one bit at a time), it
> becomes even simpler.
>
one bit at a time is what I am trying to get away from.
however, it seems I was allready doing renormalization with shifts.
other things are, however, done with multiplies.
maintaining power of 2 ranges: I idly thought of this before, but figured it
could hurt compression.
> Just a suggestion - about as fast as Huffman and much simpler; no tree
> to initialize etc.
>
yes, that could be cool.
the huffman coder was not that hard though, but it is at least fairly fast.
I can imagine a few more optimizations, but at present they weren't seeming
that necessary.
writing the encoder is taking extra long, but maybe that can be blamed on me
watching lots of inuyasha right now, and not spending much time writing the
encoder...
I just figured I would take a break for now and respond to this post.
as for context modeling:
I am presently under the oppinion that much beyond order-0 modelling is
unnecessary for the data I am compressing (using context-based modeling
doesn't really seem to give any real improvement, costs time, ...).
my guess is that it is not that useful in cases where the statistics for the
data is allready highly skewn.
maybe context modeling could help for other data though, in which case I
might look more at, eg, the burrows wheeler transform. it looks interesting,
but I am not sure I pulled it off well (probably should have read more of
the paper).
oh well...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/8/2004 3:38:02 PM
|
|
budgetanime@mystarship.com (moogie) writes:
> Maybe a diagram will help
>
> Say we generate the following huffman tree:
>
>
> / \
> / \
> A
> / \ / \
> D E
> / \ / \
> B C F G
No such tree. A is not uniquely decodable, as it's the prefix to B, C, and D.
All tokens' nodes must be leaves.
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/8/2004 4:52:20 PM
|
|
>
> No such tree. A is not uniquely decodable, as it's the prefix to B, C, and D.
> All tokens' nodes must be leaves.
>
> Phil
!
oops!
I originally had another tree but then i modified it and i broke it! :(
a more apporiate tree would be:
/ \
/ \
/ \ / \
D E
/ \ / \
C F G
/ \
A B
and which would be transformed to:
/ \
/ \
/ \ \
/ \ / \
D E
/ \ / \
C F G
/ \
A B
and thus would be represented by:
BBDEDCFGDAB
and would come to 36 bits
|
|
0
|
|
|
|
Reply
|
budgetanime
|
10/8/2004 9:54:52 PM
|
|
"cr88192" <cr88192@remove.hotmail.com> wrote in message
news:NZt9d.857$pM5.508@news.flashnewsgroups.com...
>
> I had initially not viewed my trees as binary. I preferred to think of
them
> as a set of levels of given code lengths (thus arbitrary binary trees are
> invalid in my scheme).
And that is essentially what happens in Canonical Huffman. You are correct
that there are many trees that can produce the same size output. Canonical
Huffman defines rules for choosing one particular tree out of the many.
The way Huffman is used now is a way to determine which symbols are assigned
to which level. After that you throw away the actual tree that was generated
and just treat them as a set of levels. But Huffman guarantees the optimal
assignment of codes to levels based on their weights.
You seem to imply that computing the Huffman tree is an inefficient
operation and are thus trying to avoid it. Huffman can be very efficient
when you go into it with the knowledge that all you want to know is how many
of each code length there are.
For a fast algorithm (O(n) in time and space) that takes in an array of
sorted weights and returns in the same array the code lengths corresponding
to those weights see:
http://www.cs.mu.oz.au/~alistair/abstracts/mk95:wads.html
> I am now fiddling with trying to write a new arithmatic coder. I am
thinking
> "crap, I am doing single bit io, this is probably going to be slow". ways
to
> work in multi-bit chunks or such are being rare. oh well, if I can write
one
> that works I can probably mess with getting it faster, or stick to huffman
> coding (not that much bigger than arithmatic coding).
Arithmetic is definitely going to be slower.
> another idea that comes up could be "lumpy" arithmatic coding, the idea
here
> is:
> I blow off the whole idea of a bitstream, and forget about issues like
> underflow;
> instead, I just code values into a dword until no more will fit;
> at this point I go onto the next dword.
All you really have to do is buffer the output. In most languages buffered
I/O is easy or even the default.
|
|
0
|
|
|
|
Reply
|
Dale
|
10/12/2004 2:38:10 PM
|
|
"Dale King" <dalewking@insightbb[dot]com.nospam> wrote in message
news:m%Rad.459433$8_6.177173@attbi_s04...
> "cr88192" <cr88192@remove.hotmail.com> wrote in message
> news:NZt9d.857$pM5.508@news.flashnewsgroups.com...
>>
>> I had initially not viewed my trees as binary. I preferred to think of
> them
>> as a set of levels of given code lengths (thus arbitrary binary trees are
>> invalid in my scheme).
>
> And that is essentially what happens in Canonical Huffman. You are correct
> that there are many trees that can produce the same size output. Canonical
> Huffman defines rules for choosing one particular tree out of the many.
>
> The way Huffman is used now is a way to determine which symbols are
> assigned
> to which level. After that you throw away the actual tree that was
> generated
> and just treat them as a set of levels. But Huffman guarantees the optimal
> assignment of codes to levels based on their weights.
>
ok.
I am not sure what I am doing is cannonical huffman, but I think it might be
"close enough"...
> You seem to imply that computing the Huffman tree is an inefficient
> operation and are thus trying to avoid it. Huffman can be very efficient
> when you go into it with the knowledge that all you want to know is how
> many
> of each code length there are.
>
I ended up figuring the bit depths from the statistics, and figuring the
counts for each level from this.
I will argue that it is not exactly a fast operation though (possibly
because I am using bubble sort, possibly because it involves allocating
memory to hold the levels/leaves). it is not unbearably slow or anything,
but it is not something, eg, one would be wanting to do all that often...
I have some ideas though for how to fairly quickly update the tree, but then
I realize that for faster updates it would in-fact make sense to eliminate
the trees (esentially they would be replaced by the rough representation
used in the encoded output, consisting primarily of the statistics tables
and a list of level-depths).
then for updates I could do:
increment the counts in the statistics table (possibly using another table
to allow direct indexing of the given statistic);
swap with the next higher priority statistic if the count becomes greater;
recompute the bit depths if the swap causes the optimal bit depths to
change.
this would be possibly a little slower than the current approach (likely not
signifigantly).
however, this would involve some fairly signifigant changes, and I don't
feel the need for supporting adaptive huffman right now.
this may make sense as a seperate encoder, since that design would
complicate loading/saving the tables, and doesn't really offer other
advantages.
> For a fast algorithm (O(n) in time and space) that takes in an array of
> sorted weights and returns in the same array the code lengths
> corresponding
> to those weights see:
> http://www.cs.mu.oz.au/~alistair/abstracts/mk95:wads.html
>
confusing...
>> I am now fiddling with trying to write a new arithmatic coder. I am
> thinking
>> "crap, I am doing single bit io, this is probably going to be slow". ways
> to
>> work in multi-bit chunks or such are being rare. oh well, if I can write
> one
>> that works I can probably mess with getting it faster, or stick to
>> huffman
>> coding (not that much bigger than arithmatic coding).
>
> Arithmetic is definitely going to be slower.
>
and it sure is...
arithmatic coding was enough of a scary pita to make me focus more effort on
improving my huffman coder, and now I am pretty close to the original
arithmatic coder.
>> another idea that comes up could be "lumpy" arithmatic coding, the idea
> here
>> is:
>> I blow off the whole idea of a bitstream, and forget about issues like
>> underflow;
>> instead, I just code values into a dword until no more will fit;
>> at this point I go onto the next dword.
>
> All you really have to do is buffer the output. In most languages buffered
> I/O is easy or even the default.
>
I meant changing the decoder so that this crap isn't a problem, but yeah,
the idea poses "problems" afaik...
err, as for me, I personally have kind of an aversion to using language
runtime features. typically, I work on memory buffers, often comming from my
own memory manager, and any io (my compression code is fully independent of
my io code, it just uses the memory manager, and some amount of calls to a
print function, also supplied by me).
maybe this kind of thing is a remnant of from when I was doing os coding...
in general, my whole coder ended up about 1.5 kloc (it could be trimmed
down, but this is about what I ended up with).
well, anyways, here is trimmed down version of my current function for
building the trees (expects pre-sorted stats):
int PDHuff_BuildNodes(int *stat, int *num,
PDHuff_CTX *ctx, int sz, int tn)
{
double f;
int i, j, k, l, l2, b, c, nl, nt, nb;
ctx->table[tn]=kalloc(sizeof(PDHuff_Table));
ctx->table[tn]->node=kalloc(PDHUFF_MAXLEVELS*sizeof(PDHuff_Node *));
ctx->table[tn]->idx=kalloc(PDHUFF_MAXCODES*sizeof(int));
i=0;
nl=0;
nt=0;
nb=0;
while(stat[i] && (i<PDHUFF_MAXCODES))
{
f=((double)(sz-nt))/((double)stat[i]);
f=log2(f);
b=ceil(f);
if(!b)b=1;
ctx->table[tn]->node[nl]=kalloc(sizeof(PDHuff_Node));
ctx->table[tn]->node[nl]->bits=b;
ctx->table[tn]->node[nl]->leaf=kalloc((1<<b)*sizeof(short));
k=(1<<b)-1;
for(j=0; (j<k) && ((i+j)<PDHUFF_MAXCODES); j++)
{
ctx->table[tn]->node[nl]->leaf[j]=num[i+j];
nt+=stat[i+j];
}
for(; j<k; j++)
ctx->table[tn]->node[nl]->leaf[j]=0xFFFF;
//now, as you can see here, I sort again to put the leaves in
//value ascending order so I can more efficiently encode the
//trees stored in the output
k=j;
c=1;
while(c)
{
c=0;
for(j=0; j<(k-1); j++)
{
l=ctx->table[tn]->node[nl]->leaf[j];
l2=ctx->table[tn]->node[nl]->leaf[j+1];
if(l2<l)
{
ctx->table[tn]->node[nl]->leaf[j]=l2;
ctx->table[tn]->node[nl]->leaf[j+1]=l;
c++;
}
}
for(j=(k-2); j>=0; j--)
{
l=ctx->table[tn]->node[nl]->leaf[j];
l2=ctx->table[tn]->node[nl]->leaf[j+1];
if(l2<l)
{
ctx->table[tn]->node[nl]->leaf[j]=l2;
ctx->table[tn]->node[nl]->leaf[j+1]=l;
c++;
}
}
}
for(j=0; j<k; j++)
{
l=ctx->table[tn]->node[nl]->leaf[j];
ctx->table[tn]->idx[l]=(nl<<12)+(j+1);
}
i+=k;
nl++;
nb+=b;
}
ctx->table[tn]->nl=nl;
return(nl);
}
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/12/2004 4:24:56 PM
|
|
> I will argue that it is not exactly a fast operation though (possibly
> because I am using bubble sort,
Bubble sort?! Why?? Does your religion forbid you from using the
qsort() function?
b
|
|
0
|
|
|
|
Reply
|
blr
|
10/12/2004 8:46:47 PM
|
|
blr@drizzle.com (Brian Raiter) writes:
> > I will argue that it is not exactly a fast operation though (possibly
> > because I am using bubble sort,
>
> Bubble sort?! Why?? Does your religion forbid you from using the
> qsort() function?
If you have a trivial comparison function and fewer than a fair load
of items, then almost anything self-written is faster than qsort().
And if you do have a fair bunch of items, then a self-written heap-sort
is almost always faster than qsort(). Qsort is an absolute nightmare
when it comes to branch prediction (there is none), which shafts most
modern processors. Merge sort is lots of happy tight predictable loops.
And qsort also flies off through function pointers all the time, but
anything that you hand-write can have the comparison inlined.
However, bubblesort should be struck from the history of computing
as an abysmally contrived way of trying to be as slow as possible.
Insertion sorts are always faster, as they do a strict subset of
the operations done in a bubblesort with no extra cost.
OP should grab Knuth vol 3, rip out the page on bubblesort, and
then select the algorithm that most closely suits his situation.
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/12/2004 10:43:30 PM
|
|
>> Bubble sort?! Why?? Does your religion forbid you from using the
>> qsort() function?
>
> If you have a trivial comparison function and fewer than a fair load
> of items, then almost anything self-written is faster than qsort().
> And if you do have a fair bunch of items, then a self-written heap-sort
> is almost always faster than qsort(). Qsort is an absolute nightmare
> when it comes to branch prediction (there is none), which shafts most
> modern processors.
But aren't you assuming here that qsort() uses quick-sort? In reality
the "q" in the name is purely historical. The C standard imposes no
requirement on the actual sorting algorithm.
Anyway, you're making an argument based on speed. But all I said was
that qsort() is better than hand-writng a bubble sort. For the limited
range of items that a handwritten bubble sort is faster than qsort(),
the improvement is likely to be the difference of 1ms versus 10ms, and
unimportant to the application being discussed in this thread.
> However, bubblesort should be struck from the history of computing
> as an abysmally contrived way of trying to be as slow as possible.
Hm. I would argue that ripple sort is more deserving of that
distinction. (Of course, most people have never heard of ripple sort,
so I suppose it actualy did work out that way.)
b
|
|
0
|
|
|
|
Reply
|
blr
|
10/13/2004 2:05:20 AM
|
|
"Brian Raiter" <blr@drizzle.com> wrote in message
news:ckhfrn$pr$1@drizzle.com...
>> I will argue that it is not exactly a fast operation though (possibly
>> because I am using bubble sort,
>
> Bubble sort?! Why?? Does your religion forbid you from using the
> qsort() function?
>
I am not fammiliar with qsort.
also, I typically avoid using the c runtime for various reasons, maybe
because on some level I still expect to be using my code on raw hardware
again or such.
anyways, bubble sort is fairly easy to write and often only involves a few
extra variables...
if performance demands I guess I could use something like quicksort or
such...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/13/2004 4:37:48 AM
|
|
"Brian Raiter" <blr@drizzle.com> wrote in message
news:cki2h0$knm$1@drizzle.com...
>>> Bubble sort?! Why?? Does your religion forbid you from using the
>>> qsort() function?
>>
>> If you have a trivial comparison function and fewer than a fair load
>> of items, then almost anything self-written is faster than qsort().
>> And if you do have a fair bunch of items, then a self-written heap-sort
>> is almost always faster than qsort(). Qsort is an absolute nightmare
>> when it comes to branch prediction (there is none), which shafts most
>> modern processors.
>
> But aren't you assuming here that qsort() uses quick-sort? In reality
> the "q" in the name is purely historical. The C standard imposes no
> requirement on the actual sorting algorithm.
>
> Anyway, you're making an argument based on speed. But all I said was
> that qsort() is better than hand-writng a bubble sort. For the limited
> range of items that a handwritten bubble sort is faster than qsort(),
> the improvement is likely to be the difference of 1ms versus 10ms, and
> unimportant to the application being discussed in this thread.
>
well, I am dealing with typically small item counts.
I was mostly just arguing that, comparatively, at present building the
huffman trees is one of the most expensive parts of doing the encoding...
I will argue this is due to things like:
using bubblesort;
allocating memory, copying values, ...
but I am now thinking, assuming I was not expecting to serialize the trees I
could get by with a more dynamic structure, eg I keep:
a (dynamically) sorted set of current statistics;
a set of optimal bit depths for each level;
a set of indices from linear space to sorted space.
in this way, rebuilding the trees wouldn't cost much, and I could do
adaptive coding.
I am not that sure how others would approach adaptive coding, but this makes
sense.
also, the adaptive algo could be made simpler than my current algo (the main
complexity being the means in which I encode the trees).
>> However, bubblesort should be struck from the history of computing
>> as an abysmally contrived way of trying to be as slow as possible.
>
> Hm. I would argue that ripple sort is more deserving of that
> distinction. (Of course, most people have never heard of ripple sort,
> so I suppose it actualy did work out that way.)
>
err, I am not sure here.
I just looked it up, yes, slower than bubble sort...
but, anyways, bubble sort is not that much more complicated than ripple sort
but is somewhat faster...
often, though, it is easy just to inline write a simple sort function, and
bubblesort works pretty well for this.
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/13/2004 5:25:59 AM
|
|
cr88192 wrote:
)
) "Brian Raiter" <blr@drizzle.com> wrote in message
) news:ckhfrn$pr$1@drizzle.com...
)>> I will argue that it is not exactly a fast operation though (possibly
)>> because I am using bubble sort,
)>
)> Bubble sort?! Why?? Does your religion forbid you from using the
)> qsort() function?
)>
) I am not fammiliar with qsort.
) also, I typically avoid using the c runtime for various reasons, maybe
) because on some level I still expect to be using my code on raw hardware
) again or such.
)
) anyways, bubble sort is fairly easy to write and often only involves a few
) extra variables...
) if performance demands I guess I could use something like quicksort or
) such...
The last time I wrote a huffman (which was 15 years ago, granted) I used a
heap (priority queue) instead of sorting, which was a major speedup.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
|
|
0
|
|
|
|
Reply
|
Willem
|
10/13/2004 11:49:44 AM
|
|
blr@drizzle.com (Brian Raiter) writes:
> >> Bubble sort?! Why?? Does your religion forbid you from using the
> >> qsort() function?
> >
> > If you have a trivial comparison function and fewer than a fair load
> > of items, then almost anything self-written is faster than qsort().
> > And if you do have a fair bunch of items, then a self-written heap-sort
> > is almost always faster than qsort(). Qsort is an absolute nightmare
> > when it comes to branch prediction (there is none), which shafts most
> > modern processors.
>
> But aren't you assuming here that qsort() uses quick-sort? In reality
> the "q" in the name is purely historical. The C standard imposes no
> requirement on the actual sorting algorithm.
By what I've written, yes, it appears that way. However, I am aware that
there's no need for to it actaully be a quick sort. My main objection is
that it repeatedly (n log n times) calls functions via function pointers.
The fact that you can't guarantee what algorithm lies within does not
work in its favour though. Chose the algorithm with the best behaviour
for the type of data you'll be using. Even insertion sort can be optimal
at large sizes if you know your data fits its sweet spot well.
> Anyway, you're making an argument based on speed. But all I said was
> that qsort() is better than hand-writng a bubble sort. For the limited
> range of items that a handwritten bubble sort is faster than qsort(),
> the improvement is likely to be the difference of 1ms versus 10ms, and
> unimportant to the application being discussed in this thread.
All we know is that bubblesort was too slow, and that he was prepared to
write his own routines. From that I conclude that he should be prepared
to write a different routine based on a different algorithm.
Merge sort's trivial to code (but not in-place), heap sort's trivial to
code, and quick sort's trivial to code too. qsort() was good advice if
the requirement was to "just change to something apart from bubblesort
quickly", as having to write a comparison function's pretty trivial.
The only time I've needed to sort stuff, I've needed to do it quickly,
and qsort() has always been 2-3 binary orders of magnitude slower than
what I write myself.
> > However, bubblesort should be struck from the history of computing
> > as an abysmally contrived way of trying to be as slow as possible.
>
> Hm. I would argue that ripple sort is more deserving of that
> distinction. (Of course, most people have never heard of ripple sort,
> so I suppose it actualy did work out that way.)
The fact that Knuth spends several pages analysing the bubblsort has
unfortunately meant that it will never be forgotten. I do believe that
some time in the early 70s, Professor Donald went out and got mightily
pissed, and the next day wrote that section. :-|
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/13/2004 12:17:01 PM
|
|
"Willem" <willem@stack.nl> wrote in message
news:slrncmq5io.2s43.willem@toad.stack.nl...
> cr88192 wrote:
> )
> ) "Brian Raiter" <blr@drizzle.com> wrote in message
> ) news:ckhfrn$pr$1@drizzle.com...
> )>> I will argue that it is not exactly a fast operation though (possibly
> )>> because I am using bubble sort,
> )>
> )> Bubble sort?! Why?? Does your religion forbid you from using the
> )> qsort() function?
> )>
> ) I am not fammiliar with qsort.
> ) also, I typically avoid using the c runtime for various reasons, maybe
> ) because on some level I still expect to be using my code on raw hardware
> ) again or such.
> )
> ) anyways, bubble sort is fairly easy to write and often only involves a
> few
> ) extra variables...
> ) if performance demands I guess I could use something like quicksort or
> ) such...
>
> The last time I wrote a huffman (which was 15 years ago, granted) I used a
> heap (priority queue) instead of sorting, which was a major speedup.
>
I went and looked up heap sort...
however, I just did a little checking, and as I understand it heap sort
would probably take far more operations for small data sets than bubble
sort, but would win in that as data sets become larger the number of steps
required for bubble sort would go up much faster?...
as far as I understand it, the algo goes something like:
for(i=0; i<(PDHUFF_MAXCODES-1); i++)
{
b=-1;
c=-1;
for(j=i; j<PDHUFF_MAXCODES; j++)
{
if(stat[j]>c)
{
b=j;
c=stat[j];
}
}
t=stat[i];
stat[i]=stat[b];
stat[b]=t;
t=num[i];
num[i]=num[b];
num[b]=t;
}
which for the statistics seems to take far more operations. for 4096 items
(many of which are 0 however), this requires about 8.4M operations (measured
from the innermost loop).
wheras, bubble sort takes about 32 passes for the data, or about 262k
operations.
this is looking pretty bad here.
quicksort, however, would take about 49k operations afaik, making it a
winner here (just, for quicksort, I would need to allocate some memory and
have a seperate, recursive function, which is a hassle).
with a set of 4096 random values, buble sort still has about half the total
operations. boosting it to 16384, it leans more in bubble sort's favor (by a
large margin), similar for smaller sets.
(just as a misc test, I added ripple sort, which is ammusingly enough
getting lower operation counts for whatever reason than bubble sort, but not
signifigantly...).
to me bubble sort still seems like a more sensible option.
I don't really get it...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/13/2004 2:01:40 PM
|
|
"cr88192" <cr88192@remove.hotmail.com> wrote in message
news:8zabd.20489$pM5.20071@news.flashnewsgroups.com...
>
> "Willem" <willem@stack.nl> wrote in message
> news:slrncmq5io.2s43.willem@toad.stack.nl...
>> cr88192 wrote:
>> )
>> ) "Brian Raiter" <blr@drizzle.com> wrote in message
>> ) news:ckhfrn$pr$1@drizzle.com...
>> )>> I will argue that it is not exactly a fast operation though (possibly
>> )>> because I am using bubble sort,
>> )>
>> )> Bubble sort?! Why?? Does your religion forbid you from using the
>> )> qsort() function?
>> )>
>> ) I am not fammiliar with qsort.
>> ) also, I typically avoid using the c runtime for various reasons, maybe
>> ) because on some level I still expect to be using my code on raw
>> hardware
>> ) again or such.
>> )
>> ) anyways, bubble sort is fairly easy to write and often only involves a
>> few
>> ) extra variables...
>> ) if performance demands I guess I could use something like quicksort or
>> ) such...
>>
>> The last time I wrote a huffman (which was 15 years ago, granted) I used
>> a
>> heap (priority queue) instead of sorting, which was a major speedup.
>>
> I went and looked up heap sort...
>
> however, I just did a little checking, and as I understand it heap sort
> would probably take far more operations for small data sets than bubble
> sort, but would win in that as data sets become larger the number of steps
> required for bubble sort would go up much faster?...
>
> as far as I understand it, the algo goes something like:
> for(i=0; i<(PDHUFF_MAXCODES-1); i++)
> {
> b=-1;
> c=-1;
> for(j=i; j<PDHUFF_MAXCODES; j++)
> {
> if(stat[j]>c)
> {
> b=j;
> c=stat[j];
> }
> }
>
> t=stat[i];
> stat[i]=stat[b];
> stat[b]=t;
>
> t=num[i];
> num[i]=num[b];
> num[b]=t;
> }
>
> which for the statistics seems to take far more operations. for 4096 items
> (many of which are 0 however), this requires about 8.4M operations
> (measured from the innermost loop).
> wheras, bubble sort takes about 32 passes for the data, or about 262k
> operations.
> this is looking pretty bad here.
>
> quicksort, however, would take about 49k operations afaik, making it a
> winner here (just, for quicksort, I would need to allocate some memory and
> have a seperate, recursive function, which is a hassle).
>
> with a set of 4096 random values, buble sort still has about half the
> total operations. boosting it to 16384, it leans more in bubble sort's
> favor (by a large margin), similar for smaller sets.
>
> (just as a misc test, I added ripple sort, which is ammusingly enough
> getting lower operation counts for whatever reason than bubble sort, but
> not signifigantly...).
>
> to me bubble sort still seems like a more sensible option.
>
> I don't really get it...
>
adding to the confusion, I added a timer, and even though heap sort takes
about twice the operations, it takes somewhat less time (possibly related to
the operations being simpler, eg, single loop iterations vs. swaps).
bubble sort takes a little more time, and ripple sort takes about twice the
time of bubble sort even though it typically has a slightly lower operations
count.
comparatively, quicksort is the fastest by a few orders of magnitude.
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/13/2004 2:53:23 PM
|
|
"cr88192" <cr88192@remove.hotmail.com> writes:
> "Willem" <willem@stack.nl> wrote in message
> news:slrncmq5io.2s43.willem@toad.stack.nl...
> > cr88192 wrote:
> > )
> > ) "Brian Raiter" <blr@drizzle.com> wrote in message
> > ) news:ckhfrn$pr$1@drizzle.com...
> > )>> I will argue that it is not exactly a fast operation though (possibly
> > )>> because I am using bubble sort,
> > )>
> > )> Bubble sort?! Why?? Does your religion forbid you from using the
> > )> qsort() function?
> > )>
> > ) I am not fammiliar with qsort.
> > ) also, I typically avoid using the c runtime for various reasons, maybe
> > ) because on some level I still expect to be using my code on raw hardware
> > ) again or such.
> > )
> > ) anyways, bubble sort is fairly easy to write and often only involves a
> > few
> > ) extra variables...
> > ) if performance demands I guess I could use something like quicksort or
> > ) such...
> >
> > The last time I wrote a huffman (which was 15 years ago, granted) I used a
> > heap (priority queue) instead of sorting, which was a major speedup.
> >
> I went and looked up heap sort...
>
> however, I just did a little checking, and as I understand it heap sort
> would probably take far more operations for small data sets than bubble
> sort, but would win in that as data sets become larger the number of steps
> required for bubble sort would go up much faster?...
>
> as far as I understand it, the algo goes something like:
> for(i=0; i<(PDHUFF_MAXCODES-1); i++)
> {
> b=-1;
> c=-1;
> for(j=i; j<PDHUFF_MAXCODES; j++)
> {
> if(stat[j]>c)
> {
> b=j;
> c=stat[j];
> }
> }
>
> t=stat[i];
> stat[i]=stat[b];
> stat[b]=t;
>
> t=num[i];
> num[i]=num[b];
> num[b]=t;
> }
That's an O(n^2)-compare, O(n)-move, insertion sort. Each
iteration of the i loop assumes that the array from [0 .. i)
is already sorted, and ends with the array from [0 .. i]
sorted.
> which for the statistics seems to take far more operations. for 4096 items
> (many of which are 0 however), this requires about 8.4M operations (measured
> from the innermost loop).
> wheras, bubble sort takes about 32 passes for the data, or about 262k
> operations.
> this is looking pretty bad here.
That's not a bubble-sort then. A bubble-sort is O(n^2)-compare,
O(n^2)-move. If you place the smallest element where you want
the largest element, and the largest element where you want
the smallest element, then one of them will take 4095 passes
to move to its correct place. I guess if all but 32 of your
values are 0, then it's possible to only take 32 passes, but
if that's the case, then you've only got 32 elements that you
actually need to sort, and you should be taking ~1024 operations
not 262144.
> quicksort, however, would take about 49k operations afaik, making it a
> winner here (just, for quicksort, I would need to allocate some memory and
> have a seperate, recursive function, which is a hassle).
You'd need to allocate auxiliary memory _or_ have a recursive function.
The former is more hassle than the latter, the latter's not a
hassle, it makes things much simpler, IMHO.
> with a set of 4096 random values, buble sort still has about half the total
> operations.
half of what?
Due to the reasons I mention above, bubble-sort will have O(n^2)
operations.
> boosting it to 16384, it leans more in bubble sort's favor (by a
> large margin), similar for smaller sets.
>
> (just as a misc test, I added ripple sort, which is ammusingly enough
> getting lower operation counts for whatever reason than bubble sort, but not
> signifigantly...).
>
> to me bubble sort still seems like a more sensible option.
>
> I don't really get it...
Look at Paul Hseih's pages on sorting.
http://www.azillionmonkeys.com/qed/sort.html
Including source code.
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/13/2004 3:28:16 PM
|
|
cr88192 wrote:
)> The last time I wrote a huffman (which was 15 years ago, granted) I used a
)> heap (priority queue) instead of sorting, which was a major speedup.
)>
) I went and looked up heap sort...
I didn't say heap sort. I said heap.
When you're doing huffman, you repeatedly have to get the smallest two
values from a set, for which a heap is an excellent solution.
What exactly are you sorting ?
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
|
|
0
|
|
|
|
Reply
|
Willem
|
10/13/2004 4:21:07 PM
|
|
"Phil Carmody" <thefatphil_demunged@yahoo.co.uk> wrote in message
news:87wtxuvmma.fsf@nonospaz.fatphil.org...
>
> The fact that Knuth spends several pages analysing the bubblsort has
> unfortunately meant that it will never be forgotten. I do believe that
> some time in the early 70s, Professor Donald went out and got mightily
> pissed, and the next day wrote that section. :-|
Translation for our USA readers:
mightily pissed = very drunk
|
|
0
|
|
|
|
Reply
|
Pete
|
10/13/2004 4:26:56 PM
|
|
"Phil Carmody" <thefatphil_demunged@yahoo.co.uk> wrote in message
news:877jpuocxb.fsf@nonospaz.fatphil.org...
> "cr88192" <cr88192@remove.hotmail.com> writes:
>
>> "Willem" <willem@stack.nl> wrote in message
>> news:slrncmq5io.2s43.willem@toad.stack.nl...
>> > cr88192 wrote:
>> > )
>> > ) "Brian Raiter" <blr@drizzle.com> wrote in message
>> > ) news:ckhfrn$pr$1@drizzle.com...
>> > )>> I will argue that it is not exactly a fast operation though
>> > (possibly
>> > )>> because I am using bubble sort,
>> > )>
>> > )> Bubble sort?! Why?? Does your religion forbid you from using the
>> > )> qsort() function?
>> > )>
>> > ) I am not fammiliar with qsort.
>> > ) also, I typically avoid using the c runtime for various reasons,
>> > maybe
>> > ) because on some level I still expect to be using my code on raw
>> > hardware
>> > ) again or such.
>> > )
>> > ) anyways, bubble sort is fairly easy to write and often only involves
>> > a
>> > few
>> > ) extra variables...
>> > ) if performance demands I guess I could use something like quicksort
>> > or
>> > ) such...
>> >
>> > The last time I wrote a huffman (which was 15 years ago, granted) I
>> > used a
>> > heap (priority queue) instead of sorting, which was a major speedup.
>> >
>> I went and looked up heap sort...
>>
>> however, I just did a little checking, and as I understand it heap sort
>> would probably take far more operations for small data sets than bubble
>> sort, but would win in that as data sets become larger the number of
>> steps
>> required for bubble sort would go up much faster?...
>>
>> as far as I understand it, the algo goes something like:
>> for(i=0; i<(PDHUFF_MAXCODES-1); i++)
>> {
>> b=-1;
>> c=-1;
>> for(j=i; j<PDHUFF_MAXCODES; j++)
>> {
>> if(stat[j]>c)
>> {
>> b=j;
>> c=stat[j];
>> }
>> }
>>
>> t=stat[i];
>> stat[i]=stat[b];
>> stat[b]=t;
>>
>> t=num[i];
>> num[i]=num[b];
>> num[b]=t;
>> }
>
> That's an O(n^2)-compare, O(n)-move, insertion sort. Each
> iteration of the i loop assumes that the array from [0 .. i)
> is already sorted, and ends with the array from [0 .. i]
> sorted.
>
ok, I am confused then...
>> which for the statistics seems to take far more operations. for 4096
>> items
>> (many of which are 0 however), this requires about 8.4M operations
>> (measured
>> from the innermost loop).
>> wheras, bubble sort takes about 32 passes for the data, or about 262k
>> operations.
>> this is looking pretty bad here.
>
> That's not a bubble-sort then. A bubble-sort is O(n^2)-compare,
> O(n^2)-move. If you place the smallest element where you want
> the largest element, and the largest element where you want
> the smallest element, then one of them will take 4095 passes
> to move to its correct place. I guess if all but 32 of your
> values are 0, then it's possible to only take 32 passes, but
> if that's the case, then you've only got 32 elements that you
> actually need to sort, and you should be taking ~1024 operations
> not 262144.
>
that is describing what I understood to be ripple sort.
my sort is a little better.
similarly, I am dealing with likely about 256 items filled, the rest unused.
the fragment I was using for the test:
void BubbleSort(int *stat, int cnt)
{
int i, j, k, b, c, t;
c=1;
k=0;
while(c)
{
c=0;
for(i=0; i<(cnt-1); i++)
if(stat[i]<stat[i+1])
{
c++;
k++;
j=stat[i];
stat[i]=stat[i+1];
stat[i+1]=j;
}
for(i=(cnt-2); i>=0; i--)
if(stat[i]<stat[i+1])
{
c++;
k++;
j=stat[i];
stat[i]=stat[i+1];
stat[i+1]=j;
}
}
printf("bubble sort %d ops\n", k);
}
putting values in reverse order is not that much of a problem for the algo,
and in my experience it has not been too horribly slow in general.
for 16384 values, it takes about 4000 passes.
all the values in reverse order takes a single pass, but generates a
somewhat curious operations count...
>> quicksort, however, would take about 49k operations afaik, making it a
>> winner here (just, for quicksort, I would need to allocate some memory
>> and
>> have a seperate, recursive function, which is a hassle).
>
> You'd need to allocate auxiliary memory _or_ have a recursive function.
> The former is more hassle than the latter, the latter's not a
> hassle, it makes things much simpler, IMHO.
>
err, I sort of did both, oh well...
>> with a set of 4096 random values, buble sort still has about half the
>> total
>> operations.
>
> half of what?
> Due to the reasons I mention above, bubble-sort will have O(n^2)
> operations.
>
worst case, but for the test I was using random values, which are more
likely to test "average" case.
and, the sort I was understanding to be a bubble sort was talking about half
as many total operations as what I understood to be a heap sort on average.
>> boosting it to 16384, it leans more in bubble sort's favor (by a
>> large margin), similar for smaller sets.
>>
>> (just as a misc test, I added ripple sort, which is ammusingly enough
>> getting lower operation counts for whatever reason than bubble sort, but
>> not
>> signifigantly...).
>>
>> to me bubble sort still seems like a more sensible option.
>>
>> I don't really get it...
>
>
> Look at Paul Hseih's pages on sorting.
> http://www.azillionmonkeys.com/qed/sort.html
> Including source code.
>
ok.
my algos largely came from rough english descriptions of the algos, except
for the bubble sort which came from my common practice, namely wherever I
needed a simple sort and had a few variables to spare.
in other cases, I would often use an insertion or quicksort.
or, I had sometimes (but rarely) used the construction of a binary tree as a
kind of sorting (I trace down to where the value would go possibly splitting
leaves in some cases). afterwards the values would be sorted.
however, a big downside is that this requires allocating memory and having
structs to represent the tree, which is a big hassle imo.
but I am probably stupid anyways...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/13/2004 5:07:46 PM
|
|
"Willem" <willem@stack.nl> wrote in message
news:slrncmqlfj.22bn.willem@toad.stack.nl...
> cr88192 wrote:
> )> The last time I wrote a huffman (which was 15 years ago, granted) I
> used a
> )> heap (priority queue) instead of sorting, which was a major speedup.
> )>
> ) I went and looked up heap sort...
>
> I didn't say heap sort. I said heap.
> When you're doing huffman, you repeatedly have to get the smallest two
> values from a set, for which a heap is an excellent solution.
>
> What exactly are you sorting ?
>
first, I sort the gathered statistics into high-low order.
then, I sort the leaf values into ascending value order (to allow denser
representations of the stored tables).
however, in my case I skipped over the whole "binary tree" phase of the
algorithm, instead trying to directly compute sane bit depths from the
statistics (no need to fiddle with individual values once sorted).
err, apart from memory management I am not sure exactly what the term "heap"
is referring to here...
in my case I am not sure what I am doing is strictly huffman, given no
binary trees are used really at any point in the algo. binary trees though
just seemed like too much of a pain and too costly, so I took an easier
approach.
or something...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/13/2004 5:23:06 PM
|
|
"cr88192" <cr88192@remove.hotmail.com> writes:
> the fragment I was using for the test:
> void BubbleSort(int *stat, int cnt)
> {
> int i, j, k, b, c, t;
>
> c=1;
> k=0;
> while(c)
> {
> c=0;
>
> for(i=0; i<(cnt-1); i++)
> if(stat[i]<stat[i+1])
> {
> c++;
> k++;
>
> j=stat[i];
> stat[i]=stat[i+1];
> stat[i+1]=j;
> }
> for(i=(cnt-2); i>=0; i--)
> if(stat[i]<stat[i+1])
> {
> c++;
> k++;
>
> j=stat[i];
> stat[i]=stat[i+1];
> stat[i+1]=j;
> }
> }
> printf("bubble sort %d ops\n", k);
> }
Ah, OK, this is a bidirectional bubble sort with an early abort.
(if c=0 at the end of the outer loop, the array is sorted.)
> putting values in reverse order is not that much of a problem for the algo,
> and in my experience it has not been too horribly slow in general.
>
> for 16384 values, it takes about 4000 passes.
Yup, each pass is actually 2 passes, one forward, and one backward,
and the expected number of passes is 16384*.5 = 8192. So you'd
expect 4096 bidirectional passes.
> > Due to the reasons I mention above, bubble-sort will have O(n^2)
> > operations.
>
> worst case, but for the test I was using random values, which are more
> likely to test "average" case.
Ignoring the early-abort, and treating your outer loop as having 2
passes, on average, one would expect n passes, with n^2/2 comparisons
and n^2/2/2*3 moves (50% of comparisons cause 3 moves). Therefore
there will be n^2*5/4 operations on average.
However, n^2*100, n^2, and n^2/100 are all O(n^2).
Look up "Big-Oh" notation (or "big-O"). It simply indicates how the
function grows, but not a specific value.
f(n) is O(g(n)) if there exists a C and an N such that for all n>N f(n)<=C.g(n)
so 10^6*n^2 is O(n^2) as for C=10^-6, N=0 for all n>N f(n)<=C.g(n)
> and, the sort I was understanding to be a bubble sort was talking about half
> as many total operations as what I understood to be a heap sort on average.
A heap is O(n.lg(n)). For 4000 elements, lg(n)=12 and you should be
doing somewhere between 4000*12*2 and 4000*12*3 operations.
This is much less than 4000*4000*5/4.
> > Look at Paul Hseih's pages on sorting.
> > http://www.azillionmonkeys.com/qed/sort.html
> > Including source code.
>
> ok.
>
> my algos largely came from rough english descriptions of the algos, except
> for the bubble sort which came from my common practice, namely wherever I
> needed a simple sort and had a few variables to spare.
>
> in other cases, I would often use an insertion or quicksort.
> or, I had sometimes (but rarely) used the construction of a binary tree as a
> kind of sorting (I trace down to where the value would go possibly splitting
> leaves in some cases). afterwards the values would be sorted.
> however, a big downside is that this requires allocating memory and having
> structs to represent the tree, which is a big hassle imo.
Yeah, trees can be a hassle.
Heaps are much simpler than trees.
> but I am probably stupid anyways...
Not at all. You're asking all the right questions.
I wish I had a decent "all you wanted to know about sorting" URL
for you. I guess you're at the mercy of google, I'm sure there's
something out there. The above URL's OK if you already know the
basics, which you seem to.
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/13/2004 5:40:52 PM
|
|
"Willem" <willem@stack.nl> wrote in message
news:slrncmq5io.2s43.willem@toad.stack.nl...
> cr88192 wrote:
>
> The last time I wrote a huffman (which was 15 years ago, granted) I used a
> heap (priority queue) instead of sorting, which was a major speedup.
That is a mistake. If you sort the weights, then the Huffman algorithm can
be done in only O(n) time. J. van Leeuwen showed this in 1976 (article not
on-line).
One thing that is inefficient with the Huffman algorithm is that when you
combine 2 nodes to create a new tree it puts that new tree back into the
heap. The optimization is to not insert new trees into the heap but instead
put them into a separate FIFO. The trees that are created will always be
consumed to make new trees in the order that they are created. So there is
no reason to go through the extra steps to insert them into the heap.
Since the heap will only hold leaf nodes and there will be no insertions
into it, it can instead just be a sorted array. Thus the heap can be
eliminated entirely from the Huffman algorithm.
When choosing the next node you either choose a leaf node or the tree from
the front of the FIFO, whichever is lower (ties should favor the trees in
the FIFO to guarantee shallower trees see Schwartz 1964). Doing it this way
you never actually do an insertions into your heap during the algorithm.
The article I pointed to earlier at
http://www.cs.mu.oz.au/~alistair/abstracts/inplace.html (which unfortuanely
is not on-line except for the source code) is the best and fastest algorithm
for converting from a sorted set of weights to a set of code lengths. It is
done in-place with only O(1) extra space. If you can get a copy of the paper
I highly recommend reading it. If not I suggest studying the code that is
available there and see if you can understand it. I can answer any questions
about it as I know it very well and do have a copy of the paper. Or
alternatively just use the code without understanding it.
We can argue the merits of one sort algorithm over another as you guys
already have. The nice part of this approach here is that you can separate
the Huffman algorithm from ordering of the leaf nodes. This allows you to
use quicksort or whichever algorithm you favor instead of having the Huffman
algorithm force a heap on you.
|
|
0
|
|
|
|
Reply
|
Dale
|
10/13/2004 6:14:22 PM
|
|
"Brian Raiter" <blr@drizzle.com> wrote in message
news:cki2h0$knm$1@drizzle.com...
>
> > However, bubblesort should be struck from the history of computing
> > as an abysmally contrived way of trying to be as slow as possible.
>
> Hm. I would argue that ripple sort is more deserving of that
> distinction. (Of course, most people have never heard of ripple sort,
> so I suppose it actualy did work out that way.)
My favorite for slow sorting is the shuffle sort:
while( data is not in order )
shuffle the data;
Which is O(n!)
On this subject of slow algorithms, there is a funny, must-read paper on
"pessimal algorithms and simplexity analysis" (as opposed to optimal
algorithms and complexity analysis):
http://citeseer.ist.psu.edu/334813.html
It introduces the slowsort algorithm which is O(n^lg n)
http://c2.com/cgi/wiki?SlowSort
|
|
0
|
|
|
|
Reply
|
Dale
|
10/13/2004 6:43:18 PM
|
|
"Dale King" <dalewking@insightbb[dot]com.nospam> writes:
> "Willem" <willem@stack.nl> wrote in message
> news:slrncmq5io.2s43.willem@toad.stack.nl...
> > cr88192 wrote:
> >
> > The last time I wrote a huffman (which was 15 years ago, granted) I used a
> > heap (priority queue) instead of sorting, which was a major speedup.
>
> That is a mistake. If you sort the weights, then the Huffman algorithm can
> be done in only O(n) time. J. van Leeuwen showed this in 1976 (article not
> on-line).
But if you sort the weights using a comparison sort, then you're
in O(n.lg(n)) time anyway.
However, that's no reason to avoid replacing an O(n.lg(n)) component
with an O(n) component if it's easily done. The scheme you describe
sounds as if it should be within the grasp of anyone who understands
merge sorts, for example. (The new nodes are being merged with the
original ones, and as you say are pre-sorted by construction).
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/13/2004 6:57:02 PM
|
|
Dale wrote:
)> The last time I wrote a huffman (which was 15 years ago, granted) I used a
)> heap (priority queue) instead of sorting, which was a major speedup.
)
) That is a mistake. If you sort the weights, then the Huffman algorithm can
) be done in only O(n) time. J. van Leeuwen showed this in 1976 (article not
) on-line).
The only article I had to go on at that time was two pages in Dr.Dobbs
magazine that a friend gave me with the words: "Maybe you want to use this
for your project." (Note that the project was actually a 'demo' on a C=64.)
That, and the 'heap' structure that was taught in programming 101, lead to
how my algorithm worked. Ah, the olden days...
After that, I learned about arith coding, and never got back to huffman.
I'll read the paper, seems quite interesting.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
|
|
0
|
|
|
|
Reply
|
Willem
|
10/13/2004 7:00:38 PM
|
|
You might want to take a look at an Improved Comb Sort with pre-defined
gap table at...
http://world.std.com/~jdveale/combsort.htm
This is a little slower than a heap sort, but enjoys much of the
simplicity of a bubble sort.
Jim Veale
"cr88192" <cr88192@remove.hotmail.com> writes:
>"Brian Raiter" <blr@drizzle.com> wrote in message
>news:ckhfrn$pr$1@drizzle.com...
>>> I will argue that it is not exactly a fast operation though (possibly
>>> because I am using bubble sort,
>>
>> Bubble sort?! Why?? Does your religion forbid you from using the
>> qsort() function?
>>
>I am not fammiliar with qsort.
>also, I typically avoid using the c runtime for various reasons, maybe
>because on some level I still expect to be using my code on raw hardware
>again or such.
>anyways, bubble sort is fairly easy to write and often only involves a few
>extra variables...
>if performance demands I guess I could use something like quicksort or
>such...
|
|
0
|
|
|
|
Reply
|
James
|
10/13/2004 8:28:22 PM
|
|
> My favorite for slow sorting is the shuffle sort:
That one is perhaps more generally known among hackers as the bogo-sort:
http://www.catb.org/jargon/html/B/bogo-sort.html
b
|
|
0
|
|
|
|
Reply
|
blr
|
10/13/2004 8:28:45 PM
|
|
>> anyways, bubble sort is fairly easy to write and often only
>> involves a few extra variables... if performance demands I guess I
>> could use something like quicksort or such...
All I'm saying is that if your main reason for using bubble sort is
not performance, but that it's fairly easy to write, then using
qsort() is clearly the better choice.
b
|
|
0
|
|
|
|
Reply
|
blr
|
10/13/2004 8:35:08 PM
|
|
"cr88192" <cr88192@remove.hotmail.com> wrote in message news:<Djbbd.20503$pM5.15480@news.flashnewsgroups.com>...
> "cr88192" <cr88192@remove.hotmail.com> wrote in message
> news:8zabd.20489$pM5.20071@news.flashnewsgroups.com...
> >
> > "Willem" <willem@stack.nl> wrote in message
> > news:slrncmq5io.2s43.willem@toad.stack.nl...
> >> cr88192 wrote:
> >> )
> >> ) "Brian Raiter" <blr@drizzle.com> wrote in message
> >> ) news:ckhfrn$pr$1@drizzle.com...
> >> )>> I will argue that it is not exactly a fast operation though (possibly
> >> )>> because I am using bubble sort,
> >> )>
> >> )> Bubble sort?! Why?? Does your religion forbid you from using the
> >> )> qsort() function?
> >> )>
> >> ) I am not fammiliar with qsort.
> >> ) also, I typically avoid using the c runtime for various reasons, maybe
> >> ) because on some level I still expect to be using my code on raw
> >> hardware
> >> ) again or such.
> >> )
> >> ) anyways, bubble sort is fairly easy to write and often only involves a
> >> few
> >> ) extra variables...
> >> ) if performance demands I guess I could use something like quicksort or
> >> ) such...
> >>
> >> The last time I wrote a huffman (which was 15 years ago, granted) I used
> >> a
> >> heap (priority queue) instead of sorting, which was a major speedup.
> >>
> > I went and looked up heap sort...
> >
> > however, I just did a little checking, and as I understand it heap sort
> > would probably take far more operations for small data sets than bubble
> > sort, but would win in that as data sets become larger the number of steps
> > required for bubble sort would go up much faster?...
> >
> > as far as I understand it, the algo goes something like:
> > for(i=0; i<(PDHUFF_MAXCODES-1); i++)
> > {
> > b=-1;
> > c=-1;
> > for(j=i; j<PDHUFF_MAXCODES; j++)
> > {
> > if(stat[j]>c)
> > {
> > b=j;
> > c=stat[j];
> > }
> > }
> >
> > t=stat[i];
> > stat[i]=stat[b];
> > stat[b]=t;
> >
> > t=num[i];
> > num[i]=num[b];
> > num[b]=t;
> > }
> >
> > which for the statistics seems to take far more operations. for 4096 items
> > (many of which are 0 however), this requires about 8.4M operations
> > (measured from the innermost loop).
> > wheras, bubble sort takes about 32 passes for the data, or about 262k
> > operations.
> > this is looking pretty bad here.
> >
> > quicksort, however, would take about 49k operations afaik, making it a
> > winner here (just, for quicksort, I would need to allocate some memory and
> > have a seperate, recursive function, which is a hassle).
> >
> > with a set of 4096 random values, buble sort still has about half the
> > total operations. boosting it to 16384, it leans more in bubble sort's
> > favor (by a large margin), similar for smaller sets.
> >
> > (just as a misc test, I added ripple sort, which is ammusingly enough
> > getting lower operation counts for whatever reason than bubble sort, but
> > not signifigantly...).
> >
> > to me bubble sort still seems like a more sensible option.
> >
> > I don't really get it...
> >
> adding to the confusion, I added a timer, and even though heap sort takes
> about twice the operations, it takes somewhat less time (possibly related to
> the operations being simpler, eg, single loop iterations vs. swaps).
>
> bubble sort takes a little more time, and ripple sort takes about twice the
> time of bubble sort even though it typically has a slightly lower operations
> count.
>
> comparatively, quicksort is the fastest by a few orders of magnitude.
There's always IntroSort. Typically better overall than quicksort.
I have a very fast version at http://www.michael-maniscalco.com
- Michael
|
|
0
|
|
|
|
Reply
|
michael
|
10/13/2004 9:09:15 PM
|
|
michael@michael-maniscalco.com (michael) writes:
> There's always IntroSort. Typically better overall than quicksort.
> I have a very fast version at http://www.michael-maniscalco.com
Sounds interesting. It's a shame that your web page doesn't want to
display anything apart from your name in my browser. I already knew
your name, so I learnt nothing.
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/13/2004 9:45:34 PM
|
|
"michael" <michael@michael-maniscalco.com> wrote in message
news:8da60968.0410131309.8ba3598@posting.google.com...
> "cr88192" <cr88192@remove.hotmail.com> wrote in message
> news:<Djbbd.20503$pM5.15480@news.flashnewsgroups.com>...
>> "cr88192" <cr88192@remove.hotmail.com> wrote in message
>> news:8zabd.20489$pM5.20071@news.flashnewsgroups.com...
>> >
>> > "Willem" <willem@stack.nl> wrote in message
>> > news:slrncmq5io.2s43.willem@toad.stack.nl...
>> >> cr88192 wrote:
>> >> )
>> >> ) "Brian Raiter" <blr@drizzle.com> wrote in message
>> >> ) news:ckhfrn$pr$1@drizzle.com...
>> >> )>> I will argue that it is not exactly a fast operation though
>> >> (possibly
>> >> )>> because I am using bubble sort,
>> >> )>
>> >> )> Bubble sort?! Why?? Does your religion forbid you from using the
>> >> )> qsort() function?
>> >> )>
>> >> ) I am not fammiliar with qsort.
>> >> ) also, I typically avoid using the c runtime for various reasons,
>> >> maybe
>> >> ) because on some level I still expect to be using my code on raw
>> >> hardware
>> >> ) again or such.
>> >> )
>> >> ) anyways, bubble sort is fairly easy to write and often only involves
>> >> a
>> >> few
>> >> ) extra variables...
>> >> ) if performance demands I guess I could use something like quicksort
>> >> or
>> >> ) such...
>> >>
>> >> The last time I wrote a huffman (which was 15 years ago, granted) I
>> >> used
>> >> a
>> >> heap (priority queue) instead of sorting, which was a major speedup.
>> >>
>> > I went and looked up heap sort...
>> >
>> > however, I just did a little checking, and as I understand it heap sort
>> > would probably take far more operations for small data sets than bubble
>> > sort, but would win in that as data sets become larger the number of
>> > steps
>> > required for bubble sort would go up much faster?...
>> >
>> > as far as I understand it, the algo goes something like:
>> > for(i=0; i<(PDHUFF_MAXCODES-1); i++)
>> > {
>> > b=-1;
>> > c=-1;
>> > for(j=i; j<PDHUFF_MAXCODES; j++)
>> > {
>> > if(stat[j]>c)
>> > {
>> > b=j;
>> > c=stat[j];
>> > }
>> > }
>> >
>> > t=stat[i];
>> > stat[i]=stat[b];
>> > stat[b]=t;
>> >
>> > t=num[i];
>> > num[i]=num[b];
>> > num[b]=t;
>> > }
>> >
>> > which for the statistics seems to take far more operations. for 4096
>> > items
>> > (many of which are 0 however), this requires about 8.4M operations
>> > (measured from the innermost loop).
>> > wheras, bubble sort takes about 32 passes for the data, or about 262k
>> > operations.
>> > this is looking pretty bad here.
>> >
>> > quicksort, however, would take about 49k operations afaik, making it a
>> > winner here (just, for quicksort, I would need to allocate some memory
>> > and
>> > have a seperate, recursive function, which is a hassle).
>> >
>> > with a set of 4096 random values, buble sort still has about half the
>> > total operations. boosting it to 16384, it leans more in bubble sort's
>> > favor (by a large margin), similar for smaller sets.
>> >
>> > (just as a misc test, I added ripple sort, which is ammusingly enough
>> > getting lower operation counts for whatever reason than bubble sort,
>> > but
>> > not signifigantly...).
>> >
>> > to me bubble sort still seems like a more sensible option.
>> >
>> > I don't really get it...
>> >
>> adding to the confusion, I added a timer, and even though heap sort takes
>> about twice the operations, it takes somewhat less time (possibly related
>> to
>> the operations being simpler, eg, single loop iterations vs. swaps).
>>
>> bubble sort takes a little more time, and ripple sort takes about twice
>> the
>> time of bubble sort even though it typically has a slightly lower
>> operations
>> count.
>>
>> comparatively, quicksort is the fastest by a few orders of magnitude.
>
> There's always IntroSort. Typically better overall than quicksort.
> I have a very fast version at http://www.michael-maniscalco.com
>
ok.
> - Michael
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/14/2004 2:27:29 AM
|
|
"Phil Carmody" <thefatphil_demunged@yahoo.co.uk> wrote in message
news:87k6tums7v.fsf@nonospaz.fatphil.org...
> "cr88192" <cr88192@remove.hotmail.com> writes:
>> the fragment I was using for the test:
>> void BubbleSort(int *stat, int cnt)
>> {
>> int i, j, k, b, c, t;
>>
>> c=1;
>> k=0;
>> while(c)
>> {
>> c=0;
>>
>> for(i=0; i<(cnt-1); i++)
>> if(stat[i]<stat[i+1])
>> {
>> c++;
>> k++;
>>
>> j=stat[i];
>> stat[i]=stat[i+1];
>> stat[i+1]=j;
>> }
>> for(i=(cnt-2); i>=0; i--)
>> if(stat[i]<stat[i+1])
>> {
>> c++;
>> k++;
>>
>> j=stat[i];
>> stat[i]=stat[i+1];
>> stat[i+1]=j;
>> }
>> }
>> printf("bubble sort %d ops\n", k);
>> }
>
> Ah, OK, this is a bidirectional bubble sort with an early abort.
> (if c=0 at the end of the outer loop, the array is sorted.)
>
yes.
>> putting values in reverse order is not that much of a problem for the
>> algo,
>> and in my experience it has not been too horribly slow in general.
>>
>> for 16384 values, it takes about 4000 passes.
>
> Yup, each pass is actually 2 passes, one forward, and one backward,
> and the expected number of passes is 16384*.5 = 8192. So you'd
> expect 4096 bidirectional passes.
>
yes.
>> > Due to the reasons I mention above, bubble-sort will have O(n^2)
>> > operations.
>>
>> worst case, but for the test I was using random values, which are more
>> likely to test "average" case.
>
> Ignoring the early-abort, and treating your outer loop as having 2
> passes, on average, one would expect n passes, with n^2/2 comparisons
> and n^2/2/2*3 moves (50% of comparisons cause 3 moves). Therefore
> there will be n^2*5/4 operations on average.
>
> However, n^2*100, n^2, and n^2/100 are all O(n^2).
>
> Look up "Big-Oh" notation (or "big-O"). It simply indicates how the
> function grows, but not a specific value.
>
> f(n) is O(g(n)) if there exists a C and an N such that for all n>N
> f(n)<=C.g(n)
>
>
> so 10^6*n^2 is O(n^2) as for C=10^-6, N=0 for all n>N f(n)<=C.g(n)
>
ok.
>> and, the sort I was understanding to be a bubble sort was talking about
>> half
>> as many total operations as what I understood to be a heap sort on
>> average.
>
> A heap is O(n.lg(n)). For 4000 elements, lg(n)=12 and you should be
> doing somewhere between 4000*12*2 and 4000*12*3 operations.
> This is much less than 4000*4000*5/4.
>
it is likely what I was thinking was heap sort was something different.
I was just using the power of the search engine to write things based on
rough description...
>> > Look at Paul Hseih's pages on sorting.
>> > http://www.azillionmonkeys.com/qed/sort.html
>> > Including source code.
>>
>> ok.
>>
>> my algos largely came from rough english descriptions of the algos,
>> except
>> for the bubble sort which came from my common practice, namely wherever I
>> needed a simple sort and had a few variables to spare.
>>
>> in other cases, I would often use an insertion or quicksort.
>> or, I had sometimes (but rarely) used the construction of a binary tree
>> as a
>> kind of sorting (I trace down to where the value would go possibly
>> splitting
>> leaves in some cases). afterwards the values would be sorted.
>> however, a big downside is that this requires allocating memory and
>> having
>> structs to represent the tree, which is a big hassle imo.
>
> Yeah, trees can be a hassle.
> Heaps are much simpler than trees.
>
ok.
sorting is one of those issues that started comming up eventually, where I
started realizing I knew nothing about sorting...
I was then like "oh well, whatever" and little changed.
I would just have my preferred set of algos, which I would choose based on
speed, effort to be excerted, nature of the data being sorted, ...
I don't know much about the "in general" case.
actually, at first I was unsure of quicksort, but quicksort works pretty
good even though one needs extra passes to figure out the median value and
similar...
>> but I am probably stupid anyways...
>
> Not at all. You're asking all the right questions.
> I wish I had a decent "all you wanted to know about sorting" URL
> for you. I guess you're at the mercy of google, I'm sure there's
> something out there. The above URL's OK if you already know the
> basics, which you seem to.
>
ok.
> Phil
> --
> They no longer do my traditional winks tournament lunch - liver and bacon.
> It's just what you need during a winks tournament lunchtime to replace
> lost
> ... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/14/2004 2:48:09 AM
|
|
"Brian Raiter" <blr@drizzle.com> wrote in message
news:ckk3hs$3oa$1@drizzle.com...
>>> anyways, bubble sort is fairly easy to write and often only
>>> involves a few extra variables... if performance demands I guess I
>>> could use something like quicksort or such...
>
> All I'm saying is that if your main reason for using bubble sort is
> not performance, but that it's fairly easy to write, then using
> qsort() is clearly the better choice.
>
but, qsort requires writing a seperate function for the comparrison
operator. by this point I may as well just write a seperate function to do
quicksort and calling that...
similarly, using qsort implies an unnecessary dependency on the c runtime.
imo qsort would make more sense if c had lambdas and I felt more comfortable
about runtime dependencies...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/14/2004 3:23:46 AM
|
|
"James D. Veale" <jdveale@TheWorld.com> wrote in message
news:ckk356$oj$1@pcls4.std.com...
> You might want to take a look at an Improved Comb Sort with pre-defined
> gap table at...
>
> http://world.std.com/~jdveale/combsort.htm
>
> This is a little slower than a heap sort, but enjoys much of the
> simplicity of a bubble sort.
>
ok.
> Jim Veale
>
> "cr88192" <cr88192@remove.hotmail.com> writes:
>>"Brian Raiter" <blr@drizzle.com> wrote in message
>>news:ckhfrn$pr$1@drizzle.com...
>>>> I will argue that it is not exactly a fast operation though (possibly
>>>> because I am using bubble sort,
>>>
>>> Bubble sort?! Why?? Does your religion forbid you from using the
>>> qsort() function?
>>>
>>I am not fammiliar with qsort.
>>also, I typically avoid using the c runtime for various reasons, maybe
>>because on some level I still expect to be using my code on raw hardware
>>again or such.
>
>>anyways, bubble sort is fairly easy to write and often only involves a few
>>extra variables...
>>if performance demands I guess I could use something like quicksort or
>>such...
>
>
>
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/14/2004 3:32:14 AM
|
|
"cr88192" <cr88192@remove.hotmail.com> writes:
[SNIP _99_ lines of now-unnecesary context up to _7_ quotes deep]
> > There's always IntroSort. Typically better overall than quicksort.
> > I have a very fast version at http://www.michael-maniscalco.com
>
> ok.
For just 3 characters.
Sheesh, learn to quote.
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/14/2004 8:46:57 AM
|
|
"Phil Carmody" <thefatphil_demunged@yahoo.co.uk> wrote in message
news:87k6ttlm9q.fsf@nonospaz.fatphil.org...
> "cr88192" <cr88192@remove.hotmail.com> writes:
>
> [SNIP _99_ lines of now-unnecesary context up to _7_ quotes deep]
>
>> > There's always IntroSort. Typically better overall than quicksort.
>> > I have a very fast version at http://www.michael-maniscalco.com
>>
>> ok.
>
> For just 3 characters.
>
> Sheesh, learn to quote.
>
people are seeming harsh recently.
oh well, nothing new here...
> Phil
> --
> They no longer do my traditional winks tournament lunch - liver and bacon.
> It's just what you need during a winks tournament lunchtime to replace
> lost
> ... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/14/2004 9:17:01 AM
|
|
Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote in message news:<87wtxul2bl.fsf@nonospaz.fatphil.org>...
> michael@michael-maniscalco.com (michael) writes:
> > There's always IntroSort. Typically better overall than quicksort.
> > I have a very fast version at http://www.michael-maniscalco.com
>
> Sounds interesting. It's a shame that your web page doesn't want to
> display anything apart from your name in my browser. I already knew
> your name, so I learnt nothing.
>
> Phil
Hmm. The site is as plain vanilla as you can get to avoid these
problems. But I do use javascript to place the text etc. so perhaps
you have this disabled? Just a thought. It should work with
any browser but what browser are you using?
At any rate, just go right to the zip at
http://www.michael-maniscalco.com/introsort.zip
- Michael
|
|
0
|
|
|
|
Reply
|
michael
|
10/14/2004 2:22:19 PM
|
|
"Phil Carmody" <thefatphil_demunged@yahoo.co.uk> wrote in message
news:87brf6moox.fsf@nonospaz.fatphil.org...
> "Dale King" <dalewking@insightbb[dot]com.nospam> writes:
>
> > "Willem" <willem@stack.nl> wrote in message
> > news:slrncmq5io.2s43.willem@toad.stack.nl...
> > > cr88192 wrote:
> > >
> > > The last time I wrote a huffman (which was 15 years ago, granted) I
used a
> > > heap (priority queue) instead of sorting, which was a major speedup.
> >
> > That is a mistake. If you sort the weights, then the Huffman algorithm
can
> > be done in only O(n) time. J. van Leeuwen showed this in 1976 (article
not
> > on-line).
>
> But if you sort the weights using a comparison sort, then you're
> in O(n.lg(n)) time anyway.
I realize that overall it doesn't change the big O complexity (and meant to
say something about that in my post but I forgot), but as we both know big O
doesn't tell the whole story. This optimization reduces the number of steps,
but not the growth rate. In big-O terms you have reduced the constants. It
should always be a win over the standard Huffman algorithm implementation.
Of course this all assumes that the weights were unsorted. It may be that
previous steps could easily maintain the weights sorted without an explicit
sort.
> However, that's no reason to avoid replacing an O(n.lg(n)) component
> with an O(n) component if it's easily done. The scheme you describe
> sounds as if it should be within the grasp of anyone who understands
> merge sorts, for example. (The new nodes are being merged with the
> original ones, and as you say are pre-sorted by construction).
It is very simple. The issue is that it is not well known because the papers
that describe it are not available on-line anywhere. They are only in
proceedings from some conferences that is great for academics who have ready
access to such publications but stinks for the unwashed masses. I was
thinking this morning that I should put up a web page somewhere that
explains this optimized way to create Huffman trees so that it will be
available to everyone.
|
|
0
|
|
|
|
Reply
|
Dale
|
10/14/2004 2:56:37 PM
|
|
"Willem" <willem@stack.nl> wrote in message
news:slrncmquqm.4d6.willem@toad.stack.nl...
> Dale wrote:
> )> The last time I wrote a huffman (which was 15 years ago, granted) I
used a
> )> heap (priority queue) instead of sorting, which was a major speedup.
> )
> ) That is a mistake. If you sort the weights, then the Huffman algorithm
can
> ) be done in only O(n) time. J. van Leeuwen showed this in 1976 (article
not
> ) on-line).
>
> The only article I had to go on at that time was two pages in Dr.Dobbs
> magazine that a friend gave me with the words: "Maybe you want to use this
> for your project." (Note that the project was actually a 'demo' on a
C=64.)
>
> That, and the 'heap' structure that was taught in programming 101, lead to
> how my algorithm worked. Ah, the olden days...
The idea of using a FIFO for the root nodes is not much more complicated
than the just heap implementation. It really should replace the old way of
explaining how to do Huffman.
> After that, I learned about arith coding, and never got back to huffman.
Huffman definitely still has its use. When choosing an algorithm there are
several factors that have to be weighed against your requirements. Some
advantages to Huffman are:
- Faster to decompress
- Faster to compress
- Decompression code is very small
- Decompression only requires a small O(1) amount of working space.
In my case I used it as part of a compressor for text strings for an
embedded system with an 8 bit micro where the compressed data and the
decompressor both went into ROM and I had little RAM to work with for
decompression.
> I'll read the paper, seems quite interesting.
The in-place algorithm is quite interesting and quite clever. Definitely
worthwhile read.
|
|
0
|
|
|
|
Reply
|
Dale
|
10/14/2004 3:11:25 PM
|
|
michael@michael-maniscalco.com (michael) writes:
> Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote in message news:<87wtxul2bl.fsf@nonospaz.fatphil.org>...
> > michael@michael-maniscalco.com (michael) writes:
> > > There's always IntroSort. Typically better overall than quicksort.
> > > I have a very fast version at http://www.michael-maniscalco.com
> >
> > Sounds interesting. It's a shame that your web page doesn't want to
> > display anything apart from your name in my browser. I already knew
> > your name, so I learnt nothing.
> >
> > Phil
>
> Hmm. The site is as plain vanilla as you can get to avoid these
> problems. But I do use javascript to place the text etc. so perhaps
> you have this disabled?
Disabled? I use a browser that doesn't even support Javascript!
> Just a thought. It should work with
> any browser but what browser are you using?
Depends where I am. Usually w3m.
> At any rate, just go right to the zip at
> http://www.michael-maniscalco.com/introsort.zip
Much obliged.
Looks like a pretty good balance between the algorithms.
Why "Intro" sort?
(And why manage your own quicksort stack - why not let
the call stack do that?)
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/14/2004 4:06:08 PM
|
|
"Dale King" <dalewking@insightbb[dot]com.nospam> wrote in message
news:xGwbd.257411$3l3.37592@attbi_s03...
> "Willem" <willem@stack.nl> wrote in message
> news:slrncmquqm.4d6.willem@toad.stack.nl...
>
>> After that, I learned about arith coding, and never got back to huffman.
>
> Huffman definitely still has its use. When choosing an algorithm there are
> several factors that have to be weighed against your requirements. Some
> advantages to Huffman are:
>
> - Faster to decompress
> - Faster to compress
> - Decompression code is very small
> - Decompression only requires a small O(1) amount of working space.
>
in my case it is also a lot easier to get working...
maybe I am just stupid, but I found getting an arithmatic coder to work to
be somewhat harder than a huffman coder (due largely to seemingly trivial
math bugs).
maybe this is different in different approaches to arith coding, but then
patents seem to be running rampant...
I am unsure, just to me arith coding seems like it might be fairly sensitive
to minor variations which could make compatibility between different
implementations of an algo a little bit harder to achieve (eg, because both
ends need to keep in sync pretty well, minor things like round off or
variations in the handling of statistics could break things fairly
easily...).
or something...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/14/2004 5:08:33 PM
|
|
>> Sheesh, learn to quote.
>
> people are seeming harsh recently.
> oh well, nothing new here...
Please understand -- it gets a little frustrating for us having to
train every single newcomer in basic netiquette, over and over and
over. We get a little exasperated at times, simply because the job is
neverending and frequently thankless.
That said, your observation about people seeming harsh would have
carried more weight if you had done a better job of quoting. There's
really no need to include *other* people's signature blocks in your
post. Especially when they're larger than your own content -- I almost
didn't see your reply; it was buried in the quoted text.
b
|
|
0
|
|
|
|
Reply
|
blr
|
10/14/2004 9:23:08 PM
|
|
> but, qsort requires writing a seperate function for the comparrison
> operator. by this point I may as well just write a seperate function to do
> quicksort and calling that...
As long as we're talking about programmer time, that just doesn't add
up. The vast majority of callback functions are a single line of
code. I don't think I've ever written one that was more than three
lines long.
static int compare(void const *a, void const *b)
{
return *(int const*)a - *(int const*)b;
}
qsort(intarray, sizeof intarray / sizeof *intarray,
sizeof *intarray, compare);
This took me less than a minute to type, including the time to consult
the manpage to check the order of the parameters.
> similarly, using qsort implies an unnecessary dependency on the c
> runtime.
I suppose that would make sense if you were working on production code
for a 8-bit embedded processor. But even during the development phase,
it seems like you would be better off using qsort(), and then
replacing that with your own sort routine once you had working code.
> imo qsort would make more sense if c had lambdas and I felt more
> comfortable about runtime dependencies...
That's as may be, but don't chop off your nose to spite your face.
b
|
|
0
|
|
|
|
Reply
|
blr
|
10/14/2004 9:35:44 PM
|
|
"Brian Raiter" <blr@drizzle.com> wrote in message
news:ckmqns$ibl$1@drizzle.com...
>>> Sheesh, learn to quote.
>>
>> people are seeming harsh recently.
>> oh well, nothing new here...
>
> Please understand -- it gets a little frustrating for us having to
> train every single newcomer in basic netiquette, over and over and
> over. We get a little exasperated at times, simply because the job is
> neverending and frequently thankless.
>
> That said, your observation about people seeming harsh would have
> carried more weight if you had done a better job of quoting. There's
> really no need to include *other* people's signature blocks in your
> post. Especially when they're larger than your own content -- I almost
> didn't see your reply; it was buried in the quoted text.
>
hmm...
it is a mystery if how much to leave though...
but hell, at least I have apparently gone from the status of "troll" to
"poor netiquette".
other issues are rambling and going off onto different subjects
altogether...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/15/2004 2:41:48 AM
|
|
"Brian Raiter" <blr@drizzle.com> wrote in message
news:ckmrfg$4cg$1@drizzle.com...
>> but, qsort requires writing a seperate function for the comparrison
>> operator. by this point I may as well just write a seperate function to
>> do
>> quicksort and calling that...
>
> As long as we're talking about programmer time, that just doesn't add
> up. The vast majority of callback functions are a single line of
> code. I don't think I've ever written one that was more than three
> lines long.
>
> static int compare(void const *a, void const *b)
> {
> return *(int const*)a - *(int const*)b;
> }
>
> qsort(intarray, sizeof intarray / sizeof *intarray,
> sizeof *intarray, compare);
>
> This took me less than a minute to type, including the time to consult
> the manpage to check the order of the parameters.
>
hmm, but there is not that much difference.
maybe a minute to type the stub function, or maybe a few minutes to type a
version of quicksort.
I did that for my project, but it didn't work immediately as I hadn't really
noticed that the one I wrote put things in the wrong order (ascending order,
but I had meant descending). oh well, trivial to fix once I noticed.
I wrote the quicksort as I was getting a little tired about arguing why I
was using bubble sort. ok, bubble sort is still used for sorting the leaf
values, but hell, there aren't that many values (eg: <16 is common). now,
quicksort is used for the main statistics table.
>> similarly, using qsort implies an unnecessary dependency on the c
>> runtime.
>
> I suppose that would make sense if you were working on production code
> for a 8-bit embedded processor. But even during the development phase,
> it seems like you would be better off using qsort(), and then
> replacing that with your own sort routine once you had working code.
>
maybe.
I typically write working code right off.
hmm, maybe this is part of where that 130kloc came from?...
apart from occasional pruning and such I would probably have more (dunno
exactly).
I sort of miss os coding, but then there is the problem that with an os, no
one gives a crap either. oh well, so I was forced to userspace, that doesn't
mean I have to be a usual userspace coder, using api functions and 3rd party
libraries to such a degree that it makes building crap difficult apart from
using configure, which either just breaks or tells you that it wont build
without all these other libraries. crap, can't people even do their own
widget drawing?...
oh well, then I just get kinds of funky stuff, odd, gl based ui's, and I
have to have a stupid gl dependency as long as I want to use gl (I have
tried, wrapping gl is tiring and hinders performance). oh well, if I even
went back to os coding I would have to drop all my gl stuff.
could use sw rendering, but sw rendering is a pain and looks crappy if
performance is to not suck.
that was another thing that forced me back into userspace: hw accell.
vesa and similar just aren't the same...
>> imo qsort would make more sense if c had lambdas and I felt more
>> comfortable about runtime dependencies...
>
> That's as may be, but don't chop off your nose to spite your face.
>
?...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/15/2004 3:11:44 AM
|
|
"cr88192" <cr88192@remove.hotmail.com> writes:
> > That said, your observation about people seeming harsh would have
> > carried more weight if you had done a better job of quoting. There's
> > really no need to include *other* people's signature blocks in your
> > post. Especially when they're larger than your own content -- I almost
> > didn't see your reply; it was buried in the quoted text.
> >
> hmm...
>
> it is a mystery if how much to leave though...
> but hell, at least I have apparently gone from the status of "troll" to
> "poor netiquette".
>
> other issues are rambling and going off onto different subjects
> altogether...
Now you're learning what entropy is, this is comp.compression after all!
:-D
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/15/2004 6:30:52 AM
|
|
"cr88192" <cr88192@remove.hotmail.com> wrote in message news:<MNGbd.17$ZV5.14@news.flashnewsgroups.com>...
> it is a mystery if how much to leave though...
It's no mystery, you just quote what you are actually replying to --
like I just did.
> but hell, at least I have apparently gone from the status of "troll" to
> "poor netiquette".
True. But come on, you've been reading and posting here for over two
months, you should know this already.
> other issues are rambling and going off onto different subjects
> altogether...
Well, we can't help you with that :-)
|
|
0
|
|
|
|
Reply
|
trixter
|
10/15/2004 7:27:15 AM
|
|
"Phil Carmody" <thefatphil_demunged@yahoo.co.uk> wrote in message
news:87is9ch4rn.fsf@nonospaz.fatphil.org...
> "cr88192" <cr88192@remove.hotmail.com> writes:
>> hmm...
>>
>> it is a mystery if how much to leave though...
>> but hell, at least I have apparently gone from the status of "troll" to
>> "poor netiquette".
>>
>> other issues are rambling and going off onto different subjects
>> altogether...
>
>
> Now you're learning what entropy is, this is comp.compression after all!
> ok.
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/15/2004 7:41:29 AM
|
|
"Jim Leonard" <trixter@despammed.com> wrote in message
news:f93aad69.0410142327.dce6af3@posting.google.com...
> "cr88192" <cr88192@remove.hotmail.com> wrote in message
> news:<MNGbd.17$ZV5.14@news.flashnewsgroups.com>...
>> it is a mystery if how much to leave though...
>
> It's no mystery, you just quote what you are actually replying to --
> like I just did.
>
hmm, retaining only a single level may lose too much context in many cases.
I guess too much context could be annoying though.
>> but hell, at least I have apparently gone from the status of "troll" to
>> "poor netiquette".
>
> True. But come on, you've been reading and posting here for over two
> months, you should know this already.
>
I have been using usenet for a lot longer.
everything is not so clear.
it is like, knowing how often to post, doing posts on topic for a specific
group, avoiding going off on tangents somewhat irrelevant in a given group,
....
>> other issues are rambling and going off onto different subjects
>> altogether...
>
> Well, we can't help you with that :-)
ok.
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/15/2004 7:46:46 AM
|
|
"cr88192" <cr88192@remove.hotmail.com> writes:
> I have been using usenet for a lot longer.
> everything is not so clear.
Here at comp.compression we pride ourselves in doing
things cleanly, compactly, and correctly.
The rest of usenet can be whatever chaotic mess it
choses to be.
Some groups actually _prefer_ top posting, I've been
told. I wouldn't be seen dead in such a place myself.
Phil, straightening up his bow tie.
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/15/2004 9:01:14 AM
|
|
"Phil Carmody" <thefatphil_demunged@yahoo.co.uk> wrote in message
news:87y8i8fj8l.fsf@nonospaz.fatphil.org...
> "cr88192" <cr88192@remove.hotmail.com> writes:
>> I have been using usenet for a lot longer.
>> everything is not so clear.
>
> Here at comp.compression we pride ourselves in doing
> things cleanly, compactly, and correctly.
>
> The rest of usenet can be whatever chaotic mess it
> choses to be.
>
ok.
> Some groups actually _prefer_ top posting, I've been
> told. I wouldn't be seen dead in such a place myself.
>
dunno.
I did a simple adaptive huffman-like encoder.
performance seems acceptable, it is somewhat faster than the older
arithmatic encoder.
I haven't done any real speed testing, so I don't know how fast it is
exactly, or how it compares with the static huffman encoder I wrote
previously...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/15/2004 11:01:00 AM
|
|
Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote in message news:<87acupjndb.fsf@nonospaz.fatphil.org>...
> michael@michael-maniscalco.com (michael) writes:
> > Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote in message news:<87wtxul2bl.fsf@nonospaz.fatphil.org>...
> > > michael@michael-maniscalco.com (michael) writes:
> > > > There's always IntroSort. Typically better overall than quicksort.
> > > > I have a very fast version at http://www.michael-maniscalco.com
> > >
> > > Sounds interesting. It's a shame that your web page doesn't want to
> > > display anything apart from your name in my browser. I already knew
> > > your name, so I learnt nothing.
> > >
> > > Phil
> >
> > Hmm. The site is as plain vanilla as you can get to avoid these
> > problems. But I do use javascript to place the text etc. so perhaps
> > you have this disabled?
>
> Disabled? I use a browser that doesn't even support Javascript!
>
Ah. Well, that's the issue then. I was just lazy (or efficient, take
your
pick) and wrote a couple of JS functions to place the headers, links,
topic descriptions on the page. So your browser is obviously not
issuing
the writelns.
> > Just a thought. It should work with
> > any browser but what browser are you using?
>
> Depends where I am. Usually w3m.
>
> > At any rate, just go right to the zip at
> > http://www.michael-maniscalco.com/introsort.zip
>
> Much obliged.
>
> Looks like a pretty good balance between the algorithms.
> Why "Intro" sort?
Introspective sort. The page has a link to the paper by
David R. Musser at:
http://citeseer.ist.psu.edu/cache/papers/cs/27697/http:zSzzSzwww.cs.rpi.eduzSz~musserzSzgpzSzintrosort.pdf/musser97introspective.pdf
> (And why manage your own quicksort stack - why not let
> the call stack do that?)
More efficient to only recurse on one half of the partition.
- Michael
>
> Phil
|
|
0
|
|
|
|
Reply
|
TheThinker2004
|
10/15/2004 2:28:10 PM
|
|
Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote in message news:<87y8i8fj8l.fsf@nonospaz.fatphil.org>...
> "cr88192" <cr88192@remove.hotmail.com> writes:
> > I have been using usenet for a lot longer.
> > everything is not so clear.
>
> Here at comp.compression we pride ourselves in doing
> things cleanly, compactly, and correctly.
>
> The rest of usenet can be whatever chaotic mess it
> choses to be.
>
> Some groups actually _prefer_ top posting, I've been
> told. I wouldn't be seen dead in such a place myself.
>
>
> Phil, straightening up his bow tie.
I tend to top post when the response is to a large post
which is not specifically focused. Just so I don't have
to snip and so that my response is easy to find and read.
But that's just me.
Michael, noticing a pizza stain on his t-shirt. :^)
- Michael
|
|
0
|
|
|
|
Reply
|
TheThinker2004
|
10/15/2004 2:34:12 PM
|
|
TheThinker2004@yahoo.com (Thinker2004) writes:
> Introspective sort. The page has a link to the paper by
> David R. Musser at:
>
> http://citeseer.ist.psu.edu/cache/papers/cs/27697/http:zSzzSzwww.cs.rpi.eduzSz~musserzSzgpzSzintrosort.pdf/musser97introspective.pdf
Merci.
> > (And why manage your own quicksort stack - why not let
> > the call stack do that?)
>
> More efficient to only recurse on one half of the partition.
By how much? The stack frame is always in the cache, and the
return is more predictable than a loop condition, after all.
Oh, and have you never met a compiler that loops tail-recursions?
Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
|
|
0
|
|
|
|
Reply
|
Phil
|
10/15/2004 3:18:46 PM
|
|
Phil wrote:
)> > (And why manage your own quicksort stack - why not let
)> > the call stack do that?)
)>
)> More efficient to only recurse on one half of the partition.
)
) By how much? The stack frame is always in the cache, and the
) return is more predictable than a loop condition, after all.
)
) Oh, and have you never met a compiler that loops tail-recursions?
For that matter, you can very easily tail-recurse by hand.
Just set the appropriate vars and 'goto'.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
|
|
0
|
|
|
|
Reply
|
Willem
|
10/15/2004 7:17:35 PM
|
|
> I tend to top post when the response is to a large post
> which is not specifically focused.
But then, why quote anything of the post at all?
The point of quoting is to give immediate context for your words. If
you're not directly responding to a particular question or sentence,
then the quoted lines aren't supplying important context. So why
repeat it?
If people need to read the prior post for context, they can always go
and read it themselves. Unless you're responding to something that is
months old, the original post is still there.
> Just so I don't have to snip and so that my response is easy to find
> and read.
Don't forget that your post is going to be replicated thousands and
thousands of times. The ten seconds it takes to snip will save
megabytes of bandwidth in the end. That's why trimming is considered
good netiquette. Spend that bandwidth on conversation, not detritus.
b
|
|
0
|
|
|
|
Reply
|
blr
|
10/15/2004 7:43:41 PM
|
|
Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote in message news:<87mzyof1rd.fsf@nonospaz.fatphil.org>...
> TheThinker2004@yahoo.com (Thinker2004) writes:
> > Introspective sort. The page has a link to the paper by
> > David R. Musser at:
> >
> > http://citeseer.ist.psu.edu/cache/papers/cs/27697/http:zSzzSzwww.cs.rpi.eduzSz~musserzSzgpzSzintrosort.pdf/musser97introspective.pdf
>
> Merci.
>
> > > (And why manage your own quicksort stack - why not let
> > > the call stack do that?)
> >
> > More efficient to only recurse on one half of the partition.
>
> By how much?
Does it matter? Better is better. And in this case, it didn't
really add much complexity at all. It's probably a matter of
style really. At what point does adding complexity become more
of an issue than the efficiency it adds.
> The stack frame is always in the cache, and the
> return is more predictable than a loop condition, after all.
>
The pointers used (which would be pushed onto the stack) are also
on the call stack as local variables, but only once.
And a tight loop is more efficient than a bunch of push/pop and returns.
In this case, given that I know the max recursion depth before switching to
heap sort, there is a max depth of the stack. So why not use a
fixed size stack rather than pushing and popping from the call stack?
This is not a matter of massive savings overall but, this is an
optimized sort routine.
In this case, this sort is one of the core sorts in my suffix sorting
algorithm. And it is called very often, so it is as efficient as I
could make it.
- Michael
> Oh, and have you never met a compiler that loops tail-recursions?
>
> Phil
|
|
0
|
|
|
|
Reply
|
michael
|
10/17/2004 6:09:00 PM
|
|
blr@drizzle.com (Brian Raiter) wrote in message news:<ckp99d$9uh$1@drizzle.com>...
> > I tend to top post when the response is to a large post
> > which is not specifically focused.
>
> But then, why quote anything of the post at all?
>
> The point of quoting is to give immediate context for your words. If
> you're not directly responding to a particular question or sentence,
> then the quoted lines aren't supplying important context. So why
> repeat it?
>
> If people need to read the prior post for context, they can always go
> and read it themselves. Unless you're responding to something that is
> months old, the original post is still there.
>
> > Just so I don't have to snip and so that my response is easy to find
> > and read.
>
> Don't forget that your post is going to be replicated thousands and
> thousands of times. The ten seconds it takes to snip will save
> megabytes of bandwidth in the end. That's why trimming is considered
> good netiquette. Spend that bandwidth on conversation, not detritus.
>
> b
I tend to not snip on other news groups. Mostly the political ones.
In those cases, a lot of people are all over the map with their
responses. So you can't just snip since they never reach a point.
And you must provide context, else they get all offended and such.
Different groups, different rules ... I guess.
As far as bandwidth goes, any basic LZ will limit the burden of
duplicating large blocks of English text. Perhaps not on the first
page, but on the second etc. And I believe that is build into
every modem.
Besides, I have a fast connection so pffffft to that. :^)
- Michael
|
|
0
|
|
|
|
Reply
|
michael
|
10/18/2004 3:10:09 PM
|
|
"michael" <michael@michael-maniscalco.com> wrote in message
news:8da60968.0410180710.3b421182@posting.google.com...
> blr@drizzle.com (Brian Raiter) wrote in message
> news:<ckp99d$9uh$1@drizzle.com>...
>> > I tend to top post when the response is to a large post
>> > which is not specifically focused.
>>
>> But then, why quote anything of the post at all?
>>
>> The point of quoting is to give immediate context for your words. If
>> you're not directly responding to a particular question or sentence,
>> then the quoted lines aren't supplying important context. So why
>> repeat it?
>>
>> If people need to read the prior post for context, they can always go
>> and read it themselves. Unless you're responding to something that is
>> months old, the original post is still there.
>>
>> > Just so I don't have to snip and so that my response is easy to find
>> > and read.
>>
>> Don't forget that your post is going to be replicated thousands and
>> thousands of times. The ten seconds it takes to snip will save
>> megabytes of bandwidth in the end. That's why trimming is considered
>> good netiquette. Spend that bandwidth on conversation, not detritus.
>>
>> b
>
> I tend to not snip on other news groups. Mostly the political ones.
> In those cases, a lot of people are all over the map with their
> responses. So you can't just snip since they never reach a point.
> And you must provide context, else they get all offended and such.
> Different groups, different rules ... I guess.
>
> As far as bandwidth goes, any basic LZ will limit the burden of
> duplicating large blocks of English text. Perhaps not on the first
> page, but on the second etc. And I believe that is build into
> every modem.
>
> Besides, I have a fast connection so pffffft to that. :^)
>
sometimes I wish outlook would cache up to a few messages ahead to make
reading of large threads a little less annoying (eliminating those minor
1/2s delays or such when clicking on messages...).
connecting
authorizing
downloading message 'whatever'
disconnecting
and maybe it could try connecting a few times before making annoying error
popups ("can not connect, connection reset by peer, ... bunch of irrelevant
crap...").
at least I have my timeout set fairly small so on the occasions when it
times out at least it does it quickly so I can try again (as opposed to
being stuck there for 30s or whatever...).
but this is more an outlook issue than a message issue...
oh yeah, and also the local cable and dsl speed is 500kbps and 384kbps max
(lame, it was a lot faster where I was before). also, the phone company is
incompetent and was unable to reconnect the landline, forcing my parents
just to get a cell phone. the cable people took their time, but at least
came around eventually, and when they did suceeded in hooking up the cable
(yes, there is a difference, cable is above ground and a bus network, and
the phone is a bit different, but if the phone were like cable at least the
people could manage to fix it...).
or something...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/19/2004 1:20:40 AM
|
|
> Besides, I have a fast connection so pffffft to that. :^)
Very funny. It's not YOUR bandwidth I'm asking you to be considerate of.
And yes, there are local variants on posting habits, but almost any
netiquette guide will encourage you to not waste bandwidth
egregiously.
b
|
|
0
|
|
|
|
Reply
|
blr
|
10/19/2004 4:50:14 PM
|
|
blr@drizzle.com (Brian Raiter) wrote in message news:<cl3gk6$uqt$1@drizzle.com>...
> > Besides, I have a fast connection so pffffft to that. :^)
>
> Very funny. It's not YOUR bandwidth I'm asking you to be considerate of.
>
> And yes, there are local variants on posting habits, but almost any
> netiquette guide will encourage you to not waste bandwidth
> egregiously.
>
> b
Oh, come on. It's just a joke.
- M
|
|
0
|
|
|
|
Reply
|
michael
|
10/20/2004 1:29:17 PM
|
|
|
65 Replies
176 Views
(page loaded in 9.705 seconds)
Similiar Articles:7/19/2012 6:08:39 PM
|