Hi Folks,
I'm doing some work where I store compressed program object code in
memory and decompress it as it's pulled into a computers instruction
cache. I need an efficient decompression technique to avoid degrading
the execution speed of the program being run too much. For this reason
I'd like to locate information about the decode efficiency/speed of
various adaptive dictionary-based algorithms, with a view towards
implementing them in hardware.
I've been unable to find information about hardware implementations of
lz77/lzw decompression, but have been told that lzw can be very slow to
run. Can anyone point me towards information on efficient hardware
decompression of lz77/lzw coded data, or indeed suggest alternative
algorithms that lend themselves to efficient hardware decoding?
Many thanks,
John Gilbert
|
|
0
|
|
|
|
Reply
|
gilbertj (1)
|
2/8/2006 11:13:09 AM |
|
John Gilbert wrote:
> I've been unable to find information about hardware implementations of
> lz77/lzw decompression, but have been told that lzw can be very slow to
> run. Can anyone point me towards information on efficient hardware
> decompression of lz77/lzw coded data, or indeed suggest alternative
> algorithms that lend themselves to efficient hardware decoding?
(BTW, you should post your question in comp.dsp or some other embedded
systems newsgroup for additional opinions)
In terms of decoding efficiency, LZ77 is fundamentally lower-resource
than LZ78 (LZW, etc.), but as with all things it depends on the
hardware. You didn't specify your hardware, so I can only speak
generally. Also note that while I'm going to describe LZ77, I don't
think it's the best choice for your implemenation (more later)
LZ77's best general-purpose implementation is LZSS, where you code
literals with a single bit. The general idea is this (assuming a
machineword size of 8 bits/1 byte):
Read bit
if set, read next eight bits from source and store to destination
if unset, read length of match, then offset of match from current
destination position and copy to destination
loop until no more input bits
Bitstream handling can be a pain if your instruction set doesn't grok
bits well, so you can group the literal bits into machineword-sized
chunks (byte, word, etc.) and this will allow you to machineword-align
the literals and the length/offset pairs.
All this being said, LZ77 is most likely NOT appropriate for a
continuous stream of data. This is because a sliding window needs to
be maintained to the previous decompression stream for the data; it
also operates in fundamentally larger "chunks" so buffering is
necessary.
LZW, Terry Welch's extension to LZ78, is probably what you're looking
for even though it will require bitstream handling. It acts as a true
"black box": You input uncompressed symbols at one end, and compressed
symbols come out the other end, and the decompressor reverses the same,
and no large window is necessary (you will need memory for the
dictionary but that rarely tops 4K). Welch designed LZW specifically
for embedded environments where streaming compression/decompression
needed to be part of the spec.
LZSS is faster, but LZW is what you're looking for. Hope that helps.
|
|
0
|
|
|
|
Reply
|
Jim
|
2/8/2006 6:49:10 PM
|
|
John Gilbert wrote:
> I've been unable to find information about hardware implementations of
> lz77/lzw decompression, but have been told that lzw can be very slow to
> run. Can anyone point me towards information on efficient hardware
> decompression of lz77/lzw coded data, or indeed suggest alternative
> algorithms that lend themselves to efficient hardware decoding?
(BTW, you should post your question in comp.dsp or some other embedded
systems newsgroup for additional opinions)
In terms of decoding efficiency, LZ77 is fundamentally lower-resource
than LZ78 (LZW, etc.), but as with all things it depends on the
hardware. You didn't specify your hardware, so I can only speak
generally. Also note that while I'm going to describe LZ77, I don't
think it's the best choice for your implemenation (more later)
LZ77's best general-purpose implementation is LZSS, where you code
literals with a single bit. The general idea is this (assuming a
machineword size of 8 bits/1 byte):
Read bit
if set, read next eight bits from source and store to destination
if unset, read length of match, then offset of match from current
destination position and copy to destination
loop until no more input bits
Bitstream handling can be a pain if your instruction set doesn't grok
bits well, so you can group the literal bits into machineword-sized
chunks (byte, word, etc.) and this will allow you to machineword-align
the literals and the length/offset pairs.
All this being said, LZ77 is most likely NOT appropriate for a
continuous stream of data. This is because a sliding window needs to
be maintained to the previous decompression stream for the data; it
also operates in fundamentally larger "chunks" so buffering is
necessary.
LZW, Terry Welch's extension to LZ78, is probably what you're looking
for even though it will require bitstream handling. It acts as a true
"black box": You input uncompressed symbols at one end, and compressed
symbols come out the other end, and the decompressor reverses the same,
and no large window is necessary (you will need memory for the
dictionary but that rarely tops 4K). Welch designed LZW specifically
for embedded environments where streaming compression/decompression
needed to be part of the spec.
LZSS is faster, but LZW is what you're looking for. Hope that helps.
|
|
0
|
|
|
|
Reply
|
Jim
|
2/8/2006 7:03:30 PM
|
|
John Gilbert wrote:
> I've been unable to find information about hardware implementations of
> lz77/lzw decompression, but have been told that lzw can be very slow to
> run. Can anyone point me towards information on efficient hardware
> decompression of lz77/lzw coded data, or indeed suggest alternative
> algorithms that lend themselves to efficient hardware decoding?
(BTW, you should post your question in comp.dsp or some other embedded
systems newsgroup for additional opinions)
In terms of decoding efficiency, LZ77 is fundamentally lower-resource
than LZ78 (LZW, etc.), but as with all things it depends on the
hardware. You didn't specify your hardware, so I can only speak
generally. Also note that while I'm going to describe LZ77, I don't
think it's the best choice for your implemenation (more later)
LZ77's best general-purpose implementation is LZSS, where you code
literals with a single bit. The general idea is this (assuming a
machineword size of 8 bits/1 byte):
Read bit
if set, read next eight bits from source and store to destination
if unset, read length of match, then offset of match from current
destination position and copy to destination
loop until no more input bits
Bitstream handling can be a pain if your instruction set doesn't grok
bits well, so you can group the literal bits into machineword-sized
chunks (byte, word, etc.) and this will allow you to machineword-align
the literals and the length/offset pairs.
All this being said, LZ77 is most likely NOT appropriate for a
continuous stream of data. This is because a sliding window needs to
be maintained to the previous decompression stream for the data; it
also operates in fundamentally larger "chunks" so buffering is
necessary.
LZW, Terry Welch's extension to LZ78, is probably what you're looking
for even though it will require bitstream handling. It acts as a true
"black box": You input uncompressed symbols at one end, and compressed
symbols come out the other end, and the decompressor reverses the same,
and no large window is necessary (you will need memory for the
dictionary but that rarely tops 4K). Welch designed LZW specifically
for embedded environments where streaming compression/decompression
needed to be part of the spec.
LZSS is faster, but LZW is what you're looking for. Hope that helps.
|
|
0
|
|
|
|
Reply
|
Jim
|
2/8/2006 7:04:58 PM
|
|
John Gilbert wrote:
> I've been unable to find information about hardware implementations of
> lz77/lzw decompression, but have been told that lzw can be very slow to
> run. Can anyone point me towards information on efficient hardware
> decompression of lz77/lzw coded data, or indeed suggest alternative
> algorithms that lend themselves to efficient hardware decoding?
(BTW, you should post your question in comp.dsp or some other embedded
systems newsgroup for additional opinions)
In terms of decoding efficiency, LZ77 is fundamentally lower-resource
than LZ78 (LZW, etc.), but as with all things it depends on the
hardware. You didn't specify your hardware, so I can only speak
generally. Also note that while I'm going to describe LZ77, I don't
think it's the best choice for your implemenation (more later)
LZ77's best general-purpose implementation is LZSS, where you code
literals with a single bit. The general idea is this (assuming a
machineword size of 8 bits/1 byte):
Read bit
if set, read next eight bits from source and store to destination
if unset, read length of match, then offset of match from current
destination position and copy to destination
loop until no more input bits
Bitstream handling can be a pain if your instruction set doesn't grok
bits well, so you can group the literal bits into machineword-sized
chunks (byte, word, etc.) and this will allow you to machineword-align
the literals and the length/offset pairs.
All this being said, LZ77 is most likely NOT appropriate for a
continuous stream of data. This is because a sliding window needs to
be maintained to the previous decompression stream for the data; it
also operates in fundamentally larger "chunks" so buffering is
necessary.
LZW, Terry Welch's extension to LZ78, is probably what you're looking
for even though it will require bitstream handling. It acts as a true
"black box": You input uncompressed symbols at one end, and compressed
symbols come out the other end, and the decompressor reverses the same,
and no large window is necessary (you will need memory for the
dictionary but that rarely tops 4K). Welch designed LZW specifically
for embedded environments where streaming compression/decompression
needed to be part of the spec.
LZSS is faster, but LZW is what you're looking for. Hope that helps.
|
|
0
|
|
|
|
Reply
|
Jim
|
2/8/2006 7:07:31 PM
|
|
John Gilbert wrote:
> I've been unable to find information about hardware implementations of
> lz77/lzw decompression, but have been told that lzw can be very slow to
> run. Can anyone point me towards information on efficient hardware
> decompression of lz77/lzw coded data, or indeed suggest alternative
> algorithms that lend themselves to efficient hardware decoding?
(BTW, you should post your question in comp.dsp or some other embedded
systems newsgroup for additional opinions)
In terms of decoding efficiency, LZ77 is fundamentally lower-resource
than LZ78 (LZW, etc.), but as with all things it depends on the
hardware. You didn't specify your hardware, so I can only speak
generally. Also note that while I'm going to describe LZ77, I don't
think it's the best choice for your implemenation (more later)
LZ77's best general-purpose implementation is LZSS, where you code
literals with a single bit. The general idea is this (assuming a
machineword size of 8 bits/1 byte):
Read bit
if set, read next eight bits from source and store to destination
if unset, read length of match, then offset of match from current
destination position and copy to destination
loop until no more input bits
Bitstream handling can be a pain if your instruction set doesn't grok
bits well, so you can group the literal bits into machineword-sized
chunks (byte, word, etc.) and this will allow you to machineword-align
the literals and the length/offset pairs.
All this being said, LZ77 is most likely NOT appropriate for a
continuous stream of data. This is because a sliding window needs to
be maintained to the previous decompression stream for the data; it
also operates in fundamentally larger "chunks" so buffering is
necessary.
LZW, Terry Welch's extension to LZ78, is probably what you're looking
for even though it will require bitstream handling. It acts as a true
"black box": You input uncompressed symbols at one end, and compressed
symbols come out the other end, and the decompressor reverses the same,
and no large window is necessary (you will need memory for the
dictionary but that rarely tops 4K). Welch designed LZW specifically
for embedded environments where streaming compression/decompression
needed to be part of the spec.
LZSS is faster, but LZW is what you're looking for. Hope that helps.
|
|
0
|
|
|
|
Reply
|
Jim
|
2/8/2006 7:10:32 PM
|
|
John Gilbert wrote:
> I've been unable to find information about hardware implementations of
> lz77/lzw decompression, but have been told that lzw can be very slow to
> run. Can anyone point me towards information on efficient hardware
> decompression of lz77/lzw coded data, or indeed suggest alternative
> algorithms that lend themselves to efficient hardware decoding?
I don't see how adaptive dictionary based coders suit the application at
all.
How would you even apply LZ77/LZW? You can't perform random access in
any meaningful way. A block based coder like LZO could work in theory,
but it's worst case performance (when code has low locality) would be
atrocious.
Just search the ACM digital library for "instruction compression"
(without the quotation marks). That should give you some idea of what to
look for specifically.
Marco
|
|
0
|
|
|
|
Reply
|
Marco
|
2/8/2006 8:20:21 PM
|
|
Hi Marco,
>
> I don't see how adaptive dictionary based coders suit the application at
> all.
>
> How would you even apply LZ77/LZW? You can't perform random access in
> any meaningful way. [...]
>
This work is part of my MSc thesis. I've identified a scheme that
allows 'random' access decompression of program instructions using a
modification to adaptive dictionary-based compression algorithms. So
far I've been using a software simulation of my scheme (with LZW as the
underlying compression algorithm), but now need to look at how to
realize the decompressor in hardware.
Unfortunately I haven't had much luck finding information on the
hardware implementation of decoders for 'traditional' adaptive
dictionary based algorithms, which I was hoping to extend/modify to
accommodate my scheme.
John G.
|
|
0
|
|
|
|
Reply
|
John
|
2/8/2006 11:13:07 PM
|
|
John Gilbert wrote:
> This work is part of my MSc thesis. I've identified a scheme that
> allows 'random' access decompression of program instructions using a
> modification to adaptive dictionary-based compression algorithms.
Can't really help you except suggesting some google searches ...
lzw (vhdl OR verilog)
lz99 (vhdl OR verilog)
lempel-ziv (vhdl OR verilog)
Try it with both web and newsgroup search, the second search on the web
has vhdl code on the first hit, and the first has on the second page a
paper on using LZW for code compression for a VLIW processor. Non
adaptively of course ...
How exactly do you plan to be locally adaptive and have fast random
access? They are mutually exclusive, to be locally adaptive you always
have to read more than just the encoded symbol.
Marco
|
|
0
|
|
|
|
Reply
|
Marco
|
2/9/2006 12:24:41 AM
|
|
|
8 Replies
344 Views
(page loaded in 4.467 seconds)
|