Hello,
I'm trying to find a Ruby module or code that can decompress
LZW-compressed streams from a PDF file. However, as far as I know, no
such code or module exists publicly.
PDF usually compresses or encodes its stream data using FlateDecode,
ASCIIHexDecode, ASCII85Decode, and LZWDecode. In Ruby, FlateDecode and
ASCII85Decode can be handled with the existing zlib and Ascii85
modules. For ASCIIHexDecode, I just need to convert pairs of hex
digits to bytes. My problem arises with LZWDecode, since there is
no module or code to decompress it.
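As a rough illustration of the two filters that need nothing beyond the standard library, the following sketch shows one way ASCIIHexDecode and FlateDecode could be handled. The method names `ascii_hex_decode` and `flate_decode` are made up for this example, and the code assumes the raw stream bytes have already been extracted from the PDF and that no /DecodeParms (such as a predictor) apply.

```ruby
require "zlib"

# ASCIIHexDecode: pairs of hex digits, whitespace ignored, ">" ends the data.
# (Hypothetical helper name, not part of any library.)
def ascii_hex_decode(data)
  hex = data[/\A[^>]*/].gsub(/\s+/, "")   # keep everything before ">", drop whitespace
  hex << "0" if hex.length.odd?           # a missing final digit is treated as 0
  [hex].pack("H*")                        # hex digits -> binary string
end

# FlateDecode: the stream is an ordinary zlib stream.
# (Hypothetical helper name; predictors are not handled here.)
def flate_decode(data)
  Zlib::Inflate.inflate(data)
end

hex_result   = ascii_hex_decode("48 65 6C6C\n6F>")
flate_result = flate_decode(Zlib::Deflate.deflate("stream data"))
```

Here `hex_result` is "Hello" and `flate_result` round-trips back to "stream data".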
Since I couldn't find an example of LZW decompression in
Ruby, I took an implementation written in Python. However,
translating Python into Ruby has turned out to be a painful process.
A working example of LZW decompression in Python is here:
http://pastebin.ca/1849009
My translated code in Ruby is here: http://pastebin.ca/1849012
With a small input, my Ruby code produces the same output
as the Python code.
e.g.:
Python
data = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01"
tmp = LZWDecode(data)
print tmp
Ruby
data = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01"
lzw = LZWDecoder.new(data)
puts lzw.run()
However, with a real stream from a PDF file, I cannot get the decompressed
output. I guess there might be an error in the code, or improper handling
of special characters in Ruby.
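For reference, a compact LZW decoder in Ruby might look roughly like the sketch below. It is not the code from either pastebin link; `lzw_decode` is a name invented for this example. It assumes the usual PDF conventions: codes packed most significant bit first, code width growing from 9 to 12 bits, 256 as the clear-table code, 257 as end-of-data, and the default EarlyChange value of 1. Any /DecodeParms on a real stream (a different EarlyChange, or a /Predictor) are ignored here and would need extra handling.

```ruby
# Sketch of a PDF-style LZW decoder (assumptions listed in the text above).
def lzw_decode(data)
  # Codes 0..255 are single bytes; 256 and 257 are reserved placeholders.
  new_table = lambda { (0..255).map { |i| [i].pack("C") } + [nil, nil] }
  table = new_table.call
  nbits = 9          # current code width in bits
  prev = nil         # previously emitted byte sequence
  out = "".b         # binary output buffer
  bitbuf = 0         # unconsumed bits
  bitcount = 0       # how many bits in bitbuf are valid

  data.each_byte do |byte|
    bitbuf = (bitbuf << 8) | byte
    bitcount += 8
    while bitcount >= nbits
      bitcount -= nbits
      code = bitbuf >> bitcount
      bitbuf &= (1 << bitcount) - 1

      if code == 256               # clear table
        table = new_table.call
        nbits = 9
        prev = nil
        next
      elsif code == 257            # end of data
        return out
      end

      if prev.nil?
        entry = table[code] or raise ArgumentError, "bad first code #{code}"
      elsif code < table.size
        entry = table[code]
        table << prev + entry[0, 1]
      elsif code == table.size     # code refers to the entry being built
        entry = prev + prev[0, 1]
        table << entry
      else
        raise ArgumentError, "corrupt LZW data (code #{code})"
      end

      # EarlyChange = 1: widen one code before the table actually fills.
      case table.size
      when 511  then nbits = 10
      when 1023 then nbits = 11
      when 2047 then nbits = 12
      end

      out << entry
      prev = entry
    end
  end
  out
end

sample = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01".b
decoded = lzw_decode(sample)   # => "-----A---B"
```

Working through the nine sample bytes by hand gives the codes 256, 45, 258, 258, 65, 259, 66, 257, which this sketch expands to "-----A---B". Iterating with `each_byte` sidesteps character-encoding questions, because it always yields integers regardless of the string's encoding.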
I've spent many hours, even days, working out how LZW decompression
works and trying to translate the Python code to Ruby, but my efforts
so far haven't paid off. I'd really appreciate it if someone could
point me to a hint or a solution for this problem.
Thank you
--
Posted via http://www.ruby-forum.com/.
ahmad.azizan (13)
3/22/2010 7:32:17 AM
On Mar 22, 2010, at 00:32 , Ahmad Azizan wrote:
> With a small input, I can decompress it to get the equivalent output
> like the python code.
> e.g:
> Python
> data = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01"
> tmp = LZWDecode(data)
> print tmp
>
> data = "\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01"
> lzw = LZWDecoder.new(data)
> puts lzw.run()
>
> However, with a real stream from PDF file, I cannot get the decompressed
> output. I guess it might be some error in the code or improper handling
> of special character in ruby.
Can you get the Python code to decode the real stream? That'd be one way
to determine whether the original data is corrupt or not.
Ryan
3/22/2010 8:20:20 AM
> Example of working LZW decompression in python is here:
> http://pastebin.ca/1849009
> My translated code in ruby is here: http://pastebin.ca/1849012
Which version of ruby are you using? If it's 1.9 then your @fp[@inc] may
fall foul of the character encoding rules. Try this in your initialize:
puts @fp.encoding
@fp.force_encoding("ASCII-8BIT")
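To make the encoding pitfall concrete, here is a small sketch of how indexing behaves on Ruby 1.9 and later. The two-byte string is just an arbitrary example, not taken from the PDF stream in question; it shows that `[]` works on characters while `getbyte` and a binary-tagged string work on bytes.

```ruby
# Two bytes that happen to form a single UTF-8 character.
s = "\xC3\xA9".dup.force_encoding("UTF-8")

utf8_length = s.length       # 1: counted in characters
utf8_bytes  = s.bytesize     # 2: counted in bytes
first_byte  = s.getbyte(0)   # 0xC3: getbyte ignores the encoding

# After tagging the same bytes as binary, indexing is per byte.
b = s.dup.force_encoding("ASCII-8BIT")
bin_length = b.length        # 2
bin_first  = b[0].ord        # 0xC3
```

So a decoder that walks the input with `@fp[@inc]` only sees one byte per step if the string is tagged as binary; `getbyte` or `each_byte` avoid the question entirely.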
However, if you pass in a StringIO rather than a String, then you can just
copy what the Python code is doing:
x = @fp.read(1)
@buff = x.unpack("C").first
read(1) always reads a single byte (and returns nil at end of input).
This has the advantage of being able to decompress directly from files,
without reading them into RAM first.
Minor suggestion: it might be more rubyish to return nil rather than
raise EOFError, which would simplify your run loop to
result = ""
while code = readbits(@nbits)
result << feed(code)
end
return result
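A bit reader that fits this nil-at-end-of-input style could be sketched as follows. The class name `BitReader` is invented for the example, and the instance variable names only loosely follow the ones mentioned in this thread; the actual pastebin code may be organised differently. It reads from any object that responds to `getbyte`, such as a StringIO or a File opened in binary mode.

```ruby
require "stringio"

# Hypothetical helper: pulls n-bit codes (most significant bit first)
# from an IO-like object, returning nil when the input is exhausted.
class BitReader
  def initialize(io)
    @io = io
    @buff = 0    # bits read but not yet consumed
    @bpos = 0    # number of valid bits held in @buff
  end

  def readbits(nbits)
    while @bpos < nbits
      byte = @io.getbyte          # Integer 0..255, or nil at end of input
      return nil if byte.nil?     # trailing bits shorter than a code are dropped
      @buff = (@buff << 8) | byte
      @bpos += 8
    end
    @bpos -= nbits
    code = @buff >> @bpos
    @buff &= (1 << @bpos) - 1     # keep only the unconsumed bits
    code
  end
end

reader = BitReader.new(StringIO.new("\x80\x0b\x60\x50\x22\x0c\x0c\x85\x01".b))
codes = []
while (code = reader.readbits(9))
  codes << code
end
# codes => [256, 45, 258, 258, 65, 259, 66, 257]
```

With the sample input from the original post, every code is 9 bits wide, so a fixed width is enough for this demonstration; a real decoder would pass its current code width on each call.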
Regards,
Brian.
Brian
3/22/2010 9:31:01 AM