I have a PDF which contains 1000 pages but I believe they were all scanned
pages which have been stored in the PDF as compressed graphics with a
/CCITTFaxDecode command. I would like to write some software to decode them
but I need some information on the CCITT T.6 specification to do it.
Specifically :-
a) I have the codes for V(0), VL(1), VR(1) and Pass but what are the
codes for VL(2), VL(3), VR(2) and VR(3)?
b) What other codes do I need to know about?
c) Do repetitive 1 bits indicate a series of blank scan lines?
I have searched many sites on the Internet but they are sparse on detail.
Can anyone help please?
Richard.
Posted by Richard, 11/10/2004 5:06:45 PM
"Richard Finn" <wrfinn(nospam)@ntlworld.com> wrote:
>I have a PDF which contains 1000 pages but I believe they were all scanned
>pages which have been stored in the PDF as compressed graphics with a
>/CCITTFaxDecode command. I would like to write some software to decode them
>but I need some information on the CCITT T.6 specification to do it.
You should be using the specification itself. Doesn't it have this
information?
Instead of the specification, I was able to implement a decoder using
a book which repeated all the necessary details:
http://www.amazon.com/exec/obidos/tg/detail/-/0471950726
This might be easier to order than CCITT or ISO standards.
----------------------------------------
Aandi Inston quite@dial.pipex.com http://www.quite.com
Please support usenet! Post replies and follow-ups, don't e-mail them.
Posted by quite, 11/10/2004 5:19:14 PM
Richard Finn wrote:
> I have a PDF which contains 1000 pages but I believe they were all scanned
> pages which have been stored in the PDF as compressed graphics with a
> /CCITTFaxDecode command. I would like to write some software to decode them
> but I need some information on the CCITT T.6 specification to do it.
There is already a lot of open source software that does this job. Just a
few pointers:
* libtiff (C)
* xpdf (C++, file Stream.cc)
* ghostscript (C)
* iText (Java)
For links:
https://rnvs.informatik.tu-chemnitz.de/twiki/bin/view/Main/FreePdfUtilitiesAndLibraries#Free_PDF_libraries_for_developer
> Specifically :-
>
> a) I have the codes for V(0), VL(1), VR(1) and Pass but what are the
> codes for VL(2), VL(3), VR(2) and VR(3)?
> b) What other codes do I need to know about?
> c) Do repetitive 1 bits indicate a series of blank scan lines?
>
> I have searched many sites on the Internet but they are sparse on detail.
>
> Can anyone help please?
There are a few hints and statements about CCITT compression in the TIFF
6 spec:
http://partners.adobe.com/asn/developer/pdfs/tn/TIFF6.pdf
The full spec was once publicly available at:
ftp://sunsite.doc.ic.ac.uk/computing/ccitt/ccitt-standards/1988/7_3_01.ps.gz
ftp://sunsite.doc.ic.ac.uk/computing/ccitt/ccitt-standards/1988/7_3_02.ps.gz
Maybe you can quote those filenames and ask on a Ghostscript or TIFF
developer list. Someone there may still have a copy on their hard disk.
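On question (a), here is a minimal sketch of the two-dimensional mode codes
as I recall them from the T.4/T.6 code table. Please check them against the
Recommendation itself, or against one of the decoders listed above, before
relying on them. The names MODE_CODES, read_mode and leading_modes are made
up for this illustration. It also bears on question (c): V(0) is the single
bit 1, and an all-white line coded against an all-white reference line needs
just one V(0), so a run of 1 bits at the start of the data is consistent with
a series of blank scan lines (although V(0) also occurs inside non-blank
lines wherever an edge lines up with the one on the line above).

```python
# Mode codes for two-dimensional (T.6 / Group 4) coding, as bit strings.
# Assumption: transcribed from memory of the T.4/T.6 table; verify before use.
MODE_CODES = {
    "1": "V0",
    "011": "VR1",
    "010": "VL1",
    "001": "H",         # horizontal mode: two run-length codes follow
    "0001": "P",        # pass mode
    "000011": "VR2",
    "000010": "VL2",
    "0000011": "VR3",
    "0000010": "VL3",
    "0000001": "EXT",   # extension prefix: three more bits follow
}

# End-of-line code; the end-of-facsimile-block marker is two of these in a row.
EOL = "000000000001"


def read_mode(bits, pos=0):
    """Return (mode_name, next_pos) for the code that starts at bits[pos]."""
    if bits.startswith(EOL, pos):
        return "EOL", pos + len(EOL)
    # The codes are prefix-free, so the first match is the only possible one.
    for length in range(1, 8):
        name = MODE_CODES.get(bits[pos:pos + length])
        if name is not None:
            return name, pos + length
    raise ValueError("no mode code at bit offset %d" % pos)


def leading_modes(bits):
    """Collect mode codes from the start of a bit string.

    Stops after H, EXT or EOL, because the bits that follow those are not
    mode codes (run lengths, extension bits, or the end of the data).
    """
    modes, pos = [], 0
    while pos < len(bits):
        name, pos = read_mode(bits, pos)
        modes.append(name)
        if name in ("H", "EXT", "EOL"):
            break
    return modes
```

For example, leading_modes("1111") gives four V0 entries, which is what four
blank lines at the top of a page would look like. A full decoder also needs
the white and black run-length tables for horizontal mode; those are the
tables the TIFF 6 spec reproduces for one-dimensional coding.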
Ralf
--
Ralf Koenig
Research associate at the
Chair of Computer Networks and Distributed Systems
TU Chemnitz, Zi. 1/B320, Tel. 0371-531-1532
Posted by Ralf, 11/10/2004 7:05:17 PM
Richard Finn wrote:
> I have a PDF which contains 1000 pages but I believe they were all scanned
> pages which have been stored in the PDF as compressed graphics with a
> /CCITTFaxDecode command. I would like to write some software to decode them
> but I need some information on the CCITT T.6 specification to do it.
>
> Specifically :-
>
> a) I have the codes for V(0), VL(1), VR(1) and Pass but what are the
> codes for VL(2), VL(3), VR(2) and VR(3)?
> b) What other codes do I need to know about?
> c) Do repetitive 1 bits indicate a series of blank scan lines?
>
> I have searched many sites on the Internet but they are sparse on detail.
Sorry, I forgot something.
If you are not keen on programming, but just on getting the page images
out of the PDF, here is the way to go:
Get xpdf. It includes the wonderful utility pdfimages, which extracts all
images from a PDF file *losslessly*. Let me say this again: pdfimages
does not rasterize or render the page, but really gets the embedded
images, with each pixel 100% identical to the one in the file.
$ pdfimages file.pdf image
creates: image0001.pbm, image0002.pbm, ..., which are exact copies (just
decompressed) of the raster images embedded inside the PDF.
After that, use ImageMagick or, even better, one of the fine image
converters from the libtiff tools to recompress the images with G4 encoding
and save disk space.
$ convert -compress Group4 image0001.pbm image0001.tif
You will probably want to wrap this command in a loop, of course. You can
even create a multi-page TIFF (all pages in one TIFF file), if you want that.
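A loop around the convert command could look like the following sketch in
Python. It assumes that ImageMagick's convert is on the PATH and that the
extracted files match image*.pbm in the given directory. The helper names
group4_commands and convert_all are hypothetical.

```python
import subprocess
from pathlib import Path


def group4_commands(directory=".", pattern="image*.pbm"):
    """Build one 'convert -compress Group4' command per extracted PBM file."""
    commands = []
    for pbm in sorted(Path(directory).glob(pattern)):
        tif = pbm.with_suffix(".tif")  # image0001.pbm -> image0001.tif
        commands.append(["convert", "-compress", "Group4", str(pbm), str(tif)])
    return commands


def convert_all(directory=".", pattern="image*.pbm"):
    """Run the commands one by one; stop at the first one that fails."""
    for command in group4_commands(directory, pattern):
        subprocess.run(command, check=True)


# Usage, from the directory that holds the pdfimages output:
#     convert_all(".")
```

Building the command lists separately from running them makes it easy to
print them first and check the file names before anything is written.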
Ralf
--
Ralf Koenig
Research associate at the
Chair of Computer Networks and Distributed Systems
TU Chemnitz, Zi. 1/B320, Tel. 0371-531-1532
Posted by Ralf, 11/10/2004 7:23:56 PM
Ralf,
I sent a reply but it didn't appear. Maybe I sent it to you directly rather
than to the group. You'll have to forgive me, I'm a newbie at this.
Thank you, Ralf, for all your assistance. Your suggestions have been very
helpful.
I will certainly look at the open source software that you suggest.
I've already had a look at the TIFF 6 spec. It helped me get this far but
doesn't provide the detail I was after.
I've posted a few requests for the T.6 spec on various newsgroups as you
suggested.
Although it is a real-life requirement for me I am using this problem as a
means of introducing myself to Delphi and PDFs, so while the open source
software will be helpful, I am interested in the programming problem for its
own sake.
Thanks again
Richard
Posted by Richard, 11/10/2004 10:46:01 PM