CCITTFaxDecode


I have a PDF which contains 1000 pages but I believe they were all scanned
pages which have been stored in the PDF as compressed graphics with a
/CCITTFaxDecode command. I would like to write some software to decode them
but I need some information on the CCITT T.6 specification to do it.

Specifically :-

a)    I have the codes for V(0), VL(1), VR(1) and Pass but what are the
codes for VL(2), VL(3), VR(2) and VR(3)?
b)    What other codes do I need to know about?
c)    Do repetitive 1 bits indicate a series of blank scan lines?

I have searched many sites on the Internet but they are sparse on detail.

Can anyone help please?

Richard.


Reply Richard 11/10/2004 5:06:45 PM

"Richard Finn" <wrfinn(nospam)@ntlworld.com> wrote:

>I have a PDF which contains 1000 pages but I believe they were all scanned
>pages which have been stored in the PDF as compressed graphics with a
>/CCITTFaxDecode command. I would like to write some software to decode them
>but I need some information on the CCITT T.6 specification to do it.

You should be using the specification itself. Doesn't it have this
information?

Instead of the specification, I was able to implement a decoder using
a book which repeated all the necessary details:
http://www.amazon.com/exec/obidos/tg/detail/-/0471950726
This might be easier to order than CCITT or ISO standards.

----------------------------------------
Aandi Inston  quite@dial.pipex.com http://www.quite.com
Please support usenet! Post replies and follow-ups, don't e-mail them.

Reply quite 11/10/2004 5:19:14 PM


Richard Finn wrote:
> I have a PDF which contains 1000 pages but I believe they were all scanned
> pages which have been stored in the PDF as compressed graphics with a
> /CCITTFaxDecode command. I would like to write some software to decode them
> but I need some information on the CCITT T.6 specification to do it.

There is already plenty of open source software that does this job. Just a
few pointers:

* libtiff (C)
* xpdf (C++, file Stream.cc)
* ghostscript (C)
* iText (Java)

For links:
https://rnvs.informatik.tu-chemnitz.de/twiki/bin/view/Main/FreePdfUtilitiesAndLibraries#Free_PDF_libraries_for_developer

> Specifically :-
> 
> a)    I have the codes for V(0), VL(1), VR(1) and Pass but what are the
> codes for VL(2), VL(3), VR(2) and VR(3).
> b)    What other codes do I need to know about?
> c)    Do repetitive 1 bits indicate a series of blank scan lines?
> 
> I have searched many sites on the Internet but they are sparse on detail.
> 
> Can anyone help please?

There are a few hints and statements about CCITT compression in the TIFF 
6 spec:
http://partners.adobe.com/asn/developer/pdfs/tn/TIFF6.pdf

The full spec was once publicly available at:
ftp://sunsite.doc.ic.ac.uk/computing/ccitt/ccitt-standards/1988/7_3_01.ps.gz
ftp://sunsite.doc.ic.ac.uk/computing/ccitt/ccitt-standards/1988/7_3_02.ps.gz

Maybe you can use the filenames and ask on a Ghostscript or TIFF
developer list. Someone may still have a copy on a hard disk somewhere.
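
For (a), here is a minimal sketch of the two-dimensional mode codes as they
are usually reproduced from the T.4/T.6 tables, written as a shell prefix
lookup over a string of '0'/'1' characters. The function name mode_of and the
"name length" output format are made up for illustration; please check the
bit patterns against the spec before relying on them.

```shell
#!/bin/sh
# Sketch: classify the next T.6 mode code at the start of a bit string.
# Prints the mode name and the number of bits the code consumes.
# mode_of is a hypothetical helper name, not part of any library.
mode_of() {
  case "$1" in
    1*)       echo "V0 1" ;;    # vertical, a1 directly under b1
    011*)     echo "VR1 3" ;;   # vertical, a1 one pixel right of b1
    010*)     echo "VL1 3" ;;   # vertical, a1 one pixel left of b1
    001*)     echo "H 3" ;;     # horizontal: two run-length codes follow
    0001*)    echo "P 4" ;;     # pass
    000011*)  echo "VR2 6" ;;
    000010*)  echo "VL2 6" ;;
    0000011*) echo "VR3 7" ;;
    0000010*) echo "VL3 7" ;;
    0000001*) echo "EXT 7" ;;   # extension prefix, three more bits follow
    *)        echo "UNKNOWN 0" ;;
  esac
}

mode_of 0000101   # prints "VL2 6"
```

For (b), beyond these you would also need the T.4 terminating and make-up
run-length tables used after a horizontal code, and the end-of-block marker
(two consecutive 000000000001 codes). For (c), a single 1 bit is V(0), so an
all-white line below an all-white line costs one bit and a run of 1 bits can
be a series of blank lines, but it can equally be several transitions that
line up with the line above.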

Ralf

-- 
Ralf Koenig
Wissenschaftlicher Mitarbeiter an der
Professur Rechnernetze und verteilte Systeme
TU Chemnitz, Zi. 1/B320, Tel. 0371-531-1532

Reply Ralf 11/10/2004 7:05:17 PM

Richard Finn wrote:
> I have a PDF which contains 1000 pages but I believe they were all scanned
> pages which have been stored in the PDF as compressed graphics with a
> /CCITTFaxDecode command. I would like to write some software to decode them
> but I need some information on the CCITT T.6 specification to do it.
> 
> Specifically :-
> 
> a)    I have the codes for V(0), VL(1), VR(1) and Pass but what are the
> codes for VL(2), VL(3), VR(2) and VR(3).
> b)    What other codes do I need to know about?
> c)    Do repetitive 1 bits indicate a series of blank scan lines?
> 
> I have searched many sites on the Internet but they are sparse on detail.

Sorry, I forgot something.

If you are not keen on programming, but just on getting the page images 
out of the PDF, here is the way to go:

Get xpdf. It includes the wonderful utility pdfimages, which extracts all
images from a PDF file *losslessly*. Let me say this again: pdfimages
does not rasterize or render the page, but really gets the embedded
images with each pixel 100% identical to the one in the file.

$ pdfimages file.pdf image

creates: image0001.pbm, image0002.pbm, ..., which are exact copies (just 
decompressed) of the raster images embedded inside the PDF.

After that, use ImageMagick or, even better, one of the image converters
from the libtiff tools to compress the images with G4 encoding and save
disk space.

$ convert -compress Group4 image0001.pbm image0001.tif

You may want to use a loop around this command of course. You can even 
create a multi-page tiff (all pages in one tiff file), if you want that.
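
The loop mentioned above could look like the sketch below. The function name
convert_all and the CONVERT override are made up for illustration, and the
glob assumes the PBM files all start with the prefix you gave to pdfimages.

```shell
#!/bin/sh
# Sketch: convert every PBM written by pdfimages into a Group 4 TIFF.
# convert_all is a hypothetical helper; CONVERT defaults to ImageMagick's
# convert and can be overridden (for example with a stub when testing).
convert_all() {
  prefix=$1
  for pbm in "$prefix"*.pbm; do
    [ -e "$pbm" ] || continue        # glob matched nothing
    tif="${pbm%.pbm}.tif"            # same name, .tif extension
    "${CONVERT:-convert}" -compress Group4 "$pbm" "$tif" || return 1
  done
}

# Usage (after: pdfimages file.pdf image):
#   convert_all image
```

For a single multi-page TIFF, tiffcp from the libtiff tools can concatenate
the results, for example: tiffcp image*.tif all.tif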

Ralf

-- 
Ralf Koenig
Wissenschaftlicher Mitarbeiter an der
Professur Rechnernetze und verteilte Systeme
TU Chemnitz, Zi. 1/B320, Tel. 0371-531-1532

Reply Ralf 11/10/2004 7:23:56 PM

Ralf,
I sent a reply but it didn't appear. Maybe I sent it to you directly rather
than to the group. You'll have to forgive me, I'm a newbie at this.

Thank you Ralf for all your assistance, your suggestions have been very
helpful.

I will certainly look at the open source software that you suggest.

I've already had a look at the TIFF 6 spec. It helped me get this far but
doesn't provide the detail I was after.

I've posted a few requests for the T.6 spec on various newsgroups as you
suggested.

Although it is a real-life requirement for me I am using this problem as a
means of introducing myself to Delphi and PDFs, so while the open source
software will be helpful, I am interested in the programming problem for its
own sake.

Thanks again

Richard

Reply Richard 11/10/2004 10:46:01 PM
