TIF to PDF conversion results in blank doc

Hi there,

I am developing a utility that programmatically converts a Group 3
tiff image (CCITTFaxDecode) into a PDF document.  All I am doing is
extracting the stream of data from the tif file and placing it between
the relevant 'stream' 'endstream' PDF tags, and then filling in the
blanks, by creating the cross reference, header, trailer, page
objects, etc.
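The procedure Gary describes can be sketched in Python: wrap the data in a stream, then generate the header, objects, cross-reference table and trailer. This is a minimal illustration, not Gary's code, and it rests on a few assumptions:

- `build_pdf` and `stream_obj` are hypothetical helpers.
- Objects are numbered 1..n in list order.
- A direct `/Length` is used instead of the indirect `2 0 R` in the post.

Every xref offset is taken from the bytes actually written, so hand-counted offsets cannot drift.

```python
from typing import List, Optional

def stream_obj(dict_entries: bytes, data: bytes) -> bytes:
    """Body of a stream object; /Length is computed from the data itself."""
    return (b"<< " + dict_entries + b" /Length %d >>\nstream\n" % len(data)
            + data + b"\nendstream")

def build_pdf(objects: List[bytes], root: int, info: Optional[int] = None) -> bytes:
    """Serialise objects 1..n and append an xref table and trailer.

    Offsets are measured from the output buffer, so they always match."""
    # Header plus a comment of high-bit bytes (a common convention that
    # marks the file as binary for transfer tools).
    out = bytearray(b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))                  # where "num 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"               # each entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    trailer = b"/Size %d /Root %d 0 R" % (len(objects) + 1, root)
    if info is not None:
        trailer += b" /Info %d 0 R" % info
    out += b"trailer\n<< " + trailer + b" >>\nstartxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

For a file like Gary's, the list would hold the image XObject, page, pages, info and catalog dictionaries. The raw CCITT bytes would be passed through `stream_obj`.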

The PDF document produced by my code appears to look correct.  The
header info, trailer, cross reference, etc. are all in place, and the
byte offsets and indirect object references are all correct (as far as
I can tell!).  However, when I view the created document in Adobe
Acrobat, or any other PDF viewer, all I get is a blank document. 
Adobe does not report any errors, and I have tried several other PDF
analysis tools, all of which say I have a valid document.

Does anyone have any ideas on the kinds of things that could cause
Adobe to display a blank doc?  Some info to be going on with.....

The start of my file is as follows:

%PDF-1.3
%<binary comment bytes>
1 0 obj
<<
/Type /XObject
/Subtype /Image
/Width 1728
/Height 1600
/ColorSpace /DeviceGray
/BitsPerComponent 1
/Length 2 0 R
/Filter /CCITTFaxDecode
/DecodeParms << /K 0
/EncodedByteAlign false
/EndOfBlock false
/EndOfLine true
/BlackIs1 true
/Columns 1728
/Rows 1600>> >>
stream

followed by the data itself, which is a straight binary copy of the
image data from the TIFF file.

The remainder of the document after the data looks like this:

endstream
endobj
2 0 obj
26263
endobj
3 0 obj
<<
/Type /Page
/Parent 4 0 R
/Resources <<
/ProcSet [ /PDF /ImageB ]
>>
/Contents [1 0 R]
>>
endobj
4 0 obj
<<
/Type /Pages
/Kids [3 0 R
]
/MediaBox [ 0 0 595 842 ]
/Count 1>>
endobj
5 0 obj
<<
/Creator(Gary)
/CreationDate(D:20030924105139)
/Title(Archived G3 file)
/Producer(Garys PDF Conversion Library)
/Keywords(G3)
>>
endobj
6 0 obj
<<
/Type /Catalog
/Pages 4 0 R>>
endobj
xref
0 7
0000000000 65535 f
0000000015 00000 n
0000026577 00000 n
0000026598 00000 n
0000026706 00000 n
0000026789 00000 n
0000026946 00000 n
trailer
<<
/Size 7
/Root 6 0 R
/Info 5 0 R
>>
startxref
26994
%%EOF

The image is a greyscale, single-page image.  The data is definitely
26263 bytes in length and all the cross reference offsets, etc. are
correct.
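"As far as I can tell" is the weak point, so it may help to check the offsets mechanically. Below is a minimal Python sketch. It assumes a classic, single-section xref table like the one in this file, with no incremental updates and no cross-reference streams. `check_xref` is a hypothetical helper. It only verifies that each in-use entry points at the matching `N G obj` header.

```python
import re
from typing import List

def check_xref(pdf: bytes) -> List[str]:
    """List problems in a single, classic xref table (empty list = consistent)."""
    m = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", pdf)
    if not m:
        return ["no 'startxref ... %%EOF' at end of file"]
    pos = int(m.group(1))
    header = re.match(rb"xref\s+(\d+)\s+(\d+)\s+", pdf[pos:])
    if not header:
        return ["startxref value %d does not point at an 'xref' keyword" % pos]
    first, count = int(header.group(1)), int(header.group(2))
    body = pdf[pos + header.end():]
    entries = re.findall(rb"(\d{10}) (\d{5}) ([fn])", body)[:count]
    problems = []
    if len(entries) != count:
        problems.append("xref declares %d entries, found %d" % (count, len(entries)))
    for num, (off, gen, kind) in enumerate(entries, start=first):
        if kind != b"n":
            continue                              # free entry, nothing to check
        expected = b"%d %d obj" % (num, int(gen))
        if not pdf.startswith(expected, int(off)):
            problems.append("object %d: offset %d does not start with '%s'"
                            % (num, int(off), expected.decode()))
    return problems
```

An empty result means every offset lands on the right object header. A broken table usually shows up as one or more "does not start with" messages.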

Thanks in advance,

Gary.
greynolds
9/24/2003 10:23:23 AM
comp.text.pdf


greynolds@equisys.com (Gary Reynolds) wrote:

>Hi there,
>
>I am developing a utility that programmatically converts a Group 3
>tiff image (CCITTFaxDecode) into a PDF document.  All I am doing is
>extracting the stream of data from the tif file and placing it between
>the relevant 'stream' 'endstream' PDF tags, and then filling in the
>blanks, by creating the cross reference, header, trailer, page
>objects, etc.
>
>The PDF document produced by my code appears to look correct.  The
>header info. trailer, cross reference, etc. are all in place, and the
>byte offsets and indirect object references are all correct (as far as
>I can tell!).  However, when I view the created document in Adobe
>Acrobat, or any other PDF viewer, all I get is a blank document. 

Consider - you haven't even told Acrobat where on the page this image
would go, and what size - how could it guess? This should be a clue
that something vital is missing.
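Working out "where and what size" is plain arithmetic. PDF user space defaults to 72 units per inch, so an image's natural size in points is its pixel count times 72 divided by its resolution. The sketch below makes some assumptions:

- The resolution would come from the TIFF's XResolution/YResolution tags.
- The 204 × 196 dpi used in the check is a typical fine-mode fax value, not something stated in the thread.
- The function names are hypothetical.

```python
from typing import Tuple

POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch

def natural_size_pt(width_px: int, height_px: int,
                    xdpi: float, ydpi: float) -> Tuple[float, float]:
    """Image size in points at its scanned resolution."""
    return (width_px * POINTS_PER_INCH / xdpi,
            height_px * POINTS_PER_INCH / ydpi)

def fit_on_page(w: float, h: float, page_w: float,
                page_h: float) -> Tuple[float, float, float, float]:
    """Shrink (w, h) to fit the page if needed, keep the aspect ratio, centre it.

    Returns (width, height, x, y) in points, ready for a cm operator."""
    s = min(page_w / w, page_h / h, 1.0)   # never enlarge, only shrink
    w, h = w * s, h * s
    return w, h, (page_w - w) / 2, (page_h - h) / 2
```

At 204 dpi, 1728 pixels come to about 609.88 points. That is slightly wider than the 595-point A4 width in Gary's first MediaBox, so `fit_on_page` would scale the image down a little.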

Many things could be wrong, but one is clear: you are trying to use
image data as a page stream. No, you can't. A page stream must contain
marking operators, as described in the PDF Reference. You cannot just
substitute an image XObject for a content stream. The content stream
must use "Do" and other operators ("cm" especially) to position and
place the image.
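Here is a minimal Python sketch of the content stream Aandi describes, plus the page /Resources entry that makes the image name resolvable. The name `/Im0` and the helper names are hypothetical. The `q`/`Q` pair is not strictly needed for a single image, but it keeps the scaling from affecting anything drawn later. The content stream goes in the page's /Contents, and the image object is referenced only from /Resources.

```python
def fmt(v: float) -> str:
    """PDF number: fixed-point, no exponent, trailing zeros trimmed."""
    return f"{v:.4f}".rstrip("0").rstrip(".")

def image_content_stream(name: str, width_pt: float, height_pt: float,
                         x_pt: float = 0.0, y_pt: float = 0.0) -> bytes:
    """Content stream that paints image XObject /name into a rectangle.

    An image always occupies the unit square of its own coordinate space,
    so cm must scale it by the target size and translate it into place."""
    ops = [
        "q",                                                  # save graphics state
        f"{fmt(width_pt)} 0 0 {fmt(height_pt)} {fmt(x_pt)} {fmt(y_pt)} cm",
        f"/{name} Do",                                        # paint the image
        "Q",                                                  # restore graphics state
    ]
    return ("\n".join(ops) + "\n").encode("ascii")

def page_resources(name: str, image_obj_num: int) -> bytes:
    """Resources dictionary that binds the name used by Do to the image."""
    return (f"<< /ProcSet [/PDF /ImageB] "
            f"/XObject << /{name} {image_obj_num} 0 R >> >>").encode("ascii")
```

For example, `image_content_stream("Im0", 595, 842)` fills a 595 × 842 page.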

Remember also, when it comes to details of the TIFF file, that each
strip or tile is separately compressed. Each strip or tile has to map
to a different compressed XObject - you cannot concatenate compressed
CCITT streams.
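To follow the one-strip-per-XObject rule, you first have to locate the strips. The Python sketch below reads the first IFD of a baseline TIFF and yields each strip's compressed bytes with its row count. Its assumptions:

- Only SHORT and LONG tag values are decoded.
- Only the first page is read.
- Tiles are not handled.
- `read_first_ifd` and `strips` are hypothetical names.

```python
import struct
from typing import Dict, Iterator, List, Tuple

# Baseline TIFF 6.0 tags this sketch cares about.
TAGS = {256: "ImageWidth", 257: "ImageLength", 259: "Compression",
        266: "FillOrder", 273: "StripOffsets", 278: "RowsPerStrip",
        279: "StripByteCounts", 292: "T4Options"}
# Field types handled here: SHORT (3) and LONG (4) -> (byte size, struct code).
TYPES = {3: (2, "H"), 4: (4, "I")}

def read_first_ifd(tiff: bytes) -> Dict[str, List[int]]:
    """Decode the tags above from the first IFD (the first page)."""
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])   # little / big endian
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd = struct.unpack_from(order + "HI", tiff, 2)
    if magic != 42:
        raise ValueError("not a TIFF file")
    (count,) = struct.unpack_from(order + "H", tiff, ifd)
    tags: Dict[str, List[int]] = {}
    for i in range(count):
        entry = ifd + 2 + 12 * i                        # each IFD entry is 12 bytes
        tag, typ, n = struct.unpack_from(order + "HHI", tiff, entry)
        if tag not in TAGS or typ not in TYPES:
            continue
        size, code = TYPES[typ]
        pos = entry + 8
        if size * n > 4:                                # too big: field holds an offset
            (pos,) = struct.unpack_from(order + "I", tiff, pos)
        tags[TAGS[tag]] = list(struct.unpack_from(order + code * n, tiff, pos))
    return tags

def strips(tiff: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (row_count, compressed_bytes) for each strip of the first page."""
    tags = read_first_ifd(tiff)
    height = tags["ImageLength"][0]
    per_strip = tags.get("RowsPerStrip", [height])[0]   # absent: one strip
    pairs = zip(tags["StripOffsets"], tags["StripByteCounts"])
    for k, (offset, length) in enumerate(pairs):
        rows = min(per_strip, height - k * per_strip)   # last strip may be short
        yield rows, tiff[offset:offset + length]
```

Each yielded strip would become its own image XObject, with /Height and /Rows equal to that strip's row count.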
----------------------------------------
Aandi Inston  quite@dial.pipex.com http://www.quite.com
Please support usenet! Post replies and follow-ups, don't e-mail them.

quite
9/24/2003 10:43:17 AM
Hi Gary,

A few months ago, we had a client come to us who wanted to take older fax
files in .efx form, convert a large number of them automatically into PDF,
and automatically assign the appropriate file name to each.  Our
software, OctoPDF, can do this for TIFF images as well (www.octopdf.com).
This particular client contracted us to do the conversion, sent us his files,
and we sent the resulting 1000+ PDFs back to him (via FTP) the next day.
Please give me a call or email and we can discuss your requirements if you
would like to pursue this further.  Thanks for your time.

Larry T.

larry@jbmsystems.com
978 535-7676


Larry
9/24/2003 1:48:14 PM
Aandi,

Thanks for your reply to my message.  It was very useful and has
allowed me to progress my work somewhat.  I now have a file which
takes the following format:

%PDF-1.3
%<binary comment bytes>
1 0 obj
<<
/Type /XObject
/Subtype /Image
/Name /strm_0
/Filter /CCITTFaxDecode /Width 1728
/Height 1600
/ColorSpace /DeviceGray
/BitsPerComponent 1
/DecodeParms << /K 0
/Columns 1728
/Rows 1600>>/Length 2 0 R>>
stream

<data here>

endstream
endobj
2 0 obj
26263
endobj
3 0 obj
<</Length 29
>>stream
842 0 0 595 0 0 cm
/strm_0 Do
endstream
endobj
4 0 obj
<<
/Type /Page
/Parent 6 0 R
/Resources 5 0 R
/MediaBox [0 0 842 595]
/CropBox [0 0 842 595]
/Contents [3 0 R]
>>
endobj
5 0 obj
<<
/ProcSet [/PDF /ImageB]
/XObject <</strm_0 1 0 R
>>
>>
endobj
6 0 obj
<<
/Type /Pages
/Kids [4 0 R
]
/Count 1>>
endobj
7 0 obj
<<
/Creator(Zetafax)
/CreationDate(D:20030925115855)
/Title(Archived Zetafax)
/Producer(Zetafax PDF Conversion Library)
/Keywords(Zetafax)
>>
endobj
8 0 obj
<<
/Type /Catalog
/OpenAction 9 0 R
/Pages 6 0 R>>
endobj
9 0 obj
[ 4 0 R /Fit ]
endobj
xref
0 10
0000000000 65535 f
0000000015 00000 n
0000026515 00000 n
0000026536 00000 n
0000026613 00000 n
0000026742 00000 n
0000026815 00000 n
0000026872 00000 n
0000027029 00000 n
0000027095 00000 n
trailer
<<
/Size 10
/Root 8 0 R
/Info 7 0 R
>>
startxref
27125
%%EOF

Specifically, I have added in a content stream that references an
XObject (object 1 in the dump above) and have specified that object 1
is a resource.  At this point I am getting a drawing error from within
Adobe Acrobat Reader.  Does the general format look correct?  I think
it does, but judging by my last message I may still be way-off!  I
have read the relevant parts of the PDF specification (1.3) and can't
find anything that I've missed out.
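Gary's content stream can also be checked mechanically. Assume single-byte line ends, which the xref offsets suggest: `%PDF-1.3`, an EOL, the 5-byte binary comment and another EOL add up to 15, the offset of object 1. On that assumption, the stream data `842 0 0 595 0 0 cm`, EOL, `/strm_0 Do` is exactly 29 bytes. That matches `/Length 29` if the EOL before `endstream` is not counted.

The sketch below uses hypothetical helper names. It checks the length, and it checks that every name used with `Do` has an entry in the page's `/XObject` resources. If both checks pass, the drawing error more likely lies in the image data than in the page structure.

```python
import re
from typing import Dict, List, Set

def xobject_names_used(content: bytes) -> Set[str]:
    """Names painted with the Do operator, e.g. b'/strm_0 Do' -> {'strm_0'}."""
    return {m.decode("ascii")
            for m in re.findall(rb"/([^\s/\[\]<>()]+)\s+Do\b", content)}

def check_page(content: bytes, declared_length: int,
               xobjects: Dict[str, int]) -> List[str]:
    """Compare a content stream against its /Length and the page's
    /Resources /XObject dictionary (name -> object number)."""
    problems = []
    if len(content) != declared_length:
        problems.append(f"/Length is {declared_length}, "
                        f"stream data is {len(content)} bytes")
    for name in sorted(xobject_names_used(content) - set(xobjects)):
        problems.append(f"'/{name} Do' has no /XObject resource entry")
    return problems

# Gary's content stream, assuming LF line ends.
GARY_CONTENT = b"842 0 0 595 0 0 cm\n/strm_0 Do"
```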

I have also done a Hex dump of a Group 4 tiff image that has been
converted to PDF.  Apart from the data, the two files (mine and the G4
file) look pretty similar.

Any ideas?

Thanks in advance,

Gary.
greynolds
9/25/2003 11:15:09 AM
Larry,
Check out www.codeproject.com and search for PDF on that site.
There is some sample code to do exactly what you need there.

Regards.
oside_freak
9/28/2003 8:29:06 PM
greynolds@equisys.com (Gary Reynolds) wrote:

>Thanks for your reply to my message.  It was very useful and has
>allowed me to progress my work somewhat.  I now have a file which
>takes the following format:
>
>%PDF-1.3
>%<binary comment bytes>
>1 0 obj
><<
>/Type /XObject
>/Subtype /Image
>/Name /strm_0
>/Filter /CCITTFaxDecode /Width 1728
>/Height 1600
>/ColorSpace /DeviceGray
>/BitsPerComponent 1
>/DecodeParms << /K 0
>/Columns 1728
>/Rows 1600>>/Length 2 0 R>>....

It's very hard to get useful information from a dump of a PDF file in
this way. Where possible, when asking for help, post the file itself
on a web or FTP site.  Still, a quick look suggests that what you have
done is correct at the PDF level.

>At this point I am getting a drawing error from within
>Adobe Acrobat Reader.  Does the general format look correct?  I think
>it does, but judging by my last message I may still be way-off!  I
>have read the relevant parts of the PDF specification (1.3) and can't
>find anything that I've missed out.

Could be that the data doesn't match the needs of the CCITTFaxDecode
filter.
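Aandi's point that the data may not suit the filter can be made concrete. The sketch below maps TIFF's Compression and T4Options tags to CCITTFaxDecode's DecodeParms for a single strip.

It also handles one possible mismatch not discussed in the thread: TIFF's FillOrder tag. Fax TIFFs are sometimes written with FillOrder 2 (low-order bit first), while the PDF filter reads the high-order bit first.

Assumptions:

- The helper names are hypothetical.
- Mapping T4Options bit 2 to EncodedByteAlign is assumed, and worth checking against the PDF Reference for 2-D data.

```python
from typing import Dict, Union

Parms = Dict[str, Union[int, bool]]

# Lookup table that reverses the bit order within a byte.
_REVERSED = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))

def to_msb_first(data: bytes, fill_order: int) -> bytes:
    """CCITTFaxDecode reads the high-order bit of each byte first, which is
    TIFF FillOrder 1. FillOrder 2 data needs every byte bit-reversed."""
    return data.translate(_REVERSED) if fill_order == 2 else data

def decode_parms(compression: int, t4options: int,
                 columns: int, rows: int) -> Parms:
    """Sketch of CCITTFaxDecode parameters for ONE strip, from TIFF tags."""
    parms: Parms = {"Columns": columns, "Rows": rows}
    if compression == 2:        # Modified Huffman RLE: no EOLs, rows byte-aligned
        parms.update(K=0, EncodedByteAlign=True)
    elif compression == 3:      # T.4 / Group 3
        # T4Options bit 0 set means 2-D coding; the filter does not
        # distinguish between positive K values, so 1 is enough.
        parms["K"] = 1 if t4options & 1 else 0
        if t4options & 4:       # bit 2: fill bits byte-align each EOL (assumed mapping)
            parms["EncodedByteAlign"] = True
    elif compression == 4:      # T.6 / Group 4
        parms["K"] = -1
    else:
        raise ValueError(f"TIFF compression {compression} is not CCITT")
    # Stop after exactly `rows` lines instead of expecting an end-of-block
    # marker, so it does not matter whether the strip carries one.
    parms["EndOfBlock"] = False
    return parms

def pdf_dict(d: Parms) -> str:
    """Render as a PDF dictionary; Python booleans become true/false."""
    def val(v: Union[int, bool]) -> str:
        if isinstance(v, bool):
            return "true" if v else "false"
        return str(v)
    return "<< " + " ".join(f"/{k} {val(v)}" for k, v in d.items()) + " >>"
```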
>
>I have also done a Hex dump of a Group 4 tiff image that has been
>converted to PDF.  Apart from the data, the two files (mine and the G4
>file) look pretty similar.

You haven't mentioned what you do to the TIFF file, especially my note
that each strip or tile must be processed separately. Be sure too that
the filter parameters exactly match what was used to compress the
TIFF, including details such as whether the data needs to be an exact
number of rows or whether it contains a G3 EOD marker.
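Under the one-strip-per-XObject rule, each strip becomes its own image XObject, with /Height and /Rows equal to that strip's row count. The page content stream then stacks the strips. Below is a minimal Python sketch. It assumes hypothetical resource names /S0, /S1, ... and a target rectangle already chosen in points. Strips are painted top to bottom because PDF's y axis points up.

```python
from typing import List

def fmt(v: float) -> str:
    """PDF number without exponent, trailing zeros trimmed."""
    return f"{v:.4f}".rstrip("0").rstrip(".")

def strip_xobject_dict(columns: int, rows: int, k: int) -> str:
    """Dictionary (minus /Length) for one strip's image XObject."""
    return (f"<< /Type /XObject /Subtype /Image /Width {columns} /Height {rows} "
            f"/ColorSpace /DeviceGray /BitsPerComponent 1 /Filter /CCITTFaxDecode "
            f"/DecodeParms << /K {k} /Columns {columns} /Rows {rows} >> >>")

def strips_content(strip_rows: List[int], width_pt: float, height_pt: float,
                   x_pt: float = 0.0, y_pt: float = 0.0) -> bytes:
    """Paint /S0, /S1, ... top to bottom so together they fill the rectangle."""
    total = sum(strip_rows)
    top = y_pt + height_pt               # start at the top edge of the rectangle
    ops = []
    for i, rows in enumerate(strip_rows):
        h = height_pt * rows / total     # this strip's share of the height
        top -= h                         # lower-left corner of this strip
        ops.append(f"q {fmt(width_pt)} 0 0 {fmt(h)} {fmt(x_pt)} {fmt(top)} cm /S{i} Do Q")
    return ("\n".join(ops) + "\n").encode("ascii")

def strips_resources(first_obj: int, n: int) -> str:
    """/XObject resources mapping /S0.. to consecutive object numbers."""
    refs = " ".join(f"/S{i} {first_obj + i} 0 R" for i in range(n))
    return f"<< /XObject << {refs} >> >>"
```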

quite
9/30/2003 11:53:54 AM