PDFBox - Documentation



Hello,

I am searching for documentation about PDFBox.
With Google I cannot find anything, and the Javadoc API documentation
for this library is not very good.

Can someone help me?

Greetings

Michael
Reply Michael 9/1/2005 1:03:30 PM

All available documentation for PDFBox is on the website.  Is
there a specific question that you have?

Ben Litchfield
http://www.pdfbox.org/

Reply ben 9/1/2005 3:45:02 PM


Hello,

ben@csh.rit.edu wrote:
> All available documentation for PDFBox is on the website.

Is that all?
There are only 1 or 2 documents. That is not very much information.

> Is
> there a specific question that you have?
>

Yes. How can I import a page, add it to a new PDF document, and find a
specific string in it to replace with a new string, like a crude
template system?

I cannot find any solution for this problem. Maybe I do not understand
the Javadoc in detail (there is a lot of information ;-) )

Greetings

Michael
Reply Michael 9/2/2005 2:30:06 PM

I do agree that PDFBox (like almost every library) could use more
documentation, but I disagree that there are "1 or 2 documents"!

If you get the nightly release from http://www.pdfbox.org/dist you will
see that there are a couple examples that are similar to what you want
to do.

org.pdfbox.examples.persistence.AppendDoc
org.pdfbox.examples.pdmodel.RemoveFirstPage
org.pdfbox.examples.pdmodel.ReplaceString

Ben

Reply ben 9/5/2005 6:44:39 PM

Hello Ben,

ben@csh.rit.edu wrote:
> I do agree that PDFBox(like almost every library) could use more
> documentation, I disagree that there are "1 or 2 documents"!
> 
> If you get the nightly release from http://www.pdfbox.org/dist you will
> see that there are a couple examples that are similar to what you want
> to do.
> 
> org.pdfbox.examples.persistence.AppendDoc
> org.pdfbox.examples.pdmodel.RemoveFirstPage
> org.pdfbox.examples.pdmodel.ReplaceString
> 

Thanks for your posting. It was very helpful. I had been using the
stable version 0.7.1. These examples are not included there.

But I have a new problem:

When I call the example function

<Source>

new ReplaceString().doIt("temp/template.pdf", "temp/temp.pdf",
"#Antragsteller.Name#", "Something");

</Source>

I get the message "No filters for stream" and I cannot get anywhere
with it.

Google.de turns up nothing about the problem.

The PDF template was exported with OpenOffice 1.4.

Thanks for any answers.

Michael
Reply Michael 9/6/2005 5:54:21 PM

Hello,

I wrote:

> Hello Ben,
> 
......

> 
> When i call the example-function
> 
> <Source>
> 
> new ReplaceString().doIt("temp/template.pdf", "temp/temp.pdf",
> "#Antragsteller.Name#", "Something");
> 
> </Source>
> 
> i got the message "No filters for stream" and i can not get something
> with it.
> 
> Google.de says nothing about the problem.
> 

I think this message appears because the string cannot be found, but why?
The document contains the string.

What is the problem?

THX

Michael

PS: Thanks for the time you have spent and for your patience ;-)
Reply Michael 9/6/2005 6:01:12 PM

I will need to look at the PDF to know what the issue is. If you upload
it to ftp.pdfbox.org (a write-only site, so only I can see the PDFs), I
will take a look at it and let you know what I find.

Because of the way PDF documents are structured, it is not always easy
to replace a string.  The ReplaceString that comes with PDFBox is a
very basic implementation.  It will only work when the string is drawn
as a whole string, which is not always the case.  Some PDF writers will
draw a string in two (or more) parts, for example

(#Antragsteller) Tj
(.Name#) Tj

In which case, the ReplaceString example will not find the string to
replace.  You can view the PDF using the graphical program
org.pdfbox.PDFDebugger to see how the string is drawn if you drill down
to the page contents.
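
To make the difference concrete, here is a minimal Java sketch. It is
not the code of the ReplaceString example and it does not use PDFBox at
all; it only scans a content stream given as plain text. The class name
SplitStringDemo, its methods, and the simplified regular expression are
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitStringDemo {
    // Matches a literal string operand followed by Tj, e.g. "(abc) Tj".
    // Assumption: no escaped or nested parentheses inside the string.
    private static final Pattern TJ = Pattern.compile("\\(([^()]*)\\)\\s*Tj");

    // Collects the string operand of every Tj operator, in drawing order.
    static List<String> tjOperands(String contentStream) {
        List<String> operands = new ArrayList<>();
        Matcher m = TJ.matcher(contentStream);
        while (m.find()) {
            operands.add(m.group(1));
        }
        return operands;
    }

    // Naive search: the target must sit inside one single Tj operand.
    static boolean foundInSingleOperand(String contentStream, String target) {
        for (String operand : tjOperands(contentStream)) {
            if (operand.contains(target)) {
                return true;
            }
        }
        return false;
    }

    // Looser search: join all operands first, then look for the target.
    static boolean foundAcrossOperands(String contentStream, String target) {
        return String.join("", tjOperands(contentStream)).contains(target);
    }

    public static void main(String[] args) {
        String target = "#Antragsteller.Name#";
        String whole = "BT (#Antragsteller.Name#) Tj ET";
        String split = "BT (#Antragsteller) Tj (.Name#) Tj ET";

        System.out.println(foundInSingleOperand(whole, target)); // true
        System.out.println(foundInSingleOperand(split, target)); // false
        System.out.println(foundAcrossOperands(split, target));  // true
    }
}
```

The second call is the situation described above: the text is on the
page, but no single Tj operand contains it.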

I don't believe the "No filters for stream" message is related to this;
I think it is a separate issue.

As you may have seen from other posts, you should really be using
AcroForms, which will be more reliable.

Ben

Reply ben 9/6/2005 11:49:49 PM

Hello Ben,

ben@csh.rit.edu wrote:
> I will need to look at the PDF to know what the issue is, if you upload
> it to ftp.pdfbox.org(a write-only site so only I can see the PDFs) I
> will take a look at it and let you know what I find.

OK. The file is online.

> 
> Because of the way PDF documents are structured, it is not always easy
> to replace a string.  The ReplaceString that comes with PDFBox is a
> very basic implementation.  It will only work when the String is drawn
> as a whole string, which is not always the case.  Some pdf writers will
> draw a string in two(or more) parts, for example

OK. I understand the problem.

> 
> (#Antragsteller) Tj
> (.Name#) Tj

> 
> In which case, the ReplaceString example will not find the string to
> replace.  


> You can view the PDF using the graphical program
> org.pdfbox.PDFDebugger to see how the string is drawn if you drill down
> to the page contents.

Wow. Good program, but I don't understand anything in it
except the structure of the objects.

> 
> I don't believe the "No filters for stream" message is the same issue,
> I think that may be a different issue.

Yes. This seems to be a different problem. With a PDF document from the
internet, the simple program works fine.

Until now I had exported all PDF documents from OpenOffice 1.4. Now I will
test it with Acrobat Writer under Windows.

> 
> As you may have seen from other posts, you should really be using
> acroforms, which will be more reliable.

Yes. But my dream is that the user can edit the PDF document in his
preferred program (for example Word, printing via Acrobat Writer, or
OpenOffice with its export function). With AcroForms the user has to
know about these methods / objects and how to handle them.

Word and OpenOffice do not automatically create these AcroForm fields,
so they are not my preferred choice.

Greetings

Michael
Reply Michael 9/7/2005 3:18:27 PM

There are two ways that strings can be written in a PDF document.  One is
to wrap the character codes in open/close parentheses like this

(This is a String)

The other way is to hex encode it and wrap it in angle brackets like
this
<54686973206973206120537472696E67>
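
If it helps to see the relation between the two forms, here is a small
Java sketch that converts between them. It is plain Java, not a PDFBox
API; the class name PdfHexString and its two methods are made up for
this example, and it assumes one byte per character (ISO-8859-1), an
even number of hex digits, and no whitespace inside the angle brackets.

```java
import java.nio.charset.StandardCharsets;

public class PdfHexString {
    // Encodes text as a PDF hex string, e.g. "A" becomes "<41>".
    // Assumption: every character fits in one byte (ISO-8859-1).
    static String toHex(String text) {
        StringBuilder sb = new StringBuilder("<");
        for (byte b : text.getBytes(StandardCharsets.ISO_8859_1)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('>').toString();
    }

    // Decodes a hex string of the form "<...>" back to text.
    // Assumption: even number of hex digits and no embedded whitespace.
    static String fromHex(String hex) {
        String digits = hex.substring(1, hex.length() - 1);
        byte[] bytes = new byte[digits.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            String pair = digits.substring(2 * i, 2 * i + 2);
            bytes[i] = (byte) Integer.parseInt(pair, 16);
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // prints <54686973206973206120537472696E67>
        System.out.println(toHex("This is a String"));
        // prints This is a String
        System.out.println(fromHex("<54686973206973206120537472696E67>"));
    }
}
```

A search for a string in a content stream has to consider both forms,
since the same text can appear in either one.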

I uploaded a screenshot of the debugger app using your PDF to show you
where the contents are generated.
http://www.pdfbox.org/dist/pdfdebugger-screenshot.bmp

Tj is the show string operator in PDF, and as you can see from the
screenshot, all the strings are drawn as single characters. This is
somewhat common, because apps like OpenOffice want complete control over
the spacing of every character.

The issue is that the current version of ReplaceString will only work
when it can find the complete string, so it will not work on this PDF.
The code to do the string replacement in a PDF like yours would be
very complex, as it would need to examine every character that is drawn
and put the string back together (this is somewhat implemented in the
org.pdfbox.ExtractText program); then those characters would need to be
removed from the content stream and replaced with the new string.
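
As a very rough illustration of those steps (rebuild the text, find the
string, remove the operators that drew it, write one new operator), here
is a plain Java sketch that works on a content stream given as text. It
is not how PDFBox does it and it is far from a real solution: the class
PerCharReplaceSketch and its replace method are invented for this
example, only simple (..) Tj operators are recognized, and everything
between the first and last matching operator, including any positioning
operators, is thrown away, so the new string will not be spaced the way
the original was.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerCharReplaceSketch {
    // Assumption: only literal strings without escaped or nested parentheses.
    private static final Pattern TJ = Pattern.compile("\\(([^()]*)\\)\\s*Tj");

    static String replace(String content, String target, String replacement) {
        if (target.isEmpty()) {
            return content;
        }
        // Step 1: put the drawn text back together and remember, for each
        // Tj operator, which part of the text and of the stream it covers.
        StringBuilder text = new StringBuilder();
        List<int[]> spans = new ArrayList<>(); // {textStart, textEnd, streamStart, streamEnd}
        Matcher m = TJ.matcher(content);
        while (m.find()) {
            int textStart = text.length();
            text.append(m.group(1));
            spans.add(new int[] {textStart, text.length(), m.start(), m.end()});
        }

        // Step 2: look for the target in the reassembled text.
        int hit = text.indexOf(target);
        if (hit < 0) {
            return content; // nothing to replace
        }
        int hitEnd = hit + target.length();

        // Step 3: find the stream range of all operators that drew part of it.
        // Note: an operator is removed whole, even if only part of its
        // string belongs to the target.
        int streamStart = -1;
        int streamEnd = -1;
        for (int[] s : spans) {
            boolean overlaps = s[0] < hitEnd && s[1] > hit;
            if (overlaps) {
                if (streamStart < 0) {
                    streamStart = s[2];
                }
                streamEnd = s[3];
            }
        }

        // Step 4: replace that range with a single Tj for the new string.
        return content.substring(0, streamStart)
                + "(" + replacement + ") Tj"
                + content.substring(streamEnd);
    }

    public static void main(String[] args) {
        String stream = "BT (#) Tj (A) Tj (#) Tj ET";
        // prints BT (Bob) Tj ET
        System.out.println(replace(stream, "#A#", "Bob"));
    }
}
```

A real implementation would also have to deal with fonts, encodings,
the TJ array operator, and text positioning, which is where most of the
complexity comes from.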


Ben

Reply ben 9/7/2005 4:26:45 PM
