Hi,
Does anyone have sample Java code for extracting PDF metadata? The key
fields I am looking for are the "Creator", "Producer", "Author" and
"Subject" metadata fields in the PDF document.
Alternatively, could someone point me in the right direction on how to
extract these metadata fields from a PDF document? Is there a Java
class that I can use? I was able to find a Java class for extracting
image properties from a PDF file, "ImageInfo", but the fields that
class provides are not the four fields listed above.
Thanks in advance for your help.
Prakash
pchulani, 3/6/2006 7:27:26 PM
iText - http://www.lowagie.com/iText/
fabrizio
fhtino, 3/7/2006 7:27:56 AM
PDFBox is an open source Java library that can do this:
PDDocument pdf = PDDocument.load( "my.pdf" );
PDDocumentInformation info = pdf.getDocumentInformation();
System.out.println( "creator=" + info.getCreator() );
....
http://www.pdfbox.org
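To see where these four fields actually live in a file, here is a rough sketch that uses only the standard library instead of PDFBox. It scans the raw bytes for literal-string entries such as "/Author (...)" in the document information dictionary. The class InfoDictScan and its scan method are hypothetical names made up for this illustration. It assumes an unencrypted PDF whose information dictionary is stored uncompressed as literal strings, so it will miss hex strings, UTF-16 values, and dictionaries packed into object streams. For real work a library such as PDFBox or iText is the safer choice.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of PDFBox or iText.
class InfoDictScan {
    private static final String[] KEYS = {"Creator", "Producer", "Author", "Subject"};

    // Returns the raw (still escaped) literal-string value of each key found.
    static Map<String, String> scan(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost here.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Map<String, String> found = new LinkedHashMap<>();
        for (String key : KEYS) {
            // Matches "/Key (value)". The value may contain backslash escapes
            // but not unescaped nested parentheses (a known limitation).
            Pattern p = Pattern.compile("/" + key + "\\s*\\(((?:\\\\.|[^\\\\()])*)\\)");
            Matcher m = p.matcher(raw);
            if (m.find()) {
                found.put(key, m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) throws java.io.IOException {
        if (args.length != 1) {
            System.out.println("usage: java InfoDictScan my.pdf");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        scan(bytes).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Fields that are absent, or stored in a form the pattern does not cover, are simply left out of the returned map.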
Ben
ben, 3/7/2006 1:40:56 PM
Hi,
Thanks for your help.
I am still confused, though. Which PDFBox files do I need to import,
and how do I use the library from my own code?
"package ....?
import .....?
....."
I am not sure which PDFBox files/libraries I have to include to run my
Java code. Do I need to import all of the files, or is there a way to
include just one or a few files, as in C/C++, which then call the
others? Do we really need all these files?
I am still new to Java and unsure about this. Normally I would just
use import, but I do not know which file to use; there are just too
many files. How can we use the library without importing every file
and tracking all the file dependencies?
Please help.
Thanks,
Prakash
pchulani, 3/9/2006 12:59:25 AM
You will always need the entire PDFBox-x.x.x.jar file; this is the only
'file' you need.
The jar file contains classes, which you reference from your own
class.
For the example I posted, you can import either an entire package or
individual classes.
To import an entire package, put this line at the top of your source
file:
import org.pdfbox.pdmodel.*;
To import just the classes you use, list each one individually, like
this:
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDDocumentInformation;
Either way your compiled class will be exactly the same. Java does
dynamic linking, so an import statement only tells the compiler where
to look when it resolves the class names in your code. In C++, if
you add an include, your exe gets bigger (it has been a while since I
did C++, so please forgive me if I am wrong) because it statically
links in the stuff you include; that does not happen in Java.
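The point about imports can be tried out with standard-library classes alone. The sketch below uses a made-up class name, ImportDemo, and java.util rather than PDFBox so that it runs without any extra jar. It builds the same list once through a wildcard import and once through fully qualified names with no import involved, and both routes end up with the same runtime class.

```java
import java.util.*; // wildcard import: makes the simple names in java.util usable

// Hypothetical demo class; the name is made up for this illustration.
class ImportDemo {
    // Simple names, resolved by the compiler through the wildcard import above.
    static List<String> viaImport() {
        List<String> fields = new ArrayList<>();
        fields.add("Creator");
        return fields;
    }

    // Fully qualified names work without any import statement.
    static java.util.List<String> viaFullName() {
        java.util.List<String> fields = new java.util.ArrayList<>();
        fields.add("Creator");
        return fields;
    }

    public static void main(String[] args) {
        // Both methods create an instance of the very same class.
        System.out.println(viaImport().getClass() == viaFullName().getClass()); // prints true
        System.out.println(viaImport().equals(viaFullName()));                  // prints true
    }
}
```

An import only saves typing the package name; it does not pull any code into the compiled class.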
So the complete source would look something like this:
package mypackage;

import java.io.*;
import org.pdfbox.pdmodel.*;

public class MyClass
{
    public static void main( String[] args ) throws IOException
    {
        PDDocument doc = null;
        try
        {
            // Load the PDF from a file in the working directory
            doc = PDDocument.load( "my.pdf" );
            // The document information object holds the metadata fields
            PDDocumentInformation info = doc.getDocumentInformation();
            System.out.println( "creator=" + info.getCreator() );
        }
        finally
        {
            // Always release the document, even if an exception was thrown
            if( doc != null )
            {
                doc.close();
            }
        }
    }
}
ben, 3/9/2006 2:30:27 AM
Hi Ben,
Thank you very much for the info. I think I got it now. This was very
helpful.
PC
pchulani, 3/9/2006 4:13:40 PM
Hi Ben,
One more question. I have been reading the documentation, and it
mentions that to use the PDFBox library we need to install "Ant".
From reading about "Ant", I understand that it is like gmake, a tool
for building files. So do I need to install it before I can use the
PDFBox library? Once I have "Ant", how do I include all the files so
that it can build the PDFBox files?
Also, do I need to change the class path after installing "Ant" in my
directory in order to use PDFBox? How do I build the PDFBox library to
use it? You mentioned something about the PDFBox-x.x.x.jar file. I
downloaded "PDFBox-0.7.3-dev-20060219" and looked at the build
properties, and all it mentions is
"#forrest.home=c:\\javalib\\apache-forrest-0.6\\src\\core
#ikvm.dir=C:\\javalib\\ikvm-12-07-2004\\ikvm"
What does this mean?
Sorry, I am just confused about the importing process of PDFBox. The
Java documentation of the classes is excellent and easy to
understand. However, the site misses one crucial piece of
information: how to actually get to use the library. Maybe it is just
me, and users with more Java experience understand this library
import process. The only way I know is to import every file one by
one, which is extremely tedious. How do you just do "import
org.pdfbox.pdmodel.*;"? Do I just place all the class files in the same
root directory? Basically, my question is how to make the PDFBox
library usable as "import org.pdfbox.pdmodel.*;" in the same way as
"import java.io.*;". I need to know how to build the custom library so
that it behaves like the standard library.
Could you give me an example of the library building process? I am
guessing it is like this:
1) Download PDFBox
2) Download "Ant"
3) Set the "Ant" class paths
4) Once that is done, use "Ant" to build PDFBox. (In what directory do I
do the PDFBox build? The "Ant" class path has been set and is accessible
from anywhere, but how does that make the PDFBox library accessible from
any directory?)
5) Now that it is built, you can use it in the code as "import
org.pdfbox.pdmodel.*;"
Could you explain the process if you have some time? I am confused
about the process of using the PDFBox library rather than the
code/structure of the PDFBox classes themselves, which is
straightforward and has excellent documentation.
Thanks,
PC
pchulani, 3/9/2006 6:55:18 PM
You only need Ant if you want to compile the PDFBox binaries yourself.
PDFBox comes with the binaries, so you don't need to do that unless you
want to. I would recommend just using the released version or, if you
want the latest, grabbing the nightly build from
http://www.pdfbox.org/dist
In Java there is typically no benefit to building something yourself
versus using binaries. The only reasons people should build PDFBox
themselves are: 1) they are curious and want to understand PDFBox
better; 2) they are planning on making modifications (which of course
should be contributed back to PDFBox :) ).
The build process creates PDFBox-0.7.2.jar in the lib
directory. You already have this file, which is why you don't need to
build PDFBox; this is the file you need in order to use PDFBox.
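To check that the jar is really visible to your program, a small sketch like the following may help. The class name ClasspathCheck and the method isAvailable are made up for this illustration; the only PDFBox-specific part is the class name org.pdfbox.pdmodel.PDDocument from the example above. It asks the class loader to locate a class by name and reports whether that worked.

```java
// Hypothetical helper for illustration; not part of PDFBox.
class ClasspathCheck {
    // Returns true when the named class can be located on the current classpath.
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.pdfbox.pdmodel.PDDocument";
        if (isAvailable(name)) {
            System.out.println(name + " found");
        } else {
            System.out.println(name + " NOT found; add the PDFBox jar to the classpath");
        }
    }
}
```

Assuming the jar sits in the current directory, compiling with "javac -classpath PDFBox-0.7.2.jar ClasspathCheck.java" and running with "java -classpath .:PDFBox-0.7.2.jar ClasspathCheck" (use ";" instead of ":" on Windows) should report the class as found. Putting the jar on the classpath like this is what makes "import org.pdfbox.pdmodel.*;" resolve; no build step is involved.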
The build.properties file has *optional* properties you can set when
building PDFBox.
PDFBox uses the following two projects for a complete build:
IKVM: to build .NET DLLs from jar files
Apache Forrest: to generate the website documentation
You can leave these blank; the build will still complete, skipping
those parts and building just the jar file.
Ben
ben, 3/9/2006 7:07:51 PM
Hi Ben,
Thanks for your help. It is appreciated.
PC
pchulani, 3/10/2006 4:40:08 PM