Recommendations for PDF to (X)HTML conversion tools?

Hi peeps,

I'm currently faced with the task of converting a large set of PDF
resources into HTML and am hoping that those with experience with this
could make some recommendations. The resources consist of simple
tables, textual documents, scanned images etc, etc. 

Of primary importance for me is that the HTML retains the layout of
tables and documents as much as possible - my experiments with the
various demos and open source tools out there are not providing very
high quality results. 

Oh and XHTML output would be very nice but lower on the priority
list...

Many thanks in advance

Lee
L
7/7/2003 2:10:42 PM
comp.text.pdf

L. Adrian Jackson wrote:

> Hi peeps,

Is this a fellow Brit I spy?

> I'm currently faced with the task of converting a large set of PDF
> resources into HTML and am hoping that those with experience with this
> could make some recommendations. The resources consist of simple
> tables, textual documents, scanned images etc, etc.

HTML and PDF are fundamentally different. HTML was designed to specify the
meaning of document content and leave the precise rendering and layout up to
the browser. PDF was designed to specify the appearance of a document and
largely ignore the meaning of the document content.

HTML is being changed to allow greater control over the appearance of a
document and PDF is being changed to allow the meaning of a document to be
better represented. However the fact that the two specifications are based
on diametrically opposed concepts does mean that it can be phenomenally
difficult to convert between the two.

In order to convert in the way you describe you essentially need to
deconstruct and 'render' the PDF document. Look at the code for projects
like Ghostscript and you'll begin to get an idea of the level of complexity
involved.
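
To give a flavour of why this is hard: a PDF page is essentially text fragments placed at coordinates, so a converter has to guess which fragments form a table row. Below is a minimal sketch in Python, assuming the fragments have already been extracted as (x, y, text) tuples. The sample data, the tolerance value and the function names are all made up for illustration; this is not how Ghostscript or any particular tool does it.

```python
from html import escape


def group_into_rows(fragments, y_tolerance=2.0):
    """Cluster (x, y, text) fragments into rows of cell texts.

    Fragments whose y coordinate lies within y_tolerance of the first
    fragment in the current row are treated as the same row (an assumption
    that breaks down on skewed scans or multi-line cells).
    """
    rows = []
    # PDF y coordinates grow upwards, so sort descending to read top-down.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if rows and abs(rows[-1]["y"] - y) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    # Within each row, order the cells left to right.
    return [[text for _, text in sorted(row["cells"])] for row in rows]


def rows_to_html(rows):
    """Render rows of cell texts as a bare HTML table."""
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical fragments: two rows, slightly misaligned vertically.
fragments = [
    (72.0, 700.0, "Item"), (200.0, 700.5, "Price"),
    (72.0, 680.0, "Tea & biscuits"), (200.0, 680.0, "1.50"),
]
table_rows = group_into_rows(fragments)
table_html = rows_to_html(table_rows)
print(table_html)
```

Even this toy version needs a tolerance to cope with slightly misaligned baselines, and it makes no attempt at column spans, wrapped cells or scanned pages, which is where real documents get difficult.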

> Of primary importance for me is that the HTML retains the layout of
> tables and documents as much as possible - my experiments with the
> various demos and open source tools out there are not providing very
> high quality results.

Please do post your results. IMHO any results are good!

Best Wishes

Jos

----------------------------------------------------------------------
jos vernon
http://www.websupergoo.com
.NET Image Components & Consultancy
----------------------------------------------------------------------


Jos
7/7/2003 4:03:22 PM
I just went through a conversion using the plug-in mentioned by Mr. Farrow.
I got an HTML and an XML file from a PDF.  All I had to do was use the Make
Accessible plug-in before using the conversion plug-in.  Very easy.
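
For anyone who ends up with XML and still needs HTML, the mapping step does not strictly require XSLT. Here is a minimal sketch using only the Python standard library. The element names (Document, P, Table, TR, TD) and the sample input are assumptions for illustration, not the plug-in's documented output, so the tag map would need adjusting to whatever the XML actually contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source XML tags to HTML tags.
TAG_MAP = {"Document": "div", "P": "p", "Table": "table", "TR": "tr", "TD": "td"}


def to_html(elem):
    """Recursively copy an element tree, renaming tags via TAG_MAP.

    Unknown tags fall back to <span> so their text is not lost.
    """
    html_elem = ET.Element(TAG_MAP.get(elem.tag, "span"))
    html_elem.text = elem.text
    html_elem.tail = elem.tail
    for child in elem:
        html_elem.append(to_html(child))
    return html_elem


# Made-up sample input standing in for the exported XML.
sample = (
    "<Document><P>Rates for 2003</P>"
    "<Table><TR><TD>Basic</TD><TD>4.5%</TD></TR></Table></Document>"
)
result = ET.tostring(to_html(ET.fromstring(sample)), encoding="unicode", method="html")
print(result)
```

An XSLT stylesheet would do the same job declaratively; the sketch only shows that the step is a tag-for-tag mapping once the structure has been recovered.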

Hugh

"john farrow" <george@xmldpf.com> wrote in message
news:fFlOa.3454$9f7.373185@news02.tsnz.net...
> Have a look on the Adobe site: there is a beta (but working) version of an
> Acrobat extension which allows you to save the PDF as XML, from which you
> could presumably use XSLT to get HTML.
>
> Regards
>
> John Farrow
> www.xmlpdf.com


Hugh
7/7/2003 10:03:46 PM
I stand corrected. What's the URL for the plugin?

Best Wishes

Jos

----------------------------------------------------------------------
jos vernon
http://www.websupergoo.com
.NET Image Components & Consultancy
----------------------------------------------------------------------


Jos
7/7/2003 11:06:29 PM
Jos Vernon wrote:
> I stand corrected. What's the URL for the plugin?
>
For Windows it's here:
http://www.adobe.com/support/downloads/product.jsp?product=1&platform=Windows


Nadeem
7/8/2003 8:45:48 AM
Thanks for the comments peeps.

I don't currently have access to a copy of Acrobat (sadly), however I have
had surprisingly good results from Iceni's Gemini solo. Given that it's
also affordable and has a built-in batch mode (very useful), I suspect
I'll be heading along that route.

Thanks again all

Lee


L
7/8/2003 1:50:12 PM