Recovering a LaTeX file from PDF

Is there any better way of recovering a LaTeX file from the PDF output
(on a Linux system) than running the application pdftotext ,
and editing the result?

-- 
Timothy Murphy  
e-mail: gayleard /at/ eircom.net
tel: +353-86-2336090, +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
Reply Timothy 9/25/2010 1:38:16 PM

On 25 Sep, 14:38, Timothy Murphy <gayle...@eircom.net> wrote:
> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?
>
> --
> Timothy Murphy
> e-mail: gayleard /at/ eircom.net
> tel: +353-86-2336090, +353-1-2842366
> s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

Of course there is. Unfortunately nobody wrote it yet.
Reply Tordar 9/25/2010 3:10:24 PM


On 25-09-2010 14:38, Timothy Murphy wrote:

> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?

You will find a few suggestions at the FAQ:

http://www.tex.ac.uk/cgi-bin/texfaq2html?label=recovertex

Best regards,

Jose Carlos Santos
Reply ISO 9/25/2010 3:14:29 PM

Timothy Murphy <gayleard@eircom.net> wrote:

> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?

Unless the source has been included with the attachfile package, I guess
there's basically no better way.
Reply benoit 9/26/2010 7:04:33 AM

On 2010-09-26, Benoit RIVET <benoit.rivet@libre.fr.invalid> wrote:
> Timothy Murphy <gayleard@eircom.net> wrote:
>
>> Is there any better way of recovering a LaTeX file from the PDF output
>> (on a Linux system) than running the application pdftotext ,
>> and editing the result?
>
> Unless the source has been included with the attachfile package, I guess
> there's basically no better way.

And that will not recover the latex file for you either. It might
recover the text, but will not recover any of the latex for you.

Reply unruh 9/26/2010 4:15:30 PM

unruh <unruh@wormhole.physics.ubc.ca> wrote:

> >> Is there any better way of recovering a LaTeX file from the PDF output
> >> (on a Linux system) than running the application pdftotext ,
> >> and editing the result?
> >
> > Unless the source has been included with the attachfile package, I guess
> > there's basically no better way.
> 
> And that will not recover the latex file for you either. It might
> recover the text, but will not recover any of the latex for you.

I did once use attachfile to include a tex source in the compiled pdf
file, which allowed me to get rid of the source and keep only the
pdf for archiving purposes.

For example, if I compile the following test.tex file

\documentclass{article}
\usepackage{lipsum, attachfile}

\begin{document}

\lipsum

\vfill

\attachfile{test.tex}{source code}
\end{document}

the source code will be embedded in the pdf file as an attached file (and
can be recovered with Adobe Reader).

-- 
Benoît
Reply benoit 9/26/2010 6:57:42 PM

On Sep 26, 1:59 pm, benoit.ri...@libre.fr.invalid (Benoit RIVET)
wrote:
> unruh <un...@wormhole.physics.ubc.ca> wrote:
> > >> Is there any better way of recovering a LaTeX file from the PDF output
> > >> (on a Linux system) than running the application pdftotext ,
> > >> and editing the result?
>
> > > Unless the source has been included with the attachfile package, I guess
> > > there's basically no better way.
>
> > And that will not recover the latex file for you either. It might
> > recover the text, but will not recover any of the latex for you.
>
> I did once use attachfile to include a tex source in the compiled pdf
> file, allowing me therefore to get rid of the source and only keep the
> pdf for archiving purpose.
> [...]
> the source code will be embedded in the pdf file as an attached file (and
> can be recovered with Adobe Reader).
>

This latter approach kind of makes sense (if the document is CC-BY or
something).

Luis.
Reply Luis 9/27/2010 8:47:57 PM

On Sep 25, 10:10 am, Tordar <orodri...@gmail.com> wrote:
> On 25 Sep, 14:38, Timothy Murphy <gayle...@eircom.net> wrote:
>
> > Is there any better way of recovering a LaTeX file from the PDF output
> > (on a Linux system) than running the application pdftotext ,
> > and editing the result?
>
> > --
> > Timothy Murphy
> > e-mail: gayleard /at/ eircom.net
> > tel: +353-86-2336090, +353-1-2842366
> > s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
>
> Of course there is. Unfortunately nobody wrote it yet.

If nobody has written it yet, of course there is NOT.

(Unless you assume that, if it is *possible*, then it should "exist"
in some possible world out there.)

Luis.
Reply Luis 9/27/2010 8:49:30 PM

On 25/09/10 14:38, Timothy Murphy wrote:
> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?

No. That is like trying to recreate whole cows out of hamburgers, or
recreate whole eggs from scrambled eggs. The information contained in a
LaTeX document (in fact, any preparation system, including
wordprocessors) gets consumed when it is typeset to PDF, and the only
information left behind is about fonts and position on the page.

(That's not wholly true: there are ways of preserving the information
but they are not the default: the creator would have to have taken extra
steps or used additional software.)

There are PDF-to-wordprocessor packages you can buy which will try to
recreate the appearance, but I have never used them. In any case, they
absolutely cannot interpret the text and work out the *reason* for the
position and font (eg \section). That information would have to be put
back in again by hand.

///Peter
Reply Peter 9/30/2010 6:50:38 PM

On Thu, 30 Sep 2010 19:50:38 +0100, Peter Flynn
<peter.nosp@m.silmaril.ie> wrote:

>On 25/09/10 14:38, Timothy Murphy wrote:
>> Is there any better way of recovering a LaTeX file from the PDF output
>> (on a Linux system) than running the application pdftotext ,
>> and editing the result?
>
>No. That is like trying to recreate whole cows out of hamburgers, or
>recreate whole eggs from scrambled eggs. The information contained in a
>LaTeX document (in fact, any preparation system, including
>wordprocessors) gets consumed when it is typeset to PDF, and the only
>information left behind is about fonts and position on the page.
>
>(That's not wholly true: there are ways of preserving the information
>but they are not the default: the creator would have to have taken extra
>steps or used additional software.)

If hyperref is used, it automatically supplies some 
information. For example, a section header is marked 
with a named anchor, with default names like "section.1". 
A hyperref-aware pdf-to-text conversion program could 
incorporate some of the sectional structure at least. 
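
As a rough illustration of the idea, here is a minimal Python sketch
that turns hyperref-style anchor names into sectioning commands. Only
the name "section.1" comes from the description above; the other
counter names, the level numbers, and the helpers parse_anchor and
outline are assumptions made for the example. How the anchor names are
read out of the PDF is not shown and would depend on the PDF library
at hand.

```python
import re

# Sectioning levels assumed for the standard classes. The anchor names
# are assumed to look like "<counter>.<numbers>", e.g. "section.1".
LEVELS = {
    "part": -1,
    "chapter": 0,
    "section": 1,
    "subsection": 2,
    "subsubsection": 3,
}

ANCHOR_RE = re.compile(r"^([a-z]+)\.(\d+(?:\.\d+)*)$")


def parse_anchor(name):
    """Return (command, level, number) for a sectional anchor, else None."""
    match = ANCHOR_RE.match(name)
    if match is None or match.group(1) not in LEVELS:
        return None
    counter, number = match.groups()
    return "\\" + counter, LEVELS[counter], number


def outline(anchor_names):
    """Keep only the sectional anchors, preserving the order given."""
    parsed = (parse_anchor(name) for name in anchor_names)
    return [entry for entry in parsed if entry is not None]
```

Anchors for other counters (pages, figures, equations) are simply
dropped, so the result is only a skeleton of the sectional structure;
the heading text itself would still have to come from the extracted
text.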

Also, assuming use of some standard set of fonts and some 
standard classes, there might be reasonable heuristic methods 
that could detect titles of sections, math displays, etc. 
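
One such heuristic might look like the Python sketch below, which
post-processes plain text of the kind pdftotext produces. It assumes
that headings sit on their own line as "1 Title" or "2.3 Title", the
way the standard article class numbers them; the function name
mark_headings, the length cut-off, and the "no trailing full stop"
rule are all invented for the illustration, not taken from any
existing tool.

```python
import re

# Assumed convention: a heading is a short line such as "1 Title" or
# "2.3 Title", with at most three number components.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+){0,2})\s+([A-Z].*)$")
COMMANDS = ["\\section", "\\subsection", "\\subsubsection"]


def mark_headings(text, max_len=60):
    """Wrap lines that look like numbered headings in sectioning commands."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        match = HEADING_RE.match(stripped)
        # Long lines, or lines ending in a full stop, are treated as body text.
        if match and len(stripped) <= max_len and not stripped.endswith("."):
            depth = match.group(1).count(".")
            out.append("%s{%s}" % (COMMANDS[depth], match.group(2)))
        else:
            out.append(line)
    return "\n".join(out)
```

A guess like this will produce false positives (a short line such as
"2010 Annual report" would be tagged as a section), so the output
still needs checking by hand.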

Of course, the creator of the PDF would have to have used 
hyperref, have scrupulously avoided visual formatting, and 
have avoided all but standard packages.

That said, I am not aware of any pdf-to-text programs that
incorporate any awareness of TeX at all, much less hyperref.


Dan
To reply by email, change LookInSig to luecking
Reply Dan 10/1/2010 5:00:41 PM
