Recovering a LaTeX file from PDF

Is there any better way of recovering a LaTeX file from the PDF output
(on a Linux system) than running the application pdftotext,
and editing the result?

On 25 Sep, 14:38, Timothy Murphy <gayle...@eircom.net> wrote:
> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?
>
Of course there is. Unfortunately nobody has written it yet.

On 25-09-2010 14:38, Timothy Murphy wrote:

> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?

You will find a few suggestions at the FAQ:

http://www.tex.ac.uk/cgi-bin/texfaq2html?label=recovertex

Best regards,

Jose Carlos Santos

Timothy Murphy <gayleard@eircom.net> wrote:

> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?

Unless the source has been included with the attachfile package, I guess
there's basically no better way.

On 2010-09-26, Benoit RIVET <benoit.rivet@libre.fr.invalid> wrote:
> Timothy Murphy <gayleard@eircom.net> wrote:
>
>> Is there any better way of recovering a LaTeX file from the PDF output
>> (on a Linux system) than running the application pdftotext ,
>> and editing the result?
>
> Unless the source has been included with the attachfile package, I guess
> there's basically no better way.

And that will not recover the latex file for you either. It might
recover the text, but will not recover any of the latex for you.


unruh <unruh@wormhole.physics.ubc.ca> wrote:

> >> Is there any better way of recovering a LaTeX file from the PDF output
> >> (on a Linux system) than running the application pdftotext ,
> >> and editing the result?
> >
> > Unless the source has been included with the attachfile package, I guess
> > there's basically no better way.
>
> And that will not recover the latex file for you either. It might
> recover the text, but will not recover any of the latex for you.

I did once use attachfile to include a tex source in the compiled pdf
file, which allowed me to get rid of the source and keep only the
pdf for archiving purposes.

For example, if I compile the following test.tex file

\documentclass{article}
\usepackage{lipsum, attachfile}

\begin{document}

\lipsum

\vfill

\attachfile{test.tex}{source code}
\end{document}

the source code will be embedded in the pdf file as an attached file (and
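
A minimal sketch of getting such an attachment back out of the PDF, assuming the pdfdetach tool from poppler-utils (the suite that also ships pdftotext) is installed. The helper names and the file names test.pdf and recovered are hypothetical, and whether pdfdetach sees attachments made by attachfile may depend on the poppler version.

```python
import subprocess

def detach_command(pdf_path, out_dir):
    """Build the pdfdetach command line that saves all attachments.

    -saveall extracts every embedded file; with -saveall, -o names the
    directory the files are written to.
    """
    return ["pdfdetach", "-saveall", "-o", out_dir, pdf_path]

def extract_attachments(pdf_path, out_dir="."):
    """Run pdfdetach; raises CalledProcessError if the tool reports failure."""
    subprocess.run(detach_command(pdf_path, out_dir), check=True)

# Hypothetical usage, assuming test.pdf was built from the test.tex above:
#   extract_attachments("test.pdf", "recovered")
```

Running `pdfdetach -list test.pdf` first would show whether the PDF carries any attachment at all.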

On Sep 26, 1:59 pm, benoit.rivet@libre.fr.invalid (Benoit RIVET) wrote:
> unruh <un...@wormhole.physics.ubc.ca> wrote:
> > >> Is there any better way of recovering a LaTeX file from the PDF output
> > >> (on a Linux system) than running the application pdftotext ,
> > >> and editing the result?
>
> > > Unless the source has been included with the attachfile package, I guess
> > > there's basically no better way.
>
> > And that will not recover the latex file for you either. It might
> > recover the text, but will not recover any of the latex for you.
>
> I did once use attachfile to include a tex source in the compiled pdf
> file, allowing me therefore to get rid of the source and only keep the
> pdf for archiving purpose.
> [...]
> the source code will be embedded in the pdf file as an attached file (and
>

This latter approach kind of makes sense (if the document is CC-BY or
something).

Luis.

On Sep 25, 10:10 am, Tordar <orodri...@gmail.com> wrote:
> On 25 Sep, 14:38, Timothy Murphy <gayle...@eircom.net> wrote:
>
> > Is there any better way of recovering a LaTeX file from the PDF output
> > (on a Linux system) than running the application pdftotext ,
> > and editing the result?
>
> > --
> > Timothy Murphy
> > e-mail: gayleard /at/ eircom.net
> > tel: +353-86-2336090, +353-1-2842366
> > s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
>
> Of course there is. Unfortunately nobody has written it yet.

If nobody has written it yet, of course there is NOT.

(Unless you assume that, if it is *possible*, then it should "exist"
in some possible world out there.)

Luis.

On 25/09/10 14:38, Timothy Murphy wrote:
> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?

No. That is like trying to recreate whole cows out of hamburgers, or
recreate whole eggs from scrambled eggs. The information contained in a
LaTeX document (in fact, any preparation system, including
wordprocessors) gets consumed when it is typeset to PDF, and the only
information left behind is about fonts and position on the page.

(That's not wholly true: there are ways of preserving the information
but they are not the default: the creator would have to have taken extra

There are PDF-to-wordprocessor packages you can buy which will try to
recreate the appearance, but I have never used them. In any case, they
absolutely cannot interpret the text and work out the *reason* for the
position and font (e.g. \section). That information would have to be put
back in again by hand.

///Peter

On Thu, 30 Sep 2010 19:50:38 +0100, Peter Flynn
<peter.nosp@m.silmaril.ie> wrote:

>On 25/09/10 14:38, Timothy Murphy wrote:
>> Is there any better way of recovering a LaTeX file from the PDF output
>> (on a Linux system) than running the application pdftotext ,
>> and editing the result?
>
>No. That is like trying to recreate whole cows out of hamburgers, or
>recreate whole eggs from scrambled eggs. The information contained in a
>LaTeX document (in fact, any preparation system, including
>wordprocessors) gets consumed when it is typeset to PDF, and the only
>information left behind is about fonts and position on the page.
>
>(That's not wholly true: there are ways of preserving the information
>but they are not the default: the creator would have to have taken extra

If hyperref is used, it automatically supplies some
information. For example, a section header is marked
with a named anchor, with default names like "section.1".
A hyperref-aware pdf-to-text conversion program could
incorporate some of the sectional structure at least.
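
As a rough sketch of what such a hyperref-aware step might do, the function below turns an anchor name plus a heading title into a sectioning command. It assumes the anchor names and titles have already been pulled out of the PDF by some other tool (that part is not shown), and apart from "section.1" the level names and the function name are my own assumptions.

```python
import re

# Sectioning levels assumed to appear as anchor-name prefixes.
SECTION_LEVELS = ("part", "chapter", "section", "subsection", "subsubsection")

# Matches names such as "section.1" or "subsection.1.2".
ANCHOR_RE = re.compile(r"^(%s)\.(\d+(?:\.\d+)*)$" % "|".join(SECTION_LEVELS))

def anchor_to_latex(anchor, title):
    """Return a LaTeX sectioning command for a hyperref-style anchor name.

    Returns None for anchors that do not look like sectioning anchors
    (for example page or equation anchors).
    """
    match = ANCHOR_RE.match(anchor)
    if match is None:
        return None
    return "\\%s{%s}" % (match.group(1), title)

print(anchor_to_latex("section.1", "Introduction"))
```

This only restores the heading commands; labels, cross-references and everything inside the section body would still have to be put back by hand.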

Also, assuming use of some standard set of fonts and some
standard classes, there might be reasonable heuristic methods
that could detect titles of sections, math displays, etc.
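
One such heuristic could look roughly like this. It is only a sketch: it assumes some extraction tool has already produced (text, font size) pairs in reading order, treats the most common size as body text, and guesses that each larger size is one heading level. The function name and the sample data are made up for illustration.

```python
from collections import Counter

HEADING_COMMANDS = ["section", "subsection", "subsubsection"]

def guess_structure(lines):
    """Guess headings from font sizes.

    lines: list of (text, font_size) pairs in reading order.
    Returns a list of strings, with guessed headings wrapped in
    sectioning commands and all other lines left untouched.
    """
    if not lines:
        return []
    # The most frequent size is taken to be the body text size.
    body_size = Counter(size for _, size in lines).most_common(1)[0][0]
    # Distinct larger sizes, largest first, are taken to be heading levels.
    heading_sizes = sorted({s for _, s in lines if s > body_size}, reverse=True)
    result = []
    for text, size in lines:
        if size in heading_sizes:
            level = min(heading_sizes.index(size), len(HEADING_COMMANDS) - 1)
            result.append("\\%s{%s}" % (HEADING_COMMANDS[level], text))
        else:
            result.append(text)
    return result

sample = [("Introduction", 14.4), ("Some text.", 10.0), ("More text.", 10.0),
          ("Details", 12.0), ("Even more.", 10.0)]
for line in guess_structure(sample):
    print(line)
```

Anything set in a larger font for visual reasons (a title page, a large display) would be misread as a heading, which is exactly the kind of visual formatting that defeats the approach.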

Of course, the creator of the PDF would have to have used
hyperref, have scrupulously avoided visual formatting, and
have avoided all but standard packages.

That said, I am not aware of any pdf-to-text programs that
incorporate any awareness of TeX at all, much less hyperref.

