Is there any better way of recovering a LaTeX file from the PDF output
(on a Linux system) than running the application pdftotext ,
and editing the result?
--
Timothy Murphy
e-mail: gayleard /at/ eircom.net
tel: +353-86-2336090, +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
Timothy | 9/25/2010 1:38:16 PM

On 25 Sep, 14:38, Timothy Murphy <gayle...@eircom.net> wrote:
> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?
>
> --
> Timothy Murphy
> e-mail: gayleard /at/ eircom.net
> tel: +353-86-2336090, +353-1-2842366
> s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
Of course there is. Unfortunately nobody wrote it yet.
Tordar | 9/25/2010 3:10:24 PM

On 25-09-2010 14:38, Timothy Murphy wrote:
> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?
You will find a few suggestions at the FAQ:
http://www.tex.ac.uk/cgi-bin/texfaq2html?label=recovertex
Best regards,
Jose Carlos Santos
ISO | 9/25/2010 3:14:29 PM

Timothy Murphy <gayleard@eircom.net> wrote:
> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?
Unless the source has been included with the attachfile package, I guess
there's basically no better way.
benoit | 9/26/2010 7:04:33 AM

On 2010-09-26, Benoit RIVET <benoit.rivet@libre.fr.invalid> wrote:
> Timothy Murphy <gayleard@eircom.net> wrote:
>
>> Is there any better way of recovering a LaTeX file from the PDF output
>> (on a Linux system) than running the application pdftotext ,
>> and editing the result?
>
> Unless the source has been included with the attachfile package, I guess
> there's basically no better way.
And that will not recover the latex file for you either. It might
recover the text, but will not recover any of the latex for you.
unruh | 9/26/2010 4:15:30 PM

unruh <unruh@wormhole.physics.ubc.ca> wrote:
> >> Is there any better way of recovering a LaTeX file from the PDF output
> >> (on a Linux system) than running the application pdftotext ,
> >> and editing the result?
> >
> > Unless the source has been included with the attachfile package, I guess
> > there's basically no better way.
>
> And that will not recover the latex file for you either. It might
> recover the text, but will not recover any of the latex for you.
I did once use attachfile to include a tex source in the compiled pdf
file, which allowed me to get rid of the source and keep only the
pdf for archiving purposes.
For example, if I compile the following test.tex file
\documentclass{article}
\usepackage{lipsum, attachfile}
\begin{document}
\lipsum
\vfill
\attachfile{test.tex}{source code}
\end{document}
the source code will be embedded in the pdf file as an attached file (and
can be recovered with Adobe Reader).
--
Benoît
benoit | 9/26/2010 6:57:42 PM

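On Linux, where Adobe Reader may not be available, the attachment from the attachfile example above can probably be pulled out on the command line. The following is a minimal sketch, assuming the pdftk tool is installed and that its unpack_files operation sees the attachment; the function names and the paths test.pdf and recovered are made up for illustration.

```python
import subprocess


def unpack_command(pdf_path, out_dir):
    """Build the pdftk call that copies a PDF's attachments into out_dir."""
    # "unpack_files" and "output" are pdftk keywords; the paths are ours.
    return ["pdftk", pdf_path, "unpack_files", "output", out_dir]


def unpack_attachments(pdf_path, out_dir="."):
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    subprocess.run(unpack_command(pdf_path, out_dir), check=True)
```

Calling unpack_attachments("test.pdf", "recovered") should then leave test.tex in the recovered directory, provided that directory already exists and the PDF was built as shown above.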
On Sep 26, 1:59 pm, benoit.ri...@libre.fr.invalid (Benoit RIVET)
wrote:
> unruh <un...@wormhole.physics.ubc.ca> wrote:
> > >> Is there any better way of recovering a LaTeX file from the PDF output
> > >> (on a Linux system) than running the application pdftotext ,
> > >> and editing the result?
>
> > > Unless the source has been included with the attachfile package, I guess
> > > there's basically no better way.
>
> > And that will not recover the latex file for you either. It might
> > recover the text, but will not recover any of the latex for you.
>
> I did once use attachfile to include a tex source in the compiled pdf
> file, which allowed me to get rid of the source and keep only the
> pdf for archiving purposes.
> [...]
> the source code will be embedded in the pdf file as an attached file (and
> can be recovered with Adobe Reader).
>
This latter approach kind of makes sense (if the document is CC-BY or
something).
Luis.
Luis | 9/27/2010 8:47:57 PM

On Sep 25, 10:10 am, Tordar <orodri...@gmail.com> wrote:
> On 25 Sep, 14:38, Timothy Murphy <gayle...@eircom.net> wrote:
>
> > Is there any better way of recovering a LaTeX file from the PDF output
> > (on a Linux system) than running the application pdftotext ,
> > and editing the result?
>
> > --
> > Timothy Murphy
> > e-mail: gayleard /at/ eircom.net
> > tel: +353-86-2336090, +353-1-2842366
> > s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
>
> Of course there is. Unfortunately nobody wrote it yet.
If nobody has written it yet, of course there is NOT.
(Unless you assume that, if it is *possible*, then it should "exist"
in some possible world out there.)
Luis.
Luis | 9/27/2010 8:49:30 PM

On 25/09/10 14:38, Timothy Murphy wrote:
> Is there any better way of recovering a LaTeX file from the PDF output
> (on a Linux system) than running the application pdftotext ,
> and editing the result?
No. That is like trying to recreate whole cows out of hamburgers, or
recreate whole eggs from scrambled eggs. The information contained in a
LaTeX document (in fact, any document preparation system, including
word processors) gets consumed when it is typeset to PDF, and the only
information left behind is about fonts and position on the page.
(That's not wholly true: there are ways of preserving the information
but they are not the default: the creator would have to have taken extra
steps or used additional software.)
There are PDF-to-wordprocessor packages you can buy which will try to
recreate the appearance, but I have never used them. In any case, they
absolutely cannot interpret the text and work out the *reason* for the
position and font (e.g. \section). That information would have to be put
back in again by hand.
///Peter
Peter | 9/30/2010 6:50:38 PM

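The "run pdftotext and edit the result" route can at least be made less tedious. Below is a minimal sketch, assuming the text was first extracted with something like pdftotext paper.pdf paper.txt (file names hypothetical). It only escapes LaTeX special characters, re-joins the hard line breaks inside paragraphs and wraps everything in a bare article; as explained above, sectioning, math and other markup still have to be put back by hand. The function names are invented for this example.

```python
import re

# Characters that are special to LaTeX, with a safe replacement for each.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}
SPECIAL = re.compile(r"[\\{}$&#%_^~]")


def escape_latex(text):
    """Escape LaTeX special characters in a single pass."""
    return SPECIAL.sub(lambda m: LATEX_ESCAPES[m.group()], text)


def text_to_latex_skeleton(text):
    """Wrap plain text (e.g. pdftotext output) in a minimal article."""
    # pdftotext marks page breaks with a form feed; treat it as a
    # paragraph break.
    text = text.replace("\f", "\n\n")
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Re-join the hard line breaks inside one paragraph.
        joined = " ".join(
            line.strip() for line in block.splitlines() if line.strip()
        )
        if joined:
            paragraphs.append(escape_latex(joined))
    body = "\n\n".join(paragraphs)
    return (
        "\\documentclass{article}\n"
        "\\begin{document}\n"
        + body
        + "\n\\end{document}\n"
    )
```

Escaping is done in one regular-expression pass so that the braces introduced by \textbackslash{} are not escaped a second time.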
On Thu, 30 Sep 2010 19:50:38 +0100, Peter Flynn
<peter.nosp@m.silmaril.ie> wrote:
>On 25/09/10 14:38, Timothy Murphy wrote:
>> Is there any better way of recovering a LaTeX file from the PDF output
>> (on a Linux system) than running the application pdftotext ,
>> and editing the result?
>
>No. That is like trying to recreate whole cows out of hamburgers, or
>recreate whole eggs from scrambled eggs. The information contained in a
>LaTeX document (in fact, any document preparation system, including
>word processors) gets consumed when it is typeset to PDF, and the only
>information left behind is about fonts and position on the page.
>
>(That's not wholly true: there are ways of preserving the information
>but they are not the default: the creator would have to have taken extra
>steps or used additional software.)
If hyperref is used, it automatically supplies some
information. For example, a section header is marked
with a named anchor, with default names like "section.1".
A hyperref-aware pdf-to-text conversion program could
incorporate some of the sectional structure at least.
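As a rough illustration of that idea, here is a minimal sketch that turns hyperref-style anchor names into LaTeX headings. It assumes some other tool has already listed each anchor name together with the heading text found at that position; how to obtain those pairs is not covered, and the function name is made up for this example.

```python
import re

# Sectioning units whose default anchors look like "<unit>.<number>".
SECTION_UNITS = ("part", "chapter", "section", "subsection", "subsubsection")
ANCHOR = re.compile(r"^([a-z]+)(\*?)\.(.+)$")


def anchor_to_heading(anchor, title):
    """Return a LaTeX heading for a sectioning anchor, or None otherwise.

    The title is assumed to be already escaped for LaTeX.
    """
    match = ANCHOR.match(anchor)
    if match is None or match.group(1) not in SECTION_UNITS:
        return None
    unit, star = match.group(1), match.group(2)
    return "\\%s%s{%s}" % (unit, star, title)
```

For instance, the pair ("section.1", "Introduction") gives \section{Introduction}, while an anchor such as "equation.1.4" is ignored.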
Also, assuming use of some standard set of fonts and some
standard classes, there might be reasonable heuristic methods
that could detect titles of sections, math displays, etc.
Of course, the creator of the PDF would have to have used
hyperref, have scrupulously avoided visual formatting, and
have avoided all but standard packages.
That said, I am not aware of any pdf-to-text programs that
incorporate any awareness of TeX at all, much less hyperref.
Dan
To reply by email, change LookInSig to luecking
Dan | 10/1/2010 5:00:41 PM

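The font-based heuristic suggested above could look something like the following. This is only a sketch under strong assumptions: each line of the page is already available as a (text, font size) pair from some PDF library, the most frequent size is the body text, and larger sizes are headings with the largest one being \section. A document title set in a bigger font would be mislabelled. The names are invented for this example.

```python
from collections import Counter

LEVELS = ("section", "subsection", "subsubsection")


def guess_headings(lines):
    """Tag each (text, font_size) line as body text or a heading level."""
    if not lines:
        return []
    sizes = Counter(size for _, size in lines)
    # Assume the most frequent font size is the body text.
    body_size = sizes.most_common(1)[0][0]
    # Distinct sizes larger than the body size, largest first.
    heading_sizes = sorted((s for s in sizes if s > body_size), reverse=True)
    result = []
    for text, size in lines:
        if size in heading_sizes:
            rank = min(heading_sizes.index(size), len(LEVELS) - 1)
            result.append((LEVELS[rank], text))
        else:
            result.append(("body", text))
    return result
```

With the standard article class at 10pt this would separate 14.4pt section titles and 12pt subsection titles from the 10pt body, but visual formatting or non-standard classes would defeat it quickly.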