Type-written text to TeX?

Slightly OT, but I have several hundred pages of typewritten material
already scanned in. (I just have a PDF file with one image per page.)
There is a small amount of maths in it, but not much.

Can anyone advise me of a good OCR program to use?
I'd prefer a Linux program, but am not religious.

I tried gocr, which was useless,
and am presently trying clara, which looks as though it might work,
although it seems to require several degrees in Computer Science
to understand and use.

Also, is there a program to abstract the images from the PDF file?
I've written a little Java program to do this,
but would prefer to use a kosher application.

-- 
Timothy Murphy  
e-mail (<80k only): tim /at/ birdsnest.maths.tcd.ie
tel: +353-86-2336090, +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
tim549 (916)
9/13/2004 12:13:37 PM
comp.text.tex


Timothy Murphy wrote:
> Slightly OT, but I have several hundred pages of typewritten material
> already scanned in. (I just have a PDF file with one image per page.)
> There is a small amount of maths in it, but not much.
> 
> Can anyone advise me of a good OCR program to use?
> I'd prefer a Linux program, but am not religious.
> 
> I tried gocr, which was useless,
> and am presently trying clara, which looks as though it might work,
> although it seems to require several degrees in Computer Science
> to understand and use.

I once reviewed several OCR solutions for UNIX (in April 2001); see

http://tinyurl.com/54xdu

(this links to a long URL that passes my page through the Google
translation service, from German to English).

At that time, gocr was barely usable (evidently, this is still true).
I was not aware of ClaraOCR back then, but its results look more
promising. I also had problems with its user interface, though. I
thought these were "alpha-level" software "features" (remember, this
was in 2001). It's a pity that, judging by your description, progress
has been minimal.

You said you preferred a Linux solution: there is Vividata OCRShop
(commercial and expensive, > $1000), and there once was scanworx (from
Xerox, for Sun machines, with quite good recognition, but no longer
supported or developed, AFAIK).

If you just want to get the job done, I can only recommend using
Omnipage (an outdated version such as 9 or 10 will be fine). The
results are *way better* (see the results on my page) than even those
of the commercial programs for UNIX mentioned above.

Maybe Abbyy Finereader does a good job as well. There was once a free
version on a CD from the computer magazine I subscribe to. There is a
free 15-day try&buy edition at:

http://www.abbyy.com/download/?param=28844

If you get the job done in 15 days, that's all you need.

> Also, is there a program to abstract the images from the PDF file?
> I've written a little Java program to do this,
> but would prefer to use a kosher application.

Use "pdfimages", which comes with xpdf. Also note that there is a switch
"-j", which instructs pdfimages to leave JPEG images in JPEG form (just
in case you embedded the scans as JPEG).

Then use "convert file.ext -units PixelsPerInch -density xxx file.tif"
(ImageMagick sets the resolution with "-density"), as TIFF with a
correct resolution tag is the format that most OCR software can read
without problems. Here xxx is the original scanning resolution. It is
needed so that the OCR software can determine a reasonable font size
(just in case you care ...).
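
For several hundred pages, the two steps above can be scripted. What
follows is only a rough sketch in Python, not a tested recipe: it assumes
that pdfimages (xpdf) and ImageMagick's convert are on the PATH, that
the resolution tag is set with ImageMagick's "-density" and "-units"
options, and that pdfimages names its output root-000.ppm, root-001.jpg
and so on. The function names, the "page" root and the 300 dpi value
are made up for illustration.

```python
import subprocess
from pathlib import Path


def pdfimages_cmd(pdf, root, keep_jpeg=True):
    """Build the pdfimages call; -j leaves embedded JPEGs in JPEG form."""
    cmd = ["pdfimages"]
    if keep_jpeg:
        cmd.append("-j")
    return cmd + [str(pdf), str(root)]


def to_tiff_cmd(image, dpi):
    """Build the ImageMagick call that writes a TIFF tagged with the scan resolution."""
    image = Path(image)
    return ["convert", str(image),
            "-units", "PixelsPerInch", "-density", str(dpi),
            str(image.with_suffix(".tif"))]


def extract_and_convert(pdf, out_dir, dpi):
    """Extract every page image from the PDF, then convert each one to TIFF."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_cmd(pdf, out_dir / "page"), check=True)
    # sorted() materialises the file list before any TIFF is written
    for image in sorted(out_dir.glob("page-*")):
        if image.suffix.lower() != ".tif":  # skip TIFFs left by an earlier run
            subprocess.run(to_tiff_cmd(image, dpi), check=True)
```

Calling extract_and_convert("scan.pdf", "pages", 300) should leave one
.tif per page image in pages/, ready to feed to the OCR program; replace
300 with whatever resolution the pages were actually scanned at.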

Ralf

PS. I also consider your question off-topic in a TeX-related newsgroup,
but maybe other TeXies can make use of the information I gave as well.

-- 
Dipl.-Inf. Ralf Koenig, Professur Rechnernetze und Verteilte Systeme
Technische Universitaet Chemnitz, Tel. +371-531-1532
9/13/2004 8:11:48 PM