Obfuscating PDFs typeset with TeX, or modifying the underlying text in a PDF

I'd like to create PDFs using (La)TeX which are readable by humans but
cannot be read by naive automated techniques. For example, it should
not be possible to copy plain text out of the document. One easy way
to do this, I realize, would be typesetting the document normally then
taking a screenshot of each page and creating a new document
containing only images. Unfortunately, should the (human) reader want
to zoom in, he would find that the resolution is limited and
everything would become blocky.

A while back I did a Google search to determine how to obfuscate PDFs
in this way. I found a resource that described a simple and beautiful
idea: apply a permutation to the font glyph data and the inverse
permutation to the text in the TeX source. For example, suppose that
the glyph data for characters "e" and "l" were interchanged. Then, the
TeX source

Hleeo, wored!
\bye

would be used to produce what appears to be "Hello, world!" to a human
but, barring the implementation of OCR, "Hleeo, wored!" to a computer.

This will not yield exactly the same results, of course. I suppose
certain "professional" features of TeX such as kerning, ligatures, and
intelligent line breaking would be broken. However, this I am willing
to sacrifice.
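
As a rough illustration of the source-side half of this idea, the plain TeX sketch below applies the e/l swap automatically, so the source can stay readable. The macro name \swapel is made up for this example and does not come from any package. The matching permutation of the font's glyph data is the harder part and is assumed to be done separately; with an unmodified font the output simply reads "Hleeo, wored!".

```tex
% \swapel{<text>} exchanges every "e" and "l" in <text> before typesetting.
% Hypothetical helper for illustration; it is not from any package.
\def\swapel#1{%
  \begingroup
  % Map each uppercase letter to itself so \lowercase leaves capitals alone.
  \count255=`\A
  \loop
    \lccode\count255=\count255
    \advance\count255 by 1
  \ifnum\count255<91 \repeat
  % The swap itself; it is its own inverse.
  \lccode`\e=`\l
  \lccode`\l=`\e
  \lowercase{\endgroup #1}%
}

\swapel{Hello, world!}
\bye
```

Because \lowercase only touches character tokens, control sequences inside the argument keep their names; only the letters they typeset are swapped.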

I am somehow no longer able to locate this resource! Either it has
been taken offline, or it is simply eluding my efforts to find it. In
the latter case, can someone else direct me to the resource? Or, can
someone suggest another way of accomplishing my goal?

I notice that a lot of older papers available online appear not to
have been typeset in TeX but rather scanned from journals and the
like. However, it is still possible to copy the text out of them,
suggesting that they have been run through OCR and the text obtained
added somehow. Does this mean there is an easy way of independently
setting the text which would be obtained by copying? If so, this could
be another solution to my problem. Does anybody know how it can be
done?
ng3r9h7w (2)
9/12/2009 6:43:58 PM
comp.text.tex

J. Random Hacker wrote:
> Or, can
> someone suggest another way of accomplishing my goal?

I am not sure I understand completely what you are trying to do. But 
newer pdf-versions have /ActualText that you can use to set the text 
being copied in copy/paste. This seems to work with Adobe Reader 9.1 on 
my system:

\pdfliteral direct{/Span << /ActualText (Hleeo, wored!) >> BDC }%
Hello World!%
\pdfliteral direct{EMC }%
\bye
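
The same idea could be wrapped in a small macro, sketched below. The name \actualtext is hypothetical, and the sketch assumes pdfTeX in PDF mode, since it relies on \pdfliteral and \pdfescapestring (the latter escapes parentheses and backslashes in the replacement string). It does not handle a span that breaks across pages, and viewers may differ in whether they honour /ActualText.

```tex
% \actualtext{<text seen by copy/paste>}{<text shown on the page>}
% Hypothetical helper; assumes pdfTeX in PDF mode.
\def\actualtext#1#2{%
  \leavevmode
  \pdfliteral direct{/Span << /ActualText (\pdfescapestring{#1}) >> BDC}%
  #2%
  \pdfliteral direct{EMC}%
}

\actualtext{Hleeo, wored!}{Hello, world!}
\bye
```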
mr_heller (370)
9/12/2009 11:04:46 PM
On Sep 12, 7:43 pm, "J. Random Hacker" <ng3r9...@gmail.com> wrote:
> I'd like to create PDFs using (La)TeX which are readable by humans but
> cannot be read by naive automated techniques.

Whatever you do, as long as it is possible to print the document (to a
file) and run OCR on it, all is lost (unless you typeset the document
in a "CAPTCHA" style). Newer versions of Acrobat come with OCR built-
in.

> someone suggest another way of accomplishing my goal?

Convert all fonts to paths or in other words all text to graphics. The
same basic outcome as rasterizing the pages but with vector graphics
as the end result.

Cheers,

Tomek
t34www (104)
9/13/2009 12:25:12 PM
On 2009-09-13 04:13:58 +0930, "J. Random Hacker" <ng3r9h7w@gmail.com> said:

> Does anybody know how can it be
> done?

The randtext package was designed for this:
  <http://www.ctan.org/tex-archive/help/Catalogue/entries/randtext.html>

I think Adobe Reader might be clever enough to bypass its obfuscation, 
however. (But automated scanning tools would presumably still be 
foiled.)

Hope this helps,
Will

wspr81 (1209)
9/13/2009 1:24:21 PM
Thanks guys for your responses.

On Sep 13, 8:25 am, T3X <t34...@googlemail.com> wrote:
> On Sep 12, 7:43 pm, "J. Random Hacker" <ng3r9...@gmail.com> wrote:
>
> > I'd like to create PDFs using (La)TeX which are readable by humans but
> > cannot be read by naive automated techniques.
>
> Whatever you do, as long as it is possible to print the document (to a
> file) and run OCR on it, all is lost (unless you typeset the document
> in a "CAPTCHA" style). Newer versions of Acrobat come with OCR built-
> in.
>
> > someone suggest another way of accomplishing my goal?
>
> Convert all fonts to paths or in other words all text to graphics. The
> same basic outcome as rasterizing the pages but with vector graphics
> as the end result.
>
> Cheers,
>
> Tomek

Anyway, I am assuming that I am not up against OCR. I'm a student who
has to submit assignments to Turnitin and I resent being forced to
enable Turnitin to use my work for commercial purposes. Since Turnitin
allows submitting as PDF, I would like to obfuscate the files I submit
so that the actual content of my assignments cannot be kept for
Turnitin's use. (Incidentally, this would also defeat the plagiarism
detection if it worked, but this is not my actual purpose - honest!) I
suspect that Turnitin only does a quick-and-dirty automated conversion
to text and I highly doubt that they would go to the trouble of
implementing OCR because it would be unnecessary in 99.9% of cases.
(This is for an English class; the teacher would have no idea what is
going on, especially since she can download the PDF directly from the
Turnitin web site and see that there's "nothing wrong with it")

The randtext package is pretty good - I tried an online PDF to text
converter and it was certainly fooled. It does, however, have two faults
that make it difficult to use. One, it only properly handles plain text
as its argument; for example, if something in \emph{} is in its
argument, the macro expansion fails to produce correct TeX. Two, spaces
within its argument must be escaped as "\ ", otherwise they seem to be
ignored (and no space is produced between pairs of adjacent words).
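
For reference, a minimal test file along these lines might look like the sketch below. It assumes the package's \randomize macro as described in its CTAN catalogue entry, and it writes the space as "\ " to work around the second fault.

```latex
\documentclass{article}
\usepackage{randtext}
\begin{document}
% The space is written as "\ " because a plain space appears to be dropped.
\randomize{Hello,\ world!}
\end{document}
```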

So I'm most interested in your suggestion of converting all text to
paths - how would I do this?
ng3r9h7w (2)
9/13/2009 2:49:51 PM
On 2009-09-12 at 20:04 ADT, Martin Heller <mr_heller@yahoo.dk> wrote:
> J. Random Hacker wrote:
>> Or, can
>> someone suggest another way of accomplishing my goal?
>
> I am not sure I understand completely what you are trying to do. But 
> newer pdf-versions have /ActualText that you can use to set the text 
> being copied in copy/paste. This seems to work with Adobe Reader 9.1 on 
> my system:
>
> \pdfliteral direct{/Span << /ActualText (Hleeo, wored!) >> BDC }%
> Hello World!%
> \pdfliteral direct{EMC }%
> \bye

Unfortunately, xpdf (and possibly whatever turnitin uses) grabs the
"expected" text, even with your code above.

However, what you have above is interesting in its own right.

Cheers.

				Jim
9/13/2009 6:10:07 PM