How to copy an HTML file into a plain text file

I have a file named test.html. I used the following command in a DOS
command shell to copy it to test.txt:

copy c:\test.html c:\test.txt

The output file, "test.txt", is created as a text file, but all the
HTML tags are still present around each line.

If I use the Windows "File->Save As" function and choose the ".txt"
format, the file is saved as plain text without the HTML tags.

How can I save this file as plain text (.txt), without the HTML tags,
from the DOS command line?

jamilur_rahman
5/18/2005 8:02:24 PM
comp.os.msdos.programmer

>I have a file named test.html, I used following command and copied as
>test.txt file from dos command shell.

>copy c:\test.html c:\test.txt

>The output file, "test.txt" is generated as text formatted file but all
>the HTML tags are existed around each line.

I'm surprised you think the "COPY" command should do anything other than
(ahem) COPY!  For example, when I press the "COPY" button on a photocopy
machine, I expect the output to look as much like the input as possible.


>How can I save this file from dos command into text formatted (.txt)
>without HTML tags around ?

There are programs (DOS and others) to do that.  Look for names such as
HTML2TXT.  Personally, I use HTMSTRIP, although it seems to work only
from the root directory.

-- 
--Myron A. Calhoun.
Five boxes preserve our freedoms:  soap, ballot, witness, jury, and cartridge
PhD EE (retired).   "Barbershop" tenor.   CDL(PTXS).  W0PBV.   (785) 539-4448
NRA Life Member and Certified Instructor (Home Firearm Safety, Rifle, Pistol)
mcalhoun
5/18/2005 10:21:56 PM
<jamilur_rahman@yahoo.com> wrote in news message
1116446544.056878.274340@g47g2000cwa.googlegroups.com...
> I have a file named test.html, I used following command and copied as
> test.txt file from dos command shell.
>
> copy c:\test.html c:\test.txt
>
> The output file, "test.txt" is generated as text formatted file but all
> the HTML tags are existed around each line.

When you have a box with a label on it describing its contents, does
changing that label change the contents too?  Because that is what you are
saying here :-)

> If I use windows "file->save as" function and select to save as ".txt"
> format, this file is saved as plain text file without HTML tags.

Correct: in that case the contents of the box are taken out and put away in
another box, "sorted" differently.   You could even have stored them in the
*same* box, which then would carry the wrong label :-)

> How can I save this file from dos command into text formatted (.txt)
> without HTML tags around ?

You can't.   You need a program that scans through the text that makes up
the HTML page and removes anything starting with "<" and ending with ">".
Of course, that's a very crude method and leaves much to be desired ...
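
For illustration, the crude "<" to ">" scan described above could look
something like the following Python sketch. The names strip_tags and
convert_file are made up for this example, and Python itself is an
assumption (nothing in this thread uses it). It does not decode entities
such as &amp;, does not tidy whitespace, and knows nothing about HTML
structure.

```python
def strip_tags(html: str) -> str:
    """Crudely drop everything from a '<' up to the next '>'."""
    out = []
    inside_tag = False
    for ch in html:
        if ch == "<":
            inside_tag = True          # start skipping at the tag opener
        elif ch == ">" and inside_tag:
            inside_tag = False         # tag closed, resume copying
        elif not inside_tag:
            out.append(ch)             # ordinary text (a stray '>' is kept)
    return "".join(out)


def convert_file(src: str, dst: str) -> None:
    """Read an HTML file and write the tag-stripped text to dst."""
    with open(src, encoding="utf-8", errors="replace") as f:
        text = strip_tags(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)
```

Calling convert_file("test.html", "test.txt") would then write the
stripped text. Script and style blocks keep their contents, and a comment
or script that contains ">" is cut short at that character, which is why
a real converter is usually the better choice.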



R
5/23/2005 5:03:07 PM
On Mon, 23 May 2005, R.Wieser wrote:

> <jamilur_rahman@yahoo.com> wrote in news message
> 1116446544.056878.274340@g47g2000cwa.googlegroups.com...
> > I have a file named test.html, I used following command and copied as
> > test.txt file from dos command shell.
> >
> > copy c:\test.html c:\test.txt
> >
> > The output file, "test.txt" is generated as text formatted file but all
> > the HTML tags are existed around each line.
> 
> When you have a box with a label on it describing its contents, does
> changing that label change the contents too?  Because that is what you are
> saying here :-)
> 
> > If I use windows "file->save as" function and select to save as ".txt"
> > format, this file is saved as plain text file without HTML tags.
> 
> Correct: in that case the contents of the box are taken out and put away in
> another box, "sorted" differently.   You could even have stored them in the
> *same* box, which then would carry the wrong label :-)
> 
> > How can I save this file from dos command into text formatted (.txt)
> > without HTML tags around ?
> 
> You can't.   You need a program that scans through the text that makes up
> the HTML page and removes anything starting with "<" and ending with ">".
> Of course, that's a very crude method and leaves much to be desired ...

One thing that will convert an HTML file into plain text without the HTML
markup is the lynx text-only browser.  The following command will render
the "filename.html" file and print it as rendered text to "filename.txt"
with the list of URLs in the file suppressed.  (Omit the "-nolist" if you
*do* want a list of the URLs at the end of the dump.)

lynx -dump -nolist filename.html >filename.txt

Some versions of lynx which have been compiled with experimental colour
support may also need a pointer to a lynx.lss file which specifies the
colours for lynx to display.  (Colour information is not really needed for
the dump but lynx complains anyway and includes the error message at the
top of the dumped file if it doesn't know where to find the *.lss file.)
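
If the conversion has to be run from a script rather than typed by hand,
the command above could be wrapped roughly as follows. This is only a
sketch: the function names are invented for this example, it assumes a
lynx executable is on the PATH, and it assumes your build accepts the
"-lss=FILENAME" option for pointing at a lynx.lss file (check
"lynx -help" before relying on it).

```python
import subprocess
from typing import List, Optional


def lynx_dump_command(html_path: str,
                      lss_path: Optional[str] = None,
                      list_urls: bool = False) -> List[str]:
    """Build the argument list for 'lynx -dump' (hypothetical helper)."""
    cmd = ["lynx", "-dump"]
    if not list_urls:
        cmd.append("-nolist")            # suppress the URL list at the end
    if lss_path is not None:
        cmd.append("-lss=" + lss_path)   # assumed option, see note above
    cmd.append(html_path)
    return cmd


def html_to_text(html_path: str, txt_path: str,
                 lss_path: Optional[str] = None) -> None:
    """Run lynx and send its rendered output to txt_path."""
    with open(txt_path, "w") as out:
        subprocess.run(lynx_dump_command(html_path, lss_path),
                       stdout=out, check=True)
```

html_to_text("filename.html", "filename.txt") is then the equivalent of
the redirect shown above.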

"~/www/MAN/lynx.man.html"
    http://www.cs.huji.ac.il/~bioskill/MAN/lynx.html

"DOS: THE ALTERNATIVE OPERATING SYSTEM"
(includes info on DOS version of lynx)
    http://www.mwpms.uklinux.net/page5.htm

A sample lynx.lss file:
http://gd.tuwien.ac.at/infosys/browsers/lynx/lynx-2.8.1/lynx2-8-1/samples/lynx.lss

"Lynx binaries for Windows"
    http://www.fredlwm.hpg.ig.com.br/cygwin/lynx/

-- 
">> consider moving away from Front Page...."
">To what? Any suggestions?"
"Naked bungee-jumping. It's less humiliating <g>"
             -- Matt Probert in alt.www.webmaster, March 20, 2005


Norman
5/23/2005 6:14:20 PM
Using at least one appendage, the entity known in this space-time
continuum as "Norman L. DeForest" <af380@chebucto.ns.ca> revealed in
news:Pine.GSO.3.95.iB1.0.1050523145836.2205A-100000@halifax.chebucto.ns.ca:

>> > How can I save this file from dos command into text formatted (.txt)
>> > without HTML tags around ?

Try DOS freebie HTMStrip 9.11 - The version is probably higher now, if 
it's still supported. The author is Bruce Guthrie, and he is/was at 

http://users.erols.com/waynesof/

or

http://www.geocities.com/SiliconValley/Lakes/2414/

A search for HTMST911.ZIP, or rather HTMST???.ZIP, might also find it.
It should be on many major archives such as Simtel or SAC.

-- 
Will Cornish of Cardigan, UK - No nastier than you; No filthier than usual

To EMail Remove Anti-Spam Spaces:           filthy-mcnasty @ btconnect.com
filthy
5/23/2005 8:30:52 PM