sed for removing non-ASCII characters

I know the tr solution for the problem, but I have to use sed, hence the
question here. I have to remove all bytes in the range 128-255 from my
text file, for which I have the following command:

sed 's/[\200-\377]//' text_file

This removes other characters that should not be removed. I *think* the
reason is that sed does not read \200 in the range as an octal value but
as separate characters: '\' '2' '0'. Any suggestions on how to achieve
this using sed? I have searched Google for solutions, but they either
solve a different problem altogether or suggest using tr.
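The guess is consistent with POSIX: inside a bracket expression a backslash is an ordinary character, so \200 is not an octal escape there. One portable workaround, sketched below under stated assumptions, is to let the shell's printf (which does expand \NNN octal escapes in its format string) build the bracket expression out of the raw bytes 0x80 and 0xFF, so sed never sees a backslash. It assumes sed runs in the C locale, where a range between two bytes compares plain byte values. The helper name strip_high_bytes is hypothetical.

```shell
# printf expands \200 and \377 into the raw bytes 0x80 and 0xFF,
# giving a bracket expression that contains no backslashes at all.
high_bytes=$(printf '[\200-\377]')

# Hypothetical helper: copy stdin to stdout with every byte in the
# range 128-255 deleted. LC_ALL=C makes sed treat each byte as one
# character, so the range is compared by byte value.
strip_high_bytes() {
    LC_ALL=C sed "s/$high_bytes//g"
}

# Demo: "café" in UTF-8 ends with the two bytes 0xC3 0xA9.
printf 'caf\303\251 ok\n' | strip_high_bytes
```

To clean a file, something like `strip_high_bytes < text_file > cleaned_file` would do (cleaned_file is just an example name). Digits, letters and backslashes are left alone, unlike with the original command.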
Posted by agarwal.prateek (27), 3/6/2009 9:31:19 AM

On Friday 6 March 2009 10:31, quarkLore wrote:

> I know the tr solution for the problem, but I have to use sed, hence the
> question here.

Uhm...this is comp.lang.awk.

> I have to remove all bytes in the range 128-255 from my text file, for
> which I have the following command:
> 
> sed 's/[\200-\377]//' text_file
> 
> This removes other characters that should not be removed. I *think* the
> reason is that sed does not read \200 in the range as an octal value but
> as separate characters: '\' '2' '0'. Any suggestions on how to achieve
> this using sed? I have searched Google for solutions, but they either
> solve a different problem altogether or suggest using tr.

If you have a sed that understands \xHH (e.g. GNU sed), you can do:

sed 's/[\x80-\xff]//' text_file

Also, prefixing the value with \o (octal) or \d (decimal) may work (tested with GNU sed):

sed 's/[\o200-\o377]//' text_file

sed 's/[\d128-\d255]//' text_file
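The three GNU variants above can be checked side by side. This sketch assumes GNU sed, since \xHH, \oNNN and \dNNN are GNU extensions rather than POSIX, and it sets LC_ALL=C explicitly because in a UTF-8 locale GNU sed may reject a range over bytes 128-255 or fail to match single bytes. The /g flag is added so every match on a line is removed; the helper sample and its text are made up for illustration.

```shell
# Illustrative input: "naïve text" in UTF-8 ("ï" is the bytes 0xC3 0xAF).
sample() { printf 'na\303\257ve text\n'; }

# GNU sed only. LC_ALL=C makes each byte one character, so every
# range below covers exactly the bytes 128-255.
sample | LC_ALL=C sed 's/[\x80-\xff]//g'     # hexadecimal escapes
sample | LC_ALL=C sed 's/[\o200-\o377]//g'   # octal escapes
sample | LC_ALL=C sed 's/[\d128-\d255]//g'   # decimal escapes
```

Each command prints `nave text`; which spelling to use is a matter of taste.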

Posted by pk, 3/6/2009 9:41:27 AM


On Friday 6 March 2009 10:41, pk wrote:

> sed 's/[\x80-\xff]//' text_file

And in all variations, you need a /g at the end; otherwise only the first match on each line is removed:

sed 's/[\x80-\xff]//g' text_file
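What the /g flag changes can be seen on a line with two non-ASCII letters. A sketch assuming GNU sed (for \xHH) and the C locale; the helper demo and its text are illustrative. Without /g only the first high byte is deleted, which here leaves half of a UTF-8 character behind.

```shell
# Illustrative input: "café crème" in UTF-8 ("é" = 0xC3 0xA9, "è" = 0xC3 0xA8).
demo() { printf 'caf\303\251 cr\303\250me\n'; }

# Without /g: only the first match (the 0xC3 of "é") is removed.
# od -c makes the leftover high bytes visible.
demo | LC_ALL=C sed 's/[\x80-\xff]//' | od -c

# With /g: every byte in 128-255 is removed; prints "caf crme".
demo | LC_ALL=C sed 's/[\x80-\xff]//g'
```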
Posted by pk, 3/6/2009 9:48:00 AM

On Mar 6, 3:31 am, quarkLore <agarwal.prat...@gmail.com> wrote:
> I know the tr solution for the problem, but I have to use sed, hence the
> question here.

This is comp.lang.AWK. You can get good sed answers at
comp.unix.shell.

    Ed.
Posted by Ed, 3/6/2009 2:29:27 PM

On 6 Mar, 09:31, quarkLore <agarwal.prat...@gmail.com> wrote:
> I know the tr solution for the problem, but I have to use sed, hence the
> question here. I have to remove all bytes in the range 128-255 from my
> text file, for which I have the following command:
>
> sed 's/[\200-\377]//' text_file
>
> This removes other characters that should not be removed. I *think* the
> reason is that sed does not read \200 in the range as an octal value but
> as separate characters: '\' '2' '0'. Any suggestions on how to achieve
> this using sed?

s/[^ -~]//g

or, more on topic:

{ gsub(/[^ -~]/, "")}1
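These two expressions take the complementary approach: they keep only printable ASCII, space (0x20) through tilde (0x7E), and delete everything else. That is broader than removing bytes 128-255, since tabs, carriage returns and other control characters go too. A sketch comparing the sed and awk forms, assuming a C locale so that ` -~` is a plain byte range; the helper sample_line and its text are illustrative.

```shell
# Illustrative input: a TAB plus "é" (0xC3 0xA9) in UTF-8.
sample_line() { printf 'tab\there caf\303\251\n'; }

# Keep only bytes 0x20-0x7E; both commands print "tabhere caf".
# Note the TAB is removed as well as the two high bytes.
sample_line | LC_ALL=C sed 's/[^ -~]//g'                 # sed version
sample_line | LC_ALL=C awk '{ gsub(/[^ -~]/, "") } 1'     # awk version
```

If tabs must survive, they would have to be added to the kept set explicitly.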


> I have searched Google for solutions, but they either solve a
> different problem altogether or suggest using tr.

Why do you have to use sed?

-Ed
--
(You can't go wrong with psycho-rats.)(http://mi.eng.cam.ac.uk/~er258)

/d{def}def/f{/Times s selectfont}d/s{11}d/r{roll}d f 2/m{moveto}d -1
r 230 350 m 0 1 179{ 1 index show 88 rotate 4 mul 0 rmoveto}for/s 12
    d f pop 235 420 translate 0 0 moveto 1 2 scale show showpage

Posted by Edward, 3/11/2009 3:12:27 PM
