Strip path to get filename



The output of a command is this

/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1

how can I strip off the path, and so just get the 'libgcc_s.so.1' ?

I guess I need to strip from the first character, to the last '/', but am not
sure how to do this.

Dave
-- 
I respectfully request that this message is not archived by companies as
unscrupulous as 'Experts Exchange' . In case you are unaware,
'Experts Exchange'  take questions posted on the web and try to find
idiots stupid enough to pay for the answers, which were posted freely
by others. They are leeches.
Reply foo25 (218) 12/3/2009 2:16:28 AM

On Thu, 03 Dec 2009 02:16:28 +0000, Dave wrote:
> The output of a command is this
>
> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>
> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?

basename /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1

Reply Bit 12/3/2009 2:44:20 AM
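A minimal sketch of the basename answer when the path comes from another command's output, as in Dave's question. Here `printf` stands in for that command (an assumption, since the original command is not named). Quoting `"$path"` keeps paths with spaces or glob characters intact.

```shell
#!/bin/sh
# printf stands in for the unnamed command that printed the path.
path=$(printf '%s\n' /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1)

# Quote "$path" so names containing spaces or glob characters survive.
name=$(basename "$path")
printf '%s\n' "$name"    # prints libgcc_s.so.1
```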


Dave wrote:
> The output of a command is this
> 
> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
> 
> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
> 
> I guess I need to strip from the first character, to the last '/', but 
> am not sure how to do this.
> 
> Dave

$ var="/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1"
$ echo "$var"
/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
$ echo "${var##*/}"
libgcc_s.so.1

    Ed.
Reply Ed 12/3/2009 2:55:31 AM
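Ed's `##*/` expansion has a mirror image, `%/*`, that keeps the directory part instead. Here is a small sketch with both, using only the shell and no external commands. The `dir` line is an addition and was not in Ed's post. Like `##*/`, it is naive: a path with no slash, or one that ends in a slash, gives results that differ from dirname(1).

```shell
#!/bin/sh
var=/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1

# Remove the longest prefix matching "*/": everything up to the last slash.
file=${var##*/}    # libgcc_s.so.1

# Remove the shortest suffix matching "/*": the last slash and what follows.
dir=${var%/*}      # /opt/kirkby/gcc-4.4.2/lib

printf '%s\n' "$file" "$dir"
```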

On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
> The output of a command is this
>
> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>
> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>
> I guess I need to strip from the first character, to the last '/', but am not
> sure how to do this.
>

Apart from the command 'basename' which is tailor-made for this task,
you could do this too:

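out='/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1'
savIFS=$IFS IFS='/'       # split fields on slashes
set -f                    # no globbing while $out is unquoted
set x $out; shift         # the path components become $1, $2, ...
for var                   # the loop leaves the last component in $var
do
   :
done
IFS=$savIFS
set +f                    # restore globbing
printf '%s\n' "$var"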

## or you could simply do:
printf '%s\n' "$out" | sed -e 's|.*/||'

--Rakesh
Reply Rakesh 12/3/2009 8:02:28 AM
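Rakesh's sed one-liner prints an empty line for a path with a trailing slash. Kaz raises the same case later in the thread. The sketch below is a variant that assumes a single path with no embedded newlines. It strips trailing slashes first and maps a path made only of slashes to `/`. The function name `sed_basename` is hypothetical.

```shell
#!/bin/sh
# sed_basename PATH: print the last component of PATH using sed only.
sed_basename() {
    # 1. A path made only of slashes becomes "/".
    # 2. "t" branches to the end of the script if step 1 matched.
    # 3. Otherwise drop any trailing slashes,
    # 4. then everything up to the last remaining slash.
    printf '%s\n' "$1" |
        sed -e 's|^//*$|/|' -e t -e 's|//*$||' -e 's|.*/||'
}

sed_basename /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1   # libgcc_s.so.1
sed_basename trailing/slash/path/                      # path
sed_basename /                                         # /
```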

On 2009-12-03, Rakesh Sharma wrote:
> On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
>> The output of a command is this
>>
>> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>
>> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>
>> I guess I need to strip from the first character, to the last '/', but am not
>> sure how to do this.
>>
>
> Apart from the command 'basename' which is tailor-made for this task,

   As is POSIX parameter expansion.

> you could do this too:
>
> out='/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1'
> savIFS=$IFS IFS='/'
> set -f
> set x $out; shift
> for var
> do
>    :
> done
> IFS=$savIFS
> printf '%s\n' "$var"
>
> ## or you could simply do:
> printf '%s\n' "$out" | sed -e 's|.*/||'


-- 
   Chris F.A. Johnson, author           <http://shell.cfajohnson.com/>
   ===================================================================
   Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
   Pro Bash Programming: Scripting the GNU/Linux Shell (2009, Apress)
   ===== My code in this post, if any, assumes the POSIX locale  =====
   ===== and is released under the GNU General Public Licence    =====
Reply Chris 12/3/2009 8:30:23 AM

Rakesh Sharma wrote:
> On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
>> The output of a command is this
>>
>> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>
>> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>
>> I guess I need to strip from the first character, to the last '/', but am not
>> sure how to do this.
>>
> 
> Apart from the command 'basename' which is tailor-made for this task,

Thank you. I'll use that. It is defined by POSIX, works on HP-UX 11.11, and a 
Google search shows it exists on AIX 3.1 (which is pretty damn old), so it would appear 
to be quite portable. In the relatively unlikely event that IRIX or Tru64 is 
supported on the system I'm looking at, it may be necessary to revisit this, but 
for now at least, that seems sufficiently portable.

Thank you. That is a new unix command I have learned.

Dave

Reply Dave 12/3/2009 3:46:58 PM

On 2009-12-03, Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
> On 2009-12-03, Rakesh Sharma wrote:
>> On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
>>> The output of a command is this
>>>
>>> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>>
>>> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>>
>>> I guess I need to strip from the first character, to the last '/', but am not
>>> sure how to do this.
>>>
>>
>> Apart from the command 'basename' which is tailor-made for this task,
>
>    As is POSIX parameter expansion.

Not sure why you need to invoke POSIX here; basename is also a POSIX feature,
and not a recent addition either.

If parameter expansion is ``tailor-made'' for this problem, why do we
run into this problem when we apply parameter expansion
in the straightforward way, and how do we fix it?

  path=/
  base=${path##*/}    # yields empty string, should be "/"

  path=trailing/slash/path/
  base=${path##*/}    # yields empty string, should be "path"

Correctly computing a basename with parameter expansion
seems to require something like this:

  case "$path" in
  / ) 
    base=/
    ;;
  */ )
    base=${path%/}
    base=${base##*/}
    ;;
  * )
    base=${path##*/}
    ;;
  esac

Maybe there is a reason why we have a function for this?

Exercise for readers: rewrite this with parameter expansions:

 "$(basename "$(dirname "$(dirname "$(dirname "$FOO")")")")"

Solution given in spoiler below.



Solution to exercise:

  #!/bin/sh
  echo "just say no"
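The exercise does have a short answer when `$FOO` is well behaved. That means it has at least four slash-separated components and no trailing or doubled slashes, an assumption the expansions below cannot check. Under those conditions, three `%/*` steps and one `##*/` reproduce the nested command substitutions. Outside them, the results drift from dirname and basename, which is Kaz's point.

```shell
#!/bin/sh
FOO=/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1

p=${FOO%/*}    # like dirname once:   /opt/kirkby/gcc-4.4.2/lib
p=${p%/*}      # like dirname twice:  /opt/kirkby/gcc-4.4.2
p=${p%/*}      # like dirname thrice: /opt/kirkby
printf '%s\n' "${p##*/}"    # like basename: kirkby
```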

Reply Kaz 12/3/2009 11:04:25 PM
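The sketch below is a basename replacement built only from parameter expansion. It extends Kaz's case statement to handle any number of trailing slashes and roughly follows the steps POSIX lists for basename. It is not a drop-in replacement, though. It ignores the optional suffix operand, prints an empty line for an empty argument (POSIX leaves that case unspecified), and treats `//` like `/`. The names `base_name` and `_bn` are hypothetical. `_bn` is global because plain sh has no `local`.

```shell
#!/bin/sh
# base_name PATH: print the last component of PATH without running basename.
base_name() {
    _bn=$1
    # Empty argument: print an empty line.
    if [ -z "$_bn" ]; then
        printf '\n'
        return
    fi
    # Strip trailing slashes one at a time ("a/b//" -> "a/b").
    while :; do
        case $_bn in
            */) _bn=${_bn%/} ;;
            *) break ;;
        esac
    done
    # Nothing left means the path was all slashes, e.g. "/" or "///".
    if [ -z "$_bn" ]; then
        printf '/\n'
        return
    fi
    # Remove everything up to and including the last slash.
    printf '%s\n' "${_bn##*/}"
}

base_name /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1   # libgcc_s.so.1
base_name trailing/slash/path/                      # path
base_name /                                         # /
```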

On 2009-12-03, Kaz Kylheku wrote:
> On 2009-12-03, Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
>> On 2009-12-03, Rakesh Sharma wrote:
>>> On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
>>>> The output of a command is this
>>>>
>>>> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>>>
>>>> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>>>
>>>> I guess I need to strip from the first character, to the last '/', but am not
>>>> sure how to do this.
>>>>
>>>
>>> Apart from the command 'basename' which is tailor-made for this task,
>>
>>    As is POSIX parameter expansion.
>
> Not sure why you need to invoke POSIX here; basename is also a POSIX feature,
> and not a recent addition either.
>
> If parameter expansion is ``tailor-made'' for this problem, why do we
> run into this problem when we apply parameter expansion
> in the straightforward way, and how do we fix it?
>
>   path=/
>   base=${path##*/}    # yields empty string, should be "/"

   No, it should be an empty string as there is no name after the slash.

>   path=trailing/slash/path/
>   base=${path##*/}    # yields empty string, should be "path"

   Ditto.

> Correctly computing a basename with parameter expansion
> seems to require something like this:
>
>   case "$path" in
>   / ) 
>     base=/
>     ;;
>   */ )
>     base=${path%/}
>     base=${base##*/}
>     ;;
>   * )
>     base=${path##*/}
>     ;;
>   esac
>
> Maybe there is a reason why we have a function for this?

   Basename is not a function; it is an external command.

   There is a POSIX-compliant basename function at
   <http://cfaj/cfajohnson.com/shell/scripts/basename-sh>.

Reply Chris 12/4/2009 1:22:00 AM

On 2009-12-04, Chris F.A. Johnson wrote:
> On 2009-12-03, Kaz Kylheku wrote:
>> On 2009-12-03, Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
>>> On 2009-12-03, Rakesh Sharma wrote:
>>>> On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
>>>>> The output of a command is this
>>>>>
>>>>> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>>>>
>>>>> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>>>>
>>>>> I guess I need to strip from the first character, to the last '/', but am not
>>>>> sure how to do this.
>>>>>
>>>>
>>>> Apart from the command 'basename' which is tailor-made for this task,
>>>
>>>    As is POSIX parameter expansion.
>>
>> Not sure why you need to invoke POSIX here; basename is also a POSIX feature,
>> and not a recent addition either.
>>
>> If parameter expansion is ``tailor-made'' for this problem, why do we
>> run into this problem when we apply parameter expansion
>> in the straightforward way, and how do we fix it?
>>
>>   path=/
>>   base=${path##*/}    # yields empty string, should be "/"
>
>    No, it should be an empty string as there is no name after the slash.
>
>>   path=trailing/slash/path/
>>   base=${path##*/}    # yields empty string, should be "path"
>
>    Ditto.
>
>> Correctly computing a basename with parameter expansion
>> seems to require something like this:
>>
>>   case "$path" in
>>   / ) 
>>     base=/
>>     ;;
>>   */ )
>>     base=${path%/}
>>     base=${base##*/}
>>     ;;
>>   * )
>>     base=${path##*/}
>>     ;;
>>   esac
>>
>> Maybe there is a reason why we have a function for this?
>
>    Basename is not a function; it is an external command.
>
>    There is a POSIX-compliant basename function at
>    <http://cfaj/cfajohnson.com/shell/scripts/basename-sh>.

   Sorry, that's my local copy. It should be:

   <http://cfajohnson.com/shell/scripts/basename-sh>.

Reply Chris 12/4/2009 1:23:34 AM

Chris F.A. Johnson wrote:

>    Basename is not a function; it is an external command.
> 
>    There is a POSIX-compliant basename function at
>    <http://cfaj/cfajohnson.com/shell/scripts/basename-sh>.
> 

Are there any systems which do not have basename? I tried a few, including HP-UX 
11.11, and all had it.



Reply Dave 12/4/2009 1:24:29 AM

On 2009-12-04, Dave wrote:
> Chris F.A. Johnson wrote:
>
>>    Basename is not a function; it is an external command.
>> 
>>    There is a POSIX-compliant basename function at
>>    <http://cfaj/cfajohnson.com/shell/scripts/basename-sh>.
>> 
>
> Are there any systems which do not have basename? I tried a few, including HP-UX 
> 11.11, and all had it.

   All POSIX systems should have it.

Reply Chris 12/4/2009 1:25:34 AM

On 2009-12-03, Kaz Kylheku wrote:
> On 2009-12-03, Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
>> On 2009-12-03, Rakesh Sharma wrote:
>>> On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
>>>> The output of a command is this
>>>>
>>>> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>>>
>>>> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>>>
>>>> I guess I need to strip from the first character, to the last '/', but am not
>>>> sure how to do this.
>>>>
>>>
>>> Apart from the command 'basename' which is tailor-made for this task,
>>
>>    As is POSIX parameter expansion.
>
> Not sure why you need to invoke POSIX here; basename is also a POSIX feature,
> and not a recent addition either.

   An external command such as basename is many, many times slower
   than the shell's parameter expansion.

Reply Chris 12/4/2009 1:27:23 AM

On 2009-12-04, Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
>> If parameter expansion is ``tailor-made'' for this problem, why do we
>> run into this problem when we apply parameter expansion
>> in the straightforward way, and how do we fix it?
>>
>>   path=/
>>   base=${path##*/}    # yields empty string, should be "/"
>
>    No, it should be an empty string as there is no name after the slash.

One tiny problem with that reasoning is that if you
parse "$path" into "$dirname" and "$basename" with this
method, and then:

  chdir "$dirname"

and then try to access the empty string "$basename", there is no
directory entry by that name in that directory! Oops!

Paths are not (exactly) strings. Blind string manipulation is not
path manipulation.

Paths are syntax that denotes a structured name object. The argument can be
made that they need just a little care in parsing and generation.
Reply Kaz 12/4/2009 3:55:54 AM
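Kaz's objection can be made concrete for `/`. The naive expansions produce two empty strings, while dirname(1) and basename(1) both return `/`, which still names an existing directory. The comparison sketch below assumes POSIX dirname and basename are on the PATH.

```shell
#!/bin/sh
path=/

# Naive split with parameter expansion.
naive_dir=${path%/*}     # empty
naive_base=${path##*/}   # empty

# POSIX utilities: an all-slash path reduces to "/".
util_dir=$(dirname "$path")     # /
util_base=$(basename "$path")   # /

printf 'expansion: dir=[%s] base=[%s]\n' "$naive_dir" "$naive_base"
printf 'utilities: dir=[%s] base=[%s]\n' "$util_dir" "$util_base"
```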

Dave wrote:
> Rakesh Sharma wrote:
>> On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
>>> The output of a command is this
>>>
>>> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>>
>>> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>>
>>
>> Apart from the command 'basename' which is tailor-made for this task,
> 
> Thank you. I'll use that. It is defined by POSIX, works on HP-UX 11.11 
> and a Google search shows it exists on AIX 3.1 (which is pretty damn old), so 
> it would appear to be quite portable. In the relatively unlikely event 
> that IRIX or Tru64 is supported on the system I'm looking at, it may be 
> necessary to revisit this, but for now at least, that seems sufficiently 
> portable.
> 
> Thank you. That is a new unix command I have learned.
> 

basename(1) existed over 30 years ago in UNIX V7.
Reply Jon 12/4/2009 6:21:32 AM

Dave wrote:

> Are there any systems which do not have basename? I tried a few,
> including HP-UX 11.11, and all had it.

As Jon mentioned, 7th ed unix knew basename(1), and also had
the suffix feature implemented already.
V7 is virtually _the_ ancestor concerning traditional utilities.
Thus you really should find it in every toolkit (and if not, you
will almost certainly have a whole bunch of different problems).
Reply Sven 12/4/2009 3:13:06 PM
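The suffix feature Sven mentions is basename's optional second operand. If the last component ends with that string and is not identical to it, basename removes the string as well. The sketch below applies it to Dave's path and shows a parameter-expansion equivalent for comparison. The `.so.1` suffix is only an example.

```shell
#!/bin/sh
p=/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1

# basename with a suffix operand.
basename "$p" .so.1          # libgcc_s

# The same with two expansions: strip the directory, then the suffix.
f=${p##*/}
printf '%s\n' "${f%.so.1}"   # libgcc_s
```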

On 2009-12-04, Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
> On 2009-12-03, Kaz Kylheku wrote:
>> On 2009-12-03, Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
>>> On 2009-12-03, Rakesh Sharma wrote:
>>>> On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
>>>>> The output of a command is this
>>>>>
>>>>> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>>>>
>>>>> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>>>>
>>>>> I guess I need to strip from the first character, to the last '/', but am not
>>>>> sure how to do this.
>>>>>
>>>>
>>>> Apart from the command 'basename' which is tailor-made for this task,
>>>
>>>    As is POSIX parameter expansion.
>>
>> Not sure why you need to invoke POSIX here; basename is also a POSIX feature,
>> and not a recent addition either.
>
>    An external command such as basename is many, many times slower
>    than the shell's parameter expansion.

That's an implementation problem, isn't it?

Some systems have command interpreters which provide fast built-in commands for
things which are external on some other systems.

Yes, you /can/ be quite sure that something like parameter expansion is done in
the same process. Plus anything which manipulates the environment and /cannot/
be a child program, like ``chdir''.

To me, it's a premature optimization. If basename isn't being used in some
hotspot in the code, it's not worth worrying about.

If a particular basename call ends up in a hotspot, then you can just fix
/that/ instance of it, not every single basename everywhere.

Efficiency is not a high priority in shell programming. If the shell
programming community cared about efficiency, there would be a wide-spread use
of shell script compilation.

A shell script compiler could turn your basename calls as well as
parameter expansion into efficient C.

Can you form a viable start-up company around a shell script compiler?
I suspect not, because nobody cares about such a thing.

Comeau Computing offers a shell script compiler. Could the business
stay afloat just on that product alone, I wonder.  It would be interesting to
see what is the revenue from that.
Reply Kaz 12/4/2009 9:23:11 PM

Kaz Kylheku wrote:

>>    An external command such as basename is many, many times slower
>>    than the shell's parameter expansion.
> 
> That's an implementation problem, isn't it?
> 
> Some systems have command interpreters which provide fast built-in
> commands for things which are external on some other systems.
> 
> Yes, you /can/ be quite sure that something like parameter expansion is
> done in the same process. Plus anything which manipulates the environment
> and /cannot/ be a child program, like ``chdir''.

(I'm sure you mean "cd" here, not "chdir".)

POSIX mandates that conforming systems provide "cd" and other "built-in" 
utilities as external commands (or, more exactly, that they provide a way to 
exec() them). Some real-world OSes (not Linux) do indeed have a /bin/cd 
command. The standard even gives a more or less sensible use case for it:


find . -type d -exec cd {} \; -exec foo {} \;
    (which invokes "foo" on accessible directories)

Reply pk 12/4/2009 9:35:56 PM
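On a system with no `cd` executable, the `-exec cd {} \;` test in pk's example cannot run. A portable sketch of the same idea wraps `cd` in a throwaway `sh -c`, whose exit status tells find whether the directory could be entered. The function name `list_enterable_dirs` is hypothetical, and `-print` stands in for the `-exec foo {} \;` action. `mktemp -d` is widely available but absent from older POSIX editions. It is used here only to build a demo tree.

```shell
#!/bin/sh
# list_enterable_dirs DIR: print directories under DIR that cd can enter.
list_enterable_dirs() {
    # Inside the sh -c script, "$1" is the path find substitutes for {};
    # the extra "sh" argument becomes $0 of that inner shell.
    find "$1" -type d -exec sh -c 'cd "$1" 2>/dev/null' sh {} \; -print
}

demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
list_enterable_dirs "$demo"
rm -rf "$demo"
```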

On 2009-12-04, Kaz Kylheku wrote:
> On 2009-12-04, Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
>> On 2009-12-03, Kaz Kylheku wrote:
>>> On 2009-12-03, Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
>>>> On 2009-12-03, Rakesh Sharma wrote:
>>>>> On Dec 3, 7:16 am, Dave <f...@coo.com> wrote:
>>>>>> The output of a command is this
>>>>>>
>>>>>> /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>>>>>
>>>>>> how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>>>>>
>>>>>> I guess I need to strip from the first character, to the last '/', but am not
>>>>>> sure how to do this.
>>>>>>
>>>>>
>>>>> Apart from the command 'basename' which is tailor-made for this task,
>>>>
>>>>    As is POSIX parameter expansion.
>>>
>>> Not sure why you need to invoke POSIX here; basename is also a POSIX feature,
>>> and not a recent addition either.
>>
>>    An external command such as basename is many, many times slower
>>    than the shell's parameter expansion.
>
> That's an implementation problem, isn't it?

   Yes, and I avoid the problem by using something that isn't affected
   by it.

> Some systems have command interpreters which provide fast built-in commands for
> things which are external on some other systems.
>
> Yes, you /can/ be quite sure that something like parameter expansion is done in
> the same process. Plus anything which manipulates the environment and /cannot/
> be a child program, like ``chdir''.
>
> To me, it's a premature optimization. If basename isn't being used in some
> hotspot in the code, it's not worth worrying about.

   I use parameter expansion, and therefore I don't have to worry
   about whether it is a hotspot or not.

> If a particular basename call ends up in a hotspot, then you can just fix
> /that/ instance of it, not every single basename everywhere.

   I avoid the problem altogether by not using basename.

> Efficiency is not a high priority in shell programming. If the shell
> programming community cared about efficiency, there would be a wide-spread use
> of shell script compilation.

   On the contrary, I find it very important. An efficiently written
   script doesn't need to be compiled.


Reply Chris 12/5/2009 11:15:21 PM

In article <4b171f83@212.67.96.135>, Dave  <foo@coo.com> wrote:
>The output of a command is this
>
>/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>
>how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>
>I guess I need to strip from the first character, to the last '/', but am not 
>sure how to do this.

Mandatory reading:

   Mastering Regular Expressions, 2nd ed, by Jeffrey Friedl.

Although the adjacent post has a good way to solve your
specific problem, you need to learn about regular expressions.

And man have they gotten FAR more powerful since the days
of Bell Labs.

There is one and ONLY one book, the "bible" of regular expressions,
so to speak, and it is that "Mastering" book.

He not only shows you the zillion features they have these
days, but he has lots of examples.

But what shows that he is THE expert on regular-expressions
is that he shows the internal MACHINERY that drives the
things.  At least lots of it.

No, do not buy the "cookbook" until AFTER you have devoured
THIS book first.

Trust me, friend -- regular expressions are what make unix/linux
so useful.  And no, they're not all that trivially simple,
but man are they POWERFUL.  One line can transform stuff
that would take a complicated MULTI-line program NOT using them.


Cheers!

David

(No, I am no expert in them, but I do have the book, and
with that, I can do what I need to do.)



Reply dkcombs 12/20/2009 2:59:27 AM

On 2009-12-20, David Combs wrote:
> In article <4b171f83@212.67.96.135>, Dave  <foo@coo.com> wrote:
>>The output of a command is this
>>
>>/opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1
>>
>>how can I strip off the path, and so just get the 'libgcc_s.so.1' ?
>>
>>I guess I need to strip from the first character, to the last '/', but am not 
>>sure how to do this.
>
> Mandatory reading:
>
>    Mastering Regular Expressions, 2nd ed, by Jeffrey Friedl.
>
> Although the adjacent post has a good way to solve your
> specific problem, you need to learn about regular expressions.
>
> And man have they gotten FAR more powerful since the days
> of Bell Labs.
>
> There is one and ONLY one book, the "bible" of regular expressions,
> so to speak, and it is that "Mastering" book.
>
> He not only shows you the zillion features they have these
> days, but he has lots of examples.
>
> But what shows that he is THE expert on regular-expressions
> is that he shows the internal MACHINERY that drives the
> things.  At least lots of it.
>
> No, do not buy the "cookbook" until AFTER you have devoured
> THIS book first.
>
> Trust me, friend -- regular expressions are what make unix/linux
> so useful.  And no, they're not all that trivially simple,
> but man are they POWERFUL.  One line can transform stuff
> that would take a complicated MULTI-line program NOT using them.

   I very rarely use anything more than very simple regular
   expressions. Complex REs are more trouble than they're worth,
   especially when they need to be modified. 


Reply Chris 12/20/2009 4:21:22 AM

David Combs <dkcombs@panix.com> wrote:
<snip>
> Trust me, friend -- regular expressions are what make unix/linux
> so useful.  And no, they're not all that trivially simple,
> but man are they POWERFUL.  One line can transform stuff
> that would take a complicated MULTI-line program NOT using them.

Regular expressions are very useful, but also very limited. For one thing,
they can only represent regular languages, which means among other things
that you can't express nested structures. The greediness and lookbehind
operators do help, but if you look in the appendix of Mastering Regular
Expressions you'll find a regex which can parse any valid e-mail address;
it's two pages long! (At least, that's how I remember, but I haven't opened
that book in over 5 years.)

Perl 6 will come with something called Parsing Expression Grammars, which
are much more powerful. (Though Perl 6 didn't invent them.) I think this
will probably be the future, but it will obviously take many years for the
rest of the world to catch up. Lua currently has one of the better
implementations, in terms of language integration.

For C I use Ragel for regular expressions. In Ragel you can handle nested
structures--and many other issues--easily because it lets you jump to
different [state] machines explicitly, and allows the use of a state stack.
Using Ragel I've discovered ambiguities in several RFC ABNF specifications
which are silently papered over by most common regular expression engines.
The big problem with regex's is that people just slop them together, and
never notice the bugs. They have been, and will continue to be, one of the
major sources of bugs and security issues.

Sometimes they just get used too much. For instance, the following Perl
basename implementation reads much better to me than any regex would:

	#!/usr/bin/env perl
	print STDOUT (split "/", shift)[-1], "\n"
Reply William 12/20/2009 8:07:03 AM

Chris F.A. Johnson wrote:
> On 2009-12-20, David Combs wrote:
>> [...]
> 
>    I very rarely use anything more than very simple regular
>    expressions. Complex REs are more trouble than they're worth,
>    especially when they need to be modified. 

In some languages (awk, for example, where you can compose them in
strings[*]) you can define them in a way that resembles a quite
readable BNF notation. Being able to compose them that way, and to
reuse the parts in many places of the regexp definitions, removes
much of their complexity and crypticness and makes them a pleasure
to use.

Janis

[*] The usual caveats apply.
Reply Janis 12/20/2009 9:12:03 AM
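Here is a small sketch of composing a regular expression from named string pieces in awk, in the BNF-like style Janis describes. The grammar (an absolute path made of non-empty components) and the function name `classify_paths` are illustrative assumptions. awk treats a string on the right of `~` as an extended regular expression, so the pieces can be concatenated like any other strings.

```shell
#!/bin/sh
# classify_paths: read paths on stdin, label each as absolute or other.
classify_paths() {
    awk '
    BEGIN {
        seg = "[^/]+"               # one path component
        abs = "^(/" seg ")+$"       # one or more "/component" pieces
    }
    {
        if ($0 ~ abs) print "absolute: " $0
        else          print "other:    " $0
    }'
}

printf '%s\n' /opt/kirkby/gcc-4.4.2/lib/libgcc_s.so.1 relative/path /bad//path |
    classify_paths
```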

William Ahern wrote:
> David Combs <dkcombs@panix.com> wrote:
> <snip>
>> [...]
> 
> Regular expressions are very useful, but also very limited. For one thing,
> they can only represent regular languages, which means among other things
> that you can't express nested structures.

And back-references, to name another prominent example, also fall outside
the class of regular languages, yet some programming languages and
libraries support them nonetheless.

> [...]
> 
> For C I use Ragel for regular expressions. In Ragel you can handle nested
> structures--and many other issues--easily because it lets you jump to
> different [state] machines explicitly, and allows the use of a state stack.

Thanks for that useful hint.

Janis

> [...]
Reply Janis 12/20/2009 9:16:05 AM
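Janis's point can be illustrated in one line, assuming a POSIX grep. A basic regular expression with the back-reference `\1` matches lines consisting of some string written twice, a language that no true regular expression can describe.

```shell
#!/bin/sh
# "^\(.*\)\1$": capture any text, then require the same text again.
printf '%s\n' abab abcabc abcd | grep '^\(.*\)\1$'
# prints abab and abcabc; abcd is rejected
```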
