Assuming an array named @myfiles contained three elements like:
-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
I want to extract just the file which contain spaces to work with like:
file1.zip
file2 onespace.zip
file3 two spaces.zip
How can I extract the file names which have spaces?
I've been trying unsuccessfully with the glob function:
foreach my $f (@myfiles) {
print join "\n",glob("*");
}
-Thanks
James | 6/30/2010 10:58:50 PM
>>>>> "JE" == James Egan <jegan473@comcast.net> writes:
JE> Assuming an array named @myfiles contained three elements like:
JE> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
JE> -rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
JE> -rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
are you saying your array has those lines from ls?
JE> I want to extract just the file which contain spaces to work with like:
JE> file1.zip
JE> file2 onespace.zip
JE> file3 two spaces.zip
JE> How can I extract the file names which have spaces?
JE> I've been trying unsuccessfully with the glob function:
JE> foreach my $f (@myfiles) {
JE> print join "\n",glob("*");
JE> }
but you said you already have an array with those lines in it. why would
a glob work when globs work on directories, not arrays or lines?
you need a regex to grab the file part of the lines. but why did you
even have them in ls format? just read the dir directly with glob or
opendir/readdir and get them that way. in fact readdir will be better
as it doesn't have to care about spaces in the file names..
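A minimal sketch of the opendir/readdir route, assuming the files sit in a local directory that the script can read. The sub name `list_files` is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain files in $dir, one name per element.
# readdir hands back each name whole, so embedded spaces need no special care.
sub list_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory '$dir': $!";
    my @names = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    @names = sort @names;
    return @names;
}

print "$_\n" for list_files('.');
```

Nothing here parses a listing at all, which is the point of the suggestion: the names never pass through a format where spaces are ambiguous.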
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 6/30/2010 11:09:43 PM
> but you said you already have an array with those lines in it. why would
> a glob work when globs work on directories, not arrays or lines?
>
I'm not reading a directory with the ls command. I don't want to
complicate matters with where the long listing of file names comes
from. Suffice it to say it's a long listing, and the file names
have spaces, and I need to extract the file names.
-Thanks
James | 6/30/2010 11:18:18 PM
James Egan wrote:
> Assuming an array named @myfiles contained three elements like:
>
> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
> -rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
> -rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
>
>
> I want to extract just the file which contain spaces to work with like:
>
> file1.zip
> file2 onespace.zip
> file3 two spaces.zip
echo "-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip" |perl -wnle '
print substr($_, 50);
'
file3 two spaces.zip
--
Ruud
Dr | 6/30/2010 11:41:33 PM
On Wed, 30 Jun 2010 23:18:18 GMT, James Egan <jegan473@comcast.net>
wrote:
>I'm not reading a directory with the ls command. I don't want to
>complicate matters with where the long listing of file names comes
>from. Suffice it to say it's a long listing, and the file names
>have spaces, and I need to extract the file names.
CODE:
#!/usr/bin/perl
open DATA, 'data';
@files = <DATA>;
foreach (@files) {
print;
}
contents of 'data' FILE:
file1.zip
file2 onespace.zip
file3 two spaces.zip
OUTPUT:
file1.zip
file2 onespace.zip
file3 two spaces.zip
The spaces don't matter; the newline characters in the 'data' file are
delimiters. Can you explain what you want to do, and why spaces are a
problem?
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
John | 6/30/2010 11:49:55 PM
On Wed, 30 Jun 2010 23:49:55 +0000, John Kelly wrote:
> On Wed, 30 Jun 2010 23:18:18 GMT, James Egan <jegan473@comcast.net>
> wrote:
>
>>I'm not reading a directory with the ls command. I don't want to
>>complicate matters with where the long listing of file names comes from.
>> Suffice it to say it's a long listing, and the file names have spaces,
>>and I need to extract the file names.
>
> CODE:
>
> #!/usr/bin/perl
>
> open DATA, 'data';
> @files = <DATA>;
> foreach (@files) {
> print;
> }
>
>
> contents of 'data' FILE:
>
> file1.zip
> file2 onespace.zip
> file3 two spaces.zip
>
>
> OUTPUT:
>
> file1.zip
> file2 onespace.zip
> file3 two spaces.zip
>
>
> The spaces don't matter, the newline characters in the 'data' file are
> delimiters. Can you explain what you want to do, and why spaces are a
> problem?
I want to take these three array elements and extract the file names which
include spaces:
-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
James | 6/30/2010 11:53:13 PM
On Thu, 01 Jul 2010 01:41:33 +0200, Dr.Ruud wrote:
> James Egan wrote:
>
>> Assuming an array named @myfiles contained three elements like:
>>
>> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>> -rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
>> -rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
>>
>>
>> I want to extract just the file which contain spaces to work with like:
>>
>> file1.zip
>> file2 onespace.zip
>> file3 two spaces.zip
>
> echo "-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip" |perl -wnle '
>
> print substr($_, 50);
> '
> file3 two spaces.zip
I should have mentioned that the dates, sizes, names, of the files, might be
different, so they won't always start at position 50.
-Thanks
James | 6/30/2010 11:56:03 PM
>>>>> "JK" == John Kelly <jak@isp2dial.com> writes:
JK> CODE:
JK> #!/usr/bin/perl
JK> open DATA, 'data';
always check open for success or failure
JK> @files = <DATA>;
JK> foreach (@files) {
JK> print;
JK> }
and how is that different from this?
print <DATA> ;
also don't use DATA for a file handle, it is defaulted to the __DATA__
section of the main file. also use lexical file handles. and to add one
more, that can be all done with:
use File::Slurp ;
print read_file( 'data' ) ;
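For readers without File::Slurp, the first two points (check the open, use a lexical handle) might look like the sketch below. The file name 'data' comes from the quoted code; the sub name `read_lines` is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of a file and strip the newlines.
sub read_lines {
    my ($path) = @_;
    # Three-argument open with a lexical handle; die with the OS error on failure.
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Only read 'data' if it exists, so the sketch runs anywhere.
if (-e 'data') {
    print "$_\n" for read_lines('data');
}
```

A lexical handle also sidesteps the clash with the built-in DATA handle mentioned above.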
JK> contents of 'data' FILE:
JK> file1.zip
JK> file2 onespace.zip
JK> file3 two spaces.zip
and what did that show? he is not reading or getting a file with just
file names in them. they are full ls listings. of course you will get
pissed off at my question but try to answer it!
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 7/1/2010 12:03:54 AM
>>>>> "JE" == James Egan <jegan473@comcast.net> writes:
JE> I should have mentioned that the dates, sizes, names, of the
JE> files, might be different, so they won't always start at position
JE> 50.
so use a regex! it isn't hard to write one to parse out the file from ls
output. and you can always assume the earlier part of ls is fixed
width. the date is always fixed width. only the size and file name can
change in width. so skip to the size, then match a number and space and
the rest is the file name so match that and grab it. easy regex.
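One possible reading of that hint, as a sketch rather than a definitive parser. It assumes the nine-field layout of the sample lines, owner and group without whitespace, an English month abbreviation, and a time or a four-digit year just before the name. The names `$ls_re` and `name_from_ls` are made up for illustration.

```perl
use strict;
use warnings;

# Assumed layout: mode, links, owner, group, size, month, day, time-or-year, name.
my $ls_re = qr{
    ^ \S+ \s+ \d+ \s+ \S+ \s+ \S+ \s+   # mode, link count, owner, group
    \d+ \s+                             # size in bytes, any width
    [A-Z][a-z]{2} \s+ \d{1,2} \s+       # month and day
    (?: \d{1,2}:\d{2} | \d{4} ) \s      # time, or year for older files
    (.+) $                              # the rest is the file name
}x;

# Hypothetical helper: return the name, or undef if the line does not match.
sub name_from_ls {
    my ($line) = @_;
    return $line =~ $ls_re ? $1 : undef;
}

my @myfiles = (
    '-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip',
    '-rwxrwxrwx 1 777 22000       9941 Jan 28 18:10 file2 onespace.zip',
    '-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip',
);
print "$_\n" for map { name_from_ls($_) } @myfiles;
```

Because the size is matched as `\d+` between whitespace, its width does not matter, so the name no longer has to start at a fixed column.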
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 7/1/2010 12:05:43 AM
On Wed, 30 Jun 2010 23:53:13 GMT, James Egan <jegan473@comcast.net>
wrote:
>I want to take these three array elements and extract the file names which
>include spaces:
>
>-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
>-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
This works for me:
>#!/usr/bin/perl
>
>open DATA, 'data';
>@files = <DATA>;
>foreach (@files) {
>
> my $file;
> (undef, undef, undef, undef, undef, undef, undef, undef, $file) = split ' ', $_, 9;
> $file =~ / / and print "$file";
>
>}
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
John | 7/1/2010 12:22:21 AM
On Wed, 30 Jun 2010 20:03:54 -0400, "Uri Guttman" <uri@StemSystems.com>
wrote:
>and what did that show? he is not reading or getting a file with just
>file names in them. they are full ls listings. of course you will get
>pissed off at my question but try to answer it!
Well he said:
>I'm not reading a directory with the ls command.
So it sounded like his data only included the file names. But in a
later post he says it does include the directory info. My post prompted
him to explain further, without treating him like a fool.
Try to relax. It's only Perl. Or does Perl make you tense.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
John | 7/1/2010 12:30:56 AM
Quoth James Egan <jegan473@comcast.net>:
> Assuming an array named @myfiles contained three elements like:
>
> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
> -rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
> -rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
>
> I want to extract just the file which contain spaces to work with like:
>
> file1.zip
> file2 onespace.zip
> file3 two spaces.zip
>
>
> How can I extract the file names which have spaces?
ls -l output intentionally uses fixed-width columns, except for the
filename. So
for (@myfiles) {
print substr($_, 50), "\n";
}
You could also try File::Listing from the LWP distribution.
> I've been trying unsuccessfully with the glob function:
>
> foreach my $f (@myfiles) {
> print join "\n",glob("*");
> }
Why on Earth would you have expected that to work? Apart from anything
else, you aren't even using $f...
Ben
Ben | 7/1/2010 12:38:15 AM
>>>>> "JK" == John Kelly <jak@isp2dial.com> writes:
JK> On Wed, 30 Jun 2010 20:03:54 -0400, "Uri Guttman" <uri@StemSystems.com>
JK> wrote:
>> and what did that show? he is not reading or getting a file with just
>> file names in them. they are full ls listings. of course you will get
>> pissed off at my question but try to answer it!
JK> Well he said:
>> I'm not reading a directory with the ls command.
JK> So it sounded like his data only included the file names. But in a
JK> later post he says it does include the directory info. My post prompted
JK> him to explain further, without treating him like a fool.
JK> Try to relax. It's only Perl. Or does Perl make you tense.
no, just bad and/or useless perl makes me react. you seem to be a font
of it. i will correct your posted code whenever i feel like it. note
that others didn't even come close to your way off response to the same
ambiguous OP. and you still never answered my question, what was your
code trying to even show? reading and printing a file of lines
(regardless of their being file names) doesn't do anything close to what
the OP wanted. so your analytical skills need to be honed as well. best
you sit back and not respond to most posts here until you have seen what
others have to say. that is generally good advice on usenet.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 7/1/2010 1:08:27 AM
>>>>> "JK" == John Kelly <jak@isp2dial.com> writes:
JK> On Wed, 30 Jun 2010 23:53:13 GMT, James Egan <jegan473@comcast.net>
JK> wrote:
>> I want to take these three array elements and extract the file names which
>> include spaces:
>>
>> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>> -rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
>> -rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
JK> This works for me:
>> #!/usr/bin/perl
>>
>> open DATA, 'data';
>> @files = <DATA>;
>> foreach (@files) {
>>
>> my $file;
>> (undef, undef, undef, undef, undef, undef, undef, undef, $file) = split ' ', $_, 9;
>> $file =~ / / and print "$file";
that is just poor code. do you want to count the undefs each time you do
something like that? and why is that last line checking for space? he
wants all the file names. it is just that some also have spaces in them.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 7/1/2010 1:10:20 AM
>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
BM> ls -l output intentionally uses fixed-width columns, except for the
BM> filename. So
normally that is true, but very large files can cause the name column to
be shifted over. some ls flavors or options will change the size to use
a suffix but you can't count on fixed width there. as i posted it is
best to assume fixed width until the size but that is always a number
with a possible size suffix so it is easy to match and the rest is the
file name.
BM> You could also try File::Listing from the LWP distribution.
another good idea. parsing ls -l is annoying. why the OP is stuck with
that is a good question.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 7/1/2010 1:12:49 AM
On Wed, 30 Jun 2010 21:08:27 -0400, "Uri Guttman" <uri@StemSystems.com>
wrote:
> JK> Try to relax. It's only Perl. Or does Perl make you tense.
>no, just bad and/or useless perl makes me react. you seem to be a font
>of it. i will correct your posted code whenever i feel like it. note
>that others didn't even come close to your way off response to the same
>ambiguous OP. and you still never answered my question, what was your
>code trying to even show? reading and printing a file of lines
>(regardless of their being file names) doesn't do anything close to what
>the OP wanted. so your analytical skills need to be honed as well. best
>you sit back and not respond to most posts here until you have seen what
>others have to say.
Criticism is one thing. When it's mean and personal, it's trolling.
Why would I want to follow the advice of a troll.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
John | 7/1/2010 1:46:45 AM
On Wed, 30 Jun 2010 21:10:20 -0400, "Uri Guttman" <uri@StemSystems.com>
wrote:
> >> my $file;
> >> (undef, undef, undef, undef, undef, undef, undef, undef, $file) = split ' ', $_, 9;
> >> $file =~ / / and print "$file";
>
>that is just poor code. do you want to count the undefs each time you do
>something like that? and why is that last line checking for space? he
>wants all the file names.
That's not what he said. But trolls don't care what people say, do
they.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
John | 7/1/2010 1:55:27 AM
Quoth "Uri Guttman" <uri@StemSystems.com>:
> >>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
>
> BM> ls -l output intentionally uses fixed-width columns, except for the
> BM> filename. So
>
> normally that is true, but very large files can cause the name column to
> be shifted over. some ls flavors or options will change the size to use
> a suffix but you can't count on fixed width there. as i posted it is
> best to assume fixed width until the size but that is always a number
> with a possible size suffix so it is easy to match and the rest is the
> file name.
Meh. Yes, you're probably right. (Now I check ls(1), at least on my
system, it appears most of the fields are variable-width.) Since modern
systems (OS X, at least) allow user- and group names with spaces in,
splitting on space doesn't work either.
The correct answer, of course, is 'go back several steps and get the
data in a more reasonable format'.
Ben
Ben | 7/1/2010 2:06:46 AM
On Wed, 30 Jun 2010 21:08:27 -0400, "Uri Guttman" <uri@StemSystems.com>
wrote:
> best you sit back and not respond to most posts here until you have
> seen what others have to say
The second "N" in NNTP means "news." If you don't have any news to
share, maybe answering questions is all you can do. But don't expect
everyone else to be like you. I sometimes have news to post. Which
reminds me ...
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
John | 7/1/2010 2:19:33 AM
James Egan <jegan473@comcast.net> wrote:
>Assuming an array named @myfiles contained three elements like:
>
>-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
>-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
>
>
>I want to extract just the file which contain spaces to work with like:
>
>file1.zip
>file2 onespace.zip
>file3 two spaces.zip
Easy. split() the line into its 9 elements at any non-empty sequence of
spaces and then pick the last one:
my $file = (split(/ +/, $_, 9))[8];
>How can I extract the file names which have spaces?
I would run a grep afterwards and filter for those names that contain
spaces.
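Put together, the split and the follow-up grep might look like this sketch. It assumes every line has exactly eight whitespace-free fields before the name, as in the sample data; the array names are illustrative.

```perl
use strict;
use warnings;

my @myfiles = (
    '-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip',
    '-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip',
    '-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip',
);

# Field 9 (index 8) is the name; the limit of 9 keeps its spaces intact.
my @names = map { (split / +/, $_, 9)[8] } @myfiles;

# Keep only the names that contain at least one space.
my @spacy = grep { / / } @names;

print "$_\n" for @spacy;
```

Keeping `@names` and `@spacy` separate covers both readings of the question: all names, or only the ones with spaces.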
>I've been trying unsuccessfully with the glob function:
>
>foreach my $f (@myfiles) {
> print join "\n",glob("*");
>}
You seem to be very confused about what glob() is doing.
jue
J | 7/1/2010 2:24:07 AM
"Uri Guttman" <uri@StemSystems.com> wrote:
>that is just poor code. do you want to count the undefs each time you do
>something like that? and why is that last line checking for space? he
>wants all the file names.
That is not what the OP said. He explicitly said:
"I want to extract just the file which contain spaces"
To me that excludes filenames without spaces.
jue
J | 7/1/2010 2:27:14 AM
On Wed, 30 Jun 2010 19:24:07 -0700, Jürgen Exner <jurgenex@hotmail.com>
wrote:
>>I want to extract just the file which contain spaces to work with like:
>>
>>file1.zip
>>file2 onespace.zip
>>file3 two spaces.zip
>
>Easy. split() the line into its 9 elements at any non-empty sequence of
>spaces and then pick the last one:
>
> my $file = (split(/ +/, $_, 9))[8];
>
That handles blanks, but this will handle all whitespace, such as tabs.
my $file = (split(' ', $_, 9))[8];
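A small sketch of how the two split patterns differ, using made-up strings rather than real ls output:

```perl
use strict;
use warnings;

# Made-up sample: a tab between the first two fields, a blank before the third.
my $line = "a\tb c";

my @by_blank = split / +/, $line;    # runs of blanks only: ("a\tb", "c")
my @by_white = split ' ',  $line;    # any whitespace:      ("a", "b", "c")

# A leading blank yields an empty first field with / +/, but not with ' '.
my @lead_blank = split / +/, " x y"; # ("", "x", "y")
my @lead_white = split ' ',  " x y"; # ("x", "y")

printf "/ +/ gives %d fields, ' ' gives %d\n", scalar @by_blank, scalar @by_white;
```

With a limit of 9, either form leaves the ninth field untouched, so whitespace inside the file name survives in both cases.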
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
John | 7/1/2010 2:30:40 AM
On Wed, 30 Jun 2010 20:05:43 -0400, Uri Guttman wrote:
>>>>>> "JE" == James Egan <jegan473@comcast.net> writes:
>
> JE> I should have mentioned that the dates, sizes, names, of the
> JE> files, might be different, so they won't always start at position
> JE> 50.
>
> so use a regex! it isn't hard to write one to parse out the file from ls
> output. and you can always assume the earlier part of ls is fixed width.
> the date is always fixed width. only the size and file name can change
> in width. so skip to the size, then match a number and space and the
> rest is the file name so match that and grab it. easy regex.
>
> uri
Assume the files vary greatly in size. Then the file names may
not start at position 50 like:
-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip
-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip
James | 7/1/2010 3:01:37 AM
On Thu, 01 Jul 2010 03:01:37 GMT, James Egan <jegan473@comcast.net>
wrote:
>Assume the files vary greatly in size. Then the file names may
>not start at position 50 like:
>-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip
>-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip
True but the number of fields is always the same, up to the file name.
What Jürgen and I posted is enough to get you going.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
John | 7/1/2010 3:08:05 AM
On Wed, 30 Jun 2010 22:58:50 GMT, James Egan <jegan473@comcast.net> wrote:
>Assuming an array named @myfiles contained three elements like:
>
>-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
>-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
>
>
>I want to extract just the file which contain spaces to work with like:
>
>file1.zip
>file2 onespace.zip
>file3 two spaces.zip
>
>
>How can I extract the file names which have spaces?
>
Replacing 'space' with \s is left as an exercise.
-sln
---------------------
use strict;
use warnings;
##
my @fnames = (join '',<DATA>) =~ / \d*:\d* +([^ \n].* .*[^ \n])/g;
print "@fnames";
# Or ..
# while (<DATA>) {
# /\ \d*:\d*\ + ( [^\ \n] .* \ .* [^\ \n] ) /x and print "$1\n";
# # or
# # /\s\d*:\d*\s+(.++)/ and $1 =~ tr/ // and print $1,"\n";
# }
# Or ..
# while (<DATA>) {
# /(?<=\d{2}:\d{2})\ +([^ \n].* .*[^ \n])/ and print $1,"\n";
# }
__DATA__
I want to take these three array elements and extract the file names which
include spaces:
-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
sln | 7/1/2010 3:14:07 AM
James Egan <jegan473@comcast.net> wrote:
[ snip where a nice soul has tried to solve the OP's poorly specified problem ]
> I should have mentioned that the dates, sizes, names, of the files, might be
> different, so they won't always start at position 50.
No, you should not have mentioned that.
You should have provided test data that reflects your real data.
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
Tad | 7/1/2010 3:17:33 AM
James Egan <jegan473@comcast.net> wrote:
>
> I want to take these three array elements and extract the file names which
> include spaces:
>
> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
> -rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
> -rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
That is not three array elements.
my @ra = (
"-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip\n",
"-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip",
"-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip",
);
_That_ is three array elements.
You should speak Perl rather than English, when possible.
Have you seen the Posting Guidelines that are posted here frequently?
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
Tad | 7/1/2010 3:21:38 AM
>>>>> "JE" == James Egan <jegan473@comcast.net> writes:
JE> Assume the files vary greatly in size. Then the file names may
JE> not start at position 50 like:
JE> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
JE> -rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip
JE> -rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip
so what have you tried? you can always skip the first fixed fields. or
you can match them with a regex. even something as trivial as a set of
\S+ parts with \s+ separators will do. the number of fields is fixed
too. so try a regex with that hint and see what you can do. if you still
have troubles, post what code you have tried. this is really an easy
problem which is why i am not giving you a direct answer.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 7/1/2010 3:23:39 AM
>>>>> "JK" == John Kelly <jak@isp2dial.com> writes:
JK> On Wed, 30 Jun 2010 21:08:27 -0400, "Uri Guttman" <uri@StemSystems.com>
JK> wrote:
JK> Try to relax. It's only Perl. Or does Perl make you tense.
>> no, just bad and/or useless perl makes me react. you seem to be a font
>> of it. i will correct your posted code whenever i feel like it. note
>> that others didn't even come close to your way off response to the same
>> ambiguous OP. and you still never answered my question, what was your
>> code trying to even show? reading and printing a file of lines
>> (regardless of their being file names) doesn't do anything close to what
>> the OP wanted. so your analytical skills need to be honed as well. best
>> you sit back and not respond to most posts here until you have seen what
>> others have to say.
JK> Criticism is one thing. When it's mean and personal, it's trolling.
JK> Why would I want to follow the advice of a troll.
well, cause i am not a troll. but your perl code suggestions are weak at
best. i paTROL this group and correct/improve bad code. you are just a
recent target. nothing new. it isn't personal. i don't like to see poor
perl posted and will always comment on it. you can take it personally,
but that is your issue not mine. i care about the code.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 7/1/2010 3:25:23 AM
>>>>> "JK" == John Kelly <jak@isp2dial.com> writes:
JK> On Wed, 30 Jun 2010 21:08:27 -0400, "Uri Guttman" <uri@StemSystems.com>
JK> wrote:
>> best you sit back and not respond to most posts here until you have
>> seen what others have to say
JK> The second "N" in NNTP means "news." If you don't have any news to
JK> share, maybe answering questions is all you can do. But don't expect
JK> everyone else to be like you. I sometimes have news to post. Which
JK> reminds me ...
you don't get this group at all. it discusses perl. all posts here are
open to comments about the perl in them. the exclusively news aspect of
usenet or nntp is long gone. your bringing that up is just silly. silly!
silly! silly!
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 7/1/2010 3:26:59 AM
>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
BM> Quoth "Uri Guttman" <uri@StemSystems.com>:
>> >>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
>>
BM> ls -l output intentionally uses fixed-width columns, except for the
BM> filename. So
>>
>> normally that is true, but very large files can cause the name column to
>> be shifted over. some ls flavors or options will change the size to use
>> a suffix but you can't count on fixed width there. as i posted it is
>> best to assume fixed width until the size but that is always a number
>> with a possible size suffix so it is easy to match and the rest is the
>> file name.
BM> Meh. Yes, you're probably right. (Now I check ls(1), at least on my
BM> system, it appears most of the fields are variable-width.) Since modern
BM> systems (OS X, at least) allow user- and group names with spaces in,
BM> splitting on space doesn't work either.
yow, that is annoying then. do they use tabs for separators or just more
spaces? if so, you can't really parse ls -l there. a group name could be
a number (or multiple numbers!) which is confused with the size, etc. blecch.
BM> The correct answer, of course, is 'go back several steps and get the
BM> data in a more reasonable format'.
yep. which i have suggested. oh well.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
Uri | 7/1/2010 3:28:58 AM
Quoth John Kelly <jak@isp2dial.com>:
> On Wed, 30 Jun 2010 19:24:07 -0700, Jürgen Exner <jurgenex@hotmail.com>
> wrote:
>
> >>I want to extract just the file which contain spaces to work with like:
> >>
> >>file1.zip
> >>file2 onespace.zip
> >>file3 two spaces.zip
> >
> >Easy. split() the line into its 9 elements at any non-empty sequence of
> >spaces and then pick the last one:
> >
> > my $file = (split(/ +/, $_, 9))[8];
> >
>
> That handles blanks, but this will handle all whitespace, such as tabs.
>
> my $file = (split(' ', $_, 9))[8];
....which do not appear in the output of ls(1), except possibly as part
of a filename.
It's also worth noting that none of the solutions offered (except
perhaps File::Listing) handle symlinks.
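To illustrate the symlink point: in a long listing a link shows up as `name -> target`, so the ninth field needs one more step. A rough sketch, assuming the nine-field layout used in this thread and that neither the name nor the target itself contains ` -> `. The sub name `name_and_target` and the sample line are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: given one ls -l line, return (name, target).
# target is undef unless the mode string starts with 'l'.
sub name_and_target {
    my ($line) = @_;
    chomp $line;
    my ($mode, $rest) = (split ' ', $line, 9)[0, 8];
    return unless defined $rest;    # fewer than nine fields
    if ($mode =~ /^l/ and $rest =~ /^(.*?) -> (.*)$/) {
        return ($1, $2);
    }
    return ($rest, undef);
}

my ($name, $target) = name_and_target(
    'lrwxrwxrwx 1 777 22000 9 Jan 29 13:28 my link -> file1.zip');
print "$name points to $target\n";
```

The ` -> ` assumption is the weak spot: a file name may legally contain that sequence, which is one more reason to get the data in a better format if possible.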
Ben
Ben | 7/1/2010 3:31:16 AM
James Egan <jegan473@comcast.net> wrote:
>On Wed, 30 Jun 2010 20:05:43 -0400, Uri Guttman wrote:
>>>>>>> "JE" == James Egan <jegan473@comcast.net> writes:
>> JE> I should have mentioned that the dates, sizes, names, of the
>> JE> files, might be different, so they won't always start at position
>> JE> 50.
>>
>> so use a regex! it isn't hard to write one to parse out the file from ls
>> output.
>
>Assume the files vary greatly in size. Then the file names may
>not start at position 50 like:
Which is completely irrelevant for the vast majority of regular
expressions.
jue
J | 7/1/2010 3:34:38 AM
Ben Morrow <ben@morrow.me.uk> wrote:
>It's also worth noting that none of the solutions offered (except
>perhaps File::Listing) handle symlinks.
Which on the other hand weren't part of his problem statement....
A poorly specified problem necessarily leads to arbitrary guesses and
widely diverging 'solutions'.
jue
J | 7/1/2010 3:38:57 AM
James Egan <jegan473@comcast.net> wrote:
> On Wed, 30 Jun 2010 20:05:43 -0400, Uri Guttman wrote:
>
>>>>>>> "JE" == James Egan <jegan473@comcast.net> writes:
>>
>> JE> I should have mentioned that the dates, sizes, names, of the
>> JE> files, might be different, so they won't always start at position
>> JE> 50.
>>
>> so use a regex! it isn't hard to write one to parse out the file from ls
      ^^^^^^^^^^^
>> output.
> Assume the files vary greatly in size. Then the file names may
> not start at position 50 like:
>
> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
> -rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip
> -rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip
How's about *you* assume that, and then attempt to use a regex?
We are here to help you with your Perl problem.
We are not here to write your Perl program for you.
It is expected that you will try and do that once we have pointed
you in the right direction.
Oh hell, have a fish.
----------------
#!/usr/bin/perl
use warnings;
use strict;
my @ra = (
    "-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip",
    "-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip",
    "-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip",
);
my @spacy;
foreach my $ls (@ra) {
$ls =~ s/^(\S+\s+){8}//;
push @spacy, $ls if $ls =~ / /;
}
print "$_\n" for @spacy;
# same thing, but done all at once
@spacy = map {s/^(\S+\s+){8}//; / / ? $_ : ()} @ra;
print "$_\n" for @spacy;
----------------
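One thing to keep in mind with the code above: `$ls` (and `$_` inside map) aliases each element of `@ra`, so the `s///` rewrites the array itself. If the original lines are still needed afterwards, a capture can be used instead. A sketch under the same nine-field assumption:

```perl
use strict;
use warnings;

my @ra = (
    "-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip",
    "-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip",
    "-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip",
);

# Capture the name instead of substituting, so @ra is left untouched.
# A failed match returns an empty list and the line is simply skipped.
my @spacy = grep { / / } map { /^(?:\S+\s+){8}(.*)$/ ? $1 : () } @ra;

print "$_\n" for @spacy;
```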
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
Tad | 7/1/2010 3:44:10 AM
On Wed, 30 Jun 2010 23:26:59 -0400, "Uri Guttman" <uri@StemSystems.com>
wrote:
>you don't get this group at all. it discusses perl. all posts here are
>open to comments about the perl in them. the exclusively news aspect of
>usenet or nntp is long gone. your bringing that up is just silly. silly!
>silly! silly!
Like a troll, you don't hear what I say. You hear what you want to
hear. I didn't say usenet is *exclusively* news.
You regard this ng as your personal territory, and hate it when someone
won't follow your self projection bias.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
|
|
0
|
|
|
|
Reply
|
John
|
7/1/2010 3:46:17 AM
|
|
John Kelly <jak@isp2dial.com> wrote:
> On Wed, 30 Jun 2010 23:26:59 -0400, "Uri Guttman" <uri@StemSystems.com>
> wrote:
>
>>you don't get this group at all.
> Like a troll,
Do you have much experience on Usenet?
If so, and if you truly believe that Uri is a troll,
then please attempt to exert some self control and don't feed it!
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
|
|
0
|
|
|
|
Reply
|
Tad
|
7/1/2010 4:03:46 AM
|
|
>>>>> "JK" == John Kelly <jak@isp2dial.com> writes:
JK> On Wed, 30 Jun 2010 23:26:59 -0400, "Uri Guttman" <uri@StemSystems.com>
JK> wrote:
>> you don't get this group at all. it discusses perl. all posts here are
>> open to comments about the perl in them. the exclusively news aspect of
>> usenet or nntp is long gone. your bringing that up is just silly. silly!
>> silly! silly!
JK> Like a troll, you don't hear what I say. You hear what you want to
JK> hear. I didn't say usenet is *exclusively* news.
JK> You regard this ng as your personal territory, and hate it when someone
JK> won't follow your self projection bias.
nah, i just don't like bad perl code and comment on it when i see
it. you are projecting dislike of your code onto dislike of you. keep it
up and they will become the same. in the mean time your code needs work
and you shouldn't be offering help to others until it gets better. you
will still do that and will still comment on your code.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
|
|
0
|
|
|
|
Reply
|
Uri
|
7/1/2010 4:05:23 AM
|
|
On Jun 30, 4:56 pm, James Egan <jegan...@comcast.net> wrote:
> On Thu, 01 Jul 2010 01:41:33 +0200, Dr.Ruud wrote:
> > James Egan wrote:
>
> >> Assuming an array named @myfiles contained three elements like:
>
> >> -rwxrwxrwx   1 777   22000   2971201 Jan 24 18:17 file1.zip
> >> -rwxrwxrwx   1 777   22000   2969941 Jan 28 18:10 file2 onespace.zip
> >> -rwxrwxrwx   1 777   22000   2969941 Jan 29 13:28 file3 two spaces.zip
>
> >> I want to extract just the file which contain spaces to work with like:
>
> >> file1.zip
> >> file2 onespace.zip
> >> file3 two spaces.zip
>
> > echo "-rwxrwxrwx   1 777   22000   2969941 Jan 29 13:28 file3 two spaces.zip" |perl -wnle '
>
> >    print substr($_, 50);
> > '
> > file3 two spaces.zip
>
> I should have mentioned that the dates, sizes, names, of the files, might be
> different, so they won't always start at position 50.
>
A split'n-splice perhaps:
my @f = split ' ', $dirline;
splice( @f, 0, 8 );
print qq{@f} if @f > 1;
This'll squeeze down multiple spaces though.
To avoid that:
chomp $dirline;
my @f = split /(\s+)/x, $dirline;
splice( @f, 0, 16 );
print join '',@f,"\n" if @f > 1;
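For anyone who wants to try the second variant on the sample data, here is a minimal sketch. The helper name name_from_line is made up for illustration, and it assumes exactly eight space-free fields come before the file name, which may not hold for every ls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the file name from one "ls -l" style line,
# or nothing if the line has fewer than nine fields.
# Assumes exactly eight whitespace-free fields precede the name.
sub name_from_line {
    my ($dirline) = @_;
    chomp $dirline;
    # The capturing group makes split return the separators as well,
    # so fields and separators alternate in @f.
    my @f = split /(\s+)/, $dirline;
    return if @f < 17;
    splice @f, 0, 16;       # drop 8 fields plus their 8 separators
    return join '', @f;     # spacing inside the name is preserved
}

my @lines = (
    "-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip",
    "-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3  two spaces.zip",
);
for my $line (@lines) {
    my $name = name_from_line($line);
    print "$name\n" if defined $name && $name =~ / /;
}
```

Because the separators are kept, a name with two consecutive spaces comes back unchanged, which the plain split ' ' version cannot do.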
--
Charles DeRykus
|
|
0
|
|
|
|
Reply
|
C
|
7/1/2010 4:08:39 AM
|
|
On Wed, 30 Jun 2010 23:03:46 -0500, Tad McClellan <tadmc@seesig.invalid>
wrote:
>Do you have much experience on Usenet?
I think so. How much do you have?
>If so, and if you truly believe that Uri is a troll,
>then please attempt to exert some self control and don't feed it!
Why don't you ask Uri to ignore me. I'm not important, but he seems
obsessed with me.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
|
|
0
|
|
|
|
Reply
|
John
|
7/1/2010 4:41:03 AM
|
|
On Jun 30, 9:08 pm, "C.DeRykus" <dery...@gmail.com> wrote:
> On Jun 30, 4:56 pm, James Egan <jegan...@comcast.net> wrote:
>
>
> ...
>
> A split'n-splice perhaps:
>
>    my @f = split ' ', $dirline;
>    splice( @f, 0, 8 );
>    print qq{@f} if @f > 1;
>
> This'll squeeze down multiple spaces though.
> To avoid that:
>
>    chomp $dirline;
>    my @f = split /(\s+)/x, $dirline;
>    splice( @f, 0, 16 );
>    print join '',@f,"\n" if @f > 1;
>
I just spotted Ben's mention of potential spaces in
other fields so this could be a portability issue.
--
Charles DeRykus
|
|
0
|
|
|
|
Reply
|
C
|
7/1/2010 4:53:07 AM
|
|
John Kelly wrote:
> open DATA, 'data';
ITYM: open my $fh_data, "<", "data";
Under which rock were you the last 10 years?
> @files = <DATA>;
> foreach (@files) {
> print;
> }
A for-loop? Make it a while.
--
Ruud
|
|
0
|
|
|
|
Reply
|
Dr
|
7/1/2010 7:55:52 AM
|
|
On 2010-07-01, Uri Guttman <uri@StemSystems.com> wrote:
>>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
>
> BM> ls -l output intentionally uses fixed-width columns, except for the
> BM> filename. So
>
> normally that is true, but very large files can cause the name column to
> be shifted over. some ls flavors or options will change the size to use
> a suffix but you can't count on fixed width there. as i posted it is
> best to assume fixed width until the size but that is always a number
> with a possible size suffix so it is easy to match and the rest is the
> file name.
An observation (that may be erroneous) of the output of ls: The second
to last field is always the time, which contains a colon. How about
matching /:\d{2}\s+.*\s+.+\b/ ?
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
if (/:\d{2}\s+(.*\s+.+)\b/) {
print $1, "\n";
}
}
__DATA__
-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip
-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip
Justin.
--
Justin C, by the sea.
|
|
0
|
|
|
|
Reply
|
Justin
|
7/1/2010 11:14:33 AM
|
|
On Thu, 01 Jul 2010 09:55:52 +0200, "Dr.Ruud" <rvtol+usenet@xs4all.nl>
wrote:
>John Kelly wrote:
>
>> open DATA, 'data';
>
>ITYM: open my $fh_data, "<", "data";
>
>Under which rock were you the last 10 years?
>
>
>> @files = <DATA>;
>> foreach (@files) {
>> print;
>> }
>
>A for-loop? Make it a while.
The OP was not initially clear on what he wanted. So I threw something
together and put it out there, to see if he would elaborate. At that
early stage, starting a dialog was more important than providing elegant
Perl code. Sometimes people wander into a newsgroup looking for ideas,
and need a friendly helping hand more than elegant code.
He seems to have disappeared though. I guess the mean people scared him
away.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
|
|
0
|
|
|
|
Reply
|
John
|
7/1/2010 11:55:30 AM
|
|
Jürgen Exner <jurgenex@hotmail.com> wrote:
>>Assuming an array named @myfiles contained three elements like:
>>
>>-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>>-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
>>-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
>>
>>I want to extract just the file which contain spaces to work with like:
>>
>>file1.zip
>>file2 onespace.zip
>>file3 two spaces.zip
> Easy. split() the line into its 9 elements at any non-empty sequence of
> spaces and then pick the last one:
How about split on the : and then substr from 3 chars in?
|
|
0
|
|
|
|
Reply
|
vicky
|
7/1/2010 1:46:27 PM
|
|
Justin C <justin.0911@purestblue.com> wrote:
>On 2010-07-01, Uri Guttman <uri@StemSystems.com> wrote:
>>>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
>>
>> BM> ls -l output intentionally uses fixed-width columns, except for the
>> BM> filename. So
>>
>> normally that is true, but very large files can cause the name column to
>> be shifted over. some ls flavors or options will change the size to use
>> a suffix but you can't count on fixed width there. as i posted it is
>> best to assume fixed width until the size but that is always a number
>> with a possible size suffix so it is easy to match and the rest is the
>> file name.
>
>An observation (that may be erroneous) of the output of ls: The second
>to last field is always the time, which contains a colon. How about
>matching /:\d{2}\s+.*\s+.+\b/ ?
[...]
> if (/:\d{2}\s+(.*\s+.+)\b/) {
> print $1, "\n";
Not a good idea because colons and digits are legal characters in
filenames and therefore it will chop up filenames like e.g.
foo45:10 bar baz.tmp
jue
|
|
0
|
|
|
|
Reply
|
J
|
7/1/2010 2:38:54 PM
|
|
<vicky@dinky.vm.bytemark.co.uk> wrote:
>Jürgen Exner <jurgenex@hotmail.com> wrote:
>>>Assuming an array named @myfiles contained three elements like:
>>>
>>>-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>>>-rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
>>>-rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
>>>
>>>I want to extract just the file which contain spaces to work with like:
>>>
>>>file1.zip
>>>file2 onespace.zip
>>>file3 two spaces.zip
>> Easy. split() the line into its 9 elements at any non-empty sequence of
>> spaces and then pick the last one:
>
>How about split on the : and then substr from 3 chars in?
You would need to be able to distinguish between the : as part of the
time stamp and a : as part of a file name.
As others have mentioned already: this format is very poorly suited to
be parsed. The best solution is to ask for the file list in a usable
form.
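If the files happen to live in a directory the script can read itself (an assumption; the OP never said where the listing comes from), "a usable form" could be as simple as the sketch below. The helper name names_with_spaces is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list the plain files in $dir whose names contain a space.
# readdir returns bare names, so there is no ls output to take apart.
sub names_with_spaces {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open '$dir': $!";
    my @names = grep { / / && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return sort @names;
}

print "$_\n" for names_with_spaces('.');
```

If the listing comes from somewhere the script cannot read directly, this does not apply and some parsing of the text is unavoidable.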
jue
|
|
0
|
|
|
|
Reply
|
J
|
7/1/2010 2:41:51 PM
|
|
On Thu, 01 Jul 2010 07:41:51 -0700, Jürgen Exner <jurgenex@hotmail.com>
wrote:
><vicky@dinky.vm.bytemark.co.uk> wrote:
>>How about split on the : and then substr from 3 chars in?
>
>You would need to be able to distinguish between the : as part of the
>time stamp and a : as part of a file name.
>
>As others have mentioned already: this format is very poorly suited to
>be parsed. The best solution is to ask for the file list in a usable
>form.
>
>jue
Similar to my idea of split with 8 undefs, but more concisely, your
suggestion of:
my $file = (split(/ +/, $_, 9))[8];
works fine. The field count, prior to the file name, will not likely
change for the OP. Not every solution needs to be universally portable,
most people just need something that works in their local environment.
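A small sketch of why the limit argument matters, under the same assumption of eight space-free fields before the name (the helper name ninth_field is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: take the ninth space-separated field of an ls -l line.
# The limit of 9 stops split after eight separators, so whatever is left,
# including any runs of spaces inside the name, stays in the last element.
sub ninth_field {
    my ($line) = @_;
    chomp $line;
    my $name = (split / +/, $line, 9)[8];
    return $name;    # undef when the line has fewer than nine fields
}

my @myfiles = (
    "-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip",
    "-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip",
    "-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3  two spaces.zip",
);
for my $line (@myfiles) {
    my $name = ninth_field($line);
    print "$name\n" if defined $name && $name =~ / /;
}
```

Without the limit, split would also break the file name apart at its own spaces.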
Of course I realize that does little to assuage the social thirst of the
dominant clique here.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
|
|
0
|
|
|
|
Reply
|
John
|
7/1/2010 3:07:45 PM
|
|
John Kelly <jak@isp2dial.com> writes:
> Of course I realize that does little to assuage the social thirst of the
> dominant clique here.
Boo hoo. Whine much?
sherm--
--
Sherm Pendley <www.shermpendley.com>
<www.camelbones.org>
Cocoa Developer
|
|
0
|
|
|
|
Reply
|
Sherm
|
7/1/2010 3:31:49 PM
|
|
On Thu, 01 Jul 2010 11:31:49 -0400, Sherm Pendley
<spamtrap@shermpendley.com> wrote:
>John Kelly <jak@isp2dial.com> writes:
>
>> Of course I realize that does little to assuage the social thirst of the
>> dominant clique here.
>
>Boo hoo. Whine much?
You just couldn't resist could you.
"While snarlers strive with proud but fruitless pain
To wound immortals, or to slay the slain."
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
|
|
0
|
|
|
|
Reply
|
John
|
7/1/2010 3:38:15 PM
|
|
On 2010-07-01 11:14, Justin C <justin.0911@purestblue.com> wrote:
> On 2010-07-01, Uri Guttman <uri@StemSystems.com> wrote:
>>>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
>>
>> BM> ls -l output intentionally uses fixed-width columns, except for the
>> BM> filename. So
>>
>> normally that is true, but very large files can cause the name column to
>> be shifted over. some ls flavors or options will change the size to use
>> a suffix but you can't count on fixed width there. as i posted it is
>> best to assume fixed width until the size but that is always a number
>> with a possible size suffix so it is easy to match and the rest is the
>> file name.
>
> An observation (that may be erroneous) of the output of ls: The second
> to last field is always the time, which contains a colon.
Not true. If the file is older than 6 months, this field will be the
year. But something like
($stuff_before_date, $date, $filename) =
m/
(.*) \040
((?: Jan | Feb | ... | Dec) \d\d (?: \d\d:\d\d | \d\d\d\d) ) \040
(.*)
/x;
might work. Of course the output of ls depends on the locale, so it
might be completely different, but if we want to parse every possible
output of ls the problem becomes intractable, so you need to make some
assumptions, like "locale is known" or "user and group names don't
contain spaces" or "fields are lined up". To take advantage of the
last property you need to check all the lines to detect vertical
columns.
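To make the date-anchored idea concrete, here is a hedged sketch with the month list written out. Two things differ from the pattern above and are my assumptions, not part of the original suggestion: under /x the literal spaces in the pattern are ignored, so the separators are spelled \040 explicitly, and the leading group is non-greedy so that the first date-like run on the line is taken. It also assumes English month abbreviations and no date-like text in the user or group columns. The helper name parse_ls_line is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $months = join '|', qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

my $ls_re = qr/
    ^ (.*?) \040+                        # everything before the date
    ( (?:$months) \040+ \d{1,2} \040+    # month and day
      (?: \d\d:\d\d | \d{4} ) )          # time, or year for older files
    \040                                 # one separator before the name
    (.*) $                               # file name, spaces and all
/x;

# Hypothetical helper: returns (before, date, name) or an empty list.
sub parse_ls_line {
    my ($line) = @_;
    chomp $line;
    return $line =~ $ls_re;
}

for my $line (
    "-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip",
    "-rwxrwxrwx 1 777 22000 9941 Jan 28 2009 foo45:10 bar baz.tmp",
) {
    my ($before, $date, $name) = parse_ls_line($line);
    print "$date => $name\n" if defined $name;
}
```

The second sample line shows the year form of the date together with a colon inside the file name.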
hp
|
|
0
|
|
|
|
Reply
|
Peter
|
7/1/2010 4:02:15 PM
|
|
John Kelly <jak@isp2dial.com> wrote:
> On Thu, 01 Jul 2010 11:31:49 -0400, Sherm Pendley
><spamtrap@shermpendley.com> wrote:
>
>>John Kelly <jak@isp2dial.com> writes:
>>
>>> Of course I realize that does little to assuage the social thirst of the
>>> dominant clique here.
>>
>>Boo hoo. Whine much?
>
> You just couldn't resist could you.
*you* just couldn't resist, could you?
Your followup partially quoted above was a decent contribution
to the group... until you had to inject your bile at the end.
You reap what you sow.
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
|
|
0
|
|
|
|
Reply
|
Tad
|
7/1/2010 4:24:42 PM
|
|
On 2010-07-01 03:46, John Kelly <jak@isp2dial.com> wrote:
> On Wed, 30 Jun 2010 23:26:59 -0400, "Uri Guttman" <uri@StemSystems.com>
> wrote:
[nothing of importance]
Can you two please take your bickering elsewhere? This is getting
tiresome.
hp
|
|
0
|
|
|
|
Reply
|
Peter
|
7/1/2010 4:41:12 PM
|
|
In article <j4vn26110qd9p89ai9tngmr4o43eia26pb@4ax.com>, Jürgen Exner
<jurgenex@hotmail.com> wrote:
> "Uri Guttman" <uri@StemSystems.com> wrote:
> >that is just poor code. do you want to count the undefs each time you do
> >something like that? and why is that last line checking for space? he
> >wants all the file names.
>
> That is not what the OP said. He explicitly said:
> "I want to extract just the file which contain spaces"
>
> To me that excludes filenames without spaces.
Unfortunately, "file which contain spaces" is incorrect English and
ambiguous. The phrase should be either:
1. "file names that contain spaces"
or
2. "file names, which can contain spaces"
"that" is "restrictive" here and means only file names that include
spaces are wanted, while "which" is "non-restrictive" and means that
all file names are wanted, but some may contain spaces. Note the
difference in punctuation.
<http://www.worldwidewords.org/articles/which.htm>
While this sounds like a nit-pick, it does influence the code, as we
have seen.
--
Jim Gibson
|
|
0
|
|
|
|
Reply
|
Jim
|
7/1/2010 4:44:56 PM
|
|
On Thu, 1 Jul 2010 18:41:12 +0200, "Peter J. Holzer"
<hjp-usenet2@hjp.at> wrote:
>On 2010-07-01 03:46, John Kelly <jak@isp2dial.com> wrote:
>> On Wed, 30 Jun 2010 23:26:59 -0400, "Uri Guttman" <uri@StemSystems.com>
>> wrote:
>[nothing of importance]
>
>Can you two please take your bickering elsewhere? This is getting
>tiresome.
You're right, it's gone too far.
I will try to exert more willpower and resist the urge. I hope people
understand that means I won't answer, when they question why did I post
this or that.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
|
|
0
|
|
|
|
Reply
|
John
|
7/1/2010 4:58:35 PM
|
|
On 2010-07-01 00:38, Ben Morrow <ben@morrow.me.uk> wrote:
>
> Quoth James Egan <jegan473@comcast.net>:
>> Assuming an array named @myfiles contained three elements like:
>>
>> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>> -rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
>> -rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
>>
>> I want to extract just the file which contain spaces to work with like:
>>
>> file1.zip
>> file2 onespace.zip
>> file3 two spaces.zip
>>
>>
>> How can I extract the file names which have spaces?
>
> ls -l output intentionally uses fixed-width columns, except for the
> filename. So
Depends on the version of ls. Recent versions of GNU ls vary all the
column widths to fit their contents. So they are always nicely aligned
but different on each listing.
hp
|
|
0
|
|
|
|
Reply
|
Peter
|
7/1/2010 5:02:44 PM
|
|
John Kelly wrote:
> Sometimes people wander into a newsgroup looking for ideas,
> and need a friendly helping hand more than elegant code.
Gentle healers make stinking wounds.
--
Ruud
|
|
0
|
|
|
|
Reply
|
Dr
|
7/1/2010 5:15:29 PM
|
|
On 2010-07-01 03:17, Tad McClellan <tadmc@seesig.invalid> wrote:
> James Egan <jegan473@comcast.net> wrote:
>
> [ snip where a nice soul has tried to solve the OP's poorly specified problem ]
>
>> I should have mentioned that the dates, sizes, names, of the files,
>> might be different, so they won't always start at position 50.
>
>
> No, you should not have mentioned that.
>
> You should have provided test data that reflects your real data.
I disagree. He should have mentioned that and quite a few things more
(for example the different date formats, whether user and group names
are always numeric, and if not, whether they can contain spaces, etc.)
Test data is nice but you can never assume that it covers all possible
cases and requirements reverse engineered from a few lines of test
data are almost guaranteed to be incomplete. Besides, why should
everyone in this group have to figure out the requirements when the OP
can do it once?
hp
|
|
0
|
|
|
|
Reply
|
Peter
|
7/1/2010 8:35:34 PM
|
|
On Thu, 01 Jul 2010 11:14:33 -0000, Justin C <justin.0911@purestblue.com> wrote:
>On 2010-07-01, Uri Guttman <uri@StemSystems.com> wrote:
>>>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
>>
>> BM> ls -l output intentionally uses fixed-width columns, except for the
>> BM> filename. So
>>
>> normally that is true, but very large files can cause the name column to
>> be shifted over. some ls flavors or options will change the size to use
>> a suffix but you can't count on fixed width there. as i posted it is
>> best to assume fixed width until the size but that is always a number
>> with a possible size suffix so it is easy to match and the rest is the
>> file name.
>
>An observation (that may be erroneous) of the output of ls: The second
>to last field is always the time, which contains a colon. How about
>matching /:\d{2}\s+.*\s+.+\b/ ?
^
' ' is in the class defined by . and \s
Given "18:17\040\040file1.zip",
:\d{2}\s+ will match ":17\040", ".*" will match nothing
and "\s+.+\b" will match "\040file1.zip"
Equally, /:\d{2}\s+.+\s+.+\b/
^
will produce the same problem given
"18:17\040\040\040file1.zip"
The solution is to anchor both ends of the filename with a single
character of the class \S, then let backtracking take over the middle
with the 0 or more quantifier \S.*\s.*\S
Test case:
"Jan 24 18:17 file1.zip" =~ /:\d{2}\s+(.*\s+.+)\b/
and print "$1\n";
"Jan 24 18:17 file2.zip" =~ /:\d{2}\s+(.+\s+.+)\b/
and print "$1\n";
"Jan 24 18:17 file3.zip" =~ /:\d{2}\s+(\S.*\s.*\S)\b/
and print "$1\n";
"Jan 24 18:17 file4.zip" =~ /:\d{2}\s+(\S.*\s.*\S)\b/
and print "$1\n";
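The same comparison as a runnable sketch, with the separators written as \040 so that the number of spaces cannot get lost in transit. The helper name capture and the two sample strings are mine, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $loose    = qr/:\d{2}\s+(.*\s+.+)\b/;     # the pattern from the earlier post
my $anchored = qr/:\d{2}\s+(\S.*\s.*\S)\b/;  # name must start and end on \S

# Hypothetical helper: return what the pattern captures, or undef.
sub capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

my @tests = (
    "Jan 24 18:17\040\040file1.zip",       # two separators, no space in name
    "Jan 28 18:10\040file2 onespace.zip",  # one separator, space in name
);
for my $t (@tests) {
    for my $re ($loose, $anchored) {
        my $got = capture($re, $t);
        print defined $got ? "matched '$got'\n" : "no match\n";
    }
}
```

On the first string the loose pattern gives up one of the two separators to the capture and reports a name with a leading space, while the anchored pattern finds no match. Both agree on the second string.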
>
>#!/usr/bin/perl
>
>use strict;
>use warnings;
>
>while (<DATA>) {
> if (/:\d{2}\s+(.*\s+.+)\b/) {
^^^^^^^^^^^^^^^^^^^^^^
/:\d{2}\s+(\S.*\s.*\S)\b/
> print $1, "\n";
> }
>}
>
>__DATA__
>-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip
>-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip
>
-sln
|
|
0
|
|
|
|
Reply
|
sln
|
7/1/2010 8:51:16 PM
|
|
On Thu, 1 Jul 2010 18:02:15 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>On 2010-07-01 11:14, Justin C <justin.0911@purestblue.com> wrote:
>> On 2010-07-01, Uri Guttman <uri@StemSystems.com> wrote:
>>>>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
>>>
>>> BM> ls -l output intentionally uses fixed-width columns, except for the
>>> BM> filename. So
>>>
>>> normally that is true, but very large files can cause the name column to
>>> be shifted over. some ls flavors or options will change the size to use
>>> a suffix but you can't count on fixed width there. as i posted it is
>>> best to assume fixed width until the size but that is always a number
>>> with a possible size suffix so it is easy to match and the rest is the
>>> file name.
>>
>> An observation (that may be erroneous) of the output of ls: The second
>> to last field is always the time, which contains a colon.
>
>Not true. If the file is older than 6 months, this field will be the
>year. But something like
>
>($stuff_before_date, $date, $filename) =
> m/
> (.*) \040
> ((?: Jan | Feb | ... | Dec) \d\d (?: \d\d:\d\d | \d\d\d\d) ) \040
> (.*)
> /x;
>
>might work. Of course the output of ls depends on the locale, so it
>might be completely different, but if we want to parse every possible
>output of ls the problem becomes intractable, so you need to make some
>assumptions, like "locale is known" or "user and group names don't
>contain spaces" or "fields are lined up". To take advantage of the
>last property you need to check all the lines to detect vertical
>columns.
>
Increasingly, it becomes evident that fields may be the easiest to
maintain but certainly not a fix. The date as an anchor for the filename
may be more reliable, but the locale could present issues.
So in the spirit of split, something via regex maybe?
/\s* (?: \S+ \040+ ){8} ([^\040\n] .* \040 .* [^\040\n]) \n/x
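A quick sketch of that pattern applied to the sample lines. Note that it needs the trailing newline to still be on the line, and that it assumes eight space-free fields before the name. The helper name spacy_name is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $eight_then_name =
    qr/\s* (?: \S+ \040+ ){8} ([^\040\n] .* \040 .* [^\040\n]) \n/x;

# Hypothetical helper: return the name only if it contains a space.
sub spacy_name {
    my ($line) = @_;
    return $line =~ $eight_then_name ? $1 : undef;
}

my @lines = (
    "-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip\n",
    "-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip\n",
    "-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip\n",
);
for my $line (@lines) {
    my $name = spacy_name($line);
    print "$name\n" if defined $name;
}
```

Lines that have already been chomped will not match, so this fits best when reading straight from a file handle.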
-sln
|
|
0
|
|
|
|
Reply
|
sln
|
7/1/2010 9:10:22 PM
|
|
On 2010-07-01, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
> I disagree. He should have mentioned that and quite a few things more
> (for example the different date formats, whether user and group names
> are always numeric, and if not, whether they can contain spaces, etc.)
>
> Test data is nice but you can never assume that it covers all possible
> cases and requirements reverse engineered from a few lines of test
> data are almost guaranteed to be incomplete. Besides, why should
> everyone in this group have to figure out the requirements when the OP
> can do it once?
I think that now you realized that your arguments lead you to a
contradiction. This thread provided tons of useful information about
how the OP's requirements MIGHT look like. Armed with this list of
possible complications, NOW the OP is in the position of answering the
question about which of these complications might appear in his listings.
And I consider it very probable that before these contributions, the
OP could not formulate his requirements JUST BECAUSE HE DID NOT REALIZE
THESE COMPLICATIONS (as I missed the considerations of the user/group
names and locales when I first read his post).
====
In general, I consider the pipe dream of "formulate the requirements
first, then code to specification" very counter-productive; most
probably, it is one of the principal stumbling blocks in the
contemporary state of the programming. It may work in a handful of
toy situations, but it does not scale up to "what really happens".
Software designers should be ready to changes-in-specification when
new information is gathered by pilot implementations. They should be
ready to advise ASAP the clients on possible pitfalls in the pilot
specifications, etc.
Yours,
Ilya
|
|
0
|
|
|
|
Reply
|
Ilya
|
7/1/2010 10:25:41 PM
|
|
On 7/1/2010 3:25 PM, Ilya Zakharevich wrote:
> On 2010-07-01, Peter J. Holzer<hjp-usenet2@hjp.at> wrote:
>> I disagree. He should have mentioned that and quite a few things more
>> (for example the different date formats, whether user and group names
>> are always numeric, and if not, whether they can contain spaces, etc.)
>>
>> Test data is nice but you can never assume that it covers all possible
>> cases and requirements reverse engineered from a few lines of test
>> data are almost guaranteed to be incomplete. Besides, why should
>> everyone in this group have to figure out the requirements when the OP
>> can do it once?
>
> I think that now you realized that your arguments lead you to a
> contradiction. This thread provided tons of useful information about
> how the OP's requirements MIGHT look like. Armed with this list of
> possible complications, NOW the OP is in the position of answering the
> question about which of these complications might appear in his listings.
>
> And I consider it very probable that before these contributions, the
> OP could not formulate his requirements JUST BECAUSE HE DID NOT REALIZE
> THESE COMPLICATIONS (as I missed the considerations of the user/group
> names and locales when I first read his post).
>
> ====
>
> In general, I consider the pipe dream of "formulate the requirements
> first, then code to specification" very counter-productive; most
> probably, it is one of the principal stumbling blocks in the
> contemporary state of the programming. It may work in a handful of
> toy situations, but it does not scale up to "what really happens".
>
> Software designers should be ready to changes-in-specification when
> new information is gathered by pilot implementations. They should be
> ready to advise ASAP the clients on possible pitfalls in the pilot
> specifications, etc.
>
Hear, hear! (roar of applause in the background)
\s
--
"There is no use in your walking five miles to fish when you can depend
on being just as unsuccessful near home." M. Twain
|
|
0
|
|
|
|
Reply
|
Steve
|
7/5/2010 12:37:08 AM
|
|
On Sun, 04 Jul 2010 17:37:08 -0700, Steve M
<stevem_clipthis_@clubtrout.com> wrote:
>On 7/1/2010 3:25 PM, Ilya Zakharevich wrote:
>> In general, I consider the pipe dream of "formulate the requirements
>> first, then code to specification" very counter-productive; most
>> probably, it is one of the principal stumbling blocks in the
>> contemporary state of the programming. It may work in a handful of
>> toy situations, but it does not scale up to "what really happens".
>> Software designers should be ready to changes-in-specification when
>> new information is gathered by pilot implementations. They should be
>> ready to advise ASAP the clients on possible pitfalls in the pilot
>> specifications, etc.
>Hear, hear! (roar of applause in the background)
Encore!
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
|
|
0
|
|
|
|
Reply
|
John
|
7/5/2010 1:12:56 AM
|
|
On Thu, 01 Jul 2010 22:25:41 +0000, Ilya Zakharevich wrote:
> In general, I consider the pipe dream of "formulate the requirements
> first, then code to specification" very counter-productive; most
> probably, it is one of the principal stumbling blocks in the contemporary
> state of the programming. It may work in a handful of toy situations,
> but it does not scale up to "what really happens".
That is why one of the cardinal rules of Prince II project management is
"Requirements change".
> Software designers should be ready to changes-in-specification when new
> information is gathered by pilot implementations. They should be ready
> to advise ASAP the clients on possible pitfalls in the pilot
> specifications, etc.
Encapsulation and loose coupling are your friends here. Well written
components encapsulate the requirements and are easy to change.
M4
|
|
0
|
|
|
|
Reply
|
Martijn
|
7/5/2010 5:29:23 AM
|
|
On 2010-07-05, Martijn Lievaart <m@rtij.nl.invlalid> wrote:
>> In general, I consider the pipe dream of "formulate the requirements
>> first, then code to specification" very counter-productive; most
>> probably, it is one of the principal stumbling blocks in the contemporary
>> state of the programming. It may work in a handful of toy situations,
>> but it does not scale up to "what really happens".
> That is why one of the cardinal rules of Prince II project management is
> "Requirements change".
>> Software designers should be ready to changes-in-specification when new
>> information is gathered by pilot implementations. They should be ready
>> to advise ASAP the clients on possible pitfalls in the pilot
>> specifications, etc.
> Encapsulation and loose coupling are your friends here. Well written
> components encapsulate the requirements and are easy to change.
Not in my experience. Encapsulation is also a mantra which "really
helps" in toy situations only. "With real problems" you cannot guess
the "proper" capsules until you finished the project...
So, essentially, encapsulation helps only when one is ready to move
boundaries of capsules (which usually leads to major headaches
coding-wise...).
Likewise, `loose coupling' is possible only if you guessed boundaries
of capsules "correct", AND this choice provides enough "looseness".
And, of course, "in real life" the boundaries of capsules MUST go over
"a live flesh" when there is just not enough loose coupling inherent in
the problem domain.
Yours,
Ilya
|
|
0
|
|
|
|
Reply
|
Ilya
|
7/5/2010 10:05:29 AM
|
|
Martijn Lievaart wrote:
> On Thu, 01 Jul 2010 22:25:41 +0000, Ilya Zakharevich wrote:
>> In general, I consider the pipe dream of "formulate the requirements
>> first, then code to specification" very counter-productive; most
>> probably, it is one of the principal stumbling blocks in the contemporary
>> state of the programming. It may work in a handful of toy situations,
>> but it does not scale up to "what really happens".
>
> That is why one of the cardinal rules of Prince II project management is
> "Requirements change".
>
>> Software designers should be ready to changes-in-specification when new
>> information is gathered by pilot implementations. They should be ready
>> to advise ASAP the clients on possible pitfalls in the pilot
>> specifications, etc.
>
> Encapsulation and loose coupling are your friends here. Well written
> components encapsulate the requirements and are easy to change.
OTOH, you can't always afford a wasteful interface. And project
management is often evil in itself.
We make many small changes every day to a big central code base, shared
by all sub-processes. Hardly any redesign or refactoring needed. No
tiers, just Linux, Apache, MySQL, Perl.
Only one out of ten changes really survives, so never mind throwing out
what you recently did. By keeping changes small, the mistakes are
limited too, and you still know immediately where they come from.
--
Ruud
|
|
0
|
|
|
|
Reply
|
Dr
|
7/5/2010 10:12:28 AM
|
|
On Mon, 05 Jul 2010 10:05:29 +0000, Ilya Zakharevich wrote:
> On 2010-07-05, Martijn Lievaart <m@rtij.nl.invlalid> wrote:
>
>> Encapsulation and loose coupling are your friends here. Well written
>> components encapsulate the requirements and are easy to change.
>
> Not in my experience. Encapsulation is also a mantra which "really
> helps" in toy situations only. "With real problems" you cannot guess
> the "proper" capsules until you finished the project...
I agree that it is hard to get it right the first time, but as the
requirements DO change, you get several shots at it. :-) Although said
tongue in cheek, it is very true in real life.
I also agree that in retrospect you always see better ways and some grave
design errors you made.
Still, encapsulation does help. Some parts are always trivial, some parts
are (much) harder and some thinking upfront (preferably in a team!) may
save you a lot of time later. If unsure, prototype first. Not for the
customer, but for yourself. Be prepared to throw the prototype away or
completely rewrite it.
If the requirements change and it turns out your encapsulation was wrong,
you probably didn't understand the problem domain well enough anyhow and
should indeed change the encapsulation as a result of your new insights.
And I strongly disagree about the "Toy situations". The bigger the
project, the more important modeling and encapsulation become.
>
> So, essentially, encapsulation helps only when one is ready to move
> boundaries of capsules (which usually leads to major headaches
> coding-wise...).
Strongly disagree. Although sometimes the problem context is hard to
model, more often than not it is not that hard. And it always pays off.
If you did not encapsulate well or at all, the rewrite is often just as
painful, or even more so. Only when your encapsulation is really poor does it
become a real pain in the butt, and then you had better start over.
>
> Likewise, `loose coupling' is possible only if you guessed boundaries of
> capsules "correct", AND this choice provides enough "looseness". And, of
The same reasoning as above applies; it is not merely a similar issue, it is
actually the same issue.
> course, "in real life" the boundaries of capsules MUST go over "a live
> flesh" when there is just not enough loose coupling inherent in the
> problem domain.
I don't understand that last sentence (English is not my native
language), can you repeat that?
M4
|
|
0
|
|
|
|
Reply
|
Martijn
|
7/5/2010 11:32:39 AM
|
|
Martijn Lievaart wrote:
> The bigger the project
, the bigger the failure.
--
Ruud
|
|
0
|
|
|
|
Reply
|
Dr
|
7/5/2010 11:55:34 AM
|
|
On 2010-07-01 22:25, Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote:
> On 2010-07-01, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> I disagree. He should have mentioned that and quite a few things more
>> (for example the different date formats, whether user and group names
>> are always numeric, and if not, whether they can contain spaces, etc.)
>>
>> Test data is nice but you can never assume that it covers all possible
>> cases and requirements reverse engineered from a few lines of test
>> data are almost guaranteed to be incomplete. Besides, why should
>> everyone in this group have to figure out the requirements when the OP
>> can do it once?
>
> I think that now you realized that your arguments lead you to a
> contradiction.
No, not at all.
> This thread provided tons of useful information about
> what the OP's requirements MIGHT look like.
But not from test data provided by the OP. Most of the useful
information came from the experience of other posters. And they
presented their knowledge in English, not in the form of uncommented
"ls -l" output.
> Armed with this list of
> possible complications, NOW the OP is in the position of answering the
> question about which of these complications might appear in his listings.
Right. And without knowledge about these complications he couldn't have
selected the test data either. The knowledge you need to select minimal
test data with maximum coverage is exactly the same knowledge you need
to write a specification in plain English.
Now, "minimal" test data is absolutely not a requirement in real life:
If I needed to parse ls output I would be happy to get 10000 lines of
real output from each of the target systems. But this is Usenet. If the
OP simply posted 10000 lines of test data here, Tad would be the first
to ask him whether he'd lost his marbles. And with good reason. This
group is not for solving somebody's programming requirements but for
discussing Perl.
hp
|
|
0
|
|
|
|
Reply
|
Peter
|
7/5/2010 1:27:18 PM
|
|
On Mon, 05 Jul 2010 13:55:34 +0200, Dr.Ruud wrote:
> Martijn Lievaart wrote:
>
>> The bigger the project
>
> , the bigger the failure.
:-)
One customer I work at has the ability to start a huge project to unify
the N tools in use for a certain purpose. With the next round of budget
cuts, the project is reduced in scope and we end up with N+1 tools for
the same purpose.
M4
|
|
0
|
|
|
|
Reply
|
Martijn
|
7/5/2010 2:51:40 PM
|
|
On Mon, 5 Jul 2010 15:27:18 +0200, "Peter J. Holzer"
<hjp-usenet2@hjp.at> wrote:
>This group is not for solving somebody's programming requirements but
>for discussing Perl.
Over in c.u.s and c.l.a they write solutions faster than you can say
"cool!" Not every ng is overrun with uncool people.
--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
|
|
0
|
|
|
|
Reply
|
John
|
7/5/2010 5:03:00 PM
|
|
On 2010-07-05 17:03, John Kelly <jak@isp2dial.com> wrote:
> On Mon, 5 Jul 2010 15:27:18 +0200, "Peter J. Holzer"
><hjp-usenet2@hjp.at> wrote:
>
>>This group is not for solving somebody's programming requirements but
>>for discussing Perl.
>
> Over in c.u.s and c.l.a they write solutions faster than you can say
> "cool!"
Then try posting 10000 lines of example input there and see how fast
they write the solution.
hp
|
|
0
|
|
|
|
Reply
|
Peter
|
7/5/2010 6:22:46 PM
|
|
|