Script to move files with one occurrence of a string to one directory, and other files to another directory?

I could use a script for the bash shell to look through a group of files
(selected from one directory by a wildcard expression) and move any of those
files with exactly one occurrence of the string Path: to a different
directory, usually a subdirectory of the first directory.  The rest of the
files selected by the wildcard expression are moved to a third directory, also
usually a subdirectory of the first directory.  I'm currently doing this
manually, but with grep commands to count the number of occurrences.

NOT homework - I'm retired.  Actually using Cygwin (an emulator of Linux)
under Windows, but no replies in the Cygwin newsgroup.

This problem arose when I tried to clean up my saved newsgroup posts, and
found that many of them are in files with two or more posts packed into the
file, and therefore having Path: twice or more.
Robert
4/5/2014 4:41:05 AM
comp.unix.shell

On 2014-04-05, Robert Miles <robertmilesxyz@gmail.com> wrote:
> I could use a script for the bash shell to look through a group of files
> (selected from one directory by a wildcard expression) and move any of those
> files with exactly one occurrence of the string Path: to a different
> directory, usually a subdirectory of the first directory.  The rest of the
> files selected by the wildcard expression are move to a third directory, also
> usually a subdirectory of the first directory.  I'm currently doing this
> manually, but with grep commands to count the number of occurrences.

for x in my*wild*card ; do
  if [ $(grep -c Path: -- "$x") -eq 1 ] ; then
    mv -- "$x" subdir1
  else
    mv -- "$x" subdir2
  fi
done

If none of the filenames look like command options (such as "-f" or "--long-opt")
then you can do without the -- guard. If they don't have spaces in their names,
you can use $x instead of "$x".
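
A slightly more defensive variant of the loop above, as a sketch only. The
function name sort_by_path_count and its argument order are made up for
illustration; it creates the target directories if needed and skips anything
that is not a regular file (for example the target subdirectories, or a
wildcard that matched nothing).

```shell
# Hypothetical helper: sort_by_path_count ONE_DIR OTHER_DIR FILE...
# Moves each FILE with exactly one line matching "Path:" into ONE_DIR
# and every other FILE into OTHER_DIR.
sort_by_path_count() {
  one_dir=$1
  other_dir=$2
  shift 2
  mkdir -p -- "$one_dir" "$other_dir" || return 1
  for x in "$@"; do
    # Skip directories and an unmatched wildcard left as literal text.
    [ -f "$x" ] || continue
    # "--" ends grep's options, so a file name starting with "-" is safe.
    count=$(grep -c -- 'Path:' "$x")
    if [ "$count" -eq 1 ]; then
      mv -- "$x" "$one_dir"
    else
      mv -- "$x" "$other_dir"
    fi
  done
}
```

It would be called as: sort_by_path_count subdir1 subdir2 my*wild*card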
Kaz
4/5/2014 6:24:21 AM
Eesh.  Please wrap your lines at something around 80 characters,
preferably 65 as stated in RFC 1855.  Unfortunately, as you're
using Google Groups, you'll need to do this manually--or find
better software, which isn't hard.  Try news.software.readers or
news.answers.

On 2014-04-05, Robert Miles scribbled these curious markings:
> I could use a script for the bash shell to look through a group of
> files (selected from one directory by a wildcard expression) and
> move any of those files with exactly one occurrence of the string
> Path: to a different directory, usually a subdirectory of the first
> directory.  The rest of the files selected by the wildcard expression
> are move to a third directory, also usually a subdirectory of the
> first directory.  I'm currently doing this manually, but with grep
> commands to count the number of occurrences.

Sounds like it could be an interesting problem.  What do you have
so far?  I don't see any script along with your post.  Did Google
Groups eat it?

> NOT homework - I'm retired.  Actually using Cygwin (an emulator of
> Linux) under Window, but no replies in the Cygwin newsgroup.

I would expect that you didn't get any response from 'the Cygwin
newsgroup' (alt.comp.cygwin perhaps?) because you showed no
evidence of trying to solve this problem on your own.  Usually in
technical circles you'll need to show at least some effort before
people will be willing to help you.  I would advise you read the
seminal "How to ask questions the smart way" article[0], and some
FAQs about how Usenet works, before posting further.

[0]: http://catb.org/~esr/faqs/smart-questions.html

-- 
Chris Nehren
Chris
4/5/2014 10:00:01 PM
On 4/5/2014 5:00 PM, Chris Nehren wrote:
> Eesh.  Please wrap your lines at something around 80 characters,
> preferably 65 as stated in RFC 1855.  Unfortunately, as you're
> using Google Groups, you'll need to do this manually--or find
> better software, which isn't hard.  Try news.software.readers or
> news.answers.
>
> On 2014-04-05, Robert Miles scribbled these curious markings:
>> I could use a script for the bash shell to look through a group of
>> files (selected from one directory by a wildcard expression) and
>> move any of those files with exactly one occurrence of the string
>> Path: to a different directory, usually a subdirectory of the first
>> directory.  The rest of the files selected by the wildcard expression
>> are move to a third directory, also usually a subdirectory of the
>> first directory.  I'm currently doing this manually, but with grep
>> commands to count the number of occurrences.
>
> Sounds like it could be an interesting problem.  What do you have
> so far?  I don't see any script along with your post.  Did Google
> Groups eat it?

No script so far - I used many computer languages in my career, but
UNIX or Linux scripts were not among them.  I came up with only
single commands, such as:

grep -c Path: xyz.eml
grep Path: xyz.eml
mv xyz.eml fixed
mv xyz.eml notyet
flip -u xyz.eml
flip -d xyz.eml
Notepad (under Windows)

A few thousand files processed so far with these commands.  Some of
the remaining files do not use either the Unix end-of-lines or the DOS
end-of-lines properly, and therefore need to be processed by software
that does not care which is used.
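
One possible way to make the count independent of line endings, sketched
under the assumption that the odd files use bare carriage returns (the
sample text is invented): translate every carriage return to a newline
before grep sees the data.  For DOS files this only adds blank lines,
which do not change the count.

```shell
# Throwaway sample with carriage returns only and no newline at all.
sample=$(mktemp)
printf 'Path: one\rSubject: a\r\rbody\rPath: two\rSubject: b\r' > "$sample"

# grep sees the whole file as a single line, so it counts at most 1.
raw=$(grep -c '^Path:' "$sample")
# After translating CR to LF, each header starts its own line.
normalized=$(tr '\r' '\n' < "$sample" | grep -c '^Path:')

echo "raw = $raw, normalized = $normalized"   # prints: raw = 1, normalized = 2
rm -f -- "$sample"
```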

Windows does not seem to have similar commands, or I'd have used
those instead.

>> NOT homework - I'm retired.  Actually using Cygwin (an emulator of
>> Linux) under Window, but no replies in the Cygwin newsgroup.
>
> I would expect that you didn't get any response from 'the Cygwin
> newsgroup' (alt.comp.cygwin perhaps?) because you showed no
> evidence of trying to solve this problem on your own.  Usually in
> technical circles you'll need to show at least some effort before
> people will be willing to help you.  I would advise you read the
> seminal "How to ask questions the smart way" article[0], and some
> FAQs about how Usenet works, before posting further.
>
> [0]: http://catb.org/~esr/faqs/smart-questions.html
>

No sign of any posts at all so far this year on alt.comp.cygwin, so
I used gmane.os.cygwin instead, since it shows a little more activity
so far this year.  Google Groups has another group with cygwin in its 
name (not gated to Usenet), but I've seen no sign that the messages
posted there in the last two years even mention something related to Cygwin.

I've been using Usenet heavily since 2008, but I'll check if that link
leads to anything I haven't learned yet.
Robert
4/7/2014 1:14:09 AM
On 4/5/2014 1:24 AM, Kaz Kylheku wrote:
> On 2014-04-05, Robert Miles <robertmilesxyz@gmail.com> wrote:
>> I could use a script for the bash shell to look through a group of files
>> (selected from one directory by a wildcard expression) and move any of those
>> files with exactly one occurrence of the string Path: to a different
>> directory, usually a subdirectory of the first directory.  The rest of the
>> files selected by the wildcard expression are move to a third directory, also
>> usually a subdirectory of the first directory.  I'm currently doing this
>> manually, but with grep commands to count the number of occurrences.
>
> for x in my*wild*card ; do
>    if [ $(grep -c Path: -- "$x") -eq 1 ] ; then
>      mv -- "$x" subdir1
>    else
>      mv -- "$x" subdir2
>    fi
> done
>
> If none of the filenames look like command options (such as "-f" or "--long-opt")
> then you can do without the -- guard. If they don't have spaces in their names,
> you can use $x instead of "$x".

Thanks - that looks useful for getting started in shell scripts, 
although I haven't tried it yet.

I'm now using Thunderbird to reply, partly because it handles
line length limits better.

Robert
4/7/2014 1:19:56 AM
Robert Miles wrote:

> Thanks - that looks useful for getting started in shell scripts,
> although I haven't tried it yet.
           ^^^^^^^^^^^^^^^^^^^^^^
You see, that has been your problem all along.

Read “How To Ask Questions The Smart Way” and learn from it.
 
> I'm now using Thunderbird to reply, partly because it handles
> line length limits better.

The other part being read in the first place, I suppose, as Google Groups 
has become a continuous source of junk postings to Usenet.

-- 
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.
Thomas
4/7/2014 9:57:31 AM
On 4/5/2014 1:24 AM, Kaz Kylheku wrote:
> On 2014-04-05, Robert Miles <robertmilesxyz@gmail.com> wrote:
>> I could use a script for the bash shell to look through a group of files
>> (selected from one directory by a wildcard expression) and move any of those
>> files with exactly one occurrence of the string Path: to a different
>> directory, usually a subdirectory of the first directory.  The rest of the
>> files selected by the wildcard expression are move to a third directory, also
>> usually a subdirectory of the first directory.  I'm currently doing this
>> manually, but with grep commands to count the number of occurrences.
>
> for x in my*wild*card ; do
>    if [ $(grep -c Path: -- "$x") -eq 1 ] ; then
>      mv -- "$x" subdir1
>    else
>      mv -- "$x" subdir2
>    fi
> done
>
> If none of the filenames look like command options (such as "-f" or "--long-opt")
> then you can do without the -- guard. If they don't have spaces in their names,
> you can use $x instead of "$x".

I'm now using the following script:

for x in *.eml ; do
   if [ $(grep -c Path: -- "$x") -eq 1 ] ; then
     mv -- "$x" notyet
   else
     mv -- "$x" fixed
   fi
done

It partially works - it divides all of the *.eml files between
the two subdirectories.  However, its choice of which subdirectory
to move them to seems to be unrelated to the number of times
Path: occurs in the file it is moving. How can I add statements to print 
$x and the count of how many times Path: appears in it (the
value actually used in the if) for debugging purposes?

Bear with me - I'm just starting to use this type of shell script
and do not yet know the names of suitable commands to look up in the
man file.

The first character of every filename is 2, but most of them have
at least one space in the filename.
Robert
4/8/2014 11:45:26 PM
On 2014-04-08, Robert Miles <milesrf@Usenet-News.net> wrote:
> It partially works - it divides all of the *.eml files between
> the two subdirectories.  However, its choice of which subdirectory
> to move them to seems to be unrelated to the number of times
> Path: occurs in the file it is moving. How can I add statements to print 
> $x and the count of how many times Path: appears in it (the
> value actually used in the if) for debugging purposes?

Perhaps like this:

  for x in *.eml ; do
    count=$(grep -c Path: -- "$x")
    echo "file = $x, count = $count"
    if [ $count -eq 1 ] ; then ...

Note that "grep -c" does not count match occurrences; it counts the number of
matching lines. This is the same as the number of matches only if no line
contains more than one match.
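
As a rough illustration of that difference (the sample text is invented,
and -o is a GNU/BSD grep extension rather than POSIX, so treat it as an
assumption about the grep in use): -o prints each match on its own line,
so piping it to wc -l counts individual matches.

```shell
# Throwaway sample: one line that contains "Path:" twice.
sample=$(mktemp)
printf '%s\n' 'Path: news.example!one Path: news.example!two' 'Subject: test' > "$sample"

# Number of lines that contain at least one match.
lines=$(grep -c 'Path:' "$sample")
# Number of individual matches; tr strips padding some wc versions add.
matches=$(grep -o 'Path:' "$sample" | wc -l | tr -d ' ')

echo "matching lines = $lines, matches = $matches"   # prints: matching lines = 1, matches = 2
rm -f -- "$sample"
```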

The values of "grep -c" used in the script should match those that you see
if you do the same "grep -c" command interactively in that directory.

> The first character of every filename is 2, but most of them have
> at least one space in the filename.

You're OK because every reference to $x is wrapped in quotes.
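
A small demonstration of why the quotes matter, assuming the default IFS
(the file name is invented).  Unquoted, the shell splits the value at
each space and passes the pieces as separate arguments; quoted, it stays
one argument.

```shell
x='2014 saved post.eml'   # invented file name containing spaces

set -- $x                 # unquoted: split into words at the spaces
unquoted=$#
set -- "$x"               # quoted: passed through as a single word
quoted=$#

echo "unquoted = $unquoted, quoted = $quoted"   # prints: unquoted = 3, quoted = 1
```

Unquoted expansions are also subject to wildcard expansion, so quoting
is the safer habit even when the names contain no spaces.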
Kaz
4/9/2014 12:31:48 AM
On 4/8/2014 7:31 PM, Kaz Kylheku wrote:
> On 2014-04-08, Robert Miles <milesrf@Usenet-News.net> wrote:
>> It partially works - it divides all of the *.eml files between
>> the two subdirectories.  However, its choice of which subdirectory
>> to move them to seems to be unrelated to the number of times
>> Path: occurs in the file it is moving. How can I add statements to print
>> $x and the count of how many times Path: appears in it (the
>> value actually used in the if) for debugging purposes?
>
> Perhaps like this:
>
>    for x in *.eml ; do
>      count=$(grep -c Path: -- "$x")
>      echo "file = $x, count = $count"
>      if [ $count -eq 1 ] ; then ...
>
> Note that "grep -c" does not count match occurrences; it counts the number of
> matching lines. This is the same as the number of matches only if no line
> contains more than one match.
>
> The values of "grep -c" used in the script should match those that you see
> if you do the same "grep -c" command interactively in that directory.
>
>> The first character of every filename is 2, but most of them have
>> at least one space in the filename.
>
> You're OK because every reference to $x is wrapped in quotes.

That worked, and told me that my main problem was that I'd switched the
two subdirectory names, and hadn't inspected enough files for the number
of Path: occurrences.  Two occurrences in the same line are unlikely to
be a problem - I haven't seen that in the over 1000 files I fixed
manually.

I also tried looking at a copy of Linux for Dummies; that book barely
mentions such low-level details as scripts and grep (perhaps one page
each, saying little more than what they could be used for), and the
index didn't even mention if.  It generally looks like a description
of what Linux can do, with rather few details on how to make it do
much.

Tomorrow, I'll have a try at modifying the script to make it ignore
occurrences of Path: with certain adjacent characters.

Robert
4/9/2014 1:01:01 AM
Robert Miles wrote:

> I also tried looking at a copy of Linux for Dummies; that book barely
> mentions such low-level details as scripts and grep (perhaps one page
> each, saying little more than what they could be used for), and the
> index didn't even mention if.  It generally looks like a description
> of what Linux can do, with rather few details on how to make it do
> much.

Sounds like it contains misleading statements such as that Linux was a full 
operating system instead of just the kernel of a (usually GNU-based) 
operating system.  „For Dummies“, alright.

STFW for a shell script tutorial instead.  There are plenty out there, many 
really good ones, too; some are mentioned in signatures here.

-- 
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.
Thomas
4/9/2014 7:36:37 PM
On 04/08/2014 10:03 PM, Robert Miles wrote:

> Tomorrow, I'll have a try at modifying the script to make it ignore
> occurrences of Path: with certain adjacent characters.
>
If you're looking for "Path:" in email headers you could use
"^Path: "
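
To illustrate what the anchored pattern "^Path: " changes, here is a
sketch on invented sample lines.  The ^ ties the match to the start of
the line, so Return-Path:, quoted copies and mid-line mentions are no
longer counted.

```shell
sample=$(mktemp)
printf '%s\n' \
  'Path: news.example!not-for-mail' \
  'Return-Path: <someone@example.invalid>' \
  '> Path: quoted in a reply' \
  'See Path: in the middle of a line' > "$sample"

# Any line containing Path: anywhere.
loose=$(grep -c 'Path:' "$sample")
# Only lines that begin with "Path: ".
anchored=$(grep -c '^Path: ' "$sample")

echo "loose = $loose, anchored = $anchored"   # prints: loose = 4, anchored = 1
rm -f -- "$sample"
```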
Bill
4/10/2014 1:16:25 PM
On 4/10/2014 8:16 AM, Bill Marcum wrote:
> On 04/08/2014 10:03 PM, Robert Miles wrote:
>
>> Tomorrow, I'll have a try at modifying the script to make it ignore
>> occurrences of Path: with certain adjacent characters.
>>
> If you're looking for "Path:" in email headers you could use
> "^Path: "

I'm now using a longer script, to move files the script cannot
handle to subdirectory notyet3 and to exclude certain longer
strings containing Path: from the count.

for x in *.eml ; do
   # first, exclude all files with at least two occurrences of Path: at the
   # beginning of a line from testing with the next few versions of this
   # script
   count=$(grep ^Path: "$x" | grep -v Path:$ - | grep -c ^Path: - )
   if [ $count -ne 1 ] ; then
     echo "file = $x excluded; count = $count "
     # this file needs manual editing
     mv "$x" notyet3
   else
     # if not excluded, count the number of occurrences of Path: not at the
     # end of the line, excluding parts of >Path: and "> Path:"
     # with filtering to ignore certain instances of Path:
     count=$(grep Path: "$x" \
       | grep -v Path:$ - \
       | grep -v ^\>Path: - \
       | grep -v ^\>\ Path: - \
       | grep -v "n-Path:" - \
       | grep -c Path: - )
     # The above filtering lines work.
     # filtering line that fails with an error about a t option:
     # | grep -v "-Path:" - \
     echo "file = $x, count = $count"
     if [ $count -eq 1 ] ; then
       # this file doesn't need any action
       mv "$x" fixed
     else
       # this file  needs an update to the script
       mv "$x" notyet
     fi
   fi
done

How can I modify the filter for n-Path: to filter out all
occurrences of -Path: instead?

Also, what does STFW mean?
Robert
4/11/2014 2:10:44 AM
On 11.04.2014 04:10, Robert Miles wrote:
> On 4/10/2014 8:16 AM, Bill Marcum wrote:
>> On 04/08/2014 10:03 PM, Robert Miles wrote:
>>
>>> Tomorrow, I'll have a try at modifying the script to make it ignore
>>> occurrences of Path: with certain adjacent characters.
>>>
>> If you're looking for "Path:" in email headers you could use
>> "^Path: "
> 
> I'm now using a longer script,

If it gets longer you may consider using awk instead of grep; either for the
filters, or for the whole logic.

> to move file the script cannot
> handle to subdirectory notyet3 and to exclude certain longer
> strings containing Path: from the count.
> 
> for x in *.eml ; do
>   # first, exclude all files with at least two occurrences of Path: at the
>   # beginning of a line from testing with the next few versions of this
>   # script
>   count=$(grep ^Path: "$x" | grep -v Path:$ - | grep -c ^Path: - )
>   if [ $count -ne 1 ] ; then
>     echo "file = $x excluded; count = $count "
>     # this file needs manual editting
>     mv "$x" notyet3
>   else
>     # if not excluded, count the number of occurrences of Path: not at the
>     # end of the line, excluding parts of >Path: and "> Path:"
>     # with filtering to ignore certain instances of Path:
>     count=$(grep Path: "$x" \
>       | grep -v Path:$ - \
>       | grep -v ^\>Path: - \
>       | grep -v ^\>\ Path: - \
>       | grep -v "n-Path:" - \
>       | grep -c Path: - )
>     # The above filtering lines work.
>     # filtering line that fails with an error about a t option:
>     # | grep -v "-Path:" - \
>     echo "file = $x, count = $count"
>     if [ $count -eq 1 ] ; then
>       # this file doesn't need any action
>       mv "$x" fixed
>     else
>       # this file  needs an update to the script
>       mv "$x" notyet
>     fi
>   fi
> done
> 
> How can I modify the filter for n-Path: to filter out all
> occurrences of -Path: instead?

You are probably looking for the end-of-option feature?

  grep -v -- -Path:

*seems* to work. But...

As mentioned above, it may be simpler to use awk instead...

  awk '$1 !~ /^Path:$/'

or

  awk '$1 != "Path:"'

...which (for example) test whether the first whitespace-separated field in
the data does not match the regexp or, respectively, the string constant.

Counting the matches (or non-matches) would look like this...

  awk '$1 != "Path:" { c++ } END { print c }'
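
For comparison, a sketch of the options side by side on invented sample
headers.  Presumably grep reads a bare -Path: as a bundle of
single-letter options (-P, -a, -t, ...), which would explain the
complaint about t; both -- and -e avoid that.  Unlike the awk line
above, the awk call here counts the matching lines, and it prints c+0
so that a file with no match yields 0 instead of an empty string.

```shell
sample=$(mktemp)
printf '%s\n' \
  'Path: news.example!not-for-mail' \
  'Return-Path: <someone@example.invalid>' \
  'X-Complaints-Path: abuse' \
  'Subject: test' > "$sample"

# "--" ends option parsing, so "-Path:" is taken as the pattern.
with_dashes=$(grep 'Path:' "$sample" | grep -v -- '-Path:' | grep -c 'Path:')
# "-e" marks its argument as a pattern even if it starts with "-".
with_e=$(grep 'Path:' "$sample" | grep -v -e '-Path:' | grep -c 'Path:')
# awk compares the first whitespace-separated field exactly.
with_awk=$(awk '$1 == "Path:" { c++ } END { print c+0 }' "$sample")

echo "$with_dashes $with_e $with_awk"   # prints: 1 1 1
rm -f -- "$sample"
```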

> 
> Also, what does STFW mean?

According to the wtf(1) program...

  $ wtf stfw
  STFW: search the fucking web


Janis

Janis
4/11/2014 8:31:11 AM
On 11.04.2014 10:31, Janis Papanagnou wrote:
> 
> According to the wtf(1) program...

wtf(6), actually.


NAME
     wtf — translates acronyms for you

SYNOPSIS
     wtf [-f dbfile] [-t type] [is] acronym ...

> 
>   $ wtf stfw
>   STFW: search the fucking web
> 
> 
> Janis
> 

Janis
4/11/2014 8:34:09 AM
On 11.04.2014 10:31, Janis Papanagnou wrote:
> On 11.04.2014 04:10, Robert Miles wrote:
>>[...]
>>     count=$(grep Path: "$x" \
>>       | grep -v Path:$ - \
>>       | grep -v ^\>Path: - \
>>       | grep -v ^\>\ Path: - \
>>       | grep -v "n-Path:" - \
>>       | grep -c Path: - )

  count=$( awk '$1=="Path:" {c++}  END {print c}'  "$x" )

seems all you need here in your concrete example.

Note this condition is even stricter than the grep cascade. But it will
(as your grep cascade) also trigger lines in the message body that start
with "Path:". Is that possible? - To cover that you may want to exit if
the body begins...

  count=$( awk '$1=="Path:" {c++} !NF {exit(0)} END {print c}' "$x" )

You see that with awk you can extend your program quite easily to handle
data more accurately and perform functions that grep is incapable of doing.
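
To make the difference between the two awk variants concrete, a sketch
on an invented sample with one header, a blank separator line and a
body line that happens to start with Path:.  It prints c+0 so that an
empty count becomes 0.  As far as I know, a separator line from a file
with DOS line endings still contains a carriage return, so !NF would
not fire there; that case is not handled in this sketch.

```shell
sample=$(mktemp)
printf '%s\n' \
  'Path: news.example!not-for-mail' \
  'Subject: test' \
  '' \
  'Path: this line is in the body' > "$sample"

# Counts every line whose first field is exactly "Path:".
whole_file=$(awk '$1 == "Path:" { c++ } END { print c+0 }' "$sample")
# Stops at the first empty line; exit still runs the END rule.
header_only=$(awk '$1 == "Path:" { c++ } !NF { exit } END { print c+0 }' "$sample")

echo "whole file = $whole_file, header only = $header_only"   # prints: whole file = 2, header only = 1
rm -f -- "$sample"
```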

Janis

>> [...]


Janis
4/11/2014 9:08:27 AM
On 4/11/2014 3:31 AM, Janis Papanagnou wrote:
> On 11.04.2014 04:10, Robert Miles wrote:
>> On 4/10/2014 8:16 AM, Bill Marcum wrote:
>>> On 04/08/2014 10:03 PM, Robert Miles wrote:
>>>
>>>> Tomorrow, I'll have a try at modifying the script to make it ignore
>>>> occurrences of Path: with certain adjacent characters.
>>>>
>>> If you're looking for "Path:" in email headers you could use
>>> "^Path:"
>>
>> I'm now using a longer script,
>
> If it gets longre you may consider using awk instead of grep; either for the
> filters, or for the whole logic.
>
>> to move file the script cannot
>> handle to subdirectory notyet3 and to exclude certain longer
>> strings containing Path: from the count.
>>
>> for x in *.eml ; do
>>    # first, exclude all files with at least two occurrences of Path: at the
>>    # beginning of a line from testing with the next few versions of this
>>    # script
>>    count=$(grep ^Path: "$x" | grep -v Path:$ - | grep -c ^Path: - )
>>    if [ $count -ne 1 ] ; then
>>      echo "file = $x excluded; count = $count "
>>      # this file needs manual editting
>>      mv "$x" notyet3
>>    else
>>      # if not excluded, count the number of occurrences of Path: not at the
>>      # end of the line, excluding parts of >Path: and "> Path:"
>>      # with filtering to ignore certain instances of Path:
>>      count=$(grep Path: "$x" \
>>        | grep -v Path:$ - \
>>        | grep -v ^\>Path: - \
>>        | grep -v ^\>\ Path: - \
>>        | grep -v "n-Path:" - \
>>        | grep -c Path: - )
>>      # The above filtering lines work.
>>      # filtering line that fails with an error about a t option:
>>      # | grep -v "-Path:" - \
>>      echo "file = $x, count = $count"
>>      if [ $count -eq 1 ] ; then
>>        # this file doesn't need any action
>>        mv "$x" fixed
>>      else
>>        # this file  needs an update to the script
>>        mv "$x" notyet
>>      fi
>>    fi
>> done
>>
>> How can I modify the filter for n-Path: to filter out all
>> occurrences of -Path: instead?
>
> You are probably looking for the end-of-option feature?
>
>    grep -v -- -Path:
>
> *seems* to work. But...
>
> As mentioned above, it may be simpler to use awk instead...
>
>    awk '$1 !~ /^Path:$/'
>
> or
>
>    awk '$1 != "Path:"'
>
> ...which (for example) compares the first whitespace separated field in
> the data if it does not match the regexp, resp. the string constant.
>
> Counting the matches (or non-matches) would look like that...
>
>    awk '$1 != "Path:" { c++ } END { print c }'
>
>>
>> Also, what does STFW mean?
>
> According to the wtf(1) program...
>
>    $ wtf stfw
>    STFW: search the fucking web
>
>
> Janis

That fix plus two more lines in the script seems to handle all
the cases I expect a script to be able to handle - over 2000
files so far.  It looks like I'll need to use manual editing
to handle the over 1000 files that a script could not handle
reliably.

For awk, I'd have to get familiar with its man entry first, and
I currently don't see enough reason to do that soon.

STFW works well once I know the correct name for what I'm
searching for, but not so well before then.

Robert
4/12/2014 3:16:47 AM
On 4/11/2014 4:08 AM, Janis Papanagnou wrote:
> On 11.04.2014 10:31, Janis Papanagnou wrote:
>> On 11.04.2014 04:10, Robert Miles wrote:
>>> [...]
>>>      count=$(grep Path: "$x" \
>>>        | grep -v Path:$ - \
>>>        | grep -v ^\>Path: - \
>>>        | grep -v ^\>\ Path: - \
>>>        | grep -v "n-Path:" - \
>>>        | grep -c Path: - )
>
>    count=$( awk '$1=="Path:" {c++}  END {print c}'  "$x" )
>
> seems all you need here in your concrete example.
>
> Note this condition is even stricter than the grep cascade. But it will
> (as your grep cascade) also trigger lines in the message body that start
> with "Path:". Is that possible? - To cover that you may want to exit if
> the body begins...
>
>    count=$( awk '$1=="Path:" {c++} !NF {exit(0)} END {print c}' "$x" )
>
> You see that with awk you can extend your program quite easily to handle
> data more accurately and perform functions that grep is incapable to do.
>
> Janis
>
>>> [...]

I doubt that would be suitable. I want to find all cases where a file
has two occurrences of Path: in positions suitable for the header of a
newsgroup post, but almost all the files should have at least one
occurrence.  I'm looking for cases where two or more newsgroup posts
were stored without the proper separation into separate files.

There are occasional cases where the second Path: does not occur at
the beginning of a line, but still needs to be counted, because it
appears that the first post was chopped off in the middle of a line,
and then Path: occurs next.

Robert
4/12/2014 3:27:45 AM
On 12.04.2014 05:27, Robert Miles wrote:
> On 4/11/2014 4:08 AM, Janis Papanagnou wrote:
>> On 11.04.2014 10:31, Janis Papanagnou wrote:
>>> On 11.04.2014 04:10, Robert Miles wrote:
>>>> [...]
>>>>      count=$(grep Path: "$x" \
>>>>        | grep -v Path:$ - \
>>>>        | grep -v ^\>Path: - \
>>>>        | grep -v ^\>\ Path: - \
>>>>        | grep -v "n-Path:" - \
>>>>        | grep -c Path: - )
>>
>>    count=$( awk '$1=="Path:" {c++}  END {print c}'  "$x" )
>>
>> seems all you need here in your concrete example.
>>
>> Note this condition is even stricter than the grep cascade. But it will
>> (as your grep cascade) also trigger lines in the message body that start
>> with "Path:". Is that possible? - To cover that you may want to exit if
>> the body begins...
>>
>>    count=$( awk '$1=="Path:" {c++} !NF {exit(0)} END {print c}' "$x" )
>>
>> You see that with awk you can extend your program quite easily to handle
>> data more accurately and perform functions that grep is incapable to do.
>>
>> Janis
>>
>>>> [...]
> 
> I doubt that would be suitable. I want to find all cases where a file
> has two occurrence of Path: in positions suitable for the header of a
> newsgroup post, but almost all the files should have at least one
> occurrence. 

Yes, so far so good.

> I'm looking for cases where two or more newsgroup posts
> were stored without the proper separation into separate files.

Then all it needs is to omit the "!NF" part and use the simpler first
proposal

    count=$( awk '$1=="Path:" {c++}  END {print c}'  "$x" )

My thought was that the files would start any header line with "Path:",
and count those, but not count "Path:" in the message body. The awk
program should do exactly that, ignoring all the "*-Path:", '>'-quoted
"Path:" entries, and all entries that come after the header (assuming
the header is separated from the body as it is in mail headers).

Note that your bulky grep cascade will not catch entries quoted by, e.g.,
'>>', ':', or, '>  '; you'd need to extend your regexp (not with the
awk code, but with your multi-grep's).
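
A sketch of that point on invented sample lines: the cascade below
copies the filters from the quoted script, and the awk call is the
first-field comparison with c+0 added so that no match prints 0.

```shell
sample=$(mktemp)
printf '%s\n' \
  'Path: news.example!not-for-mail' \
  '>Path: quoted once' \
  '>> Path: quoted twice' \
  ': Path: quoted with a colon' \
  '>  Path: quoted with two spaces' > "$sample"

# The grep cascade only knows the ">Path:" and "> Path:" quote styles.
cascade=$(grep 'Path:' "$sample" \
  | grep -v 'Path:$' \
  | grep -v '^>Path:' \
  | grep -v '^> Path:' \
  | grep -v 'n-Path:' \
  | grep -c 'Path:')
# With any quote prefix, the first field is no longer exactly "Path:".
first_field=$(awk '$1 == "Path:" { c++ } END { print c+0 }' "$sample")

echo "cascade = $cascade, awk = $first_field"   # prints: cascade = 4, awk = 1
rm -f -- "$sample"
```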

> 
> There are occasional cases where the second Path: does not occur at
> the beginning of a line, but still needs to be counted, because it
> appears that the first post was chopped off in the middle of a line,
> and then Path: occurs next.

Oh, that sounds like non-standard headers, or trashed data. Well...
Then you will not be able to correctly handle all files with whatever
(more or less suitable) tool you use.

Janis

Janis
4/12/2014 9:22:24 AM
On 12.04.2014 05:16, Robert Miles wrote:
> [...]
> 
> For awk, I'd have to get familiar with its man entry first, and
> I currently don't see enough reason to do that soon.

The basics that you need here are quite trivial. And it makes a lot
of typical tasks with large grep/sed/cut/head/tail... cascades much
simpler and more reliable. (But your choice of course.)

Janis

Janis
4/12/2014 9:28:17 AM
Reply:

Similar Artilces:

File permissions; copying file from one directory to another
Hey guys... I'm a bit inexperienced with Linux, so I've been struggling with this scenario; I have a web app that creates a temporary data file (under the web daemons user info - apache) in the temp directory, called - for example - bla Its just a file with some text in it. The web app creates the data file, and then I want to move it to another directory, /var/spool/processor/outgoing I'm getting permission denied trying to copy "bla" from the temp directory (/var/tmp) to the above directory. I've set the "outgoing" directory prermissions world writea...

find files in one directory to use to search through another directory
HI, I'm a beginner to intermediate user. Here's what I am trying to do. I have a directory (call it X) with many sub-directories with many .h files. I want to use the names of the .h files in directory X to search through another directory (call it Y) and see where the .h files from X are included in the .h and .cpp files in Y. In directory X I used the following command to find all the .h files: find . -name "*.h" The above command generates a list of all the .h files, but with the path name. I don't want the path names, only the name of the file.h Then I want to do ...

Read strings from one file and search for them in a directory containing htm files
Hi Folks, Trust this message finds you all in great spirits. I have a problem - I have one file where each line is treated as ONE STRING. I need to read each line from this file and search for that line in another directory which contains some 100 .html files. Once I find a matching line that contains that ONE STRING, I need to write that ONE STRING into another file. I need to discard those that are not found in any of those 100 .htm files. So basically my intention is to find out if the strings are used in any of the 100 .htm files that exist in another directory. Am new to awk and tried...

process files one by one in the directory
Hello, I have 4 files named ab12.txt, cd45.txt, ug3h.txt, kf8l.txt in a directory. I have a function like [X,Y] = myfunction (filename, collection). Collection = 2 always. But the filenames changes. I have to use this function using above filenames one by one. How can I just create the variable (filename = 'ab12'), run the function. again create the variable (filename = 'cd45') and run the function and so on until the last 4th file. cheers, Yogesh "yogesh kumkar" <yogeshkumkar@mathworks.com> wrote in message news:j0ju0t$f1s$1@newscl01ah.mathworks....

files, directories, files, directories
Hi folks, I've been trying to make a decision and it's driving me crazy. Is a directory a file or is a directory NOT a file but a node? Should I have A) public interface IFile { IFileName FileName; IContent GetContent(); } public interface IDirectory extends IFile { } or B) public interface INode { INodeName NodeName; } public interface IFile extends INode { IContent GetContent(); } public interface IDirectory extends INode { } Method A is nice cause IFile becomes the base "Node" type and you can use names like "IFileName" which sound...

Delete empty files in the current directory but not the ones in the sub-directories of current directory.
Hi all, I want to delete empty files in the current directory but not the ones in the sub-directories of current directory. Any hints? Regards. -- ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :. Hongyi Zhao <hongyi.zhao@gmail.com> writes: > I want to delete empty files in the current directory but not the ones > in the sub-directories of current directory. Any hints? As usual, ‘find(1)’ is your friend. Read about the ‘-maxdepth’ and ‘-size’ options. -- \ “People come up to me and say, ‘Emo, do people really come up | `\ ...

Moving all files to one directory
Using Windows Explorer 5.0, I want to MOVE all of the files that are in MANY directories on SEVERAL networked PCs into ONE directory on ONE PC. (I do NOT want to COPY the files because the files will be in the target directory AND in the source directories). Question: Since many of the files have the same file-names but have different contents and/or were created/modified on different dates, how do I set up Windows Explorer 5.0 to prompt me to OK (or not OK) the replacement (overwriting) of the existing file? "Gary" <gcotterl@co.riverside.ca.us> wrote in message news:81c7c...

how do I read all the files in a given directory one by one?
I want to read one file in a directory, then the next file in the same directory, then the next file, until the last file in the directory. I don't know how many files are in the directory. Can anybody teach me how to accomplish this task? Do I have to use a system call dir and store the file names in another file and then access all the files? Or is there a shortcut to do this? Thank you William On Wed, 18 Jun 2008 09:10:38 -0700 (PDT), william <huxiankui@gmail.com> wrote: >I want to read one file in a directory, then the next file in the same >directory, then the next fi...

Billion files in one directory (UNIX) ??
Is this possible and manageable (applying filters to file names, etc.) in a Unix OS? * CMOS: > [OFF TOPIC] Please read the FAQ before posting. Don't feel unwelcome. But please read the FAQ before posting, especially about topicality and how to post. -- A: Because it messes up the order in which people normally read text. Q: Why is it such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail? CMOS wrote: > is this possible and manageable (applying filters to file names, etc) > in unix OS. > Try to ask at alt.cookies.yum.yum.yum, probab...

perl + script files from only one directory
Does anyone know if it is possible to configure perl (win) in such a way that all necessary binary and script files can be run from a single directory? I have an EXE+DLL+support files packer that can totally wrap all its files and unpack and run them from a single exe, as long as all needed files are in the same directory... I can accept some real temp files being created in the temp directory, but I need perl to run without having the \bin \lib \script etc. subdirectories... I've seen what the PAR module can do, but that just unpacks the total structure with subdirs to a temp area... wh...

multiple files into one file based on unique entry in one of the files
I have 4 flat files where each field is separated by a pipe |. In each file the second field has a unique value that is in all the files. Each line is terminated by a newline. I want to combine the files and create one file. In this one file, there should be one line for each "unique" entry that was found in the second file. I was able to do this in MS Access by creating each file as a table and linking the "unique" field in the 2nd file to the other files. I want to do this in Tcl, so I can automate the process. Can someone please point me in a starting directio...

Copy modified files from one directory to another.
Hi, Two directories with same content exist on the unix box. Some of the content of one directory keeps on changing over time. I have to write a shell script to update the other directory with the changes. Problem is to copy only the changed/modified files to the other directory (maintaining the directory structure). How can this be achieved using simple shell script? Thanks, On 7 Feb 2007 15:16:16 -0800, mizwam <mail.manasi.sharma@gmail.com> wrote: > > > Hi, > > Two directories with same content exist on the unix box. Some of the > content of one directory keeps...

Copy files and folders from one directory into another
I have files in myurl.com/myfolder and I would like to copy all of them into myurl.com/newfolder I'd like to copy all of the subdirectories and files that are in /myfolder and I would like to be able to use a form of some sort to name the file I want to copy, and create the folder I'd like to copy it to, such as newfolder. How can I use a form to create a new folder on the server? How can I use a form to name the old folder I want to copy everything from, then create the name of the new folder I want to copy everything into, and then copy? If anyone can help, I'd be VERY app...

Need to concatenate all files in a dir together into one file and read the first 225 characters from each file into another file.
I am trying to concatenate all files in a directory together into one file and read the first 225 characters of each file into one file as a sort of summary file. The following does cat all the files together into one big file: my $directory = "c:\\myfiles"; my $bigfile = "c:\\bigfile.txt"; opendir (DIR, $directory) or die $!; @ARGV = readdir(DIR); chdir( $directory ) or die $!; #Need to do this, as readdir returns just the filename @ARGV = grep( -f, @ARGV); foreach (@ARGV) { local $/ ; open OUT,">$bigfile" or die $!; while( <> ){ print OUT $_, "\n"; ...
