Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.
thanx in advance.....
|
|
0
|
|
|
|
Reply
|
junky_fellow (377)
|
2/12/2004 11:41:33 AM |
|
"junky_fellow" <junky_fellow@yahoo.co.in> wrote in message
news:8c7d4a6e.0402120341.1cc557eb@posting.google.com...
> Can anyone suggest some efficient way to search a substring in
> a text file. For eg. suppose i want to search the pattern "abc"
> in a text file, then which algorithm should i use.
> I just want some hints, not the complete program.
>
> thanx in advance.....
It is a fact that you must check all characters (except the last two). The
simplest method would be to search for 'a' and if you find it, test for 'b'
then 'c'.
The fastest method in this case (byte data[4] = a,b,c) would be to cast data
to an int (if you work with a 32-bit processor) then "xor" it with every
third byte of the data string, if the result is 0 you have found a sub
string. This can be optimized in assembler as well, and I think this is the
fastest method if combined with a suitable buffering method.
|
|
0
|
|
|
|
Reply
|
martinfj (21)
|
2/12/2004 12:07:42 PM
|
|
junky_fellow wrote:
>
> Can anyone suggest some efficient way to search a substring in
> a text file. For eg. suppose i want to search the pattern "abc"
> in a text file, then which algorithm should i use.
> I just want some hints, not the complete program.
Use srstr(), from <string.h>.
--
pete
|
|
0
|
|
|
|
Reply
|
pfiland (6613)
|
2/12/2004 12:15:15 PM
|
|
On Thu, 12 Feb 2004, junky_fellow wrote:
> Can anyone suggest some efficient way to search a substring in
> a text file. For eg. suppose i want to search the pattern "abc"
> in a text file, then which algorithm should i use.
> I just want some hints, not the complete program.
You have to define efficient. If primary memory is a concern then
efficient would be reading one character at a time and seeing if it
matches the first letter of the string. If it does then compare the
second, third, etc. characters of the string.
If efficient is as fast as possible without concern for primary memory
then determine file size, allocate a block of memory the size of the file,
read the entire contents into memory then search the memory for the string
(maybe using strstr() or just using the search for the first letter, if
you find it search the the entire string).
Or you could try a combination. Allocate a large chunk of memory, read in
part of the file and search the memory for the string you are looking for.
Only danger here is if the last letter of block #1 is 'a' and the first
letters of block #2 are "bc". You'll never to figure out how to handle
that situations.
--
Send e-mail to: darrell at cs dot toronto dot edu
Don't send e-mail to vice.president@whitehouse.gov
|
|
0
|
|
|
|
Reply
|
darrell13 (357)
|
2/12/2004 1:59:24 PM
|
|
Martin Johansen wrote:
>
> "junky_fellow" <junky_fellow@yahoo.co.in> wrote in message
> news:8c7d4a6e.0402120341.1cc557eb@posting.google.com...
> > Can anyone suggest some efficient way to search a substring in
> > a text file. For eg. suppose i want to search the pattern "abc"
> > in a text file, then which algorithm should i use.
> > I just want some hints, not the complete program.
> >
> > thanx in advance.....
>
> It is a fact that you must check all characters
> (except the last two). The simplest method would be to
> search for 'a' and if you find it, test for 'b' then 'c'.
>
> The fastest method in this case (byte data[4] = a,b,c)
> would be to cast data to an int
> (if you work with a 32-bit processor) then "xor" it with every
xor "what" with every third byte ?
> third byte of the data string,
> if the result is 0 you have found a sub string.
Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
ignoring alignment problems,
I have no idea of what you think you're saying.
--
pete
|
|
0
|
|
|
|
Reply
|
pfiland (6613)
|
2/12/2004 2:04:16 PM
|
|
"pete" <pfiland@mindspring.com> wrote in message
news:402B87E2.525D@mindspring.com...
> Martin Johansen wrote:
> >
> > "junky_fellow" <junky_fellow@yahoo.co.in> wrote in message
> > news:8c7d4a6e.0402120341.1cc557eb@posting.google.com...
> > > Can anyone suggest some efficient way to search a substring in
> > > a text file. For eg. suppose i want to search the pattern "abc"
> > > in a text file, then which algorithm should i use.
> > > I just want some hints, not the complete program.
> > >
> > > thanx in advance.....
> >
> > It is a fact that you must check all characters
> > (except the last two). The simplest method would be to
> > search for 'a' and if you find it, test for 'b' then 'c'.
> >
> > The fastest method in this case (byte data[4] = a,b,c)
> > would be to cast data to an int
> > (if you work with a 32-bit processor) then "xor" it with every
>
> xor "what" with every third byte ?
Oh, sorry, I ment xor every third byte with data[3] casted to int, three
byte from a file could be read into an array in the same manner as a,b,c in
data[3].
> third byte of the data string,
> > if the result is 0 you have found a sub string.
>
> Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
> ignoring alignment problems,
> I have no idea of what you think you're saying.
>
> --
> pete
|
|
0
|
|
|
|
Reply
|
martinfj (21)
|
2/12/2004 2:46:43 PM
|
|
Martin Johansen wrote:
>
> "pete" <pfiland@mindspring.com> wrote in message
> news:402B87E2.525D@mindspring.com...
> > Martin Johansen wrote:
> > >
> > > "junky_fellow" <junky_fellow@yahoo.co.in> wrote in message
> > > news:8c7d4a6e.0402120341.1cc557eb@posting.google.com...
> > > > Can anyone suggest some efficient way to search a substring in
> > > > a text file. For eg. suppose i want to search the pattern "abc"
> > > > in a text file, then which algorithm should i use.
> > > > I just want some hints, not the complete program.
> > > >
> > > > thanx in advance.....
> > >
> > > It is a fact that you must check all characters
> > > (except the last two). The simplest method would be to
> > > search for 'a' and if you find it, test for 'b' then 'c'.
> > >
> > > The fastest method in this case (byte data[4] = a,b,c)
> > > would be to cast data to an int
> > > (if you work with a 32-bit processor) then "xor" it with every
> >
> > xor "what" with every third byte ?
>
> Oh, sorry, I ment xor every third byte with data[3]
> casted to int, three byte from a file could be read
> into an array in the same manner as a,b,c in data[3].
I think you mean data[2], if you mean 'c'.
> > > third byte of the data string,
> > > if the result is 0 you have found a sub string.
I don't understand why you prefer to check the result of an xor,
instead of checking equality directly.
Your described procedure,
only checks for one letter, not a substring.
> > Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
> > ignoring alignment problems,
> > I have no idea of what you think you're saying.
--
pete
|
|
0
|
|
|
|
Reply
|
pfiland (6613)
|
2/12/2004 5:03:08 PM
|
|
Yes, I was a bit sloppy, simple character comparizon is the fastest.
|
|
0
|
|
|
|
Reply
|
martinfj (21)
|
2/12/2004 5:45:36 PM
|
|
|
7 Replies
25 Views
(page loaded in 0.129 seconds)
|