substring finding problem!
Hi group!
Reading all the posts about Spinoza's efforts to create a string substitute
program, I wanted to code mine too, and the spec was not to use the
<string.h> library. But I already have a problem finding all the places where
a substring occurs in a string. I've been looking for a long time but can't see
where the error is. Any help on where I made a mistake is appreciated. TIA :)
Code is :-
#include<stdlib.h>
#include<stdio.h>
#include<string.h>

unsigned strLength(char *s)
{
  unsigned idx;
  for (idx = 0; s[idx] != '\0'; idx++)
    ;
  return idx;
}

char *strSubstr(char *s, char *t, unsigned ls, unsigned lt)
{
  char *substr = 0;
  unsigned end = ls - lt, i, j;

  for (i = 0; i <= end; i++) {
    if (s[i] == t[0] && s[i + (lt-1)] == t[lt-1]) {
      for (j = 1; j < lt && s[i + j] == t[j]; j++)
        ;
      if (j == lt) {
        substr = s + i;
        i = end;
      }
    }
  }
  return substr;
}

unsigned findSubstr(char *s, char *t, unsigned ls, unsigned lt, char ***sp)
{
  unsigned n, m, lu;
  char *u;

  for (n = 0, u = s, lu = ls;
       lu >= lt && (u = strSubstr(u, t, lu, lt));
       n++, u += lt, lu = ((s+ls) - u))
    ;
  if (sp && (*sp = malloc(n * sizeof **sp))) {
    for (m = 0, u = s, lu = ls; m < n; m++, u += lt, lu = ((s+ls) - u))
      sp[0][m] = strSubstr(u, t, lu, lt);
  }
  return n;
}

int main()
{
  char **p;
  unsigned found, i;

  printf("heee e\n");
  found = findSubstr("heee", "e", strlen("heee"), strlen("e"), &p);
  printf("%u times\n", found);
  for (i = 0; i < found; i++)
    printf("\t%p, %c\n", (void*)p[i], p[i][0]);

  printf("hee e\n");
  found = findSubstr("hee", "e", strlen("hee"), strlen("e"), &p);
  printf("%u times\n", found);
  for (i = 0; i < found; i++)
    printf("\t%p, %c\n", (void*)p[i], p[i][0]);

  printf("hhhh h\n");
  found = findSubstr("hhhh", "h", strlen("hhhh"), strlen("h"), &p);
  printf("%u times\n", found);
  for (i = 0; i < found; i++)
    printf("\t%p, %c\n", (void*)p[i], p[i][0]);
}

Output :-

heee e
3 times
        0x400a66, e
        0x400a66, e
        0x400a67, e
hee e
2 times
        0x400a84, e
        0x400a84, e
hhhh h
4 times
        0x400a90, h
        0x400a91, h
        0x400a92, h
        0x400a93, h

It is correct when the substring starts at the first character of the string,
but if not, then it is always repeated twice...
no_mail647 (13)
2/14/2010 8:18:57 AM

fedora wrote:
> Hi group!
>
> Reading all posts about Spinozas efforts to create string substitute
> program, i wanted to code mine too and specs was that not to use the
> <string.h> library. But already problem in finding all places where
> substring occurs in a string. i'm looking for long time but not able to see
> where error is. Any help on where i made mistake is appreciated. TIA:)
>
> Code is :-
>
> #include<stdlib.h>
> #include<stdio.h>
> #include<string.h>
Better:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
>
> unsigned strLength(char *s)
> {
> unsigned idx;
> for (idx = 0; s[idx] != '\0'; idx++)
> ;
> return idx;
> }
To find the length of a string, use strlen unless you have a compelling
reason not to. There is no evidence above of a compelling reason not to
use strlen.
> char *strSubstr(char *s, char *t, unsigned ls, unsigned lt)
> {
> char *substr = 0;
> unsigned end = ls - lt, i, j;
>
> for (i = 0; i <= end; i++) {
> if (s[i] == t[0] && s[i + (lt-1)] == t[lt-1]) {
> for (j = 1; j < lt && s[i + j] == t[j]; j++)
> ;
> if (j == lt) {
> substr = s + i;
> i = end;
What are you trying to do in this function?
> unsigned findSubstr(char *s, char *t, unsigned ls, unsigned lt, char ***sp)
> {
> unsigned n, m, lu;
> char *u;
>
> for (n = 0, u = s, lu = ls;
> lu >= lt && (u = strSubstr(u, t, lu, lt));
> n++, u += lt, lu = ((s+ls) - u))
> ;
> if (sp && (*sp = malloc(n * sizeof **sp))) {
> for (m = 0, u = s, lu = ls; m < n; m++, u += lt, lu = ((s+ls) - u))
> sp[0][m] = strSubstr(u, t, lu, lt);
> }
> return n;
> }
To find a substring, use strstr unless you have a compelling reason not
to. There is no evidence above of a compelling reason not to use strstr.
If you want help debugging your code, your first step is to choose more
meaningful names for your objects, so that we can more easily see what
you think you're doing.
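For illustration only, here is a minimal sketch of what relying on strstr
could look like for the counting part. The name count_occurrences is an
assumption made up for this example, and it counts non-overlapping matches,
which is what findSubstr appears to aim for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in haystack using strstr.
   count_occurrences is a hypothetical name, not taken from the thread. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t needle_len, count = 0;
    const char *p;

    assert(haystack != NULL && needle != NULL);
    needle_len = strlen(needle);
    if (needle_len == 0)
        return 0;   /* an empty needle would match at every position */

    /* After each match, resume the search just past the matched text. */
    for (p = haystack; (p = strstr(p, needle)) != NULL; p += needle_len)
        count++;
    return count;
}
```

With this sketch, count_occurrences("heee", "e") gives 3 and
count_occurrences("hee", "e") gives 2, the same counts as in the posted
output.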
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
rjh (10789)
2/14/2010 8:27:36 AM

Richard Heathfield wrote:
> fedora wrote:
>> Hi group!
>>
>> Reading all posts about Spinozas efforts to create string substitute
>> program, i wanted to code mine too and specs was that not to use the
>> <string.h> library. But already problem in finding all places where
>> substring occurs in a string. i'm looking for long time but not able to
>> see where error is. Any help on where i made mistake is appreciated.
>> TIA:)
>>
>> Code is :-
>>
>> #include<stdlib.h>
>> #include<stdio.h>
>> #include<string.h>
>
> Better:
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
>
>>
>> unsigned strLength(char *s)
>> {
>> unsigned idx;
>> for (idx = 0; s[idx] != '\0'; idx++)
>> ;
>> return idx;
>> }
>
> To find the length of a string, use strlen unless you have a compelling
> reason not to. There is no evidence above of a compelling reason not to
> use strlen.
Not using string.h was the rule, I thought.
>> char *strSubstr(char *s, char *t, unsigned ls, unsigned lt)
>> {
>> char *substr = 0;
>> unsigned end = ls - lt, i, j;
>>
>> for (i = 0; i <= end; i++) {
>> if (s[i] == t[0] && s[i + (lt-1)] == t[lt-1]) {
>> for (j = 1; j < lt && s[i + j] == t[j]; j++)
>> ;
>> if (j == lt) {
>> substr = s + i;
>> i = end;
>
> What are you trying to do in this function?
It returns the same value as strstr in string.h. It gives a pointer to the
1st occurrence of string t in string s, or a null pointer otherwise. ls is the
length of s and lt is the length of t, the same values that strlen gives.
>> unsigned findSubstr(char *s, char *t, unsigned ls, unsigned lt, char
>> ***sp)
>> {
>> unsigned n, m, lu;
>> char *u;
>>
>> for (n = 0, u = s, lu = ls;
>> lu >= lt && (u = strSubstr(u, t, lu, lt));
>> n++, u += lt, lu = ((s+ls) - u))
>> ;
>> if (sp && (*sp = malloc(n * sizeof **sp))) {
>> for (m = 0, u = s, lu = ls; m < n; m++, u += lt, lu = ((s+ls) - u))
>> sp[0][m] = strSubstr(u, t, lu, lt);
>> }
>> return n;
>> }
>
>
> To find a substring, use strstr unless you have a compelling reason not
> to. There is no evidence above of a compelling reason not to use strstr.
>
> If you want help debugging your code, your first step is to choose more
> meaningful names for your objects, so that we can more easily see what
> you think you're doing.
findSubstr returns the no. of times t occurs in s (not overlapping) and if
sp is not a null pointer, it sets *sp to point to a list of pointers that
point to each start of t in s. I got this method from someone else in another
thread... I think Ben Bacarisse.
Thanks for your comments! I'll post another version using strlen and better
var names shortly.
no_mail647 (13)
2/14/2010 9:11:35 AM

fedora wrote:
> Richard Heathfield wrote:
>
<snip>
>> To find the length of a string, use strlen unless you have a compelling
>> reason not to. There is no evidence above of a compelling reason not to
>> use strlen.
>
> Not using string.h was the rule i thought.
It's a silly rule, best ignored.
<snip>
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
rjh (10789)
2/14/2010 9:18:44 AM

Posting the program again with longer var names. I prefer short names since
long names run off an 80x25 screen. Also now using strlen from string.h.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

char *strSubstr(char *str, char *subStr, unsigned lstr, unsigned lsubStr)
{
  char *substr = 0;
  unsigned lastIdx = lstr - lsubStr, firstIdx, subStrIdx;

  // locate first char of subStr in str
  for (firstIdx = 0; firstIdx <= lastIdx; firstIdx++) {
    if (str[firstIdx] == subStr[0] &&
        str[firstIdx + (lsubStr-1)] == subStr[lsubStr-1]) {
      // check if complete subStr occurs at this pos in str
      for (subStrIdx = 1; subStrIdx < lsubStr &&
           str[firstIdx + subStrIdx] == subStr[subStrIdx]; subStrIdx++)
        ;
      if (subStrIdx == lsubStr) {
        // subStr found, so return its start and break frm loop
        substr = str + firstIdx;
        firstIdx = lastIdx;
      }
    }
  }
  return substr;
}

unsigned findSubstr(
  char *str,
  char *subStr,
  unsigned lstr,
  unsigned lsubStr,
  char ***sp)
{
  unsigned found, ctr, lu;
  char *u;

  // find how many times subStr is in str
  for (found = 0, u = str, lu = lstr;
       lu >= lsubStr && (u = strSubstr(u, subStr, lu, lsubStr));
       found++, u += lsubStr, lu = ((str + lstr) - u))
    ;

  // alloc space and copy the start of all subStr in str
  if (sp && (*sp = malloc(found * sizeof **sp))) {
    for (ctr = 0, u = str, lu = lstr;
         ctr < found;
         ctr++, u += lsubStr, lu = ((str + lstr) - u))
      sp[0][ctr] = strSubstr(u, subStr, lu, lsubStr);
  }
  return found;
}

int main()
{
  char **p;
  unsigned found, i;

  printf("heee e\n");
  found = findSubstr("heee", "e", strlen("heee"), strlen("e"), &p);
  printf("%u times\n", found);
  for (i = 0; i < found; i++)
    printf("\t%p, %c\n", (void*)p[i], p[i][0]);

  printf("hee e\n");
  found = findSubstr("hee", "e", strlen("hee"), strlen("e"), &p);
  printf("%u times\n", found);
  for (i = 0; i < found; i++)
    printf("\t%p, %c\n", (void*)p[i], p[i][0]);

  printf("hhhh h\n");
  found = findSubstr("hhhh", "h", strlen("hhhh"), strlen("h"), &p);
  printf("%u times\n", found);
  for (i = 0; i < found; i++)
    printf("\t%p, %c\n", (void*)p[i], p[i][0]);
}

Output is :-

heee e
3 times
        0x400a46, e
        0x400a46, e
        0x400a47, e
hee e
2 times
        0x400a64, e
        0x400a64, e
hhhh h
4 times
        0x400a70, h
        0x400a71, h
        0x400a72, h
        0x400a73, h

So if the start of the substring is not at the first position in the string,
then it's always repeated twice in the output. And I can't see where the error
is. Any help is appreciated. TIA :)
no_mail647 (13)
2/14/2010 9:40:35 AM

On Feb 14, 4:18 pm, fedora <no_m...@invalid.invalid> wrote:
> Hi group!
>
> Reading all posts about Spinozas efforts to create string substitute
It is not an effort. I produced one in two hours and worked
collaboratively with a couple of posters other than the regs to find
the bugs in a few more hours. It now works, as far as I know, and I am
adding a stress test and further improvements.
It is being deliberately renarrated as an "effort" by the regs owing
to their inability to solve the problem.
> program, i wanted to code mine too and specs was that not to use the
> <string.h> library. But already problem in finding all places where
> substring occurs in a string. i'm looking for long time but not able to see
> where error is. Any help on where i made mistake is appreciated. TIA:)
>
> Code is :-
>
> #include<stdlib.h>
> #include<stdio.h>
> #include<string.h>
>
> unsigned strLength(char *s)
> {
>   unsigned idx;
>   for (idx = 0; s[idx] != '\0'; idx++)
>     ;
>   return idx;
>
> }
>
> char *strSubstr(char *s, char *t, unsigned ls, unsigned lt)
> {
>   char *substr = 0;
>   unsigned end = ls - lt, i, j;
>
>   for (i = 0; i <= end; i++) {
>     if (s[i] == t[0] && s[i + (lt-1)] == t[lt-1]) {
>       for (j = 1; j < lt && s[i + j] == t[j]; j++)
>         ;
>       if (j == lt) {
>         substr = s + i;
>         i = end;
>       }
>     }
>   }
>   return substr;
>
> }
>
> unsigned findSubstr(char *s, char *t, unsigned ls, unsigned lt, char ***sp)
> {
>   unsigned n, m, lu;
>   char *u;
>
>   for (n = 0, u = s, lu = ls;
>        lu >= lt && (u = strSubstr(u, t, lu, lt));
>        n++, u += lt, lu = ((s+ls) - u))
>     ;
>   if (sp && (*sp = malloc(n * sizeof **sp))) {
>     for (m = 0, u = s, lu = ls; m < n; m++, u += lt, lu = ((s+ls) - u))
>       sp[0][m] = strSubstr(u, t, lu, lt);
>   }
>   return n;
>
> }
>
> int main()
> {
>   char **p;
>   unsigned found, i;
>
>   printf("heee e\n");
>   found = findSubstr("heee", "e", strlen("heee"), strlen("e"), &p);
>   printf("%u times\n", found);
>   for (i = 0; i < found; i++)
>     printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>
>   printf("hee e\n");
>   found = findSubstr("hee", "e", strlen("hee"), strlen("e"), &p);
>   printf("%u times\n", found);
>   for (i = 0; i < found; i++)
>     printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>
>   printf("hhhh h\n");
>   found = findSubstr("hhhh", "h", strlen("hhhh"), strlen("h"), &p);
>   printf("%u times\n", found);
>   for (i = 0; i < found; i++)
>     printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>
> }
>
> Output :-
>
> heee e
> 3 times
>         0x400a66, e
>         0x400a66, e
>         0x400a67, e
> hee e
> 2 times
>         0x400a84, e
>         0x400a84, e
> hhhh h
> 4 times
>         0x400a90, h
>         0x400a91, h
>         0x400a92, h
>         0x400a93, h
>
> It is correct when substr starts as first character of string, but if not
> then always it is repeated twice...
spinoza1111 (3250)
2/14/2010 10:43:49 AM

fedora <no_mail@invalid.invalid> writes:
> Posting program again with longer var names. i prefer short names since long
> names run out of 80x25 screen.
Long is not the same as good! Choosing good names is very hard and to
a large extent it is not culturally neutral. For example, when
searching for a sub-string, I'd call the first location "anchor"
because I am used to the term "anchored search".
> Also now using strlen from string.h.
That's a good idea, but there is no reason to abandon the other
string functions. If you want an exercise, then you could try to
write a fast search and replace. That will, most likely, lead to you
looking for an alternative to using strstr rather than simply
re-writing it (your strSubstr is very similar to strstr) but you will
learn practical things along the way, such as how to find where your
code is spending time.
Note, there will never be a "fastest" version because what is fast
will depend on all sorts of variables such as the quality of your C
implementation and the kind of search and replace calls you do. For
example, my simple version is still the fastest I can write for very
long strings with only a few replacements because the strstr in glibc
is very good at searching long strings.
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
>
> char *strSubstr(char *str, char *subStr, unsigned lstr, unsigned lsubStr)
> {
> char *substr = 0;
> unsigned lastIdx = lstr - lsubStr, firstIdx, subStrIdx;
You should use size_t for sizes like this -- it is the type returned
by strlen, but because it is an unsigned type you will need to think
about your method. You can't subtract the lengths like you do.
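To make the unsigned point concrete: if lsubStr is larger than lstr, then
lstr - lsubStr does not go negative, it wraps around to a very large value,
and a loop bounded by it would run far past the end of the string. A small
sketch of one way to guard against that follows; candidate_positions is a
hypothetical helper name made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many start positions are worth trying when
   looking for a sub-string of length sub_len in a string of length str_len. */
size_t candidate_positions(size_t str_len, size_t sub_len)
{
    assert(sub_len > 0);
    if (sub_len > str_len)
        return 0;                 /* str_len - sub_len would wrap around */
    return str_len - sub_len + 1; /* safe: sub_len <= str_len here */
}
```

A search loop can then be written as for (i = 0; i < candidate_positions(lstr,
lsubStr); i++) without ever forming a wrapped difference.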
> // locate first char of subStr in str
> for (firstIdx = 0; firstIdx <= lastIdx; firstIdx++) {
> if (str[firstIdx] == subStr[0] &&
> str[firstIdx + (lsubStr-1)] == subStr[lsubStr-1]) {
I am not sure it is worth doing this second test.
> // check if complete subStr occurs at this pos in str
> for (subStrIdx = 1; subStrIdx < lsubStr &&
> str[firstIdx + subStrIdx] == subStr[subStrIdx]; subStrIdx++)
> ;
> if (subStrIdx == lsubStr) {
> // subStr found, so return its start and break frm loop
> substr = str + firstIdx;
> firstIdx = lastIdx;
There is a statement to do that: break. Altering the loop variable to
break out of the loop makes the code harder to change.
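A rough sketch of the same search written with break, assuming size_t
lengths. find_first is a hypothetical stand-in for strSubstr, and the
first/last character pre-test is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for strSubstr: first occurrence of sub in str,
   or NULL if there is none. Lengths are passed in by the caller. */
char *find_first(char *str, const char *sub, size_t str_len, size_t sub_len)
{
    char *match = NULL;
    size_t i, j;

    assert(str != NULL && sub != NULL && sub_len > 0);
    if (sub_len > str_len)
        return NULL;            /* avoids any wrapped length arithmetic */

    for (i = 0; i + sub_len <= str_len; i++) {
        for (j = 0; j < sub_len && str[i + j] == sub[j]; j++)
            ;
        if (j == sub_len) {
            match = str + i;
            break;              /* leave the outer loop without touching i */
        }
    }
    return match;
}
```

Because the loop variable is never overwritten, the loop bound can later be
changed without having to revisit the exit logic.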
> }
> }
> }
> return substr;
> }
>
> unsigned findSubstr(
> char *str,
> char *subStr,
> unsigned lstr,
> unsigned lsubStr,
> char ***sp)
> {
> unsigned found, ctr, lu;
> char *u;
>
> // find how many times subStr is in str
> for (found = 0, u = str, lu = lstr;
> lu >= lsubStr && (u = strSubstr(u, subStr, lu, lsubStr));
> found++, u += lsubStr, lu = ((str + lstr) - u))
> ;
You have picked up an odd style. Using lots of variables in a for
loop is not very clear. How about:
  size_t matches = 0, remaining_len = lstr;
  char *u = str;
  while (u = strSubstr(u, subStr, remaining_len, lsubStr)) {
      matches++;
      u += lsubStr;
      remaining_len = str + lstr - u;
  }
This is untested (and almost certainly wrong) but it shows another way
to write such loops.
Note that I've removed the lu >= lsubStr test. In effect this is the
first thing that strSubstr tests so there is no obvious need to
repeat it here.
>
> // alloc space and copy the start of all subStr in str
> if (sp && (*sp = malloc(found * sizeof **sp))) {
> for (ctr = 0, u = str, lu = lstr;
> ctr < found;
> ctr++, u += lsubStr, lu = ((str + lstr) - u))
> sp[0][ctr] = strSubstr(u, subStr, lu, lsubStr);
I'd rewrite that rather than packing all the work into the for loop
controls. Loops are clearer when you can see what is being done in
the loop.
> }
> return found;
> }
>
> int main()
> {
> char **p;
> unsigned found, i;
>
> printf("heee e\n");
> found = findSubstr("heee", "e", strlen("heee"), strlen("e"), &p);
> printf("%u times\n", found);
> for (i = 0; i < found; i++)
> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>
> printf("hee e\n");
> found = findSubstr("hee", "e", strlen("hee"), strlen("e"), &p);
> printf("%u times\n", found);
> for (i = 0; i < found; i++)
> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>
> printf("hhhh h\n");
> found = findSubstr("hhhh", "h", strlen("hhhh"), strlen("h"), &p);
> printf("%u times\n", found);
> for (i = 0; i < found; i++)
> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>
>
> }
A general point. Lots of people seem to pack tests into main. I
never do. I make main use its arguments so I have a general-purpose
test program. The tests I want to run often can then be put into a
script but I can quickly test new cases simply by typing a command.
<snip>
--
Ben.
ben.usenet (6516)
2/14/2010 12:39:23 PM

Ben Bacarisse wrote:
> fedora <no_mail@invalid.invalid> writes:
>
>> Posting program again with longer var names. i prefer short names since
>> long names run out of 80x25 screen.
>
> Long is not the same as good! Choosing good names is very hard and to
> a large extent it is not culturally neutral. For example, when
> searching for a sub-string, I'd call the first location "anchor"
> because I am used to the term "anchored search".
Good point:) i know my longer names are terrible but i wanted to post the
code quickly. I think the code is readable with shorter names but giving
names with good meaning is very hard!
>> Also now using strlen from string.h.
>
> That's a good idea, but there is no reason to abandon the other
> string functions. If you want an exercise, then you could try to
> write a fast search and replace. That will, most likely, lead to you
> looking for an alternative to using strstr rather than simply
> re-writing it (your strSubstr is very similar to strstr) but you will
> learn practical things along the way, such as how to find where your
> code is spending time.
>
> Note, there will never be a "fastest" version because what is fast
> will depend on all sorts of variables such as the quality of your C
> implementation and the kind of search and replace calls you do. For
> example, my simple version is still the fastest I can write for very
> long strings with only a few replacements because the strstr in glibc
> is very good at searching long strings.
Yes. my aim was to write a prog for the Spinoza contest ongoing but am still
stuck with the same error.
>> #include <stdlib.h>
>> #include <stdio.h>
>> #include <string.h>
>>
>> char *strSubstr(char *str, char *subStr, unsigned lstr, unsigned lsubStr)
>> {
>> char *substr = 0;
>> unsigned lastIdx = lstr - lsubStr, firstIdx, subStrIdx;
>
> You should use size_t for sizes like this -- it is the type returned
> by strlen, but because it is an unsigned type you will need to think
> about your method. You can't subtract the lengths like you do.
Can you explain in which case (for the lengths of str and subStr) the
subtraction will be wrong? I considered both strings being the same length
but can't see any error.
lsubStr has to be <= lstr. I just assume that condition from the calling code!
>> // locate first char of subStr in str
>> for (firstIdx = 0; firstIdx <= lastIdx; firstIdx++) {
>> if (str[firstIdx] == subStr[0] &&
>> str[firstIdx + (lsubStr-1)] == subStr[lsubStr-1]) {
>
> I am not sure it is worth doing this second test.
I just thought that if the last position also matched, then the chance that
it is the sub-string is much higher, so testing both the 1st & last positions
would decrease the no. of times the inner loop has to run...
>> // check if complete subStr occurs at this pos in str
>> for (subStrIdx = 1; subStrIdx < lsubStr &&
>> str[firstIdx + subStrIdx] == subStr[subStrIdx]; subStrIdx++)
>> ;
>> if (subStrIdx == lsubStr) {
>> // subStr found, so return its start and break frm loop
>> substr = str + firstIdx;
>> firstIdx = lastIdx;
>
> There is a statement to do that: break. Altering the loop variable to
> break out of the loop makes the code harder to change.
I read somewhere that break and continue are almost as bad as goto, and that
loops should terminate by their test expressions for readable, elegant code.
>
>> }
>> }
>> }
>> return substr;
>> }
>>
>> unsigned findSubstr(
>> char *str,
>> char *subStr,
>> unsigned lstr,
>> unsigned lsubStr,
>> char ***sp)
>> {
>> unsigned found, ctr, lu;
>> char *u;
>>
>> // find how many times subStr is in str
>> for (found = 0, u = str, lu = lstr;
>> lu >= lsubStr && (u = strSubstr(u, subStr, lu, lsubStr));
>> found++, u += lsubStr, lu = ((str + lstr) - u))
>> ;
>
> You have picked up an odd style. Using lots of variables in a for
> loop is not very clear. How about:
>
> size_t matches = 0, remaining_len = lstr;
> char *u = str;
> while (u = strSubstr(u, subStr, remaining_len, lsubStr)) {
> matches++;
> u += lsubStr;
> remaining_len = str + lstr - u;
> }
>
> This is untested (and almost certainly wrong) but it shows another way
> to write such loops.
Ok! It's just so hard to be sure that all these lengths and offsets will
always behave right for all cases of strings and substrings! I thought
through my loops for cases of equal lengths etc., and I just can't spot where
the mistake is...
> Note that I've removed the lu >= lsubStr test. In effect this is the
> first thing that strSubstr tests so there is no obvious need to
> repeat it here.
hmm okay. any minute change and i'm not sure where all it'd affect the code.
pointers+strings is tricky!
>> // alloc space and copy the start of all subStr in str
>> if (sp && (*sp = malloc(found * sizeof **sp))) {
>> for (ctr = 0, u = str, lu = lstr;
>> ctr < found;
>> ctr++, u += lsubStr, lu = ((str + lstr) - u))
>> sp[0][ctr] = strSubstr(u, subStr, lu, lsubStr);
>
> I'd rewrite that rather than packing all the work into the for loop
> controls. Loops are clearer when you can see what is being done in
> the loop.
>
>> }
>> return found;
>> }
>>
>> int main()
>> {
>> char **p;
>> unsigned found, i;
>>
>> printf("heee e\n");
>> found = findSubstr("heee", "e", strlen("heee"), strlen("e"), &p);
>> printf("%u times\n", found);
>> for (i = 0; i < found; i++)
>> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>>
>> printf("hee e\n");
>> found = findSubstr("hee", "e", strlen("hee"), strlen("e"), &p);
>> printf("%u times\n", found);
>> for (i = 0; i < found; i++)
>> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>>
>> printf("hhhh h\n");
>> found = findSubstr("hhhh", "h", strlen("hhhh"), strlen("h"), &p);
>> printf("%u times\n", found);
>> for (i = 0; i < found; i++)
>> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>>
>>
>> }
>
> A general point. Lots of people seem to pack tests into main. I
> never do. I make main use its arguments so I have a general-purpose
> test program. The tests I want to run often can then be put into a
> script but I can quickly test new cases simply by typing a command.
>
> <snip>
Yeah, i wrote a version where main will accept strings from user in a loop
and feed them to findSubstr and return the results and keep looping till
CTRL-C but that version is sometimes giving segfault and sometimes working
okay for same string pairs!!
right now i cant understand my own code. now i want to rewrite everything
but in very simple statements and loops so i can figure out where i'm going
wrong.
no_mail647 (13)
2/14/2010 1:04:37 PM

fedora wrote:
> Richard Heathfield wrote:
>>What are you trying to do in this function?
>
>
> It returns same value as strstr in string.h.
If you write
str_len, str_chr, and str_ncmp
first, then
str_str
is pretty simple to write without using string.h.
http://www.mindspring.com/~pfilandr/C/library/str_ing.c
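As a sketch of how that decomposition might look (this is not the code at
the URL above, just an illustration written from the description; the exact
signatures are assumptions modelled on the <string.h> counterparts):

```c
#include <assert.h>
#include <stddef.h>

/* Length of s, not counting the terminating '\0'. */
size_t str_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* First occurrence of c in s, or NULL. Unlike strchr, this sketch does
   not search for the terminator itself. */
char *str_chr(const char *s, int c)
{
    for (; *s != '\0'; s++)
        if (*s == (char)c)
            return (char *)s;
    return NULL;
}

/* Compare at most n characters: negative, zero or positive result. */
int str_ncmp(const char *a, const char *b, size_t n)
{
    for (; n > 0; a++, b++, n--) {
        if (*a != *b)
            return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
        if (*a == '\0')
            break;              /* both strings ended together */
    }
    return 0;
}

/* First occurrence of t in s, or NULL; built only from the helpers above. */
char *str_str(const char *s, const char *t)
{
    size_t t_len;

    assert(s != NULL && t != NULL);
    t_len = str_len(t);
    if (t_len == 0)
        return (char *)s;       /* empty pattern matches at the start */

    /* Jump to each place where t's first character occurs, then compare. */
    while ((s = str_chr(s, t[0])) != NULL) {
        if (str_ncmp(s, t, t_len) == 0)
            return (char *)s;
        s++;
    }
    return NULL;
}
```

Splitting the work this way keeps each loop small enough to check by eye,
which is the point of writing the three helpers first.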
--
pete
pfiland (6614)
2/14/2010 1:31:36 PM

In article <hl8gek$ev1$1@news.eternal-september.org>,
fedora <no_mail@invalid.invalid> wrote:
>unsigned findSubstr(
> char *str,
> char *subStr,
> unsigned lstr,
> unsigned lsubStr,
> char ***sp)
>{
> unsigned found, ctr, lu;
> char *u;
>
> // find how many times subStr is in str
> for (found = 0, u = str, lu = lstr;
> lu >= lsubStr && (u = strSubstr(u, subStr, lu, lsubStr));
> found++, u += lsubStr, lu = ((str + lstr) - u))
> ;
>
> // alloc space and copy the start of all subStr in str
> if (sp && (*sp = malloc(found * sizeof **sp))) {
> for (ctr = 0, u = str, lu = lstr;
> ctr < found;
> ctr++, u += lsubStr, lu = ((str + lstr) - u))
> sp[0][ctr] = strSubstr(u, subStr, lu, lsubStr);
Here is your bug; you want
sp[0][ctr] = u = strSubstr(u, subStr, lu, lsubStr);
> }
> return found;
>}
ike5 (222)
2/14/2010 2:24:33 PM

fedora <no_mail@invalid.invalid> writes:
> Ben Bacarisse wrote:
>
>> fedora <no_mail@invalid.invalid> writes:
<snip>
>>> Also now using strlen from string.h.
>>
>> That's a good idea, but there is no reason to abandon the other
>> string functions. If you want an exercise, then you could try to
>> write a fast search and replace. That will, most likely, lead to you
>> looking for an alternative to using strstr rather than simply
>> re-writing it (your strSubstr is very similar to strstr) but you will
>> learn practical things along the way, such as how to find where your
>> code is spending time.
>>
>> Note, there will never be a "fastest" version because what is fast
>> will depend on all sorts of variables such as the quality of your C
>> implementation and the kind of search and replace calls you do. For
>> example, my simple version is still the fastest I can write for very
>> long strings with only a few replacements because the strstr in glibc
>> is very good at searching long strings.
>
> Yes. my aim was to write a prog for the Spinoza contest ongoing but am still
> stuck with the same error.
Sorry, I missed that you had an error you were stuck on. The problem is
that the first and second loops in findSubstr don't do the same
thing. The first correctly advances u by the match length *from the
last match*. The second loop advances u by only the match length
(from its last value).
The fix (using your style):
  if (sp && (*sp = malloc(found * sizeof **sp))) {
    for (ctr = 0, u = str, lu = lstr;
         ctr < found;
         ctr++, lu = ((str + lstr) - u)) {
      sp[0][ctr] = strSubstr(u, subStr, lu, lsubStr);
      u = sp[0][ctr] + lsubStr;
    }
  }
This could be written much more clearly. I don't want to bang on
about the same thing all the time, but bugs are the compiler's way of
telling you that your code needs to be clearer!
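One possible clearer shape, as a sketch only: let a single helper do the
walk over the matches and call it twice, once to count and once to store,
so the two passes cannot drift apart. The names walk_matches and find_all
are made up for this example, and it uses strstr rather than strSubstr.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Visit every non-overlapping match of sub in str. If out is not NULL,
   store the start of each match there. Returns the number of matches. */
static size_t walk_matches(char *str, const char *sub, char **out)
{
    size_t sub_len = strlen(sub), n = 0;
    char *u = str;

    assert(sub_len > 0);
    while ((u = strstr(u, sub)) != NULL) {
        if (out != NULL)
            out[n] = u;
        n++;
        u += sub_len;           /* always step on from the match itself */
    }
    return n;
}

/* Hypothetical replacement for findSubstr. If matches is not NULL, *matches
   is set to a malloc'd array of match starts (or NULL); the caller frees it. */
size_t find_all(char *str, const char *sub, char ***matches)
{
    size_t n = walk_matches(str, sub, NULL);

    if (matches != NULL) {
        *matches = n ? malloc(n * sizeof **matches) : NULL;
        if (*matches != NULL)
            walk_matches(str, sub, *matches);
    }
    return n;
}
```

Since both passes run exactly the same loop, the second one cannot advance
differently from the first, which is the mismatch behind the repeated
pointer in the posted output.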
>>> #include <stdlib.h>
>>> #include <stdio.h>
>>> #include <string.h>
>>>
>>> char *strSubstr(char *str, char *subStr, unsigned lstr, unsigned lsubStr)
>>> {
>>> char *substr = 0;
>>> unsigned lastIdx = lstr - lsubStr, firstIdx, subStrIdx;
>>
>> You should use size_t for sizes like this -- it is the type returned
>> by strlen, but because it is an unsigned type you will need to think
>> about your method. You can't subtract the lengths like you do.
>
> Can you explain in which case (for length of str and subStr) the subtraction
> will be wrong? i considered both strings of same length but cant see any
> error.
>
> lsubStr has to be <= lstr. i just assume that condition from calling
> code!
Then there is no error! I think, though, that this is a rather strong
condition to place on the calling code. That the pointers are not
NULL seems a reasonable condition; even that the match string is not
zero length; but that you can't search for "ab" in "a" seems rather
too restrictive.
BTW, you can document these "caller contract" restrictions in the code
by including assert calls:
assert(str && subStr && lsubStr <= lstr && lsubStr > 0);
(#include <assert.h> at the top).
>>> // locate first char of subStr in str
>>> for (firstIdx = 0; firstIdx <= lastIdx; firstIdx++) {
>>> if (str[firstIdx] == subStr[0] &&
>>> str[firstIdx + (lsubStr-1)] == subStr[lsubStr-1]) {
>>
>> I am not sure it is worth doing this second test.
>
> I just thought that if the last position also matched, then the chance that
> it is the sub-string is much higher, so testing both the 1st & last positions
> would decrease the no. of times the inner loop has to run...
.... at the expense of more code. I'd put this in only after testing
that, in general, it pays off.
>>> // check if complete subStr occurs at this pos in str
>>> for (subStrIdx = 1; subStrIdx < lsubStr &&
>>> str[firstIdx + subStrIdx] == subStr[subStrIdx]; subStrIdx++)
>>> ;
>>> if (subStrIdx == lsubStr) {
>>> // subStr found, so return its start and break frm loop
>>> substr = str + firstIdx;
>>> firstIdx = lastIdx;
>>
>> There is a statement to do that: break. Altering the loop variable to
>> break out of the loop makes the code harder to change.
>
> i read somewhere that break and continue are almost as bad as go to and
> loops should terminate by their test expressions for readable elegant code.
Yes, some people are of that opinion. I am not, but I doubt that even
people with that opinion would advocate terminating the loop by
setting the loop variable. It is likely that they'd re-write the loop
in some new way but maybe if there is anyone of that opinion here they
could chip in. I don't like speaking for views I don't hold!
>>> }
>>> }
>>> }
>>> return substr;
>>> }
>>>
>>> unsigned findSubstr(
>>> char *str,
>>> char *subStr,
>>> unsigned lstr,
>>> unsigned lsubStr,
>>> char ***sp)
>>> {
>>> unsigned found, ctr, lu;
>>> char *u;
>>>
>>> // find how many times subStr is in str
>>> for (found = 0, u = str, lu = lstr;
>>> lu >= lsubStr && (u = strSubstr(u, subStr, lu, lsubStr));
>>> found++, u += lsubStr, lu = ((str + lstr) - u))
>>> ;
>>
>> You have picked up an odd style. Using lots of variables in a for
>> loop is not very clear. How about:
>>
>> size_t matches = 0, remaining_len = lstr;
>> char *u = str;
>> while (u = strSubstr(u, subStr, remaining_len, lsubStr)) {
>> matches++;
>> u += lsubStr;
>> remaining_len = str + lstr - u;
>> }
>>
>> This is untested (and almost certainly wrong) but it shows another way
>> to write such loops.
>
> Ok! It's just so hard to be sure that all these lengths and offsets will
> always behave right for all cases of strings and substrings! I thought
> through my loops for cases of equal lengths etc., and I just can't spot where
> the mistake is...
>
>
>> Note that I've removed the lu >= lsubStr test. In effect this is the
>> first thing that strSubstr tests so there is no obvious need to
>> repeat it here.
>
> hmm okay. any minute change and i'm not sure where all it'd affect the code.
> pointers+strings is tricky!
Ah, you need to free yourself from that fear. Experience helps, but
striving to write the clearest code you can makes it much simpler to
be sure of your code. Do you ever reason about your code? For
example, do you assert the negation of a loop condition at the end of
the loop to see what it really means for the code that follows? Over
the years, I've found more bugs doing this than by any other method.
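A small sketch of that kind of reasoning applied to the inner matching
loop. matches_at is a hypothetical helper, and the assert simply writes
down the negated loop condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: does sub (of length sub_len) occur at the start of s?
   Assumes s holds at least sub_len characters. */
int matches_at(const char *s, const char *sub, size_t sub_len)
{
    size_t j;

    for (j = 0; j < sub_len && s[j] == sub[j]; j++)
        ;
    /* Negation of the loop condition: this must hold whenever we get here. */
    assert(j >= sub_len || s[j] != sub[j]);
    /* j never exceeds sub_len, so j >= sub_len means every character
       matched; otherwise s[j] is the first mismatch. */
    return j == sub_len;
}
```

Writing the negated condition down makes it plain that j == sub_len is the
only way the loop can end with a full match.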
>>> // alloc space and copy the start of all subStr in str
>>> if (sp && (*sp = malloc(found * sizeof **sp))) {
>>> for (ctr = 0, u = str, lu = lstr;
>>> ctr < found;
>>> ctr++, u += lsubStr, lu = ((str + lstr) - u))
>>> sp[0][ctr] = strSubstr(u, subStr, lu, lsubStr);
>>
>> I'd rewrite that rather than packing all the work into the for loop
>> controls. Loops are clearer when you can see what is being done in
>> the loop.
>>
>>> }
>>> return found;
>>> }
>>>
>>> int main()
>>> {
>>> char **p;
>>> unsigned found, i;
>>>
>>> printf("heee e\n");
>>> found = findSubstr("heee", "e", strlen("heee"), strlen("e"), &p);
>>> printf("%u times\n", found);
>>> for (i = 0; i < found; i++)
>>> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>>>
>>> printf("hee e\n");
>>> found = findSubstr("hee", "e", strlen("hee"), strlen("e"), &p);
>>> printf("%u times\n", found);
>>> for (i = 0; i < found; i++)
>>> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>>>
>>> printf("hhhh h\n");
>>> found = findSubstr("hhhh", "h", strlen("hhhh"), strlen("h"), &p);
>>> printf("%u times\n", found);
>>> for (i = 0; i < found; i++)
>>> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
>>>
>>>
>>> }
>>
>> A general point. Lots of people seem to pack tests into main. I
>> never do. I make main use its arguments so I have a general-purpose
>> test program. The tests I want to run often can then be put into a
>> script but I can quickly test new cases simply by typing a command.
>>
>> <snip>
>
> Yeah, i wrote a version where main will accept strings from user in a loop
> and feed them to findSubstr and return the results and keep looping till
> CTRL-C but that version is sometimes giving segfault and sometimes working
> okay for same string pairs!!
I would not use input, I'd use argc and argv. For example, here is
what I wrote to investigate your bug:
int main(int argc, char **argv)
{
    if (argc == 3) {
        char **p;
        const char *s = argv[1], *m = argv[2];
        size_t mlen = strlen(m);
        unsigned i, found = findSubstr(s, m, strlen(s), mlen, &p);
        printf("In \"%s\" find \"%s\"\n%u times:\n", s, m, found);
        for (i = 0; i < found; i++) {
            int off = p[i] - s;
            printf("\t%d, %.*s<%.*s>%s\n",
                   off, off, s, (int)mlen, p[i], s + off + mlen);
        }
    }
    return 0;
}
> right now i cant understand my own code. now i want to rewrite everything
> but in very simple statements and loops so i can figure out where i'm going
> wrong.
It will come. BTW, kudos for your (void *) cast when printing with %p!
--
Ben.
ben.usenet (6516)
2/14/2010 2:38:31 PM
"fedora" <no_mail@invalid.invalid> wrote in message
news:hl8blk$s5$1@news.eternal-september.org...
> Reading all posts about Spinozas efforts to create string substitute
> program, i wanted to code mine too and specs was that not to use the
> <string.h> library. But already problem in finding all places where
> substring occurs in a string. i'm looking for long time but not able to
> see
> where error is. Any help on where i made mistake is appreciated. TIA:)
> #include<string.h>
You won't need this then...
> char *strSubstr(char *s, char *t, unsigned ls, unsigned lt)
> unsigned findSubstr(char *s, char *t, unsigned ls, unsigned lt, char
> ***sp)
Some comments about what each of these do wouldn't be amiss.
> printf("heee e\n");
> found = findSubstr("heee", "e", strlen("heee"), strlen("e"), &p);
> printf("%u times\n", found);
> for (i = 0; i < found; i++)
> printf("\t%p, %c\n", (void*)p[i], p[i][0]);
....
You're repeating code here that's best in a loop or in a function.
I've put together some code that also counts substrings, and also avoids
string.h, although it allows them to overlap so the results may not be the
same (so that "AA" is 3 substrings of "AAAA", not 2):
#include <stdio.h>

/* return how many times t occurs in s */
int findsubstrings(char *s, char *t){
    int count=0;
    char *p,*q;
    if (*s==0 || *t==0) return 0;
    while (*s) {
        p=s;
        q=t;
        while (*p && *q && *p++==*q) ++q;
        if (*q==0) ++count;
        ++s;
    }
    return count;
}

void test(char *s,char *t){
    printf("\"%s\" occurs %d times in \"%s\"\n",t,findsubstrings(s,t),s);
}

int main(void){
    test("sisisisisisis","sis");
}
--
bartc
bartc (783)
2/14/2010 4:28:42 PM
On 2010-02-14, fedora <no_mail@invalid.invalid> wrote:
> Not using string.h was the rule i thought.
No, that was just Nilges being contrary to the point of stupidity.
That said, it could be a good exercise. You've already had one of the
key insights: The thing replacing strstr() should be written as a function
which is called by other functions, so as not to overcomplicate a single
gigantic function.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
usenet-nospam (2216)
2/14/2010 4:59:30 PM
On 2010-02-14, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> Another insight is that if one is not using strstr then its
> replacement should be more helpful than strstr is.
That's a point.
> The trouble is that strstr(x, y); returns NULL when y is not in x. It
> scans x and then tells you almost nothing. If I were re-doing this
> I'd at least make my strstr replacement act like GNU's strchrnul
> (there is no strstrnul in GNU's library). I.e. str_str_nul should
> return a pointer to the end of its first argument string when the
> search fails. At the least this would allow one to avoid re-scanning
> just to find the length[1].
And you'd replace the if (ptr) with if (*ptr), which would be fine. Hmm,
I like that.
> I'd argue that strstr should have been defined this way from the
> start, but such is the C library.
Hmm.
Here's my thought: The C library functions we currently have are defined
in a way that's simple and well-defined; "return a pointer to X", and if
you can't, don't return a valid pointer. I think that's easier to specify
or discuss than "return a pointer to X, or possibly a pointer to Y".
> [1] At the expense of limiting the code to strings no longer than
> PTRDIFF_MAX characters. I think it is quite fiddly to avoid this
> restriction so I am not too bothered by that.
Yeah.
-s
usenet-nospam (2216)
2/14/2010 5:49:24 PM
Seebs <usenet-nospam@seebs.net> writes:
> On 2010-02-14, fedora <no_mail@invalid.invalid> wrote:
>> Not using string.h was the rule i thought.
>
> No, that was just Nilges being contrary to the point of stupidity.
>
> That said, it could be a good exercise. You've already had one of the
> key insights: The thing replacing strstr() should be written as a function
> which is called by other functions, so as not to overcomplicate a single
> gigantic function.
Another insight is that if one is not using strstr then its
replacement should be more helpful than strstr is.
The trouble is that strstr(x, y); returns NULL when y is not in x. It
scans x and then tells you almost nothing. If I were re-doing this
I'd at least make my strstr replacement act like GNU's strchrnul
(there is no strstrnul in GNU's library). I.e. str_str_nul should
return a pointer to the end of its first argument string when the
search fails. At the least this would allow one to avoid re-scanning
just to find the length[1].
Even when the search string /is/ present, the final call always scans
that tail with no valuable data being returned.
I'd argue that strstr should have been defined this way from the
start, but such is the C library.
[1] At the expense of limiting the code to strings no longer than
PTRDIFF_MAX characters. I think it is quite fiddly to avoid this
restriction so I am not too bothered by that.
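For what it's worth, a rough sketch of what such a str_str_nul might look like. Only the name and the return convention come from the post; the naive search in the body is an assumption, not anyone's posted code.

```c
#include <stddef.h>

/* Sketch only. Returns a pointer to the first occurrence of t in s,
   or a pointer to the terminating '\0' of s when t does not occur. */
char *str_str_nul(const char *s, const char *t)
{
    for (;; s++) {
        size_t j = 0;

        /* t[j] != '\0' also stops us at the end of s, because a
           '\0' in s can never equal a non-'\0' character of t. */
        while (t[j] != '\0' && s[j] == t[j])
            j++;

        if (t[j] == '\0')
            return (char *)s;   /* all of t matched here */
        if (*s == '\0')
            return (char *)s;   /* search failed: end of s */
    }
}
```

On failure, the result minus the first argument is the length of the string, so no second scan is needed. One wrinkle with this sketch: an empty search string in an empty text returns a pointer to the terminator, which an if (*ptr) test would read as a failure.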
--
Ben.
ben.usenet (6516)
2/14/2010 5:49:52 PM
Here is my humble little entry that took me around a half an hour or so to
create:
http://clc.pastebin.com/f62504e4c
If you want to avoid using `string.h' then you are going to have to implement
the following functions:
_________________________________________________
#define xstrstr strstr
#define xstrlen strlen
#define xstrcmp strcmp
#define xmemcpy memcpy
_________________________________________________
I personally don't see any need to do that unless you want to go through a
learning experience. Or perhaps if you just "know" that those functions are
very poorly implemented on your platform. Anyway, this code pre-computes all
of the substring matches and stores them in a linked-list. This gets around
having to scan the source string twice. It's fairly good at reducing the
number of list nodes by allowing a single node to hold multiple offsets into
the source string. So, in the code as-is, `malloc()/free()' is completely
avoided on list nodes _if_ there are less than or equal to 256 matches.
Any questions?
;^)
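The pastebin code is not reproduced in this thread, so what follows is only a guess at the idea described above, with every name (match_list, ml_push and so on) made up for illustration: a list whose first node is embedded in the list header, each node holding up to 256 offsets, so malloc is only called once the 257th match turns up.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK 256   /* offsets per node; the figure quoted in the post */

struct node {
    struct node *next;
    size_t used;            /* number of slots filled in off[] */
    size_t off[CHUNK];      /* match offsets into the source string */
};

struct match_list {
    struct node first;      /* embedded, so no malloc for <= CHUNK matches */
    struct node *tail;      /* node currently being filled */
    size_t count;           /* total offsets stored */
};

void ml_init(struct match_list *ml)
{
    ml->first.next = NULL;
    ml->first.used = 0;
    ml->tail = &ml->first;
    ml->count = 0;
}

/* Append one offset; returns 1 on success, 0 if malloc failed. */
int ml_push(struct match_list *ml, size_t offset)
{
    if (ml->tail->used == CHUNK) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            return 0;
        n->next = NULL;
        n->used = 0;
        ml->tail->next = n;
        ml->tail = n;
    }
    ml->tail->off[ml->tail->used++] = offset;
    ml->count++;
    return 1;
}

/* Free every heap node and reset the list to empty. */
void ml_free(struct match_list *ml)
{
    struct node *n = ml->first.next;
    while (n != NULL) {
        struct node *next = n->next;
        free(n);
        n = next;
    }
    ml_init(ml);
}
```

Because tail may point into the structure itself, a match_list in this sketch should be passed by pointer and not copied by value.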
no6 (2791)
2/14/2010 7:53:19 PM
"Chris M. Thomasson" <no@spam.invalid> writes:
>Or perhaps if you just "know" that those functions are
>very poorly implemented on your platform.
Often, one can indeed assume that »stdlib.h:rand()« is
implemented in such a manner that it is not very random
in the less significant bits - although this knowledge
has nothing to do with the language C but only with the
culture of C implementations.
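One common workaround, sketched here with a made-up helper name (rand_below): scale the whole value instead of taking rand() % n, so that the result depends on the high-order bits. The sketch assumes 0 < n <= RAND_MAX and does nothing about the slight bias that remains when n does not divide the range evenly.

```c
#include <stdlib.h>

/* Hypothetical helper: a value in 0 .. n-1 derived from the
   high-order bits of rand(). Requires 0 < n <= RAND_MAX. */
int rand_below(int n)
{
    /* rand() / (RAND_MAX + 1.0) lies in [0, 1), so the product
       lies in [0, n) and truncates to 0 .. n-1. */
    return (int)((double)rand() / ((double)RAND_MAX + 1.0) * n);
}
```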
ram (2840)
2/14/2010 11:06:42 PM
"Stefan Ram" <ram@zedat.fu-berlin.de> wrote in message
news:rand-20100215000605@ram.dialup.fu-berlin.de...
> "Chris M. Thomasson" <no@spam.invalid> writes:
>>Or perhaps if you just "know" that those functions are
>>very poorly implemented on your platform.
>
> Often, one can indeed assume that »stdlib.h:rand()« is
> implemented in such a manner that it is not very random
> in the less significant bits - although this knowledge
> has nothing to do with the language C but only with the
> culture of C implementations.
good point. Humm... I would hope that `strstr()' does not commonly use a
naive algorithm to search for substrings.
no6 (2791)
2/15/2010 12:41:28 AM
On Feb 15, 12:59 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-14, fedora <no_m...@invalid.invalid> wrote:
>
> > Not using string.h was the rule i thought.
>
> No, that was just Nilges being contrary to the point of stupidity.
>
> That said, it could be a good exercise. You've already had one of the
> key insights: The thing replacing strstr() should be written as a function
> which is called by other functions, so as not to overcomplicate a single
> gigantic function.
>
Hee hee. Sure is taking you tomatoes a long time to "ketchup" with me.
Peter, shouldn't you be checking my code like I told you? If you could
find a bug in the latest version I posted (and only that version, dear
heart), it would be a real feather in your little cap. In fact, I'll
send you a check for 25 Hong Kong dollars.
> -s
> --
> Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
spinoza1111 (3250)
2/15/2010 4:35:18 PM
On Feb 15, 3:53 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> Here is my humble little entry that took me around a half an hour or so to
> create:
>
> http://clc.pastebin.com/f62504e4c
>
> If you want to avoid using `string.h' then you are going to have to implement
> the following functions:
> _________________________________________________
> #define xstrstr strstr
> #define xstrlen strlen
> #define xstrcmp strcmp
> #define xmemcpy memcpy
> _________________________________________________
>
> I personally don't see any need to do that unless you want to go through a
> learning experience. Or perhaps if you just "know" that those functions are
> very poorly implemented on your platform. Anyway, this code pre-computes all
> of the substring matches and stores them in a linked-list. This gets around
> having to scan the source string twice. It's fairly good at reducing the
> number of list nodes by allowing a single node to hold multiple offsets into
> the source string. So, in the code as-is, `malloc()/free()' is completely
> avoided on list nodes _if_ there are less than or equal to 256 matches.
>
> Any questions?
Yeah, Chris. I have a question. Why did you call it an "entry" when
this (to me, anyway) implied that it was a contest entry to the
Spinoza challenge? Please don't be corrupted by the dishonesty and
brutality of these newsgroups.
Sure, you do say that I have to implement FOUR (4) non-trivial library
functions.
But by saying "it took me a half hour" I read an implied, perhaps
unintended slight at the approximately six hours I took ... where only
in dysfunctional corporations is it a bad thing to take a little extra
care and a little extra time in anticipation of difficulty.
A week late, in this thread, Seebach, Bacarisse et al. seem to be
running into confusion trying to help the OP meet the original
challenge. But I note nobody harassing them or the original poster,
targeting them for abuse.
In a sense, "I have only myself to blame" for this, because a year or
so ago, I jumped all over Seebach for his attacks on Schildt. I feel I
was right to do so, *et je ne regrette rien*. Nonetheless, I'm tired
of his lies.
If your "slight" was unintended, I apologize.
Without knowing as much about C as the regs, esp. postmodern C and the
standards (I'll be the first to concede this), I've left them in the
dust as regards my challenge. I've completed my solution, although I
am refining it by adding a stress test and may post a proof elsethread
proving that the code matches the algorithm statement I've posted
elsethread, and in so doing I may find a bug.
This is because "knowing C" is different from "knowing how to program"
and given the serious design flaws of C, there could be "knowing too
much about C".
But as far as I can see, no one's mastered the problem to the same
extent, Seebach perhaps least of all, because he wasted too much time
last week attacking me. He may redeem himself by helping the OP of
this thread, but he's never even tried to write his own solution.
>
> ;^)
spinoza1111 (3250)
2/15/2010 4:53:03 PM
On Feb 15, 8:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Stefan Ram" <r...@zedat.fu-berlin.de> wrote in message
>
> news:rand-20100215000605@ram.dialup.fu-berlin.de...
>
> > "Chris M. Thomasson" <n...@spam.invalid> writes:
> >>Or perhaps if you just "know" that those functions are
> >>very poorly implemented on your platform.
>
> > Often, one can indeed assume that »stdlib.h:rand()« is
> > implemented in such a manner that it is not very random
> > in the less significant bits - although this knowledge
> > has nothing to do with the language C but only with the
> > culture of C implementations.
>
> good point. Humm... I would hope that `strstr()' does not commonly use a
> naive algorithm to search for substrings.
I'd say that it better. It cannot use Boyer Moore or Knuth Morris
Pratt IF they use tables, and I believe they do, since that implies
(as far as I can tell) state (like malloc) or else an extra parameter
in the call.
cf. John Hennessy's books from Morgan Kaufmann on computer
architecture: I'm willing to bet that strstr uses a straightforward
algorithm, that runs best on RISC, and RISC-influenced chips
(including modern Intel chips, influenced as they are), without
microcoded or hardwired special-purpose scan instructions...just
optimized character operations.
But take this cum grano salis. Although I took Computer Architecture
in grad school (me got an A) and worked on an architecture team in
Silicon Valley, I'm an English teacher today, and may not be *au
courant*.
And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
STRINGS. If you use it to construct a table of replace points you're
gonna have an interesting bug-o-rama:
replace("banana", "ana", "ono")
IF you restart one position after the find point, and not at its end.
Moral: don't let the library do your thinking for you.
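To make the banana case concrete, here is a minimal sketch (count_matches is a made-up name, and it does use strstr from <string.h>). The only difference between the two behaviours is where the search resumes after a hit: one character past the start of the match, or at the end of the match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences of t in s.
   overlap != 0: resume one character after the start of each match.
   overlap == 0: resume at the end of each match.
   An empty t is treated as occurring zero times. */
size_t count_matches(const char *s, const char *t, int overlap)
{
    size_t n = 0, lt = strlen(t);

    if (lt == 0)
        return 0;
    while ((s = strstr(s, t)) != NULL) {
        n++;
        s += overlap ? 1 : lt;
    }
    return n;
}
```

With "banana" and "ana" the overlapping count is 2 (offsets 1 and 3) and the non-overlapping count is 1, so a table of replace points built the first way holds two entries that cannot both be applied as they stand.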
spinoza1111 (3250)
2/15/2010 5:02:49 PM
On Feb 14, 3:27 am, Richard Heathfield <r...@see.sig.invalid> wrote:
> To find the length of a string, use strlen unless you have a compelling
> reason not to. There is no evidence above of a compelling reason not to
> use strlen.
In production code perhaps, if a student is trying to learn comp.sci
through expressing ideas in C they are actually BETTER served by
writing their own versions of algorithms we take for granted.
Tom
tom236 (284)
2/15/2010 5:41:32 PM
On Feb 16, 12:53 am, spinoza1111 <spinoza1...@yahoo.com> wrote:
> On Feb 15, 3:53 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
> > Here is my humble little entry that took me around a half an hour or so to
> > create:
>
> > http://clc.pastebin.com/f62504e4c
>
> > If you want to avoid using `string.h' then you are going to have to implement
> > the following functions:
> > _________________________________________________
> > #define xstrstr strstr
> > #define xstrlen strlen
> > #define xstrcmp strcmp
> > #define xmemcpy memcpy
> > _________________________________________________
>
> > I personally don't see any need to do that unless you want to go through a
> > learning experience. Or perhaps if you just "know" that those functions are
> > very poorly implemented on your platform. Anyway, this code pre-computes all
> > of the substring matches and stores them in a linked-list. This gets around
> > having to scan the source string twice. It's fairly good at reducing the
> > number of list nodes by allowing a single node to hold multiple offsets into
> > the source string. So, in the code as-is, `malloc()/free()' is completely
> > avoided on list nodes _if_ there are less than or equal to 256 matches.
>
> > Any questions?
>
> Yeah, Chris. I have a question. Why did you call it an "entry" when
> this (to me, anyway) implied that it was a contest entry to the
> Spinoza challenge? Please don't be corrupted by the dishonesty and
> brutality of these newsgroups.
>
> Sure, you do say that I have to implement FOUR (4) non-trivial library
> functions.
>
> But by saying "it took me a half hour" I read an implied, perhaps
> unintended slight at the approximately six hours I took ... where only
> in dysfunctional corporations is it a bad thing to take a little extra
> care and a little extra time in anticipation of difficulty.
>
> A week late, in this thread, Seebach, Bacarisse et al. seem to be
> running into confusion trying to help the OP meet the original
> challenge. But I note nobody harassing them or the original poster,
> targeting them for abuse.
>
> In a sense, "I have only myself to blame" for this, because a year or
> so ago, I jumped all over Seebach for his attacks on Schildt. I feel I
> was right to do so, *et je ne regrette rien*. Nonetheless, I'm tired
> of his lies.
>
> If your "slight" was unintended, I apologize.
>
> Without knowing as much about C as the regs, esp. postmodern C and the
> standards (I'll be the first to concede this), I've left them in the
> dust as regards my challenge. I've completed my solution, although I
> am refining it by adding a stress test and may post a proof elsethread
> proving that the code matches the algorithm statement I've posted
> elsethread, and in so doing I may find a bug.
>
> This is because "knowing C" is different from "knowing how to program"
> and given the serious design flaws of C, there could be "knowing too
> much about C".
>
> But as far as I can see, no one's mastered the problem to the same
> extent, Seebach perhaps least of all, because he wasted too much time
> last week attacking me. He may redeem himself by helping the OP of
> this thread, but he's never even tried to write his own solution.
Update: Willem posted an exciting, if apparently buggy, solution in
the thread where Peter mis-spelled "efficiency" in the name, and I
just had a chance to look at it. It uses recursion in place of any
data structure whatsoever. And in the same thread another poster seems
to claim he did a working (if for me hard to read) solution on day
one: I have asked him to use my test suite.
And yes. A solution WITH BUGS can be more intelligent than a clean
one. A solution that TAKES A LONG TIME can likewise be better than one
that doesn't. I was also prepared to concede victory to Willem despite
his one character identifiers because of the beauty of his idea: use
recursion and not a data structure.
Corporate thinking is applied Positivism and reductionism. Instead of
the intolerable to many effort of thinking, everything becomes a
reductionistic saw or maxim applied without feeling in the manner of
the performance review: so and so uses one character identifiers, or
likes to take more time, or is "verbose", or doesn't have comments, or
has too many comments, so therefore he's "reduced" to the incompetent
cipher, the infinitesimal that everyone in capitalism feels himself to
be, and fears himself to be.
Whereas I was, while setting up a test of Willem's code, rooting for
him. I was willing to hand the gold medal over to him, one character
identifiers and all, because of the beauty of his idea.
"The Good" in capitalist society is always reduced in the Positivist
spirit to something else because of the cash nexus and its alienation
of us from our selves and each other, which leads people around in a
(recursive) ring, chasing goals that in all cases are subgoals of
another goal...on the model of the toxic derivative which is found to
point back to itself.
Whereas if we could just say that The Good is a simple, recognizable
thing, whether a piece of code or a symphony...if we could just trust
our common humanity...imagine there's no Heaven.
But this I know. There are people in this newsgroup driven mad by
reductionism, who have been told that if they follow an external,
alienated code, whether a bunch of cargo-cult programming maxims or
"Jesus", they will be "saved", and it is these people who are starting
the fights when they are confronted with their own alienation.
>
>
>
>
>
> > ;^)
spinoza1111 (3250)
2/15/2010 6:08:59 PM
On Feb 16, 1:41 am, Tom St Denis <t...@iahu.ca> wrote:
> On Feb 14, 3:27 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>
> > To find the length of a string, use strlen unless you have a compelling
> > reason not to. There is no evidence above of a compelling reason not to
> > use strlen.
>
> In production code perhaps, if a student is trying to learn comp.sci
> through expressing ideas in C they are actually BETTER served by
> writing their own versions of algorithms we take for granted.
I think you're talking to a "wall, wondrous high, and covered with
serpent shapes" (the Wanderer): I don't think Heathfield wants people
to learn, at least as free, critical human beings. I think he wants
them to listen, and repeat rote maxims. If he's a teacher, as he seems
to want to be, he's the gym coach in the History Boys:
"Jesus didn't ask to be excused!"
"Actually, sir, he did."
>
> Tom
spinoza1111 (3250)
2/15/2010 6:12:50 PM
"spinoza1111" <spinoza1111@yahoo.com> wrote in message
news:d520a640-1606-407e-9b7f-b9c75f4d5159@s36g2000prf.googlegroups.com...
On Feb 15, 8:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > "Stefan Ram" <r...@zedat.fu-berlin.de> wrote in message
> >
> > news:rand-20100215000605@ram.dialup.fu-berlin.de...
> >
> > > "Chris M. Thomasson" <n...@spam.invalid> writes:
> > >>Or perhaps if you just "know" that those functions are
> > >>very poorly implemented on your platform.
> >
> > > Often, one can indeed assume that »stdlib.h:rand()« is
> > > implemented in such a manner that it is not very random
> > > in the less significant bits - although this knowledge
> > > has nothing to do with the language C but only with the
> > > culture of C implementations.
> >
> > good point. Humm... I would hope that `strstr()' does not commonly use a
> > naive algorithm to search for substrings.
>
> I'd say that it better. It cannot use Boyer Moore or Knuth Morris
> Pratt IF they use tables, and I believe they do, since that implies
> (as far as I can tell) state (like malloc) or else an extra parameter
> in the call.
It fuc%ing better be more efficient than a naive algorithm!
:^o
[...]
>
> And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
> STRINGS. If you use it to construct a table of replace points you're
> gonna have an interesting bug-o-rama:
>
> replace("banana", "ana", "ono")
>
> IF you restart one position after the find point, and not at its end.
Well, I simply did not construct my `replace()' function to detect
overlapping strings. Therefore, if I pass your input to my implementation I
get:
_______________________________________________
src: banana
cmp: ana
xchg: ono
expect: bonona
result: bonona
_______________________________________________
That result is fine with me. Humm... It might be interesting to see if I can
use `strstr()' to build a table that can handle overlapping strings. For the
`banana' example I would have two entries in the table:
1: offset 1
2: offset 3
After processing 1, the destination string is:
bono
After processing 2, the destination string is:
bonono
But that was easy because the exchange string is the exact same size as
comparand string. Things could get "dicey" if the exchange string were,
let's say, bigger '12345'. So, what should the final result look like in an
overlapping replace function for the following input:
replace("banana", "ana", "12345");
?
Would it be:
b1234512345
?
If so, would it be okay for replace("banana", "ana", "ono") to result in:
bonoono
?
We need to work out some rules here... ;^)
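One possible rule, sketched under the assumption that matches are taken left to right and never overlap, which is the behaviour that gives bonona above. replace_all is a hypothetical name, this is not the pastebin code, and it leans on <string.h> throughout.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: returns a malloc'ed copy of src in which each
   non-overlapping, left-to-right occurrence of cmp is replaced by
   xchg. Returns NULL if cmp is empty or if malloc fails.
   The caller frees the result. */
char *replace_all(const char *src, const char *cmp, const char *xchg)
{
    size_t lc = strlen(cmp), lx = strlen(xchg), n = 0;
    const char *p, *q;
    char *out, *w;

    if (lc == 0)
        return NULL;

    /* Pass 1: count matches, resuming at the end of each one. */
    for (p = src; (p = strstr(p, cmp)) != NULL; p += lc)
        n++;

    /* strlen(src) >= n * lc, so the subtraction cannot wrap. */
    out = malloc(strlen(src) - n * lc + n * lx + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy the text before each match, then xchg. */
    w = out;
    for (p = src; (q = strstr(p, cmp)) != NULL; p = q + lc) {
        memcpy(w, p, (size_t)(q - p));
        w += q - p;
        memcpy(w, xchg, lx);
        w += lx;
    }
    strcpy(w, p);   /* the tail after the last match, with its '\0' */
    return out;
}
```

Under this rule replace("banana", "ana", "12345") comes out as b12345na: the second, overlapping "ana" is never seen, because the search resumes after the first match.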
> Moral: don't let the library do your thinking for you.
How do you feel about a garbage collector doing all the thinking for you? I
think a GC is convenient, and I also feel the same way about certain library
functions. However, there are times when you do want to "re-invent"
something. For instance, I am okay with using various manual memory
management techniques to help relieve the pressure on a GC.
no6 (2791)
2/15/2010 6:32:53 PM
"spinoza1111" <spinoza1111@yahoo.com> wrote in message
news:dd0b52a1-6d35-4faf-a428-41fa9fd28101@b7g2000pro.googlegroups.com...
[...]
> Update: Willem posted an exciting, if apparently buggy, solution in
> the thread where Peter mis-spelled "efficiency" in the name, and I
> just had a chance to look at it. It uses recursion in place of any
> data structure whatsoever. And in the same thread another poster seems
> to claim he did a working (if for me hard to read) solution on day
> one: I have asked him to use my test suite.
>
> And yes. A solution WITH BUGS can be more intelligent than a clean
> one. A solution that TAKES A LONG TIME can likewise be better than one
> that doesn't.
>
>
> I was also prepared to concede victory to Willem despite
Humm... I need to ask why would you feel the need to concede victory to
anybody? I thought this was not a contest. What am I missing?
> his one character identifiers because of the beauty of his idea: use
> recursion and not a data structure.
Can I pass it a bomb that can possibly blow the stack? I cannot seem to find
Willem's posting in the thread entitled "Efficency and the standard
library".
[...]
no6 (2791)
2/15/2010 6:45:19 PM
On 2010-02-15, Chris M. Thomasson <no@spam.invalid> wrote:
> Humm... I need to ask why would you feel the need to concede victory to
> anybody? I thought this was not a contest. What am I missing?
When has Nilges ever acted in a way that suggested that he did not view
everything as a contest with winners and losers? I think you're inventing
rationality not in evidence.
-s
usenet-nospam (2216)
2/15/2010 6:45:37 PM
"spinoza1111" <spinoza1111@yahoo.com> wrote in message
news:292491f2-c3ca-45ae-8d8e-355c56325544@x1g2000prb.googlegroups.com...
On Feb 15, 3:53 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> Here is my humble little entry that took me around a half an hour or so to
> create:
>
> http://clc.pastebin.com/f62504e4c
>
> If you want to avoid using `string.h' then you are going to have to
> implement
> the following functions:
> _________________________________________________
[...]
> _________________________________________________
>
[...]
> Any questions?
> Yeah, Chris. I have a question. Why did you call it an "entry" when
> this (to me, anyway) implied that it was a contest entry to the
> Spinoza challenge? Please don't be corrupted by the dishonesty and
> brutality of these newsgroups.
Ahh crap. I was thinking that it was sort of a "challenge" so to speak.
Anyway, I apologize for misrepresenting you.
> Sure, you do say that I have to implement FOUR (4) non-trivial library
> functions.
>
> But by saying "it took me a half hour" I read an implied, perhaps
> unintended slight at the approximately six hours I took ... where only
> in dysfunctional corporations is it a bad thing to take a little extra
> care and a little extra time in anticipation of difficulty.
I actually meant nothing by it. Quite frankly, now that I think about it, I
don't actually know why I posted how long it took me. I mean, who cares
right?
> A week late, in this thread, Seebach, Bacarisse et al. seem to be
> running into confusion trying to help the OP meet the original
> challenge. But I note nobody harassing them or the original poster,
> targeting them for abuse.
What challenge?
> In a sense, "I have only myself to blame" for this, because a year or
> so ago, I jumped all over Seebach for his attacks on Schildt. I feel I
> was right to do so, *et je ne regrette rien*. Nonetheless, I'm tired
> of his lies.
>
> If your "slight" was unintended, I apologize.
It was totally unintended Edward. I did not even think of insulting anybody
by posting how long it took me to flesh out that code.
> Without knowing as much about C as the regs, esp. postmodern C and the
> standards (I'll be the first to concede this), I've left them in the
> dust as regards my challenge.
Again, what challenge are you referring to?
> I've completed my solution, although I
> am refining it by adding a stress test and may post a proof elsethread
> proving that the code matches the algorithm statement I've posted
> elsethread, and in so doing I may find a bug.
Finding a bug is a damn good thing! Nothing wrong with that. I hate it when
somebody gets pissed off when I find a bug in some of their code. They don't
even thank you for pointing it out to them!
Bastards!
> This is because "knowing C" is different from "knowing how to program"
> and given the serious design flaws of C, there could be "knowing too
> much about C".
>
> But as far as I can see, no one's mastered the problem to the same
> extent, Seebach perhaps least of all, because he wasted too much time
> last week attacking me. He may redeem himself by helping the OP of
> this thread, but he's never even tried to write his own solution.
I just cannot really understand why you are trying to avoid `string.h' in
all cases. I mean, if you wanted to re-implement `strstr()', well, that's
fine. However, I don't see a real need to roll your own version of
`strlen()' or `memcpy()'. I mean, how can you do better than a good
implementation of the standard C library? An implementation of `memcpy()'
will most likely be using processor specific instructions that provide a
level of efficiency that cannot be reached with 100% pure portable C code.
no6 (2791)
2/15/2010 7:13:54 PM
On Feb 15, 1:12 pm, spinoza1111 <spinoza1...@yahoo.com> wrote:
> On Feb 16, 1:41 am, Tom St Denis <t...@iahu.ca> wrote:
>
> > On Feb 14, 3:27 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>
> > > To find the length of a string, use strlen unless you have a compelling
> > > reason not to. There is no evidence above of a compelling reason not to
> > > use strlen.
>
> > In production code perhaps, if a student is trying to learn comp.sci
> > through expressing ideas in C they are actually BETTER served by
> > writing their own versions of algorithms we take for granted.
>
> I think you're talking to a "wall, wondrous high, and covered with
> serpent shapes" (the Wanderer): I don't think Heathfield wants people
> to learn, at least as free, critical human beings. I think he wants
> them to listen, and repeat rote maxims. If he's a teacher, as he seems
> to want to be, he's the gym coach in the History Boys:
Nobody, least of all anyone looking like me asked you for your
opinion.
Tom
|
|
0
|
|
|
|
Reply
|
tom236 (284)
|
2/15/2010 7:20:37 PM
|
|
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:4Dgen.97765$CM7.48825@newsfe04.iad...
> "spinoza1111" <spinoza1111@yahoo.com> wrote in message
> news:dd0b52a1-6d35-4faf-a428-41fa9fd28101@b7g2000pro.googlegroups.com...
> [...]
>
>> Update: Willem posted an exciting, if apparently buggy, solution in
>> the thread where Peter mis-spelled "efficiency" in the name, and I
>> just had a chance to look at it. It uses recursion in place of any
>> data structure whatsoever. And in the same thread another poster seems
>> to claim he did a working (if for me hard to read) solution on day
>> one: I have asked him to use my test suite.
>>
>> And yes. A solution WITH BUGS can be more intelligent than a clean
>> one. A solution that TAKES A LONG TIME can likewise be better than one
>> that doesn't.
>>
>>
>> I was also prepared to concede victory to Willem despite
[...]
>> his one character identifiers because of the beauty of his idea: use
>> recursion and not a data structure.
>
> Can I pass it a bomb that can possibly blow the stack? I cannot seem to
> find Willem's posting in the thread entitled "Efficency and the standard
> library".
Ahhh, I found his code in the "Warning to newbies" thread:
http://groups.google.com/group/comp.lang.c/msg/7c6bf8fae5249919
I don't think it can blow the stack because the recursion level is limited.
Also, I found a response to Willem from you:
http://groups.google.com/group/comp.lang.c/msg/5b2b278673c86951
in which you clearly state that this is indeed a "Spinoza challenge":
_____________________________________________________________
spinoza111: "2. I was, and remain, very impressed by your solution and as I
created the source file you see below I was rooting for you: for had
it ran with my test suite, I would have handed the "Olympic gold
medal" that I've awarded myself in the Spinoza challenge to you."
_____________________________________________________________
Now you are confusing me here. First you say it's not a challenge, then you
seem to contradict yourself. Can you please clear this up for me? Thanks.
|
|
0
|
|
|
|
Reply
|
no6 (2791)
|
2/15/2010 7:22:52 PM
|
|
On 2010-02-15, Chris M. Thomasson <no@spam.invalid> wrote:
> "spinoza1111" <spinoza1111@yahoo.com> wrote in message
> news:292491f2-c3ca-45ae-8d8e-355c56325544@x1g2000prb.googlegroups.com...
>> But by saying "it took me a half hour" I read an implied, perhaps
>> unintended slight at the approximately six hours I took ... where only
>> in dysfunctional corporations is it a bad thing to take a little extra
>> care and a little extra time in anticipation of difficulty.
> I actually meant nothing by it. Quite frankly, now that I think about it, I
> don't actually know why I posted how long it took me. I mean, who cares
> right?
Right.
That said, I *did* mean a slight at the alleged six hours Nilges took to
produce a much buggier implementation, because that suggests that his
methodology is bad -- and since he posted his originally specifically as
a criticism of my off-the-cuff example I posted in another thread, I
figured he was looking for comparisons.
In short, he posted a large rant about how unprofessional and unconsidered
and badly-designed my code was, which point he "proved" by demonstrating
that, in only ten times as many lines of code, with four or five times as
many bugs, he could nearly solve a sort of similar problem. Very persuasive.
>> A week late, in this thread, Seebach, Bacarisse et al. seem to be
>> running into confusion trying to help the OP meet the original
>> challenge. But I note nobody harassing them or the original poster,
>> targeting them for abuse.
> What challenge?
I have no clue.
This whole thing started because I posted a snippet of code I found
interesting to make for some vaguely topical stuff. Nilges responded with
angry rants about how bad my code was and a gigantic, buggy, "solution"
to the problem.
>> Without knowing as much about C as the regs, esp. postmodern C and the
>> standards (I'll be the first to concede this), I've left them in the
>> dust as regards my challenge.
> Again, what challenge are you referring to?
The one he thinks it's very offensive that you implied existed.
He's not exactly consistent. At any given time, he believes whatever he
feels makes him look best. This can result in him believing wildly
contradictory things over the course of a post.
> Finding a bug is damn good thing! Nothing wrong with that. I hate it when
> somebody gets pissed off when I find a bug in some of their code. They don't
> even thank you for pointing it out to them!
Agreed!
> I just cannot really understand why you are trying to avoid `string.h' in
> all cases. I mean, if you wanted to re-implement `strstr()', well, that's
> fine. However, I don't see a real need to roll your own version of
> `strlen()' or `memcpy()'. I mean, how can you do better than a good
> implementation of the standard C library? An implementation of `memcpy()'
> will most likely be using processor specific instructions that provide a
> level of efficiency that cannot be reached with 100% pure portable C code.
I have no clue. I also don't see why he thinks my posts about his buggy
code have anything to do with the time or effort it takes to get this done.
When he proposed a more general problem than the one my original effort
solved, I posted a proposed solution, which took about ten minutes to write,
and in which one bug was found so far. (It went into an infinite loop if you
had it matching a zero-length substring.) I fixed that, and it's done. So
far as I can tell, it works for all inputs that don't exhaust memory or
size_t or something similar, and is otherwise unexceptional because the
task is fundamentally a very trivial one.
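For anyone following along, the zero-length case is easy to reproduce: a scan that steps forward by the length of each match steps forward by nothing when the match is empty. Below is a minimal sketch of one way to guard against that, written without <string.h> in the spirit of the thread. It is not the code discussed above; `find_sub` and `count_matches` are names made up for illustration, and rejecting an empty needle is only one possible policy.

```c
#include <stddef.h>

/* Hypothetical helper: pointer to the first occurrence of needle in
   hay, or NULL if there is none. Unlike strstr(), an empty hay never
   matches here, which is fine for this sketch. */
static const char *find_sub(const char *hay, const char *needle)
{
    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (needle[i] != '\0' && hay[i] == needle[i])
            i++;
        if (needle[i] == '\0')
            return hay;
    }
    return NULL;
}

/* Counts non-overlapping occurrences of needle in hay. With an empty
   needle the scan would advance by zero characters after each match
   and never finish, so that input is rejected up front. */
size_t count_matches(const char *hay, const char *needle)
{
    size_t nlen = 0, count = 0;
    const char *p;

    while (needle[nlen] != '\0')
        nlen++;
    if (nlen == 0)
        return 0;               /* guard: nothing sensible to match */

    for (p = find_sub(hay, needle); p != NULL; p = find_sub(p + nlen, needle))
        count++;
    return count;
}
```

With this guard, count_matches("abc", "") returns 0 instead of spinning, and the inputs from the original post ("heee" with "e", "hhhh" with "h") come out as 3 and 4.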
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
|
0
|
|
|
|
Reply
|
usenet-nospam (2216)
|
2/15/2010 7:47:10 PM
|
|
Seebs wrote:
) On 2010-02-15, Chris M. Thomasson <no@spam.invalid> wrote:
)> I just cannot really understand why you are trying to avoid `string.h' in
)> all cases. I mean, if you wanted to re-implement `strstr()', well, that's
)> fine. However, I don't see a real need to roll your own version of
)> `strlen()' or `memcpy()'. I mean, how can you do better than a good
)> implementation of the standard C library? An implementation of `memcpy()'
)> will most likely be using processor specific instructions that provide a
)> level of efficiency that cannot be reached with 100% pure portable C code.
)
) I have no clue. I also don't see why he thinks my posts about his buggy
) code have anything to do with the time or effort it takes to get this done.
) When he proposed a more general problem than the one my original effort
) solved, I posted a proposed solution, which took about ten minutes to write,
) and in which one bug was found so far. (It went into an infinite loop if you
) had it matching a zero-length substring.) I fixed that, and it's done. So
) far as I can tell, it works for all inputs that don't exhaust memory or
) size_t or something similar, and is otherwise unexceptional because the
) task is fundamentally a very trivial one.
I wouldn't call matching a zero-length substring a bug, really. More of an
oversight in the specification. It's comparable to dividing by zero.
The reason I enjoyed coding it up without using string.h functions is
because it's an academic challenge/puzzle. Not a hard one, mind.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
|
|
0
|
|
|
|
Reply
|
willem7123 (117)
|
2/15/2010 8:04:07 PM
|
|
On 2010-02-15, Willem <willem@snail.stack.nl> wrote:
> I wouldn't call matching a zero-length substring a bug, really. More of an
> oversight in the specification. It's comparable to dividing by zero.
My code's behavior was a bug, though -- bad inputs shouldn't cause an
infinite loop.
> The reason I enjoyed coding it up without using string.h functions is
> because it's an academic challenge/puzzle. Not a hard one, mind.
It's interesting, and the recursive strategy is fascinating.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
|
0
|
|
|
|
Reply
|
usenet-nospam (2216)
|
2/15/2010 8:05:43 PM
|
|
On 15/02/2010 18:08, spinoza1111 wrote:
[snip]
> And yes. A solution WITH BUGS can be more intelligent than a clean
> one. A solution that TAKES A LONG TIME can likewise be better than one
> that doesn't. I was also prepared to concede victory to Willem despite
> his one character identifiers because of the beauty of his idea: use
> recursion and not a data structure.
Yes - VICTORY! - that's what we need. Doubtless you will be increasing
the chocolate ration from 30gm/week to 20gm/week in the near future too.
--
Tim
"That the freedom of speech and debates or proceedings in Parliament
ought not to be impeached or questioned in any court or place out of
Parliament"
Bill of Rights 1689
|
|
0
|
|
|
|
Reply
|
timstreater (943)
|
2/15/2010 8:27:52 PM
|
|
"Seebs" <usenet-nospam@seebs.net> wrote in message
news:slrnhnjagd.fm3.usenet-nospam@guild.seebs.net...
> On 2010-02-15, Willem <willem@snail.stack.nl> wrote:
>> I wouldn't call matching a zero-length substring a bug, really. More of
>> an
>> oversight in the specification. It's comparable to dividing by zero.
>
> My code's behavior was a bug, though -- bad inputs shouldn't cause an
> infinite loop.
>
>> The reason I enjoyed coding it up without using string.h functions is
>> because it's an academic challenge/puzzle. Not a hard one, mind.
>
> It's interesting, and the recursive strategy is fascinating.
Yes, I agree that the solution based on recursion is neat. However, any
recursive function tends to make me worry about blowing the stack. Perhaps I
worry too much!
;^)
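For readers who have not seen the idea, here is a rough sketch of how a replace can use recursion in place of a table of match positions. To be clear, this is not Willem's posted code, just one way the approach could look; `replace_all`, `replace_rec`, `len_of` and `starts_with` are hypothetical names. Each call finds the next match and recurses on the rest, so by the time the last call runs the total output length is known and the buffer can be allocated once. The pieces are then copied in as the calls return.

```c
#include <stdlib.h>
#include <stddef.h>

/* Length of a string, without <string.h>. */
static size_t len_of(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Nonzero if needle occurs at the very start of s. */
static int starts_with(const char *s, const char *needle)
{
    while (*needle != '\0') {
        if (*s++ != *needle++)
            return 0;
    }
    return 1;
}

/* out_pos is where this call's piece starts in the final buffer. */
static char *replace_rec(const char *src, const char *needle, size_t nlen,
                         const char *repl, size_t rlen, size_t out_pos)
{
    size_t k = 0, i;
    char *buf;

    /* Scan to the next match, or to the end of the string. */
    while (src[k] != '\0' && !starts_with(src + k, needle))
        k++;

    if (src[k] == '\0') {
        /* No more matches: the full output size is now known. */
        buf = malloc(out_pos + k + 1);
        if (buf == NULL)
            return NULL;
        for (i = 0; i < k; i++)
            buf[out_pos + i] = src[i];
        buf[out_pos + k] = '\0';
        return buf;
    }

    /* Recurse past the match first, then fill in this call's piece. */
    buf = replace_rec(src + k + nlen, needle, nlen, repl, rlen,
                      out_pos + k + rlen);
    if (buf == NULL)
        return NULL;
    for (i = 0; i < k; i++)
        buf[out_pos + i] = src[i];
    for (i = 0; i < rlen; i++)
        buf[out_pos + k + i] = repl[i];
    return buf;
}

/* Returns a malloc'd copy of src with each non-overlapping needle
   replaced by repl, or NULL on allocation failure or empty needle. */
char *replace_all(const char *src, const char *needle, const char *repl)
{
    size_t nlen = len_of(needle);
    if (nlen == 0)
        return NULL;            /* an empty needle would recurse forever */
    return replace_rec(src, needle, nlen, repl, len_of(repl), 0);
}
```

In this sketch the recursion goes one level deeper per match, so the stack worry is real for inputs with a very large number of matches. Whether the posted solution bounds its depth differently is something to check against that code, not this one. The caller frees the result.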
|
|
0
|
|
|
|
Reply
|
no6 (2791)
|
2/15/2010 8:35:06 PM
|
|
"Chris M. Thomasson" wrote:
> Yes, I agree that the solution based on recursion is neat. However, any
> recursive function tends to make me worry about blowing the stack. Perhaps I
> worry to much!
Much as I like recursive solutions for many things, including most of the
parsers I have written, there are some application areas where recursion
is avoided. Most of the
automotive bugs 10 or 15 years ago had a stack depth component and most
code is now written with predictable run time requirements.
Regards
Walter..
--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
walter20 (874)
|
2/15/2010 9:49:40 PM
|
|
On Feb 16, 2:32 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "spinoza1111" <spinoza1...@yahoo.com> wrote in message
>
> news:d520a640-1606-407e-9b7f-b9c75f4d5159@s36g2000prf.googlegroups.com...
> On Feb 15, 8:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
>
>
>
>
> > > "Stefan Ram" <r...@zedat.fu-berlin.de> wrote in message
>
> > >news:rand-20100215000605@ram.dialup.fu-berlin.de...
>
> > > > "Chris M. Thomasson" <n...@spam.invalid> writes:
> > > >>Or perhaps if you just "know" that those functions are
> > > >>very poorly implemented on your platform.
>
> > > > Often, one can indeed assume that »stdlib.h:rand()« is
> > > > implemented in such a manner that it is not very random
> > > > in the less significant bits - although this knowledge
> > > > has nothing to do with the language C but only with the
> > > > culture of C implementations.
>
> > > good point. Humm... I would hope that `strstr()' does not commonly use a
> > > naive algorithm to search for substrings.
>
> > I'd say that it better. It cannot use Boyer Moore or Knuth Morris
> > Pratt IF they use tables, and I believe they do, since that implies
> > (as far as I can tell) state (like malloc) or else an extra parameter
> > in the call.
>
> It fuc%ing better be more efficient than a naive algorithm!
>
> :^o
>
> [...]
>
>
>
> > And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
> > STRINGS. If you use it to construct a table of replace points you're
> > gonna have an interesting bug-o-rama:
>
> > replace("banana", "ana", "ono")
>
> > IF you restart one position after the find point, and not at its end.
>
> Well, I simply did not construct my `replace()' function to detect
> overlapping strings. Therefore, if I pass your input to my implementation I
> get:
> _______________________________________________
> src:    banana
> cmp:    ana
> xchg:   ono
> expect: bonona
> result: bonona
> _______________________________________________
>
> That result is fine with me. Humm... It might be interesting to see if I can
> use `strstr()' to build a table that can handle overlapping strings. For the
> `banana' example I would have two entries in the table:
>
> 1: offset 1
> 2: offset 3
>
> After processing 1, the destination string is:
>
> bono
>
> After processing 2, the destination string is:
>
> bonono
>
> But that was easy because the exchange string is the exact same size as
> comparand string. Things could get "dicey" if the exchange string were,
> let's say, bigger '12345'. So, what should the final result look like in an
> overlapping replace function for the following input:
>
> replace("banana", "ana", "12345");
>
> ?
>
> Would it be:
>
> b1234512345
>
> ?
>
> If so, would it be okay for replace("banana", "ana", "ono") to result in:
>
> bonoono
>
> ?
>
> We need to work out some rules here...   ;^)
>
> > Moral: don't let the library do your thinking for you.
>
> How do you feel about a garbage collector doing all the thinking for you? I
Fine, since garbage collection is simpler than software design. We
have the right to think of software entities coming into existence and
dying without having to be midwives or funeral directors.
> think a GC is convenient, and I also feel the same way about certain library
> functions. However, there are times when you do want to "re-invent"
> something. For instance, I am okay with using various manual memory
> management techniques to help relieve the pressure on a GC.
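
The "banana"/"ana" question quoted above comes down to where the search restarts after a hit: one position later (overlapping matches are found) or just past the end of the match (they are not). A small sketch of both policies follows, written without <string.h>. `match_offsets` and `starts_with` are hypothetical names for illustration, and the `allow_overlap` flag is an assumption of this sketch, not part of anyone's posted replace().

```c
#include <stddef.h>

/* Nonzero if needle occurs at the very start of s. */
static int starts_with(const char *s, const char *needle)
{
    while (*needle != '\0') {
        if (*s++ != *needle++)
            return 0;
    }
    return 1;
}

/* Stores up to max match offsets in out and returns the total number
   of matches found. If allow_overlap is nonzero the scan restarts one
   position after each match; otherwise it restarts just past the end
   of the match. An empty needle yields no matches. */
size_t match_offsets(const char *hay, const char *needle,
                     int allow_overlap, size_t *out, size_t max)
{
    size_t nlen = 0, i = 0, n = 0;

    while (needle[nlen] != '\0')
        nlen++;
    if (nlen == 0)
        return 0;

    while (hay[i] != '\0') {
        if (starts_with(hay + i, needle)) {
            if (n < max)
                out[n] = i;
            n++;
            i += allow_overlap ? 1 : nlen;
        } else {
            i++;
        }
    }
    return n;
}
```

For "banana" and "ana" the overlapping scan reports offsets 1 and 3, the two table entries described in the quoted text, while the non-overlapping scan reports only offset 1. What a replace should do with the two overlapping entries is still a matter of agreeing on rules; this only shows where the entries come from.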
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/16/2010 5:25:29 AM
|
|
On Feb 16, 2:45 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-15, Chris M. Thomasson <n...@spam.invalid> wrote:
>
> > Humm... I need to ask why would you feel the need to concede victory to
> > anybody? I thought this was not a contest. What am I missing?
>
> When has Nilges ever acted in a way that suggested that he did not view
> everything as a contest with winners and losers?  I think you're inventing
It's better to have a contest with winners and losers than a rigged
game with bullies and victims, Seebach. You've forced me to
demonstrate that you're not competent to judge Schildt, but I would
much prefer not having to do this. If you'd behave yourself and start
a night school course in comp sci, then I won't start contests.
> rationality not in evidence.
>
> -s
> --
> Copyright 2010, all wrongs reversed.  Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/16/2010 5:27:47 AM
|
|
On Feb 16, 3:13 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "spinoza1111" <spinoza1...@yahoo.com> wrote in message
>
> news:292491f2-c3ca-45ae-8d8e-355c56325544@x1g2000prb.googlegroups.com...
> On Feb 15, 3:53 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
>
>
>
>
> > Here is my humble little entry that took me around a half an hour or so to
> > create:
>
> >http://clc.pastebin.com/f62504e4c
>
> > If you want to avoid using `string.h' then you are going to have to
> > implement
> > the following functions:
> > _________________________________________________
> [...]
> > _________________________________________________
>
> [...]
> > Any questions?
> > Yeah, Chris. I have a question. Why did you call it an "entry" when
> > this (to me, anyway) implied that it was a contest entry to the
> > Spinoza challenge? Please don't be corrupted by the dishonesty and
> > brutality of these newsgroups.
>
> Ahh crap. I was thinking that it was sort of a "challenge" so to speak.
> Anyway, I apologize for misrepresenting you.
>
No prob.
> > Sure, you do say that I have to implement FOUR (4) non-trivial library
> > functions.
>
> > But by saying "it took me a half hour" I read an implied, perhaps
> > unintended slight at the approximately six hours I took ... where only
> > in dysfunctional corporations is it a bad thing to take a little extra
> > care and a little extra time in anticipation of difficulty.
>
> I actually meant nothing by it. Quite frankly, now that I think about it, I
> don't actually know why I posted how long it took me. I mean, who cares
> right?
Sorry for gettin' hot on the collar but as you can see there are a lot
of ill-intentioned people here.
>
> > A week late, in this thread, Seebach, Bacarisse et al. seem to be
> > running into confusion trying to help the OP meet the original
> > challenge. But I note nobody harassing them or the original poster,
> > targeting them for abuse.
>
> What challenge?
>
Write a replace() function without using string.h.
> > In a sense, "I have only myself to blame" for this, because a year or
> > so ago, I jumped all over Seebach for his attacks on Schildt. I feel I
> > was right to do so, *et je ne regrette rien*. Nonetheless, I'm tired
> > of his lies.
>
> > If your "slight" was unintended, I apologize.
>
> It was totally unintended Edward. I did not even think of insulting anybody
> by posting how long it took me to flesh out that code.
>
> > Without knowing as much about C as the regs, esp. postmodern C and the
> > standards (I'll be the first to concede this), I've left them in the
> > dust as regards my challenge.
>
> Again, what challenge are you referring to?
>
> > I've completed my solution, although I
> > am refining it by adding a stress test and may post a proof elsethread
> > proving that the code matches the algorithm statement I've posted
> > elsethread, and in so doing I may find a bug.
>
> Finding a bug is damn good thing! Nothing wrong with that. I hate it when
> somebody gets pissed off when I find a bug in some of their code. They don't
> even thank you for pointing it out to them!
>
> Bastards!
Swine!
Let's hear it for the good guys! YAY
Let's hear it for the bad guys! BOO
>
> > This is because "knowing C" is different from "knowing how to program"
> > and given the serious design flaws of C, there could be "knowing too
> > much about C".
>
> > But as far as I can see, no-ones mastered the problem to the same
> > extent, Seebach perhaps least of all, because he wasted too much time
> > last week attacking me. He may redeem himself by helping the OP of
> > this thread, but he's never even tried to write his own solution.
>
> I just cannot really understand why you are trying to avoid `string.h' in
> all cases. I mean, if you wanted to re-implement `strstr()', well, that's
> fine. However, I don't see a real need to roll your own version of
> `strlen()' or `memcpy()'. I mean, how can you do better than a good
> implementation of the standard C library? An implementation of `memcpy()'
Actually, in terms of efficiency one often can. Library writers are
men of flesh and blood, and women too.
> will most likely be using processor specific instructions that provide a
> level of efficiency that cannot be reached with 100% pure portable C code.
How is that possible? The compiler of the library code will emit
"processor specific" instructions, to be sure, but it will do the same
for me, or any man. And if the library code forces out assembler code,
then it will only work on one processor, or at best small n processor.
Without any examples in front of me at the time, I'd say that well-
written library routines are basically simple and correct. They have
to run on multiple processors, and can nowhere assume instructions
that execute in one or small n machine cycles.
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/16/2010 5:34:37 AM
|
|
On Feb 16, 3:20 am, Tom St Denis <t...@iahu.ca> wrote:
> On Feb 15, 1:12 pm, spinoza1111 <spinoza1...@yahoo.com> wrote:
>
>
>
>
>
> > On Feb 16, 1:41 am, Tom St Denis <t...@iahu.ca> wrote:
>
> > > On Feb 14, 3:27 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>
> > > > To find the length of a string, use strlen unless you have a compelling
> > > > reason not to. There is no evidence above of a compelling reason not to
> > > > use strlen.
>
> > > In production code perhaps, if a student is trying to learn comp.sci
> > > through expressing ideas in C they are actually BETTER served by
> > > writing their own versions of algorithms we take for granted.
>
> > I think you're talking to a "wall, wondrous high, and covered with
> > serpent shapes" (the Wanderer): I don't think Heathfield wants people
> > to learn, at least as free, critical human beings. I think he wants
> > them to listen, and repeat rote maxims. If he's a teacher, as he seems
> > to want to be, he's the gym coach in the History Boys:
>
> Nobody, least of all anyone looking like me asked you for your
> opinion.
Too bad. It was free. You don't have to pay me.
>
> Tom
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/16/2010 5:35:19 AM
|
|
On Feb 16, 3:22 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Chris M. Thomasson" <n...@spam.invalid> wrote in message news:4Dgen.97765$CM7.48825@newsfe04.iad...
>
>
>
>
>
> > "spinoza1111" <spinoza1...@yahoo.com> wrote in message
> >news:dd0b52a1-6d35-4faf-a428-41fa9fd28101@b7g2000pro.googlegroups.com...
> > [...]
>
> >> Update: Willem posted an exciting, if apparently buggy, solution in
> >> the thread where Peter mis-spelled "efficiency" in the name, and I
> >> just had a chance to look at it. It uses recursion in place of any
> >> data structure whatsoever. And in the same thread another poster seems
> >> to claim he did a working (if for me hard to read) solution on day
> >> one: I have asked him to use my test suite.
>
> >> And yes. A solution WITH BUGS can be more intelligent than a clean
> >> one. A solution that TAKES A LONG TIME can likewise be better than one
> >> that doesn't.
>
> >> I was also prepared to concede victory to Willem despite
> [...]
> >> his one character identifiers because of the beauty of his idea: use
> >> recursion and not a data structure.
>
> > Can I pass it a bomb that can possibly blow the stack? I cannot seem to
> > find Willem's posting in the thread entitled "Efficency and the standard
> > library".
>
> Ahhh, I found his code in the "Warning to newbies" thread:
>
> http://groups.google.com/group/comp.lang.c/msg/7c6bf8fae5249919
>
> I don't think it can blow the stack because the recursion level is limited.
> Also, I found a response to Willem from you:
>
> http://groups.google.com/group/comp.lang.c/msg/5b2b278673c86951
>
> in which you clearly state that this is indeed a "Spinoza challenge":
> _____________________________________________________________
> spinoza111: "2.  I was, and remain, very impressed by your solution and as I
> created the source file you see below I was rooting for you: for had
> it ran with my test suite, I would have handed the "Olympic gold
> medal" that I've awarded myself in the Spinoza challenge to you."
> _____________________________________________________________
>
> Now you are confusing me here. First you say it's not a challenge, then you
> seem to contradict yourself. Can you please clear this up for me? Thanks.
It is a challenge.
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/16/2010 5:35:54 AM
|
|
spinoza1111 wrote:
> On Feb 16, 3:20 am, Tom St Denis <t...@iahu.ca> wrote:
<snip>
>> Nobody, least of all anyone looking like me asked you for your
>> opinion.
>
> Too bad. It was free.
And overpriced.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
|
|
0
|
|
|
|
Reply
|
rjh (10789)
|
2/16/2010 6:34:52 AM
|
|
On Feb 16, 3:47 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-15, Chris M. Thomasson <n...@spam.invalid> wrote:
>
> > "spinoza1111" <spinoza1...@yahoo.com> wrote in message
> >news:292491f2-c3ca-45ae-8d8e-355c56325544@x1g2000prb.googlegroups.com...
> >> But by saying "it took me a half hour" I read an implied, perhaps
> >> unintended slight at the approximately six hours I took ... where only
> >> in dysfunctional corporations is it a bad thing to take a little extra
> >> care and a little extra time in anticipation of difficulty.
> > I actually meant nothing by it. Quite frankly, now that I think about it, I
> > don't actually know why I posted how long it took me. I mean, who cares
> > right?
>
> Right.
>
> That said, I *did* mean a slight at the alleged six hours Nilges took to
> produce a much buggier implementation, because that suggests that his
> methodology is bad -- and since he posted his originally specifically as
> a criticism of my off-the-cuff example I posted in another thread, I
> figured he was looking for comparisons.
>
> In short, he posted a large rant about how unprofessional and unconsidered
> and badly-designed my code was, which point he "proved" by demonstrating
> that, in only ten times as many lines of code, with four or five times as
> many bugs, he could nearly solve a sort of similar problem.  Very persuasive.
You only knew about the bugs because I fixed them, Peter, whereas you
never fixed the bugs you reported in your first attempt. I haven't
been tracking your other code in detail because it doesn't interest
me, but I see in your posts nothing like diligence in testing. You're
so worried about taking "too long", having been in my view corrupted
by corporate life, that you created nothing like a systematic and
growing test suite (I did) nor did you systematically track and
document bugs and changes (I did).
In fact, most of the "six hours" I spent was in documentation and test
creation, not coding. How dare you even compare your buggy and
amateurish work?
If you look at the latest text of "my" replace, you'll see in the
Change Record that each bug save one is labeled with "bug:": one is
labeled "bug fix".
There were, in fact, only five bugs.
And, of course, you're counting in "lines of code" the test suite that
many other posters including Willem have found useful, but which you
probably dare not use.
>
> >> A week late, in this thread, Seebach, Bacarisse et al. seem to be
> >> running into confusion trying to help the OP meet the original
> >> challenge. But I note nobody harassing them or the original poster,
> >> targeting them for abuse.
> > What challenge?
>
> I have no clue.
>
> This whole thing started because I posted a snippet of code I found
> interesting to make for some vaguely topical stuff.  Nilges responded with
> angry rants about how bad my code was and a gigantic, buggy, "solution"
> to the problem.
Five bugs. All fixed. YOU NEVER FIXED the %s bug.
>
> >> Without knowing as much about C as the regs, esp. postmodern C and the
> >> standards (I'll be the first to concede this), I've left them in the
> >> dust as regards my challenge.
> > Again, what challenge are you referring to?
>
> The one he thinks it's very offensive that you implied existed.
>
> He's not exactly consistent.  At any given time, he believes whatever he
> feels makes him look best.  This can result in him believing wildly
> contradictory things over the course of a post.
Your limitations are not my contradictions,
Your failures are not mine,
Your misery is your own history,
So, dear boy, don't whine.
>
> > Finding a bug is damn good thing! Nothing wrong with that. I hate it when
> > somebody gets pissed off when I find a bug in some of their code. They don't
> > even thank you for pointing it out to them!
>
> Agreed!
Asshole. I've bent over backward to acknowledge whatever contributions
you've made. You won't even respond to email.
>
> > I just cannot really understand why you are trying to avoid `string.h' in
> > all cases. I mean, if you wanted to re-implement `strstr()', well, that's
> > fine. However, I don't see a real need to roll your own version of
> > `strlen()' or `memcpy()'. I mean, how can you do better than a good
> > implementation of the standard C library? An implementation of `memcpy()'
> > will most likely be using processor specific instructions that provide a
> > level of efficiency that cannot be reached with 100% pure portable C code.
>
> I have no clue.  I also don't see why he thinks my posts about his buggy
> code have anything to do with the time or effort it takes to get this done.
> When he proposed a more general problem than the one my original effort
> solved, I posted a proposed solution, which took about ten minutes to write,
It uses string.h. It doesn't meet the challenge. God damn, boy, you
are a liar, aren't you?
If I missed where you wrote a bug free replace() without using
string.h, post it
*HERE*
so we can evaluate it.
> and in which one bug was found so far.  (It went into an infinite loop if you
> had it matching a zero-length substring.)  I fixed that, and it's done.  So
> far as I can tell, it works for all inputs that don't exhaust memory or
> size_t or something similar, and is otherwise unexceptional because the
> task is fundamentally a very trivial one.
>
> -s
> --
> Copyright 2010, all wrongs reversed.  Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/16/2010 7:46:02 AM
|
|
spinoza1111 wrote:
> On Feb 16, 3:13 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>> roll your own version of `strlen()' or `memcpy()'. I mean, how can
>> you do better than a good implementation of the standard C library?
>> An implementation of `memcpy()'
>
> Actually, in terms of efficiency one often can. Library writers are
> men of flesh and blood, and women too.
But different men for different implementations. When you write your
strlen() equivalent, you are only going to write one, not dozens. And if you
stick to portable C, it might not be faster.
>> will most likely be using processor specific instructions that
>> provide a level of efficiency that cannot be reached with 100% pure
>> portable C code.
>
> How is that possible? The compiler of the library code will emit
> "processor specific" instructions, to be sure, but it will do the same
> for me, or any man. And if the library code forces out assembler code,
> then it will only work on one processor, or at best small n processor.
I think standard library routines can be written in a language other than C,
or some mix. For example, hand-written assembly.
And the library you use comes with the processor; switch processors, and
there could be a different library routine, optimised a different way (or
just optimised down by the compiler to a couple of inline machine
instructions).
--
Bartc
|
bartc (783)
|
2/16/2010 11:50:29 AM
|
|
On Feb 16, 7:50 pm, "bartc" <ba...@freeuk.com> wrote:
> spinoza1111 wrote:
> > On Feb 16, 3:13 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> >> roll your own version of `strlen()' or `memcpy()'. I mean, how can
> >> you do better than a good implementation of the standard C library?
> >> An implementation of `memcpy()'
>
> > Actually, in terms of efficiency one often can. Library writers are
> > men of flesh and blood, and women too.
>
> But different men for different implementations. When you write your
> strlen() equivalent, you are only going to write one, not dozens. And if you
> stick to portable C, it might not be faster.
>
> >> will most likely be using processor specific instructions that
> >> provide a level of efficiency that cannot be reached with 100% pure
> >> portable C code.
>
> > How is that possible? The compiler of the library code will emit
> > "processor specific" instructions, to be sure, but it will do the same
> > for me, or any man. And if the library code forces out assembler code,
> > then it will only work on one processor, or at best small n processor.
>
> I think standard library routines can be written in a language other than C,
> or some mix. For example, hand-written assembly.
Correct. Wonder how many library routines are written in assembler.
Don't know.
>
> And the library you use comes with the processor; switch processors, and
> there could be a different library routine, optimised a different way (or
> just optimised down by the compiler to a couple of inline machine
> instructions).
Correct o mundo
>
> --
> Bartc
|
spinoza1111 (3250)
|
2/16/2010 12:22:59 PM
|
|
On 16 Feb, 12:22, spinoza1111 <spinoza1...@yahoo.com> wrote:
> On Feb 16, 7:50 pm, "bartc" <ba...@freeuk.com> wrote:
> > spinoza1111 wrote:
> > > On Feb 16, 3:13 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > >> roll your own version of `strlen()' or `memcpy()'. I mean, how can
> > >> you do better than a good implementation of the standard C library?
> > >> An implementation of `memcpy()'
>
> > > Actually, in terms of efficiency one often can. Library writers are
> > > men of flesh and blood, and women too.
>
> > But different men for different implementations. When you write your
> > strlen() equivalent, you are only going to write one, not dozens. And if you
> > stick to portable C, it might not be faster.
>
> > >> will most likely be using processor specific instructions that
> > >> provide a level of efficiency that cannot be reached with 100% pure
> > >> portable C code.
>
> > > How is that possible? The compiler of the library code will emit
> > > "processor specific" instructions, to be sure, but it will do the same
> > > for me, or any man. And if the library code forces out assembler code,
> > > then it will only work on one processor, or at best small n processor.
>
> > I think standard library routines can be written in a language other than C,
> > or some mix. For example, hand-written assembly.
>
> Correct. Wonder how many library routines are written in assembler.
> Don't know.
>
> > And the library you use comes with the processor; switch processors, and
> > there could be a different library routine, optimised a different way (or
> > just optimised down by the compiler to a couple of inline machine
> > instructions).
>
> Correct o mundo
you've been spoilt by the "portable assembler" nature of C. C is
unusual in that much of its standard library can be written in C.
Since Windows and most Unixes are also written in C, calls into the OS
are easy as well.
Many other languages will have the low level parts of their libraries
written in C.
|
nick_keighley_nospam (4575)
|
2/16/2010 2:08:14 PM
|
|
Hi all!
Have finished my program for spinoza's challenge. rewrote everything and
this time i made each statement as simple as possible, so that i can
understand the program. The allSubstr procedure can search for overlapping
sub-strings too like spinoza wanted, but the replace routine doesn't use that
since i can't think how to replace overlapping ones!
haven't used any function from string.h! it works for strings i could think
of but maybe it's got bugs since i'm just a beginner...
how's mine spinoza1111? :)
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

size_t strLength(char *cstr) {
    size_t index = 0;

    while (cstr[index] != '\0') ++index;
    return index;
}

char *strFirstCh(char *str, char ch, size_t lstr) {
    char *chpos = 0;
    size_t current;

    for (current = 0; current < lstr; current++) {
        if (str[current] == ch) {
            chpos = str + current;
            break;
        }
    }
    return chpos;
}

int strComp(char *s, char *t, size_t len) {
    int ret = 0;
    size_t index;

    for (index = 0; index < len; index++) {
        if (s[index] != t[index]) {
            ret = 1;
            break;
        }
    }
    return ret;
}

char *strSubstr(
    char *str,
    char *sub,
    size_t lstr,
    size_t lsub) {
    char *substr = 0;
    char *anchor = str;
    size_t remaining_len = (lstr - lsub) + 1;

    assert(str && sub && lstr && lsub && lstr >= lsub);
    while (remaining_len > 0 && anchor) {
        if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
            if (strComp(anchor, sub, lsub) == 0) {
                substr = anchor;
                break;
            }
            anchor++;
            remaining_len--;
        }
    }
    return substr;
}

unsigned allSubstr(
    char *str,
    char *sub,
    size_t lstr,
    size_t lsub,
    char ***ps,
    int overlap) {
    unsigned occurs = 0;
    unsigned ctr;
    char *orig_str = str;
    size_t orig_lstr = lstr;
    size_t step;

    if (overlap == 1)
        step = 1;
    else
        step = lsub;

    while (lstr >= lsub) {
        str = strSubstr(str, sub, lstr, lsub);
        if (str == 0)
            break;
        occurs++;
        str += step;
        lstr = (orig_str + orig_lstr) - str;
    }

    if (occurs > 0 && ps) {
        str = orig_str;
        lstr = orig_lstr;
        *ps = malloc(occurs * sizeof **ps);
        if (*ps) {
            for (ctr = 0; ctr < occurs; ctr++) {
                ps[0][ctr] = str = strSubstr(str, sub, lstr, lsub);
                str += step;
                lstr = (orig_str + orig_lstr) - str;
            }
        }
    }
    return occurs;
}

char *replace(char *str, char *substr, char *rep) {
    char *new = 0;
    size_t lstr, lsubstr, lrep, lnew, strc, newc, repc, replaced;
    unsigned replacements;
    char **subpos;

    assert(str && substr && rep);
    lstr = strLength(str);
    lsubstr = strLength(substr);
    lrep = strLength(rep);
    if (lstr == 0 || lsubstr == 0 || lsubstr > lstr)
        return 0;
    replacements = allSubstr(str, substr, lstr, lsubstr, &subpos, 0);
    if (replacements > 0) {
        lnew = (lstr - (replacements * lsubstr)) + (replacements * lrep);
        new = malloc(lnew + 1);
        if (!new)
            return 0;
        strc = newc = replaced = 0;
        while (strc <= lstr) {
            if (str + strc == subpos[replaced]) {
                for (repc = 0; repc < lrep; repc++) {
                    new[newc] = rep[repc];
                    newc++;
                }
                replaced++;
                strc += lsubstr;
            }
            else {
                new[newc] = str[strc];
                strc++;
                newc++;
            }
        }
        free(subpos);
    }
    else {
        new = malloc(lstr + 1);
        if (!new)
            return 0;
        for (strc = 0; strc <= lstr; strc++)
            new[strc] = str[strc];
    }
    return new;
}

int main(int argc, char **argv) {
    char *newstr;

    assert(argc == 4);
    newstr = replace(argv[1], argv[2], argv[3]);
    if (newstr)
        printf("%s\n", newstr);
    else
        printf("replace() -> null\n");
    free(newstr);
    return 0;
}

thanks a lot all who helped!
thanks a lot all who helped!
|
no_mail647 (13)
|
2/16/2010 8:24:18 PM
|
|
"fedora" <no_mail@invalid.invalid> wrote in message
news:hleutd$utr$1@news.eternal-september.org...
> Hi all!
>
> Have finished my program for spinoza's challenge. rewrote everything and
> this time i made each statement as simple as posible, so that i can
> understand the program. The allSubstr procedure can search for over
> lapping
> sub-string too like spinoza wanted, but the replace routine doesnt use
> that
> since i cant think how to replace over lapping ones!
>
> haven't used any function from string.h! it works for strings i could
> think
> of but maybe it got bugs since i'm just a beginner...
Seems to be solid enough.
Except, if it can't find a substring (I think for substrings longer than the
text), sometimes it returns the original text unchanged, and sometimes it
returns NULL, eg.
"a", "ab", "" returns NULL, but:
"ab", "x", "" returns "ab"
--
Bartc
|
bartc (783)
|
2/16/2010 9:27:01 PM
|
|
fedora wrote:
> Hi all!
>
> Have finished my program for spinoza's challenge. rewrote everything and
> this time i made each statement as simple as posible, so that i can
> understand the program. The allSubstr procedure can search for over
> lapping sub-string too like spinoza wanted, but the replace routine doesnt
> use that since i cant think how to replace over lapping ones!
>
> haven't used any function from string.h! it works for strings i could
> think of but maybe it got bugs since i'm just a beginner...
>
> how's mine spinoza111? :)
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <assert.h>
>
> size_t strLength(char *cstr) {
> size_t index = 0;
>
> while (cstr[index] != '\0') ++index;
> return index;
> }
>
> char *strFirstCh(char *str, char ch, size_t lstr) {
> char *chpos = 0;
> size_t current;
>
> for (current = 0; current < lstr; current++) {
> if (str[current] == ch) {
> chpos = str + current;
> break;
> }
> }
> return chpos;
> }
>
> int strComp(char *s, char *t, size_t len) {
> int ret = 0;
> size_t index;
>
> for (index = 0; index < len; index++) {
> if (s[index] != t[index]) {
> ret = 1;
> break;
> }
> }
> return ret;
> }
>
> char *strSubstr(
> char *str,
> char *sub,
> size_t lstr,
> size_t lsub) {
> char *substr = 0;
> char *anchor = str;
> size_t remaining_len = (lstr - lsub) + 1;
>
> assert(str && sub && lstr && lsub && lstr >= lsub);
> while (remaining_len > 0 && anchor) {
> if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
> if (strComp(anchor, sub, lsub) == 0) {
> substr = anchor;
> break;
> }
> anchor++;
> remaining_len--;
> }
> }
> return substr;
> }
>
> unsigned allSubstr(
> char *str,
> char *sub,
> size_t lstr,
> size_t lsub,
> char ***ps,
> int overlap) {
> unsigned occurs = 0;
> unsigned ctr;
> char *orig_str = str;
> size_t orig_lstr = lstr;
> size_t step;
>
> if (overlap == 1)
> step = 1;
> else
> step = lsub;
>
> while (lstr >= lsub) {
> str = strSubstr(str, sub, lstr, lsub);
> if (str == 0)
> break;
> occurs++;
> str += step;
> lstr = (orig_str + orig_lstr) - str;
> }
>
> if (occurs > 0 && ps) {
> str = orig_str;
> lstr = orig_lstr;
> *ps = malloc(occurs * sizeof **ps);
> if (*ps) {
> for (ctr = 0; ctr < occurs; ctr++) {
> ps[0][ctr] = str = strSubstr(str, sub, lstr, lsub);
> str += step;
> lstr = (orig_str + orig_lstr) - str;
> }
> }
> }
> return occurs;
> }
>
> char *replace(char *str, char *substr, char *rep) {
> char *new = 0;
> size_t lstr, lsubstr, lrep, lnew, strc, newc, repc, replaced;
> unsigned replacements;
> char **subpos;
>
> assert(str && substr && rep);
> lstr = strLength(str);
> lsubstr = strLength(substr);
> lrep = strLength(rep);
> if (lstr == 0 || lsubstr == 0 || lsubstr > lstr)
> return 0;
> replacements = allSubstr(str, substr, lstr, lsubstr, &subpos, 0);
> if (replacements > 0) {
> lnew = (lstr - (replacements * lsubstr)) + (replacements * lrep);
> new = malloc(lnew + 1);
> if (!new)
> return 0;
Oops... mem leak here! I return without giving back the subpos array. That
should be:
    if (!new) {
        free(subpos);
        return 0;
    }
> strc = newc = replaced = 0;
> while (strc <= lstr) {
> if (str + strc == subpos[replaced]) {
> for (repc = 0; repc < lrep; repc++) {
> new[newc] = rep[repc];
> newc++;
> }
> replaced++;
> strc += lsubstr;
> }
> else {
> new[newc] = str[strc];
> strc++;
> newc++;
> }
> }
> free(subpos);
> }
> else {
> new = malloc(lstr + 1);
> if (!new)
> return 0;
> for (strc = 0; strc <= lstr; strc++)
> new[strc] = str[strc];
> }
> return new;
> }
>
> int main(int argc, char **argv) {
> char *newstr;
>
> assert(argc == 4);
> newstr = replace(argv[1], argv[2], argv[3]);
> if (newstr)
> printf("%s\n", newstr);
> else
> printf("replace() -> null\n");
> free(newstr);
> return 0;
> }
>
> thanks a lot all who helped!
|
no_mail647 (13)
|
2/16/2010 10:10:42 PM
|
|
bartc wrote:
>
> "fedora" <no_mail@invalid.invalid> wrote in message
> news:hleutd$utr$1@news.eternal-september.org...
>> Hi all!
>>
>> Have finished my program for spinoza's challenge. rewrote everything and
>> this time i made each statement as simple as posible, so that i can
>> understand the program. The allSubstr procedure can search for over
>> lapping
>> sub-string too like spinoza wanted, but the replace routine doesnt use
>> that
>> since i cant think how to replace over lapping ones!
>>
>> haven't used any function from string.h! it works for strings i could
>> think
>> of but maybe it got bugs since i'm just a beginner...
>
> Seems to be solid enough.
>
> Except, if it can't find a substring (I think for substrings longer than
> the text), sometimes it returns the original text unchanged, and sometimes
> it returns NULL, eg.
>
> "a", "ab", "" returns NULL, but:
>
> "ab", "x", "" returns "ab"
thanks bartc!
The reason the first one returns a null pointer is that i assume it's
incorrect to call the routine with a substring bigger than the target string.
maybe i should've put it into the assert, but i thought it wasn't serious
enough to crash, so i return a null pointer.
In the second case it returns the original string (not exactly, but a copy of
the original, because the main routine free()s replace's returned pointer and
we can't free() argv[] strings!!) because the sub-string doesn't occur and
there's nothing to replace. replace("ab", "ab", "") will give an empty string.
hope all cases are logical and consistent!
for right to left, we can simply reverse the target string before sending it to
replace(), but i didn't put in the functionality for that. The replace
procedure is very untidy to me! it could've been made much neater and more
efficient if i'd written strcpy() too, but that's not yet done!
anyways, seeing Willem's recursive program makes me ashamed i'm such a stupid
beginner!!
thanks all
|
no_mail647 (13)
|
2/16/2010 10:20:04 PM
|
|
"Chris M. Thomasson" <no@spam.invalid> writes:
> "Chris M. Thomasson" <no@spam.invalid> wrote in message
> news:4Dgen.97765$CM7.48825@newsfe04.iad...
>> "spinoza1111" <spinoza1111@yahoo.com> wrote in message
> Now you are confusing me here. First you say it's not a challenge,
> then you seem to contradict yourself. Can you please clear this up for
> me? Thanks.
Do not feed the troll.
Phil
--
Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1
|
thefatphil_demunged (1562)
|
2/16/2010 10:35:20 PM
|
|
fedora <no_mail@invalid.invalid> writes:
> Have finished my program for spinoza's challenge. rewrote everything and
> this time i made each statement as simple as posible, so that i can
> understand the program. The allSubstr procedure can search for over lapping
> sub-string too like spinoza wanted, but the replace routine doesnt use that
> since i cant think how to replace over lapping ones!
>
> haven't used any function from string.h! it works for strings i could think
> of but maybe it got bugs since i'm just a beginner...
The result is good, but I am not sure you were right to accept the
peculiar notion of not using standard string functions. If you felt
you had to, why not use standard functions but then plug in your own
versions? That way you learn about the standard library and get to
write the character-fiddling functions that can be useful learning
exercises.
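One way to read the suggestion above, as a minimal sketch rather than anything from the posted program: give the home-made function the same signature as its standard counterpart and choose between the two in a single place. The names myStrlen, STR_LEN and USE_STANDARD_STRING are invented here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_STANDARD_STRING      /* hypothetical build switch */
#include <string.h>
#define STR_LEN strlen
#else
/* Home-made replacement with the same signature as strlen(). */
static size_t myStrlen(const char *s) {
    size_t n = 0;

    assert(s);                  /* a null pointer here is a caller bug */
    while (s[n] != '\0')
        ++n;
    return n;
}
#define STR_LEN myStrlen
#endif
```

The rest of the program would call STR_LEN(...) everywhere, so the exercise version and the library version can be compared by rebuilding with or without the switch.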
I'll make a few detailed comments (one is a bug), but on the "big
picture" I don't see why you mix size_t and unsigned all over the
place. I'd stick to size_t.
Finally, I think it is odd to return a null result when the substring
is too long to match. I'd treat it like any other substring that does
not match.
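A minimal sketch of that more uniform behaviour, under the assumption that "no match" should always produce a plain copy and that a null result is reserved for allocation failure. This is not the posted program: lenOf, sameChars and replaceUniform are hypothetical names, and the search is a simple left-to-right scan for non-overlapping matches.

```c
#include <assert.h>
#include <stdlib.h>

static size_t lenOf(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

/* 1 if the first len characters of s and t are equal, else 0. */
static int sameChars(const char *s, const char *t, size_t len) {
    size_t i;
    for (i = 0; i < len && s[i] == t[i]; i++)
        ;
    return i == len;
}

/* Returns a new string with each non-overlapping sub replaced by rep.
   An empty sub, or a sub longer than str, simply never matches. */
char *replaceUniform(const char *str, const char *sub, const char *rep) {
    size_t lstr, lsub, lrep, i, k, out = 0, count = 0;
    char *result;

    assert(str && sub && rep);
    lstr = lenOf(str); lsub = lenOf(sub); lrep = lenOf(rep);

    /* First pass: count matches (skipped when none is possible). */
    if (lsub > 0 && lsub <= lstr) {
        for (i = 0; i + lsub <= lstr; ) {
            if (sameChars(str + i, sub, lsub)) { count++; i += lsub; }
            else i++;
        }
    }
    result = malloc(lstr - count * lsub + count * lrep + 1);
    if (!result)
        return NULL;            /* the only case that yields NULL */

    /* Second pass: copy, writing rep wherever the first pass matched. */
    for (i = 0; i < lstr; ) {
        if (count > 0 && i + lsub <= lstr && sameChars(str + i, sub, lsub)) {
            for (k = 0; k < lrep; k++) result[out++] = rep[k];
            i += lsub;
        } else {
            result[out++] = str[i++];
        }
    }
    result[out] = '\0';
    return result;
}
```

With this shape, both of bartc's inputs ("a", "ab", "" and "ab", "x", "") come back as copies of the original text.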
<snip>
> #include <stdlib.h>
> #include <stdio.h>
> #include <assert.h>
>
> size_t strLength(char *cstr) {
> size_t index = 0;
>
> while (cstr[index] != '\0') ++index;
> return index;
> }
>
> char *strFirstCh(char *str, char ch, size_t lstr) {
> char *chpos = 0;
> size_t current;
>
> for (current = 0; current < lstr; current++) {
> if (str[current] == ch) {
> chpos = str + current;
> break;
> }
> }
> return chpos;
> }
>
> int strComp(char *s, char *t, size_t len) {
> int ret = 0;
> size_t index;
>
> for (index = 0; index < len; index++) {
> if (s[index] != t[index]) {
> ret = 1;
> break;
> }
> }
> return ret;
Small point: ret is the same as index != len. Can you see why? Many
C programmers would just return index != len here. Also, I'd reverse
the sense of the returned value and call the function strEqual.
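A sketch of the function after both changes, assuming the name strEqual as suggested and adding const qualifiers that the posted code does not have:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the first len characters of s and t are equal, else 0. */
int strEqual(const char *s, const char *t, size_t len) {
    size_t index;

    assert(s && t);
    for (index = 0; index < len; index++) {
        if (s[index] != t[index])
            break;
    }
    /* index reaches len only when the loop found no mismatch. */
    return index == len;
}
```

A caller would then write if (strEqual(anchor, sub, lsub)) instead of comparing the result with 0.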
> }
>
> char *strSubstr(
> char *str,
> char *sub,
> size_t lstr,
> size_t lsub) {
> char *substr = 0;
> char *anchor = str;
> size_t remaining_len = (lstr - lsub) + 1;
>
> assert(str && sub && lstr && lsub && lstr >= lsub);
> while (remaining_len > 0 && anchor) {
> if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
> if (strComp(anchor, sub, lsub) == 0) {
> substr = anchor;
> break;
> }
> anchor++;
> remaining_len--;
> }
> }
> return substr;
> }
>
> unsigned allSubstr(
> char *str,
> char *sub,
> size_t lstr,
> size_t lsub,
> char ***ps,
> int overlap) {
> unsigned occurs = 0;
> unsigned ctr;
> char *orig_str = str;
> size_t orig_lstr = lstr;
> size_t step;
>
> if (overlap == 1)
> step = 1;
> else
> step = lsub;
>
> while (lstr >= lsub) {
> str = strSubstr(str, sub, lstr, lsub);
> if (str == 0)
> break;
> occurs++;
> str += step;
> lstr = (orig_str + orig_lstr) - str;
> }
>
> if (occurs > 0 && ps) {
> str = orig_str;
> lstr = orig_lstr;
> *ps = malloc(occurs * sizeof **ps);
> if (*ps) {
> for (ctr = 0; ctr < occurs; ctr++) {
> ps[0][ctr] = str = strSubstr(str, sub, lstr, lsub);
> str += step;
> lstr = (orig_str + orig_lstr) - str;
> }
> }
> }
> return occurs;
> }
>
> char *replace(char *str, char *substr, char *rep) {
> char *new = 0;
> size_t lstr, lsubstr, lrep, lnew, strc, newc, repc, replaced;
> unsigned replacements;
> char **subpos;
>
> assert(str && substr && rep);
> lstr = strLength(str);
> lsubstr = strLength(substr);
> lrep = strLength(rep);
> if (lstr == 0 || lsubstr == 0 || lsubstr > lstr)
> return 0;
> replacements = allSubstr(str, substr, lstr, lsubstr, &subpos, 0);
> if (replacements > 0) {
> lnew = (lstr - (replacements * lsubstr)) + (replacements * lrep);
> new = malloc(lnew + 1);
> if (!new)
> return 0;
> strc = newc = replaced = 0;
> while (strc <= lstr) {
> if (str + strc == subpos[replaced]) {
You have a bug here. replaced can become equal to the size of the
subpos array and, hence, you index outside of it.
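One possible guard, sketched as a standalone copy loop so it can be tried in isolation. The function name copyWithReplacements and its parameter list are assumptions made for this example; the only change relative to the quoted loop is the replaced < npos test before subpos is indexed.

```c
#include <assert.h>
#include <stddef.h>

/* Copies str (length lstr) into out, writing rep at every position
   listed in subpos[0..npos-1]. out must be large enough for the result.
   Returns the number of characters written, not counting the NUL. */
size_t copyWithReplacements(const char *str, size_t lstr, size_t lsub,
                            const char *rep, size_t lrep,
                            char *const *subpos, size_t npos, char *out) {
    size_t strc = 0, newc = 0, replaced = 0, repc;

    assert(str && rep && out && (subpos || npos == 0));
    while (strc <= lstr) {
        /* Test replaced < npos first, so subpos is never read past its end. */
        if (replaced < npos && str + strc == subpos[replaced]) {
            for (repc = 0; repc < lrep; repc++)
                out[newc++] = rep[repc];
            replaced++;
            strc += lsub;
        } else {
            out[newc++] = str[strc++];
        }
    }
    return newc - 1;    /* the final iteration copied the terminating NUL */
}
```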
> for (repc = 0; repc < lrep; repc++) {
> new[newc] = rep[repc];
> newc++;
> }
> replaced++;
> strc += lsubstr;
> }
> else {
> new[newc] = str[strc];
> strc++;
> newc++;
> }
> }
> free(subpos);
> }
> else {
> new = malloc(lstr + 1);
> if (!new)
> return 0;
> for (strc = 0; strc <= lstr; strc++)
> new[strc] = str[strc];
> }
> return new;
> }
>
> int main(int argc, char **argv) {
> char *newstr;
>
> assert(argc == 4);
I don't think this is a good use of assert. It is almost always
wrong to use it to check user input. I'd just use an "if".
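A small sketch of the "if" version, assuming a helper named checkArgs (not part of the posted program) that main() would call first. assert is kept only for a condition that would be a bug in the calling code rather than a user mistake.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 when exactly three operands were given; otherwise prints a
   usage line to stderr and returns 0. */
int checkArgs(int argc, char **argv) {
    assert(argv);               /* programmer error, not user input */
    if (argc != 4) {
        fprintf(stderr, "usage: %s string substring replacement\n",
                argc > 0 ? argv[0] : "replace");
        return 0;
    }
    return 1;
}
```

main() could then begin with if (!checkArgs(argc, argv)) return EXIT_FAILURE; so that a wrong argument count gives a message instead of an abort, and still does so when the program is built with NDEBUG.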
> newstr = replace(argv[1], argv[2], argv[3]);
> if (newstr)
> printf("%s\n", newstr);
> else
> printf("replace() -> null\n");
> free(newstr);
> return 0;
> }
--
Ben.
|
ben.usenet (6516)
|
2/16/2010 10:42:03 PM
|
|
"Walter Banks" <walter@bytecraft.com> wrote in message
news:4B79C174.FBB32150@bytecraft.com...
>
>
> "Chris M. Thomasson" wrote:
>
>> Yes, I agree that the solution based on recursion is neat. However, any
>> recursive function tends to make me worry about blowing the stack.
>> Perhaps I
>> worry to much!
>
> As much as I like recursive solutions for many things including most of
> the
> parsers I have written.
>
> There are some application areas where recursion is avoided. Most of the
> automotive bugs 10 or 15 years ago had a stack depth component and most
> code is now written with predictable run time requirements.
I do not "necessarily" want to restrict "potential" user input in order to
get around the limitations of a recursive function in an environment that
has a rather small per-thread stack size. If you can create an iterative
solution, then I say go ahead and do it. This may work out when you realize
that the limits you set on a recursive function are too great to run on a
system that has lower per-task/thread stack size.
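For comparison, a minimal iterative sketch: counting non-overlapping matches with a plain loop, so stack use stays constant however many matches there are. The function name countMatches is hypothetical, and this is only the counting step, not a full replace().

```c
#include <assert.h>
#include <stddef.h>

/* Counts non-overlapping occurrences of sub (length lsub) in str
   (length lstr) without recursion. */
size_t countMatches(const char *str, size_t lstr,
                    const char *sub, size_t lsub) {
    size_t i = 0, j, count = 0;

    assert(str && sub);
    if (lsub == 0)
        return 0;               /* an empty pattern is treated as no match */
    while (i + lsub <= lstr) {
        for (j = 0; j < lsub && str[i + j] == sub[j]; j++)
            ;
        if (j == lsub) {        /* full match: skip past it */
            count++;
            i += lsub;
        } else {
            i++;
        }
    }
    return count;
}
```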
|
no6 (2791)
|
2/16/2010 10:58:55 PM
|
|
"spinoza1111" <spinoza1111@yahoo.com> wrote in message
news:f12dcf90-987d-4b72-ac28-363761b398d2@z10g2000prh.googlegroups.com...
On Feb 16, 7:50 pm, "bartc" <ba...@freeuk.com> wrote:
> > spinoza1111wrote:
> > > On Feb 16, 3:13 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > >> roll your own version of `strlen()' or `memcpy()'. I mean, how can
> > >> you do better than a good implementation of the standard C library?
> > >> An implementation of `memcpy()'
> >
> > > Actually, in terms of efficiency one often can. Library writers are
> > > men of flesh and blood, and women too.
> >
> > But different men for different implementations. When you write your
> > strlen() equivalent, you are only going to write one, not dozens. And if
> > you
> > stick to portable C, it might not be faster.
> >
> > >> will most likely be using processor specific instructions that
> > >> provide a level of efficiency that cannot be reached with 100% pure
> > >> portable C code.
> >
> > > How is that possible? The compiler of the library code will emit
> > > "processor specific" instructions, to be sure, but it will do the same
> > > for me, or any man. And if the library code forces out assembler code,
> > > then it will only work on one processor, or at best small n processor.
> >
> > I think standard library routines can be written in a language other
> > than C,
> > or some mix. For example, hand-written assembly.
> Correct. Wonder how many library routines are written in assembler.
> Don't know.
Well, a standard library implementation in the form of a deferment to native
OS library calls can be written in 100% assembly language. Perhaps a
standard C header can defer to highly efficient native OS provided
primitives.
|
no6 (2791)
|
2/16/2010 11:25:16 PM
|
|
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:tPFen.186968$Fm7.142869@newsfe16.iad...
[...]
> Well, a standard library implementation in the form of a deferment to
> native OS library calls can be written in 100% assembly language.
Ummm... The above should probably read as:
__________________________________________________________
Well, a standard library implementation can defer to native OS provided
library calls that happen to be implemented in 100% assembly language...
__________________________________________________________
Is that any better?
;^o
> Perhaps a standard C header can defer to highly efficient native OS
> provided primitives.
|
no6 (2791)
|
2/16/2010 11:30:02 PM
|
|
On Feb 17, 6:35 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:
> "Chris M. Thomasson" <n...@spam.invalid> writes:
>
> > "Chris M. Thomasson" <n...@spam.invalid> wrote in message
> >news:4Dgen.97765$CM7.48825@newsfe04.iad...
> >> "spinoza1111" <spinoza1...@yahoo.com> wrote in message
> > Now you are confusing me here. First you say it's not a challenge,
> > then you seem to contradict yourself. Can you please clear this up for
> > me? Thanks.
>
> Do not feed the troll.
I am not a "troll", and "troll" is a Nordic racist word, referring as
it does to peoples pushed out of Western Europe by invaders after the
fall of Rome. I have been discussing and submitting code written in C.
A "troll" is one who posts insincerely in order to get a rise out of
people. I do not do so.
Richard Heathfield is by no means a friend of mine, yet he has gone on
record to say that I am not a "troll".
Please keep your comments on-topic if you cannot keep them civil.
>
> Phil
> --
> Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1
|
spinoza1111 (3250)
|
2/17/2010 3:37:15 AM
|
|
On Feb 17, 6:58 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Walter Banks" <wal...@bytecraft.com> wrote in message
> news:4B79C174.FBB32150@bytecraft.com...
>
> > "Chris M. Thomasson" wrote:
>
> >> Yes, I agree that the solution based on recursion is neat. However, any
> >> recursive function tends to make me worry about blowing the stack.
> >> Perhaps I
> >> worry to much!
>
> > As much as I like recursive solutions for many things including most of
> > the
> > parsers I have written.
>
> > There are some application areas where recursion is avoided. Most of the
> > automotive bugs 10 or 15 years ago had a stack depth component and most
> > code is now written with predictable run time requirements.
>
> I do not "necessarily" want to restrict "potential" user input in order to
> get around the limitations of a recursive function in an environment that
> has a rather small per-thread stack size. If you can create an iterative
> solution, then I say go ahead and do it. This may work out when you realize
> that the limits you set on a recursive function are to great to run on a
> system that has lower per-task/thread stack size.
Recursive solutions are in the thread where Seebs misspelled
"efficiency" in the header, in C by Willem and in C Sharp by myself.
Each solution will put something on the stack for each occurrence of
the target. But if you don't recurse, then you need to create one of
my segments in my linked list. The stack frame is a couple of
addresses as is the segment.
Therefore, it seems to me that there's a minimum storage complexity to
the problem and not just to these two solutions. In a language/
computer where strings could grow magically this would not be the
case, but this would be the existence as if by magic of hardware that
could hey presto realloc strings "under the covers".
Here, the storage complexity is real if hidden. It is the need to keep
finding and reallocing small strings, or getting a big string and
discarding what you don't need.
And if you implement replace() in assembler, you still have the same
storage complexity.
The most storage efficient solution for C's format for strings would
examine the target and replacement strings. If the target length is
greater than or equal to that of the replacement, do the
transformation *in situ* by shifting bytes left. If the target length
is less, you must reallocate, perhaps more than once.
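The first case can be sketched as follows, assuming a writable NUL-terminated buffer and a replacement no longer than the target. The name replaceInPlace and the (size_t)-1 error return are choices made for this example; the growing case, which needs reallocation, is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Replaces non-overlapping occurrences of sub with rep inside buf itself,
   shifting the remaining bytes left. Returns the new length, or
   (size_t)-1 if sub is empty or rep is longer than sub. */
size_t replaceInPlace(char *buf, const char *sub, const char *rep) {
    size_t lsub = 0, lrep = 0, rd = 0, wr = 0, j;

    assert(buf && sub && rep);
    while (sub[lsub] != '\0') ++lsub;
    while (rep[lrep] != '\0') ++lrep;
    if (lsub == 0 || lrep > lsub)
        return (size_t)-1;

    while (buf[rd] != '\0') {
        /* Compare at the read position; a short tail stops at buf's NUL. */
        for (j = 0; j < lsub && buf[rd + j] == sub[j]; j++)
            ;
        if (j == lsub) {
            for (j = 0; j < lrep; j++)
                buf[wr++] = rep[j];     /* wr never overtakes rd */
            rd += lsub;
        } else {
            buf[wr++] = buf[rd++];
        }
    }
    buf[wr] = '\0';
    return wr;
}
```

Because the write index never passes the read index, no extra storage is needed beyond a few counters.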
Which is why C programmers need to understand that C doesn't support
strings out of the box. Instead, it provides a silly set of solutions
based on the absurd idea that it makes sense to terminate a string
with a Nul.
C without strings could be a sensible language for low-level
programming of toys. C with strings needs to use a standardized, open
source modern string.H that represents strings as linked lists.
|
spinoza1111 (3250)
|
2/17/2010 3:49:39 AM
|
|
"spinoza1111" <spinoza1111@yahoo.com> wrote in message
news:6f22a1d8-6b97-4821-b990-ad69482e24c7@t17g2000prg.googlegroups.com...
[...]
> > > Moral: don't let the library do your thinking for you.
> >
> > How do you feel about a garbage collector doing all the thinking for
> > you? I
> Fine, since garbage collection is simpler than software design. We
> have the right to think of software entities coming into existence and
> dying without having to be midwifes or funeral directors.
What about forgetting to set a reference to NULL? Sometimes, you can
unnecessarily extend the lifetimes of objects in a pure GC system if you
forget to set certain object references to NULL. IMHO, a GC does not mean
you have to check your brain at the door.
> think a GC is convenient, and I also feel the same way about certain
> library
> functions. However, there are times when you do want to "re-invent"
> something. For instance, I am okay with using various manual memory
> management techniques to help relieve the pressure on a GC.
|
no6 (2791)
|
2/17/2010 5:06:33 AM
|
|
On Feb 17, 1:06 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "spinoza1111" <spinoza1...@yahoo.com> wrote in message
>
> news:6f22a1d8-6b97-4821-b990-ad69482e24c7@t17g2000prg.googlegroups.com...
> [...]
>
> > > > Moral: don't let the library do your thinking for you.
>
> > > How do you feel about a garbage collector doing all the thinking for
> > > you? I
> > Fine, since garbage collection is simpler than software design. We
> > have the right to think of software entities coming into existence and
> > dying without having to be midwifes or funeral directors.
>
> What about forgetting to set a reference to NULL? Sometimes, you can
> unnecessarily extend the lifetimes of objects in a pure GC system if you
> forget to set certain object references to NULL. IMHO, a GC does not mean
> you have to check you're brain at the door.
>
>
>
> > think a GC is convenient, and I also feel the same way about certain
> > library
> > functions. However, there are times when you do want to "re-invent"
> > something. For instance, I am okay with using various manual memory
> > management techniques to help relieve the pressure on a GC.
True. In my book "Build Your Own .Net Language and Compiler" I have a
methodology for stateless objects which requires the user to dispose
the object calling a dispose() method. This allows the object to set
all its references to objects from the heap to null.
Precisely because you still need a brain when you have a garbage
collector means that you need the garbage collector since there's no
reason to waste fine brains on manual memory management when those
brains could be solving more important problems.
It is true, however, that you might run out of Fun Stuff to Think
About just as airline pilots in modern high tech cockpits traveling
across the Pacific might get bored. Too bad. I need the pilot to be
bored. I don't want him to have fun or be challenged.
|
spinoza1111 (3250)
|
2/17/2010 6:46:56 AM
|
|
spinoza1111 wrote:
>
> C without strings could be a sensible language for low-level
> programming of toys. C with strings needs to use a standardized, open
> source modern string.H that represents strings as linked lists.
I think that you could write a C program that will beat the current C++ program
in this benchmark.
All in all it seems that g++ is 6 times faster than gcc and also
java server is slightly faster than gcc according to this benchmark.
I guess they don't test different algorithms, rather the same algorithm
in different language implementations, but I could be wrong.
According to this C is slower than java and both consumes more memory
than c++ and is 6 times slower for same algorithm?
http://shootout.alioth.debian.org/u64/performance.php?test=knucleotide&sort=fullcpu
  ×  Program Source Code  CPU secs  Elapsed secs  Memory KB  Code B  ≈ CPU Load
1.0  C++ GNU g++ #6          11.22         11.23    142,288    3415  0% 0% 0% 100%
2.0  C++ GNU g++             22.30         22.30    135,788    2106  0% 0% 0% 100%
3.1  Ada 2005 GNAT #2        34.30         34.36    256,660    4865  0% 0% 0% 100%
4.2  Java 6 -server #2       47.33         47.41    490,660    1602  0% 0% 0% 100%
5.0  Java 6 -server          56.15         56.29  1,295,096    1330  0% 0% 0% 100%
5.0  C GNU gcc #6            56.21         56.25    180,540    2439  0% 0% 0% 100%
For example on my machine, classic c++ program that I would write
lasts:
real 0m58.659s
user 0m58.390s
sys 0m0.270s
I need 10 seconds for getline from std::cin into a string, then to pack it
into array of chars, let alone rest of processing.
but from this site benchmarking c++ program takes this time on my
machine:
real 0m4.306s
user 0m7.880s
sys 0m0.120s
Whoa!
Greets
|
bmaxa209 (243)
|
2/17/2010 6:55:53 AM
|
|
spinoza1111 wrote:
> On Feb 17, 6:35 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
> wrote:
<snip>
>> Do not feed the troll.
>
<snip>
> Richard Heathfield is by no means a friend of mine, yet he has gone on
> record to say that I am not a "troll".
That is true. I have also gone on record as saying that you're an idiot.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
rjh (10789)
2/17/2010 9:29:50 AM

On Feb 17, 5:29 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> spinoza1111 wrote:
> > On Feb 17, 6:35 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
> > wrote:
> <snip>
> >> Do not feed the troll.
>
> <snip>
>
> > Richard Heathfield is by no means a friend of mine, yet he has gone on
> > record to say that I am not a "troll".
>
> That is true. I have also gone on record as saying that you're an idiot.
That is correct. And I have gone on record as saying you're a fool.
So, we agree that I am not a troll.
>
> --
> Richard Heathfield <http://www.cpax.org.uk>
> Email: -http://www. +rjh@
> "Usenet is a strange place" - dmr 29 July 1999
> Sig line vacant - apply within
spinoza1111 (3250)
2/17/2010 9:59:32 AM

Ben Bacarisse wrote:
> fedora <no_mail@invalid.invalid> writes:
>
>> Have finished my program for spinoza's challenge. rewrote everything and
>> this time i made each statement as simple as posible, so that i can
>> understand the program. The allSubstr procedure can search for over
>> lapping sub-string too like spinoza wanted, but the replace routine
>> doesnt use that since i cant think how to replace over lapping ones!
>>
>> haven't used any function from string.h! it works for strings i could
>> think of but maybe it got bugs since i'm just a beginner...
>
> The result is good, but I am not sure you were right to accept the
> peculiar notion of not using standard string functions. If you felt
> you had to, why not use standard functions but then plug-in you own
> versions? That way you learn about the standard library and get to
> write the character-fiddling functions that can be useful learning
> exercises.
Thanks Ben! I didn't use stdlib functions because i wanted to see how
easy/difficult it would be to write my own. also spinoza didn't accept
programs that called functions in string.h.
About plugging in my own versions, do you mean writing routines with the
same name so that at linking the linker finds my lib first and plug it in?
but i read somewhere that using ansi c's namespace and relying on linker is
all undefined and gcc can replace calls to stdlib functions with inline code
so bypassing my code too... but i'll try it.
> I'll make a few detailed comments (one is a bug), but on the "big
> picture" I don't see why you mix size_t and unsigned all over the
> place. I'd stick to size_t.
ok.
> Finally, I think it is odd to return a null result when the substring
> is too long to match. I'd treat is like any other substring that does
> not match.
and just return copy of original target str? Okay, i'll change code to do
that but right now i'm having bigger problems.
> <snip>
>> #include <stdlib.h>
>> #include <stdio.h>
>> #include <assert.h>
>>
>> size_t strLength(char *cstr) {
>> size_t index = 0;
>>
>> while (cstr[index] != '\0') ++index;
>> return index;
>> }
>>
>> char *strFirstCh(char *str, char ch, size_t lstr) {
>> char *chpos = 0;
>> size_t current;
>>
>> for (current = 0; current < lstr; current++) {
>> if (str[current] == ch) {
>> chpos = str + current;
>> break;
>> }
>> }
>> return chpos;
>> }
>>
>> int strComp(char *s, char *t, size_t len) {
>> int ret = 0;
>> size_t index;
>>
>> for (index = 0; index < len; index++) {
>> if (s[index] != t[index]) {
>> ret = 1;
>> break;
>> }
>> }
>> return ret;
>
> Small point: ret is the same as index != len. Can you see why? Many
> C programmers would just return index != len here. Also, I'd reverse
> the sense of the returned value and call the function strEqual.
okay. that's better than modelling after stdlib strcmp. my reasoning was
since there's only one case where two strings can be identical but many
cases where they can differ i'd use the one bool value for the first case
(false) and return different true values for un-equal cases. but i think
i've thought of it wrongly. boolean false and true are both unique values
and not like C's definition of true.
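For illustration, the suggestion quoted above (return index != len, with the sense reversed and the function renamed) might look roughly like the sketch below. This is only an assumption about what such a strEqual could be, not code posted in the thread; it returns 1 for equal and 0 for different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strEqual: returns 1 when the first len characters of
   s and t are identical, 0 otherwise. */
int strEqual(const char *s, const char *t, size_t len)
{
    size_t index;

    assert(s != NULL && t != NULL);
    for (index = 0; index < len; index++) {
        if (s[index] != t[index])
            break;
    }
    /* The loop stops early only on a mismatch, so index == len
       means every character matched. */
    return index == len;
}
```

A caller would then write if (strEqual(anchor, sub, lsub)) rather than comparing the result with 0.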
>> }
>>
>> char *strSubstr(
>> char *str,
>> char *sub,
>> size_t lstr,
>> size_t lsub) {
>> char *substr = 0;
>> char *anchor = str;
>> size_t remaining_len = (lstr - lsub) + 1;
>>
>> assert(str && sub && lstr && lsub && lstr >= lsub);
>> while (remaining_len > 0 && anchor) {
>> if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
>> if (strComp(anchor, sub, lsub) == 0) {
>> substr = anchor;
>> break;
>> }
>> anchor++;
>> remaining_len--;
>> }
>> }
>> return substr;
>> }
>>
>> unsigned allSubstr(
>> char *str,
>> char *sub,
>> size_t lstr,
>> size_t lsub,
>> char ***ps,
>> int overlap) {
>> unsigned occurs = 0;
>> unsigned ctr;
>> char *orig_str = str;
>> size_t orig_lstr = lstr;
>> size_t step;
>>
>> if (overlap == 1)
>> step = 1;
>> else
>> step = lsub;
>>
>> while (lstr >= lsub) {
>> str = strSubstr(str, sub, lstr, lsub);
>> if (str == 0)
>> break;
>> occurs++;
>> str += step;
>> lstr = (orig_str + orig_lstr) - str;
>> }
>>
>> if (occurs > 0 && ps) {
>> str = orig_str;
>> lstr = orig_lstr;
>> *ps = malloc(occurs * sizeof **ps);
>> if (*ps) {
>> for (ctr = 0; ctr < occurs; ctr++) {
>> ps[0][ctr] = str = strSubstr(str, sub, lstr, lsub);
>> str += step;
>> lstr = (orig_str + orig_lstr) - str;
>> }
>> }
>> }
>> return occurs;
>> }
>>
>> char *replace(char *str, char *substr, char *rep) {
>> char *new = 0;
>> size_t lstr, lsubstr, lrep, lnew, strc, newc, repc, replaced;
>> unsigned replacements;
>> char **subpos;
>>
>> assert(str && substr && rep);
>> lstr = strLength(str);
>> lsubstr = strLength(substr);
>> lrep = strLength(rep);
>> if (lstr == 0 || lsubstr == 0 || lsubstr > lstr)
>> return 0;
>> replacements = allSubstr(str, substr, lstr, lsubstr, &subpos, 0);
>> if (replacements > 0) {
>> lnew = (lstr - (replacements * lsubstr)) + (replacements * lrep);
>> new = malloc(lnew + 1);
>> if (!new)
>> return 0;
>> strc = newc = replaced = 0;
>> while (strc <= lstr) {
>> if (str + strc == subpos[replaced]) {
>
> You have a bug here. replaced can become equal to the size of the
> subpos array and, hence, you index outside of it.
Thanks for spotting! i'd never have seen that. I replaced that line by
if (replaced < replacements && str + strc == subpos[replaced]) {
i made no other changes and compiled and ran... but strange errors are
occurring. below i'm pasting session from gdb... i really appreciate if
anyone can tell me why it's happening. what's irritating is there is no
pattern for the seg faults. sometimes it happens, sometimes not.
compiled with gcc -Wall -Wextra -std=c99 -pedantic -o replace replace.c -ggdb3
# gdb ./replace
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) run "sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"
Starting program: /home/fedora/c/replace
"sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400648 in strFirstCh (str=0x7fff1da36feb "rc/c/replace",
ch=115 's', lstr=18446744073709551546) at replace.c:17
17 if (str[current] == ch) {
how can str be "rc/c/replace"?? It should be some part of the "sdak..."
string as i see...
ch is with right value. but lstr is wrong. how did it get that value? i cant
see any thing in my code that is wrong but i'm not intel.
big thanks to any one who can say why str is "rc/c/replace" and lstr is
wrong...
am thinking if string programing in C is naturally so difficult or i'm just
stupid:(
>> for (repc = 0; repc < lrep; repc++) {
>> new[newc] = rep[repc];
>> newc++;
>> }
>> replaced++;
>> strc += lsubstr;
>> }
>> else {
>> new[newc] = str[strc];
>> strc++;
>> newc++;
>> }
>> }
>> free(subpos);
>> }
>> else {
>> new = malloc(lstr + 1);
>> if (!new)
>> return 0;
>> for (strc = 0; strc <= lstr; strc++)
>> new[strc] = str[strc];
>> }
>> return new;
>> }
>>
>> int main(int argc, char **argv) {
>> char *newstr;
>>
>> assert(argc == 4);
>
> I don't think this is a good use of assert. It is almost always
> wrong to use it to check user input. I'd just use an "if".
>
>> newstr = replace(argv[1], argv[2], argv[3]);
>> if (newstr)
>> printf("%s\n", newstr);
>> else
>> printf("replace() -> null\n");
>> free(newstr);
>> return 0;
>> }
>
no_mail647 (13)
2/17/2010 10:17:49 AM

Richard Heathfield <rjh@see.sig.invalid> writes:
> spinoza1111 wrote:
>> On Feb 17, 6:35 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
>> wrote:
> <snip>
>>> Do not feed the troll.
>>
> <snip>
>
>> Richard Heathfield is by no means a friend of mine, yet he has gone on
>> record to say that I am not a "troll".
>
> That is true. I have also gone on record as saying that you're an idiot.
He does it for the strokes. His motives may be different, and the
strokes he seeks may be different (though not vastly different from
some of the sci.math cranks I've encountered), but he's still doing
it to elicit responses. Troll, crank, idiot - you can tick all three
with him.
And Chris' post did nothing but invite him to spew more inane crap
onto c.l.c. Why would anyone want that?
Phil
--
Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1
thefatphil_demunged (1562)
2/17/2010 10:20:08 AM

spinoza1111 wrote:
> On Feb 17, 5:29 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>> spinoza1111wrote:
>>> On Feb 17, 6:35 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
>>> wrote:
>> <snip>
>>>> Do not feed the troll.
>> <snip>
>>
>>> Richard Heathfield is by no means a friend of mine, yet he has gone on
>>> record to say that I am not a "troll".
>> That is true. I have also gone on record as saying that you're an idiot.
>
> That is correct. And I have gone on record as saying you're a fool.
> So, we agree that I am not a troll.
As usual, my point has gone way over your head.
You cite me as someone who says you are not a troll, in support of an
argument that you are not a troll. In so doing, you are appealing to
people's trust in my good judgement, whether or not you realise it.
For those people who do trust my good judgement, the argument is a
powerful one, but the argument that you are an idiot is equally
powerful. And for those people who do not trust my good judgement, the
argument carries no conviction.
As for your opinion of me, I ascribe it no value whatsoever, so it has
no bearing on this discussion.
ObTopic: finding substrings is easy with strstr(). If you need to find a
substring, use strstr() until and unless profiling demonstrates that
it's a significant bottleneck.
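As a rough illustration of the strstr() approach, here is a minimal sketch that counts non-overlapping occurrences of a substring. The function name count_occurrences is hypothetical and does not come from the thread; only strstr() and strlen() are standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of sub in str
   using the standard strstr(). Returns 0 for an empty sub. */
size_t count_occurrences(const char *str, const char *sub)
{
    size_t count = 0;
    size_t lsub;
    const char *p = str;

    assert(str != NULL && sub != NULL);
    lsub = strlen(sub);
    if (lsub == 0)
        return 0;              /* strstr(p, "") always matches: avoid looping forever */
    while ((p = strstr(p, sub)) != NULL) {
        count++;
        p += lsub;             /* skip past this match; p stays inside the string */
    }
    return count;
}
```

Stepping by 1 instead of lsub after each match would count overlapping occurrences as well.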
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
rjh (10789)
2/17/2010 10:30:17 AM

Hi all!
Here's more session output from gdb... i cant figure out why it crashes for
random strings but not for others...
(gdb) run "askdjfhakdfhakdfhasdjkhaksdfhaklsdjfhlakfhlakfhaklfh" "ask" "+"
Starting program: /home/fedora/c/replace
"askdjfhakdfhakdfhasdjkhaksdfhaklsdjfhlakfhlakfhaklfh" "ask" "+"
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400648 in strFirstCh (str=0x7fffe11e6ff5 "ce", ch=97 'a',
lstr=18446744073709551559) at replace.c:17
17 if (str[current] == ch) {
(gdb) run
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaskadhfklajsdhfkajfhklajdfhjkafhkdf"
"a" "+"
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/fedora/c/replace
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaskadhfklajsdhfkajfhklajdfhjkafhkdf"
"a" "+"
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++sk+dhfkl+jsdhfk+jfhkl+jdfhjk+fhkdf
Program exited normally.
(gdb)
i can see that str and lstr are getting corrupt but i cant see where in my
code that is happening... and putting lots of printfs into it is really
discouraging.
thanks.
no_mail647 (13)
2/17/2010 11:19:54 AM

Hi all!
now I added two printf calls to strSubstr and allSubstr to trace where str
and lstr are getting wrong values.
I also added one if statement to check if adding step to str would result in
it going beyond str+lstr. that is :-
if (((orig_str + orig_lstr) - str) <= step) break;
below is full code and run in gdb with still seg fault. suddenly lstr is
getting a huge value, i can see, but which statement is doing that i can't see:(
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
size_t strLength(char *cstr) {
size_t index = 0;
while (cstr[index] != '\0') ++index;
return index;
}
char *strFirstCh(char *str, char ch, size_t lstr) {
char *chpos = 0;
size_t current;
for (current = 0; current < lstr; current++) {
if (str[current] == ch) {
chpos = str + current;
break;
}
}
return chpos;
}
int strComp(char *s, char *t, size_t len) {
int ret = 0;
size_t index;
for (index = 0; index < len; index++) {
if (s[index] != t[index]) {
ret = 1;
break;
}
}
return ret;
}
char *strSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub) {
char *substr = 0;
char *anchor = str;
size_t remaining_len = (lstr - lsub) + 1;
assert(str && sub && lstr && lsub && lstr >= lsub);
printf("in strSubstr: str = %p\tlstr = %zu\n", (void*)str, lstr);
while (remaining_len > 0 && anchor) {
if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
if (strComp(anchor, sub, lsub) == 0) {
substr = anchor;
break;
}
anchor++;
remaining_len--;
}
}
return substr;
}
unsigned allSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub,
char ***ps,
int overlap) {
unsigned occurs = 0;
unsigned ctr;
char *orig_str = str;
size_t orig_lstr = lstr;
size_t step;
if (overlap == 1)
step = 1;
else
step = lsub;
while (lstr >= lsub) {
str = strSubstr(str, sub, lstr, lsub);
if (str == 0)
break;
occurs++;
if (((orig_str + orig_lstr) - str) <= step) break;
str += step;
lstr = (orig_str + orig_lstr) - str;
printf("in allSubstr: str = %p\tlstr = %zu\n", (void*)str, lstr);
}
if (occurs > 0 && ps) {
str = orig_str;
lstr = orig_lstr;
*ps = malloc(occurs * sizeof **ps);
if (*ps) {
for (ctr = 0; ctr < occurs; ctr++) {
ps[0][ctr] = str = strSubstr(str, sub, lstr, lsub);
str += step;
lstr = (orig_str + orig_lstr) - str;
}
}
}
return occurs;
}
char *replace(char *str, char *substr, char *rep) {
char *new = 0;
size_t lstr, lsubstr, lrep, lnew, strc, newc, repc, replaced;
unsigned replacements;
char **subpos;
assert(str && substr && rep);
lstr = strLength(str);
lsubstr = strLength(substr);
lrep = strLength(rep);
if (lstr == 0 || lsubstr == 0 || lsubstr > lstr)
return 0;
replacements = allSubstr(str, substr, lstr, lsubstr, &subpos, 0);
if (replacements > 0) {
lnew = (lstr - (replacements * lsubstr)) + (replacements * lrep);
new = malloc(lnew + 1);
if (!new) {
free(subpos);
return 0;
}
strc = newc = replaced = 0;
while (strc <= lstr) {
if (replaced < replacements && str + strc == subpos[replaced]) {
for (repc = 0; repc < lrep; repc++) {
new[newc] = rep[repc];
newc++;
}
replaced++;
strc += lsubstr;
}
else {
new[newc] = str[strc];
strc++;
newc++;
}
}
free(subpos);
}
else {
new = malloc(lstr + 1);
if (!new)
return 0;
for (strc = 0; strc <= lstr; strc++)
new[strc] = str[strc];
}
return new;
}
int main(int argc, char **argv) {
char *newstr;
assert(argc == 4);
newstr = replace(argv[1], argv[2], argv[3]);
if (newstr)
printf("%s\n", newstr);
else
printf("replace() -> null\n");
free(newstr);
return 0;
}
# gdb ./replace
(gdb) run "asdkhfaklsdjfhakldfhaklsdjfhakldfhakldfhaskldfh" "asd" "+"
Starting program: /home/fedora/c/replace
"asdkhfaklsdjfhakldfhaklsdjfhakldfhakldfhaskldfh" "asd" "+"
in strSubstr: str = 0x7fff646a86e4 lstr = 47
in allSubstr: str = 0x7fff646a86e7 lstr = 44
in strSubstr: str = 0x7fff646a86e7 lstr = 44
in allSubstr: str = 0x7fff646a8717 lstr = 18446744073709551612
in strSubstr: str = 0x7fff646a8717 lstr = 18446744073709551612
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400698 in strFirstCh (str=0x7fff646a8ff5 "ce", ch=97 'a',
lstr=18446744073709551559) at replace.c:17
17 if (str[current] == ch) {
(gdb) run "asdkhfaklsdjfhakldfhaklsdjfhakldfhakldfhaskldfh" "as" "+"
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/fedora/c/replace
"asdkhfaklsdjfhakldfhaklsdjfhakldfhakldfhaskldfh" "as" "+"
in strSubstr: str = 0x7fff27e376e5 lstr = 47
in allSubstr: str = 0x7fff27e376e7 lstr = 45
in strSubstr: str = 0x7fff27e376e7 lstr = 45
in allSubstr: str = 0x7fff27e3770f lstr = 5
in strSubstr: str = 0x7fff27e3770f lstr = 5
in strSubstr: str = 0x7fff27e376e5 lstr = 47
in strSubstr: str = 0x7fff27e376e7 lstr = 45
+dkhfaklsdjfhakldfhaklsdjfhakldfhakldfh+kldfh
Program exited normally.
as its shown using "as" for sub-string instead of "asd" makes it work ok.
somewhere some length calculation is getting wrapped around but am not able
to pin point...
thanks all.
no_mail647 (13)
2/17/2010 11:58:54 AM

Branimir Maksimovic wrote:
> All in all it seems that g++ is 6 times faster than gcc and also
> java server is slightly faster than gcc according to this benchmark.
> I guess they don't test different algorithms, rather same algorihtm
> in different language implementations, but I could be wrong.
> According to this C is slower than java and both consumes more memory
> than c++ and is 6 times slower for same algorithm?
>
> http://shootout.alioth.debian.org/u64/performance.php?test=knucleotide&sort=fullcpu
>
> × Program Source Code CPU secs Elapsed secs Memory KB Code B
> ≈ CPU Load
> 1.0 C++ GNU g++ #6 11.22 11.23 142,288 3415 0% 0% 0% 100%
> 2.0 C++ GNU g++ 22.30 22.30 135,788 2106 0% 0% 0% 100%
> 3.1 Ada 2005 GNAT #2 34.30 34.36 256,660 4865 0% 0% 0% 100%
> 4.2 Java 6 -server #2 47.33 47.41 490,660 1602 0% 0% 0% 100%
> 5.0 Java 6 -server 56.15 56.29 1,295,096 1330 0% 0% 0% 100%
> 5.0 C GNU gcc #6 56.21 56.25 180,540 2439 0% 0% 0% 100%
> According to this C is slower than java and both consumes more memory
> than c++ and is 6 times slower for same algorithm?
The stats that are quoted are for GNU. At the risk of causing a flame war,
this is far from a state of the art compiler.
The shootout benchmarks have done a lot to highlight the language
component to code generation.
Regards
Walter..
--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com
walter20 (874)
2/17/2010 12:25:34 PM

fedora <no_mail@invalid.invalid> writes:
> Ben Bacarisse wrote:
>
>> fedora <no_mail@invalid.invalid> writes:
>>
>>> Have finished my program for spinoza's challenge. rewrote everything and
>>> this time i made each statement as simple as posible, so that i can
>>> understand the program. The allSubstr procedure can search for over
>>> lapping sub-string too like spinoza wanted, but the replace routine
>>> doesnt use that since i cant think how to replace over lapping ones!
>>>
>>> haven't used any function from string.h! it works for strings i could
>>> think of but maybe it got bugs since i'm just a beginner...
>>
>> The result is good, but I am not sure you were right to accept the
>> peculiar notion of not using standard string functions. If you felt
>> you had to, why not use standard functions but then plug-in you own
>> versions? That way you learn about the standard library and get to
>> write the character-fiddling functions that can be useful learning
>> exercises.
>
> Thanks Ben! I didn't use stdlib functions because i wanted to how
> easy/difficult it would be to write my own. also spinoza didn't accept
> program that called function in string.h.
Why do you want to do what spinoza1111 says? I am curious about how
you decided it was something you wanted to do.
His initial programs all used string.h and some versions erroneously
called strlen without including string.h. I don't know why he decided
to change his mind, but a skilled C programmer would use the standard
library and only do something more complex if there was a compelling
reason.
> About plugging in my own versions, do you mean writing routines with the
> same name so that at linking the linker finds my lib first and plug it in?
> but i read somewhere that using ansi c's namespace and relying on linker is
> all undefined and gcc can replace calls to stdlib functions with inline code
> so bypassing my code too... but i'll try it.
You do it like this:
#define STRLEN my_strlen
and you write STRLEN, STRSTR etc in your code.
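A minimal sketch of that macro indirection follows. The names my_strlen, USE_STD_STRING and total_length are assumptions made up for the illustration; the point is only that callers write STRLEN and the choice of implementation lives in one place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-written replacement for strlen. */
size_t my_strlen(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}

/* Pick the implementation once. Defining USE_STD_STRING (a made-up
   switch) selects the standard library version instead. */
#ifdef USE_STD_STRING
#include <string.h>
#define STRLEN strlen
#else
#define STRLEN my_strlen
#endif

/* Callers only ever write STRLEN. */
size_t total_length(const char *a, const char *b)
{
    return STRLEN(a) + STRLEN(b);
}
```

Because the hand-written function has its own name, it never collides with the identifiers the standard library reserves.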
<snip>
>>> char *strSubstr(
>>> char *str,
>>> char *sub,
>>> size_t lstr,
>>> size_t lsub) {
>>> char *substr = 0;
>>> char *anchor = str;
>>> size_t remaining_len = (lstr - lsub) + 1;
Your problems below come from this + 1, I think. It looks wrong.
Sorry I did not spot this first time round.
>>> assert(str && sub && lstr && lsub && lstr >= lsub);
>>> while (remaining_len > 0 && anchor) {
>>> if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
>>> if (strComp(anchor, sub, lsub) == 0) {
>>> substr = anchor;
>>> break;
>>> }
>>> anchor++;
>>> remaining_len--;
>>> }
>>> }
>>> return substr;
>>> }
<snip bug fix>
> i made no other changes and compiled and ran... but strange errors are
> occurring. below i'm pasting session from gdb... i really appreciate if
> anyone can tell me why it's happening. what's irritating is there is no
> patter for the seg faults. sometimes it happens, sometimes not.
If you run off an array, all kinds of strange things can happen. It
is often not worthwhile trying to work out exactly why (at least I've
stopped trying -- I just fix the problem).
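To make the running-off-the-array point concrete: in the quoted strSubstr, anchor can jump several characters in one strFirstCh call while remaining_len drops by only one, which appears to let the scan read past the end of the string. The sketch below is not the thread's code; find_sub and its parameters are hypothetical names. It shows one way to keep every read inside the given length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded search: first occurrence of sub (lsub chars)
   within the first lstr chars of str, or NULL if there is none. */
const char *find_sub(const char *str, size_t lstr,
                     const char *sub, size_t lsub)
{
    size_t i, j;

    assert(str != NULL && sub != NULL);
    if (lsub == 0 || lsub > lstr)
        return NULL;
    /* i + lsub <= lstr guarantees str[i + j] is in bounds for j < lsub. */
    for (i = 0; i + lsub <= lstr; i++) {
        for (j = 0; j < lsub && str[i + j] == sub[j]; j++)
            ;
        if (j == lsub)
            return str + i;
    }
    return NULL;
}
```

Here the loop bound is recomputed from the index on every iteration, so there is no separate remaining-length counter to fall out of step.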
> compiled with gcc -Wall -Wextra -std=c99 -pedantic -o replace replace.c -
> ggdb3
>
> # gdb ./replace
> GNU gdb 6.8-debian
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu"...
>
> (gdb) run "sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"
> Starting program: /home/fedora/c/replace
> "sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000400648 in strFirstCh (str=0x7fff1da36feb "rc/c/replace",
> ch=115 's', lstr=18446744073709551546) at replace.c:17
> 17 if (str[current] == ch) {
>
> how can str be "rc/c/replace"?? It should be some part of the "sdak..."
> string as i see...
> ch is with right value. but lstr is wrong. how did it get that value? i cant
> see any thing in my code that is wrong but i'm not intel.
>
> big thanks to any one who can say why str is "rc/c/replace" and lstr is
> wrong...
See above.
> am thinking if string programing in C is naturally so difficult or i'm just
> stupid:(
No, it is hard to get the details right but you are not helping
yourself by not breaking your program up into helpful simple
functions. You have some nice functions to help find the strings, but
you stopped there. I'd have some functions to help build the copy
with the replacements.
*Actually*, I'd use (and did use) memcpy and strcpy, but if I had
decided to drink the "no string.h" cool aid, I'd write these myself.
Neither is more than a line or two and they simplify the replace
function a lot.
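For readers following along, hand-written stand-ins for those two functions could look roughly like this. The names my_memcpy and my_strcpy are assumptions for the sketch, not the versions used in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memcpy: copy n bytes from src to dst.
   The two regions must not overlap. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* Hypothetical stand-in for strcpy: copy src, including its
   terminating '\0', into dst. dst must be large enough. */
char *my_strcpy(char *dst, const char *src)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}
```

With helpers like these, a replace routine can copy each unmatched stretch and each replacement with one call instead of a character-by-character loop.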
For my own amusement, I've studied what slows up this replace function
and I have written a reasonably fast version that avoids strstr because,
as I've described elsewhere, it forces the program to re-scan strings
unnecessarily. For very long strings, strstr still wins because of the
sophisticated algorithm that glibc's version uses, but for anything
else it is slightly faster to scan for all the sub-string positions
"by hand". Of course, it still uses the other string functions.
<snip>
--
Ben.
ben.usenet (6516)
2/17/2010 12:31:02 PM

"fedora" <no_mail@invalid.invalid> wrote in message
news:hlggj2$t3m$1@news.eternal-september.org...
> am thinking if string programing in C is naturally so difficult or i'm
> just
> stupid:(
It's just very fiddly.
--
Bartc
bartc (783)
2/17/2010 12:46:48 PM

On Feb 17, 6:20 pm, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:
> Richard Heathfield <r...@see.sig.invalid> writes:
> > spinoza1111 wrote:
> >> On Feb 17, 6:35 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
> >> wrote:
> > <snip>
> >>> Do not feed the troll.
>
> > <snip>
>
> >> Richard Heathfield is by no means a friend of mine, yet he has gone on
> >> record to say that I am not a "troll".
>
> > That is true. I have also gone on record as saying that you're an idiot.
>
> He does it for the strokes. His motives may be different, and the
> strokes he seeks may be different (though not vastly different from
> some of the sci.math cranks I've encountered), but he's still doing
> it to elicit responses. Troll, crank, idiot - you can tick all three
> with him.
All of which may be true. The problem is that in every discussion I'm
in, I am (like Shakespeare's Falstaff) not only Witty (IMO) in myself
but the cause that is of Witte that is in others. It's always a
refreshing change from the tedious efforts in this and other
newsgroups to recreate the dull spirit of the nastiest type of
business office, which is anhedonia gone insane. I somehow manage to
drive discussions that are always above the usual level.
Oh yes, and this "troll, crank, and idiot" here was the first to post
a solution to the problem, the first to debug it, and is the only one
to have posted anything like a correct solution other than Willem and
(as far as I can tell) io_x. The Regular Guys have been posting
idiotic nonsolutions with far more bugs, all of which use string.h.
It's far too late for the Chomsky Type 3 Guys to post anything, since
they'd probably plagiarize Willem or myself.
But if nonconforming to normalized deviance and Eunuch programming is
to be a troll, a crank and an idiot in your book, hey, so I am.
>
> And Chris' post did nothing but invite him to spew more inane crap
> onto c.l.c. Why would anyone want that?
>
> Phil
> --
> Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1
spinoza1111 (3250)
2/17/2010 1:04:11 PM

On Feb 17, 6:30 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> spinoza1111 wrote:
> > On Feb 17, 5:29 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> >> spinoza1111 wrote:
> >>> On Feb 17, 6:35 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
> >>> wrote:
> >> <snip>
> >>>> Do not feed the troll.
> >> <snip>
>
> >>> Richard Heathfield is by no means a friend of mine, yet he has gone on
> >>> record to say that I am not a "troll".
> >> That is true. I have also gone on record as saying that you're an idiot.
>
> > That is correct. And I have gone on record as saying you're a fool.
> > So, we agree that I am not a troll.
>
> As usual, my point has gone way over your head.
>
> You cite me as someone who says you are not a troll, in support of an
> argument that you are not a troll. In so doing, you are appealing to
> people's trust in my good judgement, whether or not you realise it.
>
> For those people who do trust my good judgement, the argument is a
> powerful one, but the argument that you are an idiot is equally
> powerful. And for those people who do not trust my good judgement, the
> argument carries no conviction.
>
> As for your opinion of me, I ascribe it no value whatsoever, so it has
> no bearing on this discussion.
>
> ObTopic: finding substrings is easy with strstr(). If you need to find a
> substring, use strstr() until and unless profiling demonstrates that
> it's a significant bottleneck.
THOU SHALT NOT says Richard NOT USE STRING.H, even for shits and
giggles.
Because thou are virtuous, shall there be no more cakes and ale?
>
> --
> Richard Heathfield <http://www.cpax.org.uk>
> Email: -http://www. +rjh@
> "Usenet is a strange place" - dmr 29 July 1999
> Sig line vacant - apply within
spinoza1111 (3250)
2/17/2010 1:05:14 PM

In article <hlglm1$r14$1@news.eternal-september.org>,
fedora <no_mail@invalid.invalid> wrote:
>
>I also added one if statement to check if adding step to str would result in
>it going beyond str+lstr. that is :-
>
>if (((orig_str + orig_lstr) - str) <= step) break;
Not sure if this is your problem, but you should be aware that
you're doing a comparison between a signed number (the difference
between the pointers (orig_str+orig_lstr) and str), and an unsigned
number (step).
This will give unexpected results if the left-hand side of the comparison
is a small negative value; the value will be converted to unsigned
before it is compared to the right-hand side, and becomes a huge positive
value. As a result, the comparison yields false.
You can work around the problem by rewriting the comparison as
(orig_str + orig_lstr <= str + step)
where both sides of the comparison are pointers.
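The effect can be sketched in a few lines. The function names below are hypothetical; the explicit (size_t) cast spells out the conversion that, on typical platforms where size_t is at least as wide as ptrdiff_t, the compiler applies implicitly to the signed operand.

```c
#include <assert.h>
#include <stddef.h>

/* Mimics "(end - str) <= step": the signed difference ends up being
   compared as an unsigned value. The cast makes that visible. */
int diff_le_step(ptrdiff_t diff, size_t step)
{
    return (size_t)diff <= step;   /* -1 becomes SIZE_MAX here */
}

/* The pointer form: compare two pointers directly. Both must point
   into (or one past the end of) the same array. */
int end_le_next(const char *end, const char *next)
{
    assert(end != NULL && next != NULL);
    return end <= next;
}
```

With diff equal to -1 and step equal to 3, diff_le_step returns 0, because -1 converted to size_t is the largest value that type can hold.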
ike5 (222)
2/17/2010 1:25:51 PM

Ike Naar wrote:
> In article <hlglm1$r14$1@news.eternal-september.org>,
> fedora <no_mail@invalid.invalid> wrote:
>>
>>I also added one if statement to check if adding step to str would result
>>in it going beyond str+lstr. that is :-
>>
>>if (((orig_str + orig_lstr) - str) <= step) break;
>
> Not sure if this is your problem, but you should be aware that
> you're doing a comparison between a signed number (the difference
> between the pointers (orig_str+orig_lstr) and str), and an unsigned
> number (step).
> This will give unexpected results if the left-hand side of the comparison
> is a small negative value; the value will be converted to unsigned
> before it is compared to the right-hand side, and becomes a huge positive
> value. As a result, the comparison yields false.
>
> You can work around the problem by rewriting the comparison as
>
> (orig_str + orig_lstr <= str + step)
>
> where both sides of the comparison are pointers.
Hi Ike!
Gcc warned me of this too, but i ignored it since i couldn't think of a
better way.
Initially i thought of writing it exactly like yours but then when str is <
step bytes before it's end, adding step to it would be undefined behaviour
so i coded like i did. Maybe i can cast step to signed long and then compare
with (orig_str+orig_lstr) - str? is this ok?
thanks
no_mail647 (13)
2/17/2010 1:30:24 PM
Ben Bacarisse wrote:
> fedora <no_mail@invalid.invalid> writes:
>
>> Ben Bacarisse wrote:
>>
>>> fedora <no_mail@invalid.invalid> writes:
>>>
>>>> Have finished my program for spinoza's challenge. rewrote everything
>>>> and this time i made each statement as simple as posible, so that i can
>>>> understand the program. The allSubstr procedure can search for over
>>>> lapping sub-string too like spinoza wanted, but the replace routine
>>>> doesnt use that since i cant think how to replace over lapping ones!
>>>>
>>>> haven't used any function from string.h! it works for strings i could
>>>> think of but maybe it got bugs since i'm just a beginner...
>>>
>>> The result is good, but I am not sure you were right to accept the
>>> peculiar notion of not using standard string functions. If you felt
>>> you had to, why not use standard functions but then plug-in you own
>>> versions? That way you learn about the standard library and get to
>>> write the character-fiddling functions that can be useful learning
>>> exercises.
>>
>> Thanks Ben! I didn't use stdlib functions because i wanted to how
>> easy/difficult it would be to write my own. also spinoza didn't accept
>> program that called function in string.h.
>
> Why do you want to do what spinoza1111 says? I am curious about how
> you decided it was something you wanted to do.
I wanted to write my own functions partly for learning as i'm just beginning
and also because if i used string.h, i couldn't have submitted to spinoza's
challenge...
but mostly it was just for learning. as we see, i was correct since i'm
having so much difficulty.
> His initial programs all used string.h and some versions erroneously
> called strlen without including string.h. I don't know why he decided
> to change his mind, but a skilled C programmer would use the standard
> library and only do something more complex if there was a compelling
> reason.
i'm not a skilled programmer:)
>> About plugging in my own versions, do you mean writing routines with the
>> same name so that at linking the linker finds my lib first and plug it
>> in? but i read somewhere that using ansi c's namespace and relying on
>> linker is all undefined and gcc can replace calls to stdlib functions
>> with inline code so bypassing my code too... but i'll try it.
>
> You do it like this:
>
> #define STRLEN my_strelen
>
> and you write STRLEN, STRSTR etc in your code.
Okay i see. Thanks.
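A sketch of how that could look, assuming a hypothetical USE_OWN_STRLEN build switch and a hand-written my_strlen (neither name comes from the posts above):

```c
#include <stddef.h>
#include <string.h>

/* Hand-written replacement; the name my_strlen is only illustrative. */
size_t my_strlen(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        ++len;
    return len;
}

/* USE_OWN_STRLEN is a hypothetical switch: define it to plug in the
   hand-written version, leave it undefined to use the library one. */
#ifdef USE_OWN_STRLEN
#define STRLEN my_strlen
#else
#define STRLEN strlen
#endif

/* The rest of the program only ever spells STRLEN. */
size_t total_length(const char *a, const char *b)
{
    return STRLEN(a) + STRLEN(b);
}
```

Because the program's own names stay outside the standard library's namespace, nothing here depends on link order or on the compiler not inlining strlen.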
> <snip>
>>>> char *strSubstr(
>>>> char *str,
>>>> char *sub,
>>>> size_t lstr,
>>>> size_t lsub) {
>>>> char *substr = 0;
>>>> char *anchor = str;
>>>> size_t remaining_len = (lstr - lsub) + 1;
>
> Your problems below come from this + 1, I think. It looks wrong.
> Sorry I did not spot this first time round.
I added the one because when both string and sub string are equal length the
diff will give zero, so to make the while loop below work properly, i add
one.
i cant compare remaining_len >= 0 since that'll always be true for unsigned.
>>>> assert(str && sub && lstr && lsub && lstr >= lsub);
>>>> while (remaining_len > 0 && anchor) {
>>>> if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
>>>> if (strComp(anchor, sub, lsub) == 0) {
>>>> substr = anchor;
>>>> break;
>>>> }
>>>> anchor++;
>>>> remaining_len--;
>>>> }
>>>> }
>>>> return substr;
>>>> }
>
> <snip bug fix>
>> i made no other changes and compiled and ran... but strange errors are
>> occurring. below i'm pasting session from gdb... i really appreciate if
>> anyone can tell me why it's happening. what's irritating is there is no
>> patter for the seg faults. sometimes it happens, sometimes not.
>
> If you run off an array, all kinds of strange things can happen. It
> is often not worthwhile trying to work out exactly why (at least I've
> stopped trying -- I just fix the problem).
i hate it when i cant get a mental image of how a function will behave for
all combinations of legal input values. boundary values complicate
everything and unsigned seems to be more trouble than i thought.
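One way to keep the unsigned arithmetic from wrapping is to reject lsub > lstr before subtracting, so that lstr - lsub is only ever computed when it cannot go below zero. A rough sketch, with find_sub as a made-up name and no claim that it matches the replace.c under discussion:

```c
#include <stddef.h>

/* Returns a pointer to the first occurrence of sub (length lsub) in
   str (length lstr), or NULL if there is none. Hypothetical helper. */
char *find_sub(char *str, size_t lstr, const char *sub, size_t lsub)
{
    size_t i, j;

    /* Checked first, so the subtraction below cannot wrap around. */
    if (lsub == 0 || lsub > lstr)
        return NULL;

    /* i <= lstr - lsub also covers the equal-length case (one pass). */
    for (i = 0; i <= lstr - lsub; i++) {
        for (j = 0; j < lsub && str[i + j] == sub[j]; j++)
            ;
        if (j == lsub)
            return str + i;
    }
    return NULL;
}
```

The loop bound is a count of valid start positions minus one, so no "+ 1" adjustment and no test of an unsigned value against zero is needed.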
>> compiled with gcc -Wall -Wextra -std=c99 -pedantic -o replace replace.c
>> -ggdb3
>>
>> # gdb ./replace
>> GNU gdb 6.8-debian
>> Copyright (C) 2008 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law. Type "show
>> copying" and "show warranty" for details.
>> This GDB was configured as "x86_64-linux-gnu"...
>>
>> (gdb) run "sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"
>> Starting program: /home/fedora/c/replace
>> "sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x0000000000400648 in strFirstCh (str=0x7fff1da36feb "rc/c/replace",
>> ch=115 's', lstr=18446744073709551546) at replace.c:17
>> 17 if (str[current] == ch) {
>>
>> how can str be "rc/c/replace"?? It should be some part of the "sdak..."
>> string as i see...
>> ch is with right value. but lstr is wrong. how did it get that value? i
>> cant see any thing in my code that is wrong but i'm not intel.
>>
>> big thanks to any one who can say why str is "rc/c/replace" and lstr is
>> wrong...
>
> See above.
>
>> am thinking if string programing in C is naturally so difficult or i'm
>> just stupid:(
>
> No, it is hard to get the details right but you are not helping
> yourself by not breaking your program up into helpful simple
> functions. You have some nice functions to help find the strings, but
> you stopped there. I'd have some functions to help build the copy
> with the replacements.
Yes i'm not happy with the replace routine. It's too big. i'll code versions
of strcpy and make it shorter and try again. maybe the bug is in replace...
> *Actually*, I'd use (and did use) memcpy and strcpy, but if I had
> decided to drink the "no string.h" cool aid, I'd write these myself.
> Neither is more than a line or two and they simplify the replace
> function a lot.
>
> For my own amusement, I've studied what slows up this replace function
> and I have written reasonably fast version that avoids strstr because,
> as I've described elsewhere, it forces the program to re-scan strings
> unnecessarily.
by this replace function do you mean mine above?
> For very long strings, strstr still wins because of the
> sophisticated algorithm that glibc's version uses, but for anything
> else it is slightly faster to scan for all the sub-string positions
> "by hand". Of course, it still uses the other string functions.
>
> <snip>
no_mail647 (13)
2/17/2010 1:39:44 PM
In article <hlgr1g$qp1$1@news.eternal-september.org>,
fedora <no_mail@invalid.invalid> wrote:
>Ike Naar wrote:
>
>> In article <hlglm1$r14$1@news.eternal-september.org>,
>> fedora <no_mail@invalid.invalid> wrote:
>>>
>>>if (((orig_str + orig_lstr) - str) <= step) break;
>>
>> [snip]
>>
>> (orig_str + orig_lstr <= str + step)
>
> [snip]
>
>Initially i thought of writing it exactly like yours but then when str is <
>step bytes before it's end, adding step to it would be undefined behaviour
>so i coded like i did.
Whoops you're right I did not think about that.
> Maybe i can cast step to signed long and then compare
>with (orig_str+orig_lstr) - str? is this ok?
That is a possibility. If your compiler supports ptrdiff_t then perhaps
it's better to cast step to that type, so that the types on both sides
of the comparison operator are the same.
What about (orig_str + orig_lstr - step <= str) ?
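Note that the pointer form still assumes step is no larger than orig_lstr, since orig_str + orig_lstr - step would otherwise point before the start of the string. A sketch that sidesteps pointer arithmetic outside the array by comparing lengths instead, assuming str always lies between orig_str and orig_str + orig_lstr (the function name is hypothetical):

```c
#include <stddef.h>

/* Returns 1 if str can be advanced by step and still leave at least
   one character before the end, 0 otherwise. This mirrors the intent
   of "if (remaining <= step) break;". */
int can_advance(const char *orig_str, size_t orig_lstr,
                const char *str, size_t step)
{
    /* str is assumed to lie in [orig_str, orig_str + orig_lstr], so
       the difference is never negative and the cast is safe. */
    size_t remaining = (size_t)((orig_str + orig_lstr) - str);
    return remaining > step;
}
```

Both operands of the final comparison are unsigned and in range, so there is no conversion surprise and no out-of-bounds pointer is ever formed.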
ike5 (222)
2/17/2010 1:53:28 PM
In article <250f529b-a204-4d8a-b670-2b555ea6a1a8@w27g2000pre.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
....
>But if nonconforming to normalized deviance and Eunuch programming is
>to be a troll, a crank and an idiot in your book, hey, so I am.
Now you've gone and done it! From here on in, they will gleefully refer
to you as a "self-confessed troll".
It happened to me about 5 years ago - and they still use it in their
diatribes. The fact, of course, being that they define things so that
anyone with even a glimmer of intelligence becomes defined as "troll".
gazelle3 (1609)
2/17/2010 2:00:39 PM
Ike Naar wrote:
> In article <hlgr1g$qp1$1@news.eternal-september.org>,
> fedora <no_mail@invalid.invalid> wrote:
>>Ike Naar wrote:
>>
>>> In article <hlglm1$r14$1@news.eternal-september.org>,
>>> fedora <no_mail@invalid.invalid> wrote:
>>>>
>>>>if (((orig_str + orig_lstr) - str) <= step) break;
>>>
>>> [snip]
>>>
>>> (orig_str + orig_lstr <= str + step)
>>
>> [snip]
>>
>>Initially i thought of writing it exactly like yours but then when str is
>>< step bytes before it's end, adding step to it would be undefined
>>behaviour so i coded like i did.
>
> Whoops you're right I did not think about that.
>
>> Maybe i can cast step to signed long and then compare
>>with (orig_str+orig_lstr) - str? is this ok?
>
> That is a possibility. If your compiler supports ptrdiff_t then perhaps
> it's better to cast step to that type, so that the types on both sides
> of the comparison operator are the same.
>
> What about (orig_str + orig_lstr - step <= str) ?
This is perfect thanks!! After changing that line to the comparison above,
i've stopped getting the strange seg faults as far as i can see.
so it seems this failed test allowed lstr to wrap around to very big values
and access foreign memory locations. now it works!
thanks again Ike
no_mail647 (13)
2/17/2010 2:02:58 PM
On Wed, 17 Feb 2010 14:00:39 +0000 (UTC),
gazelle@shell.xmission.com (Kenny McCormack) wrote:
>In article <250f529b-a204-4d8a-b670-2b555ea6a1a8@w27g2000pre.googlegroups.com>,
>spinoza1111 <spinoza1111@yahoo.com> wrote:
>...
>>But if nonconforming to normalized deviance and Eunuch programming is
>>to be a troll, a crank and an idiot in your book, hey, so I am.
>
>Now you've gone and done it! From here on in, they will gleefully refer
>to you as a "self-confessed troll".
>
>It happened to me about 5 years ago - and they still use it in their
>diatribes. The fact, of course, being that they define things so that
>anyone with even a glimmer of intelligence becomes defined as "troll".
>
Clearly then, you are not a troll.
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.
cri (1432)
2/17/2010 3:29:50 PM
On Feb 17, 10:00 pm, gaze...@shell.xmission.com (Kenny McCormack)
wrote:
> In article <250f529b-a204-4d8a-b670-2b555ea6a...@w27g2000pre.googlegroups.com>,
> spinoza1111 <spinoza1...@yahoo.com> wrote:
>
> ...
>
> >But if nonconforming to normalized deviance and Eunuch programming is
> >to be a troll, a crank and an idiot in your book, hey, so I am.
>
> Now you've gone and done it! From here on in, they will gleefully refer
> to you as a "self-confessed troll".
>
> It happened to me about 5 years ago - and they still use it in their
> diatribes. The fact, of course, being that they define things so that
> anyone with even a glimmer of intelligence becomes defined as "troll".
There's no point, Kenny, in evading their stupid, childish labels. But
I predict that if I continue posting great code and incisive writing,
things will change here.
Hasta la victoria siempre!
spinoza1111 (3250)
2/17/2010 3:54:54 PM
On Feb 17, 11:29 pm, c...@tiac.net (Richard Harter) wrote:
> On Wed, 17 Feb 2010 14:00:39 +0000 (UTC),
>
> gaze...@shell.xmission.com (Kenny McCormack) wrote:
> >In article <250f529b-a204-4d8a-b670-2b555ea6a...@w27g2000pre.googlegroups.com>,
> >spinoza1111 <spinoza1...@yahoo.com> wrote:
> >...
> >>But if nonconforming to normalized deviance and Eunuch programming is
> >>to be a troll, a crank and an idiot in your book, hey, so I am.
>
> >Now you've gone and done it! From here on in, they will gleefully refer
> >to you as a "self-confessed troll".
>
> >It happened to me about 5 years ago - and they still use it in their
> >diatribes. The fact, of course, being that they define things so that
> >anyone with even a glimmer of intelligence becomes defined as "troll".
>
> Clearly then, you are not a troll.
They think they can define the world,
Their poison they have hurled,
Because they can't stand freedom:
They recreate their corporate kingdom.
They actually do this for fun
Thinking by lies they have won,
But when you look at their code,
You see quite a load,
Of the bugs they condemn in others.
Seebach can't get string length right
He's off by one in plain sight,
And they dare to call you a "troll"
When you can write above the level of O,
When you code above the level of Joe.
They've taken structured programming
And made it a dog's dinner:
They think it is a ban on thinking.
Dijkstra died at seventy-two
In part from life long depression
At the nonsense and voodoo
That passed for good computing.
Now they invoke his name
To speak it they should feel shame.
>
> Richard Harter, c...@tiac.net
> http://home.tiac.net/~cri, http://www.varinoma.com
> Infinity is one of those things that keep philosophers busy when they
> could be more profitably spending their time weeding their garden.
spinoza1111 (3250)
2/17/2010 4:03:23 PM
On 2010-02-17, fedora <no_mail@invalid.invalid> wrote:
> Thanks Ben! I didn't use stdlib functions because i wanted to how
> easy/difficult it would be to write my own. also spinoza didn't accept
> program that called function in string.h.
Unless you're expecting to spend most of your programming career working
for people who have major obsessive problems with using technology in ways
that generally work, because of personal grudges never adequately explained,
I would suggest that perhaps what Nilges accepts or doesn't accept should
not be a component of any technical decision, ever.
> About plugging in my own versions, do you mean writing routines with the
> same name so that at linking the linker finds my lib first and plug it in?
> but i read somewhere that using ansi c's namespace and relying on linker is
> all undefined and gcc can replace calls to stdlib functions with inline code
> so bypassing my code too... but i'll try it.
The obvious way to do it would be:
size_t
callstrlen(const char *s) {
#ifdef ME
    /* your code here */
#else
    return strlen(s);
#endif
}
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
usenet-nospam (2216)
2/17/2010 5:16:34 PM
On 2010-02-17, fedora <no_mail@invalid.invalid> wrote:
> am thinking if string programing in C is naturally so difficult or i'm just
> stupid:(
It takes quite a while to get used to it, at the very least. When you're
doing multiple string operations, you have a lot of things to keep track
of. Consider a simple strcpy() replacement:
char *cpystr(char *dest, char *src) {
    char *start = dest;
    while ((*(dest++) = *(src++)) != '\0')
        ;
    return start;
}
This isn't actually all that simple; I write a lot of code which is
much harder for me to figure out, but which would be easier to figure
out without years of experience with strings. It can be a bit easier
to read done the other way:
char *cpystr(char *dest, char *src) {
    int i;
    for (i = 0; src[i]; ++i) {
        dest[i] = src[i];
    }
    dest[i] = '\0';
    return dest;
}
This one is often easier for people to read because the pointers stay
pointing to the same things. So for some cases, you may find array
notation simpler. What I have found is usually that array notation is
simpler as long as I only have one index to use. If I have multiple indexes,
it can be easier for me to think about the problem if I write it in
terms of pointers, but not necessarily tersely:
while (*src) {
    *dest = *src;
    ++dest;
    ++src;
}
*dest = '\0';
That's certainly simpler to understand than the original version. What this
does is trade the simplicity of having the pointers stay fixed -- they always
point to the same things -- for the simplicity of having the semantic content
of the pointers stay fixed -- they always point to the next character you
need to copy.
Which of those is better is a bit arbitrary, and may vary from one person to
another, or one algorithm to another. Once you start getting into stuff like
strstr() implementations, it's often important to pick descriptive names.
I don't recommend names like thePointerToTheThingIWasGoingToCopyInto. I tend
to prefer short nickname length things; something that's easy to recognize,
visually distinct, and tells me what I'm doing.
So...
char *findstr(char *needle, char *haystack) {
    while (*haystack) {
        if (*haystack == *needle) {
            char *h = haystack, *n = needle, *first = NULL;
            while (*n && *n == *h) {
                if (!first && *h == *needle) {
                    first = h;
                }
                ++n;
                ++h;
            }
            if (!*n) {
                return haystack;
            } else {
                haystack = first - 1;
            }
        }
        ++haystack;
    }
    return NULL;
}
I don't know whether this will work, but it's approximately a standard cheap
strstr(), with the one optimization being an attempt to not rescan a chunk
of the haystack looking for the first character of the needle when it's not
needed. The names aren't very long, and the inner loop uses short names
which clearly refer back to their sources.
You're invited to look for errors in this, as there may well be some, not
the least of which is that I have no idea whether or not it will compile.
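On inspection, the version above sets first on the very first comparison (h still equals haystack), so after a partial match "haystack = first - 1" followed by "++haystack" lands back on the same position and the loop never ends. A sketch of how the same skip-ahead idea might be written so that it terminates; findstr_sketch is a made-up name, and the early return for an empty needle is an assumption borrowed from strstr():

```c
#include <stddef.h>

/* Argument order (needle, haystack) follows the post, not strstr(). */
char *findstr_sketch(const char *needle, char *haystack)
{
    if (!*needle)
        return haystack;          /* assumption: behave like strstr() */

    while (*haystack) {
        if (*haystack == *needle) {
            /* The first characters already match; compare the rest. */
            char *h = haystack + 1;
            const char *n = needle + 1;
            char *next = NULL;    /* first later spot the needle could start */

            while (*n && *n == *h) {
                if (!next && *h == *needle)
                    next = h;
                ++n;
                ++h;
            }
            if (!*n)
                return haystack;

            /* Nothing between haystack and h can start a match unless
               next recorded it; h itself has not been examined yet. */
            haystack = next ? next : h;
            continue;
        }
        ++haystack;
    }
    return NULL;
}
```

Starting the inner scan one character past haystack is what guarantees progress: the resume point is always strictly later than the position that just failed.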
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
usenet-nospam (2216)
2/17/2010 5:48:12 PM
In article <4b7c0b54.165239015@text.giganews.com>,
Richard Harter <cri@tiac.net> wrote:
....
>>It happened to me about 5 years ago - and they still use it in their
>>diatribes. The fact, of course, being that they define things so that
>>anyone with even a glimmer of intelligence becomes defined as "troll".
>
>Clearly then, you are not a troll.
Don't quit your day job.
gazelle3 (1609)
2/17/2010 7:29:33 PM
On Feb 17, 7:16 pm, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-17, fedora <no_m...@invalid.invalid> wrote:
>
> > Thanks Ben! I didn't use stdlib functions because i wanted to how
> > easy/difficult it would be to write my own. also spinoza didn't accept
> > program that called function in string.h.
>
> Unless you're expecting to spend most of your programming career working
> for people who have major obsessive problems with using technology in ways
> that generally work, because of personal grudges never adequately explained,
> I would suggest that perhaps what Nilges accepts or doesn't accept should
> not be a component of any technical decision, ever.
>
That's the ad hominem fallacy. It's not a pretentious term for
"insult" but a common fallacy, which is to suppose that an argument is
wrong because of the person who is making it.
In fact there are good reasons for deprecating string.h. chars
effectively have to be octets, whilst often programs need to accept
non-Latin strings. Then the functions are all very old, with certain
weaknesses (no protection from buffer overrun in strcpy, an O(N)
performance for strcat and strlen, an inconvenient interface for
strcat, const inconsistencies with strchr, very poor functionality
with strfind and const inconsistencies here too, very serious buffer
problems with sprintf, an overly difficult interface and buffer
problems with sscanf, thread problems with strtok and a non-intuitive
interface).
malcolm.mclean5 (750)
2/17/2010 7:42:02 PM
On 2010-02-17, Malcolm McLean <malcolm.mclean5@btinternet.com> wrote:
> On Feb 17, 7:16 pm, Seebs <usenet-nos...@seebs.net> wrote:
>> Unless you're expecting to spend most of your programming career working
>> for people who have major obsessive problems with using technology in ways
>> that generally work, because of personal grudges never adequately explained,
>> I would suggest that perhaps what Nilges accepts or doesn't accept should
>> not be a component of any technical decision, ever.
> That's the ad hominem fallacy. It's not a pretentious term for
> "insult" but a common falacy, which is to suppose that an argument is
> wrong because of the person who is making it.
No, it's not an ad hominem fallacy. It's the very well supported view
that what Nilges accepts or doesn't accept should not be a component of any
technical decision. I'm not saying that an argument is wrong because of
the person who is making it. I'm saying that a conclusion should be ignored
(neither accepted nor rejected) based on the person who has offered it.
Which is to say, if you know someone is a clown, knowing his position on an
issue tells you nothing for or against the issue. Now, if he had advanced
an argument, it could be worth discussing that argument, but as long as
we're just talking about his conclusion, it's not an ad hominem fallacy to
suggest disregarding the conclusions reached by someone who is demonstrably
very bad at the topic in question.
> In fact there are good reasons for deprecating string.h.
For some purposes, yes.
For manipulation of sequences of non-NUL chars, terminated by a NUL char,
not so much.
> chars
> effectively have to be octets, whilst often programs need to accept
> non-Latin strings.
True. This is addressed in no small part by the multibyte stuff, which you
would presumably use for multibyte strings.
> Then the functions are all very old, with certain
> weaknesses (no protection from buffer overun in strcpy, an O(N)
> performance for strcat and strlen, an inconvenient interface for
> strcat, const inconsistencies with strchr, very poor functionality
> with strfind and const inconsiencies here too, very serious buffer
> problems with sprintf, an overly difficult interface and buffer
> problems with sscanf, thread problems with strtok and a non-intuitive
> interface.
I'm not aware of "strfind".
While the various interfaces are certainly flawed, consider that the Nilges
alternative is to duplicate the flaws of the interface without even the
benefit of already having been debugged. Or, worse, to just not even come
close.
There are certainly cases where the <string.h> functions are not the right
tool. However, that Nilges argues against it is not an argument either way
in that. His arguments might be an argument either way; his conclusion is
not.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
usenet-nospam (2216)
2/17/2010 7:49:33 PM
Malcolm McLean <malcolm.mclean5@btinternet.com> writes:
<snip>
> In fact there are good reasons for deprecating string.h. chars
> effectively have to be octets, whilst often programs need to accept
> non-Latin strings.
It's easy to switch to wide character versions if you used the
equivalent str* versions. A few macros and you can build versions for
either character type very simply.
> Then the functions are all very old, with certain
> weaknesses (no protection from buffer overun in strcpy, an O(N)
> performance for strcat and strlen, an inconvenient interface for
> strcat, const inconsistencies with strchr, very poor functionality
> with strfind and const inconsiencies here too, very serious buffer
> problems with sprintf, an overly difficult interface and buffer
> problems with sscanf, thread problems with strtok and a non-intuitive
> interface.
Those are arguments for using something better, not arguments for not
using C's string functions. If the "challenge" had been: "use this
improved string library to write replace" or "design a string library
so that replace is easy to write" I for one would have no objection.
The problem is that rejecting what is already there (rather than using
something better) leads to a /more/ complex and buggy solution.
--
Ben.
ben.usenet (6516)
2/17/2010 8:50:10 PM
On 2010-02-17, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> Malcolm McLean <malcolm.mclean5@btinternet.com> writes:
><snip>
>> In fact there are good reasons for deprecating string.h. chars
>> effectively have to be octets, whilst often programs need to accept
>> non-Latin strings.
>
> It's easy to switch to wide character versions if you used the
> equivalent str* versions.
Just watch out for C99 braindamage!
swprintf has an argument interface similar to the wide character
equivalent of snprintf, but you may be bitten by the gratuitously
different return value convention.
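A small sketch of the difference as the C99 text describes it: snprintf reports the length the full output would have needed, whereas swprintf reports a negative value when n or more wide characters would have been required. The helper name fits_wide is made up:

```c
#include <stddef.h>
#include <wchar.h>

/* Returns 1 if text, plus its terminator, fit into buf (n wide chars),
   0 if swprintf reported a negative value. Note there is no
   "needed length" to inspect, unlike with snprintf. */
int fits_wide(wchar_t *buf, size_t n, const wchar_t *text)
{
    int written = swprintf(buf, n, L"%ls", text);
    return written >= 0;
}
```

So code ported from snprintf that compares the return value against the buffer size to detect truncation will not work unchanged with swprintf.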
kkylheku (2499)
2/17/2010 8:58:12 PM
Malcolm McLean wrote:
> On Feb 17, 7:16 pm, Seebs <usenet-nos...@seebs.net> wrote:
>> On 2010-02-17, fedora <no_m...@invalid.invalid> wrote:
>>
>>> Thanks Ben! I didn't use stdlib functions because i wanted to how
>>> easy/difficult it would be to write my own. also spinoza didn't accept
>>> program that called function in string.h.
>> Unless you're expecting to spend most of your programming career working
>> for people who have major obsessive problems with using technology in ways
>> that generally work, because of personal grudges never adequately explained,
>> I would suggest that perhaps what Nilges accepts or doesn't accept should
>> not be a component of any technical decision, ever.
>>
> That's the ad hominem fallacy. It's not a pretentious term for
> "insult" but a common falacy, which is to suppose that an argument is
> wrong because of the person who is making it.
No, he's not saying that an argument is wrong because Mr Nilges is
making it. He's saying that Mr Nilges's support for an argument is of no
value in determining whether or not the argument is valid. That's very
different.
Here is an example of "ad hominem" argument:
"Mr X says strcpy is dangerous. Mr X doesn't know spit about C.
Therefore strcpy is not dangerous". Poor reasoning.
Here is an example of what Seebs is saying, using the above
(hypothetical) case:
"Mr X says strcpy is dangerous. Mr X doesn't know spit about C.
Therefore X's claim that strcpy is dangerous adds no value to the
argument that strcpy is dangerous. Whether it is or isn't dangerous is
another matter entirely."
That is, he is in effect claiming that it is possible that even someone
who knows little about a subject may know enough about it to make a
correct claim, or may simply make a correct claim by chance. So, moving
back from the abstract to the concrete, he is not dismissing any claim
made by Mr Nilges as being necessarily incorrect. That would be foolish,
not only because it would be an invalid "ad hominem" argument, but also
because Mr Nilges (who is on record as saying that he wishes to cause
maximum damage to this newsgroup) could exploit such a position by
deliberately making claims that are clearly true.
> In fact there are good reasons for deprecating string.h. chars
> effectively have to be octets, whilst often programs need to accept
> non-Latin strings. Then the functions are all very old, with certain
> weaknesses (no protection from buffer overun in strcpy, an O(N)
> performance for strcat and strlen, an inconvenient interface for
> strcat, const inconsistencies with strchr, very poor functionality
> with strfind and const inconsiencies here too, very serious buffer
> problems with sprintf, an overly difficult interface and buffer
> problems with sscanf, thread problems with strtok and a non-intuitive
> interface.
Taking your specific points one at a time:
Whilst it is true that strcpy offers no added protection against buffer
overrun, careful programming overcomes this problem. Thus, strcpy does
not get in the way of the programmer who knows full well that his buffer
is sufficiently large - no performance penalty is imposed.
Yes, strcat and strlen are O(N) - so, where it matters, you remember the
string length, having found it out the first time. These two functions
offer simple solutions to a simple task, and as such are very often a
good solution to the task at hand. Where that is not the case, we have
the option of building more powerful tools. (And yes, I agree that
strcat's interface could be improved; for example, it could return a
pointer to the null terminator rather than to the beginning of the string.)
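A sketch of that alternative interface, with append_at as a made-up name: it copies like strcpy but returns a pointer to the terminator it wrote, so successive appends do not rescan the destination. The caller is still assumed to have made the buffer large enough.

```c
#include <stddef.h>

/* Copies src (including its terminator) to dest and returns a pointer
   to the terminating null character now stored in dest. */
char *append_at(char *dest, const char *src)
{
    while ((*dest = *src) != '\0') {
        ++dest;
        ++src;
    }
    return dest;
}
```

Chaining calls as end = append_at(end, piece) keeps the total work proportional to the characters copied, instead of rescanning the whole string on every strcat.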
Again, I must agree that the const inconsistency with strchr is a bit of
a wart. But the input is const purely to constitute a promise that the
function won't write to the input string. The return value is non-const
because strchr would otherwise be a real pain to use. How could it be
done better?
As for strfind, that's not C's problem. Take it up with the vendor.
To my mind, the sprintf function does not have serious buffer problems.
Nevertheless, some people obviously disagree, and C99 provides snprintf
for such people.
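A sketch of the snprintf route, assuming a C99 library; format_count is a made-up helper. snprintf never writes more than size bytes and returns the length the full output would have had, so truncation can be detected:

```c
#include <stddef.h>
#include <stdio.h>

/* Formats "<count> times" into buf. Returns 1 if the whole text fit,
   0 if it was truncated or an error occurred. */
int format_count(char *buf, size_t size, unsigned count)
{
    int needed = snprintf(buf, size, "%u times", count);
    return needed >= 0 && (size_t)needed < size;
}
```

With sprintf the same call would simply write past the end of a too-small buffer; here the output is cut short and still null-terminated (for size > 0).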
The scanf function is basically a mess, and is rarely used correctly. I
am at a loss to understand why it is introduced so early in programming
texts.
The strtok function is of limited use, but there are times when it is
just the ticket. It would be better, however, for it to take a state
pointer. I'm not convinced that its interface is particularly non-intuitive.
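A sketch of what a strtok with a state pointer could look like, built on strspn and strcspn; next_token is a made-up name (POSIX offers strtok_r with this shape, but that is outside ISO C):

```c
#include <stddef.h>
#include <string.h>

/* Like strtok, but the scan position lives in *state instead of in
   hidden static storage. Pass the string on the first call and NULL
   afterwards. Modifies the string in place, as strtok does. */
char *next_token(char *str, const char *delims, char **state)
{
    char *tok;

    if (str == NULL)
        str = *state;

    str += strspn(str, delims);       /* skip leading delimiters */
    if (*str == '\0') {
        *state = str;
        return NULL;                  /* no more tokens */
    }

    tok = str;
    str += strcspn(str, delims);      /* find the end of the token */
    if (*str != '\0')
        *str++ = '\0';                /* terminate it and step past */
    *state = str;
    return tok;
}
```

Because each caller owns its own state variable, two tokenizing loops (or two threads) can run without interfering with each other.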
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
rjh (10789)
2/17/2010 9:14:41 PM
Ben Bacarisse wrote:
> Those are arguments for using something better, not arguments for not
> using C's string functions. If the "challenge" had been: "use this
> improved string library to write replace" or "design a string library
> so that replace is easy to write" I for one would have no objection.
>
> The problem is that rejecting what is already there (rather than using
> something better) leads to a /more/ complex and buggy solution.
This whole project thread has been filled with examples of how not to
engineer software: application code specifically avoiding libraries, Not
Invented Here, random design with moving target specifications or no
specifications, ad hoc testing with a dose of 20+ year old unresolved
office battles, interpersonal rivalry and off topic rants.
As several have stated, this is not the environment that we are used to.
Regards
w..
--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com
walter20 (874)
2/17/2010 9:19:53 PM
On 2010-02-17, Walter Banks <walter@bytecraft.com> wrote:
> This whole project thread has been filled how not to engineer software.
> Application code specifically avoiding libraries, Not Invented Here,
> random design with moving target specifications or no specifications,
> ad hoc testing with a dose of 20+ Year old unresolved office battles,
> interpersonal rivalry and off topic rants.
> As several have stated not the environment that we are used to.
Yes, but it's important to be prepared to program in some of the many
environments which real programmers often end up having to work in.
To be fair, I've never had a coworker in the same class as Nilges. Not
even particularly close. But I have had to work with arbitrary or
bad specifications, specifications which change repeatedly during
implementation, old office battles, and vehement opposition to things which
were Not Invented Here.
At one point, I was asked to develop a linked list implementation. The
proposed design looked like this:
        struct list_node {
                struct list_node *next;
                void *data;
        };
        struct list {
                struct list_node *head;
                struct list_node *tail;
        };
The specification was much as you'd expect. Except for one TINY detail.
Which was that the formal specification was that
(struct list *) (x->tail->next) == x
whenever tail was not null.
That is to say, if the list contained any members, the "next" pointer for
the last member of the list was a pointer (suitably converted) to the list
object.
So iteration would look roughly like:
        for (l = x->head; l->next != x; l = l->next) {
                /* ... */
        }
It took a day or so of effort for me to round up enough senior developers
to all sit on the guy and tell him that:
1. He was wrong.
2. He was micro-managing, which is presumptively wrong.
before we were allowed to use a more conventional design.
Having had to deal with things like a database in which the formal schema
description begins with "all fields are VARCHAR for simplicity", I found the
Nilges String Replace Challenge to be a surprisingly good approximation of
what programming work is often like in the real world.
(Disclaimer: All the above memories are faded with age. My current
environment is pretty good about this kind of stuff. I have no clue about
the office politics, as our management put a great deal of time and effort
into ensuring that they are Not Our Problem.)
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
usenet-nospam (2216)
|
2/17/2010 9:39:45 PM
|
|
"Richard Heathfield" <rjh@see.sig.invalid> wrote in message
news:FrCdnTZJc8ajweHWnZ2dnUVZ8r6dnZ2d@bt.com...
[...]
> Yes, strcat and strlen are O(N) - so, where it matters, you remember the
> string length, having found it out the first time.
Bingo! :^)
|
no6 (2791)
|
2/17/2010 9:51:29 PM
|
|
Seebs wrote:
> On 2010-02-17, Walter Banks <walter@bytecraft.com> wrote:
> > This whole project thread has been filled how not to engineer software.
> > Application code specifically avoiding libraries, Not Invented Here,
> > random design with moving target specifications or no specifications,
> > ad hoc testing with a dose of 20+ Year old unresolved office battles,
> > interpersonal rivalry and off topic rants.
>
> > As several have stated not the environment that we are used to.
>
> Yes, but it's important to be prepared to program in some of the many
> environments which real programmers often end up having to work in.
>
> To be fair, I've never had a coworker in the same class as Nilges. Not
> even particularly close. But I have had to work with arbitrary or
> bad specifications, specifications which change repeatedly during
> implementation, old office battles, and vehement opposition to things which
> were Not Invented Here.
I saw this 20 years ago, and it became nothing but a memory as better,
more effective approaches prevailed.
> (Disclaimer: All the above memories are faded with age. My current
> environment is pretty good about this kind of stuff. I have no clue about
> the office politics, as our management put a great deal of time and effort
> into ensuring that they are Not Our Problem.)
Software development practices have improved a lot as applications
have become more complex and requirements better defined.
w..
|
walter20 (874)
|
2/17/2010 10:08:42 PM
|
|
fedora <no_mail@invalid.invalid> writes:
> Ben Bacarisse wrote:
>
>> fedora <no_mail@invalid.invalid> writes:
<snip>
>>>> fedora <no_mail@invalid.invalid> writes:
<snip>
>>>>> char *strSubstr(
>>>>> char *str,
>>>>> char *sub,
>>>>> size_t lstr,
>>>>> size_t lsub) {
>>>>> char *substr = 0;
>>>>> char *anchor = str;
>>>>> size_t remaining_len = (lstr - lsub) + 1;
>>
>> Your problems below come from this + 1, I think. It looks wrong.
>> Sorry I did not spot this first time round.
>
> I added the one because when both string and sub string are equal length the
> diff will give zero, so to make the while loop below work properly, i add
> one.
>
> i cant compare remaining_len >= 0 since that'll always be true for
> unsigned.
I think you are right. The bug is a bit further on... I mixed two
related issues up.
>>>>> assert(str && sub && lstr && lsub && lstr >= lsub);
>>>>> while (remaining_len > 0 && anchor) {
>>>>> if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
>>>>> if (strComp(anchor, sub, lsub) == 0) {
>>>>> substr = anchor;
>>>>> break;
>>>>> }
>>>>> anchor++;
>>>>> remaining_len--;
>>>>> }
>>>>> }
>>>>> return substr;
>>>>> }
The problem is not the + 1, but the fact that you subtract one from
remaining_len every time, even when anchor has jumped by more than
one. Put:
assert(!substr || substr + lsub <= str + lstr);
just before the return and run with "aaxbx" and "xa" and you will see
that the assert will fire because the second time round the loop
anchor points to the 'b' and remaining_len is 3 (when it should be 2).
One solution is to replace the decrement of remaining_len with a new
calculation of it:
remaining_len = str + lstr - anchor - lsub + 1;
If you do that, there is really no need to have it in a variable.
Just pass this value to strFirstCh and loop while anchor is not null.
I suspect, though, that you can come up with a simpler way to write the
whole function if you try.
--
Ben.
|
ben.usenet (6516)
|
2/18/2010 12:02:13 AM
|
|
fedora <no_mail@invalid.invalid> writes:
> Ike Naar wrote:
>
>> In article <hlgr1g$qp1$1@news.eternal-september.org>,
>> fedora <no_mail@invalid.invalid> wrote:
>>>Ike Naar wrote:
>>>
>>>> In article <hlglm1$r14$1@news.eternal-september.org>,
>>>> fedora <no_mail@invalid.invalid> wrote:
>>>>>
>>>>>if (((orig_str + orig_lstr) - str) <= step) break;
>>>>
>>>> [snip]
>>>>
>>>> (orig_str + orig_lstr <= str + step)
>>>
>>> [snip]
>>>
>>>Initially i thought of writing it exactly like yours but then when str is
>>>< step bytes before it's end, adding step to it would be undefined
>>>behaviour so i coded like i did.
>>
>> Whoops you're right I did not think about that.
>>
>>> Maybe i can cast step to signed long and then compare
>>>with (orig_str+orig_lstr) - str? is this ok?
>>
>> That is a possibility. If your compiler supports ptrdiff_t then perhaps
>> it's better to cast step to that type, so that the types on both sides
>> of the comparison operator are the same.
>>
>> What about (orig_str + orig_lstr - step <= str) ?
>
> This is perfect thanks!! After changing that line to the comparison above,
> i've stopped getting the strange seg faults as far as i can see.
>
> so it seems this failed test allowed lstr to wrap around to very big values
> and access foreign memory locations. now it works!
No, sorry, your program is still wrong. You are looking at a symptom,
not a cause. The change alters things so it /seems/ to work but the
bug I pointed out (though I misdescribed it) is still there. This
new test (which I don't think is needed if you correct strSubstr) is
hiding the undefined effect of going outside the array, but the
behaviour is still undefined even though you may not see a crash. See
my other post about that.
lstr "wrapped round" only because strSubstr is returning a pointer too
close to the end of the string. This line:
lstr = (orig_str + orig_lstr) - str;
sets an unsigned lstr to a possibly signed pointer difference. With
the + error in strSubstr, this difference can be -1. If you fix
strSubstr everything is OK again. Of course, I may have missed
another problem, but I am pretty sure about this one!
--
Ben.
|
ben.usenet (6516)
|
2/18/2010 12:06:09 AM
|
|
On Feb 18, 3:49 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-17, Malcolm McLean <malcolm.mcle...@btinternet.com> wrote:
>
> > On Feb 17, 7:16 pm, Seebs <usenet-nos...@seebs.net> wrote:
> >> Unless you're expecting to spend most of your programming career working
> >> for people who have major obsessive problems with using technology in ways
> >> that generally work, because of personal grudges never adequately explained,
> >> I would suggest that perhaps what Nilges accepts or doesn't accept should
> >> not be a component of any technical decision, ever.
> > That's the ad hominem fallacy. It's not a pretentious term for
> > "insult" but a common fallacy, which is to suppose that an argument is
> > wrong because of the person who is making it.
>
> No, it's not an ad hominem fallacy. It's the very well supported view
> that what Nilges accepts or doesn't accept should not be a component of any
> technical decision. I'm not saying that an argument is wrong because of
> the person who is making it. I'm saying that a conclusion should be ignored
> (neither accepted nor rejected) based on the person who has offered it.
....and that's an ad hominem fallacy. If you'd troubled to take a class
in informal logic, you'd have discovered that there is a VALID
argument based on applicable authority, but none based on anti-
authority. Nobody except a Fascist or a child refuses to believe
something because of his hatred of a person.
>
> Which is to say, if you know someone is a clown, knowing his position on an
> issue tells you nothing for or against the issue. Now, if he had advanced
> an argument, it could be worth discussing that argument, but as long as
> we're just talking about his conclusion, it's not an ad hominem fallacy to
> suggest disregarding the conclusions reached by someone who is demonstrably
> very bad at the topic in question.
>
I'd be careful with this, dear boy. You're the one who posted a
"solution" for replacing %s (and bugger all) with something else, that
didn't work, and who posted a strlen with a crude off by one bug. Many
people here (but not me) may well use your demonstrated incompetence
as a reason for ignoring your input on technical matters. I haven't:
most recently, I used the fact that your latest stupidity in the off-
by-one strlen was so glaringly obvious, because you used short
identifiers, to admit that my long and literate identifiers might need
to be reduced.
> > In fact there are good reasons for deprecating string.h.
>
> For some purposes, yes.
>
> For manipulation of sequences of non-NUL chars, terminated by a char, not so
> much.
You haven't responded to Malcolm's concerns at all.
>
> > chars
> > effectively have to be octets, whilst often programs need to accept
> > non-Latin strings.
>
> True. This is addressed in no small part by the multibyte stuff, which you
> would presumably use for multibyte strings.
....as an exception (in other words, a bug in waiting)
>
> > Then the functions are all very old, with certain
> > weaknesses (no protection from buffer overrun in strcpy, an O(N)
> > performance for strcat and strlen, an inconvenient interface for
> > strcat, const inconsistencies with strchr, very poor functionality
> > with strfind and const inconsistencies here too, very serious buffer
> > problems with sprintf, an overly difficult interface and buffer
> > problems with sscanf, thread problems with strtok and a non-intuitive
> > interface).
>
> I'm not aware of "strfind".
>
> While the various interfaces are certainly flawed, consider that the Nilges
> alternative is to duplicate the flaws of the interface without even the
> benefit of already having been debugged. Or, worse, to just not even come
> close.
Incoherent. How do I "duplicate the flaws of the interface?" My
replace merely starts with Nul terminated strings because this is what
I'm given. You're just issuing words out of hatred at this point.
>
> There are certainly cases where the <string.h> functions are not the right
> tool. However, that Nilges argues against it is not an argument either way
> in that. His arguments might be an argument either way; his conclusion is
> not.
Empty words.
There is a techie named Seebach
Who sense and skill doth lack
Who uses words
Like little turds
That incompetent techie named Seebach
His code it had an ugly Bug
Off by one, snug as a rug
It was obvious to all
And the sensitive this bug did appall
It made them say, oh Ugh
But we would like to forgive him
Programming is hard for all men
And when he withdraws his Schildt rant
We will do so, like Immanuel Kant.
>
> -s
> --
> Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
spinoza1111 (3250)
|
2/18/2010 3:09:15 AM
|
|
On Feb 18, 5:19 am, Walter Banks <wal...@bytecraft.com> wrote:
> Ben Bacarisse wrote:
> > Those are arguments for using something better, not arguments for not
> > using C's string functions. If the "challenge" had been: "use this
> > improved string library to write replace" or "design a string library
> > so that replace is easy to write" I for one would have no objection.
>
> > The problem is that rejecting what is already there (rather than using
> > something better) leads to a /more/ complex and buggy solution.
>
> This whole project thread has been filled how not to engineer software.
> Application code specifically avoiding libraries, Not Invented Here,
> random design with moving target specifications or no specifications,
> ad hoc testing with a dose of 20+ Year old unresolved office battles,
> interpersonal rivalry and off topic rants.
>
> As several have stated not the environment that we are used to.
You're in denial. In fact, Seebach and Heathfield insist on making
this recreational environment, in which we could in the absence of job
pressures engage in a common search for truth, representative of the
usual sort of development environment. The fact is that most high tech
firms produce low tech software consistently, and they do so because
correct software takes "too much time", and requires for its
developers, free human beings unafraid of being laid off into a savage
state of unemployment when they are the targets of office bullies.
>
> Regards
>
> w..
> --
> Walter Banks
> Byte Craft Limited
> http://www.bytecraft.com
|
spinoza1111 (3250)
|
2/18/2010 3:12:48 AM
|
|
On Feb 18, 5:39 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-17, Walter Banks <wal...@bytecraft.com> wrote:
>
> > This whole project thread has been filled how not to engineer software.
> > Application code specifically avoiding libraries, Not Invented Here,
> > random design with moving target specifications or no specifications,
> > ad hoc testing with a dose of 20+ Year old unresolved office battles,
> > interpersonal rivalry and off topic rants.
> > As several have stated not the environment that we are used to.
>
> Yes, but it's important to be prepared to program in some of the many
> environments which real programmers often end up having to work in.
>
> To be fair, I've never had a coworker in the same class as Nilges. Not
> even particularly close. But I have had to work with arbitrary or
> bad specifications, specifications which change repeatedly during
> implementation, old office battles, and vehement opposition to things which
> were Not Invented Here.
>
> At one point, I was asked to develop a linked list implementation. The
> proposed design looked like this:
>
>         struct list_node {
>                 struct list_node *next;
>                 void *data;
>         };
>
>         struct list {
>                 struct list_node *head;
>                 struct list_node *tail;
>         };
>
> The specification was much as you'd expect. Except for one TINY detail.
> Which was that the formal specification was that
>         (struct list *) (x->tail->next) == x
> whenever tail was not null.
>
> That is to say, if the list contained any members, the "next" pointer for
> the last member of the list was a pointer (suitably converted) to the list
> object.
>
> So iteration would look roughly like:
>         for (l = x->head; l->next != x; l = l->next) {
>                 /* ... */
>         }
>
> It took a day or so of effort for me to round up enough senior developers
> to all sit on the guy and tell him that:
>
> 1. He was wrong.
> 2. He was micro-managing, which is presumptively wrong.
>
> before we were allowed to use a more conventional design.
>
> Having had to deal with things like a database in which the formal schema
> description begins with "all fields are VARCHAR for simplicity", I found the
> Nilges String Replace Challenge to be a surprisingly good approximation of
> what programming work is often like in the real world.
>
> (Disclaimer: All the above memories are faded with age. My current
> environment is pretty good about this kind of stuff. I have no clue about
> the office politics, as our management put a great deal of time and effort
> into ensuring that they are Not Our Problem.)
>
> -s
> --
> Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
Fascists like to tell stories about fantasy people and other people in
hopes that their listeners will then confuse the mythical or alternate
person with their target. This of course is easier than going head to
head, where Peter consistently loses.
Peter has not, to my knowledge, posted a solution to my challenge that
works. The only people to have done so are Willem and io_x. Instead,
he started the ball rolling with code with bugs (%s and bugger all to
bugger all) and has since that time posted more code with bugs, that
uses string.h, along with a marvelously compact example of off by one
bugosity.
So of course, it's time to start telling war stories about his
victories over "incompetent" coworkers.
Instead of developing his own competence (for example, by taking
university courses in comp sci), Peter insists on advancing his career
by trashing good people like Schildt and paying his way onto standards
committees.
He is a most amusing creature.
|
spinoza1111 (3250)
|
2/18/2010 3:18:42 AM
|
|
spinoza1111 wrote:
> On Feb 18, 5:19 am, Walter Banks <wal...@bytecraft.com> wrote:
>
> > As several have stated not the environment that we are used to.
>
> You're in denial. In fact, Seebach and Heathfield insist on making
> this recreational environment, in which we could in the absence of job
> pressures engage in a common search for truth, representative of the
> usual sort of development environment. The fact is that most high tech
> firms produce low tech software consistently, and they do so because
> correct software takes "too much time", and requires for its
> developers, free human beings unafraid of being laid off into a savage
> state of unemployment when they are the targets of office bullies.
This project and your comments show just how far out of touch
you are with software technology. It goes back a long time
when with pride you were debugging in machine code on a 1401
instead of using tools created to debug software.
NIH and other juvenile behavior at best illustrates the way
not to implement a project and at worst is misleading and intellectually
dishonest.
The broad based generalizations that you attribute to software
development companies just don't fit the facts.
w..
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
walter20 (874)
|
2/18/2010 3:57:02 AM
|
|
On Feb 18, 5:39 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-17, Walter Banks <wal...@bytecraft.com> wrote:
>
> > This whole project thread has been filled how not to engineer software.
> > Application code specifically avoiding libraries, Not Invented Here,
> > random design with moving target specifications or no specifications,
> > ad hoc testing with a dose of 20+ Year old unresolved office battles,
> > interpersonal rivalry and off topic rants.
> > As several have stated not the environment that we are used to.
>
> Yes, but it's important to be prepared to program in some of the many
> environments which real programmers often end up having to work in.
>
> To be fair, I've never had a coworker in the same class as Nilges. Not
No, I don't think you have.
> even particularly close. But I have had to work with arbitrary or
> bad specifications, specifications which change repeatedly during
> implementation, old office battles, and vehement opposition to things which
> were Not Invented Here.
You've memorized the mere names of thought crimes ("not invented
here") without learning your trade; as you have told us, you haven't
taken any university computer science classes at all.
>
> At one point, I was asked to develop a linked list implementation. The
> proposed design looked like this:
>
>         struct list_node {
>                 struct list_node *next;
>                 void *data;
This is as we know a mistake. You're retailing a story, in the
Fascist's way, in hopes that your auditors will confuse the story with
me. William Butler Yeats noticed that both sides in the Irish
"troubles" preferred to read their own biased newspapers and were
uninterested in dialog with the other side, and in the case of the
Republicans further split when Michael Collins tried to establish the
Irish free state:
The bees build in the crevices
Of loosening masonry, and there
The mother birds bring grubs and flies.
My wall is loosening; honey-bees,
Come build in the empty house of the state.
We are closed in, and the key is turned
On our uncertainty; somewhere
A man is killed, or a house burned,
Yet no clear fact to be discerned:
Come build in the empty house of the stare. (Yeats: The Stare's Nest by
My Window, to be continued)
>         };
>
>         struct list {
>                 struct list_node *head;
>                 struct list_node *tail;
>         };
>
> The specification was much as you'd expect. Except for one TINY detail.
> Which was that the formal specification was that
>         (struct list *) (x->tail->next) == x
> whenever tail was not null.
>
> That is to say, if the list contained any members, the "next" pointer for
> the last member of the list was a pointer (suitably converted) to the list
> object.
>
> So iteration would look roughly like:
>         for (l = x->head; l->next != x; l = l->next) {
>                 /* ... */
>         }
>
> It took a day or so of effort for me to round up enough senior developers
> to all sit on the guy and tell him that:
>
> 1. He was wrong.
> 2. He was micro-managing, which is presumptively wrong.
>
> before we were allowed to use a more conventional design.
OK, he specified a circular singly-linked list, and (because by your
own admission you have never taken a single computer science class)
you had never seen this simple data structure, and you wasted an
entire day trying to figure this code out in consequence. For the same
reason you preferred to try destroy the career of Herb Schildt and you
hound me here, this enraged you and you maliciously went after the
original designer in order to ruin his position, because you didn't
know how to code a simple loop (we've seen how incompetent you are at
this task in your strlen that was off by one).
if (x->head == NULL) return;
list * p = x->head;
list * p2 = p;
do { ... p = p->next;} while(p != p2);
You may in fact know more about C than I do in the infantile rote
register, but you can't program, and instead of learning your trade,
you try to destroy reputations. You're a Fascist and an incompetent.
Oh honey-bees, come and build in the empty house of the stare: oh
honey-bees, come and build in the empty house of the state.
>
> Having had to deal with things like a database in which the formal schema
> description begins with "all fields are VARCHAR for simplicity", I found the
> Nilges String Replace Challenge to be a surprisingly good approximation of
> what programming work is often like in the real world.
You can't deal. Like your hero George Bush (whom you supported in
2000) you're incurious and you jump to conclusions about concepts and
people all too readily, because you're not qualified for the job
you're in and you don't want to learn anything except slogans.
A barricade of stone or of wood;
Some fourteen days of civil war;
Last night they trundled down the road
That dead young soldier in his blood:
Come build in the empty house of the stare.
We had fed the heart on fantasies,
The heart's grown brutal from the fare;
More substance in our enmities
Than in our love; O honey-bees,
Come build in the empty house of the stare. (Yeats)
|
spinoza1111 (3250)
|
2/18/2010 4:23:39 AM
|
|
On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
> Seebs wrote:
> > On 2010-02-17, Walter Banks <wal...@bytecraft.com> wrote:
> > > This whole project thread has been filled how not to engineer software.
> > > Application code specifically avoiding libraries, Not Invented Here,
> > > random design with moving target specifications or no specifications,
> > > ad hoc testing with a dose of 20+ Year old unresolved office battles,
> > > interpersonal rivalry and off topic rants.
>
> > > As several have stated not the environment that we are used to.
>
> > Yes, but it's important to be prepared to program in some of the many
> > environments which real programmers often end up having to work in.
>
> > To be fair, I've never had a coworker in the same class as Nilges. Not
> > even particularly close. But I have had to work with arbitrary or
> > bad specifications, specifications which change repeatedly during
> > implementation, old office battles, and vehement opposition to things which
> > were Not Invented Here.
>
> I saw this 20 years ago and it became nothing but a memory as better
> more effective approaches prevailed.
>
> > (Disclaimer: All the above memories are faded with age. My current
> > environment is pretty good about this kind of stuff. I have no clue about
> > the office politics, as our management put a great deal of time and effort
> > into ensuring that they are Not Our Problem.)
>
> Software development practices have improved a lot as applications
> have become more complex and requirements better defined.
How can "defining requirements" improve "software development
practices"? "Defining requirements" is being able to express the
requirements in English, other human languages, visuals, and if
necessary mathematical formalisms: if you define them in code, you're
not, by definition, "defining requirements": you are coding.
The fantasy has always been that the people who through intellectual
incuriosity and lack of qualifications (for example, psychology majors
like Seebach) can neither code nor do traditional mathematics will
somehow sit around in meetings and so such great "requirements" that
the code will be easy. The reality is lost baggage, stuck
accelerators, military veterans without health care, children
screaming under stairwells, boys sobbing in armies, and old men
weeping in the parks.
>
> w..
|
spinoza1111 (3250)
|
2/18/2010 4:28:16 AM
|
|
spinoza1111 wrote:
> On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
>
> > Software development practices have improved a lot as applications
> > have become more complex and requirements better defined.
>
> How can "defining requirements" improve "software development
> practices"? "Defining requirements" is being able to express the
> requirements in English, other human languages, visuals, and if
> necessary mathematical formalisms: if you define them in code, you're
> not, by definition, "defining requirements": you are coding.
You just answered your own question.
w..
|
walter20 (874)
|
2/18/2010 4:33:08 AM
|
|
On Feb 18, 11:57 am, Walter Banks <wal...@bytecraft.com> wrote:
> spinoza1111 wrote:
> > On Feb 18, 5:19 am, Walter Banks <wal...@bytecraft.com> wrote:
>
> > > As several have stated not the environment that we are used to.
>
> > You're in denial. In fact, Seebach and Heathfield insist on making
> > this recreational environment, in which we could in the absence of job
> > pressures engage in a common search for truth, representative of the
> > usual sort of development environment. The fact is that most high tech
> > firms produce low tech software consistently, and they do so because
> > correct software takes "too much time", and requires for its
> > developers, free human beings unafraid of being laid off into a savage
> > state of unemployment when they are the targets of office bullies.
>
> This project and your comments show just how far out of touch
> you are with software technology. It goes back a long time
> when with pride you were debugging in machine code on a 1401
> instead of using tools created to debug software.
Huh? I use .Net's rich development environment all the time, but sure,
I don't use bug creation tools like C strings if I don't have to.
>
> NIH and other juvenile behavior at best illustrates the way
> not to implement a project and at worst misleading and intellectually
> dishonest.
NIH labels a response. Sure, there are companies that have failed
owing to excess internal development and the attitude of
We don't like it 'cause it's not invented here
If you use that solution, you must be a queer
just as any form of development by catchphrase (such as misunderstood
"simplicity") is prone to fail. There are also companies that have
succeeded by developing things internally, including Apple, IBM until
1990 and Northern Telecom until about 1983.
Note that the negation of a catchphrase is still a catchphrase.
Northern Telecom switched catchphrases in 1983 and proceeded to fail
by farming everything out.
On the ground in little data processing shops, NIH is simply a way of
normalizing deviance and forcing smart people to use their skills to
do dumb things, in many cases.
>
> The broad based generalizations that you attribute to software
> development companies just doesn't fit the facts.
Actually, I've been using specific examples taken from personal
experience at Bell-Northern Research and while I hope things have
changed since I changed careers, the evidence here is that things have
gotten worse.
And, your riposte is itself a broad-based generalization.
>
> w..
>
> --- news://freenews.netfront.net/ - complaints: n...@netfront.net ---
|
spinoza1111 (3250)
|
2/18/2010 5:20:37 AM
|
|
On Feb 18, 5:51 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>
> news:FrCdnTZJc8ajweHWnZ2dnUVZ8r6dnZ2d@bt.com...
>
> [...]
>
> > Yes, strcat and strlen are O(N) - so, where it matters, you remember the
> > string length, having found it out the first time.
>
> Bingo! :^)
Yes, strcat and strlen are order N
So be sure to squirrel away their values, like the Hen
Secretes seeds in her cache of seminal treasures
Great programmers, says Dickie, take these measures.
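To make the "remember the length" point concrete, here is a minimal sketch, assuming a hypothetical helper called append_end (it is not part of <string.h>). It copies like strcpy but returns a pointer to the terminator it just wrote, so repeated appends never rescan the destination and the final length falls out of pointer arithmetic. The caller is assumed to have made the buffer large enough.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption, not a standard function):
   copy src to dst, including the terminator, and return a pointer
   to the '\0' that was written. The caller must guarantee that
   dst has room for the copy. */
char *append_end(char *dst, const char *src)
{
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}
```

Used as end = append_end(buf, "sub"); end = append_end(end, "string"); the length of the result is end - buf, with no strlen call and no O(N) rescan per append.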
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/18/2010 5:23:11 AM
|
|
On Feb 18, 5:14 am, Richard Heathfield <r...@see.sig.invalid> wrote:
> Malcolm McLean wrote:
> > On Feb 17, 7:16 pm, Seebs <usenet-nos...@seebs.net> wrote:
> >> On 2010-02-17, fedora <no_m...@invalid.invalid> wrote:
>
> >>> Thanks Ben! I didn't use stdlib functions because i wanted to how
> >>> easy/difficult it would be to write my own. also spinoza didn't accept
> >>> program that called function in string.h.
> >> Unless you're expecting to spend most of your programming career working
> >> for people who have major obsessive problems with using technology in ways
> >> that generally work, because of personal grudges never adequately explained,
> >> I would suggest that perhaps what Nilges accepts or doesn't accept should
> >> not be a component of any technical decision, ever.
>
> > That's the ad hominem fallacy. It's not a pretentious term for
> > "insult" but a common falacy, which is to suppose that an argument is
> > wrong because of the person who is making it.
>
> No, he's not saying that an argument is wrong because Mr Nilges is
> making it. He's saying that Mr Nilges's support for an argument is of no
> value in determining whether or not the argument is valid. That's very
> different.
>
> Here is an example of "ad hominem" argument:
>
> "Mr X says strcpy is dangerous. Mr X doesn't know spit about C.
> Therefore strcpy is not dangerous". Poor reasoning.
>
> Here is an example of what Seebs is saying, using the above
> (hypothetical) case:
>
> "Mr X says strcpy is dangerous. Mr X doesn't know spit about C.
> Therefore X's claim that strcpy is dangerous adds no value to the
> argument that strcpy is dangerous. Whether it is or isn't dangerous is
> another matter entirely."
The fallacy here is that while I know a bit about C, I know in fact
less than the regulars: but the regulars are uneducated in computer
science and are incompetent programmers (who know a lot of rote
facts about C):
* Peter Seebach has by his own admission never taken a computer
science class, and in consequence makes silly off by one bugs, thinks
designing a common data structure (a circular linked list) is
practically a termination offense, and at least at the time of his
Schildt attack, thought that "the 'heap' is a DOS term"
* Richard Heathfield "engineers" a "reusable" linked list by
physically copying node values into nodes instead of using pointers,
creating a "reusable solution" that is O(M*N), prone to unpredictable
failure in production.
Managers who don't know how to manage free men and women prefer that
headhunters find men and women who are overly specifically "trained"
in platforms, and the result is plain.
>
> That is, he is in effect claiming that it is possible that even someone
> who knows little about a subject may know enough about it to make a
> correct claim, or may simply make a correct claim by chance. So, moving
> back from the abstract to the concrete, he is not dismissing any claim
> made by Mr Nilges as being necessarily incorrect. That would be foolish,
> not only because it would be an invalid "ad hominem" argument, but also
> because Mr Nilges (who is on record as saying that he wishes to cause
> maximum damage to this newsgroup) could exploit such a position by
> deliberately making claims that are clearly true.
I don't know where you or Julienne get that claim you said I made:
that I wish to destroy this newsgroup. It sounds like the way Arabs
and Iranians, speaking in writing in their own language are
systematically mistranslated (by Israeli translation firms) as saying
"death to Jews" when for the most part they are calling for an end to
the criminal gang that has put the Jews of Israel at risk with their
criminal policies.
In fact, I always am the source, according to all neutral observers,
of interesting content.
>
> > In fact there are good reasons for deprecating string.h. chars
> > effectively have to be octets, whilst often programs need to accept
> > non-Latin strings. Then the functions are all very old, with certain
> > weaknesses (no protection from buffer overun in strcpy, an O(N)
> > performance for strcat and strlen, an inconvenient interface for
> > strcat, const inconsistencies with strchr, very poor functionality
> > with strfind and const inconsiencies here too, very serious buffer
> > problems with sprintf, an overly difficult interface and buffer
> > problems with sscanf, thread problems with strtok and a non-intuitive
> > interface.
>
> Taking your specific points one at a time:
>
> Whilst it is true that strcpy offers no added protection against buffer
> overrun, careful programming overcomes this problem. Thus, strcpy does
> not get in the way of the programmer who knows full well that his buffer
> is sufficiently large - no performance penalty is imposed.
Knowledge is justified true belief, and programmers who decide that
some power of two is "big enough" don't usually verify this with end
users. Furthermore, knowledge is justified true belief, and most end
users don't know the answer either. Therefore, the best policy is to
use an approach in which you don't have to worry about "buffer
overrun", and this is easy in using a modern language such as C Sharp
or Java. In C, it's easy if one has taken a computer science class in
data structures, and had the chance in a true university to create a
linked list without some moron of a manager asking what you're doing.
With the linked list, you still have to worry about malloc failing.
But if you're competent, you call malloc checking its return. Whereas
allocating a buffer that you "think" is "big enough" and then blindly
copying to it is asking for trouble.
Sure, if you're reasonably competent, you will check the current size
of the string in the buffer against the allocated maximum (and, if
you're reasonably incompetent, you will do so with what even you, dear
Richard, know is an O(n) strlen call, hopefully not using Peter's
famous off by one version). If you are a little more competent you
might even #define a symbol with the maximum rather than hard coding
it in two places.
But this of course doesn't begin to address the fact that in the so-
called "real world" (referring to it as reality, not in mythical
incantatory invocation), the buffer size will under maintenance
eventually become, for trivial jobs, always the maximum amount it
needed to be for unusual jobs, making for a performance pig.
>
> Yes, strcat and strlen are O(N) - so, where it matters, you remember the
> string length, having found it out the first time. These two functions
> offer simple solutions to a simple task, and as such are very often a
"Simple" to the Simpleton is always a term of Praise:
It means "at last I understand it, to my inner shame'd amaze":
He then imposes this lack of Art upon the sage and Wise
Whenever in Authority, to their shock, and awe'd surprise.
Beggars ride fine horses, whipping them in scorn:
Shouting, we must understand, tho' their comprehension be porn:
Belisarius puts his Foot upon the scholar and the priest,
Commanding them on pain of death to say to him the least:
Yet Vandal, Goth, Visigoth, Ostrogoth and Byzantine:
Need the wisdom of the learned to find the gold they'd glean:
So they can fill themselves with swill and ravish the name of Art
Expelling all at fair Rome's fall in one almighty Fart.
> good solution to the task at hand. Where that is not the case, we have
> the option of building more powerful tools.
Actually, we don't, since the ability is extinguished by Envy and
Rage:
When Barbarians are in control, the cry is death unto the sage.
> (And yes, I agree that
> strcat's interface could be improved; for example, it could return a
> pointer to the null terminator rather than to the beginning of the string.)
The mighty mountain Heathfield labors, and great Heaves are heard on
Earth:
The Natives stand a-wonderin' what this presages, whether boon, or
dearth:
The storm clouds gather on the peak, no man dares show his face:
Heathfield announces an idea to the assembled people in this place.
It's "return a pointer to the end and not to the beginning"
The Volcano has but Popp'd, and out comes...almost nothing.
>
> Again, I must agree that the const inconsistency with strchr is a bit of
> a wart. But the input is const purely to constitute a promise that the
> function won't write to the input string. The return value is non-const
> because strchr would otherwise be a real pain to use. How could it be
> done better?
But Heathfield says so, and it must of Needs be true:
This is the best of all possible worlds, there is no point in Rue:
Don't dream, don't hope, you're not the Pope, you are but a sot and
Bum
Heathfield knows this of you, because that's what he's become.
>
> As for strfind, that's not C's problem. Take it up with the vendor.
>
> To my mind, the sprintf function does not have serious buffer problems.
So lo! Heathfield's code has sprung a secret Leak:
But because he's the boss, no man may dare to speak.
It sprintsf bytes on adjacent pages, the debugger he but rages,
Why did this guy use such a feature, and will it to the ages?
His coworkers say, hey hey hey, you'd better hide your rage away:
The Boss wrote dat code back in his high and palmy Day.
If he hears you you're for the chop, better just rewrite the Code
Secretly and in silent hope you won't be the fall guy not his Toad.
> Nevertheless, some people obviously disagree, and C99 provides snprintf
> for such people.
>
> The scanf function is basically a mess, and is rarely used correctly. I
> am at a loss to understand why it is introduced so early in programming
> texts.
>
> The strtok function is of limited use, but there are times when it is
> just the ticket. It would be better, however, for it to take a state
> pointer. I'm not convinced that its interface is particularly non-intuitive.
And when he doubles a negative, the trouble really starts:
His meaning is conceal'd, it gets out in Fits, and Starts.
>
> --
> Richard Heathfield <http://www.cpax.org.uk>
> Email: -http://www. +rjh@
> "Usenet is a strange place" - dmr 29 July 1999
> Sig line vacant - apply within
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/18/2010 6:00:31 AM
|
|
On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
> spinoza1111wrote:
> > On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
>
> > > Software development practices have improved a lot as applications
> > > have become more complex and requirements better defined.
>
> > How can "defining requirements" improve "software development
> > practices"? "Defining requirements" is being able to express the
> > requirements in English, other human languages, visuals, and if
> > necessary mathematical formalisms: if you define them in code, you're
> > not, by definition, "defining requirements": you are coding.
>
> You just answered your own question.
But dear boy, you said that software improves if we define
requirements. Please explain how this works, and note that industrial
experience is that the success of a software development project is
completely independent of requirements quality. Sometimes, beautiful
requirements produce beautiful code, sometimes, beautiful requirements
beautifully miss the point. Sometimes, Extreme programming projects
start out with no requirements and beauty results.
>
> w..
>
> --- news://freenews.netfront.net/ - complaints: n...@netfront.net ---
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/18/2010 6:02:59 AM
|
|
On Feb 18, 2:02 pm, spinoza1111 <spinoza1...@yahoo.com> wrote:
> On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
>
> > spinoza1111wrote:
> > > On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
>
> > > > Software development practices have improved a lot as applications
> > > > have become more complex and requirements better defined.
>
> > > How can "defining requirements" improve "software development
> > > practices"? "Defining requirements" is being able to express the
> > > requirements in English, other human languages, visuals, and if
> > > necessary mathematical formalisms: if you define them in code, you're
> > > not, by definition, "defining requirements": you are coding.
>
> > You just answered your own question.
>
> But dear boy, you said that software improves if we define
> requirements. Please explain how this work, and note that industrial
> experience is that the success of a software development project is
> completely independent of requirements quality. Sometimes, beautiful
> requirements produce beautiful code, sometimes, beautiful requirements
> beautifully miss the point. Sometimes, Extreme programming projects
> start out with no requirements and beauty results.
>
>
>
>
>
> > w..
>
> > --- news://freenews.netfront.net/ - complaints: n...@netfront.net ---
An interesting factoid about this discussion is that no American or
British posters, save one, have solved the problem I set (write
replace without using string.h).
Willem appears to be Dutch, and io_x appears to be Italian. I am an
American expatriate.
I think this is linked to the "Tea Party" movement in America, which
is ignorance rampant. It's people who absolutely rely on Social
Security and Medicare calling for an end to Social Security and
Medicare. It's also linked to the antics of the "British National
Party".
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/18/2010 6:08:12 AM
|
|
spinoza1111 wrote:
> On Feb 18, 3:49 am, Seebs <usenet-nos...@seebs.net> wrote:
>> On 2010-02-17, Malcolm McLean <malcolm.mcle...@btinternet.com> wrote:
>>
>>> On Feb 17, 7:16 pm, Seebs <usenet-nos...@seebs.net> wrote:
>>>> Unless you're expecting to spend most of your programming career working
>>>> for people who have major obsessive problems with using technology in ways
>>>> that generally work, because of personal grudges never adequately explained,
>>>> I would suggest that perhaps what Nilges accepts or doesn't accept should
>>>> not be a component of any technical decision, ever.
>>> That's the ad hominem fallacy. It's not a pretentious term for
>>> "insult" but a common falacy, which is to suppose that an argument is
>>> wrong because of the person who is making it.
>> No, it's not an ad hominem fallacy. It's the very well supported view
>> that what Nilges accepts or doesn't accept should not be a component of any
>> technical decision. I'm not saying that an argument is wrong because of
>> the person who is making it. I'm saying that a conclusion should be ignored
>> (neither accepted nor rejected) based on the person who has offered it.
>
> ....and that's an ad hominem fallacy.
No, it isn't. Learn to read.
> If you'd troubled to take a class
> in informal logic, you'd have discovered that there is a VALID
> argument based on applicable authority, but none based on anti-
> authority.
This isn't about authority. It's about cluelessness.
> Nobody except a Fascist or a child refuses to believe
> something because of his hatred of a person.
That's not what he's suggesting. Learn to read. What he's saying is of
the form "X is a bozo. X says Y. Y might be right or it might be wrong.
Either way, X's opinion is irrelevant because X is a bozo." This is very
different from the "ad hominem" fallacy, "X says Y. X is a bozo.
Therefore Y is wrong". Seebs knows you're a bozo, but he's not daft
enough to assume a priori that everything you say is necessarily
incorrect. And this is very wise: despite the apparent difficulty of the
task, I can call several counter-examples to mind.
<nonsense snipped>
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
|
|
0
|
|
|
|
Reply
|
rjh (10789)
|
2/18/2010 6:47:36 AM
|
|
spinoza1111 wrote:
<snip>
>
> The fallacy here is that while I know a bit about C
s/while//
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
|
|
0
|
|
|
|
Reply
|
rjh (10789)
|
2/18/2010 6:51:58 AM
|
|
Here is a simple technique that can possibly cut down on the number of
search operations one needs to perform. It uses the length of the string
(so it fits in well with counted strings), and it initializes a hash table
that stores the characters of the comparand string from right-to-left.
The search moves through the source from left-to-right, but it compares
source against comparand from right-to-left. If a search character does not
exist within the comparand, the search index can be incremented by the
length of the comparand. For instance:
_______________________________________________________________
src = ChrisHelloJohn
cmp = Hello
hash['o'] = 5
hash['l'] = 4
hash['e'] = 2
hash['H'] = 1
_______________________________________________________________
Okay, we want to find `cmp' in `src'. So you put the head index on `s' and
you put the tail on 'C'. You can skip 5 characters if `hash[src[head]]' is
zero. There are more elaborate ways to skip characters and I think you can
use the right-to-left hash list to determine other things as well. Here is
a very crude initial code sketch of the algorithm I am thinking about:
http://clc.pastebin.com/f7df86b63
Here is the output I get for search for "Hello" in the following string:
"ldsefkdsjehuoeewhs;dfjpewfjHello9438jfgohj"
_______________________________________________________________
right-to-left hash hit 'o'!
right-to-left hash hit 'l'!
right-to-left hash miss 'l' chars
right-to-left hash hit 'e'!
right-to-left hash hit 'H'!
compare(1) 'f' to 'o'
skip 5 chars!
compare(2) 'e' to 'o'
compare(3) 'h' to 'o'
skip 5 chars!
compare(4) 'w' to 'o'
skip 5 chars!
compare(5) 'f' to 'o'
skip 5 chars!
compare(6) 'f' to 'o'
skip 5 chars!
compare(7) 'l' to 'o'
compare(8) 'o' to 'o'
compare(9) 'l' to 'l'
compare(10) 'l' to 'l'
compare(11) 'e' to 'e'
compare(12) 'H' to 'H'
((12) compares + (5) hashes) out of (42) characters target at (27)
Hello9438jfgohj
_______________________________________________________________
The algorithm was able to skip some characters and only perform 12
comparisons in order to find the comparand at index 27 in a string 42
characters wide. I think there are many more optimizations that could be
made...
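For readers without access to the pastebin links, the skip idea can be sketched as follows. This is not the pastebin code: it is a minimal illustration under stated assumptions. The name skip_search is hypothetical, a plain presence table stands in for the position-valued hash shown in the example, and no <string.h> call is used. The window is compared right-to-left; when the source character under the window's last position does not occur in the comparand at all, the window jumps ahead by the comparand's length, otherwise it advances by one.

```c
#include <stddef.h>
#include <limits.h>

/* Sketch only. Returns the index of the first occurrence of pat
   (length lp) in src (length ls), or (size_t)-1 if there is none. */
size_t skip_search(const char *src, size_t ls, const char *pat, size_t lp)
{
    unsigned char present[UCHAR_MAX + 1] = {0};
    size_t i, k, tail;

    if (lp == 0)
        return 0;                 /* empty comparand matches at 0 */
    if (lp > ls)
        return (size_t)-1;

    /* Record which characters occur anywhere in the comparand. */
    for (i = 0; i < lp; i++)
        present[(unsigned char)pat[i]] = 1;

    /* tail is the source index aligned with the comparand's last char. */
    tail = lp - 1;
    while (tail < ls) {
        if (!present[(unsigned char)src[tail]]) {
            tail += lp;           /* no window covering this char can match */
            continue;
        }
        /* Compare right-to-left. */
        for (k = 0; k < lp && src[tail - k] == pat[lp - 1 - k]; k++)
            ;
        if (k == lp)
            return tail - (lp - 1);
        tail++;                   /* mismatch: slide the window by one */
    }
    return (size_t)-1;
}
```

On the 42-character test string above it reports index 27 for "Hello", the same position as in the posted output. The single-step advance on a mismatch is the conservative choice; a full Boyer-Moore-Horspool shift table would skip further.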
|
|
0
|
|
|
|
Reply
|
no6 (2791)
|
2/18/2010 7:27:18 AM
|
|
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:jZ5fn.556$Ee1.429@newsfe12.iad...
> Here is a simple technique that can possibly cut down on the number of
> search operations one needs to perform. It does this by using the length
> of the string, (e.g., fits in well with counted string), and initializes a
> hash table of the stores characters of the comparand string from
> right-to-left. It performs a search from left-to-right, however it
> compares source against comparand in right-to-left. If a search character
> does not exist within the comparand, the search index can be incremented
> by the length of the comparand. For instance:
> _______________________________________________________________
> src = ChrisHelloJohn
> cmp = Hello
> hash['o'] = 5
> hash['l'] = 4
> hash['e'] = 2
> hash['H'] = 1
> _______________________________________________________________
>
>
> Okay, we want to find `cmp' in `src. So you put the head index on `s' and
> you put the tail on 'C'. You can skip 5 characters if `hash[src[head]]' is
> zero. There are more elaborate ways to skip characters and I think you can
> use the right-to-left hash list to determine other things as well. Here is
> some a very crude initial code sketch of the algorithm I am thinking
> about:
>
>
> http://clc.pastebin.com/f7df86b63
[...]
I know there should be some bugs in there, so, don't use the code for
anything. I will post fixes as I find them. If you can fix a bug, that would
be very nice as well! Thanks in advance, I really would appreciate it.
:^)
|
|
0
|
|
|
|
Reply
|
no6 (2791)
|
2/18/2010 7:32:59 AM
|
|
I found a stupid bug; here is fix:
http://clc.pastebin.com/f4f063111
And here is the difference from the previous code:
http://clc.pastebin.com/pastebin.php?diff=f4f063111
|
|
0
|
|
|
|
Reply
|
no6 (2791)
|
2/18/2010 7:38:52 AM
|
|
On 18 Feb, 06:02, spinoza1111 <spinoza1...@yahoo.com> wrote:
> On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
> > spinoza1111wrote:
> > > On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
> > > > Software development practices have improved a lot as applications
> > > > have become more complex and requirements better defined.
I'm not sure this is true !
> > > How can "defining requirements" improve "software development
> > > practices"?
acquiring good requirements is part of good software development
practice.
> > > "Defining requirements" is being able to express the
> > > requirements in English, other human languages, visuals, and if
> > > necessary mathematical formalisms:
yes. Though various graphical representations are popular as well.
> > > if you define them in code, you're
> > > not, by definition, "defining requirements": you are coding.
well yes, but he didn't say define the requirements in code
> > You just answered your own question.
>
> But [...] you said that software improves if we define
> requirements.
no he didn't. He said "Software development practices [are] improved
[if] requirements [are] better defined." (the actual quote is at the
beginning of the post).
> Please explain how this work, and note that industrial
> experience is that the success of a software development project is
> completely independent of requirements quality.
really? Could you expand on that? I was under the impression that many
projects had failed due to badly defined requirements.
> Sometimes, beautiful
> requirements produce beautiful code, sometimes, beautiful requirements
> beautifully miss the point.
then they aren't good requirements! If you ask a guy to build a garage
then complain he missed the swimming pool out that means you gave him
a poor requirement.
> Sometimes, Extreme programming projects
> start out with no requirements and beauty results.
in a sense you're deriving the requirement by iterative refinement.
Have you read "Principles of Software Engineering Management" by Tom
Gilb? He was doing extreme programming before the (slightly silly) name
was invented.
|
|
0
|
|
|
|
Reply
|
nick_keighley_nospam (4575)
|
2/18/2010 8:07:49 AM
|
|
On 17 Feb, 21:14, Richard Heathfield <r...@see.sig.invalid> wrote:
<snip>
> The scanf function is basically a mess, and is rarely used correctly. I
> am at a loss to understand why it is introduced so early in programming
> texts.
a former clc regular once posted
***
The fscanf equivalent of fgets is so simple
that it can be used inline whenever needed:-
char s[NN + 1] = "", c;
int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);
***
I think scanf() is seen as a straightforward way to read simple
unvalidated input. I'm not convinced that's a good idea.
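One common alternative, sketched here under assumptions rather than offered as the one right way, is to read a whole line with fgets and validate it separately with strtol. The names parse_long and read_long are hypothetical. A line longer than the local buffer is not handled specially here: the remainder stays in the stream for the next read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

/* Hypothetical helper: parse a line that should contain exactly one
   decimal integer, optionally surrounded by whitespace.
   Returns 1 and stores the value on success, 0 on invalid input. */
int parse_long(const char *line, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(line, &end, 10);
    if (end == line || errno == ERANGE)
        return 0;                     /* no digits, or out of range */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;
    if (*end != '\0')
        return 0;                     /* trailing junk after the number */
    *out = v;
    return 1;
}

/* Hypothetical helper: read one line from fp and validate it.
   Returns 1 on success, 0 on EOF, read error, or invalid input. */
int read_long(FILE *fp, long *out)
{
    char line[64];

    if (fgets(line, sizeof line, fp) == NULL)
        return 0;
    return parse_long(line, out);
}
```

The point of the split is that input and validation fail in separate, checkable places, which is what a bare scanf("%ld", ...) call makes awkward.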
> The strtok function is of limited use, but there are times when it is
> just the ticket. It would be better, however, for it to take a state
> pointer. I'm not convinced that its interface is particularly non-intuitive.
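The state-pointer idea in the quoted paragraph can be sketched like this. The name tok_next is hypothetical and the code is only an illustration (POSIX already offers strtok_r along similar lines). The caller owns the resume position, so two strings can be tokenized in an interleaved fashion without the hidden static state that strtok keeps. Like strtok, it writes terminators into the string it scans.

```c
#include <stddef.h>

/* Return non-zero if c occurs in the delimiter set. */
static int is_delim(char c, const char *delims)
{
    for (; *delims != '\0'; delims++)
        if (*delims == c)
            return 1;
    return 0;
}

/* Hypothetical strtok variant with caller-owned state.
   Initialise *state to the start of a writable string, then call
   repeatedly. Returns the next token, or NULL when none remain. */
char *tok_next(char **state, const char *delims)
{
    char *p = *state;
    char *start;

    if (p == NULL)
        return NULL;
    while (*p != '\0' && is_delim(*p, delims))
        p++;                          /* skip leading delimiters */
    if (*p == '\0') {
        *state = NULL;
        return NULL;
    }
    start = p;
    while (*p != '\0' && !is_delim(*p, delims))
        p++;                          /* scan to the end of the token */
    if (*p != '\0') {
        *p = '\0';                    /* terminate the token in place */
        *state = p + 1;
    } else {
        *state = NULL;                /* reached the end of the string */
    }
    return start;
}
```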
|
|
0
|
|
|
|
Reply
|
nick_keighley_nospam (4575)
|
2/18/2010 8:16:49 AM
|
|
Nick Keighley wrote:
> On 17 Feb, 21:14, Richard Heathfield <r...@see.sig.invalid> wrote:
>
> <snip>
>
>> The scanf function is basically a mess, and is rarely used correctly. I
>> am at a loss to understand why it is introduced so early in programming
>> texts.
>
> a former clc regular once posted
>
> ***
> The fscanf equivalent of fgets is so simple
> that it can be used inline whenever needed:-
> char s[NN + 1] = "", c;
> int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
> if (rc == 1) fscanf("%*[^\n]%*c);
> if (rc == 0) getc(fp);
> ***
:-)
> I think scanf() is seen as a straight forward way to read simple
> unvalidated input. I'm not convinced that's a good idea.
Experience of questions posted to this newsgroup strongly suggests that
it is not an even remotely good idea.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
|
|
0
|
|
|
|
Reply
|
rjh (10789)
|
2/18/2010 9:10:14 AM
|
|
"Chris M. Thomasson" <no@spam.invalid> wrote in message
news:jZ5fn.556$Ee1.429@newsfe12.iad...
> Here is a simple technique that can possibly cut down on the number of
> search operations one needs to perform.
[...]
> The algorithm was able to skip some characters and only perform 12
> comparisons in order to find the comparand at index 27 in a string 42
> characters wide. I think there are many more optimizations that could be
> made...
Here is a version with some minor optimizations that make it perform better
for certain types of input:
http://clc.pastebin.com/fe41ca06
here is difference:
http://clc.pastebin.com/pastebin.php?diff=fe41ca06
I am learning that this simple algorithm works better with large alphabets.
|
|
0
|
|
|
|
Reply
|
no6 (2791)
|
2/18/2010 11:01:39 AM
|
|
"spinoza1111" <spinoza1111@yahoo.com> wrote in message
news:f12dcf90-987d-4b72-ac28-363761b398d2@z10g2000prh.googlegroups.com...
On Feb 16, 7:50 pm, "bartc" <ba...@freeuk.com> wrote:
> > spinoza1111wrote:
> > > On Feb 16, 3:13 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
[...]
> > >> will most likely be using processor specific instructions that
> > >> provide a level of efficiency that cannot be reached with 100% pure
> > >> portable C code.
> >
> > > How is that possible? The compiler of the library code will emit
> > > "processor specific" instructions, to be sure, but it will do the same
> > > for me, or any man. And if the library code forces out assembler code,
> > > then it will only work on one processor, or at best small n processor.
> >
> > I think standard library routines can be written in a language other
> > than C,
> > or some mix. For example, hand-written assembly.
> Correct. Wonder how many library routines are written in assembler.
> Don't know.
Here is how `strstr()' is implemented on one of my compilers:
http://clc.pastebin.com/m15e4ab43
|
|
0
|
|
|
|
Reply
|
no6 (2791)
|
2/18/2010 11:13:48 AM
|
|
spinoza1111 wrote:
> On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
> > spinoza1111wrote:
> > > On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
> >
> > > > Software development practices have improved a lot as applications
> > > > have become more complex and requirements better defined.
> >
> > > How can "defining requirements" improve "software development
> > > practices"? "Defining requirements" is being able to express the
> > > requirements in English, other human languages, visuals, and if
> > > necessary mathematical formalisms: if you define them in code, you're
> > > not, by definition, "defining requirements": you are coding.
> >
> > You just answered your own question.
>
> But dear boy, you said that software improves if we define
> requirements.
Correct, you finally got it.
> Please explain how this work, and note that industrial
> experience is that the success of a software development project is
> completely independent of requirements quality.
Are you asking or telling?
> Sometimes, beautiful
> requirements produce beautiful code, sometimes, beautiful requirements
> beautifully miss the point. Sometimes,
But does the code do what is needed?
> Extreme programming projects
> start out with no requirements and beauty results.
That makes no sense. Whatever comes out of a program with no
requirements must be correct: a novel approach to guaranteed success.
You really need to work on comprehension.
w..
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
walter20 (874)
|
2/18/2010 11:47:09 AM
|
|
I found another bug; here is fix:
http://clc.pastebin.com/fdc461ce
and difference:
http://clc.pastebin.com/pastebin.php?diff=fdc461ce
I think I will write a little automated test application.
|
|
0
|
|
|
|
Reply
|
no6 (2791)
|
2/18/2010 11:48:01 AM
|
|
On Feb 18, 2:47 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> spinoza1111wrote:
> > On Feb 18, 3:49 am, Seebs <usenet-nos...@seebs.net> wrote:
> >> On 2010-02-17, Malcolm McLean <malcolm.mcle...@btinternet.com> wrote:
>
> >>> On Feb 17, 7:16 pm, Seebs <usenet-nos...@seebs.net> wrote:
> >>>> Unless you're expecting to spend most of your programming career working
> >>>> for people who have major obsessive problems with using technology in ways
> >>>> that generally work, because of personal grudges never adequately explained,
> >>>> I would suggest that perhaps what Nilges accepts or doesn't accept should
> >>>> not be a component of any technical decision, ever.
> >>> That's the ad hominem fallacy. It's not a pretentious term for
> >>> "insult" but a common falacy, which is to suppose that an argument is
> >>> wrong because of the person who is making it.
> >> No, it's not an ad hominem fallacy. It's the very well supported view
> >> that what Nilges accepts or doesn't accept should not be a component of any
> >> technical decision. I'm not saying that an argument is wrong because of
> >> the person who is making it. I'm saying that a conclusion should be ignored
> >> (neither accepted nor rejected) based on the person who has offered it.
>
> > ....and that's an ad hominem fallacy.
>
> No, it isn't. Learn to read.
>
> > If you'd troubled to take a class
>
> > in informal logic, you'd have discovered that there is a VALID
> > argument based on applicable authority, but none based on anti-
> > authority.
>
> This isn't about authority. It's about cluelessness.
>
> > Nobody except a Fascist or a child refuses to believe
>
> > something because of his hatred of a person.
>
> That's not what he's suggesting. Learn to read. What he's saying is of
> the form "X is a bozo. X says Y. Y might be right or it might be wrong.
> Either way, X's opinion is irrelevant because X is a bozo." This is very
> different from the "ad hominem" fallacy, "X says Y. X is a bozo.
> Therefore Y is wrong". Seebs knows you're a bozo, but he's not daft
> enough to assume a priori that everything you say is necessarily
> incorrect. And this is very wise: despite the apparent difficulty of the
> task, I can call several counter-examples to mind.
Knowledge is justified true belief pace Gettier, but Seebs is not a
qualified programmer (incorrect simple examples posted, unable to code
without string.h, ignorance of the heap, no education in comp sci,
unable to properly moderate clcm). Therefore he merely "believes" I'm
a bozo because he can't take constructive criticism.
And he is daft enough, in fact, to reject everything I say no matter
what, so far. Whereas you are not.
>
> <nonsense snipped>
>
> --
> Richard Heathfield <http://www.cpax.org.uk>
> Email: -http://www. +rjh@
> "Usenet is a strange place" - dmr 29 July 1999
> Sig line vacant - apply within
|
|
0
|
|
|
|
Reply
|
spinoza1111 (3250)
|
2/18/2010 11:48:11 AM
|
|
On Feb 18, 3:27 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> Here is a simple technique that can possibly cut down on the number of
> search operations one needs to perform. It does this by using the length of
> the string, (e.g., fits in well with counted string), and initializes a hash
> table of the stores characters of the comparand string from right-to-left.
> It performs a search from left-to-right, however it compares source against
> comparand in right-to-left. If a search character does not exist within the
> comparand, the search index can be incremented by the length of the
> comparand. For instance:
> _______________________________________________________________
> src = ChrisHelloJohn
> cmp = Hello
> hash['o'] = 5
> hash['l'] = 4
> hash['e'] = 2
> hash['H'] = 1
> _______________________________________________________________
>
> Okay, we want to find `cmp' in `src. So you put the head index on `s' and
> you put the tail on 'C'. You can skip 5 characters if `hash[src[head]]' is
> zero. There are more elaborate ways to skip characters and I think you can
> use the right-to-left hash list to determine other things as well. Here is
> some a very crude initial code sketch of the algorithm I am thinking about:
>
> http://clc.pastebin.com/f7df86b63
>
> Here is the output I get for search for "Hello" in the following string:
>
> "ldsefkdsjehuoeewhs;dfjpewfjHello9438jfgohj"
> _______________________________________________________________
> right-to-left hash hit 'o'!
> right-to-left hash hit 'l'!
> right-to-left hash miss 'l' chars
> right-to-left hash hit 'e'!
> right-to-left hash hit 'H'!
> compare(1) 'f' to 'o'
> skip 5 chars!
> compare(2) 'e' to 'o'
> compare(3) 'h' to 'o'
> skip 5 chars!
> compare(4) 'w' to 'o'
> skip 5 chars!
> compare(5) 'f' to 'o'
> skip 5 chars!
> compare(6) 'f' to 'o'
> skip 5 chars!
> compare(7) 'l' to 'o'
> compare(8) 'o' to 'o'
> compare(9) 'l' to 'l'
> compare(10) 'l' to 'l'
> compare(11) 'e' to 'e'
> compare(12) 'H' to 'H'
> ((12) compares + (5) hashes) out of (42) characters target at (27)
> Hello9438jfgohj
> _______________________________________________________________
>
> The algorithm was able to skip some characters and only perform 12
> comparisons in order to find the comparand at index 27 in a string 42
> characters wide. I think there are many more optimizations that could be
> made...
Excellent work. Thanks, will examine when I have more time.
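For anyone following along, here is a minimal sketch of the skipping idea quoted above. It is not the pastebin code, which is not reproduced in this thread. It assumes a simple presence table instead of the position-valued hash in the post, and the names `skip_search` and `present` are made up for illustration.

```c
#include <stddef.h>
#include <limits.h>

/* Hypothetical helper: returns the index of the first occurrence of cmp
 * in src, or (size_t)-1 if there is none. Both lengths are passed in,
 * in the spirit of counted strings. */
size_t skip_search(const char *src, size_t src_len,
                   const char *cmp, size_t cmp_len)
{
    unsigned char present[UCHAR_MAX + 1] = {0};
    size_t head, i;

    if (cmp_len == 0 || cmp_len > src_len)
        return (size_t)-1;

    /* Record which characters occur anywhere in the comparand. */
    for (i = 0; i < cmp_len; i++)
        present[(unsigned char)cmp[i]] = 1;

    head = cmp_len - 1;              /* index of the window's last character */
    while (head < src_len) {
        if (!present[(unsigned char)src[head]]) {
            /* No window that covers src[head] can match: skip them all. */
            head += cmp_len;
            continue;
        }
        /* Compare the window against the comparand right-to-left. */
        i = 0;
        while (i < cmp_len && src[head - i] == cmp[cmp_len - 1 - i])
            i++;
        if (i == cmp_len)
            return head - (cmp_len - 1);
        head++;                      /* character is in cmp: shift by one */
    }
    return (size_t)-1;
}
```

On the 42-character test string from the post, this sketch finds "Hello" at index 27, which agrees with the quoted output. It shifts by only one position whenever the head character occurs in the comparand, so it leaves the "more elaborate ways to skip" untouched.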
|
spinoza1111 (3250)
|
2/18/2010 11:51:40 AM
|
|
On Feb 18, 4:07 pm, Nick Keighley <nick_keighley_nos...@hotmail.com>
wrote:
> On 18 Feb, 06:02, spinoza1111 <spinoza1...@yahoo.com> wrote:
>
> > On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
> > > spinoza1111 wrote:
> > > > On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
> > > > > Software development practices have improved a lot as applications
> > > > > have become more complex and requirements better defined.
>
> I'm not sure this is true !
>
> > > > How can "defining requirements" improve "software development
> > > > practices"?
>
> acquiring good requirements is part of good software development
> practice.
>
> > > > "Defining requirements" is being able to express the
> > > > requirements in English, other human languages, visuals, and if
> > > > necessary mathematical formalisms:
>
> yes. Though various graphical representaions are popular as well.
>
> > > > if you define them in code, you're
> > > > not, by definition, "defining requirements": you are coding.
>
> well yes, but he didn't say define the requirements in code
>
> > > You just answered your own question.
>
> > But [...] you said that software improves if we define
> > requirements.
>
> no he didn't he said "Software development practices [are] improved
> [if] requirements [are] better defined." (the actual quote is at the
> beginning of the post).
>
> > Please explain how this work, and note that industrial
> > experience is that the success of a software development project is
> > completely independent of requirements quality.
>
> really, could you expand on that. I was under the impression that many
> projects had failed due to badly defined requirements.
Requirements are by definition inadequate for if they were adequate,
we could build a compiler to compile them. The demand for
"requirements definition" presupposes that the "analysts", thought to
be of a higher social class, know by virtue of their social class more
than the "mere" programmers.
>
> > Sometimes, beautiful
> > requirements produce beautiful code, sometimes, beautiful requirements
> > beautifully miss the point.
>
> then they aren't good requirements! If you ask a guy to build a garage
> then complain he missed the swimming pool out that means you gave him
> a poor requirement.
>
> > Sometimes, Extreme programming projects
> > start out with no requirements and beauty results.
>
> in a sense you're deriving the requirement by iterative refinement.
> Have you read "Principles of Software Engineering Management" by Tom
> Gilb. He was extreme programming before the (slightly silly) name was
> invented.
Yes, I've read Gilb...a member of the Scandinavian school of data
processing theory, which took the needs of the workers into account.
|
spinoza1111 (3250)
|
2/18/2010 11:54:52 AM
|
|
On Feb 18, 7:47 pm, Walter Banks <wal...@bytecraft.com> wrote:
> spinoza1111 wrote:
> > On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
> > > spinoza1111 wrote:
> > > > On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
>
> > > > > Software development practices have improved a lot as applications
> > > > > have become more complex and requirements better defined.
>
> > > > How can "defining requirements" improve "software development
> > > > practices"? "Defining requirements" is being able to express the
> > > > requirements in English, other human languages, visuals, and if
> > > > necessary mathematical formalisms: if you define them in code, you're
> > > > not, by definition, "defining requirements": you are coding.
>
> > > You just answered your own question.
>
> > But dear boy, you said that software improves if we define
> > requirements.
>
> Correct, you finally got it.
>
> > Please explain how this work, and note that industrial
> > experience is that the success of a software development project is
> > completely independent of requirements quality.
>
> Are you asking or telling?
>
> > Sometimes, beautiful
> > requirements produce beautiful code, sometimes, beautiful requirements
> > beautifully miss the point. Sometimes,
>
> But does the code do what is needed?
>
> > Extreme programming projects
> > start out with no requirements and beauty results.
>
> That makes no sense. No matter what comes out of a program with no
> requirements must be correct, novel approach to guaranteed success
No, the programmers themselves are human beings and able, in some
cases better than users, to express the requirements. The myth of
requirements started as a conspiracy between incompetent and
irresponsible programmers who wanted to be indemnified for their
ignorance and stupid mistakes, and users who wanted credit and
"control".
>
> You really need to work on comprehension.
>
> w..
>
> --- news://freenews.netfront.net/ - complaints: n...@netfront.net ---
|
spinoza1111 (3250)
|
2/18/2010 12:21:24 PM
|
|
On 18 Feb, 11:54, spinoza1111 <spinoza1...@yahoo.com> wrote:
> On Feb 18, 4:07 pm, Nick Keighley <nick_keighley_nos...@hotmail.com>
> > On 18 Feb, 06:02, spinoza1111 <spinoza1...@yahoo.com> wrote:
> > > On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
> > > > spinoza1111 wrote:
> > > > > On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
> > > > > > Software development practices have improved a lot as applications
> > > > > > have become more complex and requirements better defined.
[...]
> > > > > How can "defining requirements" improve "software development
> > > > > practices"?
>
> > acquiring good requirements is part of good software development
> > practice.
<snip>
> > > Please explain how this work, and note that industrial
> > > experience is that the success of a software development project is
> > > completely independent of requirements quality.
>
> > really, could you expand on that. I was under the impression that many
> > projects had failed due to badly defined requirements.
>
> Requirements are by definition inadequate for if they were adequate,
> we could build a compiler to compile them.
comp.lang.lisp are discussing this at the moment. They have a slightly
different view in that they program in the Highest Level Possible
Language (HLPL) hence if someone is going to write a specification in a
formal language they might as well just write it in Lisp, and then
execute it.
The point was raised that a requirement can be complete but not
specify a program.
- compute 1 million digits of pi
- decrypt an encrypted message given only the public key
> The demand for
> "requirements definition" presupposes that the "analysts", thought to
> be of a higher social class, know by virtue of their social class more
> than the "mere" programmers.
no, I just think the guy who is going to use the thing ought to give
us a vague idea of what he wants and what he expects to do with it. If
you want to call the people who extract this information "analysts",
so be it.
<snip>
> > > Sometimes, Extreme programming projects
> > > start out with no requirements and beauty results.
>
> > in a sense you're deriving the requirement by iterative refinement.
> > Have you read "Principles of Software Engineering Management" by Tom
> > Gilb. He was extreme programming before the (slightly silly) name was
> > invented.
>
> Yes, I've read Gilb...a member of the Scandinavian school of data
> processing theory, which took the needs of the workers into account.
and despite this, writes a rattling good book!
He's firmly of the belief that the requirements are never complete. In
part because the program changes its environment. The existence of the
program causes working practices to change and hence requirements to
alter. The solution is "evolutionary delivery". A series of deliveries
that asymptotically approach the (hopefully!) settling requirement.
|
nick_keighley_nospam (4575)
|
2/18/2010 1:58:06 PM
|
|
spinoza1111 wrote:
> I don't know where you . . . get that claim you said I made:
> that I wish to destroy this newsgroup.
I have never said that.
Is that your wish?
w..
|
walter20 (874)
|
2/18/2010 2:08:39 PM
|
|
On 18 Feb, 13:58, Nick Keighley <nick_keighley_nos...@hotmail.com>
wrote:
> On 18 Feb, 11:54, spinoza1111 <spinoza1...@yahoo.com> wrote:
<snip>
> > Requirements are by definition inadequate for if they were adequate,
> > we could build a compiler to compile them.
>
> comp.lang.lisp are discussing this at the moment. They have a slightly
> different view in that they program in the Highest Level Possible
> Language (HLPL) hence if someone is going to write a specification in a
> formal language they might as well just write it in Lisp, and then
> execute it.
the thread is called "a defense of ad hoc software development"
***
The difference between specifications and programs is a
difference in degree, not a difference in kind. Once we realize this,
it seems strange to require that one write specifications for a
program before beginning to implement it. If the program has to be
written in a low-level language, then it would be reasonable to
require that it be described in high-level terms first. But as the
programming language becomes more abstract, the need for
specifications begins to evaporate. Or rather, the implementation and
the specifications can become the same thing.
If the high-level language is going to be re-implemented in a
lower-level language, it starts to look even more like
specifications.
[...] in other words, [...] the specifications for C programs could be
written in Lisp.
***
<snip>
|
nick_keighley_nospam (4575)
|
2/18/2010 2:08:44 PM
|
|
spinoza1111 wrote:
> On Feb 18, 4:07 pm, Nick Keighley <nick_keighley_nos...@hotmail.com>
> wrote:
> > On 18 Feb, 06:02,spinoza1111<spinoza1...@yahoo.com> wrote:
> >
> > > On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
> > > > spinoza1111wrote:
> > > > > On Feb 18, 6:08 am, Walter Banks <wal...@bytecraft.com> wrote:
> > > > > > Software development practices have improved a lot as applications
> > > > > > have become more complex and requirements better defined.
> >
> > I'm not sure this is true !
> >
> > > > > How can "defining requirements" improve "software development
> > > > > practices"?
> >
> > acquiring good requirements is part of good software development
> > practice.
> >
> > > > > "Defining requirements" is being able to express the
> > > > > requirements in English, other human languages, visuals, and if
> > > > > necessary mathematical formalisms:
> >
> > yes. Though various graphical representaions are popular as well.
> >
> > > > > if you define them in code, you're
> > > > > not, by definition, "defining requirements": you are coding.
> >
> > well yes, but he didn't say define the requirements in code
> >
> > > > You just answered your own question.
> >
> > > But [...] you said that software improves if we define
> > > requirements.
> >
> > no he didn't he said "Software development practices [are] improved
> > [if] requirements [are] better defined." (the actual quote is at the
> > beginning of the post).
> >
> > > Please explain how this work, and note that industrial
> > > experience is that the success of a software development project is
> > > completely independent of requirements quality.
> >
> > really, could you expand on that. I was under the impression that many
> > projects had failed due to badly defined requirements.
>
> Requirements are by definition inadequate for if they were adequate,
> we could build a compiler to compile them.
Defining how an application is supposed to behave is very different
from an application design and implementation. Requirements provide
clear input into software design and application testing.
w..
|
walter20 (874)
|
2/18/2010 2:27:56 PM
|
|
On Feb 18, 2:32 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Chris M. Thomasson" <n...@spam.invalid> wrote in message
> news:jZ5fn.556$Ee1.429@newsfe12.iad...
>
> > Here is a simple technique that can possibly cut down on the number of
> > search operations one needs to perform. It does this by using the length
> > of the string, (e.g., fits in well with counted string), and initializes a
> > hash table of the stores characters of the comparand string from
> > right-to-left. It performs a search from left-to-right, however it
> > compares source against comparand in right-to-left. If a search character
> > does not exist within the comparand, the search index can be incremented
> > by the length of the comparand. For instance:
> > _______________________________________________________________
> > src = ChrisHelloJohn
> > cmp = Hello
> > hash['o'] = 5
> > hash['l'] = 4
> > hash['e'] = 2
> > hash['H'] = 1
> > _______________________________________________________________
>
> > Okay, we want to find `cmp' in `src. So you put the head index on `s' and
> > you put the tail on 'C'. You can skip 5 characters if `hash[src[head]]' is
> > zero. There are more elaborate ways to skip characters and I think you can
> > use the right-to-left hash list to determine other things as well. Here is
> > some a very crude initial code sketch of the algorithm I am thinking
> > about:
>
> > http://clc.pastebin.com/f7df86b63
>
> [...]
>
> I know there should be some bugs in there, so, don't use the code for
> anything. I will post fixes as I find them. If you can fix a bug, that would
> be very nice as well! Thanks in advance, I really would appreciate it.
>
> :^)
Would you write me a check for $2.56 for each one ;)
|
jadill33 (201)
|
2/18/2010 2:53:30 PM
|
|
On Wed, 17 Feb 2010 23:27:18 -0800, "Chris M. Thomasson"
<no@spam.invalid> wrote:
>Here is a simple technique that can possibly cut down on the number of
>search operations one needs to perform. It does this by using the length of
>the string, (e.g., fits in well with counted string), and initializes a hash
>table of the stores characters of the comparand string from right-to-left.
>It performs a search from left-to-right, however it compares source against
>comparand in right-to-left. If a search character does not exist within the
>comparand, the search index can be incremented by the length of the
>comparand. For instance:
>_______________________________________________________________
>src = ChrisHelloJohn
>cmp = Hello
>hash['o'] = 5
>hash['l'] = 4
>hash['e'] = 2
>hash['H'] = 1
>_______________________________________________________________
>
>
>Okay, we want to find `cmp' in `src. So you put the head index on `s' and
>you put the tail on 'C'. You can skip 5 characters if `hash[src[head]]' is
>zero. There are more elaborate ways to skip characters and I think you can
>use the right-to-left hash list to determine other things as well. Here is
>some a very crude initial code sketch of the algorithm I am thinking about:
>
>
>http://clc.pastebin.com/f7df86b63
You are well on your way to reinventing Boyer-Moore.
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.
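To make the Boyer-Moore remark concrete: Boyer-Moore proper uses two shift tables, while Horspool's simplification keeps only a per-character shift table, which is close to the single "hash" described in the quoted post. The following is a minimal sketch of the Horspool variant, not of the pastebin code; the names `horspool_search` and `shift` are illustrative assumptions.

```c
#include <stddef.h>
#include <limits.h>

/* Hypothetical helper: Horspool-style search. Returns the index of the
 * first occurrence of cmp in src, or (size_t)-1 if there is none. */
size_t horspool_search(const char *src, size_t src_len,
                       const char *cmp, size_t cmp_len)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    if (cmp_len == 0 || cmp_len > src_len)
        return (size_t)-1;

    /* Characters absent from cmp allow a shift of the full length. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = cmp_len;
    /* For every character except the last, record its distance from the
     * end of cmp; later (rightmost) occurrences overwrite earlier ones. */
    for (i = 0; i + 1 < cmp_len; i++)
        shift[(unsigned char)cmp[i]] = cmp_len - 1 - i;

    pos = 0;
    while (pos <= src_len - cmp_len) {
        /* Compare the current window right-to-left. */
        i = cmp_len;
        while (i > 0 && src[pos + i - 1] == cmp[i - 1])
            i--;
        if (i == 0)
            return pos;
        /* Shift according to the character under the window's last slot. */
        pos += shift[(unsigned char)src[pos + cmp_len - 1]];
    }
    return (size_t)-1;
}
```

Compared with skipping only when a character is absent from the comparand, the shift table also permits jumps of more than one position when the character does occur, for example 3 positions for an 'e' under the last slot when searching for "Hello".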
|
cri (1432)
|
2/18/2010 3:09:04 PM
|
|
"Chris M. Thomasson" <no@spam.invalid> writes:
> "spinoza1111" <spinoza1111@yahoo.com> wrote in message
> news:f12dcf90-987d-4b72-ac28-363761b398d2@z10g2000prh.googlegroups.com...
> On Feb 16, 7:50 pm, "bartc" <ba...@freeuk.com> wrote:
>> > spinoza1111wrote:
>> > > On Feb 16, 3:13 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
> [...]
>
>> > >> will most likely be using processor specific instructions that
>> > >> provide a level of efficiency that cannot be reached with 100% pure
>> > >> portable C code.
>> >
>> > > How is that possible? The compiler of the library code will emit
>> > > "processor specific" instructions, to be sure, but it will do the same
>> > > for me, or any man. And if the library code forces out assembler code,
>> > > then it will only work on one processor, or at best small n processor.
>> >
>> > I think standard library routines can be written in a language other
>> > than C,
>> > or some mix. For example, hand-written assembly.
>
>> Correct. Wonder how many library routines are written in assembler.
>> Don't know.
>
> Here is how `strstr()' is implmented on one of my compilers:
>
> http://clc.pastebin.com/m15e4ab43
>
>
Would you trust any programmer who comments like this?
,----
| push edi ; Preserve edi, ebx and esi
| push ebx
| push esi
`----
--
"Avoid hyperbole at all costs, its the most destructive argument on
the planet" - Mark McIntyre in comp.lang.c
|
rgrdev_ (1087)
|
2/18/2010 6:14:01 PM
|
|
"Richard" <rgrdev_@gmail.com> wrote in message
news:akc057-5tu.ln1@news.eternal-september.org...
> "Chris M. Thomasson" <no@spam.invalid> writes:
>> Here is how `strstr()' is implmented on one of my compilers:
>>
>> http://clc.pastebin.com/m15e4ab43
>>
>>
>
> Would trust any programmer who comments like this?
>
> ,----
> | push edi ; Preserve edi, ebx and esi
> | push ebx
> | push esi
> `----
I found the following quite scary:
mov ecx,[esp + 8] ; str2 (the string to be searched for)
push edi ; Preserve edi, ebx and esi
push ebx
push esi
....
mov edi,[esp + 10h] ; str1 (the string to be searched)
....
mov ecx,[esp + 14h] ; str2
First, that it uses hardcoded offsets onto the stack. Secondly, that after
those pushes it just matter-of-factly uses a different set of offsets to
compensate! And perhaps thirdly that it uses hex for the offsets...
Which leads me to think it might be machine generated, with the comments
added later?
--
Bartc
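The changed offsets can be checked with a little arithmetic. The sketch below assumes 32-bit x86, where each `push` moves `esp` down by 4 bytes, and assumes that str1 sat at `[esp + 4]` on entry (inferred from the quoted listing, which only shows str2 at `[esp + 8]`). The function name `offset_after_pushes` is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: the offset from esp of a stack slot after some
 * number of further pushes. Each push lowers esp by word_size bytes, so
 * a slot that was already on the stack appears that much further away. */
size_t offset_after_pushes(size_t offset_at_entry, size_t pushes,
                           size_t word_size)
{
    return offset_at_entry + pushes * word_size;
}
```

With three pushes of 4 bytes each, an entry offset of 8 becomes 8 + 12 = 20 = 14h (str2), and an entry offset of 4 becomes 16 = 10h (str1), which matches the offsets in the listing.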
|
bartc (783)
|
2/18/2010 10:08:52 PM
|
|
"spinoza1111" <spinoza1111@yahoo.com> wrote in message
news:90c197ae-0525-47e6-b44a-630d7ad025c5@z10g2000prh.googlegroups.com...
On Feb 18, 5:51 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > "Richard Heathfield" <r...@see.sig.invalid> wrote in message
> >
> > news:FrCdnTZJc8ajweHWnZ2dnUVZ8r6dnZ2d@bt.com...
> >
> > [...]
> >
> > > Yes, strcat and strlen are O(N) - so, where it matters, you remember
> > > the
> > > string length, having found it out the first time.
> >
> > Bingo! :^)
>
> Yes, strcat and strlen are order N
> So be sure to squirrel away their values, like the Hen
> Secretes seeds in her cache of seminal treasures
> Great programmers, says Dickie, take these measures.
I personally think it would be wise to stash length values away if you need
to use them again, and again, and again.... Okay, you get the point.
;^)
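A minimal sketch of what stashing the length can look like, assuming a fixed-capacity buffer. The type `counted_buf` and the function `counted_append` are hypothetical names and do not come from any library mentioned in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer: the length is remembered, so appending
 * never has to rescan the existing contents the way strcat does. */
struct counted_buf {
    char  *data;  /* storage, kept NUL-terminated */
    size_t len;   /* current length, excluding the terminator */
    size_t cap;   /* total bytes available in data */
};

/* Appends s_len bytes of s. Returns 0 on success, or -1 if the bytes
 * plus the terminator would not fit (the buffer is left unchanged). */
int counted_append(struct counted_buf *b, const char *s, size_t s_len)
{
    if (b->len >= b->cap || s_len > b->cap - b->len - 1)
        return -1;
    memcpy(b->data + b->len, s, s_len);  /* write directly at the known end */
    b->len += s_len;
    b->data[b->len] = '\0';
    return 0;
}
```

Each append costs time proportional to the bytes added, whereas repeated `strcat` calls walk the whole destination every time, so building a long string that way becomes quadratic.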
|
no6 (2791)
|
2/19/2010 1:31:04 AM
|
|
"ImpalerCore" <jadill33@gmail.com> wrote in message
news:493b1e93-a4ea-4f17-9649-4c4b10a4b602@f8g2000yqn.googlegroups.com...
On Feb 18, 2:32 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
[...]
> > I know there should be some bugs in there, so, don't use the code for
> > anything. I will post fixes as I find them. If you can fix a bug, that
> > would
> > be very nice as well! Thanks in advance, I really would appreciate it.
> >
> > :^)
> > Would you write me a check for $2.56 for each one ;)
Na. However, I would praise you for finding a bug Sir! Well, that's not that
much of an incentive in any way, shape or form!!!! Well, thank you anyway!
;^D
If you find one, could you please educate me? IMHO, a dead bug is a good
bug!!!!
:^D
|
no6 (2791)
|
2/19/2010 1:50:48 AM
|
|
"Richard Harter" <cri@tiac.net> wrote in message
news:4b7d57d0.250354671@text.giganews.com...
> On Wed, 17 Feb 2010 23:27:18 -0800, "Chris M. Thomasson"
> <no@spam.invalid> wrote:
>
>>Here is a simple technique that can possibly cut down on the number of
>>search operations one needs to perform.
[...]
>>http://clc.pastebin.com/f7df86b63
> You are well on your way to reinventing Boyer-Moore.
Humm. I don't compute two tables. Anyway, yes you are right! My solution ==
"kind of?" crappy at best! Skipping `cmp_size' chars is good... This just
might be a hard core retarded version of great algorithms?
:^D
|
no6 (2791)
|
2/19/2010 1:53:46 AM
|
|
In article <08119931-a822-47ae-aeab-609cb72aea7a@p13g2000pre.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
> On Feb 18, 2:02 pm, spinoza1111 <spinoza1...@yahoo.com> wrote:
> > On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
[ snip ]
> An interesting factoid about this discussion is that no American or
> British posters, save one, have solved the problem I set (write
> replace without using string.H).
Oh, it's string.H we're to avoid? is it okay to use string.h?
Yeah, yeah, nitpick ....
Well, for the record, I'm an American mostly-lurker, and I've
been tinkering with various solutions over the past few days, not
looking carefully at others' solutions because that would seem
to spoil the fun. I've just posted some of them, in a reply to
Seebs's post with subject line "Efficency [sic] and the standard
library".
For what it's worth, all my solutions use recursion -- that seemed
to me to be a natural way to solve this problem. <shrug>
> Willem appears to be Dutch, and io_x appears to be Italian. I am an
> American expatriate.
>
> I think this is linked to the "Tea Party" movement in America, which
> is ignorance rampant. It's people who absolutely rely on Social
> Security and Medicare calling for an end to Social Security and
> Medicare. It's also linked to the antics of the "British National
> Party".
(Quoted to preserve context. I'll resist the temptation to comment.)
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
blmblm (1187)
|
2/19/2010 2:49:34 PM
|
|
On 19 Feb 2010 14:49:34 GMT, blmblm@myrealbox.com
<blmblm@myrealbox.com> wrote:
>In article <08119931-a822-47ae-aeab-609cb72aea7a@p13g2000pre.googlegroups.com>,
>spinoza1111 <spinoza1111@yahoo.com> wrote:
>> On Feb 18, 2:02 pm, spinoza1111 <spinoza1...@yahoo.com> wrote:
>> > On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
>
>[ snip ]
>
>> An interesting factoid about this discussion is that no American or
>> British posters, save one, have solved the problem I set (write
>> replace without using string.H).
>
>Oh, it's string.H we're to avoid? is it okay to use string.h?
>Yeah, yeah, nitpick ....
My understanding is that the requirement is there to make life
simpler for people not familiar with the C standard library,
particularly the functions prototyped in string.h, and for people
who have trouble distinguishing between upper and lower case.
>
>Well, for the record, I'm an American mostly-lurker, and I've
>been tinkering with various solutions over the past few days, not
>looking carefully at others' solutions because that would seem
>to spoil the fun. I've just posted some of them, in a reply to
>Seebs's post with subject line "Efficency [sic] and the standard
>library".
>
>For what it's worth, all my solutions use recursion -- that seemed
>to me to be a natural way to solve this problem. <shrug>
I also happen to be American and I also posted one; however I put
it in a separate thread. See the thread entitled "A substring
replacement implementation". You might want to include it in
your test suite.
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.
|
cri (1432)
|
2/19/2010 4:26:56 PM
|
|
On 2010-02-19, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
>> An interesting factoid about this discussion is that no American or
>> British posters, save one, have solved the problem I set (write
>> replace without using string.H).
> Oh, it's string.H we're to avoid? is it okay to use string.h?
> Yeah, yeah, nitpick ....
It's also a silly challenge. If I were going to do that, my first pass
would just be to put "my" in front of the str* functions I use, and
define them.
But why would I do that? How about we set a new challenge, which is to
do matrix multiplication without using the '*' operator. Or, better yet,
how about we iterate through an array without using a for loop, and write
our own routines to interpolate numbers into textual strings, so we don't
have to rely on printf()?
Without a *reason* to not use the string library, this is nothing more
than a challenge to do something stupid.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
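The "put my in front" approach described above might look roughly like this. It is only a sketch: `my_strlen` and `my_strstr` are hypothetical names, and the search is the plain quadratic one rather than anything a real library would necessarily use.

```c
#include <stddef.h>

/* Hypothetical stand-in for strlen: counts bytes up to the terminator. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Hypothetical stand-in for strstr: returns a pointer to the first
 * occurrence of needle in haystack, or NULL if there is none. An empty
 * needle matches at the start, as with the standard function. */
char *my_strstr(const char *haystack, const char *needle)
{
    size_t i;

    if (*needle == '\0')
        return (char *)haystack;
    for (; *haystack != '\0'; haystack++) {
        /* Stops at the first mismatch, so it never reads past either
         * string's terminator. */
        for (i = 0; needle[i] != '\0' && haystack[i] == needle[i]; i++)
            ;
        if (needle[i] == '\0')
            return (char *)haystack;
    }
    return NULL;
}
```

With these two in place, code written against `strlen` and `strstr` only needs the names changed, which is why the no-<string.h> rule adds little on its own.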
|
usenet-nospam (2216)
|
2/19/2010 4:55:47 PM
|
|
Seebs <usenet-nospam@seebs.net> writes:
> But why would I do that? How about we set a new challenge, which is to
> do matrix multiplication without using the '*' operator. Or, better yet,
> how about we iterate through an array without using a for loop, and write
> our own routines to interpolate numbers into textual strings, so we don't
> have to rely on printf()?
Funny, I see posters asking for code to do dumb things like this
regularly.
--
Ben Pfaff
http://benpfaff.org
|
blp (3953)
|
2/19/2010 5:24:10 PM
|
|
Malcolm McLean <malcolm.mclean5@btinternet.com> writes:
> On Feb 17, 7:16 pm, Seebs <usenet-nos...@seebs.net> wrote:
>> On 2010-02-17, fedora <no_m...@invalid.invalid> wrote:
>>
>> > Thanks Ben! I didn't use stdlib functions because i wanted to how
>> > easy/difficult it would be to write my own. also spinoza didn't accept
>> > program that called function in string.h.
>>
>> Unless you're expecting to spend most of your programming career working
>> for people who have major obsessive problems with using technology in ways
>> that generally work, because of personal grudges never adequately explained,
>> I would suggest that perhaps what Nilges accepts or doesn't accept should
>> not be a component of any technical decision, ever.
>>
> That's the ad hominem fallacy. It's not a pretentious term for
> "insult" but a common falacy, which is to suppose that an argument is
> wrong because of the person who is making it.
>
> In fact there are good reasons for deprecating string.h. chars
> effectively have to be octets, whilst often programs need to accept
> non-Latin strings. Then the functions are all very old, with certain
> weaknesses (no protection from buffer overun in strcpy, an O(N)
> performance for strcat and strlen, an inconvenient interface for
> strcat, const inconsistencies with strchr, very poor functionality
> with strfind and const inconsiencies here too, very serious buffer
> problems with sprintf, an overly difficult interface and buffer
> problems with sscanf, thread problems with strtok and a non-intuitive
> interface.
That, though, is an argument for implementing a different string library
- of which there are dozens. /Then/ you use that to implement your find
and replace.
What you don't do is to re-implement the standard library with all the
API drawbacks.
Once you have a basic dynamic string library, it's pretty trivial -
here's a snippet from my source bucket.
dstring *Do_Fn_Replace(const char *substrate,const char *pattern,const char *replacement) {
    dstring *res = dstr_create(D_EXPR);
    char *p;
    // no activity on empty substrate or pattern
    if(*substrate == '\0' || *pattern == '\0') {
        dstrset(res,substrate);
        return res;
    }
    dstrset(res,"");
    // copy the text before each match, then the replacement
    while((p=strstr(substrate,pattern))) {
        dstrncat(res,substrate,p-substrate);
        dstrcat(res,replacement);
        substrate = p + strlen(pattern);
    }
    // copy whatever follows the last match
    dstrcat(res,substrate);
    return res;
}
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk
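Since the dstring library is not shown in this thread, here is a self-contained sketch of the same find-and-replace loop using only `malloc` and <string.h>. The name `replace_all` and the two-pass sizing are assumptions made for illustration, not the poster's implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of substrate with
 * every non-overlapping occurrence of pattern replaced by replacement.
 * An empty pattern yields a plain copy. Returns NULL if allocation
 * fails. The caller frees the result. */
char *replace_all(const char *substrate, const char *pattern,
                  const char *replacement)
{
    size_t sub_len = strlen(substrate);
    size_t pat_len = strlen(pattern);
    size_t rep_len = strlen(replacement);
    size_t count = 0;
    const char *s, *p;
    char *result, *out;

    if (pat_len == 0) {
        result = malloc(sub_len + 1);
        if (result != NULL)
            memcpy(result, substrate, sub_len + 1);
        return result;
    }

    /* First pass: count matches so the result can be sized exactly. */
    for (s = substrate; (p = strstr(s, pattern)) != NULL; s = p + pat_len)
        count++;

    result = malloc(sub_len - count * pat_len + count * rep_len + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    out = result;
    for (s = substrate; (p = strstr(s, pattern)) != NULL; s = p + pat_len) {
        memcpy(out, s, (size_t)(p - s));
        out += p - s;
        memcpy(out, replacement, rep_len);
        out += rep_len;
    }
    strcpy(out, s);  /* tail after the last match, including the terminator */
    return result;
}
```

Scanning twice avoids growing the buffer as it goes, which is the job the dynamic string type does in the snippet above. The lengths are each computed once up front rather than inside the loop.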
|
3-nospam (285)
|
2/19/2010 6:12:16 PM
|
|
In article <4b7eb795.340407953@text.giganews.com>,
Richard Harter <cri@tiac.net> wrote:
> On 19 Feb 2010 14:49:34 GMT, blmblm@myrealbox.com
> <blmblm@myrealbox.com> wrote:
>
> >In article <08119931-a822-47ae-aeab-609cb72aea7a@p13g2000pre.googlegroups.com>,
> >spinoza1111 <spinoza1111@yahoo.com> wrote:
> >> On Feb 18, 2:02 pm, spinoza1111 <spinoza1...@yahoo.com> wrote:
> >> > On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
> >
> >[ snip ]
> >
> >> An interesting factoid about this discussion is that no American or
> >> British posters, save one, have solved the problem I set (write
> >> replace without using string.H).
> >
> >Oh, it's string.H we're to avoid? is it okay to use string.h?
> >Yeah, yeah, nitpick ....
>
> My understanding is that the requirement is there to make life
> simpler for people not family with the C standard library,
> particularly the functions prototyped in string.h, and for people
> who have trouble distinguishing between upper and lower case.
>
> >
> >Well, for the record, I'm an American mostly-lurker, and I've
> >been tinkering with various solutions over the past few days, not
> >looking carefully at others' solutions because that would seem
> >to spoil the fun. I've just posted some of them, in a reply to
> >Seebs's post with subject line "Efficency [sic] and the standard
> >library".
> >
> >For what it's worth, all my solutions use recursion -- that seemed
> >to me to be a natural way to solve this problem. <shrug>
>
> I also happen to be American and I also posted one; however I put
> it in a separate thread. See the thread entitled "A substring
> replacement implementation". You might want to include it in
> your test suite.
Yes, I saw your code, though I really only looked at the main
program and the comments -- both of which I rather like better
than mine.
I think I was rather hoping that the others who had posted
solutions might use my code to benchmark, but really, how hard
would it be for *me* to do that .... Maybe I will. "Stay tuned"?
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
blmblm (1187)
|
2/19/2010 6:12:55 PM
|
|
On 2010-02-19, Nick <3-nospam@temporary-address.org.uk> wrote:
> Once you have a basic dynamic string library, it's pretty trivial -
> here's a snippet from my source bucket.
And if you have a sufficiently complete string library, it's COMPLETELY
trivial:
char *do_replace(char *needle, char *haystack, char *replacement) {
    return mylib_replace_string(needle, haystack, replacement);
}
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
|
|
usenet-nospam (2216)
|
2/19/2010 6:27:38 PM
|
|
Seebs <usenet-nospam@seebs.net> writes:
> On 2010-02-19, Nick <3-nospam@temporary-address.org.uk> wrote:
>> Once you have a basic dynamic string library, it's pretty trivial -
>> here's a snippet from my source bucket.
>
> And if you have a sufficiently complete string library, it's COMPLETELY
> trivial:
>
> char *do_replace(char *needle, char *haystack, char *replacement) {
>     return mylib_replace_string(needle, haystack, replacement);
> }
Well exactly. My first level implements the basic allocate and destroy,
plus "string.h like" functions to assign, extend, get length, etc. You
have to stop somewhere.
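To make "first level" concrete, here is a minimal sketch of what such a layer might look like. The names (`struct dstr`, `dstr_init`, `dstr_append`, `dstr_length`, `dstr_free`) are invented for illustration and are not taken from any library posted in this thread; it assumes a doubling growth policy and leans on `memcpy` from the standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; all names here are illustrative only. */
struct dstr {
    char  *data;  /* NUL-terminated buffer, NULL after dstr_free */
    size_t len;   /* characters in use, excluding the terminator */
    size_t cap;   /* bytes allocated, including room for the terminator */
};

/* Allocate an empty string; returns 0 on success, -1 if malloc fails. */
int dstr_init(struct dstr *s)
{
    assert(s != NULL);
    s->cap = 16;
    s->len = 0;
    s->data = malloc(s->cap);
    if (!s->data)
        return -1;
    s->data[0] = '\0';
    return 0;
}

/* Append n bytes from src, doubling the buffer until they fit. */
int dstr_append(struct dstr *s, const char *src, size_t n)
{
    assert(s != NULL && s->data != NULL);
    if (s->len + n + 1 > s->cap) {
        size_t ncap = s->cap;
        char *p;
        while (ncap < s->len + n + 1)
            ncap *= 2;
        p = realloc(s->data, ncap);
        if (!p)
            return -1;      /* original buffer is left untouched */
        s->data = p;
        s->cap = ncap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* Length is stored, so this never rescans the buffer. */
size_t dstr_length(const struct dstr *s)
{
    return s->len;
}

/* Release the buffer and leave the struct in a recognisable empty state. */
void dstr_free(struct dstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

With something like this underneath, a replace routine reduces to a loop of `dstr_append` calls: the text before each match, then the replacement, then the tail.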
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk
|
|
|
3-nospam (285)
|
2/19/2010 8:25:24 PM
|
|
On 2010-02-19, Nick <3-nospam@temporary-address.org.uk> wrote:
> Well exactly. My first level implements the basic allocate and destroy,
> plus "string.h like" functions to assign, extend, get length, etc. You
> have to stop somewhere.
Yeah. Mine stopped a little past the functionality of <string.h> but not
much past it. In practice, it seems to have been enough.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
|
|
usenet-nospam (2216)
|
2/19/2010 8:37:11 PM
|
|
On Feb 20, 12:55 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-19, blmblm myrealbox.com <blm...@myrealbox.com> wrote:
>
> >> An interesting factoid about this discussion is that no American or
> >> British posters, save one, have solved the problem I set (write
> >> replace without using string.H).
> > Oh, it's string.H we're to avoid? is it okay to use string.h?
> > Yeah, yeah, nitpick ....
>
> It's also a silly challenge. If I were going to do that, my first pass
> would just be to put "my" in front of the str* functions I use, and
> define them.
That sounds like (1) unethical behavior and (2) to be expected from
you.
Look, why not just admit it. You had no clue how to solve the
challenged problem, which could easily arise in any number of
circumstances. You're really not contributing much to the discussion.
>
> But why would I do that? How about we set a new challenge, which is to
> do matrix multiplication without using the '*' operator. Or, better yet,
> how about we iterate through an array without using a for loop, and write
> our own routines to interpolate numbers into textual strings, so we don't
> have to rely on printf()?
Those of us who can (who like Schildt have written compilers) do this
for fun, for the same reason classical pianists play Czerny finger
exercises, and Bach wrote the Art of Fugue. There are some of us who
don't advance our careers by destroying reputations and buying our way
onto standards boards without any academic qualifications.
Furthermore, if I were doing matrix multiplication times a power of
two, I wouldn't use the * operator. Do you even know what operator I
would use?
>
> Without a *reason* to not use the string library, this is nothing more
> than a challenge to do something stupid.
Fox and grapes. You couldn't do it.
>
> -s
> --
> Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
|
|
spinoza1111 (3250)
|
2/20/2010 3:03:41 PM
|
|
spinoza1111 wrote:
>
> Furthermore, if I were doing matrix multiplication times a power of
> two, I wouldn't use the * operator. Do you even know what operator I
> would use?
>
Hm.
Greets
|
|
|
bmaxa209 (243)
|
2/20/2010 3:08:41 PM
|
|
In article <112e4233-b9bd-4ff5-ab9e-eb6aa73fb743@x1g2000prb.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
> On Feb 18, 5:39 am, Seebs <usenet-nos...@seebs.net> wrote:
> > On 2010-02-17, Walter Banks <wal...@bytecraft.com> wrote:
> >
> > > This whole project thread has been filled how not to engineer software.
> > > Application code specifically avoiding libraries, Not Invented Here,
> > > random design with moving target specifications or no specifications,
> > > ad hoc testing with a dose of 20+ Year old unresolved office battles,
> > > interpersonal rivalry and off topic rants.
> > > As several have stated not the environment that we are used to.
> >
> > Yes, but it's important to be prepared to program in some of the many
> > environments which real programmers often end up having to work in.
> >
> > To be fair, I've never had a coworker in the same class as Nilges. Not
>
> No, I don't think you have.
>
> > even particularly close. But I have had to work with arbitrary or
> > bad specifications, specifications which change repeatedly during
> > implementation, old office battles, and vehement opposition to things which
> > were Not Invented Here.
>
> You've memorized the mere names of thought crimes ("not invented
> here") without learning your trade; as you have told us, you haven't
> taken any university computer science classes at all.
> >
> > At one point, I was asked to develop a linked list implementation. The
> > proposed design looked like this:
> >
> > struct list_node {
> > struct list_node *next;
> > void *data;
>
> This is as we know a mistake.
[ snip several paragraphs of what I believe to be tangential ]
> > };
> >
> > struct list {
> > struct list_node *head;
> > struct list_node *tail;
> > };
> >
> > The specification was much as you'd expect. Except for one TINY detail.
> > Which was that the formal specification was that
> > (struct list *) (x->tail->next) == x
> > whenever tail was not null.
> >
> > That is to say, if the list contained any members, the "next" pointer for
> > the last member of the list was a pointer (suitably converted) to the list
> > object.
> >
> > So iteration would look roughly like:
> > for (l = x->head; l->next != x; l = l->next) {
> > /* ... */
> > }
> >
> > It took a day or so of effort for me to round up enough senior developers
> > to all sit on the guy and tell him that:
> >
> > 1. He was wrong.
> > 2. He was micro-managing, which is presumptively wrong.
> >
> > before we were allowed to use a more conventional design.
>
> OK, he specified a circular singly-linked list,
Oh? The way I'm reading Seebs's example, the last node in the
list pointed not to the first node -- which would as you say be a
circular linked list -- but *to the data structure representing
the list*. That seems decidedly strange to me, and I'm even a
bit skeptical that one could implement it in a way that a compiler
would accept, but whatever.
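For what it's worth, with explicit casts a compiler does appear to accept it. The following is only a sketch under assumptions: `list_append` and `list_count` are names invented here, and it relies on `struct list` and `struct list_node` having compatible alignment, so that converting a pointer from one type to the other and back is well defined. The converted pointer is only ever compared, never dereferenced as a node.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *next;
    void *data;
};

struct list {
    struct list_node *head;
    struct list_node *tail;
};

/* Append n; the last node's next holds the list's own address, converted. */
void list_append(struct list *x, struct list_node *n)
{
    assert(x != NULL && n != NULL);
    n->next = (struct list_node *)x;   /* sentinel: points at the list object */
    if (x->tail)
        x->tail->next = n;
    else
        x->head = n;
    x->tail = n;
}

/* Count nodes, stopping when the walk reaches the converted list pointer. */
size_t list_count(const struct list *x)
{
    size_t n = 0;
    const struct list_node *l;

    assert(x != NULL);
    if (!x->head)
        return 0;
    for (l = x->head; (const struct list *)l != x; l = l->next)
        n++;
    return n;
}
```

Note that the loop tests the node pointer itself rather than `l->next`; a test on `l->next` would stop before visiting the last node.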
> and (because by your
> own admission you have never taken a single computer science class)
> you had never seen this simple data structure, and you wasted an
> entire day trying to figure this code out in consequence. For the same
> reason you preferred to try destroy the career of Herb Schildt and you
> hound me here, this enraged you and you maliciously went after the
> original designer in order to ruin his position, because you didn't
> know how to code a simple loop (we've seen how incompetent you are at
> this task in your strlen that was off by one).
Speculation, in my opinion.
[ snip ]
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
|
|
blmblm (1187)
|
2/20/2010 4:21:16 PM
|
|
In article <7u82l6Fvb9U1@mid.individual.net>,
blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> In article <4b7eb795.340407953@text.giganews.com>,
> Richard Harter <cri@tiac.net> wrote:
> > On 19 Feb 2010 14:49:34 GMT, blmblm@myrealbox.com
> > <blmblm@myrealbox.com> wrote:
> >
> > >In article
> <08119931-a822-47ae-aeab-609cb72aea7a@p13g2000pre.googlegroups.com>,
> > >spinoza1111 <spinoza1111@yahoo.com> wrote:
> > >> On Feb 18, 2:02 pm, spinoza1111 <spinoza1...@yahoo.com> wrote:
> > >> > On Feb 18, 12:33 pm, Walter Banks <wal...@bytecraft.com> wrote:
> > >
> > >[ snip ]
> > >
> > >> An interesting factoid about this discussion is that no American or
> > >> British posters, save one, have solved the problem I set (write
> > >> replace without using string.H).
> > >
> > >Oh, it's string.H we're to avoid? is it okay to use string.h?
> > >Yeah, yeah, nitpick ....
> >
> > My understanding is that the requirement is there to make life
> > > simpler for people not familiar with the C standard library,
> > particularly the functions prototyped in string.h, and for people
> > who have trouble distinguishing between upper and lower case.
> >
> > >
> > >Well, for the record, I'm an American mostly-lurker, and I've
> > >been tinkering with various solutions over the past few days, not
> > >looking carefully at others' solutions because that would seem
> > >to spoil the fun. I've just posted some of them, in a reply to
> > >Seebs's post with subject line "Efficency [sic] and the standard
> > >library".
> > >
> > >For what it's worth, all my solutions use recursion -- that seemed
> > >to me to be a natural way to solve this problem. <shrug>
> >
> > I also happen to be American and I also posted one; however I put
> > it in a separate thread. See the thread entitled "A substring
> > replacement implementation". You might want to include it in
> > your test suite.
What test suite is that .... The one I could be accumulating,
I guess, from other people's posts here.
> Yes, I saw your code, though I really only looked at the main
> program and the comments -- both of which I rather like better
> than mine.
>
> I think I was rather hoping that the others who had posted
> solutions might use my code to benchmark, but really, how hard
> would it be for *me* to do that .... Maybe I will. "Stay tuned"?
For what it's worth, I pulled the replace() function and supporting
code out of your solution and connected it with my benchmarking
driver. Output below, followed by output of the best-performing
of my solutions. Yours is faster, though not spectacularly so.
I'll probably do similar experiments with others' solutions as
time and interest permit ....
======== OUTPUT with your replace() ========
Richard Harter's version
performing timing tests (4 repeats)
timing (length 4004, changing 2 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.36 0.35 0.35 0.35
timing (length 4020, changing 10 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.36 0.36 0.36 0.36
timing (length 4100, changing 50 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.44 0.45 0.44 0.44
timing (length 4500, changing 50 occurrences of 10 chars to 10 chars) 20000 times
times (seconds): 0.49 0.49 0.48 0.49
16 tests (counting repeats)
0 errors (counting repeats)
total time = 6.59 seconds
======== OUTPUT with my replace() ========
further improved version of replace():
scans input only once, building linked list of matches
avoids recomputing string lengths
uses user-written string functions
performing timing tests (4 repeats)
timing (length 4004, changing 2 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.49 0.47 0.46 0.50
timing (length 4020, changing 10 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.51 0.53 0.52 0.52
timing (length 4100, changing 50 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.72 0.77 0.75 0.74
timing (length 4500, changing 50 occurrences of 10 chars to 10 chars) 20000 times
times (seconds): 0.75 0.73 0.75 0.74
16 tests (counting repeats)
0 errors (counting repeats)
total time = 9.96 seconds
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
|
|
blmblm (1187)
|
2/20/2010 4:36:47 PM
|
|
On Feb 20, 5:03 pm, spinoza1111 <spinoza1...@yahoo.com> wrote:
> On Feb 20, 12:55 am, Seebs <usenet-nos...@seebs.net> wrote:
>
> > On 2010-02-19, blmblm myrealbox.com <blm...@myrealbox.com> wrote:
>
> > >> An interesting factoid about this discussion is that no American or
> > >> British posters, save one, have solved the problem I set (write
> > >> replace without using string.H).
> > > Oh, it's string.H we're to avoid? is it okay to use string.h?
> > > Yeah, yeah, nitpick ....
>
> > It's also a silly challenge. If I were going to do that, my first pass
> > would just be to put "my" in front of the str* functions I use, and
> > define them.
>
> That sounds like (1) unethical behavior and (2) to be expected from
> you.
>
That's how I would solve it.
I have written "replace" functions in the past, using the standard
string library. The challenge is to replicate the function without
that library (which is reasonable, I've worked in the past on consoles
without standard libraries), so you just write the string functions.
With the exception of sprintf, they are all very easy to write if
perfect efficiency isn't a major consideration.
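As an illustration of how little is involved, here is a minimal sketch of two such functions. The `my_` prefix follows the suggestion quoted above; the bodies are a plain, unoptimised reading of what `strlen` and `strstr` do, not code from anyone's posted solution.

```c
#include <assert.h>
#include <stddef.h>

/* Length of s, not counting the terminating NUL. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Naive search: first occurrence of needle in haystack, or NULL.
   An empty needle matches at the start, as strstr does. */
char *my_strstr(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    if (!*needle)
        return (char *)haystack;
    for (; *haystack; haystack++) {
        const char *h = haystack, *n = needle;
        while (*h && *n && *h == *n) {
            h++;
            n++;
        }
        if (!*n)                /* ran off the end of needle: full match */
            return (char *)haystack;
    }
    return NULL;
}
```

The search is O(n*m) in the worst case, which is the "perfect efficiency isn't a major consideration" part.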
To write replace on top of another string library is more difficult. It
may be that that particular library is a lot better than the standard
one. The problem is that, as it isn't standard, I'm not familiar with
it. I'm often tearing my hair out over Perl scripts because I want to
delete the middle three characters of a string, or something. Perl has
far more powerful string handling functions than C, and there will be
an easy way to achieve this. However since I'm not very familiar with
them, I'm leafing through the Perl book, looking for clues.
|
|
|
malcolm.mclean5 (750)
|
2/20/2010 5:21:34 PM
|
|
In article <65456f81-71e0-4ddb-8da2-04c67e9e2689@y7g2000prc.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
> On Feb 20, 12:55 am, Seebs <usenet-nos...@seebs.net> wrote:
> > On 2010-02-19, blmblm myrealbox.com <blm...@myrealbox.com> wrote:
> >
> > >> An interesting factoid about this discussion is that no American or
> > >> British posters, save one, have solved the problem I set (write
> > >> replace without using string.H).
> > > Oh, it's string.H we're to avoid? is it okay to use string.h?
> > > Yeah, yeah, nitpick ....
> >
> > It's also a silly challenge. If I were going to do that, my first pass
> > would just be to put "my" in front of the str* functions I use, and
> > define them.
>
> That sounds like (1) unethical behavior and (2) to be expected from
> you.
How is this unethical? In essence what he's doing is implementing
a standard interface, but in a way that can't collide with an
existing implementation.
> Look, why not just admit it. You had no clue how to solve the
> challenged problem, which could easily arise in any number of
> circumstances. You're really not contributing much to the discussion.
>
> > But why would I do that? How about we set a new challenge, which is to
> > do matrix multiplication without using the '*' operator. Or, better yet,
> > how about we iterate through an array without using a for loop, and write
> > our own routines to interpolate numbers into textual strings, so we don't
> > have to rely on printf()?
>
> Those of us who can (who like Schildt have written compilers) do this
> for fun, for the same reason classical pianists play Czerny finger
> exercises, and Bach wrote the Art of Fugue. There are some of us who
> don't advance our careers by destroying reputations and buying our way
> onto standards boards without any academic qualifications.
>
> Furthermore, if I were doing matrix multiplication times a power of
> two, I wouldn't use the * operator. Do you even know what operator I
> would use?
"Matrix multiplication times a power of two?" Could you explain what
you mean by that? In my usage "matrix multiplication" is a binary
operation on matrices, so I don't understand how one of the operands
can be a power of two. ?
[ snip ]
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
|
|
blmblm (1187)
|
2/20/2010 5:57:22 PM
|
|
blmblm@myrealbox.com wrote:
>>
>> Furthermore, if I were doing matrix multiplication times a power of
>> two, I wouldn't use the * operator. Do you even know what operator I
>> would use?
>
> "Matrix multiplication times a power of two?" Could you explain what
> you mean by that? In my usage "matrix multiplication" is a binary
> operation on matrices, so I don't understand how one of the operands
> can be a power of two. ?
>
> [ snip ]
>
He probably meant multiplication of a scalar and a matrix, where
the scalar is a power of two?
Or he means powers of square matrices?
What operator?
First thing that came to my mind is i<<1 instead of i*2 ;)
But then again, even in assembler, depending on the CPU, shifting
can be slower than multiplication ;)
So whether you write i*2 or i<<1 doesn't matter, because the compiler
will probably use what is most appropriate for the current hardware...
but then again I'm just guessing what he meant by this ;)
Greets
|
|
|
bmaxa209 (243)
|
2/20/2010 6:12:23 PM
|
|
In article <4b80260e@news.x-privat.org>,
Branimir Maksimovic <bmaxa@hotmail.com> wrote:
> blmblm@myrealbox.com wrote:
> >>
> >> Furthermore, if I were doing matrix multiplication times a power of
> >> two, I wouldn't use the * operator. Do you even know what operator I
> >> would use?
> >
> > "Matrix multiplication times a power of two?" Could you explain what
> > you mean by that? In my usage "matrix multiplication" is a binary
> > operation on matrices, so I don't understand how one of the operands
> > can be a power of two. ?
> >
> > [ snip ]
> >
>
> He probably meant multiplication of scalar and matrix, where
> scalar is power of two?
>
> Or he means power of square matrices?
>
> What operator?
> First thing that came in my mind is i<<1 instead of i*2 ;)
> But then again even in assembler, depending on CPU, shifting
> can be slower than multiplication ;)
> So if you write i*2 or i<<1 doesn't matter because compiler
> will probably use what is most appropriate for current hardware...
> but then again I'm just guessing what he meant by this ;)
In a simple test, gcc compiled both expressions (i<<1 and i*2) to
the same x86 assembler code, using shift. (Interestingly enough, it
did this even at the lowest level of optimization. Well, it was
interesting to me anyway.)
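For reference, a small sketch of the two forms applied to the scalar-times-matrix reading of the question. The function names are invented here, the matrix is stored as a flat array, and the elements are unsigned so that the shift is well defined; it assumes k is smaller than the width of `unsigned`.

```c
#include <assert.h>
#include <stddef.h>

/* Scale each of the n elements by 2^k using a shift. */
void scale_pow2_shift(unsigned *m, size_t n, unsigned k)
{
    assert(m != NULL);
    for (size_t i = 0; i < n; i++)
        m[i] <<= k;
}

/* The same scaling written with the * operator. */
void scale_pow2_mul(unsigned *m, size_t n, unsigned k)
{
    assert(m != NULL);
    for (size_t i = 0; i < n; i++)
        m[i] *= 1u << k;
}
```

For unsigned operands the two produce identical results, which is what lets a compiler pick whichever instruction suits the target.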
Whether the same thing would be true with other compilers and
for other architectures, who knows.
Then again, worrying about this sort of thing .... Well, there's
the widely-repeated remark about premature optimization, no?
The exact source of it seems to be subject to some debate,
but both of the possible originators mentioned here
http://en.wikipedia.org/wiki/Program_optimization
are people I would hesitate to disagree with.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
|
|
blmblm (1187)
|
2/20/2010 7:14:57 PM
|
|
On 20 Feb 2010 16:36:47 GMT, blmblm@myrealbox.com
<blmblm@myrealbox.com> wrote:
>In article <7u82l6Fvb9U1@mid.individual.net>,
>blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
>> In article <4b7eb795.340407953@text.giganews.com>,
>> Richard Harter <cri@tiac.net> wrote:
[snip]
>
>> Yes, I saw your code, though I really only looked at the main
>> program and the comments -- both of which I rather like better
>> than mine.
>>
>> I think I was rather hoping that the others who had posted
>> solutions might use my code to benchmark, but really, how hard
>> would it be for *me* to do that .... Maybe I will. "Stay tuned"?
>
>For what it's worth, I pulled the replace() function and supporting
>code out of your solution and connected it with my benchmarking
>driver. Output below, followed by output of the best-performing
>of my solutions. Yours is faster, though not spectacularly so.
>I'll probably do similar experiments with others' solutions as
>time and interest permit ....
>
>======== OUTPUT with your replace() ========
>
>Richard Harter's version
[snip]
>16 tests (counting repeats)
>0 errors (counting repeats)
>total time = 6.59 seconds
>
>======== OUTPUT with my replace() ========
>
>further improved version of replace():
>scans input only once, building linked list of matches
>avoids recomputing string lengths
>uses user-written string functions
[snip]
>16 tests (counting repeats)
>0 errors (counting repeats)
>total time = 9.96 seconds
Interesting. I suspect that mine is a bit faster because it
doesn't need to construct linked lists - everything is on the
stack. Also, the recursive call takes the place of a call to malloc.
That said, my version could be time optimized by eliminating
recursion. The plan would be to create a default table holding a
number of entries as an automatic variable, have a pointer to the
table, and switch to a malloc based expansible table when the
original one overflows. Instead of having m recursions the cost
would be O(log m) malloc calls. Equally important, the actual loop
can be compressed into a tight loop. If you want a hot version
for reference I'll code it.
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.
|
|
|
cri (1432)
|
2/20/2010 7:29:08 PM
|
|
In article <4b803360.437634421@text.giganews.com>,
Richard Harter <cri@tiac.net> wrote:
> On 20 Feb 2010 16:36:47 GMT, blmblm@myrealbox.com
> <blmblm@myrealbox.com> wrote:
>
> >In article <7u82l6Fvb9U1@mid.individual.net>,
> >blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> >> In article <4b7eb795.340407953@text.giganews.com>,
> >> Richard Harter <cri@tiac.net> wrote:
[ snip ]
> >For what it's worth, I pulled the replace() function and supporting
> >code out of your solution and connected it with my benchmarking
> >driver. Output below, followed by output of the best-performing
> >of my solutions. Yours is faster, though not spectacularly so.
> >I'll probably do similar experiments with others' solutions as
> >time and interest permit ....
[ snip ]
> Interesting. I suspect that mine is a bit faster because it
> doesn't need to construct linked lists - everything is on the
> stack. Also the calling recurse replaces calling malloc. That
> said, my version could be time optimized by eliminating
> recursion. The plan would be to create a default table holding a
> number of entries as an automatic variable, have a pointer to the
> table, and switch to a malloc based expansible table when the
> original one overflows. Instead of having m recursions the cost
> would be O(log m) malloc calls. Equally important the actual loop
> can be compressed into a tight loop. If you want a hot version
> for reference I'll code it.
Well, now I'm rather looking forward to actually reading your
code (and that of others) ....
Don't spend more time on this just for the sake of adding to
my collection, but if you do, well, yeah, I can generate times
so you/we can compare .... (Of course you almost surely can do
your own timings to compare the "hot" version to the original,
but it might be interesting to compare it to timings for others'
code too, and I might be the right person to do that. I just have
to track down the latest versions of things others have posted,
in the multiple threads, etc.)
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
|
|
blmblm (1187)
|
2/20/2010 7:39:17 PM
|
|
"Richard Harter" <cri@tiac.net> wrote in message
news:4b803360.437634421@text.giganews.com...
[...]
> Interesting. I suspect that mine is a bit faster because it
> doesn't need to construct linked lists - everything is on the
> stack. Also the calling recurse replaces calling malloc. That
> said, my version could be time optimized by eliminating
> recursion. The plan would be to create a default table holding a
> number of entries as an automatic variable, have a pointer to the
> table, and switch to a malloc based expansible table when the
> original one overflows.
That works fairly well:
http://groups.google.com/group/comp.lang.c/msg/2ebbf412549f3300
The algorithm I settled on will only resort to dynamic memory when the
number of matches is greater than 256. Then it will avoid `malloc()' again
until there are more than 512 matches. I suppose an optimization
might be to malloc larger and larger space to hold matches.
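A minimal sketch of that kind of table, assuming a doubling policy. This is not the code behind the link above: `struct match_table` and the `mt_*` functions are invented for illustration, and `SMALL_SLOTS` is kept tiny so the switch to the heap is easy to exercise.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_SLOTS 4   /* tiny for illustration; the post above uses 256 */

/* Hypothetical match table: in-struct storage first, heap after overflow. */
struct match_table {
    const char **items;              /* points at small[] until it overflows */
    size_t count, cap;
    const char *small[SMALL_SLOTS];
};

void mt_init(struct match_table *t)
{
    assert(t != NULL);
    t->items = t->small;
    t->count = 0;
    t->cap = SMALL_SLOTS;
}

/* Record one match position; returns 0 on success, -1 on allocation failure. */
int mt_push(struct match_table *t, const char *p)
{
    assert(t != NULL);
    if (t->count == t->cap) {
        size_t ncap = t->cap * 2;    /* doubling keeps allocations to O(log m) */
        const char **np;
        if (t->items == t->small) {
            np = malloc(ncap * sizeof *np);
            if (np)
                memcpy(np, t->small, sizeof t->small);
        } else {
            np = realloc(t->items, ncap * sizeof *np);
        }
        if (!np)
            return -1;
        t->items = np;
        t->cap = ncap;
    }
    t->items[t->count++] = p;
    return 0;
}

/* Free heap storage if any was used and reset to the empty state. */
void mt_free(struct match_table *t)
{
    assert(t != NULL);
    if (t->items != t->small)
        free(t->items);
    mt_init(t);
}
```

Because `items` may point into the struct itself, a table must not be copied by assignment while it is still using the small array.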
|
|
|
no6 (2791)
|
2/20/2010 7:55:39 PM
|
|
cri@tiac.net (Richard Harter) writes:
> On 20 Feb 2010 16:36:47 GMT, blmblm@myrealbox.com
> <blmblm@myrealbox.com> wrote:
>
>>In article <7u82l6Fvb9U1@mid.individual.net>,
>>blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
>>> In article <4b7eb795.340407953@text.giganews.com>,
>>> Richard Harter <cri@tiac.net> wrote:
>
> [snip]
>>
>>> Yes, I saw your code, though I really only looked at the main
>>> program and the comments -- both of which I rather like better
>>> than mine.
>>>
>>> I think I was rather hoping that the others who had posted
>>> solutions might use my code to benchmark, but really, how hard
>>> would it be for *me* to do that .... Maybe I will. "Stay tuned"?
>>
>>For what it's worth, I pulled the replace() function and supporting
>>code out of your solution and connected it with my benchmarking
>>driver. Output below, followed by output of the best-performing
>>of my solutions. Yours is faster, though not spectacularly so.
>>I'll probably do similar experiments with others' solutions as
>>time and interest permit ....
>>
>>======== OUTPUT with your replace() ========
>>
>>Richard Harter's version
> [snip]
>>16 tests (counting repeats)
>>0 errors (counting repeats)
>>total time = 6.59 seconds
>>
>>======== OUTPUT with my replace() ========
>>
>>further improved version of replace():
>>scans input only once, building linked list of matches
>>avoids recomputing string lengths
>>uses user-written string functions
>
> [snip]
>
>>16 tests (counting repeats)
>>0 errors (counting repeats)
>>total time = 9.96 seconds
>
> Interesting. I suspect that mine is a bit faster because it
> doesn't need to construct linked lists - everything is on the
> stack. Also the calling recurse replaces calling malloc. That
> said, my version could be time optimized by eliminating
> recursion. The plan would be to create a default table holding a
> number of entries as an automatic variable, have a pointer to the
> table, and switch to a malloc based expansible table when the
> original one overflows. Instead of having m recursions the cost
> would be O(log m) malloc calls. Equally important the actual loop
> can be compressed into a tight loop. If you want a hot version
> for reference I'll code it.
That's one of my versions, too. Up to now I have not posted any
timings because the relative speeds depend so much on the input used.
For example, with the 4K long test strings used by B L Massingill it
is hard to beat using strstr:
$ ./timer d4004 '[]' 'xx' bb_replace rh_replace fast_replace array_replace
bb_replace(`cat d4004`, "[]", "xx"):
2483200 calls in 3.998s is 1.61µs/call (1.61e-06s/call)
rh_replace(`cat d4004`, "[]", "xx"):
424600 calls in 4.001s is 9.422µs/call (9.422e-06s/call)
fast_replace(`cat d4004`, "[]", "xx"):
1123600 calls in 3.999s is 3.559µs/call (3.559e-06s/call)
array_replace(`cat d4004`, "[]", "xx"):
3519700 calls in 4.000s is 1.136µs/call (1.136e-06s/call)
(d4004 is a file containing 4004 bytes with two '[]' sub-strings.)
bb_replace is the very simple code I posted a while ago. rh_replace
is yours. fast_replace uses a local array to store the match
locations and some hand-written sub-string code to avoid the
re-scanning implied by using strstr (I'll post it below) and
array_replace is the same, but simply uses strstr to find the
sub-strings.
If we use short strings, the position is reversed in that the two that
use strstr (bb_replace and array_replace) pay a price:
$ ./timer "abcdefg[]hijklmnop[]qrstuvwxyz" '[]' 'xx' bb_replace rh_replace fast_replace array_replace
bb_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
10141800 calls in 3.999s is 394.3ns/call (3.943e-07s/call)
rh_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
29636400 calls in 4.000s is 135ns/call (1.35e-07s/call)
fast_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
26954200 calls in 4.000s is 148.4ns/call (1.484e-07s/call)
array_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
16888500 calls in 4.000s is 236.8ns/call (2.368e-07s/call)
The order is changed again if there is no matching sub-string:
$ ./timer "abcdefghijklmnopqrstuvwxyz" '[]' 'xx' bb_replace rh_replace fast_replace array_replace
bb_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
38832000 calls in 3.998s is 103ns/call (1.03e-07s/call)
rh_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
33301000 calls in 4.000s is 120.1ns/call (1.201e-07s/call)
fast_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
43300700 calls in 4.000s is 92.38ns/call (9.238e-08s/call)
array_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
46132400 calls in 4.000s is 86.71ns/call (8.671e-08s/call)
Pretty much any implementation has fast and slow cases. I am not sure
if there is any useful information in all of the data I can generate
(which is why I have not posted any results so far).
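For anyone wanting to reproduce numbers of this shape, a rough sketch of a "calls in N seconds" harness follows. It is not the `timer` program used above: `time_per_call` and `count_call` are invented names, and it assumes `clock()` is available and measures CPU time, as it does on common hosted implementations.

```c
#include <assert.h>
#include <time.h>

/* Trivial example workload: bumps a counter passed through arg. */
void count_call(void *arg)
{
    ++*(unsigned long *)arg;
}

/* Call fn(arg) repeatedly for at least `seconds` of CPU time.
   Returns the mean seconds per call; *calls_out gets the call count. */
double time_per_call(void (*fn)(void *), void *arg, double seconds,
                     unsigned long *calls_out)
{
    unsigned long calls = 0;
    clock_t start, now;

    assert(fn != NULL && seconds > 0.0);
    start = clock();
    do {
        for (int i = 0; i < 100; i++)   /* batch to amortise clock() overhead */
            fn(arg);
        calls += 100;
        now = clock();
    } while ((double)(now - start) / CLOCKS_PER_SEC < seconds);
    if (calls_out)
        *calls_out = calls;
    return ((double)(now - start) / CLOCKS_PER_SEC) / (double)calls;
}
```

A real comparison would wrap each replace function in a small adapter that calls it on fixed inputs and frees the result, so that allocation cost is counted consistently for every candidate.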
Anyway, here is my (probably misnamed) "fast_replace":
#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* malloc, realloc, free */
#include <string.h>
#define STRLEN strlen
#define STRSTR strstr
#define STRCPY strcpy
#define STRNCMP strncmp
#define MEMCPY memcpy
#define FN(x) x
#define AUTO_SIZE 20
char *FN(fast_replace)(char *src, char *match, char *replacement)
{
    size_t mlen = STRLEN(match), matches = 0, asize = AUTO_SIZE;
    const char *small_list[AUTO_SIZE], **mp = small_list, **op = 0;
    const char *anchor = src;
    do {
        /* skip to the next character that could start a match */
        while (*anchor && *anchor != *match)
            ++anchor;
        if (*anchor) {
            if (STRNCMP(anchor + 1, match + 1, mlen - 1) == 0) {
                if (matches >= asize) {
                    /* table full: move to (or grow) heap storage */
                    const char **np = realloc(op, 2 * asize * sizeof *np);
                    if (!np) {
                        fprintf(stderr, "Out of memory\n");
                        free(op);
                        return 0;
                    }
                    if (!op)
                        MEMCPY(np, mp, sizeof small_list);
                    mp = op = np;
                    asize *= 2;
                }
                mp[matches++] = anchor;
                anchor += mlen;
            }
            else ++anchor;
        }
        else break;
    } while (*anchor);
    /* anchor now points at the terminating NUL of src */
    size_t rlen = matches ? STRLEN(replacement) : 0;
    char *result = malloc(anchor - src - matches*mlen + matches*rlen + 1);
    if (result) {
        char *dst = result;
        const char *s = src;
        for (size_t i = 0; i < matches; ++i) {
            MEMCPY(dst, s, mp[i] - s);
            MEMCPY(dst += mp[i] - s, replacement, rlen);
            dst += rlen;
            s = mp[i] + mlen;
        }
        STRCPY(dst, s);
    }
    free(op);
    return result;
}
The macros are to enable me to build an alternate version that
uses naive home-grown versions of strlen, strstr etc to see
how that impacts on the performance.
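One way the switch might be wired up is sketched below. The `USE_NAIVE` flag, the `naive_*` functions and `has_prefix` are assumptions made for illustration, not part of the code above; only two of the macros are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical home-grown versions; names are illustrative only. */
int naive_strncmp(const char *a, const char *b, size_t n)
{
    for (; n > 0; a++, b++, n--) {
        if (*a != *b)
            return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
        if (*a == '\0')         /* both strings ended together */
            break;
    }
    return 0;
}

void *naive_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Build with -DUSE_NAIVE to select the home-grown set. */
#ifdef USE_NAIVE
#define STRNCMP naive_strncmp
#define MEMCPY  naive_memcpy
#define FN(x)   naive_##x       /* gives the alternate build distinct names */
#else
#define STRNCMP strncmp
#define MEMCPY  memcpy
#define FN(x)   x
#endif

/* Written only against the macros, so it compiles either way. */
int FN(has_prefix)(const char *s, const char *prefix, size_t plen)
{
    assert(s != NULL && prefix != NULL);
    return STRNCMP(s, prefix, plen) == 0;
}
```

Compiling the same source twice, once with and once without the flag, then gives two sets of functions that can be linked into one test driver and timed side by side.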
Maybe you had something faster in mind. I'll happily add it to my
test set.
--
Ben.
|
|
|
ben.usenet (6516)
|
2/21/2010 1:42:06 AM
|
|
On Feb 21, 1:57 am, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> In article <65456f81-71e0-4ddb-8da2-04c67e9e2...@y7g2000prc.googlegroups.com>,
> spinoza1111 <spinoza1...@yahoo.com> wrote:
> > On Feb 20, 12:55 am, Seebs <usenet-nos...@seebs.net> wrote:
> > > On 2010-02-19, blmblm myrealbox.com <blm...@myrealbox.com> wrote:
>
> > > >> An interesting factoid about this discussion is that no American or
> > > >> British posters, save one, have solved the problem I set (write
> > > >> replace without using string.H).
> > > > Oh, it's string.H we're to avoid? is it okay to use string.h?
> > > > Yeah, yeah, nitpick ....
>
> > > It's also a silly challenge. If I were going to do that, my first pass
> > > would just be to put "my" in front of the str* functions I use, and
> > > define them.
>
> > That sounds like (1) unethical behavior and (2) to be expected from
> > you.
>
> How is this unethical? =A0In essence what he's doing is implementing
> a standard interface, but in a way that can't collide with an
> existing implementation.
I said last week that my replace function integrates two different
things.
>
> > Look, why not just admit it. You had no clue how to solve the
> > challenged problem, which could easily arise in any number of
> > circumstances. You're really not contributing much to the discussion.
>
> > > But why would I do that? How about we set a new challenge, which is to
> > > do matrix multiplication without using the '*' operator. Or, better yet,
> > > how about we iterate through an array without using a for loop, and write
> > > our own routines to interpolate numbers into textual strings, so we don't
> > > have to rely on printf()?
>
> > Those of us who can (who like Schildt have written compilers) do this
> > for fun, for the same reason classical pianists play Czerny finger
> > exercises, and Bach wrote the Art of Fugue. There are some of us who
> > don't advance our careers by destroying reputations and buying our way
> > onto standards boards without any academic qualifications.
>
> > Furthermore, if I were doing matrix multiplication times a power of
> > two, I wouldn't use the * operator. Do you even know what operator I
> > would use?
>
> "Matrix multiplication times a power of two?" Could you explain what
> you mean by that? In my usage "matrix multiplication" is a binary
> operation on matrices, so I don't understand how one of the operands
> can be a power of two. ?
In the scenario I had in mind, one matrix contains a power of two in
each element, so instead of * you would use a shift. Look, I was just
having a little fun with Seebie.
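One possible reading of that scenario, as a hedged sketch: the second matrix is supplied as exponents, so that each element product becomes a left shift. The function name matmul_pow2, the exponent representation and the use of unsigned elements are all assumptions made for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* c = a x b for n-by-n matrices, where b is supplied as exponents:
   b[k][j] == 1u << e[k][j].  Each element product a[i][k] * b[k][j]
   is then computed as a left shift.  Unsigned elements are assumed so
   that the shift is well defined (results wrap modulo UINT_MAX + 1). */
void matmul_pow2(size_t n, unsigned a[n][n], unsigned e[n][n],
                 unsigned c[n][n])
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            unsigned sum = 0;
            for (size_t k = 0; k < n; ++k) {
                /* shifting by the full width or more is undefined */
                assert(e[k][j] < sizeof(unsigned) * CHAR_BIT);
                sum += a[i][k] << e[k][j];
            }
            c[i][j] = sum;
        }
    }
}
```

With a = {{1, 2}, {3, 4}} and exponents {{0, 1}, {2, 3}} (that is, b = {{1, 2}, {4, 8}}), the result is {{9, 18}, {19, 38}}.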
>
> [ snip ]
>
> --
> B. L. Massingill
> ObDisclaimer: I don't speak for my employers; they return the favor.
spinoza1111 (3250)
2/21/2010 7:57:40 AM
On Feb 21, 9:42 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
> c...@tiac.net (Richard Harter) writes:
> > On 20 Feb 2010 16:36:47 GMT, blm...@myrealbox.com
> > <blm...@myrealbox.com> wrote:
>
> >>In article <7u82l6Fvb...@mid.individual.net>,
> >>blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> >>> In article <4b7eb795.340407...@text.giganews.com>,
> >>> Richard Harter <c...@tiac.net> wrote:
>
> > [snip]
>
> >>> Yes, I saw your code, though I really only looked at the main
> >>> program and the comments -- both of which I rather like better
> >>> than mine.
>
> >>> I think I was rather hoping that the others who had posted
> >>> solutions might use my code to benchmark, but really, how hard
> >>> would it be for *me* to do that .... Maybe I will. "Stay tuned"?
>
> >>For what it's worth, I pulled the replace() function and supporting
> >>code out of your solution and connected it with my benchmarking
> >>driver. Output below, followed by output of the best-performing
> >>of my solutions. Yours is faster, though not spectacularly so.
> >>I'll probably do similar experiments with others' solutions as
> >>time and interest permit ....
>
> >>======== OUTPUT with your replace() ========
>
> >>Richard Harter's version
> > [snip]
> >>16 tests (counting repeats)
> >>0 errors (counting repeats)
> >>total time = 6.59 seconds
>
> >>======== OUTPUT with my replace() ========
>
> >>further improved version of replace():
> >>scans input only once, building linked list of matches
> >>avoids recomputing string lengths
> >>uses user-written string functions
>
> > [snip]
>
> >>16 tests (counting repeats)
> >>0 errors (counting repeats)
> >>total time = 9.96 seconds
>
> > Interesting. I suspect that mine is a bit faster because it
> > doesn't need to construct linked lists - everything is on the
> > stack. Also the calling recurse replaces calling malloc. That
> > said, my version could be time optimized by eliminating
> > recursion. The plan would be to create a default table holding a
> > number of entries as an automatic variable, have a pointer to the
> > table, and switch to a malloc based expansible table when the
> > original one overflows. Instead of having m recursions the cost
> > would O(log m) malloc calls. Equally important the actual loop
> > can be compressed into a tight loop. If you want a hot version
> > for reference I'll code it.
>
> That's one of my versions, too. Up to now I have not posted any
> timings because the relative speeds depend so much on the input used.
> For example, with the 4K long test strings used by B L Massingill it
> is hard to beat using strstr:
>
>   $ ./timer d4004 '[]' 'xx' bb_replace rh_replace fast_replace array_replace
>   bb_replace(`cat d4004`, "[]", "xx"):
>          2483200 calls in 3.998s is 1.61µs/call (1.61e-06s/call)
>   rh_replace(`cat d4004`, "[]", "xx"):
>           424600 calls in 4.001s is 9.422µs/call (9.422e-06s/call)
>   fast_replace(`cat d4004`, "[]", "xx"):
>          1123600 calls in 3.999s is 3.559µs/call (3.559e-06s/call)
>   array_replace(`cat d4004`, "[]", "xx"):
>          3519700 calls in 4.000s is 1.136µs/call (1.136e-06s/call)
>
> (d4004 is a file containing 4004 bytes with two '[]' sub-strings.)
>
> bb_replace is the very simple code I posted a while ago. rh_replace
> is yours. fast_replace uses a local array to store the match
> locations and some hand-written sub-string code to avoid the
> re-scanning implied by using strstr (I'll post it below) and
> array_replace is the same, but simply uses strstr to find the
> sub-strings.
>
> If we use short strings, the position is reversed in that the two that
> use strstr (bb_replace and array_replace) pay a price:
>
>   $ ./timer "abcdefg[]hijklmnop[]qrstuvwxyz" '[]' 'xx' bb_replace rh_replace fast_replace array_replace
>   bb_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
>         10141800 calls in 3.999s is 394.3ns/call (3.943e-07s/call)
>   rh_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
>         29636400 calls in 4.000s is 135ns/call (1.35e-07s/call)
>   fast_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
>         26954200 calls in 4.000s is 148.4ns/call (1.484e-07s/call)
>   array_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
>         16888500 calls in 4.000s is 236.8ns/call (2.368e-07s/call)
>
> The order is changed again if there is no matching sub-string:
>
>   $ ./timer "abcdefghijklmnopqrstuvwxyz" '[]' 'xx' bb_replace rh_replace fast_replace array_replace
>   bb_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
>         38832000 calls in 3.998s is 103ns/call (1.03e-07s/call)
>   rh_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
>         33301000 calls in 4.000s is 120.1ns/call (1.201e-07s/call)
>   fast_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
>         43300700 calls in 4.000s is 92.38ns/call (9.238e-08s/call)
>   array_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
>         46132400 calls in 4.000s is 86.71ns/call (8.671e-08s/call)
>
> Pretty much any implementation has fast and slow cases. I am not sure
> if there is any useful information in all of the data I can generate
> (which is why I have not posted any results so far).
>
> Anyway, here is my (probably misnamed) "fast_replace":
>
> #include <string.h>
> #define STRLEN  strlen
> #define STRSTR  strstr
> #define STRCPY  strcpy
> #define STRNCMP strncmp
> #define MEMCPY  memcpy
> #define FN(x)   x
>
> #define AUTO_SIZE 20
>
> char *FN(fast_replace)(char *src, char *match, char *replacement)
> {
>      size_t mlen = STRLEN(match), matches = 0, asize = AUTO_SIZE;
>      const char *small_list[AUTO_SIZE], **mp = small_list, **op = 0;
>
>      const char *anchor = src;
>      do {
>           while (*anchor && *anchor != *match)
>                ++anchor;
>           if (*anchor) {
>                if (STRNCMP(anchor + 1, match + 1, mlen - 1) == 0) {
>                     if (matches >= asize) {
>                          const char **np = realloc(op, 2 * asize * sizeof *np);
>                          if (!np) {
>                               fprintf(stderr, "Out of memory\n");
>                               free(op);
>                               return 0;
>                          }
>                          if (!op)
>                               MEMCPY(np, mp, sizeof small_list);
>                          mp = op = np;
>                          asize *= 2;
>                     }
>                     mp[matches++] = anchor;
>                     anchor += mlen;
>                }
>                else ++anchor;
Here, you forgot to steal my idea, Ben. It was to manually code your
strncmp so as to "go back" not to the character after the first
character of the partial match, but to the first character that
matches the first character of the string we're looking for, or to the
character at which the partial match fails.
By using the library function, you do too much backup, in my opinion.
>           }
>           else break;
>      } while (*anchor);
>
>      size_t rlen = matches ? STRLEN(replacement) : 0;
>      char *result = malloc(anchor - src - matches*mlen + matches*rlen + 1);
>      if (result) {
>           char *dst = result;
>           const char *s = src;
>           for (int i = 0; i < matches; ++i) {
>                MEMCPY(dst, s, mp[i] - s);
>                MEMCPY(dst += mp[i] - s, replacement, rlen);
>                dst += rlen;
>                s = mp[i] + mlen;
>           }
>           STRCPY(dst, s);
>      }
>      free(op);
>      return result;
>
> }
>
> The macros are to enable me to build an alternate version that
> uses naive home-grown versions of strlen, strstr etc to see
> how that impacts on the performance.
But you're still thinking inside the black box. Thinking in terms of
strncmp causes you to restart the search too far to the left of the place
where it should optimally restart (the character "handle" of the
target inside the partial match or after the end of the partial
match). For long targets with a lot of overlapping material on the
left such as commonly occur in text data, this could be a serious
performance bottleneck.
The problem is that C library functions for strings are limited Turing
machines with no "magic" ability to see beyond the current "tape
square" or remember useful things. This reflects the reality of modern
computer architectures which in the default case can't be assumed
anymore to have "magic" instructions like those available on the VAX
or IBM 370, which were microprogrammed back in the day when memory was
cheap, CPU power expensive, men were men, women were women, and sheep
were nervous.
You need, I think, to benchmark your code in situations where targets
are long and overlap heavily in the input string. This might be a
library or book search application where you are looking for people's
full names where they have a small set of common first names such as
Bob, or Joe, or Billy Bob, but divergent last names. In this scenario,
your constant returns to one past the first character match will waste
several cycles on names such as Billy Bob Bogus when you are looking
for Billy Bob Badass.
In that specific scenario, you will go through the loop for each
character of the string "Billy Bob Badass" up to a. What you want is
to get past the end of that string ASAP while being sure that the
start of Billy Bob Bogus doesn't occur, given the character by
character constraint of the C machine.
My way, which is not ideal, is to go back to the B of Bob when I hit
the a of Badass and that a fails to match the o in Bogus. I then go to
the second b (B in Bob), find the o, and go past the o to the second B
in Bob, find the space, and then try B in Bogus, only to see
immediately that it is not followed by i. I then return to the zero
state of looking for the lead B. Whereas you seem to go all the way
back to i needlessly.
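A hedged sketch of one reading of that restart rule, not the poster's actual code: while comparing a partial match, remember the first later position that holds the target's first character, and on a mismatch resume there, or at the failing character if no such position was seen. The name find_handle is hypothetical. Skipping ahead this way is safe because any match starting inside the partial match would have to begin with the target's first character.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the first occurrence of t in s, or NULL. */
const char *find_handle(const char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    if (*t == '\0')
        return s;                      /* empty target matches at once */
    while (*s) {
        if (*s != *t) {                /* zero state: look for lead char */
            ++s;
            continue;
        }
        const char *restart = NULL;    /* first "handle" inside the match */
        size_t j = 1;
        while (t[j] != '\0' && s[j] == t[j]) {
            if (restart == NULL && s[j] == *t)
                restart = s + j;
            ++j;
        }
        if (t[j] == '\0')
            return s;                  /* whole target matched */
        /* Mismatch at offset j: resume at the remembered handle, or at
           the failing character itself, which has not yet been compared
           with the lead character. */
        s = restart ? restart : s + j;
    }
    return NULL;
}
```

Searching "Billy Bob Bogus Billy Bob Badass" for "Billy Bob Badass" with this sketch fails at the o of Bogus, resumes at the B of Bob, and eventually reports the match at offset 16. The worst case is still quadratic; this is not the Knuth-Morris-Pratt algorithm, which precomputes a table to avoid re-reading input.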
More sophisticated search algorithms have state in the form of tables,
therefore it's extremely unlikely they are used in actual C libraries.
Instead, those libraries probably use your memoryless procedure but
with both code and an underlying architecture "tuned" to relatively
mindless zipping left to right through strings.
In the scenario I was thinking of, the programmer, or his manager, or
his team decided to NOT use string.H in this let us say critical
operation to see what performance they could get for long targets with
a tendency to start with the same strings. So far, nobody to my
knowledge except myself, Willem and io_x has met the challenge of
thinking outside the box.
>
> Maybe you had something faster in mind. I'll happily add it to my
> test set.
>
> --
> Ben.
spinoza1111 (3250)
2/21/2010 8:27:55 AM
On 20 Feb 2010 19:39:17 GMT, blmblm@myrealbox.com
<blmblm@myrealbox.com> wrote:
>In article <4b803360.437634421@text.giganews.com>,
>Richard Harter <cri@tiac.net> wrote:
>> On 20 Feb 2010 16:36:47 GMT, blmblm@myrealbox.com
>> <blmblm@myrealbox.com> wrote:
>>
>> >In article <7u82l6Fvb9U1@mid.individual.net>,
>> >blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
>> >> In article <4b7eb795.340407953@text.giganews.com>,
>> >> Richard Harter <cri@tiac.net> wrote:
>
>Well, now I'm rather looking forward to actually reading your
>code (and that of others) ....
>
>Don't spend more time on this just for the sake of adding to
>my collection, but if you do, well, yeah, I can generate times
>so you/we can compare .... (Of course you almost surely can do
>your own timings to compare the "hot" version to the original,
>but it might be interesting to compare it to timings for others'
>code too, and I might be the right person to do that. I just have
>to track down the latest versions of things others have posted,
>in the multiple threads, etc.)
I've posted a second version that replaces recursion by
iteration. It should run slightly faster; like most such
conversions the original is simpler and cleaner.
I hadn't set out to create a performance string replace routine.
To get significant improvement the guts of the "find_next"
function must be replaced with a better algorithm. There are
four paths I can see:
(1) Hack and stumble about
(2) Use a simplified version of a standard algorithm
(3) Bite the bullet and implement a standard algorithm
(4) Invent something new and wonderful
There are tradeoffs. High performance algorithms tend to have
initialization costs, hence the simpler algorithms win when the
input is simple.
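To make the initialization tradeoff concrete, here is a sketch of the Boyer-Moore-Horspool search, chosen only as a familiar example of a standard algorithm with a set-up step; it is not the find_next from the posted code. The function name horspool and its length parameters are assumptions. It fills a table of UCHAR_MAX + 1 shifts before looking at the text at all, which is wasted work on a short input.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first occurrence of needle in hay, or NULL. */
const char *horspool(const char *hay, size_t hlen,
                     const char *needle, size_t nlen)
{
    size_t shift[UCHAR_MAX + 1];

    assert(hay != NULL && needle != NULL);
    if (nlen == 0)
        return hay;
    if (nlen > hlen)
        return NULL;

    /* Initialization cost: one pass over the table, one over the needle. */
    for (size_t c = 0; c <= UCHAR_MAX; ++c)
        shift[c] = nlen;
    for (size_t i = 0; i + 1 < nlen; ++i)
        shift[(unsigned char)needle[i]] = nlen - 1 - i;

    /* Search: compare a window, then skip by the shift recorded for
       the text character under the window's last position. */
    size_t pos = 0;
    while (pos <= hlen - nlen) {
        if (memcmp(hay + pos, needle, nlen) == 0)
            return hay + pos;
        pos += shift[(unsigned char)hay[pos + nlen - 1]];
    }
    return NULL;
}
```

On a 4K text the table fill is a small fraction of the work; on a 26-character string it can easily cost more than a plain left-to-right scan.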
In any event if you want to pick up replace2 have at it.
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.
cri (1432)
2/22/2010 7:04:57 AM
In article <7uahcvFbb8U8@mid.individual.net>,
blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> In article <7u82l6Fvb9U1@mid.individual.net>,
> blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> > In article <4b7eb795.340407953@text.giganews.com>,
> > Richard Harter <cri@tiac.net> wrote:
> > > On 19 Feb 2010 14:49:34 GMT, blmblm@myrealbox.com
> > > <blmblm@myrealbox.com> wrote:
[ snip ]
> What test suite is that .... The one I could be accumulating,
> I guess, from other people's posts here.
>
> > Yes, I saw your code, though I really only looked at the main
> > program and the comments -- both of which I rather like better
> > than mine.
> >
> > I think I was rather hoping that the others who had posted
> > solutions might use my code to benchmark, but really, how hard
> > would it be for *me* to do that .... Maybe I will. "Stay tuned"?
>
> For what it's worth, I pulled the replace() function and supporting
> code out of your solution and connected it with my benchmarking
> driver. Output below, followed by output of the best-performing
> of my solutions. Yours is faster, though not spectacularly so.
> I'll probably do similar experiments with others' solutions as
> time and interest permit ....
I've benchmarked all the code I could find that's been posted here.
I believe another poster (Chris M. Thomasson) has posted links to
code in pastebin, but I didn't track that down and include it.
One thing that occurred to me is that previously I had been
compiling with "gcc -O" (optimization), and "gcc -O3" (maximum
optimization) might be better. For reasons I won't summarize
right now, I also tried "-O2". Results below (total time only).
Presented without analysis for now ....
harter-1 is the recursive solution, from
Message-ID: <4b7d5b19.251195734@text.giganews.com>
harter-2 is the non-recursive solution, from
Message-ID: <4b821f00.563490562@text.giganews.com>
willem is from
Message-ID: <slrnhnj8mb.2238.willem@snail.stack.nl>
nilges is from
Message-ID: <06317ad6-cd11-48e7-80db-3341ef3a1597@b9g2000pri.googlegroups.com>
io_x is from
msg-id Message-ID: <4b730781$0$822$4fafbaef@reader5.news.tin.it>
blmblm-$V-$L [1] is from
Message-ID: <7u7ltqF80cU1@mid.individual.net>
[1] V = version, L = string library (C or user-written)
harter-1       (O2)   5.67 seconds
harter-1       (O3)   8.48 seconds
harter-2       (O2)   5.94 seconds
harter-2       (O3)   6.51 seconds
io_x           (O2)  18.05 seconds [1]
io_x           (O3)   **** seconds (segfault) [2]
nilges         (O2)   7.72 seconds
nilges         (O3)   7.69 seconds
willem         (O2)   7.18 seconds
willem         (O3)   7.23 seconds
[2] These results may be misleading -- I wasn't sure how to
generate an executable from the mix of C and assembly and more or
less tried things until I got a clean compile/link, with:
nasm -felf -o replacer.o replacer.s
gcc -o tester -Wall -pedantic -std=c99 -On tester.c replacer.o
(n=2 or n=3)
blmblm-1-C     (O2)  10.78 seconds
blmblm-1-user  (O2)  35.97 seconds
blmblm-2-C     (O2)   9.60 seconds
blmblm-2-user  (O2)  35.10 seconds
blmblm-3-C     (O2)   7.86 seconds
blmblm-3-user  (O2)   8.58 seconds
blmblm-1-C     (O3)  10.36 seconds
blmblm-1-user  (O3)  37.58 seconds
blmblm-2-C     (O3)   9.45 seconds
blmblm-2-user  (O3)  32.96 seconds
blmblm-3-C     (O3)   7.49 seconds
blmblm-3-user  (O3)   7.72 seconds
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/22/2010 7:43:10 AM
In article <4b822086.563880750@text.giganews.com>,
Richard Harter <cri@tiac.net> wrote:
> On 20 Feb 2010 19:39:17 GMT, blmblm@myrealbox.com
> <blmblm@myrealbox.com> wrote:
>
> >In article <4b803360.437634421@text.giganews.com>,
> >Richard Harter <cri@tiac.net> wrote:
> >> On 20 Feb 2010 16:36:47 GMT, blmblm@myrealbox.com
> >> <blmblm@myrealbox.com> wrote:
> >>
> >> >In article <7u82l6Fvb9U1@mid.individual.net>,
> >> >blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> >> >> In article <4b7eb795.340407953@text.giganews.com>,
> >> >> Richard Harter <cri@tiac.net> wrote:
>
> >
> >Well, now I'm rather looking forward to actually reading your
> >code (and that of others) ....
> >
> >Don't spend more time on this just for the sake of adding to
> >my collection, but if you do, well, yeah, I can generate times
> >so you/we can compare .... (Of course you almost surely can do
> >your own timings to compare the "hot" version to the original,
> >but it might be interesting to compare it to timings for others'
> >code too, and I might be the right person to do that. I just have
> >to track down the latest versions of things others have posted,
> >in the multiple threads, etc.)
>
> I've posted a second version that replaces recursion by
> iteration. It should run slightly faster; like most such
> conversions the original is simpler and cleaner.
>
> I hadn't set out to create a performance string replace routine.
> To get significant improvement the guts of the "find_next"
> function must be replaced with a better algorithm. There are
> four paths I can see:
>
> (1) Hack and stumble about
> (2) Use a simplified version of a standard algorithm
> (3) Bite the bullet and implement a standard algorithm
> (4) Invent something new and wonderful
>
> There are tradeoffs. High performance algorithms tend to have
> initialization costs, hence the simpler algorithms win when the
> input is simple.
>
> In any event if you want to pick up replace2 have at it.
Already done :-) -- I noticed your second version in the process of
trying to collect other people's code, and I just posted a summary
of my attempt to benchmark the various versions.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/22/2010 8:24:42 AM
<blmblm@myrealbox.com> wrote in message
news:7ueqseFfmqU1@mid.individual.net...
> In article <7uahcvFbb8U8@mid.individual.net>,
> blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
>> In article <7u82l6Fvb9U1@mid.individual.net>,
>> blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
>> > In article <4b7eb795.340407953@text.giganews.com>,
>> > Richard Harter <cri@tiac.net> wrote:
>> > > On 19 Feb 2010 14:49:34 GMT, blmblm@myrealbox.com
>> > > <blmblm@myrealbox.com> wrote:
>
> [ snip ]
>
>> What test suite is that .... The one I could be accumulating,
>> I guess, from other people's posts here.
>>
>> > Yes, I saw your code, though I really only looked at the main
>> > program and the comments -- both of which I rather like better
>> > than mine.
>> >
>> > I think I was rather hoping that the others who had posted
>> > solutions might use my code to benchmark, but really, how hard
>> > would it be for *me* to do that .... Maybe I will. "Stay tuned"?
>>
>> For what it's worth, I pulled the replace() function and supporting
>> code out of your solution and connected it with my benchmarking
>> driver. Output below, followed by output of the best-performing
>> of my solutions. Yours is faster, though not spectacularly so.
>> I'll probably do similar experiments with others' solutions as
>> time and interest permit ....
>
> I've benchmarked all the code I could find that's been posted here.
> I believe another poster (Chris M. Thomasson) has posted links to
> code in pastebin, but I didn't track that down and include it.
It's here:
http://clc.pastebin.com/f62504e4c
no6 (2791)
2/22/2010 9:23:51 AM
In article <7ueqseFfmqU1@mid.individual.net>,
blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> In article <7uahcvFbb8U8@mid.individual.net>,
> blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> > In article <7u82l6Fvb9U1@mid.individual.net>,
> > blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> > > In article <4b7eb795.340407953@text.giganews.com>,
> > > Richard Harter <cri@tiac.net> wrote:
> > > > On 19 Feb 2010 14:49:34 GMT, blmblm@myrealbox.com
> > > > <blmblm@myrealbox.com> wrote:
[ snip ]
> I've benchmarked all the code I could find that's been posted here.
> I believe another poster (Chris M. Thomasson) has posted links to
> code in pastebin, but I didn't track that down and include it.
> One thing that occurred to me is that previously I had been
> compiling with "gcc -O" (optimization), and "gcc -O3" (maximum
> optimization) might be better. For reasons I won't summarize
> right now, I also tried "-O2". Results below (total time only).
>
> Presented without analysis for now ....
I missed (at least) one. Ben Bacarisse posted one version of his
code, in
Message-ID: <0.ffa3b609be9d5a7e3382.20100221014206GMT.877hq7e55d.fsf@bsb.me.uk>
which gives results as shown below
He says in that message that which implementation is fastest
depends on input. So my benchmarking here probably doesn't
prove much. It does demonstrate, though, that compiler
optimization level may also affect which is fastest (results
for Harter's code) ....
>
> harter-1 is the recursive solution, from
> Message-ID: <4b7d5b19.251195734@text.giganews.com>
>
> harter-2 is the non-recursive solution, from
> Message-ID: <4b821f00.563490562@text.giganews.com>
>
> willem is from
> Message-ID: <slrnhnj8mb.2238.willem@snail.stack.nl>
>
> nilges is from
> Message-ID: <06317ad6-cd11-48e7-80db-3341ef3a1597@b9g2000pri.googlegroups.com>
>
> io_x is from
> msg-id Message-ID: <4b730781$0$822$4fafbaef@reader5.news.tin.it>
>
> blmblm-$V-$L [1] is from
> Message-ID: <7u7ltqF80cU1@mid.individual.net>
>
> [1] V = version, L = string library (C or user-written)
>
bacarisse (O2) 4.47 seconds
bacarisse (O3) 4.48 seconds
> harter-1 (O2) 5.67 seconds
> harter-1 (O3) 8.48 seconds
> harter-2 (O2) 5.94 seconds
> harter-2 (O3) 6.51 seconds
> io_x (O2) 18.05 seconds [1]
> io_x (O3) **** seconds (segfault) [2]
> nilges (O2) 7.72 seconds
> nilges (O3) 7.69 seconds
> willem (O2) 7.18 seconds
> willem (O3) 7.23 seconds
>
> [2] These results may be misleading -- I wasn't sure how to
> generate an executable from the mix of C and assembly and more or
> less tried things until I got a clean compile/link, with:
>
> nasm -felf -o replacer.o replacer.s
> gcc -o tester -Wall -pedantic -std=c99 -On tester.c replacer.o
> (n=2 or n=3)
>
> blmblm-1-C (O2) 10.78 seconds
> blmblm-1-user (O2) 35.97 seconds
> blmblm-2-C (O2) 9.60 seconds
> blmblm-2-user (O2) 35.10 seconds
> blmblm-3-C (O2) 7.86 seconds
> blmblm-3-user (O2) 8.58 seconds
>
> blmblm-1-C (O3) 10.36 seconds
> blmblm-1-user (O3) 37.58 seconds
> blmblm-2-C (O3) 9.45 seconds
> blmblm-2-user (O3) 32.96 seconds
> blmblm-3-C (O3) 7.49 seconds
> blmblm-3-user (O3) 7.72 seconds
>
> --
> B. L. Massingill
> ObDisclaimer: I don't speak for my employers; they return the favor.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/22/2010 12:56:54 PM
In article <o2sgn.42327$0N3.2138@newsfe09.iad>,
Chris M. Thomasson <no@spam.invalid> wrote:
[ snip ]
> >> For what it's worth, I pulled the replace() function and supporting
> >> code out of your solution and connected it with my benchmarking
> >> driver. Output below, followed by output of the best-performing
> >> of my solutions. Yours is faster, though not spectacularly so.
> >> I'll probably do similar experiments with others' solutions as
> >> time and interest permit ....
> >
> > I've benchmarked all the code I could find that's been posted here.
> > I believe another poster (Chris M. Thomasson) has posted links to
> > code in pastebin, but I didn't track that down and include it.
>
> It's here:
>
> http://clc.pastebin.com/f62504e4c
>
Thanks. I'll add it and post another follow-up ....
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/22/2010 1:32:06 PM
In article <7ufd8mFfl7U1@mid.individual.net>,
blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> In article <7ueqseFfmqU1@mid.individual.net>,
> blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> > In article <7uahcvFbb8U8@mid.individual.net>,
> > blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> > > In article <7u82l6Fvb9U1@mid.individual.net>,
> > > blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> > > > In article <4b7eb795.340407953@text.giganews.com>,
> > > > Richard Harter <cri@tiac.net> wrote:
> > > > > On 19 Feb 2010 14:49:34 GMT, blmblm@myrealbox.com
> > > > > <blmblm@myrealbox.com> wrote:
>
> [ snip ]
>
> > I've benchmarked all the code I could find that's been posted here.
> > I believe another poster (Chris M. Thomasson) has posted links to
> > code in pastebin, but I didn't track that down and include it.
> > One thing that occurred to me is that previously I had been
> > compiling with "gcc -O" (optimization), and "gcc -O3" (maximum
> > optimization) might be better. For reasons I won't summarize
> > right now, I also tried "-O2". Results below (total time only).
> >
> > Presented without analysis for now ....
>
> I missed (at least) one. Ben Bacarisse posted one version of his
> code, in
>
> Message-ID: <0.ffa3b609be9d5a7e3382.20100221014206GMT.877hq7e55d.fsf@bsb.me.uk>
>
> which gives results as shown below
Adding results for Chris Thomasson's code, from pastebin as referenced
in
Message-ID: <o2sgn.42327$0N3.2138@newsfe09.iad>
> He says in that message that which implementation is fastest
> depends on input. So my benchmarking here probably doesn't
> prove much. It does demonstrate, though, that compiler
> optimization level may also affect which is fastest (results
> for Harter's code) ....
>
>
> >
> > harter-1 is the recursive solution, from
> > Message-ID: <4b7d5b19.251195734@text.giganews.com>
> >
> > harter-2 is the non-recursive solution, from
> > Message-ID: <4b821f00.563490562@text.giganews.com>
> >
> > willem is from
> > Message-ID: <slrnhnj8mb.2238.willem@snail.stack.nl>
> >
> > nilges is from
> > Message-ID: <06317ad6-cd11-48e7-80db-3341ef3a1597@b9g2000pri.googlegroups.com>
> >
> > io_x is from
> > msg-id Message-ID: <4b730781$0$822$4fafbaef@reader5.news.tin.it>
> >
> > blmblm-$V-$L [1] is from
> > Message-ID: <7u7ltqF80cU1@mid.individual.net>
> >
> > [1] V = version, L = string library (C or user-written)
> >
thomasson (O2) 4.08 seconds
thomasson (O3) 4.08 seconds
> bacarisse (O2) 4.47 seconds
> bacarisse (O3) 4.48 seconds
>
>
> > harter-1 (O2) 5.67 seconds
> > harter-1 (O3) 8.48 seconds
> > harter-2 (O2) 5.94 seconds
> > harter-2 (O3) 6.51 seconds
> > io_x (O2) 18.05 seconds [1]
> > io_x (O3) **** seconds (segfault) [2]
> > nilges (O2) 7.72 seconds
> > nilges (O3) 7.69 seconds
> > willem (O2) 7.18 seconds
> > willem (O3) 7.23 seconds
> >
> > [2] These results may be misleading -- I wasn't sure how to
> > generate an executable from the mix of C and assembly and more or
> > less tried things until I got a clean compile/link, with:
> >
> > nasm -felf -o replacer.o replacer.s
> > gcc -o tester -Wall -pedantic -std=c99 -On tester.c replacer.o
> > (n=2 or n=3)
> >
> > blmblm-1-C (O2) 10.78 seconds
> > blmblm-1-user (O2) 35.97 seconds
> > blmblm-2-C (O2) 9.60 seconds
> > blmblm-2-user (O2) 35.10 seconds
> > blmblm-3-C (O2) 7.86 seconds
> > blmblm-3-user (O2) 8.58 seconds
> >
> > blmblm-1-C (O3) 10.36 seconds
> > blmblm-1-user (O3) 37.58 seconds
> > blmblm-2-C (O3) 9.45 seconds
> > blmblm-2-user (O3) 32.96 seconds
> > blmblm-3-C (O3) 7.49 seconds
> > blmblm-3-user (O3) 7.72 seconds
> >
> > --
> > B. L. Massingill
> > ObDisclaimer: I don't speak for my employers; they return the favor.
>
>
> --
> B. L. Massingill
> ObDisclaimer: I don't speak for my employers; they return the favor.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/22/2010 1:35:12 PM
blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
<snip>
> I've benchmarked all the code I could find that's been posted here.
For reference, I pulled out your V3 and ran the same sets of tests I
just posted against Richard Harter's second version. Here are the times
I got:
A:
blm_replace(`cat d4004`, "[]", "xx"):
3190500 calls in 3.998s is 1.253µs/call (1.253e-06s/call)
rh_replace2(`cat d4004`, "[]", "xx"):
513700 calls in 4.000s is 7.787µs/call (7.787e-06s/call)
fast_replace(`cat d4004`, "[]", "xx"):
1076300 calls in 4.000s is 3.717µs/call (3.717e-06s/call)
B:
blm_replace(`cat d4004`, "{}", "xx"):
3752900 calls in 3.999s is 1.066µs/call (1.066e-06s/call)
rh_replace2(`cat d4004`, "{}", "xx"):
519000 calls in 4.000s is 7.708µs/call (7.708e-06s/call)
fast_replace(`cat d4004`, "{}", "xx"):
1044200 calls in 4.000s is 3.83µs/call (3.83e-06s/call)
C:
blm_replace(`cat wap.txt`, "and", "xx"):
200 calls in 5.044s is 2.522e+04µs/call (0.02522s/call)
rh_replace2(`cat wap.txt`, "and", "xx"):
400 calls in 4.125s is 1.031e+04µs/call (0.01031s/call)
fast_replace(`cat wap.txt`, "and", "xx"):
500 calls in 4.254s is 8508µs/call (0.008508s/call)
D:
blm_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
6555300 calls in 3.999s is 610.1ns/call (6.101e-07s/call)
rh_replace2("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
25659800 calls in 4.000s is 155.9ns/call (1.559e-07s/call)
fast_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
18203800 calls in 4.000s is 219.7ns/call (2.197e-07s/call)
E:
blm_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
34562700 calls in 3.998s is 115.7ns/call (1.157e-07s/call)
rh_replace2("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
35579200 calls in 4.000s is 112.4ns/call (1.124e-07s/call)
fast_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
44387600 calls in 4.000s is 90.11ns/call (9.011e-08s/call)
A: 4K text with two short replacements.
B: 4K text with no replacements.
C: Large text with lots of replacements.
D: Short text with several short replacements.
E: Short text with no replacements.
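The harness that produced these figures was not posted, so the following is only a sketch of how such "N calls in T seconds" numbers could be gathered. The batch size of 100 is a guess based on the call counts all being multiples of 100; the names replace_fn, copy_only and time_replace are hypothetical, and clock() measures CPU time, which may not be what was actually measured.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by the replace functions under test (assumed). */
typedef char *replace_fn(char *src, char *match, char *replacement);

/* Trivial stand-in so the sketch builds on its own: returns a copy of src. */
char *copy_only(char *src, char *match, char *replacement)
{
    (void)match;
    (void)replacement;
    char *r = malloc(strlen(src) + 1);
    if (r)
        strcpy(r, src);
    return r;
}

/* Call fn in batches of 100 until at least `seconds` of CPU time have
   elapsed; store the call count in *calls and return seconds per call. */
double time_replace(replace_fn *fn, char *src, char *match,
                    char *replacement, double seconds, unsigned long *calls)
{
    unsigned long n = 0;
    clock_t start = clock();
    double elapsed;
    do {
        for (int i = 0; i < 100; ++i)
            free(fn(src, match, replacement));
        n += 100;
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < seconds);
    if (calls)
        *calls = n;
    return elapsed / n;
}
```

Calling time_replace(fast_replace, text, "[]", "xx", 4.0, &calls) and printing calls together with the returned value would give one line of the kind shown above.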
If you want to include my current favourite in your tests it is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Bigger is better here: */
#define AUTO_SIZE 20
char *fast_replace(char *src, char *match, char *replacement)
{
    size_t mlen = strlen(match), matches = 0, asize = AUTO_SIZE;
    /* Match positions go in small_list first; the heap is used only
       when more than AUTO_SIZE matches are found. */
    const char *small_list[AUTO_SIZE], **mp = small_list, **op = 0;
    const char *anchor = src;
    do {
        /* Skip ahead to the next candidate first character. */
        while (*anchor && *anchor != *match)
            ++anchor;
        if (*anchor) {
            if (strncmp(anchor + 1, match + 1, mlen - 1) == 0) {
                if (matches >= asize) {
                    const char **np = realloc(op, 2 * asize * sizeof *np);
                    if (!np) {
                        fprintf(stderr, "Out of memory\n");
                        free(op);
                        return 0;
                    }
                    if (!op)
                        memcpy(np, mp, sizeof small_list);
                    mp = op = np;
                    asize *= 2;
                }
                mp[matches++] = anchor;
                anchor += mlen;   /* resume after the match: no overlaps */
            }
            else ++anchor;
        }
        else break;
    } while (*anchor);
    /* anchor now points at the terminating null, so anchor - src is the
       length of src. */
    size_t rlen = matches ? strlen(replacement) : 0;
    char *result = malloc(anchor - src - matches*mlen + matches*rlen + 1);
    if (result) {
        char *dst = result;
        const char *s = src;
        for (size_t i = 0; i < matches; ++i) {
            memcpy(dst, s, mp[i] - s);
            memcpy(dst += mp[i] - s, replacement, rlen);
            dst += rlen;
            s = mp[i] + mlen;
        }
        strcpy(dst, s);
    }
    free(op);
    return result;
}
<snip>
--
Ben.
ben.usenet (6516)
2/22/2010 3:19:41 PM
blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
> In article <7ufd8mFfl7U1@mid.individual.net>,
> blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
<snip>
>> I missed (at least) one. Ben Bacarisse posted one version of his
>> code, in
>>
>> Message-ID: <0.ffa3b609be9d5a7e3382.20100221014206GMT.877hq7e55d.fsf@bsb.me.uk>
>>
>> which gives results as shown below
Thanks. Useful to compare our results.
> Adding results for Chris Thomasson's code, from pastebin as referenced
> in
>
> Message-ID: <o2sgn.42327$0N3.2138@newsfe09.iad>
I've done the same and I get very different results for Chris's code
and for yours. Yours is the fastest on long strings, presumably
because you use strstr. Chris's is next fastest on long strings.
>> He says in that message that which implementation is fastest
>> depends on input.
Yes, though it is reasonably predictable. For example, Chris's code
is good for long strings and slow for short ones.
>> So my benchmarking here probably doesn't
>> prove much. It does demonstrate, though, that compiler
>> optimization level may also affect which is fastest (results
>> for Harter's code) ....
I used -O3 throughout. I'll post the -O2 results as well, but in my
case I get faster times in all cases using -O3 (gcc 4.4.1).
<snip>
I assume these results are for 4004 byte long strings with 2 short
replacements. If so, here are my comparative timings:
> thomasson (O2) 4.08 seconds
cmt_replace(`cat d4004`, "[]", "xx"):
4162900 calls in 4.000s is 960.8ns/call (9.608e-07s/call)
> thomasson (O3) 4.08 seconds
cmt_replace(`cat d4004`, "[]", "xx"):
4202100 calls in 4.000s is 951.9ns/call (9.519e-07s/call)
>
>> bacarisse (O2) 4.47 seconds
fast_replace(`cat d4004`, "[]", "xx"):
1133100 calls in 4.000s is 3.53µs/call (3.53e-06s/call)
>> bacarisse (O3) 4.48 seconds
fast_replace(`cat d4004`, "[]", "xx"):
1143900 calls in 4.000s is 3.497µs/call (3.497e-06s/call)
You get little difference between Chris's code and mine, but I get a
huge factor. Could just be a difference in hardware or cache/memory
configuration.
>> > harter-1 (O2) 5.67 seconds
>> > harter-1 (O3) 8.48 seconds
>> > harter-2 (O2) 5.94 seconds
rh_replace2(`cat d4004`, "[]", "xx"):
515300 calls in 4.000s is 7.763µs/call (7.763e-06s/call)
>> > harter-2 (O3) 6.51 seconds
rh_replace2(`cat d4004`, "[]", "xx"):
517900 calls in 4.000s is 7.723µs/call (7.723e-06s/call)
>> > io_x (O2) 18.05 seconds [1]
>> > io_x (O3) **** seconds (segfault) [2]
>> > nilges (O2) 7.72 seconds
en_replace(`cat d4004`, "[]", "xx"):
499900 calls in 4.000s is 8.002µs/call (8.002e-06s/call)
>> > nilges (O3) 7.69 seconds
en_replace(`cat d4004`, "[]", "xx"):
486300 calls in 4.000s is 8.226µs/call (8.226e-06s/call)
>> > willem (O2) 7.18 seconds
w_replace(`cat d4004`, "[]", "xx"):
492100 calls in 4.001s is 8.13µs/call (8.13e-06s/call)
>> > willem (O3) 7.23 seconds
w_replace(`cat d4004`, "[]", "xx"):
493100 calls in 4.000s is 8.112µs/call (8.112e-06s/call)
>> > [2] These results may be misleading -- I wasn't sure how to
>> > generate an executable from the mix of C and assembly and more or
>> > less tried things until I got a clean compile/link, with:
>> >
>> > nasm -felf -o replacer.o replacer.s
>> > gcc -o tester -Wall -pedantic -std=c99 -On tester.c replacer.o
>> > (n=2 or n=3)
>> >
>> > blmblm-1-C (O2) 10.78 seconds
>> > blmblm-1-user (O2) 35.97 seconds
>> > blmblm-2-C (O2) 9.60 seconds
>> > blmblm-2-user (O2) 35.10 seconds
>> > blmblm-3-C (O2) 7.86 seconds
blm_replace(`cat d4004`, "[]", "xx"):
3162500 calls in 3.998s is 1.264µs/call (1.264e-06s/call)
>> > blmblm-3-user (O2) 8.58 seconds
>> >
>> > blmblm-1-C (O3) 10.36 seconds
>> > blmblm-1-user (O3) 37.58 seconds
>> > blmblm-2-C (O3) 9.45 seconds
>> > blmblm-2-user (O3) 32.96 seconds
>> > blmblm-3-C (O3) 7.49 seconds
blm_replace(`cat d4004`, "[]", "xx"):
3212200 calls in 3.998s is 1.245µs/call (1.245e-06s/call)
Again a difference. Your code is super fast, at least on my hardware.
>> > blmblm-3-user (O3) 7.72 seconds
Full results using -O2 this time and the same set of tests to see
variation in times:
blm_replace(`cat d4004`, "[]", "xx"):
3162500 calls in 3.998s is 1.264µs/call (1.264e-06s/call)
rh_replace2(`cat d4004`, "[]", "xx"):
516300 calls in 4.000s is 7.748µs/call (7.748e-06s/call)
cmt_replace(`cat d4004`, "[]", "xx"):
4181100 calls in 4.000s is 956.6ns/call (9.566e-07s/call)
w_replace(`cat d4004`, "[]", "xx"):
492100 calls in 4.001s is 8.13µs/call (8.13e-06s/call)
en_replace(`cat d4004`, "[]", "xx"):
499900 calls in 4.000s is 8.002µs/call (8.002e-06s/call)
fast_replace(`cat d4004`, "[]", "xx"):
1140500 calls in 4.000s is 3.507µs/call (3.507e-06s/call)
blm_replace(`cat d4004`, "{}", "xx"):
3774700 calls in 3.999s is 1.06µs/call (1.06e-06s/call)
rh_replace2(`cat d4004`, "{}", "xx"):
518400 calls in 4.000s is 7.716µs/call (7.716e-06s/call)
cmt_replace(`cat d4004`, "{}", "xx"):
4253000 calls in 4.000s is 940.5ns/call (9.405e-07s/call)
w_replace(`cat d4004`, "{}", "xx"):
495700 calls in 4.000s is 8.07µs/call (8.07e-06s/call)
en_replace(`cat d4004`, "{}", "xx"):
511000 calls in 4.000s is 7.829µs/call (7.829e-06s/call)
fast_replace(`cat d4004`, "{}", "xx"):
1040400 calls in 4.000s is 3.844µs/call (3.844e-06s/call)
blm_replace(`cat wap.txt`, "and", "xx"):
200 calls in 5.107s is 2.553e+04µs/call (0.02553s/call)
rh_replace2(`cat wap.txt`, "and", "xx"):
400 calls in 4.125s is 1.031e+04µs/call (0.01031s/call)
cmt_replace(`cat wap.txt`, "and", "xx"):
200 calls in 4.542s is 2.271e+04µs/call (0.02271s/call)
w_replace(`cat wap.txt`, "and", "xx"):
400 calls in 4.273s is 1.068e+04µs/call (0.01068s/call)
en_replace(`cat wap.txt`, "and", "xx"):
400 calls in 4.915s is 1.229e+04µs/call (0.01229s/call)
fast_replace(`cat wap.txt`, "and", "xx"):
400 calls in 3.420s is 8550µs/call (0.00855s/call)
blm_replace(`cat wap.txt`, "ZZZ", "xx"):
200 calls in 4.479s is 2.24e+04µs/call (0.0224s/call)
rh_replace2(`cat wap.txt`, "ZZZ", "xx"):
600 calls in 3.821s is 6369µs/call (0.006369s/call)
cmt_replace(`cat wap.txt`, "ZZZ", "xx"):
200 calls in 4.383s is 2.192e+04µs/call (0.02192s/call)
w_replace(`cat wap.txt`, "ZZZ", "xx"):
500 calls in 3.379s is 6758µs/call (0.006758s/call)
en_replace(`cat wap.txt`, "ZZZ", "xx"):
600 calls in 3.945s is 6576µs/call (0.006576s/call)
fast_replace(`cat wap.txt`, "ZZZ", "xx"):
1000 calls in 4.105s is 4105µs/call (0.004105s/call)
blm_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
6237100 calls in 3.999s is 641.2ns/call (6.412e-07s/call)
rh_replace2("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
25080600 calls in 4.000s is 159.5ns/call (1.595e-07s/call)
cmt_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
7199800 calls in 4.000s is 555.6ns/call (5.556e-07s/call)
w_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
21167000 calls in 4.000s is 189ns/call (1.89e-07s/call)
en_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
10014700 calls in 4.000s is 399.4ns/call (3.994e-07s/call)
fast_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
18143800 calls in 4.000s is 220.5ns/call (2.205e-07s/call)
blm_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
33012700 calls in 3.999s is 121.1ns/call (1.211e-07s/call)
rh_replace2("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
35563900 calls in 4.000s is 112.5ns/call (1.125e-07s/call)
cmt_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
18117800 calls in 4.000s is 220.8ns/call (2.208e-07s/call)
w_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
40621400 calls in 4.000s is 98.47ns/call (9.847e-08s/call)
en_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
26174300 calls in 4.000s is 152.8ns/call (1.528e-07s/call)
fast_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
41656400 calls in 4.000s is 96.02ns/call (9.602e-08s/call)
--
Ben.
ben.usenet (6516)
2/22/2010 6:08:27 PM
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
>
> I've done the same and I get very different results for Chris's code
> and for yours. Yours is the fastest on long strings, presumably
> because you use strstr. Chris's is next fastest on long strings.
>
>>> He says in that message that which implementation is fastest
>>> depends on input.
>
> Yes, though it is reasonably predictable. For example, Chris's code
> is good for long strings and slow for short ones.
>
> You get little difference between Chris's code and mine, but I get a
> huge factor. Could just be a difference in hardware or cache/memory
> configuration.
You're both using gcc, but are you using the same library? Someone
posted earlier about an implementation of strstr that uses Boyer-Moore
for long strings. The BSD implementation I found on the web uses strchr
to find the first character of the match, then strncmp to see if they
are the same. So if one of you has one of these and the other the other
....
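For concreteness, here is a minimal sketch of the strchr-then-strncmp approach described above. It is not the actual BSD source, and simple_strstr is a hypothetical name; it only illustrates why such an implementation does a plain character-by-character scan, with no skipping of the kind Boyer-Moore provides.

```c
#include <string.h>

/* Hypothetical name; a sketch of the strchr + strncmp idea, not the
   real library code. */
char *simple_strstr(const char *haystack, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0)
        return (char *)haystack;   /* an empty needle matches at the start */
    /* Find each occurrence of the first character, then compare the rest. */
    for (const char *p = strchr(haystack, needle[0]); p != NULL;
         p = strchr(p + 1, needle[0])) {
        if (strncmp(p, needle, nlen) == 0)
            return (char *)p;
    }
    return NULL;
}
```

On text where the first character of the needle is common, this calls strncmp at almost every position, which may be one reason timings differ so much between libraries.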
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk
3-nospam (285)
2/22/2010 10:13:06 PM
Nick <3-nospam@temporary-address.org.uk> writes:
> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>
>> blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
>>
>> I've done the same and I get very different results for Chris's code
>> and for yours. Yours is the fastest on long strings, presumably
>> because you use strstr. Chris's is next fastest on long strings.
>>
>>>> He says in that message that which implementation is fastest
>>>> depends on input.
>>
>> Yes, though it is reasonably predictable. For example, Chris's code
>> is good for long strings and slow for short ones.
>>
>> You get little difference between Chris's code and mine, but I get a
>> huge factor. Could just be a difference in hardware or cache/memory
>> configuration.
>
> You're both using gcc, but are you using the same library? Someone
> posted earlier about an implementation of strstr that uses Boyer-Moore
> for long strings. The BSD implementation I found on the web uses strchr
> to find the first character of the match, then strncmp to see if they
> are the same. So if one of you has one of these and the other the other
> ...
Good point. It could certainly be a library issue. My library's
strstr does not use Boyer-Moore, but it does use an enhanced search
algorithm that really pays off for long strings.
--
Ben.
ben.usenet (6516)
2/23/2010 12:02:07 AM
"Ben Bacarisse" <ben.usenet@bsb.me.uk> wrote in message
> I assume these results are for 4004 byte long strings with 2 short
> replacements. If so, here are my comparative timings:
>
>> thomasson (O2) 4.08 seconds
> cmt_replace(`cat d4004`, "[]", "xx"):
> 4162900 calls in 4.000s is 960.8ns/call (9.608e-07s/call)
>
>> thomasson (O3) 4.08 seconds
> cmt_replace(`cat d4004`, "[]", "xx"):
> 4202100 calls in 4.000s is 951.9ns/call (9.519e-07s/call)
Etc.
I haven't looked at the code involved (I've lost my way around the thread
and the routines anyway seem to be getting complex). But are you replacing
the 2-char "[]" string with a 2-char "xx" string?
That would suggest a simple optimisation (since the result must be the same
length as the original string?) so wouldn't a replacement string that is
longer and/or shorter than the old one be better?
(So that you don't compare an implementation that does optimise, with one
that doesn't.)
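To make the concern concrete, this is roughly what such a special case could look like. It is only a sketch of the optimisation being asked about, not something taken from any of the posted routines; replace_same_length is a hypothetical name, and unlike the functions in this thread it overwrites a writable buffer in place instead of returning a new string.

```c
#include <string.h>

/* Hypothetical helper: overwrite each non-overlapping occurrence of
   `match` in the writable buffer `buf` with `replacement`, which must
   have the same length.  Returns the number of replacements, or -1 if
   the lengths differ or match is empty. */
long replace_same_length(char *buf, const char *match, const char *replacement)
{
    size_t mlen = strlen(match);
    long count = 0;
    if (mlen == 0 || strlen(replacement) != mlen)
        return -1;
    /* Resume after each hit so the replaced text is never rescanned. */
    for (char *p = strstr(buf, match); p != NULL; p = strstr(p + mlen, match)) {
        memcpy(p, replacement, mlen);
        ++count;
    }
    return count;
}
```

No allocation or length arithmetic is needed in this case, which is why a benchmark using only equal-length strings could favour an implementation that detects it.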
--
Bartc
bartc (783)
2/23/2010 12:45:55 AM
"bartc" <bartc@freeuk.com> writes:
> "Ben Bacarisse" <ben.usenet@bsb.me.uk> wrote in message
>> I assume these results are for 4004 byte long strings with 2 short
>> replacements. If so, here are my comparative timings:
>>
>>> thomasson (O2) 4.08 seconds
>> cmt_replace(`cat d4004`, "[]", "xx"):
>> 4162900 calls in 4.000s is 960.8ns/call (9.608e-07s/call)
>>
>>> thomasson (O3) 4.08 seconds
>> cmt_replace(`cat d4004`, "[]", "xx"):
>> 4202100 calls in 4.000s is 951.9ns/call (9.519e-07s/call)
>
> Etc.
>
> I haven't looked at the code involved (I've lost my way around the
> thread and the routines anyway seem to be getting complex). But are
> you replacing the 2-char "[]" string with a 2-char "xx" string?
Yes.
> That would suggest a simple optimisation (since the result must be the
> same length as the original string?) so wouldn't a replacement string
> that is longer and/or shorter than the old one be better?
>
> (So that you don't compare an implementation that does optimise, with
> one that doesn't.)
That particular test is not mine. I was duplicating B L Massingill's
data so I had to use the same pattern, but I did check that this case
is not treated in any special way.
--
Ben.
ben.usenet (6516)
2/23/2010 1:53:54 AM
In article <2919f93f-7d2e-4512-979c-d09e67753f9a@u19g2000prh.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
> On Feb 21, 1:57 am, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > In article <65456f81-71e0-4ddb-8da2-04c67e9e2...@y7g2000prc.googlegroups.com>,
> >
> >
> >
> >
> >
> > spinoza1111 <spinoza1...@yahoo.com> wrote:
> > > On Feb 20, 12:55 am, Seebs <usenet-nos...@seebs.net> wrote:
> > > > On 2010-02-19, blmblm myrealbox.com <blm...@myrealbox.com> wrote:
> >
> > > > >> An interesting factoid about this discussion is that no American or
> > > > >> British posters, save one, have solved the problem I set (write
> > > > >> replace without using string.H).
> > > > > Oh, it's string.H we're to avoid? is it okay to use string.h?
> > > > > Yeah, yeah, nitpick ....
> >
> > > > It's also a silly challenge. If I were going to do that, my first pass
> > > > would just be to put "my" in front of the str* functions I use, and
> > > > define them.
> >
> > > That sounds like (1) unethical behavior and (2) to be expected from
> > > you.
> >
> > How is this unethical? In essence what he's doing is implementing
> > a standard interface, but in a way that can't collide with an
> > existing implementation.
>
> I said last week that my replace function integrates two different
> things.
How does that address my question (about the claim that Seebs's
proposed approach to your challenge was "unethical behavior")?
[ snip ]
> > > > But why would I do that? How about we set a new challenge, which is to
> > > > do matrix multiplication without using the '*' operator. Or, better yet,
> > > > how about we iterate through an array without using a for loop, and write
> > > > our own routines to interpolate numbers into textual strings, so we don't
> > > > have to rely on printf()?
> >
> > > Those of us who can (who like Schildt have written compilers) do this
> > > for fun, for the same reason classical pianists play Czerny finger
> > > exercises, and Bach wrote the Art of Fugue. There are some of us who
> > > don't advance our careers by destroying reputations and buying our way
> > > onto standards boards without any academic qualifications.
> >
> > > Furthermore, if I were doing matrix multiplication times a power of
> > > two, I wouldn't use the * operator. Do you even know what operator I
> > > would use?
> >
> > "Matrix multiplication times a power of two?" Could you explain what
> > you mean by that? In my usage "matrix multiplication" is a binary
> > operation on matrices, so I don't understand how one of the operands
> > can be a power of two. ?
>
> It contains a power of two in each element in the scenario where you
> wouldn't use * you would use shift. Look, I was just having a little
> fun with Seebie.
And giving the impression that perhaps you weren't quite sure what
matrix multiplication was. Well, whatever.
I'm having a little trouble thinking of situations in which I would
use the shift operator when the desired operation is conceptually a
multiplication (as opposed to a bit shift); to me this seems like
a micro-optimization that would make the code more difficult to
understand without necessarily making it faster. *Maybe* if I had
evidence that the speed of this particular operation was critical,
and that the compiler was not already producing shift instructions
(as, experiment suggests, happens at least sometimes) ....
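A small sketch of the equivalence in question, restricted to unsigned operands, where both forms are fully defined. The function names are hypothetical. For unsigned x, multiplying by 16 and shifting left by 4 give the same value (both wrap modulo 2 to the width of the type), so a compiler is free to emit the same instruction for either.

```c
/* Written as the multiplication it conceptually is. */
unsigned times16_mul(unsigned x)
{
    return x * 16u;
}

/* Hand-written shift; same result for every unsigned x. */
unsigned times16_shift(unsigned x)
{
    return x << 4;
}
```

Comparing the assembly produced for the two functions (for example with gcc -O2 -S) is one way to repeat the experiment. For signed operands the two forms are not interchangeable in general, since left-shifting a negative value is undefined in C.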
I'm having even more trouble thinking of how one might express
multiplication by a matrix all of whose elements are powers of two
(would the entries be the exponents? -- i.e., if element a[0][0]
was meant to represent 16, would one encode that as 16, or as 4?),
but -- whatever.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/23/2010 12:25:16 PM
In article <0.78164a34b1aeb9873866.20100222151941GMT.87bpfhcn76.fsf@bsb.me.uk>,
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
> <snip>
> > I've benchmarked all the code I could find that's been posted here.
>
> For reference, I pulled out your V3 and ran the same sets of tests I
> just posted against Richard Harter's second version. Here are the times
> I got:
>
(Very interesting, and I'm trying to keep track of all of these
results so I can look over them when I have the time/energy to try
to understand them. :-)? )
[ snip ]
> If you want to include my current favourite in your tests it is:
>
For the record -- yes, this is the version of your code I've been
using.
[ snip ]
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/23/2010 12:28:49 PM
In article <87hbp86hsd.fsf@temporary-address.org.uk>,
Nick <3-nospam@temporary-address.org.uk> wrote:
> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>
> > blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
> >
> > I've done the same and I get very different results for Chris's code
> > and for yours. Yours is the fastest on long strings, presumably
> > because you use strstr. Chris's is next fastest on long strings.
> >
> >>> He says in that message that which implementation is fastest
> >>> depends on input.
> >
> > Yes, though it is reasonably predictable. For example, Chris's code
> > is good for long strings and slow for short ones.
> >
> > You get little difference between Chris's code and mine, but I get a
> > huge factor. Could just be a difference in hardware or cache/memory
> > configuration.
Interesting!
> You're both using gcc, but are you using the same library? Someone
> posted earlier about an implementation of strstr that uses Boyer-Moore
> for long strings. The BSD implementation I found on the web uses strchr
> to find the first character of the match, then strncmp to see if they
> are the same. So if one of you has one of these and the other the other
> ...
Times I've posted thus far have been collected on an oldish and
slowish machine, using gcc 4.0.1 and glibc 2.3.5. Details of
hardware configuration on request. I'll try my tests again on
a newer/faster machine, maybe, and report back.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/23/2010 12:32:08 PM
In article <0.0ddf9db924489b8a3417.20100222180827GMT.87wry5b0tg.fsf@bsb.me.uk>,
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
>
> > In article <7ufd8mFfl7U1@mid.individual.net>,
> > blmblm@myrealbox.com <blmblm@myrealbox.com> wrote:
> <snip>
> >> I missed (at least) one. Ben Bacarisse posted one version of his
> >> code, in
> >>
> >> Message-ID: <0.ffa3b609be9d5a7e3382.20100221014206GMT.877hq7e55d.fsf@bsb.me.uk>
> >>
> >> which gives results as shown below
>
> Thanks. Useful to compare our results.
Indeed (though I'm having a little trouble keeping up, and understanding
what they mean .... ) .
[ snip ]
> >> So my benchmarking here probably doesn't
> >> prove much. It does demonstrate, though, that compiler
> >> optimization level may also affect which is fastest (results
> >> for Harter's code) ....
>
> I used -O3 throughout. I'll post the -O2 results as well, but in my
> case I get faster times in all cases using -O3 (gcc 4.4.1).
My results posted thus far were with gcc 4.0.1 (and glibc 2.3.5).
When I repeat the experiment, on a newer/faster system, with
gcc 4.4.1 (and glibc 2.10.1), I also find that -O3 is *almost*
always faster (and the one exception -- perhaps I need to rerun
the test, in case there was something else going on on the machine
that skewed results). Results below.
> <snip>
> I assume these results are for 4004 byte long strings with 2 short
> replacements.
No, this is with my full test suite (such as it is):
4004-byte input, 2 short replacements (2/2)
4020-byte input, 10 short replacements (2/2)
4100-byte input, 50 short replacements (2/2)
4500-byte input, 50 slightly-longer replacements (10/10)
(Differences in input length are because I was lazy in coding
the generate-test-data function -- its inputs are the length of
the unchanged text, the length of the old/new text, and how many
times to repeat .... )
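The generator itself was not posted, so the following is only a guess at its shape: a hypothetical make_test_input that repeats a run of filler characters followed by the old text. Under that assumption the quoted sizes fall out as, for example, 2 x (2000 + 2) = 4004, 10 x (400 + 2) = 4020, 50 x (80 + 2) = 4100 and 50 x (80 + 10) = 4500, though other parameter choices would give the same totals.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical generator: `repeats` copies of (unchanged_len filler
   characters followed by old_text).  The caller frees the result. */
char *make_test_input(size_t unchanged_len, const char *old_text, size_t repeats)
{
    size_t old_len = strlen(old_text);
    size_t total = repeats * (unchanged_len + old_len);
    char *buf = malloc(total + 1);
    if (!buf)
        return NULL;
    char *p = buf;
    for (size_t i = 0; i < repeats; ++i) {
        memset(p, 'a', unchanged_len);   /* filler assumed not to occur in old_text */
        p += unchanged_len;
        memcpy(p, old_text, old_len);
        p += old_len;
    }
    *p = '\0';
    return buf;
}
```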
I've been posting total times only to save space and haven't been
looking carefully at whether code that's faster overall is also
faster for each individual test.
Results with gcc 4.4.1, as described above .... :
bacarisse (O2) 1.74 seconds
bacarisse (O3) 1.75 seconds
blmblm (O2) 2.52 seconds
blmblm (O3) 2.48 seconds
harter-1 (O2) 2.58 seconds
harter-1 (O3) 2.49 seconds
harter-2 (O2) 2.27 seconds
harter-2 (O3) 2.24 seconds
io_x (O2) 9.82 seconds
nilges (O2) 2.36 seconds
nilges (O3) 2.35 seconds
thomasson (O2) 1.69 seconds
thomasson (O3) 1.68 seconds
willem (O2) 2.77 seconds
willem (O3) 4.16 seconds [ can this be right?! ]
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/23/2010 1:06:33 PM
In article <d520a640-1606-407e-9b7f-b9c75f4d5159@s36g2000prf.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
> On Feb 15, 8:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > "Stefan Ram" <r...@zedat.fu-berlin.de> wrote in message
> >
> > news:rand-20100215000605@ram.dialup.fu-berlin.de...
[ snip ]
> And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
> STRINGS. If you use it to construct a table of replace points you're
> gonna have an interesting bug-o-rama:
>
> replace("banana", "ana", "ono")
>
> IF you restart one position after the find point, and not at its end.
Why would you do that, though? only if you *wanted* to detect
overlapping strings, and -- if you did detect them, what would
you do with them? I can't think of any sensible definition of
"replace" that does anything with overlapping strings [*], so
when I wrote my first solution to this problem, I of course used
strstr and of course started scanning for the next match after
the end of the previous match.
[*] Chris Thomasson's reply to your post points out the ambiguities.
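The difference between the two restart points can be shown with a short sketch. count_matches is a hypothetical helper, not code from any posted solution; it counts strstr hits and lets the caller choose how far past the start of each hit the search resumes.

```c
#include <string.h>

/* Count matches of `match` in `s`, resuming the search `step` characters
   after the start of each hit.  step == strlen(match) skips past the hit
   (non-overlapping); step == 1 also finds overlapping hits.  Returns 0
   for an empty match or a step outside 1..strlen(match). */
size_t count_matches(const char *s, const char *match, size_t step)
{
    size_t mlen = strlen(match);
    size_t n = 0;
    if (mlen == 0 || step == 0 || step > mlen)
        return 0;
    for (const char *p = strstr(s, match); p != NULL; p = strstr(p + step, match))
        ++n;
    return n;
}
```

For "banana" and "ana", resuming at the end of the hit (step 3) finds one match, while resuming one position after the find point (step 1) finds two, which is the overlapping case described in the quoted text.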
> Moral: don't let the library do your thinking for you.
Mostly I'm replying to this rather old post, though, because it
seems as good a place as any to attempt a more-or-less formal
specification of the problem, which I'm not sure we have, and
which might be interesting. (Apologies if I missed one somewhere.)
Here's my proposed specification, in which "is not a substring of"
and "concat" have what I hope are obvious meanings, and names
beginning s_ denote strings:
replace(s_input, s_old, s_new) yields
    if s_old is not a substring of s_input
        s_input
    else
        concat(s_input_prefix, s_new, replace(s_input_tail, s_old, s_new))
        where s_input_prefix and s_input_tail are such that
            s_input = concat(s_input_prefix, s_old, s_input_tail)
        and
            s_old is not a substring of s_input_prefix
Somewhat more formal than the "scans left to right and ignores
overlapping strings", though whether it's actually clearer might
be a matter of opinion.
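One way to check that the specification is implementable as written is to transcribe it fairly directly. The sketch below does that iteratively, using strstr and taking the leftmost match at each step; spec_replace is a hypothetical name, and treating an empty s_old as "no match" is an assumption, since the specification does not say what should happen then.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string (caller frees), or NULL if malloc
   fails.  An empty s_old is treated as never matching (an assumption). */
char *spec_replace(const char *s_input, const char *s_old, const char *s_new)
{
    size_t old_len = strlen(s_old), new_len = strlen(s_new);
    size_t matches = 0;
    const char *p;

    /* First pass: count leftmost, non-overlapping matches. */
    if (old_len > 0)
        for (p = s_input; (p = strstr(p, s_old)) != NULL; p += old_len)
            ++matches;

    char *result = malloc(strlen(s_input) - matches * old_len
                          + matches * new_len + 1);
    if (!result)
        return NULL;

    /* Second pass: copy s_input_prefix, then s_new, then continue with
       s_input_tail, exactly as in the recursive definition. */
    char *dst = result;
    const char *tail = s_input;
    for (size_t i = 0; i < matches; ++i) {
        p = strstr(tail, s_old);
        memcpy(dst, tail, (size_t)(p - tail));
        dst += p - tail;
        memcpy(dst, s_new, new_len);
        dst += new_len;
        tail = p + old_len;
    }
    strcpy(dst, tail);
    return result;
}
```

With this reading, replace("banana", "ana", "ono") yields "bonona": the first "ana" is replaced and the remaining tail "na" contains no further match.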
And then perhaps we could replace the "does not use string.h"
constraint with something more meaningful [*], though I'm not
sure what.
[*] My objection to this constraint is that any minimally competent
programmer should be able to write functions that implement the
same API, so just avoiding use of the library functions doesn't
seem to me to make the problem more interesting.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/23/2010 2:38:32 PM
On Feb 23, 8:25 pm, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> In article <2919f93f-7d2e-4512-979c-d09e67753...@u19g2000prh.googlegroups.com>,
>
> spinoza1111 <spinoza1...@yahoo.com> wrote:
> > On Feb 21, 1:57 am, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > > In article <65456f81-71e0-4ddb-8da2-04c67e9e2...@y7g2000prc.googlegroups.com>,
>
> > > spinoza1111 <spinoza1...@yahoo.com> wrote:
> > > > On Feb 20, 12:55 am, Seebs <usenet-nos...@seebs.net> wrote:
> > > > > On 2010-02-19, blmblm myrealbox.com <blm...@myrealbox.com> wrote:
>
> > > > > >> An interesting factoid about this discussion is that no American or
> > > > > >> British posters, save one, have solved the problem I set (write
> > > > > >> replace without using string.H).
> > > > > > Oh, it's string.H we're to avoid? is it okay to use string.h?
> > > > > > Yeah, yeah, nitpick ....
>
> > > > > It's also a silly challenge. If I were going to do that, my first pass
> > > > > would just be to put "my" in front of the str* functions I use, and
> > > > > define them.
>
> > > > That sounds like (1) unethical behavior and (2) to be expected from
> > > > you.
>
> > > How is this unethical? In essence what he's doing is implementing
> > > a standard interface, but in a way that can't collide with an
> > > existing implementation.
>
> > I said last week that my replace function integrates two different
> > things.
>
> How does that address my question (about the claim that Seebs's
> proposed approach to your challenge was "unethical behavior")?
Because he hasn't once shown us he can code to the problem statement
and not use string.h. Whilst calling people undeserved names.
>
> [ snip ]
>
> > > > > But why would I do that? How about we set a new challenge, which is to
> > > > > do matrix multiplication without using the '*' operator. Or, better yet,
> > > > > how about we iterate through an array without using a for loop, and write
> > > > > our own routines to interpolate numbers into textual strings, so we don't
> > > > > have to rely on printf()?
>
> > > > Those of us who can (who like Schildt have written compilers) do this
> > > > for fun, for the same reason classical pianists play Czerny finger
> > > > exercises, and Bach wrote the Art of Fugue. There are some of us who
> > > > don't advance our careers by destroying reputations and buying our way
> > > > onto standards boards without any academic qualifications.
>
> > > > Furthermore, if I were doing matrix multiplication times a power of
> > > > two, I wouldn't use the * operator. Do you even know what operator I
> > > > would use?
>
> > > "Matrix multiplication times a power of two?" Could you explain what
> > > you mean by that? In my usage "matrix multiplication" is a binary
> > > operation on matrices, so I don't understand how one of the operands
> > > can be a power of two. ?
>
> > It contains a power of two in each element in the scenario where you
> > wouldn't use * you would use shift. Look, I was just having a little
> > fun with Seebie.
>
> And giving the impression that perhaps you weren't quite sure what
> matrix multiplication was. Well, whatever.
No, I think I do. A matrix can be multiplied by a scalar, a vector, or
a matrix. Basic finite math. It's been years but I was awake, and I
respected my professors.
>
> I'm having a little trouble thinking of situations in which I would
> use the shift operator when the desired operation is conceptually a
> multiplication (as opposed to a bit shift); to me this seems like
It's a common compiler optimization in fact.
> a micro-optimization that would make the code more difficult to
Not to me. I do it all the time. Shifting n bits to the left and not
rotating is how you get to *2**n: shifting n bits to the right, and
not rotating, is how you get to *2**(-n). It even works when n is
zero.
> understand without necessarily making it faster. *Maybe* if I had
> evidence that the speed of this particular operation was critical,
> and that the compiler was not already producing shift instructions
> (as, experiment suggests, happens at least sometimes) ....
Ah, so you know (of course). See, I was fucking with Seebach to see if
he could think for a change outside of the box. I know that "tools" do
all this stuff, but someone's got to reinvent and maintain the wheel.
Otherwise...flat tires.
>
> I'm having even more trouble thinking of how one might express
> multiplication by a matrix all of whose elements are powers of two
> (would the entries be the exponents? -- i.e., if element a[0][0]
> was meant to represent 16, would one encode that as 16, or as 4?),
> but -- whatever.
No, I just want to multiply every element, no matter what its value,
by a power of two:
for(i = 0; i < rows; i++)
    for(j = 0; j < columns; j++)
        matrix[i][j] = matrix[i][j] << n;
Think that's right. Sure, the compiler MIGHT use some fairly
sophisticated analysis to know that in
matrix[i][j] *= 2**n;
the operation can be a shift, but it must know whether n is greater
than or less than zero. If n is unsigned, cool, but that's one more
thing for the compiler to worry about.
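For reference, a minimal sketch of the scaling loop above as a compilable function. It assumes unsigned elements, so that the shift is well defined, and a flat row-major array; the name scale_pow2 and its parameters are illustrative and not from the post.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Multiply every element of a rows x cols matrix (stored row-major in a
 * flat array) by 2**n using a left shift. Elements are unsigned, so the
 * shift is well defined and overflow wraps modulo UINT_MAX + 1. */
void scale_pow2(unsigned *m, size_t rows, size_t cols, unsigned n)
{
    /* Shifting by the width of the type or more is undefined behaviour. */
    assert(n < sizeof(unsigned) * CHAR_BIT);

    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            m[i * cols + j] <<= n;
}
```

With signed elements the picture changes: left-shifting a negative value is undefined in C, which is one reason the multiplication form is often the safer thing to write.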
I'm no big fan of these micro-optimizations, however, because I prefer
a nobler goal, which is the complete destruction of the very idea of
terminating a string with a Nul. Basically, I was messing with Seeb's
head.
>
> --
> B. L. Massingill
> ObDisclaimer: I don't speak for my employers; they return the favor.
|
spinoza1111 (3250)
|
2/23/2010 3:49:45 PM
|
|
On 2010-02-23, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> Somewhat more formal than the "scans left to right and ignores
> overlapping strings", though whether it's actually clearer might
> be a matter of opinion.
I don't think it is.
I would have specced it as:
echo "$STRING" | sed -e "s/$OLD/$NEW/g"
With the caveat that you have to overlook questions like "what if $OLD has
slashes in it".
But yeah, that seems to be the same spec. And really, the overlapping
strings question is just plain stupid. The output of a replacement isn't
rescanned, and the replacement of a given substring removes its content
from further consideration, so there is nothing to do about an alleged
overlapping string. Looking for "aba" in "dababa", you find "d*aba*ba",
you replace it with whatever, and what's left is "ba". Nothing special
or complicated.
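A minimal sketch of that scanning rule, for illustration only. It uses strstr for brevity (which the original challenge disallowed), and the name count_matches is made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the matches of old in s that a left-to-right, non-overlapping
 * replacement would act on. After each match the scan resumes just past
 * the matched text, so consumed characters are never reconsidered. */
size_t count_matches(const char *s, const char *old)
{
    size_t old_len = strlen(old);
    size_t count = 0;

    assert(old_len > 0);   /* an empty pattern would never advance */

    for (const char *p = strstr(s, old); p != NULL;
         p = strstr(p + old_len, old))
        count++;
    return count;
}
```

For "aba" in "dababa" the first match starts at offset 1, the scan resumes at "ba", and nothing further is found, so the count is 1.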
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
usenet-nospam (2216)
|
2/23/2010 5:05:40 PM
|
|
Ben Pfaff <blp@cs.stanford.edu> wrote:
> Seebs <usenet-nospam@seebs.net> writes:
>
> > But why would I do that? How about we set a new challenge, which is to
> > do matrix multiplication without using the '*' operator. Or, better yet,
> > how about we iterate through an array without using a for loop, and write
> > our own routines to interpolate numbers into textual strings, so we don't
> > have to rely on printf()?
>
> Funny, I see posters asking for code to do dumb things like this
> regularly.
Yes, there is an astounding number of incompetent programming teachers,
and an even greater number of even worse programming students, out
there. (Many of them, apparently, in India - that country's programming
culture does not match the joyfulness of its kitchen). But that does not
mean that we should take them, or spinoza1111, seriously.
Richard
|
raltbos (821)
|
2/23/2010 9:20:44 PM
|
|
Nick Keighley <nick_keighley_nospam@hotmail.com> wrote:
> On 17 Feb, 21:14, Richard Heathfield <r...@see.sig.invalid> wrote:
>
> > The scanf function is basically a mess, and is rarely used correctly. I
> > am at a loss to understand why it is introduced so early in programming
> > texts.
>
> a former clc regular once posted
>
> ***
> The fscanf equivalent of fgets is so simple
> that it can be used inline whenever needed:-
> char s[NN + 1] = "", c;
> int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
> if (rc == 1) fscanf(fp, "%*[^\n]%*c");
> if (rc == 0) getc(fp);
> ***
That sounds suspiciously like either Dan Pop, who had a bee in his
bonnet regarding scanf() (amongst others), or like someone employing
irony at said Mr. Pop.
> I think scanf() is seen as a straight forward way to read simple
> unvalidated input. I'm not convinced that's a good idea.
It's a very bad idea. sscanf() can be a reasonable way to read simple
_validated_ input. Unvalidated, none of that family is useable.
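One common way to get that validation is to read a whole line with fgets and only then hand it to sscanf. A minimal sketch, assuming the input is meant to be one integer per line; the names parse_int and read_int_line and the 64-byte buffer are choices made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Accept line only if it holds exactly one decimal integer, optionally
 * surrounded by white space. Returns 1 and stores the value on success,
 * 0 otherwise. Out-of-range numbers are NOT detected by %d. */
int parse_int(const char *line, int *out)
{
    int value;
    int used = 0;

    assert(line != NULL && out != NULL);

    /* The space in the format skips trailing white space (including the
     * newline fgets leaves in place); %n then records how many characters
     * were consumed. */
    if (sscanf(line, "%d %n", &value, &used) != 1)
        return 0;
    if (line[used] != '\0')
        return 0;                 /* trailing junk such as "12abc" */

    *out = value;
    return 1;
}

/* Read one line from fp with fgets and validate it with parse_int.
 * Returns 0 on EOF, read error, or malformed input. */
int read_int_line(FILE *fp, int *out)
{
    char line[64];

    assert(fp != NULL);
    if (fgets(line, sizeof line, fp) == NULL)
        return 0;
    return parse_int(line, out);
}
```

Lines longer than the buffer are rejected here but their remainder stays in the stream, and overflow goes unnoticed; strtol with an errno check would be the sturdier choice where that matters.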
Richard
|
raltbos (821)
|
2/23/2010 9:20:48 PM
|
|
<blmblm@myrealbox.com> wrote in message
news:7ui26oFieaU1@mid.individual.net...
> In article
> <0.0ddf9db924489b8a3417.20100222180827GMT.87wry5b0tg.fsf@bsb.me.uk>,
[...]
> I've been posting total times only to save space and haven't been
> looking carefully at whether code that's faster overall is also
> faster for each individual test.
>
> Results with gcc 4.4.1, as described above .... :
>
> bacarisse (O2) 1.74 seconds
> bacarisse (O3) 1.75 seconds
> blmblm (O2) 2.52 seconds
> blmblm (O3) 2.48 seconds
> harter-1 (O2) 2.58 seconds
> harter-1 (O3) 2.49 seconds
> harter-2 (O2) 2.27 seconds
> harter-2 (O3) 2.24 seconds
> io_x (O2) 9.82 seconds
> nilges (O2) 2.36 seconds
> nilges (O3) 2.35 seconds
> thomasson (O2) 1.69 seconds
> thomasson (O3) 1.68 seconds
> willem (O2) 2.77 seconds
> willem (O3) 4.16 seconds [ can this be right?! ]
FWIW, I always try to "prime" the program by making several dummy runs on the
same data set. After that, I time several iterations of it on the same data
set, and report the average. Humm, when I get some time, I think I will
create another "entry" that uses a non-naive sub-string search algorithm:
http://groups.google.com/group/comp.lang.c/msg/825cf5bd46ad4a9f
I think it will be interesting to see how it affects your timing results.
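A rough sketch of that prime-then-average procedure, assuming CPU time from clock() is an acceptable measure. The harness name mean_cpu_seconds and the stand-in workload bump_counter are hypothetical; neither comes from any of the entries being timed.

```c
#include <assert.h>
#include <time.h>

/* Stand-in workload for the example: counts how often it was called.
 * A real test would call the replace function under test instead. */
void bump_counter(void *arg)
{
    ++*(int *)arg;
}

/* Run fn(arg) `warmup` times untimed to prime caches, then `runs` times
 * under clock(), and return the mean CPU seconds per timed run. */
double mean_cpu_seconds(void (*fn)(void *), void *arg, int warmup, int runs)
{
    assert(fn != NULL && warmup >= 0 && runs > 0);

    for (int i = 0; i < warmup; i++)
        fn(arg);

    clock_t start = clock();
    for (int i = 0; i < runs; i++)
        fn(arg);
    clock_t stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC / runs;
}
```

Reporting the individual run times as well as the mean makes it easier to spot a run skewed by other activity on the machine.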
|
no6 (2791)
|
2/23/2010 10:54:29 PM
|
|
blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
> In article <0.0ddf9db924489b8a3417.20100222180827GMT.87wry5b0tg.fsf@bsb.me.uk>,
> Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
<snip>
>> I used -O3 throughout. I'll post the -O2 with -O2 as well but in my
>> case I get faster time in all cases using -O3 (gcc 4.4.1).
>
> My results posted thus far were with gcc 4.0.1 (and glibc 2.3.5).
> When I repeat the experiment, on a newer/faster system, with
> gcc 4.4.1 (and glibc 2.10.1), I also find that -O3 is *almost*
> always faster (and the one exception -- perhaps I need to rerun
> the test, in case there was something else going on on the machine
> that skewed results). Results below.
If that odd -O3 result goes away, it means we can largely ignore
-O2/-O3 differences. That halves the data.
>> <snip>
>> I assume these results are for 4004 byte long strings with 2 short
>> replacements.
>
> No, this is with my full test suite (such as it is):
>
> 4004-byte input, 2 short replacements (2/2)
> 4020-byte input, 10 short replacements (2/2)
> 4100-byte input, 50 short replacements (2/2)
> 4500-byte input, 50 slightly-longer replacements (10/10)
Ah, I see. I've made up data files with these properties and re-run
my tests. I've scaled the per call times up by 20,000 (I remember you
used 20,000 calls) and I've aggregated the times for these four tests
for all of the methods I can time.
<snip>
> (Differences in input length are because I was lazy in coding
> Results with gcc 4.4.1, as described above .... :
>
> bacarisse (O2) 1.74 seconds
> bacarisse (O3) 1.75 seconds
> blmblm (O2) 2.52 seconds
> blmblm (O3) 2.48 seconds
> harter-1 (O2) 2.58 seconds
> harter-1 (O3) 2.49 seconds
> harter-2 (O2) 2.27 seconds
> harter-2 (O3) 2.24 seconds
> io_x (O2) 9.82 seconds
> nilges (O2) 2.36 seconds
> nilges (O3) 2.35 seconds
> thomasson (O2) 1.69 seconds
> thomasson (O3) 1.68 seconds
> willem (O2) 2.77 seconds
> willem (O3) 4.16 seconds [ can this be right?! ]
Looks wacky to me! Is it repeatable?
Here are my times (also gcc 4.4.1 and libc 2.10.1). I seem to have a
faster machine. The first numbers are your times (for reference) and
the second are mine (in seconds). The third column is the ratio of
the two. You can see that there is more going on than just the speed
of the machine.
bacarisse (O2)    1.74    0.426    4.08
bacarisse (O3)    1.75    0.400    4.38
blmblm (O2)       2.52    0.540    4.67
blmblm (O3)       2.48    0.501    4.95
harter-1 (O2)     2.58    0.857    3.01
harter-1 (O3)     2.49    0.803    3.10
harter-2 (O2)     2.27    0.780    2.91
harter-2 (O3)     2.24    0.722    3.10
io_x              No data
nilges (O2)       2.36    0.881    2.68
nilges (O3)       2.35    0.861    2.73
thomasson (O2)    1.69    0.380    4.45
thomasson (O3)    1.68    0.364    4.62
willem (O2)       2.77    0.813    3.41
willem (O3)       4.16    0.885    4.70
If we are now measuring the same things, it seems that some code is
favoured by my system (yours for example) and some does not do so
well. I suspect interactions with the various caches but that is a
huge guess.
My other tests show that the rank can be switched round a lot by using
shorter strings. To get a more balanced view, the tests would have to
range over various string lengths, but I doubt that there can be any
idea of the "best" way to do this based on speed. Clear simple code
will always win out for me in these situations, despite my love of
generating (ultimately pointless) data like this.
--
Ben.
|
ben.usenet (6516)
|
2/23/2010 11:05:38 PM
|
|
On Feb 24, 5:20 am, ralt...@xs4all.nl (Richard Bos) wrote:
> Ben Pfaff <b...@cs.stanford.edu> wrote:
> > Seebs <usenet-nos...@seebs.net> writes:
>
> > > But why would I do that? How about we set a new challenge, which is to
> > > do matrix multiplication without using the '*' operator. Or, better yet,
> > > how about we iterate through an array without using a for loop, and write
> > > our own routines to interpolate numbers into textual strings, so we don't
> > > have to rely on printf()?
>
> > Funny, I see posters asking for code to do dumb things like this
> > regularly.
>
> Yes, there is an astounding number of incompetent programming teachers,
> and an even greater number of even worse programming students, out
> there. (Many of them, apparently, in India - that country's programming
> culture does not match the joyfulness of its kitchen). But that does not
In fact, I've worked at IBM and elsewhere with Indian developers, and
I've trained them in Fiji. They work harder, more intelligently and
have more intellectual curiosity than white American developers, who
are ready as we see here to defend their crap to the death rather than
learn. It is racist to praise their cooking and condemn their
programming, because they are on average better at programming than
most people here.
> mean that we should take them, or spinoza1111, seriously.
>
> Richard
|
spinoza1111 (3250)
|
2/24/2010 2:51:21 AM
|
|
spinoza1111 wrote:
> On Feb 24, 5:20 am, ralt...@xs4all.nl (Richard Bos) wrote:
>> Ben Pfaff <b...@cs.stanford.edu> wrote:
>>> Seebs <usenet-nos...@seebs.net> writes:
>>>> But why would I do that? How about we set a new challenge, which is to
>>>> do matrix multiplication without using the '*' operator. Or, better yet,
>>>> how about we iterate through an array without using a for loop, and write
>>>> our own routines to interpolate numbers into textual strings, so we don't
>>>> have to rely on printf()?
>>> Funny, I see posters asking for code to do dumb things like this
>>> regularly.
>> Yes, there is an astounding number of incompetent programming teachers,
>> and an even greater number of even worse programming students, out
>> there. (Many of them, apparently, in India - that country's programming
>> culture does not match the joyfulness of its kitchen). But that does not
>
> In fact, I've worked at IBM and elsewhere with Indian developers, and
> I've trained them in Fiji.
I, too, have worked with Indian programmers - lots of them, in fact.
Most of them are not very good at programming. Neither are most British,
Australian, New Zealander, and US programmers, of course. And some
Indian programmers are extremely good (just as some British, Australian,
New Zealander, and US programmers are extremely good).
Race appears to have little or nothing to do with programming ability,
but to say that most Indian programmers are dreadful at programming is
perfectly accurate - India is not immune from Sturgeon's Law.
<snip>
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
|
rjh (10789)
|
2/24/2010 3:07:19 AM
|
|
On 2010-02-24, Richard Heathfield <rjh@see.sig.invalid> wrote:
> Race appears to have little or nothing to do with programming ability,
> but to say that most Indian programmers are dreadful at programming is
> perfectly accurate - India is not immune from Sturgeon's Law.
People who have dealt with outsourcing are very likely to have encountered
a disproportionately bad sample of programmers from some other country.
I have not known anyone who has had particularly good experiences with
"cheap offshore coders", and I have oodles of horror stories.
On the other hand, I have a largeish number of Chinese coworkers. They're
nearly all novice-level or newbie programmers, but they're good people to
work with -- a huge step up from the outsourcing place we used to use.
I think a big part of the problem is that people are carefully set up to
fail -- I'm a pretty decent programmer by many standards, and I could not
produce adequate code from some of the specs I've seen. Giving these
tasks to novice programmers who have to do a full day's work without any
chance to ask for clarification or feedback... Well, I can imagine ways
this could work poorly.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
usenet-nospam (2216)
|
2/24/2010 3:25:54 AM
|
|
On Feb 23, 10:38 pm, blm...@myrealbox.com <blm...@myrealbox.com>
wrote:
> In article <d520a640-1606-407e-9b7f-b9c75f4d5...@s36g2000prf.googlegroups.com>,
>
> spinoza1111 <spinoza1...@yahoo.com> wrote:
> > On Feb 15, 8:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > > "Stefan Ram" <r...@zedat.fu-berlin.de> wrote in message
>
> > >news:rand-20100215000605@ram.dialup.fu-berlin.de...
>
> [ snip ]
>
> > And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
> > STRINGS. If you use it to construct a table of replace points you're
> > gonna have an interesting bug-o-rama:
>
> > replace("banana", "ana", "ono")
>
> > IF you restart one position after the find point, and not at its end.
>
> Why would you do that, though? only if you *wanted* to detect
Search me. But that's what the code I was discussing actually did.
> overlapping strings, and -- if you did detect them, what would
> you do with them? I can't think of any sensible definition of
> "replace" that does anything with overlapping strings [*], so
replace(banana, ana, ono) could equal
bonona going left to right without overlap
banono going right to left without overlap
bonono going both ways with overlap
The third case could arise in natural language processing.
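To make the three readings concrete, here is a small sketch restricted to the equal-length case, which is all the banana example needs. The function name, the enum, and the use of memcmp/memcpy are conveniences for this illustration, not anything posted in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum scan_mode { LEFT_TO_RIGHT, RIGHT_TO_LEFT, WITH_OVERLAP };

/* Replace old with repl in the writable string s, in place.
 * Only handles strlen(old) == strlen(repl). */
void replace_same_len(char *s, const char *old, const char *repl,
                      enum scan_mode mode)
{
    size_t n = strlen(s), len = strlen(old);

    assert(len > 0 && strlen(repl) == len);
    if (n < len)
        return;

    if (mode == LEFT_TO_RIGHT) {
        for (size_t i = 0; i + len <= n; ) {
            if (memcmp(s + i, old, len) == 0) {
                memcpy(s + i, repl, len);
                i += len;              /* skip the text just replaced */
            } else {
                i++;
            }
        }
    } else if (mode == RIGHT_TO_LEFT) {
        for (long i = (long)(n - len); i >= 0; ) {
            if (memcmp(s + i, old, len) == 0) {
                memcpy(s + i, repl, len);
                i -= (long)len;        /* skip the text just replaced */
            } else {
                i--;
            }
        }
    } else {
        /* Match against an untouched copy so that every occurrence in
         * the original counts, even where occurrences overlap. */
        char *orig = malloc(n + 1);
        if (orig == NULL)
            return;
        memcpy(orig, s, n + 1);
        for (size_t i = 0; i + len <= n; i++)
            if (memcmp(orig + i, old, len) == 0)
                memcpy(s + i, repl, len);
        free(orig);
    }
}
```

Applied to a writable copy of "banana" with "ana" and "ono", the three modes give bonona, banono and bonono respectively.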
Suppose that in some language, the ana sound is transformed into the
ono sound to transform present into past tense (weirder things
happen), and suppose speakers do this to ALL occurrences of the three
tones a, voiced n, a. When the sounds are adjacent they are
nonetheless distinct in speech but not in writing.
Now, the response of most garden-variety "break room" programmers is
"that's bullshit, and can never happen". But we know that in
programming, many strange things can happen, and that as Hamlet
admonished Horatio, we must "as a stranger give it welcome". Many more
strange things can happen outside programming, and programmers, even
of the Hooters or break room ilk, better realize this when programming
is used to solve problems.
> when I wrote my first solution to this problem, I of course used
> strstr and of course started scanning for the next match after
> the end of the previous match.
>
> [*] Chris Thomasson's reply to your post points out the ambiguities.
>
> > Moral: don't let the library do your thinking for you.
>
> Mostly I'm replying to this rather old post, though, because it
> seems as good a place as any to attempt a more-or-less formal
> specification of the problem, which I'm not sure we have, and
> which might be interesting. (Apologies if I missed one somewhere.)
>
> Here's my proposed specification, in which "is not a substring of"
> and "concat" have what I hope are obvious meanings, and names
> beginning s_ denote strings:
>
> replace(s_input, s_old, s_new) yields
>
> if s_old is not a substring of s_input
>
>   s_input
>
> else
>
>   concat(s_input_prefix, s_new, replace(s_input_tail, s_old, s_new))
>
>   where s_input_prefix and s_input_tail are such that
>
>     s_input = concat(s_input_prefix, s_old, s_input_tail)
>
>   and
>
>     s_old is not a substring of s_input_prefix
This is fine as long as we understand your concat as NOT specifying
left to right or right to left direction. The bonono problem would
have to be handled by preprocessing (translating banana into banaana,
perhaps using a rule that "vowels" (in the language) can always be
duplicated because their sounds don't break).
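The quoted recursive specification translates almost line for line into C. A minimal sketch, assuming the leftmost match is taken at each step; the name replace_spec, the use of strstr, and the handling of an empty s_old are choices made for this example and are not part of the specification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* replace(s_input, s_old, s_new) following the quoted specification.
 * Returns a malloc'd string the caller must free, or NULL if an
 * allocation fails. Recursion depth equals the number of matches. */
char *replace_spec(const char *s_input, const char *s_old, const char *s_new)
{
    assert(s_input != NULL && s_old != NULL && s_new != NULL);

    size_t old_len = strlen(s_old);
    /* Taking the leftmost match is one way to ensure s_old is not a
     * substring of s_input_prefix; with overlapping matches it is not
     * the only decomposition the specification allows. An empty s_old
     * is treated as absent, since the recursion would not terminate. */
    const char *hit = old_len ? strstr(s_input, s_old) : NULL;

    if (hit == NULL) {                    /* base case: yields s_input */
        char *copy = malloc(strlen(s_input) + 1);
        if (copy != NULL)
            strcpy(copy, s_input);
        return copy;
    }

    /* s_input = concat(s_input_prefix, s_old, s_input_tail) */
    size_t prefix_len = (size_t)(hit - s_input);
    char *tail = replace_spec(hit + old_len, s_old, s_new);
    if (tail == NULL)
        return NULL;

    /* concat(s_input_prefix, s_new, replace(s_input_tail, ...)) */
    size_t new_len = strlen(s_new);
    char *out = malloc(prefix_len + new_len + strlen(tail) + 1);
    if (out != NULL) {
        memcpy(out, s_input, prefix_len);
        memcpy(out + prefix_len, s_new, new_len);
        strcpy(out + prefix_len + new_len, tail);
    }
    free(tail);
    return out;
}
```

For banana/ana/ono this yields bonona. Splitting at the second "ana" instead (prefix "ban", empty tail) also satisfies the stated conditions, which is where the direction question comes from.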
>
> Somewhat more formal than the "scans left to right and ignores
> overlapping strings", though whether it's actually clearer might
> be a matter of opinion.
>
> And then perhaps we could replace the "does not use string.h"
> constraint with something more meaningful [*], though I'm not
> sure what.
How about "does not use string.h and gets rid of the Obscene
Excrescence: the use of Nul to terminate a string, thereby creating a
new C that is almost fit for use".
>
> [*] My objection to this constraint is that any minimally competent
> programmer should be able to write functions that implement the
> same API, so just avoiding use of the library functions doesn't
> seem to me to make the problem more interesting.
No. The API locks us into bad thoughts.
>
> --
> B. L. Massingill
> ObDisclaimer: I don't speak for my employers; they return the favor.
|
spinoza1111 (3250)
|
2/24/2010 12:57:59 PM
|
|
On Feb 24, 11:25 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-24, Richard Heathfield <r...@see.sig.invalid> wrote:
>
> > Race appears to have little or nothing to do with programming ability,
> > but to say that most Indian programmers are dreadful at programming is
> > perfectly accurate - India is not immune from Sturgeon's Law.
>
> People who have dealt with outsourcing are very likely to have encountered
> a disproportionately bad sample of programmers from some other country.
> I have not known anyone who has had particularly good experiences with
> "cheap offshore coders", and I have oodles of horror stories.
So do most incompetent programmers. The reason being that they are not
only bad at coding, they also do a bad job of problem specification,
and as technically educated people, they know nothing of the culture
of the offshore programmers, apart from racist
stereotyping...including the patronizing admiration for cuisine, which
relegates the people of the land, and cultures that are normally far
older than Europe, to a feminized, subordinate and servant role.
>
> On the other hand, I have a largeish number of Chinese coworkers. They're
> nearly all novice-level or newbie programmers, but they're good people to
> work with -- a huge step up from the outsourcing place we used to use.
>
> I think a big part of the problem is that people are carefully set up to
> fail -- I'm a pretty decent programmer by many standards, and I could not
Not by mine.
> produce adequate code from some of the specs I've seen. Giving these
> tasks to novice programmers who have to do a full day's work without any
> chance to ask for clarification or feedback... Well, I can imagine ways
> this could work poorly.
>
> -s
> --
> Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
spinoza1111 (3250)
|
2/24/2010 1:01:29 PM
|
|
In article <c7dff3ff-c1a5-4c0c-a9d7-814303838082@m27g2000prl.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
> On Feb 23, 8:25 pm, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > In article <2919f93f-7d2e-4512-979c-d09e67753...@u19g2000prh.googlegroups.com>,
> >
> >
> >
> >
> >
> > spinoza1111 <spinoza1...@yahoo.com> wrote:
> > > On Feb 21, 1:57 am, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > > > In article <65456f81-71e0-4ddb-8da2-04c67e9e2...@y7g2000prc.googlegroups.com>,
> >
> > > >spinoza1111<spinoza1...@yahoo.com> wrote:
> > > > > On Feb 20, 12:55 am, Seebs <usenet-nos...@seebs.net> wrote:
[ snip ]
> > > > > Furthermore, if I were doing matrix multiplication times a power of
> > > > > two, I wouldn't use the * operator. Do you even know what operator I
> > > > > would use?
> >
> > > > "Matrix multiplication times a power of two?" Could you explain what
> > > > you mean by that? In my usage "matrix multiplication" is a binary
> > > > operation on matrices, so I don't understand how one of the operands
> > > > can be a power of two. ?
> >
> > > It contains a power of two in each element in the scenario where you
> > > wouldn't use * you would use shift. Look, I was just having a little
> > > fun with Seebie.
> >
> > And giving the impression that perhaps you weren't quite sure what
> > matrix multiplication was. Well, whatever.
>
> No, I think I do. A matrix can be multiplied by a scalar, a vector, or
> a matrix. Basic finite math. It's been years but I was awake, and I
> respected my professors.
>
> >
> > I'm having a little trouble thinking of situations in which I would
> > use the shift operator when the desired operation is conceptually a
> > multiplication (as opposed to a bit shift); to me this seems like
>
> It's a common compiler optimization in fact.
>
> > a micro-optimization that would make the code more difficult to
>
> Not to me. I do it all the time. Shifting n bits to the left and not
> rotating is how you get to *2**n: shifting n bits to the right, and
> not rotating, is how you get to *2**(-n). It even works when n is
> zero.
>
> > understand without necessarily making it faster. *Maybe* if I had
> > evidence that the speed of this particular operation was critical,
> > and that the compiler was not already producing shift instructions
> > (as, experiment suggests, happens at least sometimes) ....
>
> Ah, so you know (of course).
You may give me too much credit. I do have advanced degrees
in CS, but my undergraduate degree is in another field, and --
well, for one reason and another there are courses that probably
would enhance my understanding of my field but that I simply have
never taken, and the list of things I wish I knew more about (and
arguably *SHOULD* know more about) is -- long.
> See, I was fucking with Seebach to see if
> he could think for a change outside of the box.
As best I can tell, he doesn't read anything you're posting here
unless someone else quotes it -- or at least he doesn't reply,
so your taunting .... ah well, I suppose now that I've quoted
you he may choose to reply. Or not.
I would be very surprised to hear that anyone who talks about
code as he does would *not* be aware of this particular trick.
Inventing it for oneself might require "thinking outside the box";
being aware that it exists -- not so much.
> I know that "tools" do
> all this stuff, but someone's got to reinvent and maintain the wheel.
> Otherwise...flat tires.
> >
> > I'm having even more trouble thinking of how one might express
> > multiplication by a matrix all of whose elements are powers of two
> > (would the entries be the exponents? -- i.e., if element a[0][0]
> > was meant to represent 16, would one encode that as 16, or as 4?),
> > but -- whatever.
>
> No, I just want to multiply every element, no matter what its value,
> by a power of two:
*But this is not matrix multiplication* -- or at least, not as I
define the term. Curiously enough, however, Wikipedia seems to
agree with you that "matrix multiplication" can mean multiplying
a matrix by either another matrix or a scalar. So my attempt to
nitpick here may be off-base. Of course, elsethread you don't
seem to find Wikipedia a credible source. "Whatever."
> for(i = 0; i < rows; i++)
>     for(j = 0; j < columns; j++)
>         matrix[i][j] = matrix[i][j] << n;
>
> Think that's right. Sure, the compiler MIGHT use some fairly
> sophisticated analysis to know that in
>
> matrix[i][j] *= 2**n;
"**" for exponentiation is Fortran, not C. (As you may know.)
> the operation can be a shift, but it must know whether n is greater
> than or less than zero. If n is unsigned, cool, but that's one more
> thing for the compiler to worry about.
gcc seems to be reasonably smart about generating shifts rather
than multiplies if n is something that's known at compile time,
even though (as far as I know) one must use the "pow" function
to express exponentiation in C.
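For integer powers of two specifically, C can do without pow: shifting a 1 into position n gives 2**n directly. A small sketch, assuming unsigned operands; pow2u and times_pow2 are names invented for this example, and whether a given compiler turns the multiplication into a shift instruction is not something the code can show.

```c
#include <assert.h>
#include <limits.h>

/* Integer 2**n without pow(): shift a 1 into bit position n. */
unsigned pow2u(unsigned n)
{
    /* Shifting by the width of the type or more is undefined. */
    assert(n < sizeof(unsigned) * CHAR_BIT);
    return 1u << n;
}

/* x * 2**n written as a multiplication. For unsigned operands this gives
 * the same value as x << n, wrapping on overflow in both cases. */
unsigned times_pow2(unsigned x, unsigned n)
{
    return x * pow2u(n);
}
```

Written this way the source still says "multiply", and the choice of instruction is left to the compiler.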
> I'm no big fan of these micro-optimizations, however, because I prefer
> a nobler goal, which is the complete destruction of the very idea of
> terminating a string with a Nul. Basically, I was messing with Seeb's
> head.
I don't plan to assist you with reaching either goal -- not on purpose
anyway. (Hm.)
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
blmblm (1187)
|
2/25/2010 9:22:11 AM
|
|
In article <slrnho830v.lm8.usenet-nospam@guild.seebs.net>,
Seebs <usenet-nospam@seebs.net> wrote:
> On 2010-02-23, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> > Somewhat more formal than the "scans left to right and ignores
> > overlapping strings", though whether it's actually clearer might
> > be a matter of opinion.
>
> I don't think it is.
Ah, but that's because you never took any CS classes, in which
you would have been exposed to this sort of thing .... :-)
Then again, my experience teaching undergrad CS suggests that
instructors can lead the students to formalism but can't always
get them to like it.
> I would have specced it as:
> echo "$STRING" | sed -e "s/$OLD/$NEW/g"
>
> With the caveat that you have to overlook questions like "what if $OLD has
> slashes in it".
Yeah. That works too -- for people who know what sed is and does.
> But yeah, that seems to be the same spec. And really, the overlapping
> strings question is just plain stupid. The output of a replacement isn't
> rescanned, and the replacement of a given substring removes its content
> from further consideration, so there is nothing to do about an alleged
> overlapping string. Looking for "aba" in "dababa", you find "d*aba*ba",
> you replace it with whatever, and what's left is "ba". Nothing special
> or complicated.
Sure. I just thought it might be fun to try to come up with a
semi-formal specification that *doesn't* involve narratives about
what the program is doing. I like that sort of thing, but I guess
not everyone does. "Whatever." [*]
[*] Normally I would write <shrug> here, but that usage now has
unfortunate associations.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
blmblm (1187)
|
2/25/2010 9:24:51 AM
|
|
In article <D0Zgn.56292$G_2.27271@newsfe15.iad>,
Chris M. Thomasson <no@spam.invalid> wrote:
> <blmblm@myrealbox.com> wrote in message
> news:7ui26oFieaU1@mid.individual.net...
> > In article
> > <0.0ddf9db924489b8a3417.20100222180827GMT.87wry5b0tg.fsf@bsb.me.uk>,
> [...]
>
> > I've been posting total times only to save space and haven't been
> > looking carefully at whether code that's faster overall is also
> > faster for each individual test.
> >
> > Results with gcc 4.4.1, as described above .... :
> >
> > bacarisse (O2) 1.74 seconds
> > bacarisse (O3) 1.75 seconds
> > blmblm (O2) 2.52 seconds
> > blmblm (O3) 2.48 seconds
> > harter-1 (O2) 2.58 seconds
> > harter-1 (O3) 2.49 seconds
> > harter-2 (O2) 2.27 seconds
> > harter-2 (O3) 2.24 seconds
> > io_x (O2) 9.82 seconds
> > nilges (O2) 2.36 seconds
> > nilges (O3) 2.35 seconds
> > thomasson (O2) 1.69 seconds
> > thomasson (O3) 1.68 seconds
> > willem (O2) 2.77 seconds
> > willem (O3) 4.16 seconds [ can this be right?! ]
>
> FWIW, I always try to "prime" the program by making several dummy runs on the
> same data set. After that, I time several iterations of it on the same data
> set, and report the average.
Yes .... What my little test program actually does is repeat each
sequence of tests N times (N is a command-line parameter) and report
all N results; my thinking is that if each set of N timings is pretty
consistent (meaning all times are roughly equal -- and they have been),
then the total time should be reasonably meaningful.
> Humm, when I get some time, I think I will
> create another "entry" that uses a non-naive sub-string search algorithm:
>
>
> http://groups.google.com/group/comp.lang.c/msg/825cf5bd46ad4a9f
>
>
> I think it will be interesting to see how it affects your timing results.
>
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
blmblm (1187)
|
2/25/2010 9:25:29 AM
|
|
In article <0.d1ae414487ac948ef039.20100223230538GMT.873a0rblj1.fsf@bsb.me.uk>,
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
>
> > In article <0.0ddf9db924489b8a3417.20100222180827GMT.87wry5b0tg.fsf@bsb.me.uk>,
> > Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> <snip>
> >> I used -O3 throughout. I'll post the -O2 with -O2 as well but in my
> >> case I get faster time in all cases using -O3 (gcc 4.4.1).
> >
> > My results posted thus far were with gcc 4.0.1 (and glibc 2.3.5).
> > When I repeat the experiment, on a newer/faster system, with
> > gcc 4.4.1 (and glibc 2.10.1), I also find that -O3 is *almost*
> > always faster (and the one exception -- perhaps I need to rerun
> > the test, in case there was something else going on on the machine
> > that skewed results). Results below.
>
> If that odd -O3 result goes away, it means we can largely ignore
> -O2/-O3 differences. That halves the data.
>
> >> <snip>
> >> I assume these results are for 4004 byte long strings with 2 short
> >> replacements.
> >
> > No, this is with my full test suite (such as it is):
> >
> > 4004-byte input, 2 short replacements (2/2)
> > 4020-byte input, 10 short replacements (2/2)
> > 4100-byte input, 50 short replacements (2/2)
> > 4500-byte input, 50 slightly-longer replacements (10/10)
>
> Ah, I see. I've made up data files with these properties and re-run
> my tests. I've scaled the per call times up by 20,000 (I remember you
> used 20,000 calls) and I've aggregated the times for these four tests
> for all of the methods I can time.
>
> <snip>
> > (Differences in input length are because I was lazy in coding
> > Results with gcc 4.4.1, as described above .... :
> >
> > bacarisse (O2) 1.74 seconds
> > bacarisse (O3) 1.75 seconds
> > blmblm (O2) 2.52 seconds
> > blmblm (O3) 2.48 seconds
> > harter-1 (O2) 2.58 seconds
> > harter-1 (O3) 2.49 seconds
> > harter-2 (O2) 2.27 seconds
> > harter-2 (O3) 2.24 seconds
> > io_x (O2) 9.82 seconds
> > nilges (O2) 2.36 seconds
> > nilges (O3) 2.35 seconds
> > thomasson (O2) 1.69 seconds
> > thomasson (O3) 1.68 seconds
> > willem (O2) 2.77 seconds
> > willem (O3) 4.16 seconds [ can this be right?! ]
>
> Looks wacky to me! Is it repeatable?
*Yes*. Weird, isn't it?! And I don't observe anything like that
on the older/slower system.
> Here are my times (also gcc 4.4.1 and libc 2.10.1). I seem to have a
> faster machine. The first numbers are your times (for reference) and
> the second are mine (in seconds). The third column is the ratio of
> the two. You can see that there is more going on than just the speed
> of the machine.
Indeed. Should we compare hardware configurations? What would be
relevant to include? The information that in Linux can be found by
listing /proc/cpuinfo?
> bacarisse (O2)   1.74   0.426   4.08
> bacarisse (O3)   1.75   0.400   4.38
> blmblm (O2)      2.52   0.540   4.67
> blmblm (O3)      2.48   0.501   4.95
> harter-1 (O2)    2.58   0.857   3.01
> harter-1 (O3)    2.49   0.803   3.10
> harter-2 (O2)    2.27   0.780   2.91
> harter-2 (O3)    2.24   0.722   3.10
> io_x             No data
> nilges (O2)      2.36   0.881   2.68
> nilges (O3)      2.35   0.861   2.73
> thomasson (O2)   1.69   0.380   4.45
> thomasson (O3)   1.68   0.364   4.62
> willem (O2)      2.77   0.813   3.41
> willem (O3)      4.16   0.885   4.70
>
> If we are now measuring the same things, it seems that some code is
> favoured by my system (yours for example) and some does not do so
> well. I suspect interactions with the various caches but that is a
> huge guess.
Oh, caching is almost always a good guess ....
> My other tests show that the rank can be switched round a lot by using
> shorter strings. To get a more balanced view, the tests would have to
> range over various string lengths, but I doubt that there can be any
> idea of the "best" way to do this based on speed. Clear simple code
> will always win out for me in these situations, despite my love of
> generating (ultimately pointless) data like this.
I *am* getting the impression that you also might be having too much
fun with this little problem. :-)
Agreed, anyway, that in almost all circumstances clear simple code is
to be preferred. Indeed -- my original naive solution is almost as
fast as the "improved" version, as long as I use the C-library versions
of the string.h functions, and it took less time to write and debug
than the more complicated solution.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
blmblm (1187)
|
2/25/2010 9:26:13 AM
|
|
In article <bdac8263-e941-4955-9875-44d4ff741943@k18g2000prf.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
> On Feb 23, 10:38 pm, blm...@myrealbox.com <blm...@myrealbox.com>
> wrote:
> > In article <d520a640-1606-407e-9b7f-b9c75f4d5...@s36g2000prf.googlegroups.com>,
> >
> > spinoza1111 <spinoza1...@yahoo.com> wrote:
> > > On Feb 15, 8:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > > > "Stefan Ram" <r...@zedat.fu-berlin.de> wrote in message
> >
> > > >news:rand-20100215000605@ram.dialup.fu-berlin.de...
> >
> > [ snip ]
> >
> > > And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
> > > STRINGS. If you use it to construct a table of replace points you're
> > > gonna have an interesting bug-o-rama:
> >
> > > replace("banana", "ana", "ono")
> >
> > > IF you restart one position after the find point, and not at its end.
> >
> > Why would you do that, though? only if you *wanted* to detect
>
> Search me. But that's what the code I was discussing actually did.
What code is that? I've traced back through predecessor posts, and
the only one that comes close to including code is the one in which
Chris Thomasson references his code in
http://clc.pastebin.com/f62504e4c
which on a quick skim doesn't seem to me to be looking for
overlapping strings.
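Whether overlapping matches show up depends only on where the scan restarts after a hit, not on strstr itself. A small sketch of the two restart policies follows; count_matches and its overlapping flag are made-up names used purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count occurrences of `needle` in `hay`. If `overlapping` is zero the
 * scan resumes after the end of each match; otherwise it resumes one
 * character after the start of the match.
 * An empty needle is rejected (returns 0) to avoid looping forever.
 */
size_t count_matches(const char *hay, const char *needle, int overlapping)
{
    size_t n = 0;
    size_t len = strlen(needle);
    if (len == 0)
        return 0;
    size_t step = overlapping ? 1 : len;
    for (const char *p = strstr(hay, needle); p != NULL;
         p = strstr(p + step, needle))
        n++;
    return n;
}
```

For "banana" and "ana" this finds one match when restarting at the end of the hit and two when restarting one position after its start.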
> > overlapping strings, and -- if you did detect them, what would
> > you do with them? I can't think of any sensible definition of
> > "replace" that does anything with overlapping strings [*], so
>
> replace(banana, ana, ono) could equal
>
> bonona going left to right without overlap
> banono going right to left without overlap
> bonono going both ways with overlap
There's a semi-sane answer here in the last case, but isn't
that because there are some factors at work that won't be
generally true? What about replace(banana, ana, xno)?
Should that be bxnono or bxnxno?
> The third case could arise in natural language processing.
>
> Suppose that in some language, the ana sound is transformed into the
> ono sound to transform present into past tense (weirder things
> happen), and suppose speakers do this to ALL occurrences of the three
> tones a, voiced n, a. When the sounds are adjacent they are
> nonetheless distinct in speech but not in writing.
>
> Now, the response of most garden-variety "break room" programmers is
> "that's bullshit, and can never happen". But we know that in
> programming, many strange things can happen, and that as Hamlet
> admonished Horatio, we must "as a stranger give it welcome". Many more
> strange things can happen outside programming, and programmers, even
> of the Hooters or break room ilk, better realize this when programming
> is used to solve problems.
"Whatever." I'm not convinced yet that it's possible to come
up with a sensible specification for what it means to replace
overlapping occurrences of selected text. Absent such a
specification -- eh, whatever.
> > when I wrote my first solution to this problem, I of course used
> > strstr and of course started scanning for the next match after
> > the end of the previous match.
> >
> > [*] Chris Thomasson's reply to your post points out the ambiguities.
> >
> > > Moral: don't let the library do your thinking for you.
> >
> > Mostly I'm replying to this rather old post, though, because it
> > seems as good a place as any to attempt a more-or-less formal
> > specification of the problem, which I'm not sure we have, and
> > which might be interesting. (Apologies if I missed one somewhere.)
> >
> > Here's my proposed specification, in which "is not a substring of"
> > and "concat" have what I hope are obvious meanings, and names
> > beginning s_ denote strings:
> >
> > replace(s_input, s_old, s_new) yields
> >
> > if s_old is not a substring of s_input
> >
> > s_input
> >
> > else
> >
> > concat(s_input_prefix, s_new, replace(s_input_tail, s_old, s_new))
> >
> > where s_input_prefix and s_input_tail are such that
> >
> > s_input = concat(s_input_prefix, s_old, s_input_tail)
> >
> > and
> >
> > s_old is not a substring of s_input_prefix
>
> This is fine as long as we understand your concat as NOT specifying
> left to right or right to left direction.
How could it imply a direction? String concatenation seems to me to
be a pretty straightforward and well-defined operation, which could
even be written as an associative binary operator, no?
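The quoted specification translates almost line for line into a recursive function. The following is a sketch under some assumptions: the name replace_spec is made up, it uses string.h for brevity, it picks the leftmost occurrence (what strstr returns) as one way of satisfying the prefix condition, and it rejects an empty s_old, which the specification does not address.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Returns a newly malloc'd string (caller frees), or NULL if s_old is
 * empty or an allocation fails.
 */
char *replace_spec(const char *s_input, const char *s_old, const char *s_new)
{
    size_t oldlen = strlen(s_old);
    if (oldlen == 0)
        return NULL;                      /* not covered by the spec */

    const char *hit = strstr(s_input, s_old);
    if (hit == NULL) {                    /* "s_old is not a substring" */
        char *copy = malloc(strlen(s_input) + 1);
        if (copy != NULL)
            strcpy(copy, s_input);
        return copy;
    }

    /* s_input = concat(prefix, s_old, tail); recurse on the tail. */
    char *tail = replace_spec(hit + oldlen, s_old, s_new);
    if (tail == NULL)
        return NULL;

    size_t prefixlen = (size_t)(hit - s_input);
    size_t newlen = strlen(s_new);
    size_t taillen = strlen(tail);
    char *result = malloc(prefixlen + newlen + taillen + 1);
    if (result != NULL) {
        memcpy(result, s_input, prefixlen);
        memcpy(result + prefixlen, s_new, newlen);
        memcpy(result + prefixlen + newlen, tail, taillen + 1);
    }
    free(tail);
    return result;
}
```

With this reading, replace_spec("banana", "ana", "ono") yields "bonona". The recursion depth grows with the number of matches, so a loop would be the safer choice for very long inputs.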
> The bonono problem would
> have to be handled by preprocessing (translating banana into banaana,
> perhaps using a rule that "vowels" (in the language) can always be
> duplicated because their sounds don't break.
Well, now you seem to be moving in a direction that might eventually
lead to a sensible specification. "Carry on", maybe.
> > Somewhat more formal than the "scans left to right and ignores
> > overlapping strings", though whether it's actually clearer might
> > be a matter of opinion.
> >
> > And then perhaps we could replace the "does not use string.h"
> > constraint with something more meaningful [*], though I'm not
> > sure what.
>
> How about "does not use string.H and gets rid of the Obscene
> Excrescence: the use of Nul to terminate a string, thereby creating a
> new C that is almost fit for use".
Well, even my first naive solution to this problem didn't use
string.H .... But we've already had the discussion about case
sensitivity, so no need to do that again.
In my opinion the functions declared in string.h include some
that are very useful in writing a replace() function as specified
here -- I think of them as useful abstractions for the problem
domain. Could one define similar functions if strings were *not*
represented as null-terminated contiguous sequences of characters ....
Well, certainly one could, but whether they'd be unacceptably
inefficient might depend on how strings were represented.
C's approach to representing strings allows one to have multiple
strings in a single character array and to easily regard a suffix
of a string as a string in its own right, both of which strike me
as useful in context. One could (I think) get a similar effect
by defining strings to be objects consisting of a length and a
pointer into an array of characters. If strings were represented
as a length immediately followed by a sequence of characters --
not so much.
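To make the "length and a pointer" idea concrete, here is a minimal sketch. The struct and function names (strview, sv_from_cstr, sv_suffix, sv_equal) are hypothetical and not taken from any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "length plus pointer" string; it does not own its bytes. */
struct strview {
    size_t len;
    const char *ptr;
};

/* View over a whole NUL-terminated C string. */
struct strview sv_from_cstr(const char *s)
{
    struct strview v = { strlen(s), s };
    return v;
}

/* Suffix starting at offset `from` (clamped to the length); no copying. */
struct strview sv_suffix(struct strview v, size_t from)
{
    if (from > v.len)
        from = v.len;
    struct strview r = { v.len - from, v.ptr + from };
    return r;
}

/* Two views are equal if they have the same length and the same bytes. */
int sv_equal(struct strview a, struct strview b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

Taking a suffix is cheap here because the length lives apart from the characters. With the length stored immediately in front of the characters, a suffix would need its own length written before its first character, which means copying or overwriting part of the original.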
I'm not sure I'm explaining this very well or thinking it through
carefully, but perhaps it will advance the discussion a bit anyway.
> > [*] My objection to this constraint is that any minimally competent
> > programmer should be able to write functions that implement the
> > same API, so just avoiding use of the library functions doesn't
> > seem to me to make the problem more interesting.
>
> No. The API locks us into bad thoughts.
I'd say "how so?" but I'm not optimistic about getting an answer
I'd find useful.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
|
blmblm (1187)
|
2/25/2010 9:29:45 AM
|
|
On 23 Feb, 14:38, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> In article <d520a640-1606-407e-9b7f-b9c75f4d5...@s36g2000prf.googlegroups.com>,
> spinoza1111 <spinoza1...@yahoo.com> wrote:
> > And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
> > STRINGS. If you use it to construct a table of replace points you're
> > gonna have an interesting bug-o-rama:
>
> > replace("banana", "ana", "ono")
>
> > IF you restart one position after the find point, and not at its end.
>
> Why would you do that, though? only if you *wanted* to detect
> overlapping strings, and -- if you did detect them, what would
> you do with them? I can't think of any sensible definition of
> "replace" that does anything with overlapping strings [*], so
> when I wrote my first solution to this problem, I of course used
> strstr and of course started scanning for the next match after
> the end of the previous match.
>
> [*] Chris Thomasson's reply to your post points out the ambiguities.
>
> > Moral: don't let the library do your thinking for you.
sometimes a sensible behaviour drops out quite neatly.
I used to think that was a sign of good code if the naive code also
covered the "corner-cases" naturally.
> Mostly I'm replying to this rather old post, though, because it
> seems as good a place as any to attempt a more-or-less formal
> specification of the problem, which I'm not sure we have, and
> which might be interesting. (Apologies if I missed one somewhere.)
well I missed it too! A programming competition seemed to break out
without it being clear (to me at least) what the program was supposed
to do. By the time it had been sort of clarified I'd seen a fair
amount of code.
I'm just sorry I didn't think of doing it recursively! That was
impressive.
> Here's my proposed specification, in which "is not a substring of"
> and "concat" have what I hope are obvious meanings, and names
> beginning s_ denote strings:
>
> replace(s_input, s_old, s_new) yields
>
> if s_old is not a substring of s_input
>
>   s_input
>
> else
>
>   concat(s_input_prefix, s_new, replace(s_input_tail, s_old, s_new))
>
>   where s_input_prefix and s_input_tail are such that
>
>     s_input = concat(s_input_prefix, s_old, s_input_tail)
>
>   and
>
>     s_old is not a substring of s_input_prefix
>
> Somewhat more formal than the "scans left to right and ignores
> overlapping strings", though whether it's actually clearer might
> be a matter of opinion.
and of course like most formal specifications it's damn nearly code!
Though you do get a better class of primitive.
> And then perhaps we could replace the "does not use string.h"
> constraint with something more meaningful [*], though I'm not
> sure what.
>
> [*] My objection to this constraint is that any minimally competent
> programmer should be able to write functions that implement the
> same API, so just avoiding use of the library functions doesn't
> seem to me to make the problem more interesting.
there were some proposals for better APIs, ones that avoided
repeatedly rescanning the string. The virtue of using string.h is that
you get some abstraction. Pages of inlined string.h-like functions are
hard to comprehend, hard to debug and hard to test.
|
nick_keighley_nospam (4575)
|
2/25/2010 10:47:32 AM
|
|
On 23 Feb, 21:20, ralt...@xs4all.nl (Richard Bos) wrote:
> Nick Keighley <nick_keighley_nos...@hotmail.com> wrote:
> > On 17 Feb, 21:14, Richard Heathfield <r...@see.sig.invalid> wrote:
> > > The scanf function is basically a mess, and is rarely used correctly. I
> > > am at a loss to understand why it is introduced so early in programming
> > > texts.
>
> > a former clc regular once posted
>
> > ***
> > The fscanf equivalent of fgets is so simple
> > that it can be used inline whenever needed:-
> > >     char s[NN + 1] = "", c;
> > >     int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
> > >     if (rc == 1) fscanf("%*[^\n]%*c);
> > >     if (rc == 0) getc(fp);
> > ***
>
> That sounds suspiciously like either Dan Pop, who had a bee in his
> bonnet regarding scanf() (amongst others), or like someone employing
> irony at said Mr. Pop.
it was Mr.Pop. Perhaps being ironical himself?
Oh, and there's at least one bug in the above (most likely inserted by
me).
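For what it's worth, a version of the quoted idiom that compiles might look like the sketch below. It is a guess at the intent, not a quote from the original post: the function name read_line and the width 15 (standing in for NN) are assumptions. Besides restoring the missing fp argument and closing quote, it gives the %1[\n] conversion a two-byte buffer, since a %[ conversion always stores a terminating null after the matched characters.

```c
#include <stdio.h>

#define LINE_MAX_CHARS 15   /* plays the role of NN; keep "%15[" in sync */

/*
 * Read one line from fp into s (which must hold LINE_MAX_CHARS + 1
 * bytes), discarding the newline and any excess characters.
 * Returns the result of the first fscanf call:
 *   2   a whole line was read
 *   1   the line was truncated, or input ended without a newline
 *   0   the line was empty
 *   EOF end of input
 */
int read_line(FILE *fp, char *s)
{
    char nl[2];                 /* one '\n' plus the terminating null */
    s[0] = '\0';
    int rc = fscanf(fp, "%15[^\n]%1[\n]", s, nl);
    if (rc == 1) {
        /* Skip the rest of an over-long line, including its newline. */
        int skipped = fscanf(fp, "%*[^\n]%*c");
        (void)skipped;
    }
    if (rc == 0)
        getc(fp);               /* consume the lone newline */
    return rc;
}
```

Whether this is really simpler than calling fgets is a separate question.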
> > I think scanf() is seen as a straight forward way to read simple
> > unvalidated input. I'm not convinced that's a good idea.
>
> It's a very bad idea. sscanf() can be a reasonable way to read simple
> _validated_ input. Unvalidated, none of that family is useable.
yes but if they're at the "write a program to convert Fahrenheit to
centigrade" stage, scanf() might be just good enough. I think giving them a
read_number() primitive would be a better idea though.
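A read_number() primitive of the kind suggested above could be sketched like this. The split into parse_number and read_number, the 128-byte line buffer, and the choice of fgets plus strtod are all assumptions made for illustration; range errors (ERANGE) are not checked.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse one number from a line of text. Returns 1 and stores the value
 * in *out on success; returns 0 if the line is not a number followed
 * only by whitespace.
 */
int parse_number(const char *line, double *out)
{
    char *end;
    double v = strtod(line, &end);
    if (end == line)
        return 0;               /* nothing numeric at the start */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;
    if (*end != '\0')
        return 0;               /* trailing junk such as "12abc" */
    *out = v;
    return 1;
}

/* Read one line from fp and parse it; returns 0 on EOF or bad input. */
int read_number(FILE *fp, double *out)
{
    char line[128];
    if (fgets(line, sizeof line, fp) == NULL)
        return 0;
    return parse_number(line, out);
}
```

A beginner's temperature converter could then call read_number(stdin, &f) and only has to check a single return value.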
|
nick_keighley_nospam (4575)
|
2/25/2010 11:04:15 AM
|
|
blmblm@myrealbox.com wrote:
<snip>
> "Whatever." [*]
>
> [*] Normally I would write <shrug> here, but that usage now has
> unfortunate associations.
HUH? I don't think I got that email. Could you explain further?
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
|
rjh (10789)
|
2/25/2010 11:42:41 AM
|
|
On 2010-02-25, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> As best I can tell, he doesn't read anything you're posting here
> unless someone else quotes it -- or at least he doesn't reply,
> so your taunting .... ah well, I suppose now that I've quoted
> you he may choose to reply. Or not.
Exactly. He's plonked, because that way I see only the funny parts.
> I would be very surprised to hear that anyone who talks about
> code as he does would *not* be aware of this particular trick.
> Inventing it for oneself might require "thinking outside the box";
> being aware that it exists -- not so much.
Yeah. I still remember:
i << 3 + i + i
but that doesn't mean I'd ever use it.
It appears that he meant, not matrix multiplication, but multiplication of
every element in a matrix by a fixed value. For some reason, he seems to
think that multiplying everything in a matrix by a compile-time constant is
a likely operation. For some other reason, probably a less plausible one,
he seems to think that the shift-as-multiply trick actually buys you
something.
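One detail worth spelling out: in C the additive operators bind more tightly than the shift operators, so the expression as typed above parses as i << (3 + i + i). Presumably the intended multiply-by-ten form is (i << 3) + i + i. The sketch below shows both readings, plus scaling a flat array by a power of two written as a plain multiplication; all three function names are made up for illustration.

```c
/* 10 * i written with a shift: 8*i + i + i. The parentheses matter. */
unsigned times10(unsigned i)
{
    return (i << 3) + i + i;
}

/* How "i << 3 + i + i" is parsed without parentheses. */
unsigned shift_as_parsed(unsigned i)
{
    return i << (3 + i + i);
}

/* Multiply every element by 2^k, written as an ordinary multiply. */
void scale_pow2(unsigned *m, unsigned count, unsigned k)
{
    for (unsigned j = 0; j < count; j++)
        m[j] *= 1u << k;
}
```

The loop in scale_pow2 is written as a multiplication on purpose; choosing between a shift and a multiply instruction can be left to the compiler.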
>> No, I just want to multiply every element, no matter what its value,
>> by a power of two:
> *But this is not matrix multiplication* -- or at least, not as I
> define the term. Curiously enough, however, Wikipedia seems to
> agree with you that "matrix multiplication" can mean multiplying
> a matrix by either another matrix or a scalar. So my attempt to
> nitpick here may be off-base. Of course, elsethread you don't
> seem to find Wikipedia a credible source. "Whatever."
Interesting. I've never heard the term used that way.
> gcc seems to be reasonably smart about generating shifts rather
> than multiplies if n is something that's known at compile time,
> even though (as far as I know) one must use the "pow" function
> to express exponentiation in C.
Yup.
>> I'm no big fan of these micro-optimizations, however, because I prefer
>> a nobler goal, which is the complete destruction of the very idea of
>> terminating a string with a Nul.
And I have to say, your code certainly did a great job of undermining the
idea that strings were predictably null-terminated. At least in your code.
>> Basically, I was messing with Seeb's
>> head.
....
Wait, does this imply that he thinks the << thing is some kind of secret
that not everyone knows? As an idle curiosity, I've asked my coworkers
to see whether any of them *haven't* seen that.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
usenet-nospam (2216)
|
2/25/2010 3:35:01 PM
|
|
On 2010-02-25, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> Sure. I just thought it might be fun to try to come up with a
> semi-formal specification that *doesn't* involve narratives about
> what the program is doing. I like that sort of thing, but I guess
> not everyone does. "Whatever." [*]
Actually, that's a good point. It's clearer in that it tells you the
answer without requiring you to think it through. Thanks to spinoza1111,
we are now aware that at least some people can implement a solution to
the problem without ever thinking about how it works enough to realize
that there is no question of overlapping strings. Clarifying it explicitly
is probably beneficial.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
usenet-nospam (2216)
|
2/25/2010 3:36:28 PM
|
|
On 2010-02-25, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> Agreed, anyway, that in almost all circumstances clear simple code is
> to be preferred. Indeed -- my original naive solution is almost as
> fast as the "improved" version, as long as I use the C-library versions
> of the string.h functions, and it took less time to write and debug
> than the more complicated solution.
Idle curiosity: How's mine do? I haven't checked to see what the official
interface is, but I'm pretty sure this is adequately obvious. It presumably
suffers from double-scanning, but I don't know how much that matters. It
doesn't do a lot of mallocs.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Replace every non-overlapping occurrence of "in" within "target" by
 * "out". Returns a malloc'd string (caller frees), or 0 on bad
 * arguments or allocation failure.
 */
char *
rep(const char *in, const char *out, const char *target) {
    char *result = 0;
    const char *s;
    char *t, *u;
    size_t inlen, outlen, targetlen, resultlen;
    int count;

    if (!in || !out || !target || !*in)
        return 0;
    inlen = strlen(in);
    outlen = strlen(out);
    targetlen = strlen(target);

    /* First pass: count the matches so the result can be sized. */
    for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
        ++count;
        t += inlen;
    }
    resultlen = targetlen + (outlen * count) - (inlen * count);
    result = malloc(resultlen + 1);
    if (!result)
        return result;
    u = result;
    *u = '\0';
    s = target;

    /* Second pass: copy the text before each match, then "out". */
    for (t = strstr(target, in); t && *t; t = strstr(t, in)) {
        memcpy(u, s, t - s);
        u += t - s;
        memcpy(u, out, outlen + 1);
        u += outlen;
        t += inlen;
        s = t;
    }
    strcpy(u, s);    /* whatever follows the last match */
#ifdef DEBUG
    fprintf(stderr, "replaced %d occurrences, new length %d, actual %d\n",
        count, (int) resultlen, (int) strlen(result));
#endif
    return result;
}
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
usenet-nospam (2216)
|
2/25/2010 3:39:02 PM
|
|
<blmblm@myrealbox.com> wrote in message
news:7uffggFfl7U3@mid.individual.net...
>> > io_x is from
>> > msg-id Message-ID: <4b730781$0$822$4fafbaef@reader5.news.tin.it>
There is one error in that message:
.e:   push edi
      call _free
.e0:  xor eax, eax
has to be written as
.e:   push edi
      call _free
      add esp, 4
.e0:  xor eax, eax
I forgot to clear the stack.
But why not use this one instead:
news:4b7a82ed$0$1109$4fafbaef@reader4.news.tin.it
Yes, there is a small improvement there,
and the replace routine takes different arguments:
char* __stdcall Replacer_m(unsigned* len,
                           char* origin, char* whatSost, char* sost);
but it is possible to write
char* replace_iox(char* origin, char* whatSost, char* sost)
{
    unsigned len;
    return Replacer_m(&len, origin, whatSost, sost);
}
It should still always be at least 3 times slower than yours.
But for now, that is enough here.
>> > blmblm-$V-$L [1] is from
>> > Message-ID: <7u7ltqF80cU1@mid.individual.net>
>> >
>> > [1] V = version, L = string library (C or user-written)
>> >
>
> thomasson (O2) 4.08 seconds
> thomasson (O3) 4.08 seconds
>
>> bacarisse (O2) 4.47 seconds
>> bacarisse (O3) 4.48 seconds
>>
>>
>> > harter-1 (O2) 5.67 seconds
>> > harter-1 (O3) 8.48 seconds
>> > harter-2 (O2) 5.94 seconds
>> > harter-2 (O3) 6.51 seconds
>> > io_x (O2) 18.05 seconds [1]
>> > io_x (O3) **** seconds (segfault) [2]
>> > nilges (O2) 7.72 seconds
>> > nilges (O3) 7.69 seconds
>> > willem (O2) 7.18 seconds
>> > willem (O3) 7.23 seconds
>> >
>> > [2] These results may be misleading -- I wasn't sure how to
>> > generate an executable from the mix of C and assembly and more or
>> > less tried things until I got a clean compile/link, with:
>> >
>> > nasm -felf -o replacer.o replacer.s
>> > gcc -o tester -Wall -pedantic -std=c99 -On tester.c replacer.o
>> > (n=2 or n=3)
>> >
>> > blmblm-1-C (O2) 10.78 seconds
>> > blmblm-1-user (O2) 35.97 seconds
>> > blmblm-2-C (O2) 9.60 seconds
>> > blmblm-2-user (O2) 35.10 seconds
>> > blmblm-3-C (O2) 7.86 seconds
>> > blmblm-3-user (O2) 8.58 seconds
>> >
>> > blmblm-1-C (O3) 10.36 seconds
>> > blmblm-1-user (O3) 37.58 seconds
>> > blmblm-2-C (O3) 9.45 seconds
>> > blmblm-2-user (O3) 32.96 seconds
>> > blmblm-3-C (O3) 7.49 seconds
>> > blmblm-3-user (O3) 7.72 seconds
>> >
|
io_x
|
2/25/2010 3:51:22 PM
|
|
On Feb 25, 11:35 pm, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-25, blmblm myrealbox.com <blm...@myrealbox.com> wrote:
>
> > As best I can tell, he doesn't read anything you're posting here
> > unless someone else quotes it -- or at least he doesn't reply,
> > so your taunting .... ah well, I suppose now that I've quoted
> > you he may choose to reply. Or not.
>
> Exactly. He's plonked, because that way I see only the funny parts.
No, you're too cowardly to talk to me. You haven't plonked jackshit.
>
> > I would be very surprised to hear that anyone who talks about
> > code as he does would *not* be aware of this particular trick.
> > Inventing it for oneself might require "thinking outside the box";
> > being aware that it exists -- not so much.
>
> Yeah. I still remember:
>         i << 3 + i + i
> but that doesn't mean I'd ever use it.
>
> It appears that he meant, not matrix multiplication, but multiplication of
> every element in a matrix by a fixed value. For some reason, he seems to
> think that multiplying everything in a matrix by a compile-time constant is
> a likely operation. For some other reason, probably a less plausible one,
> he seems to think that the shift-as-multiply trick actually buys you
> something.
Yes, I do, Mr No Compsci.
>
> >> No, I just want to multiply every element, no matter what its value,
> >> by a power of two:
> > *But this is not matrix multiplication* -- or at least, not as I
> > define the term. Curiously enough, however, Wikipedia seems to
> > agree with you that "matrix multiplication" can mean multiplying
> > a matrix by either another matrix or a scalar. So my attempt to
> > nitpick here may be off-base. Of course, elsethread you don't
> > seem to find Wikipedia a credible source. "Whatever."
>
> Interesting. I've never heard the term used that way.
Your lack of intellectual curiosity isn't an argument.
>
> > gcc seems to be reasonably smart about generating shifts rather
> > than multiplies if n is something that's known at compile time,
> > even though (as far as I know) one must use the "pow" function
> > to express exponentiation in C.
>
> Yup.
>
> >> I'm no big fan of these micro-optimizations, however, because I prefer
> >> a nobler goal, which is the complete destruction of the very idea of
> >> terminating a string with a Nul.
>
> And I have to say, your code certainly did a great job of undermining the
> idea that strings were predictably null-terminated. At least in your code.
>
> >> Basically, I was messing with Seeb's
> >> head.
>
> ...
>
> Wait, does this imply that he thinks the << thing is some kind of secret
> that not everyone knows? As an idle curiosity, I've asked my coworkers
> to see whether any of them *haven't* seen that.
Sure, if you ask them. But you weren't able to answer my initial
question, where you had to call a possibility to mind. Which argues
against your debugging skills.
>
> -s
> --
> Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
|
spinoza1111 (3250)
|
2/25/2010 7:02:53 PM
|
|
On Feb 25, 5:29 pm, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> In article <bdac8263-e941-4955-9875-44d4ff741...@k18g2000prf.googlegroups.com>,
>
>
>
>
>
> spinoza1111 <spinoza1...@yahoo.com> wrote:
> > On Feb 23, 10:38 pm, blm...@myrealbox.com <blm...@myrealbox.com>
> > wrote:
> > > In article <d520a640-1606-407e-9b7f-b9c75f4d5...@s36g2000prf.googlegroups.com>,
>
> > > spinoza1111 <spinoza1...@yahoo.com> wrote:
> > > > On Feb 15, 8:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > > > > "Stefan Ram" <r...@zedat.fu-berlin.de> wrote in message
>
> > > > >news:rand-20100215000605@ram.dialup.fu-berlin.de...
>
> > > [ snip ]
>
> > > > And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
> > > > STRINGS. If you use it to construct a table of replace points you're
> > > > gonna have an interesting bug-o-rama:
>
> > > > replace("banana", "ana", "ono")
>
> > > > IF you restart one position after the find point, and not at its end.
>
> > > Why would you do that, though? only if you *wanted* to detect
>
> > Search me. But that's what the code I was discussing actually did.
>
> What code is that? I've traced back through predecessor posts, and
> the only one that comes close to including code is the one in which
> Chris Thomasson references his code in
>
> http://clc.pastebin.com/f62504e4c
>
> which on a quick skim doesn't seem to me to be looking for
> overlapping strings.
My code handled string overlap after the bug was pointed out to me,
BEFORE any other code. I'm too sick of the shit that goes on here to
make a collection of all solutions and find what probably are many
failures, but one of my contributions was to pass on the test case.
There's a lot of claims and counterclaims here and at least two
discussants are complete shitheads. However, we KNOW that other
posters used the test suite I created AFTER my code worked with that
test data.
>
> > > overlapping strings, and -- if you did detect them, what would
> > > you do with them? I can't think of any sensible definition of
> > > "replace" that does anything with overlapping strings [*], so
>
> > replace(banana, ana, ono) could equal
>
> > bonona going left to right without overlap
> > banono going right to left without overlap
> > bonono going both ways with overlap
>
> There's a semi-sane answer here in the last case, but isn't
HOW.DARE.YOU. How DARE you start talking about sanity? It isn't
collegial, and it is libel and completely insensitive. It's talking
like those thugs and shitheads here, Seebach and Heathfield.
My late uncle was a personal physician of Lyndon Johnson who was
misdiagnosed as having a mental disorder and died alone in a Veteran's
Administration hospital.
I saw John Nash mistreated at Princeton when he was almost fully
recovered, and I was assigned to him when other programmers refused to
work with him. I saw his points dismissed by an arrogant John Horton
Conway at the John von Neumann center in 1989. When he won the Nobel
prize, of course, people clamored to kiss his ass, but it was a very
small group of people who DON'T reason, as Seebach seems to, from
quirks! Who supported him during his dark years! These were not the
sort of people that call ideas "insane", any more than they call
people "insane".
HOW.DARE.YOU.
> that because there are some factors at work that won't be
> generally true? What about replace(banana, ana, xno)?
> Should that be bxnono or bxnxno?
>
The fact that there is a group of answers does not make the question a
question of a crazy man! In fact, it makes it a good scientific
question, albeit over the heads of the creeps here.
Jesus H. Fuck, I'm glad I no longer have to work with programmers!
> > The third case could arise in natural language processing.
>
> > Suppose that in some language, the ana sound is transformed into the
> > ono sound to transform present into past tense (weirder things
> > happen), and suppose speakers do this to ALL occurrences of the three
> > tones a, voiced n, a. When the sounds are adjacent they are
> > nonetheless distinct in speech but not in writing.
>
> > Now, the response of most garden-variety "break room" programmers is
> > "that's bullshit, and can never happen". But we know that in
> > programming, many strange things can happen, and that as Hamlet
> > admonished Horatio, we must "as a stranger give it welcome". Many more
> > strange things can happen outside programming, and programmers, even
> > of the Hooters or break room ilk, better realize this when programming
> > is used to solve problems.
>
> "Whatever." I'm not convinced yet that it's possible to come
> up with a sensible specification for what it means to replace
> overlapping occurrences of selected text. Absent such a
> specification -- eh, whatever.
I gave it to you: a hypothetical but possible natural language in
which adjacent lexemes must be split and modified. And what's this
"whatever"?
>
> > > when I wrote my first solution to this problem, I of course used
> > > strstr and of course started scanning for the next match after
> > > the end of the previous match.
>
> > > [*] Chris Thomasson's reply to your post points out the ambiguities.
>
> > > > Moral: don't let the library do your thinking for you.
>
> > > Mostly I'm replying to this rather old post, though, because it
> > > seems as good a place as any to attempt a more-or-less formal
> > > specification of the problem, which I'm not sure we have, and
> > > which might be interesting.  (Apologies if I missed one somewhere.)
>
> > > Here's my proposed specification, in which "is not a substring of"
> > > and "concat" have what I hope are obvious meanings, and names
> > > beginning s_ denote strings:
>
> > > replace(s_input, s_old, s_new) yields
>
> > > if s_old is not a substring of s_input
>
> > >   s_input
>
> > > else
>
> > >   concat(s_input_prefix, s_new, replace(s_input_tail, s_old, s_new))
>
> > >   where s_input_prefix and s_input_tail are such that
>
> > >     s_input = concat(s_input_prefix, s_old, s_input_tail)
>
> > >   and
>
> > >     s_old is not a substring of s_input_prefix
>
> > This is fine as long as we understand your concat as NOT specifying
> > left to right or right to left direction.
>
> How could it imply a direction?  String concatenation seems to me to
> be a pretty straightforward and well-defined operation, which could
> even be written as an associative binary operator, no?
Well, my greater experience with object oriented development in C
Sharp and VB has taught me that given either an adequate OO language,
or sufficient intelligence and patience, concat can work either way
without much drama. In C, the direction has to be a crummy parameter
that is easy to get wrong.
>
> > The bonono problem would
> > have to be handled by preprocessing (translating banana into banaana,
> > perhaps using a rule that "vowels" (in the language) can always be
> > duplicated because their sounds don't break).
>
> Well, now you seem to be moving in a direction that might eventually
> lead to a sensible specification.  "Carry on", maybe.
Don't patronize me.
>
> > > Somewhat more formal than the "scans left to right and ignores
> > > overlapping strings", though whether it's actually clearer might
> > > be a matter of opinion.
>
> > > And then perhaps we could replace the "does not use string.h"
> > > constraint with something more meaningful [*], though I'm not
> > > sure what.
>
> > How about "does not use string.H and gets rid of the Obscene
> > Excrescence: the use of Nul to terminate a string, thereby creating a
> > new C that is almost fit for use".
>
> Well, even my first naive solution to this problem didn't use
> string.H ....  But we've already had the discussion about case
> sensitivity, so no need to do that again.
>
> In my opinion the functions declared in string.h include some
> that are very useful in writing a replace() function as specified
> here -- I think of them as useful abstractions for the problem
> domain.  Could one define similar functions if strings were *not*
> represented as null-terminated contiguous sequences of characters ....
>
> Well, certainly one could, but whether they'd be unacceptably
> inefficient might depend on how strings were represented.
> C's approach to representing strings allows one to have multiple
> strings in a single character array and to easily regard a suffix
> of a string as a string in its own right, both of which strike me
> as useful in context.  One could (I think) get a similar effect
> by defining strings to be objects consisting of a length and a
> pointer into an array of characters.  If strings were represented
> as a length immediately followed by a sequence of characters --
> not so much.
>
> I'm not sure I'm explaining this very well or thinking it through
> carefully, but perhaps it will advance the discussion a bit anyway.
>
> > > [*] My objection to this constraint is that any minimally competent
> > > programmer should be able to write functions that implement the
> > > same API, so just avoiding use of the library functions doesn't
> > > seem to me to make the problem more interesting.
>
> > No. The API locks us into bad thoughts.
>
> I'd say "how so?" but I'm not optimistic about getting an answer
> I'd find useful.
I've already told you. Strings terminated ON THE RIGHT with a Nul is a
bad thought for two reasons:
* It prevents Nul from occurring in a string
* It mandates Eurocentric left to right processing
My respect-o-meter in your case is diminishing.
As to a length code limiting string length, do the math. 2^63 - 1 or
2^64 - 1 is a big number, and run length codes can be used, especially
in OO programming. Furthermore, even if the string is longer, you can
still process it with an unknown length, which OO programming handles
quite nicely.
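The length-and-pointer representation mentioned in the quoted text can be sketched in C as follows. This is only an illustration under stated assumptions: the type name cstr and the helper names are hypothetical and do not come from any code posted in the thread. It shows the two properties under discussion: the bytes may include '\0', and a suffix is still a string in its own right without copying.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: a length plus a pointer into a character
   array.  No terminator is needed, so the bytes may include '\0'. */
struct cstr {
    size_t len;
    const char *ptr;
};

/* View of the first n bytes at p (no copy is made). */
static struct cstr cstr_from(const char *p, size_t n)
{
    struct cstr s;
    s.len = n;
    s.ptr = p;
    return s;
}

/* Suffix starting at offset k, sharing storage with s. */
static struct cstr cstr_suffix(struct cstr s, size_t k)
{
    if (k > s.len)
        k = s.len;
    s.ptr += k;
    s.len -= k;
    return s;
}

/* Offset of the first occurrence of byte c in s, or s.len if absent. */
static size_t cstr_find_byte(struct cstr s, int c)
{
    const char *hit = s.len ? memchr(s.ptr, c, s.len) : NULL;
    return hit ? (size_t)(hit - s.ptr) : s.len;
}
```

For the five bytes 'a', 'b', '\0', 'c', 'd', cstr_find_byte locates the embedded zero at offset 2, while strlen on the same pointer stops there and reports 2.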
>
> --
> B. L. Massingill
> ObDisclaimer:  I don't speak for my employers; they return the favor.
spinoza1111 (3250)
2/25/2010 7:23:31 PM
|
On 25/02/2010 19:23, spinoza1111 wrote:
> On Feb 25, 5:29 pm, blm...@myrealbox.com<blm...@myrealbox.com> wrote:
>> In article<bdac8263-e941-4955-9875-44d4ff741...@k18g2000prf.googlegroups.com>,
[snip]
>> There's a semi-sane answer here in the last case, but isn't
>
> HOW.DARE.YOU. How DARE you start talking about sanity? It isn't
> collegial, and it is libel and completely insensitive. It's talking
> like those thugs and shitheads here, Seebach and Heathfield.
Hmmm, I was wondering how long it would take. You oil up to women
posters here until they find you out and then you lose it with them too.
It becomes clearer and clearer how Richard Heathfield's list so aptly
describes you.
I hope you also realise that by screaming that you want Mr Heathfield
"out of this ng", IOW by appointing yourself as clc monitor, you become
the chief "reg", and your brown-nosers "Richard", "Kenny", and Twinky
become your co-conspirators.
--
Tim
"That the freedom of speech and debates or proceedings in Parliament
ought not to be impeached or questioned in any court or place out of
Parliament"
Bill of Rights 1689
timstreater (943)
2/25/2010 7:39:27 PM
|
On 2010-02-25, Tim Streater <timstreater@waitrose.com> wrote:
> Hmmm, I was wondering how long it would take. You oil up to women
> posters here until they find you out and then you lose it with them too.
> It becomes clearer and clearer how Richard Heathfield's list so aptly
> describes you.
More generally, I don't recall ever seeing "how dare you" not be coming from
someone who was consistently and demonstrably more abusive than the people
he/she/it was complaining about.
This gets back to the general question of heuristics in evaluating code or
coders. There are things that tell you that Something Is Wrong. They're
not totally reliable, but they're good bets.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
usenet-nospam (2216)
2/25/2010 7:39:37 PM
|
Seebs <usenet-nospam@seebs.net> writes:
> On 2010-02-25, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
>> Agreed, anyway, that in almost all circumstances clear simple code is
>> to be preferred. Indeed -- my original naive solution is almost as
>> fast as the "improved" version, as long as I use the C-library versions
>> of the string.h functions, and it took less time to write and debug
>> than the more complicated solution.
>
> Idle curiosity: How's mine do? I haven't checked to see what the official
> interface is, but I'm pretty sure this is adequately obvious. It presumably
> suffers from double-scanning, but I don't know how much that matters. It
> doesn't do a lot of mallocs.
It all depends on the length of the string and probably on the quality
of the library. Yours is ps_replace. I put it next to my first one,
bb_replace, because they are structurally the same (as is Willem's
first which I have not timed since I expected it to be the same as
mine).
replace(`cat d4004`, "[]", "xx"):
blm_replace 4086300 calls in 4.998s is 1.223e-06 s/call 1.223 µs/call
rh_replace2 647200 calls in 5.000s is 7.726e-06 s/call 7.726 µs/call
cmt_replace 5229000 calls in 5.000s is 9.562e-07 s/call 956.2 ns/call
w_replace 617700 calls in 5.001s is 8.096e-06 s/call 8.096 µs/call
en_replace 627100 calls in 4.999s is 7.972e-06 s/call 7.972 µs/call
fast_replace 1424600 calls in 5.000s is 3.51e-06 s/call 3.51 µs/call
bb_replace 3141500 calls in 5.000s is 1.592e-06 s/call 1.592 µs/call
ps_replace 3138400 calls in 5.000s is 1.593e-06 s/call 1.593 µs/call
replace(`cat d4004`, "{}", "xx"):
blm_replace 4630500 calls in 4.999s is 1.08e-06 s/call 1.08 µs/call
rh_replace2 648700 calls in 5.001s is 7.709e-06 s/call 7.709 µs/call
cmt_replace 5258400 calls in 4.999s is 9.507e-07 s/call 950.7 ns/call
w_replace 620100 calls in 5.001s is 8.064e-06 s/call 8.064 µs/call
en_replace 640500 calls in 4.999s is 7.806e-06 s/call 7.806 µs/call
fast_replace 1300000 calls in 5.000s is 3.846e-06 s/call 3.846 µs/call
bb_replace 2985300 calls in 5.000s is 1.675e-06 s/call 1.675 µs/call
ps_replace 2991000 calls in 5.000s is 1.672e-06 s/call 1.672 µs/call
replace(`cat wap.txt`, "and", "xx"):
blm_replace 200 calls in 5.143s is 0.02571 s/call 25.71 ms/call
rh_replace2 500 calls in 5.540s is 0.01108 s/call 11.08 ms/call
cmt_replace 200 calls in 4.537s is 0.02269 s/call 22.69 ms/call
w_replace 500 calls in 5.205s is 0.01041 s/call 10.41 ms/call
en_replace 400 calls in 4.612s is 0.01153 s/call 11.53 ms/call
fast_replace 600 calls in 5.214s is 0.008689 s/call 8.689 ms/call
bb_replace 200 calls in 8.728s is 0.04364 s/call 43.64 ms/call
ps_replace 100 calls in 4.397s is 0.04397 s/call 43.97 ms/call
replace(`cat wap.txt`, "ZZZ", "xx"):
blm_replace 300 calls in 6.698s is 0.02233 s/call 22.33 ms/call
rh_replace2 700 calls in 4.446s is 0.006351 s/call 6.351 ms/call
cmt_replace 300 calls in 6.549s is 0.02183 s/call 21.83 ms/call
w_replace 700 calls in 4.673s is 0.006675 s/call 6.675 ms/call
en_replace 800 calls in 5.167s is 0.006459 s/call 6.459 ms/call
fast_replace 1100 calls in 4.465s is 0.004059 s/call 4.059 ms/call
bb_replace 200 calls in 8.509s is 0.04255 s/call 42.55 ms/call
ps_replace 200 calls in 8.541s is 0.04271 s/call 42.71 ms/call
replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
blm_replace 7714500 calls in 4.999s is 6.48e-07 s/call 648 ns/call
rh_replace2 31172100 calls in 5.000s is 1.604e-07 s/call 160.4 ns/call
cmt_replace 8842600 calls in 5.000s is 5.654e-07 s/call 565.4 ns/call
w_replace 26458800 calls in 5.000s is 1.89e-07 s/call 189 ns/call
en_replace 12170600 calls in 5.000s is 4.108e-07 s/call 410.8 ns/call
fast_replace 22489800 calls in 5.000s is 2.223e-07 s/call 222.3 ns/call
bb_replace 6505100 calls in 5.000s is 7.686e-07 s/call 768.6 ns/call
ps_replace 6331000 calls in 5.000s is 7.898e-07 s/call 789.8 ns/call
replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
blm_replace 43166400 calls in 4.999s is 1.158e-07 s/call 115.8 ns/call
rh_replace2 43100600 calls in 5.000s is 1.16e-07 s/call 116 ns/call
cmt_replace 22713000 calls in 5.000s is 2.201e-07 s/call 220.1 ns/call
w_replace 53059200 calls in 5.000s is 9.423e-08 s/call 94.23 ns/call
en_replace 35975500 calls in 5.000s is 1.39e-07 s/call 139 ns/call
fast_replace 49671300 calls in 5.000s is 1.007e-07 s/call 100.7 ns/call
bb_replace 31664100 calls in 5.000s is 1.579e-07 s/call 157.9 ns/call
ps_replace 31432800 calls in 5.000s is 1.591e-07 s/call 159.1 ns/call
The tests are explained elsewhere, but I will repeat the details if
you like. The first two are modelled on B L Massingill's test data
(for comparison).
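Since the code was snipped, here is a rough sketch of how figures of the form "N calls in T s is X s/call" could be produced. It is not the harness used for the numbers above. The names time_calls and sample_work are hypothetical, and running in batches is only a guess suggested by the round call counts in the tables.

```c
#include <time.h>

typedef void (*bench_fn)(void);

/* Keeps the stand-in workload from being optimized away. */
static volatile unsigned long sink;

/* Stand-in workload; a real run would call one of the replace functions. */
static void sample_work(void)
{
    unsigned long i;
    for (i = 0; i < 1000; i++)
        sink += i;
}

/* Call fn in batches until at least min_seconds of CPU time has elapsed.
   Returns the number of calls made and stores the seconds per call. */
static unsigned long time_calls(bench_fn fn, double min_seconds,
                                unsigned long batch, double *sec_per_call)
{
    unsigned long calls = 0, i;
    clock_t start = clock();
    double elapsed;

    if (batch == 0)
        batch = 1;
    do {
        for (i = 0; i < batch; i++)
            fn();
        calls += batch;
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < min_seconds);

    *sec_per_call = elapsed / (double)calls;
    return calls;
}
```

Note that clock() measures processor time rather than wall-clock time, and its resolution varies between systems, so very short runs are unreliable.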
<snip code>
--
Ben.
ben.usenet (6516)
2/25/2010 11:39:24 PM
|
On Feb 26, 3:39 am, Tim Streater <timstrea...@waitrose.com> wrote:
> On 25/02/2010 19:23, spinoza1111 wrote:
>
> > On Feb 25, 5:29 pm, blm...@myrealbox.com<blm...@myrealbox.com>  wrote:
> >> In article<bdac8263-e941-4955-9875-44d4ff741...@k18g2000prf.googlegroups.com>,
>
> [snip]
>
> >> There's a semi-sane answer here in the last case, but isn't
>
> > HOW.DARE.YOU. How DARE you start talking about sanity? It isn't
> > collegial, and it is libel and completely insensitive. It's talking
> > like those thugs and shitheads here, Seebach and Heathfield.
>
> Hmmm, I was wondering how long it would take. You oil up to women
> posters here until they find you out and then you lose it with them too.
> It becomes clearer and clearer how Richard Heathfield's list so aptly
> describes you.
No, I treat all posters, male or female, with decency and respect
until they foolishly or ignorantly or thoughtlessly use their relative
anonymity to start introducing off-topic personal abuse.
You see, it's been impossible for anyone to narrate this type of
relation, since the 1960s, as anything but the weakness they feel in
themselves.
I don't give a flying fuck about blm's gender, and I ain't lookin' for
a date. Are you? Had she been a male, acting with relative restraint
and collegiality as she has, I would have said the same thing. Had
this male then started in, calling good questions "insane" (as John
Conway treated Nash) I would have responded in exactly the same way.
Men have learned self-protectively to narrate lives without honor, but
I missed the class.
>
> I hope you also realise that by screaming that you want Mr Heathfield
> "out of this ng", IOW by appointing yourself as clc monitor, you become
> the chief "reg", and your brown-nosers "Richard", "Kenny", and Twinky
> become your co-conspirators.
Gee, I have fans? I wasn't aware of that. I don't think Richard, Kenny
or da Twink count. I think they are instead independent spirits who
think for themselves.
>
> --
> Tim
>
> "That the freedom of speech and debates or proceedings in Parliament
> ought not to be impeached or questioned in any court or place out of
> Parliament"
>
> Bill of Rights 1689
spinoza1111 (3250)
2/26/2010 3:28:07 AM
|
On 25 Feb, 19:23, spinoza1111 <spinoza1...@yahoo.com> wrote:
> On Feb 25, 5:29 pm, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > In article <bdac8263-e941-4955-9875-44d4ff741...@k18g2000prf.googlegroups.com>,
<snip>
> > > replace(banana, ana, ono) could equal
>
> > > bonona going left to right without overlap
> > > banono going right to left without overlap
> > > bonono going both ways with overlap
>
> > There's a semi-sane answer here in the last case, but isn't
>
> HOW.DARE.YOU. How DARE you start talking about sanity?
this is talking about the /solution/ not the person. Just as when I
referred to another poster's solution to a problem as "value based" I
wasn't validating his morals and ethics. A sane solution is one that
isn't crazy.
Are you paranoid?
<snip name-dropping rant>
<snip expletive laden stuff>
> > > No. The API locks us into bad thoughts.
I could swear you told me there were no bad books. And yet there are
bad thoughts? double plus ungood.
> > I'd say "how so?" but I'm not optimistic about getting an answer
> > I'd find useful.
>
> I've already told you. Strings terminated ON THE RIGHT with a Nul is a
> bad thought for two reasons:
>
> *  It prevents Nul from occurring in a string
a count-preceded string is bad because it limits the maximum size of
the string. I suppose a chain of blocks doesn't have this limitation.
> *  It mandates Eurocentric left to right processing
nope. You are confusing representation and presentation. The nul isn't
at the right hand end it's at the largest address. If the display
device chooses to print r-to-l instead of l-to-r it makes not the
blindest bit of difference.
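The distinction between storage order and display order can be sketched like this. It is a deliberately simplified illustration: real right-to-left text handling is far more involved, and the function name present_reversed is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy the characters of src into dst in reverse order, as a very naive
   right-to-left presentation layer might.  dst must have room for
   strlen(src) + 1 bytes.  src itself is untouched: its terminator stays
   at the highest address whichever way the text is displayed. */
static void present_reversed(char *dst, const char *src)
{
    size_t n = strlen(src), i;

    for (i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
    dst[n] = '\0';
}
```

The stored string and all code that processes it are the same in both cases; only the final presentation step differs.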
[...]
> As to a length code limiting string length, do the math. 2^63 - 1 or
> 2^64 - 1 is a big number, and run length codes can be used, especially
> in OO programming. Furthermore, even if the string is longer, you can
> still process it with an unknown length, which OO programming handles
> quite nicely.
oh, 2^64 is a vast number (about 18x10^18 or 18 quintillion). It also takes
8 bytes to store.
nick_keighley_nospam (4575)
2/26/2010 9:15:16 AM
|
On Feb 26, 5:15 pm, Nick Keighley <nick_keighley_nos...@hotmail.com>
wrote:
> On 25 Feb, 19:23, spinoza1111 <spinoza1...@yahoo.com> wrote:
>
> > On Feb 25, 5:29 pm, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > > In article <bdac8263-e941-4955-9875-44d4ff741...@k18g2000prf.googlegroups.com>,
>
> <snip>
>
> > > > replace(banana, ana, ono) could equal
>
> > > > bonona going left to right without overlap
> > > > banono going right to left without overlap
> > > > bonono going both ways with overlap
>
> > > There's a semi-sane answer here in the last case, but isn't
>
> > HOW.DARE.YOU. How DARE you start talking about sanity?
>
> this is talking about the /solution/ not the person. Just as when I
> referred to another poster's solution to a problem as "value based" I
> wasn't validating his morals and ethics. A sane solution is one that
> isn't crazy.
>
> Are you paranoid?
You say you're talking about the "solution", but a spectre is haunting
c.l.c: normalized deviance and the strong possibility that because of
normalized deviance, most people here are batshit, and I'm not, which
could mean the opposite as well.
>
> <snip name-dropping rant>
>
> <snip expletive laden stuff>
>
> > > > No. The API locks us into bad thoughts.
>
> I could swear you told me there were no bad books. And yet there are
> bad thoughts? double plus ungood.
>
Sure. Freedom of speech is completely consistent with criticism of
published thought, but NOT by poseurs. Seebach, in my opinion, is a
poseur. Therefore, my freedom of speech enables me to call him a
poseur.
Your own thought seems to proceed in true regular guy mode: Chomsky 3
substitution of strings, but I can with perfect consistency say that
"because there are no bad books, the only proper critic of a book is
someone at least as well-qualified as the author in a free society: we
need not credit, and should not support, the 'free' expression of
thought by clearly unqualified people. In particular, public sites
such as Amazon should be strictly moderated."
"Once a book has made it to publication, we can be reasonably certain
that it's of minimal quality. But this is not possible in a public
access site. In view of the damage done to people at such sites, they
need to be controlled."
"The only way we can do this is by means of formal certification from
accredited universities, or its equivalent in the form of
certification exams administered by a government or nonprofit agency,
since the corporation is corrupted of necessity by its fiduciary duty
to make a profit before all else."
> > > I'd say "how so?" but I'm not optimistic about getting an answer
> > > I'd find useful.
>
> > I've already told you. Strings terminated ON THE RIGHT with a Nul is a
> > bad thought for two reasons:
>
> > *  It prevents Nul from occurring in a string
>
> a count-preceded string is bad because it limits the maximum size of
> the string. I suppose a chain of blocks doesn't have this limitation.
I suppose not. However, the length may be inexpressible if it exceeds
what in C is called "long long" precision. OO systems handle this
cleanly: it gives C the willies.
>
> > *  It mandates Eurocentric left to right processing
>
> nope. You are confusing representation and presentation. The nul isn't
> at the right hand end it's at the largest address. If the display
> device chooses to print r-to-l instead of l-to-r it makes not the
> blindest bit of difference.
What on EARTH does the word "print" mean? This is an issue I've raised
elsethread: that programmers, as ancillary and supernumerary people
now that data processing is a cost center, are collectively in
developed countries using a language from a dreamtime.
I'm aware that left to right can be reversed by layering software,
something at which C programmers suck because C sucks at it.
Your solution forces the wogs to get wog devices that print backwards.
However, it still forces developers to think eurocentrically with the
result that the non-Latin output is necessarily a second choice.
>
> [...]
>
> > As to a length code limiting string length, do the math. 2^63 - 1 or
> > 2^64 - 1 is a big number, and run length codes can be used, especially
> > in OO programming. Furthermore, even if the string is longer, you can
> > still process it with an unknown length, which OO programming handles
> > quite nicely.
>
> oh, 2^64 is a vast number (about 18x10^18 or 18 quintillion). It also takes
> 8 bytes to store.
Boo hoo. Like I said, people are thinking in comic book terms:
Alice: But Dr Nilges, you fool! A long long takes eight bytes to
store!
Nilges: (Evil laugh) I care not! Storage is almost free! I vill use 8
bytes per string and rule zee verld! Nyahh ha ha! Nice jugs! Nyah ha
ha!
Timmy: Mommy I'm scared!
Ruff: Woof woof!
After the waste of spirit in an expense of shame that is normalized
deviance, I'm game for Weird Science. Comic book thinking, like Peter
Seebach's invalid reasoning about quirkiness, is an excellent way of
producing The Usual Crap that we see here.
spinoza1111 (3250)
2/26/2010 4:48:26 PM
|
On Feb 26, 3:39 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-25, Tim Streater <timstrea...@waitrose.com> wrote:
>
> > Hmmm, I was wondering how long it would take. You oil up to women
> > posters here until they find you out and then you lose it with them too.
> > It becomes clearer and clearer how Richard Heathfield's list so aptly
> > describes you.
>
> More generally, I don't recall ever seeing "how dare you" not be coming from
> someone who was consistently and demonstrably more abusive than the people
> he/she/it was complaining about.
Somehow, that doesn't surprise me. No sound ever comes from the gates
of Eden, and my guess is that you grew up in some sort of addictive
and enabling system, one in which nobody ever raised their voices or
spoke truth to power.
A lot of people do.
And sure, you might as well use this factoid against me: some of my
biggest online fans, but not all, are homeless men accessing the
Internet at the public library, along with a striking number of women
of color.
Why don't you use that fact against me? I need a laugh. The fact is
that some of my most brilliant coworkers from Silicon Valley of the
1980s are dying in motels without health insurance, and chances R you
don't give a rat's ass.
>
> This gets back to the general question of heuristics in evaluating code or
> coders.  There are things that tell you that Something Is Wrong.  They're
> not totally reliable, but they're good bets.
Something is Wrong, all right. It's that you shit on people and pay
your way onto standards boards without any qualifications that are
independently verifiable of whatever makes your current employer a
quick buck. You've used the freedom and anonymity of the Internet to
make yourself seem more qualified than you are.
Furthermore, it's unethical to evaluate PEOPLE based on heuristics.
The only ethical procedure is to send them private email, or respond
to their private email and use this medium (or Skype video
conferencing) to verify what you so pompously call "heuristics", a big
word you don't understand, poser. But you're too much of a sissy,
girlie-man, wimp, coward and boor to do this, we have found, since I
sent you email asking for such a conference.
You could have saved clc a lot of bandwidth, you who so seem to value
machines and foolish programming languages and ideas over people and
their reputations, by installing Skype (it's free) and contacting
Edward.G.Nilges at a mutually agreeable time.
But you prefer, being addicted to a false but threatened self-image of
competence, discoursing with shadows.
More generally, Skype could be used to set up a public videoconference
for people to attend, I believe. I think clc needs this to clear the
air, and if possible, to (1) agree on a better behavioral charter and
(2) agree to disagree on undecidable issues.
But given your refusal to work with McGraw Hill, I don't think you
have the guts for this.
>
> -s
> --
> Copyright 2010, all wrongs reversed.  Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
spinoza1111 (3250)
2/26/2010 5:05:03 PM
|
On Feb 26, 5:15 pm, Nick Keighley <nick_keighley_nos...@hotmail.com>
wrote:
> On 25 Feb, 19:23, spinoza1111 <spinoza1...@yahoo.com> wrote:
>
> > On Feb 25, 5:29 pm, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > > In article <bdac8263-e941-4955-9875-44d4ff741...@k18g2000prf.googlegroups.com>,
>
> <snip>
>
> > > > replace(banana, ana, ono) could equal
>
> > > > bonona going left to right without overlap
> > > > banono going right to left without overlap
> > > > bonono going both ways with overlap
>
> > > There's a semi-sane answer here in the last case, but isn't
>
> > HOW.DARE.YOU. How DARE you start talking about sanity?
>
> this is talking about the /solution/ not the person. Just as when I
> referred to another poster's solution to a problem as "value based" I
> wasn't validating his morals and ethics. A sane solution is one that
> isn't crazy.
>
> Are you paranoid?
>
> <snip name-dropping rant>
>
> <snip expletive laden stuff>
>
> > > > No. The API locks us into bad thoughts.
>
> I could swear you told me there were no bad books. And yet there are
> bad thoughts? double plus ungood.
>
> > > I'd say "how so?" but I'm not optimistic about getting an answer
> > > I'd find useful.
>
> > I've already told you. Strings terminated ON THE RIGHT with a Nul is a
> > bad thought for two reasons:
>
> > *  It prevents Nul from occurring in a string
>
> a count-preceded string is bad because it limits the maximum size of
> the string. I suppose a chain of blocks doesn't have this limitation.
>
> > *  It mandates Eurocentric left to right processing
>
> nope. You are confusing representation and presentation. The nul isn't
> at the right hand end it's at the largest address. If the display
> device chooses to print r-to-l instead of l-to-r it makes not the
> blindest bit of difference.
>
> [...]
>
> > As to a length code limiting string length, do the math. 2^63 - 1 or
> > 2^64 - 1 is a big number, and run length codes can be used, especially
> > in OO programming. Furthermore, even if the string is longer, you can
> > still process it with an unknown length, which OO programming handles
> > quite nicely.
>
> oh, 2^64 is a vast number (about 18x10^18 or 18 quintillion). It also takes
> 8 bytes to store.
Nit picking, it takes nine. You're thinking of 2**64-1 at least in
two's complement, which Herb Schildt knows is in common use. But I
understand your point.
Of course, a length of a string is never negative (barring something
Richard Heathfield's code or linked list might do to strings: the mind
boggles) so if it's unsigned long long, then yes, that's 8 bytes,
idn't it.
Nit picking and brain farting merrily along, you could use run length
encoding or something more complicated to reduce the length of the
length to the sum of the length of segments.
In OO code, you can in fact store the length in whatever is
appropriate for the string. You can do the same with unions in C, but
many managers get the willies when programmers code the word union,
har har.
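One common way to shrink "the length of the length" is a variable-length integer. To be clear, the sketch below is a base-128 varint in the style of LEB128, swapped in as an example of "something more complicated"; it is not run-length encoding and not anything posted in the thread. The names varint_put and varint_get are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode len as a little-endian base-128 varint: 7 payload bits per byte,
   with the high bit set on every byte except the last.
   out must have room for up to 10 bytes.  Returns bytes written. */
static size_t varint_put(uint64_t len, unsigned char *out)
{
    size_t n = 0;

    while (len >= 0x80) {
        out[n++] = (unsigned char)((len & 0x7F) | 0x80);
        len >>= 7;
    }
    out[n++] = (unsigned char)len;
    return n;
}

/* Decode a varint written by varint_put.  Returns bytes consumed.
   This sketch trusts its input and does no bounds or overflow checks. */
static size_t varint_get(const unsigned char *in, uint64_t *len)
{
    size_t n = 0;
    unsigned shift = 0;
    uint64_t v = 0;

    for (;;) {
        unsigned char b = in[n++];
        v |= (uint64_t)(b & 0x7F) << shift;
        if (!(b & 0x80))
            break;
        shift += 7;
    }
    *len = v;
    return n;
}
```

With this scheme a length below 128 costs one byte, while the largest 64-bit length costs ten, so short strings pay less overhead than a fixed 8-byte count and very long ones pay slightly more.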
spinoza1111 (3250)
2/26/2010 5:10:44 PM
|
blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
> In article <0.d1ae414487ac948ef039.20100223230538GMT.873a0rblj1.fsf@bsb.me.uk>,
> Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
<snip>
>> Here are my times (also gcc 4.4.1 and libc 2.10.1).
<snip>
> Indeed. Should we compare hardware configurations?
I'll do this by email unless anyone else says they are still
interested in this (the differences in times, rather than the times
themselves).
<snip>
--
Ben.
ben.usenet (6516)
2/26/2010 5:25:19 PM
|
In article <bdf96965-f107-4ccf-83b6-64e5982cff38@u20g2000yqu.googlegroups.com>,
Nick Keighley <nick_keighley_nospam@hotmail.com> wrote:
> On 23 Feb, 14:38, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > In article <d520a640-1606-407e-9b7f-b9c75f4d5...@s36g2000prf.googlegroups.com>,
> > spinoza1111 <spinoza1...@yahoo.com> wrote:
[ snip ]
> > Mostly I'm replying to this rather old post, though, because it
> > seems as good a place as any to attempt a more-or-less formal
> > specification of the problem, which I'm not sure we have, and
> > which might be interesting. (Apologies if I missed one somewhere.)
>
> well I missed it too! A programming competition seemed to break out
> without it being clear (to me at least) what the program was supposed
> to do. By the time it had been sort of clarified I'd seen a fair
> amount of code.
>
> I'm just sorry I didn't think of doing it recursively! That was
> impressive.
Hm! All of my solutions have involved recursion in some form, though
perhaps not as elegantly as some of the more-cryptic-to-me solutions
posted.
> > Here's my proposed specification, in which "is not a substring of"
> > and "concat" have what I hope are obvious meanings, and names
> > beginning s_ denote strings:
> >
> > replace(s_input, s_old, s_new) yields
> >
> > if s_old is not a substring of s_input
> >
> > s_input
> >
> > else
> >
> > concat(s_input_prefix, s_new, replace(s_input_tail, s_old, s_new))
> >
> > where s_input_prefix and s_input_tail are such that
> >
> > s_input = concat(s_input_prefix, s_old, s_input_tail)
> >
> > and
> >
> > s_old is not a substring of s_input_prefix
> >
> > Somewhat more formal than the "scans left to right and ignores
> > overlapping strings", though whether it's actually clearer might
> > be a matter of opinion.
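The quoted specification translates almost line for line into C. The following is a minimal sketch, not any of the solutions posted in the thread. It assumes the string.h functions are allowed, picks the leftmost match via strstr (one choice that satisfies the "not a substring of s_input_prefix" condition), and returns NULL for an empty s_old, a case the specification does not address.

```c
#include <stdlib.h>
#include <string.h>

/* Recursive replace following the quoted specification.
   Returns a newly malloc'd string that the caller must free, or NULL on
   allocation failure or when s_old is empty. */
static char *replace(const char *s_input, const char *s_old,
                     const char *s_new)
{
    size_t old_len = strlen(s_old), new_len = strlen(s_new);
    size_t prefix_len, tail_len;
    const char *hit;
    char *tail, *out;

    if (old_len == 0)
        return NULL;

    /* Leftmost match, so s_old cannot occur inside the prefix. */
    hit = strstr(s_input, s_old);
    if (hit == NULL) {
        /* "if s_old is not a substring of s_input": yield s_input. */
        out = malloc(strlen(s_input) + 1);
        if (out != NULL)
            strcpy(out, s_input);
        return out;
    }

    /* concat(s_input_prefix, s_new, replace(s_input_tail, ...)) */
    prefix_len = (size_t)(hit - s_input);
    tail = replace(hit + old_len, s_old, s_new);
    if (tail == NULL)
        return NULL;
    tail_len = strlen(tail);

    out = malloc(prefix_len + new_len + tail_len + 1);
    if (out != NULL) {
        memcpy(out, s_input, prefix_len);
        memcpy(out + prefix_len, s_new, new_len);
        memcpy(out + prefix_len + new_len, tail, tail_len + 1);
    }
    free(tail);
    return out;
}
```

The recursion depth equals the number of matches and every level allocates, so this form favors closeness to the specification over speed. For replace("banana", "ana", "ono") it returns "bonona".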
>
> and of course like most formal specifications it's damn nearly code!
> Though you do get a better class of primitive.
"A better class of primitive"? for code, you mean?
> > And then perhaps we could replace the "does not use string.h"
> > constraint with something more meaningful [*], though I'm not
> > sure what.
> >
> > [*] My objection to this constraint is that any minimally competent
> > programmer should be able to write functions that implement the
> > same API, so just avoiding use of the library functions doesn't
> > seem to me to make the problem more interesting.
>
> there were some proposals for better APIs, ones that avoided
> repeatedly rescanning the string.
?
But here's another question .... Should the question of whether the
string is rescanned even be part of the API? I guess it *could* be;
I mean, I've encountered APIs that make guarantees about performance,
at the big-O level of abstraction.
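As an illustration of what such a guarantee might look like when written into an interface, here is a hedged sketch. The function count_matches and its contract are assumptions made for this example, not one of the proposals referred to above. Passing the lengths in means neither string is scanned just to find its end, and the comment states a worst-case bound.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of t in s, scanning left to right.
   Contract (illustrative only):
     - lengths are supplied by the caller, so no terminator scan occurs;
     - at most s_len * t_len byte comparisons are made in the worst case;
     - an empty t is defined to match nowhere. */
static size_t count_matches(const char *s, size_t s_len,
                            const char *t, size_t t_len)
{
    size_t i = 0, count = 0;

    if (t_len == 0 || t_len > s_len)
        return 0;
    while (i + t_len <= s_len) {
        if (memcmp(s + i, t, t_len) == 0) {
            count++;
            i += t_len;   /* skip past the match: no overlap */
        } else {
            i++;
        }
    }
    return count;
}
```

A caller that knows the count and all three lengths can compute the output size up front and allocate once, which is one way to avoid rescanning behind an unchanged replace() signature.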
> The virtue of using string.h is that
> you get some abstraction. Pages of inlined string.h-like functions is
> hard to comprehend, hard to debug and hard to test.
Agreed!
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/26/2010 5:49:59 PM
|
In article <8JidnTyTXp-x_xvWnZ2dnUVZ7rhi4p2d@bt.com>,
Richard Heathfield <rjh@see.sig.invalid> wrote:
> blmblm@myrealbox.com wrote:
>
> <snip>
>
> > "Whatever." [*]
> >
> > [*] Normally I would write <shrug> here, but that usage now has
> > unfortunate associations.
>
> HUH? I don't think I got that email. Could you explain further?
>
Well, I'd rather not give complete details, but I *thought* that
this usage had been employed by someone I would not want to be
perceived as emulating. But in reviewing the record, I think
I'm probably mistaken about that.
<shrug>
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/26/2010 5:51:02 PM
In article <slrnhod6fh.mbj.usenet-nospam@guild.seebs.net>,
Seebs <usenet-nospam@seebs.net> wrote:
> On 2010-02-25, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> > As best I can tell, he doesn't read anything you're posting here
> > unless someone else quotes it -- or at least he doesn't reply,
> > so your taunting .... ah well, I suppose now that I've quoted
> > you he may choose to reply. Or not.
>
> Exactly. He's plonked, because that way I see only the funny parts.
Ah, but do you? At least one of the people still interacting with
him (Richard Heathfield) snips non-technical content. I won't say
you're missing anything you'd enjoy, but there's plenty that a person
inclined to perceive and react to insult would -- react to.
> > I would be very surprised to hear that anyone who talks about
> > code as he does would *not* be aware of this particular trick.
> > Inventing it for oneself might require "thinking outside the box";
> > being aware that it exists -- not so much.
>
> Yeah. I still remember:
> i << 3 + i + i
> but that doesn't mean I'd ever use it.
Yipes. Where did you encounter this one? Multiplication by ten,
right?
> It appears that he meant, not matrix multiplication, but multiplication of
> every element in a matrix by a fixed value. For some reason, he seems to
> think that multiplying everything in a matrix by a compile-time constant is
> a likely operation. For some other reason, probably a less plausible one,
> he seems to think that the shift-as-multiply trick actually buys you
> something.
Well, to be fair, aren't there circumstances in which it might --
ones in which the programmer knows that one of the operands of the
multiply operator is a power of 2, but the compiler wouldn't be
able to detect that? It does seem like the kind of microoptimization
that one would hesitate to do without a compelling reason, though.
> >> No, I just want to multiply every element, no matter what its value,
> >> by a power of two:
>
> > *But this is not matrix multiplication* -- or at least, not as I
> > define the term. Curiously enough, however, Wikipedia seems to
> > agree with you that "matrix multiplication" can mean multiplying
> > a matrix by either another matrix or a scalar. So my attempt to
> > nitpick here may be off-base. Of course, elsethread you don't
> > seem to find Wikipedia a credible source. "Whatever."
>
> Interesting. I've never heard the term used that way.
I hadn't either. Is this where I brandish my educational
credentials (undergrad degree in math)? Nah. I think that's
rather gauche. Besides, that undergrad degree is pretty well-aged
by now, and it sometimes surprises me how much of what I presumably
learned is no longer retrievable from memory.
> > gcc seems to be reasonably smart about generating shifts rather
> > than multiplies if n is something that's known at compile time,
> > even though (as far as I know) one must use the "pow" function
> > to express exponentiation in C.
>
> Yup.
>
> >> I'm no big fan of these micro-optimizations, however, because I prefer
> >> a nobler goal, which is the complete destruction of the very idea of
> >> terminating a string with a Nul.
>
> And I have to say, your code certainly did a great job of undermining the
> idea that strings were predictably null-terminated. At least in your code.
>
> >> Basically, I was messing with Seeb's
> >> head.
>
> ...
>
> Wait, does this imply that he thinks the << thing is some kind of secret
> that not everyone knows? As an idle curiosity, I've asked my coworkers
> to see whether any of them *haven't* seen that.
Well, I interpreted the "messing with [his] head" remark to indicate
that he was trying to taunt you or provoke you in some way. But
I could be mistaken about that.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/26/2010 5:53:12 PM
In article <slrnhod6i8.mbj.usenet-nospam@guild.seebs.net>,
Seebs <usenet-nospam@seebs.net> wrote:
> On 2010-02-25, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> > Sure. I just thought it might be fun to try to come up with a
> > semi-formal specification that *doesn't* involve narratives about
> > what the program is doing. I like that sort of thing, but I guess
> > not everyone does. "Whatever." [*]
>
> Actually, that's a good point. It's clearer in that it tells you the
> answer without requiring you to think it through.
I'm not sure I understand what you mean by this -- I think you do still
have to think through how the function's output relates to its input,
but you can do this without thinking about how you would implement it.
I think of this as a sort of static perspective on specification, as
opposed to a dynamic one that involves thinking in terms of "first
the code does this, then it does that", and viewing things from that
perspective -- it's something I was taught to do in graduate school,
and once I caught on I found it amazingly powerful. But I'm a former
math major, which I think may bias me in favor of formal approaches.
> Thanks to spinoza1111,
> we are now aware that at least some people can implement a solution to
> the problem without ever thinking about how it works enough to realize
> that there is no question of overlapping strings. Clarifying it explicitly
> is probably beneficial.
Agreed -- except I'd say there's no "probably" about it. :-)
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/26/2010 5:53:37 PM
In article <4b869a82$0$1119$4fafbaef@reader2.news.tin.it>,
io_x <a@b.c.invalid> wrote:
>
> <blmblm@myrealbox.com> wrote in message
> news:7uffggFfl7U3@mid.individual.net...
> >> > io_x is from
> >> > msg-id Message-ID: <4b730781$0$822$4fafbaef@reader5.news.tin.it>
>
> in that message above there is one error
> .e: push edi
> call _free
> .e0: xor eax, eax
>
> has to be written like
>
> .e: push edi
> call _free
> add esp, 4
> .e0: xor eax, eax
Huh -- the copy I have of your code actually has that fix ....
> i forgot to clear the stack;
> but why not use this
> news:4b7a82ed$0$1109$4fafbaef@reader4.news.tin.it
Apparently that's the one I actually worked from. Sorry about
putting in the wrong message ID -- I think I only decided *after*
retrieving code that it might be nice to also have matching message
IDs, and obviously in the process of looking up people's code a
second time .... Oops.
> yes there is a little improve
> and the replace routine has different arguments
> char* __stdcall Replacer_m(unsigned* len,
> char* origin, char* whatSost, char* sost);
>
> but it is possible write
> char* replace_iox(char* origin, char* whatSost, char* sost)
> {unsigned len;
> return Replacer_m(&len, origin, whatSost, sost);
> }
>
> it should always be at least 3 times slower, compared with yours.
Why should it be slower? I don't know x86 assembler and so am not
going to try to figure out what it does, but why should it be slower?
(I'm also curious about why you chose assembler.)
> But for this time, it is enough for here.
Yeah. :-)
[ snip ]
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/26/2010 5:54:06 PM
In article <slrnhod6n2.mbj.usenet-nospam@guild.seebs.net>,
Seebs <usenet-nospam@seebs.net> wrote:
> On 2010-02-25, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> > Agreed, anyway, that in almost all circumstances clear simple code is
> > to be preferred. Indeed -- my original naive solution is almost as
> > fast as the "improved" version, as long as I use the C-library versions
> > of the string.h functions, and it took less time to write and debug
> > than the more complicated solution.
>
> Idle curiosity: How's mine do? I haven't checked to see what the official
> interface is, but I'm pretty sure this is adequately obvious. It presumably
> suffers from double-scanning, but I don't know how much that matters. It
> doesn't do a lot of mallocs.
>
> char *
> rep(const char *in, const char *out, const char *target) {
[ snip ]
> }
It looks like Ben has already run a more-complete set of benchmarks,
but I ran yours through my testing/benchmarking code as well.
For the record, it passes all of the spinoza1111 tests (well,
it did after I realized that you were using a different parameter
order from everyone else and fixed that).
Times .... I'll put them all in again, but this time only for
compiling with -O2.
About my six versions .... v1 is a naive implementation that scans
the input once to count matches, to allow computing the right
length for the output string, and then again to actually do the
replacement. v2 scans once and builds a list of matches, which is
then used to do the replacement, but it makes no attempt to avoid
calling strlen repeatedly on the to_replace/replacement strings.
v3 avoids that by passing around string lengths too, where needed.
"lib" versus "user" is which implementation of the string.h
functions I used.
The two versions that build a list call malloc for each list
element. I thought that would slow things down, but apparently
not -- the versions that are really slow are the ones that make
lots of calls to my implementation of strlen. The library version
of that appears to be *much* faster.
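For readers who want something concrete, here is a minimal sketch of the "v1" idea described above: scan once to count matches, allocate the output, then scan again to do the replacement. It is not the code that was benchmarked; the name replace_two_pass, its parameter order, and the handling of an empty pattern are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical v1-style replace (not the benchmarked code).
 * Pass 1 counts non-overlapping matches so the output size is known;
 * pass 2 copies unmatched text and substitutes each match.
 * Returns a malloc'd string (caller frees) or NULL if malloc fails. */
char *replace_two_pass(const char *input, const char *old, const char *new_text)
{
    size_t old_len = strlen(old);
    size_t new_len = strlen(new_text);
    size_t in_len = strlen(input);
    size_t count = 0;
    const char *p;
    char *result, *out;

    if (old_len == 0) {              /* assumption: empty pattern means "copy" */
        result = malloc(in_len + 1);
        if (result != NULL)
            memcpy(result, input, in_len + 1);
        return result;
    }

    /* Pass 1: count matches, resuming after the end of each one. */
    for (p = input; (p = strstr(p, old)) != NULL; p += old_len)
        count++;

    result = malloc(in_len - count * old_len + count * new_len + 1);
    if (result == NULL)
        return NULL;

    /* Pass 2: rescan the input and build the output. */
    out = result;
    p = input;
    for (;;) {
        const char *match = strstr(p, old);
        if (match == NULL)
            break;
        memcpy(out, p, (size_t)(match - p));   /* text before the match */
        out += match - p;
        memcpy(out, new_text, new_len);        /* the replacement */
        out += new_len;
        p = match + old_len;
    }
    strcpy(out, p);                            /* tail, including '\0' */
    return result;
}
```

The double scan is exactly the cost the list-building versions try to avoid; whether that matters depends on the input, as the timings suggest.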
On the old/slow system:
bacarisse        4.47 seconds
blmblm-v1-lib   10.78 seconds
blmblm-v1-user  35.97 seconds
blmblm-v2-lib    9.60 seconds
blmblm-v2-user  35.10 seconds
blmblm-v3-lib    7.86 seconds
blmblm-v3-user   8.58 seconds
harter-1         5.67 seconds
harter-2         5.94 seconds
io_x            18.05 seconds
nilges           7.72 seconds
seebach          8.75 seconds
thomasson        4.08 seconds
willem           7.18 seconds
On the newer/faster system:
bacarisse        1.74 seconds
blmblm-v1-lib    3.33 seconds
blmblm-v1-user  12.73 seconds
blmblm-v2-lib    2.73 seconds
blmblm-v2-user  11.20 seconds
blmblm-v3-lib    2.52 seconds
blmblm-v3-user   3.33 seconds
harter-1         2.58 seconds
harter-2         2.27 seconds
io_x             9.82 seconds
nilges           2.36 seconds
seebach          3.16 seconds
thomasson        1.69 seconds
willem           2.77 seconds
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/26/2010 5:54:48 PM
In article <7c491f59-7c35-47d7-ae49-239b26525db3@t34g2000prm.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
> On Feb 25, 11:35 pm, Seebs <usenet-nos...@seebs.net> wrote:
> > On 2010-02-25, blmblm myrealbox.com <blm...@myrealbox.com> wrote:
> >
> > > As best I can tell, he doesn't read anything you're posting here
> > > unless someone else quotes it -- or at least he doesn't reply,
> > > so your taunting .... ah well, I suppose now that I've quoted
> > > you he may choose to reply. Or not.
> >
> > Exactly. He's plonked, because that way I see only the funny parts.
>
> No, you're too cowardly to talk to me. You haven't plonked jackshit.
> >
What makes you think that? The complete refusal to respond to any
questions or taunts -- to me that's a good sign that your posts are
not even being read. I suppose cowardice is an alternate explanation,
but so is "I refuse to dignify these insults by responding".
I'm tempted to quote the rest of your post as a sample in case I'm
right about the explanation. But -- nah, "let's you and him fight"
is really not very attractive behavior, is it?
[ snip ]
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/26/2010 5:55:34 PM
In article <f6600f51-6210-436f-9f7e-50846632ca5b@k6g2000prg.googlegroups.com>,
spinoza1111 <spinoza1111@yahoo.com> wrote:
> On Feb 25, 5:29 pm, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > In article <bdac8263-e941-4955-9875-44d4ff741...@k18g2000prf.googlegroups.com>,
> >
> > spinoza1111 <spinoza1...@yahoo.com> wrote:
> > > On Feb 23, 10:38 pm, blm...@myrealbox.com <blm...@myrealbox.com>
> > > wrote:
> > > > In article <d520a640-1606-407e-9b7f-b9c75f4d5...@s36g2000prf.googlegroups.com>,
> >
> > > >spinoza1111<spinoza1...@yahoo.com> wrote:
> > > > > On Feb 15, 8:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > > > > > "Stefan Ram" <r...@zedat.fu-berlin.de> wrote in message
> >
> > > > > >news:rand-20100215000605@ram.dialup.fu-berlin.de...
> >
> > > > [ snip ]
> >
> > > > > And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
> > > > > STRINGS. If you use it to construct a table of replace points you're
> > > > > gonna have an interesting bug-o-rama:
> >
> > > > > replace("banana", "ana", "ono")
> >
> > > > > IF you restart one position after the find point, and not at its end.
> >
> > > > Why would you do that, though? only if you *wanted* to detect
> >
> > > Search me. But that's what the code I was discussing actually did.
> >
> > What code is that? I've traced back through predecessor posts, and
> > the only one that comes close to including code is the one in which
> > Chris Thomasson references his code in
> >
> > http://clc.pastebin.com/f62504e4c
> >
> > which on a quick skim doesn't seem to me to be looking for
> > overlapping strings.
>
> My code handled string overlap after the bug was pointed out to me,
> BEFORE any other code.
So when you said "the code I was discussing", you meant *your*
code? Oh! I understood you to be saying that using strstr() is
dangerous because it finds (or doesn't find?) overlapping strings,
and interpreted "the code I was discussing" to be someone else's
code, someone who was using strstr(). Faulty communication.
> I'm too sick of the shit that goes on here to
> make a collection of all solutions and find what probably are many
> failures, but one of my contributions was to pass on the test case.
>
> There's a lot of claims and counterclaims here and at least two
> discussants are complete shitheads. However, we KNOW that other
> posters used the test suite I created AFTER my code worked with that
> test data.
Is this some kind of race to find out who can post a solution first?
If so, um, haven't you expressed disapproval of boasting about speed
of coding? Or do you suspect others of cribbing from your solution?
I can tell you that I didn't -- reading code is not one of my best
things anyway, and I thought it would be more fun to write my own
code before looking at others'.
> > > > overlapping strings, and -- if you did detect them, what would
> > > > you do with them? I can't think of any sensible definition of
> > > > "replace" that does anything with overlapping strings [*], so
> >
> > > replace(banana, ana, ono) could equal
> >
> > > bonona going left to right without overlap
> > > banono going right to left without overlap
> > > bonono going both ways with overlap
> >
> > There's a semi-sane answer here in the last case, but isn't
>
> HOW.DARE.YOU. How DARE you start talking about sanity? It isn't
> collegial, and it is libel and completely insensitive. It's talking
> like those thugs and shitheads here, Seebach and Heathfield.
The word "sane" was meant to apply to the answer, not to a person.
I don't have enough information to form an opinion I'd want to share
publicly about your sanity.
I could offer to substitute "sensible" for "sane" in what I wrote,
but that might not be any better received.
Whether the apparent lack of communication here is due to poor
writing on my part or something else -- I don't know. At least
one other person appears to have interpreted my words in the
intended way (and replied to that effect).
[ snip ]
> > that because there are some factors at work that won't be
> > generally true? What about replace(banana, ana, xno)?
> > Should that be bxnono or bxnxno?
> >
> The fact that there is a group of answers does not make the question a
> question of a crazy man! In fact, it makes it a good scientific
> question, albeit over the heads of the creeps here.
My point was that I don't think that there's an obvious most-sensible
choice here. How about if you just answer my question -- should
replace(banana, ana, xno) be bxnono or bxnxno? If you aren't sure,
how do you decide what your code should do?
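A small sketch may make the overlap question concrete. strstr() itself only ever returns one match; whether overlapping matches are seen depends on where the caller restarts the search. The function name count_matches and its allow_overlap flag are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of pat in s using strstr().
 * allow_overlap != 0: restart one character after each match start,
 *                     so overlapping matches are found.
 * allow_overlap == 0: restart after the end of each match.
 * An empty pattern is counted as zero matches (an assumption). */
size_t count_matches(const char *s, const char *pat, int allow_overlap)
{
    size_t pat_len = strlen(pat);
    size_t n = 0;
    const char *p = s;

    if (pat_len == 0)
        return 0;
    while ((p = strstr(p, pat)) != NULL) {
        n++;
        p += allow_overlap ? 1 : pat_len;
    }
    return n;
}
```

For "banana" and "ana" this gives 1 without overlap and 2 with it (matches at offsets 1 and 3). The two overlapping matches share the 'a' at offset 3, which is why a replacement such as "xno" has no single obvious result for that character.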
[ snip ]
> > > Suppose that in some language, the ana sound is transformed into the
> > > ono sound to transform present into past tense (weirder things
> > > happen), and suppose speakers do this to ALL occurrences of the three
> > > tones a, voiced n, a. When the sounds are adjacent they are
> > > nonetheless distinct in speech but not in writing.
> >
> > > Now, the response of most garden-variety "break room" programmers is
> > > "that's bullshit, and can never happen". But we know that in
> > > programming, many strange things can happen, and that as Hamlet
> > > admonished Horatio, we must "as a stranger give it welcome". Many more
> > > strange things can happen outside programming, and programmers, even
> > > of the Hooters or break room ilk, better realize this when programming
> > > is used to solve problems.
> >
> > "Whatever." I'm not convinced yet that it's possible to come
> > up with a sensible specification for what it means to replace
> > overlapping occurrences of selected text. Absent such a
> > specification -- eh, whatever.
>
> I gave it to you: a hypothetical but possible natural language in
> which adjacent lexemes must be split and modified.
You've posited a scenario in which attempting to replace
overlapping strings would be useful or meaningful. What I'm
not getting is an exact specification of how you think it should
work. What should replace(banana, ana, xno) be? Or are there
restrictions on input that would exclude it from consideration?
> And what's this "whatever"?
It means I couldn't think of a graceful way to express my intended
meaning and decided to just bail out of the sentence. Trying here:
Without a clear specification of what should be done about
overlapping matches, I don't think it makes sense to try to come up
with code or even an algorithm.
[ snip ]
> > > > Here's my proposed specification, in which "is not a substring of"
> > > > and "concat" have what I hope are obvious meanings, and names
> > > > beginning s_ denote strings:
> >
> > > > replace(s_input, s_old, s_new) yields
> >
> > > > if s_old is not a substring of s_input
> >
> > > > s_input
> >
> > > > else
> >
> > > > concat(s_input_prefix, s_new, replace(s_input_tail, s_old, s_new))
> >
> > > > where s_input_prefix and s_input_tail are such that
> >
> > > > s_input = concat(s_input_prefix, s_old, s_input_tail)
> >
> > > > and
> >
> > > > s_old is not a substring of s_input_prefix
> >
> > > This is fine as long as we understand your concat as NOT specifying
> > > left to right or right to left direction.
> >
> > How could it imply a direction? String concatenation seems to me to
> > be a pretty straightforward and well-defined operation, which could
> > even be written as an associative binary operator, no?
>
> Well, my greater experience with object oriented development in C
> Sharp and VB has taught me that given either an adequate OO language,
> or sufficient intelligence and patience, concat can work either way
> without much drama. In C, the direction has to be a crummy parameter
> that is easy to get wrong.
My usage of "concat" here was meant to indicate a mathematical/formal
operation on strings, not a call to a function in some programming
language, real or imagined. How can *that* imply a direction? As
I said, it seems to me that considered as a mathematical operation
string concatenation is associative. Maybe there's something I'm
not getting, though.
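As an aside, the recursive specification quoted above is close enough to code that it can be transcribed almost line for line. The following is only a sketch under stated assumptions: the name replace_spec is invented, an empty s_old is treated as "not a substring", and the leftmost match is taken because strstr() returns it (so s_old cannot occur in the prefix before it).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical transcription of the recursive specification.
 * Returns a malloc'd string (caller frees) or NULL if malloc fails. */
char *replace_spec(const char *s_input, const char *s_old, const char *s_new)
{
    /* Assumption: an empty s_old never matches (avoids infinite recursion). */
    const char *match = (*s_old != '\0') ? strstr(s_input, s_old) : NULL;
    size_t prefix_len, new_len, rest_len;
    char *rest, *result;

    if (match == NULL) {
        /* "if s_old is not a substring of s_input: s_input" */
        size_t n = strlen(s_input) + 1;
        result = malloc(n);
        if (result != NULL)
            memcpy(result, s_input, n);
        return result;
    }

    /* s_input = concat(prefix, s_old, tail), with s_old not in prefix */
    prefix_len = (size_t)(match - s_input);
    rest = replace_spec(match + strlen(s_old), s_old, s_new);
    if (rest == NULL)
        return NULL;

    /* concat(prefix, s_new, replace(tail, s_old, s_new)) */
    new_len = strlen(s_new);
    rest_len = strlen(rest);
    result = malloc(prefix_len + new_len + rest_len + 1);
    if (result != NULL) {
        memcpy(result, s_input, prefix_len);
        memcpy(result + prefix_len, s_new, new_len);
        memcpy(result + prefix_len + new_len, rest, rest_len + 1);
    }
    free(rest);
    return result;
}
```

Note that nothing in the concatenation step depends on a processing direction; the only directional choice is which match is picked to split the input.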
> > > The bonono problem would
> > > have to be handled by preprocessing (translating banana into banaana,
> > > perhaps using a rule that "vowels" (in the language) can always be
> > > duplicated because their sounds don't break).
> >
> > Well, now you seem to be moving in a direction that might eventually
> > lead to a sensible specification. "Carry on", maybe.
>
> Don't patronize me.
That was not my intent. (And really, I don't think you're in the
strongest position to talk about not patronizing other posters.)
[ snip ]
> > In my opinion the functions declared in string.h include some
> > that are very useful in writing a replace() function as specified
> > here -- I think of them as useful abstractions for the problem
> > domain. Could one define similar functions if strings were *not*
> > represented as null-terminated contiguous sequences of characters ....
> >
> > Well, certainly one could, but whether they'd be unacceptably
> > inefficient might depend on how strings were represented.
> > C's approach to representing strings allows one to have multiple
> > strings in a single character array and to easily regard a suffix
> > of a string as a string in its own right, both of which strike me
> > as useful in context. One could (I think) get a similar effect
> > by defining strings to be objects consisting of a length and a
> > pointer into an array of characters. If strings were represented
> > as a length immediately followed by a sequence of characters --
> > not so much.
> >
> > I'm not sure I'm explaining this very well or thinking it through
> > carefully, but perhaps it will advance the discussion a bit anyway.
> >
> > > > [*] My objection to this constraint is that any minimally competent
> > > > programmer should be able to write functions that implement the
> > > > same API, so just avoiding use of the library functions doesn't
> > > > seem to me to make the problem more interesting.
> >
> > > No. The API locks us into bad thoughts.
> >
> > I'd say "how so?" but I'm not optimistic about getting an answer
> > I'd find useful.
>
> I've already told you. Strings terminated ON THE RIGHT with a Nul is a
> bad thought for two reasons:
>
> * It prevents Nul from occurring in a string
> * It mandates Eurocentric left to right processing
What I'm still not getting is how either of these things .... Let
me try to explain:
To me what makes the functions in string.h useful in dealing
with strings is the operations they perform, not the string
representation they operate on. I can easily imagine rewriting
most if not all of them to operate on strings that are represented
in some other way (as an object containing or pointing to a
sequence of characters and a length, say, where "characters" might
be elements of the ASCII character set or elements of some other
set). I can also imagine adding something that allows specifying
whether processing should be left to right or right to left.
So, what exactly is in string.h .... A partial list of functionality
provided:
* copy characters (memcpy, memmove, etc.)
* copy a string (strcpy)
* compare strings (strcmp)
* concatenate strings (strcat)
* duplicate a string (strdup, which is POSIX rather than standard C)
* search for a character in a string (strchr)
* search for a string in a string (strstr)
* get a string's length (strlen)
Sounds like a reasonable assortment to me -- perhaps not including
everything anyone would want, but these all sound useful to me.
> My respect o meter in your case is diminishing.
Probably good; I think it was miscalibrated at some point.
> As to a length code limiting string length, do the math. 2^63 - 1 or
> 2^64 - 1 is a big number, and run length codes can be used, especially
> in OO programming. Furthermore, even if the string is longer, you can
> still process it with an unknown length, which OO programming handles
> quite nicely.
That's not the point I was making -- what I meant was that if the
representation requires that the length and the actual characters
be contiguous, you can't get a substring simply by pointing into
an existing string, as you can in C, and I think that has some
semi-obvious disadvantages. There might be other reasons not
to use such a representation, and it might not be one you were
considering in any case. But I have some vague recollection of
hearing about *some* implementation of strings that works that way.
I could be mistaken.
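To illustrate the "length plus pointer" representation mentioned above, here is a minimal sketch. The type str_view and the helper names are hypothetical, not from any library discussed in this thread. Because the length is stored apart from the characters, a substring is just another view into the same array and no copying is needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning string: a pointer into a character array
 * plus a length. No terminating '\0' is required. */
struct str_view {
    const char *data;
    size_t len;
};

/* View of an ordinary C string. */
struct str_view view_of(const char *s)
{
    struct str_view v;
    v.data = s;
    v.len = strlen(s);
    return v;
}

/* Substring [start, start + len), clamped to the bounds of v.
 * Only the pointer and length change; the characters are shared. */
struct str_view view_sub(struct str_view v, size_t start, size_t len)
{
    struct str_view r;
    if (start > v.len)
        start = v.len;
    if (len > v.len - start)
        len = v.len - start;
    r.data = v.data + start;
    r.len = len;
    return r;
}

/* Content comparison of two views. */
int view_equals(struct str_view a, struct str_view b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

With a layout in which the length is stored immediately before the characters, the same view_sub operation would have to copy the characters into a new buffer, since there is nowhere inside the existing string to put the substring's length.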
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm (1187)
2/26/2010 5:57:40 PM
blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
> In article <slrnhod6fh.mbj.usenet-nospam@guild.seebs.net>,
> Seebs <usenet-nospam@seebs.net> wrote:
<snip>
>> Yeah. I still remember:
>> i << 3 + i + i
>> but that doesn't mean I'd ever use it.
>
> Yipes. Where did you encounter this one? Multiplication by ten,
> right?
No, though it is a detail. I thought Seebs was making a point ("if
you code like this you'll make mistakes like this one I remember") but
I could be wrong about that.
You could add it to your examples for your students to have to debug
with and without -Wall: + binds more tightly than << and >> in C.
Aside: this is a hard one to remember unless you know C++:
cout << 3 + i + i;
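A short sketch of the precedence point, with function names invented for the example. Since + binds more tightly than <<, the unparenthesized expression is a shift by (3 + i + i), not a multiplication by ten.

```c
/* Multiply by ten using a shift: 10*i == 8*i + i + i.
 * The parentheses around the shift are required. */
unsigned times_ten(unsigned i)
{
    return (i << 3) + i + i;
}

/* What "i << 3 + i + i" actually means in C: the additions are
 * done first, so this is i << (3 + i + i). Written here with
 * explicit parentheses so that the behaviour is visible. */
unsigned times_ten_as_written(unsigned i)
{
    return i << (3 + i + i);
}
```

For i = 1 the first function gives 10 and the second gives 32, so the mistake is easy to detect with even a trivial test.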
<snip>
> Well, I interpreted the "messing with [his] head" remark to indicate
> that he was trying to taunt you or provoke you in some way.
There's no doubt in my mind that he is. Given the abuse and invective
hurled at him, it is to Seebs's great credit that he has been able to
sit on his hands.
--
Ben.
ben.usenet (6516)
2/26/2010 6:17:09 PM
Ben Bacarisse wrote:
) blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
)>< snip >
)> willem (O2) 2.77 seconds
)> willem (O3) 4.16 seconds [ can this be right?! ]
)
) Looks wacky to me! Is it repeatable?
Bwahahaha! I love it!
) Here are my times (also gcc 4.4.1 and libc 2.10.1). I seem to have a
) faster machine. The first number are your times (for reference) and
) the second are mine (in seconds). The third column is the ratio of
) the two. You can see that there is more going on than just the speed
) of the machine.
)
) <snip>
) willem (O2) 2.77 0.813 3.41
) willem (O3) 4.16 0.885 4.70
)
) If we are now measuring the same things, it seems that some code is
) favoured by my system (yours for example) and some does not do so
) well. I suspect interactions with the various caches but that is a
) huge guess.
This is quite interesting! I would really like to see the generated
assembly for -O2 and -O3 for my code. I guess I can retrieve my code
from the usenet archive and compile it, but I don't know which of the
two solutions I posted was tested here. (The iterative or the recursive
one ?)
PS: For testing you would also need different match patterns, including
some that contain repeated strings or stuff like that, especially
if you're comparing 'smart' against 'dumb' algorithms.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
willem7123 (117)
2/26/2010 6:23:44 PM
On 2010-02-26, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> Ah, but do you? At least one of the people still interacting with
> him (Richard Heathfield) snips non-technical content. I won't say
> you're missing anything you'd enjoy, but there's plenty that a person
> inclined to perceive and react to insult would -- react to.
You have a point. Maybe I should unplonk him so I can delight in the
madness. I am an aficionado of Usenet kookery. Honestly, I was a little
sad to come back here and find Scott Nudds gone. :(
>> Yeah. I still remember:
>> i << 3 + i + i
>> but that doesn't mean I'd ever use it.
> Yipes. Where did you encounter this one? Multiplication by ten,
> right?
In the C library for Aztec C for the Amiga. Which leads me to assume either
that they knew that their compiler didn't always make that optimization (or
possibly that it shouldn't always), or that they were too clever for their
own good.
> Well, to be fair, aren't there circumstances in which it might?
> in which the programmer knows that one of the operands of the
> multiply operator is a power of 2, but the compiler wouldn't be
> able to detect that? It does seem like the kind of microoptimization
> that one would hesitate to do without a compelling reason, though.
Hmm.
Here's the thing. If it's a constant, the compiler can obviously tell whether
it's a power of two. If it's not a constant, but I know it's a power of two,
figuring out which power of two it is will almost certainly cost more than
multiplication. Furthermore, it's not at all obvious to me that I should
assume that a given modern CPU will shift that much faster than it multiplies.
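A rough sketch of that cost argument, assuming unsigned operands and with both function names invented here: if the multiplier is only known at run time to be a power of two, the shift count has to be computed first, and that computation is a loop (or some other extra work) that a plain multiply would not need.

```c
/* Hypothetical helper: the exponent k such that p == 1u << k.
 * Assumes p is a nonzero power of two. */
unsigned log2_pow2(unsigned p)
{
    unsigned k = 0;
    while (p > 1) {
        p >>= 1;
        k++;
    }
    return k;
}

/* Multiply x by a run-time power of two using a shift.
 * The call to log2_pow2() is the hidden cost; "x * p" avoids it. */
unsigned mul_by_pow2(unsigned x, unsigned p)
{
    return x << log2_pow2(p);
}
```

Whether any of this is faster than x * p on a given CPU is something to measure rather than assume.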
>> Wait, does this imply that he thinks the << thing is some kind of secret
>> that not everyone knows? As an idle curiosity, I've asked my coworkers
>> to see whether any of them *haven't* seen that.
> Well, I interpreted the "messing with [his] head" remark to indicate
> that he was trying to taunt you or provoke you in some way. But
> I could be mistaken about that.
I have no idea. He's really challenged my assumption that other things which
use language are generally volitional actors which engage in goal-directed
behavior, certainly.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
usenet-nospam (2216)
2/26/2010 7:05:23 PM
On 2010-02-26, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
>> Yipes. Where did you encounter this one? Multiplication by ten,
>> right?
> No, though it is a detail. I thought Seebs was making a point ("if
> you code like this you'll make mistakes like this one I remember") but
> I could be wrong about that.
Whoops. No, I just made the mistake. It was probably right in the original.
> There's no doubt in my mind that he is. Given the abuse and invective
> hurled at him, it is to Seebs's great credit that he has been able to
> sit on his hands.
Not really:
1. He's plonked, so I only see a few of them.
2. I'm autistic. Insults only communicate data to me in most cases. In his
case, they communicate that he's angry but deeply incompetent.
Mostly I just figure he's cheap entertainment. My usenet feed is cheaper
than cable, and spinoza1111's funnier than most comedy shows.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
usenet-nospam (2216)
2/26/2010 7:06:57 PM
On 2010-02-26, blmblm myrealbox.com <blmblm@myrealbox.com> wrote:
> The two versions that build a list call malloc for each list
> element. I thought that would slow things down, but apparently
> not -- the versions that are really slow are the ones that make
> lots of calls to my implementation of strlen. The library version
> of that appears to be *much* faster.
Interesting!
> nilges 7.72 seconds
> seebach 8.75 seconds
That, I must admit, totally surprises me. I would have thought that the
cost of strstr() would be trivial compared to the cost of malloc(). I guess
not for some input data!
Which suggests that, if this would be run often enough, by enough people, who
would be waiting on the output, it is conceivable that it could be worth
spending the extra 8-10 hours of programming effort, plus the lifetime
maintenance effort, for the more complicated code.
Or, alternatively, that it would at least make sense to consider one of the
"don't rescan" options.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
usenet-nospam (2216)
2/26/2010 7:09:22 PM
Willem <willem@snail.stack.nl> writes:
> Ben Bacarisse wrote:
> ) blmblm@myrealbox.com <blmblm@myrealbox.com> writes:
> )>< snip >
> )> willem (O2) 2.77 seconds
> )> willem (O3) 4.16 seconds [ can this be right?! ]
> )
> ) Looks wacky to me! Is it repeatable?
>
> Bwahahaha! I love it!
>
> ) Here are my times (also gcc 4.4.1 and libc 2.10.1). I seem to have a
> ) faster machine. The first number are your times (for reference) and
> ) the second are mine (in seconds). The third column is the ratio of
> ) the two. You can see that there is more going on than just the speed
> ) of the machine.
> )
> ) <snip>
> ) willem (O2) 2.77 0.813 3.41
> ) willem (O3) 4.16 0.885 4.70
> )
> ) If we are now measuring the same things, it seems that some code is
> ) favoured by my system (yours for example) and some does not do so
> ) well. I suspect interactions with the various caches but that is a
> ) huge guess.
>
> This is quite interesting! I would really like to see the generated
> assembly for -O2 and -O3 for my code. I guess I can retrieve my code
> from the usenet archive and compile it, but I don't know which of the
> two solutions I posted was tested here. (The iterative or the recursive
> one ?)
I can't help because, for whatever reason, I don't see the difference
that B L Massingill sees.
> PS: For testing you would also need different match patterns, including
> some that contain repeated strings or stuff like that, especially
> if you're comparing 'smart' against 'dumb' algorithms.
Sure. I've used a wide variety of test strings but I've seen no point
in posting the results because they are, by and large, rather
predictable but also rather hard to summarise.
--
Ben.
ben.usenet (6516)
|
2/26/2010 11:25:38 PM
|
|
On 26 Feb 2010 17:57:40 GMT, blmblm@myrealbox.com
<blmblm@myrealbox.com> wrote:
>My point was that I don't think that there's an obvious most-sensible
>choice here. How about if you just answer my question -- should
>replace(banana, ana, xno) be bxnono or bxnxno? If you aren't sure,
>how do you decide what your code should do?
If I may intrude, there are more possibilities.
(4) Replace every occurrence of the substring by the replacement
string. There are two occurrences of ana in banana; replace each
with xno yielding bxnoxno.
(5) There can only be overlap if the original substring has the
pattern PQP. The overlap sequence will have the pattern P(QP)*.
In your example, the pattern is PQPQP. Replace the entire
overlap pattern by the replacement, yielding bxno.
(6) If the replacement string has the pattern R(SR)* and the
repetition count is the same, replace the initial P with R and
each PQ with SR. This does not apply to your example; rather it
is a special case that does apply to replacement ono.
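Options (4) and (5) can be sketched in C roughly as follows. This is only one reading of the descriptions above; the function name replace_overlapping and the collapse flag are assumptions made for the example, and option (6) is not attempted. The scan looks for the target at every position, so overlapping occurrences are all seen; characters covered by a match are dropped, and the flag decides whether an occurrence that overlaps the previous one emits the replacement again.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of options (4) and (5).
 * collapse == 0: option (4), every occurrence, overlapping or not,
 *                emits `repl`.
 * collapse != 0: option (5), a chain of overlapping occurrences
 *                emits `repl` only once.
 * Returns a malloc'd string, or NULL on failure or empty `old`. */
char *replace_overlapping(const char *in, const char *old,
                          const char *repl, int collapse)
{
    size_t in_len = strlen(in), old_len = strlen(old), repl_len = strlen(repl);
    size_t i, covered_end = 0;   /* one past the last char covered by a match */
    char *out, *w;

    if (old_len == 0)
        return NULL;

    /* Worst case: each input position emits max(repl_len, 1) characters. */
    out = malloc(in_len * (repl_len + 1) + 1);
    if (out == NULL)
        return NULL;
    w = out;

    for (i = 0; i < in_len; i++) {
        int match = in_len - i >= old_len && memcmp(in + i, old, old_len) == 0;
        if (match) {
            int overlaps_previous = i < covered_end;
            if (!collapse || !overlaps_previous) {
                memcpy(w, repl, repl_len);
                w += repl_len;
            }
            covered_end = i + old_len;
        } else if (i >= covered_end) {
            *w++ = in[i];        /* character not covered by any match */
        }
    }
    *w = '\0';
    return out;
}
```

With ("banana", "ana", "xno") this gives bxnoxno when collapse is 0 and bxno when collapse is nonzero, matching the results stated for (4) and (5).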
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
It's not much to ask of the universe that it be fair;
it's not much to ask but it just doesn't happen.
cri (1432)
|
2/27/2010 12:15:57 AM
|
|
On Feb 27, 1:57 am, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> In article <f6600f51-6210-436f-9f7e-50846632c...@k6g2000prg.googlegroups.com>,
>
> spinoza1111 <spinoza1...@yahoo.com> wrote:
> > On Feb 25, 5:29 pm, blm...@myrealbox.com <blm...@myrealbox.com> wrote:
> > > In article <bdac8263-e941-4955-9875-44d4ff741...@k18g2000prf.googlegroups.com>,
>
> > >spinoza1111<spinoza1...@yahoo.com> wrote:
> > > > On Feb 23, 10:38 pm, blm...@myrealbox.com <blm...@myrealbox.com>
> > > > wrote:
> > > > > In article <d520a640-1606-407e-9b7f-b9c75f4d5...@s36g2000prf.googlegroups.com>,
>
> > > > >spinoza1111<spinoza1...@yahoo.com> wrote:
> > > > > > On Feb 15, 8:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > > > > > > "Stefan Ram" <r...@zedat.fu-berlin.de> wrote in message
>
> > > > > > >news:rand-20100215000605@ram.dialup.fu-berlin.de...
>
> > > > > [ snip ]
>
> > > > > > And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
> > > > > > STRINGS. If you use it to construct a table of replace points you're
> > > > > > gonna have an interesting bug-o-rama:
>
> > > > > > replace("banana", "ana", "ono")
>
> > > > > > IF you restart one position after the find point, and not at its end.
>
> > > > > Why would you do that, though? only if you *wanted* to detect
>
> > > > Search me. But that's what the code I was discussing actually did.
>
> > > What code is that? I've traced back through predecessor posts, and
> > > the only one that comes close to including code is the one in which
> > > Chris Thomasson references his code in
>
> > >http://clc.pastebin.com/f62504e4c
>
> > > which on a quick skim doesn't seem to me to be looking for
> > > overlapping strings.
>
> > My code handled string overlap after the bug was pointed out to me,
> > BEFORE any other code.
>
> So when you said "the code I was discussing", you meant *your*
> code? Oh! I understood you to be saying that using strstr() is
> dangerous because it finds (or doesn't find?) overlapping strings,
> and interpreted "the code I was discussing" to be someone else's
> code, someone who was using strstr(). Faulty communication.
>
> > I'm too sick of the shit that goes on here to
> > make a collection of all solutions and find what probably are many
> > failures, but one of my contributions was to pass on the test case.
>
> > There's a lot of claims and counterclaims here and at least two
> > discussants are complete shitheads. However, we KNOW that other
> > posters used the test suite I created AFTER my code worked with that
> > test data.
>
> Is this some kind of race to find out who can post a solution first?
> If so, um, haven't you expressed disapproval of boasting about speed
> of coding? Or do you suspect others of cribbing from your solution?
> I can tell you that I didn't -- reading code is not one of my best
> things anyway, and I thought it would be more fun to write my own
> code before looking at others'.
I believe I express mostly disapproval about speed of coding as a way
of dodging issues, not speed of coding per se. For example, I didn't
like it at all when Brian Kernighan, in the recent O'Reilly collection
Beautiful Code, praised Rob Pike for only taking an hour to write a
"regular expression processor" because:
* Pike's code wasn't a full or true regular expression processor
* The fact that it took an hour doesn't change the above
Seebie bragged about taking "ten minutes before breakfast" in response
to questions about whether he was solving a problem correctly or in
depth, because in fact, in corporations, programmers, in contrast to
real engineers, seem to believe that in all cases speedy coding makes
up for almost any failing.
>
> > > > > overlapping strings, and -- if you did detect them, what would
> > > > > you do with them? I can't think of any sensible definition of
> > > > > "replace" that does anything with overlapping strings [*], so
>
> > > > replace(banana, ana, ono) could equal
>
> > > > bonona going left to right without overlap
> > > > banono going right to left without overlap
> > > > bonono going both ways with overlap
>
> > > There's a semi-sane answer here in the last case, but isn't
>
> > HOW.DARE.YOU. How DARE you start talking about sanity? It isn't
> > collegial, and it is libel and completely insensitive. It's talking
> > like those thugs and shitheads here, Seebach and Heathfield.
>
> The word "sane" was meant to apply to the answer, not to a person.
> I don't have enough information to form an opinion I'd want to share
> publicly about your sanity.
That manages to be rather snide, in my view.
>
> I could offer to substitute "sensible" for "sane" in what I wrote,
> but that might not be any better received.
It would have been an improvement. But there are plenty of words such
as "correct" which have zero personal connotation.
>
> Whether the apparent lack of communication here is due to poor
> writing on my part or something else -- I don't know. At least
> one other person appears to have interpreted my words in the
> intended way (and replied to that effect).
>
> [ snip ]
>
> > > that because there are some factors at work that won't be
> > > generally true? What about replace(banana, ana, xno)?
> > > Should that be bxnono or bxnxno?
>
> > The fact that there is a group of answers does not make the question a
> > question of a crazy man! In fact, it makes it a good scientific
> > question, albeit over the heads of the creeps here.
>
> My point was that I don't think that there's an obvious most-sensible
> choice here. How about if you just answer my question -- should
> replace(banana, ana, xno) be bxnono or bxnxno? If you aren't sure,
> how do you decide what your code should do?
Whoa. I'm not sure. "Science" is about possibility as well as fact.
However, I do think that for the same reason your notion of "concat"
is cool since it is independent of direction, I think that a "flat" or
one-time application of "replace" is one of those phony notions that
only seem useful. The basic notion is not replace once, it is replace
until no change, as in macro replacement. I think we can prove that
there's no instance of a replace that always changes the string when
applied.
Let us call an implementation of replace(master, target, replacement)
"kewl" when and only when it is "independent of left to right or right
to left order". I claim that the only form of replace that is "kewl"
is nondeterministic. To simulate it you'd have to apply the
replacement rule randomly. It would sometimes return bonona, and other
times it would return banono.
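To make the two orders concrete, here is a minimal sketch of a non-overlapping replace that can be run in either direction. The name replace_directed and the from_right flag are assumptions made for the example, not anything posted in this thread. It first marks where the chosen matches start, scanning from the left or from the right, and then builds the result from those marks.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: non-overlapping replace, with the direction in
 * which matches are claimed chosen by `from_right`.
 * Returns a malloc'd string, or NULL on failure or empty `old`. */
char *replace_directed(const char *in, const char *old,
                       const char *repl, int from_right)
{
    size_t in_len = strlen(in), old_len = strlen(old), repl_len = strlen(repl);
    size_t i, n = 0;
    char *start, *out, *w;

    if (old_len == 0)
        return NULL;

    /* start[i] is 1 when a chosen match begins at offset i. */
    start = calloc(in_len + 1, 1);
    if (start == NULL)
        return NULL;

    if (!from_right) {
        /* Claim matches from the left; skip past each one claimed. */
        for (i = 0; i + old_len <= in_len; ) {
            if (memcmp(in + i, old, old_len) == 0) {
                start[i] = 1;
                n++;
                i += old_len;
            } else {
                i++;
            }
        }
    } else {
        /* Claim matches from the right; `end` is one past the last
         * character still available to a match. */
        size_t end = in_len;
        while (end >= old_len) {
            if (memcmp(in + end - old_len, old, old_len) == 0) {
                start[end - old_len] = 1;
                n++;
                end -= old_len;
            } else {
                end--;
            }
        }
    }

    out = malloc(in_len - n * old_len + n * repl_len + 1);
    if (out != NULL) {
        w = out;
        for (i = 0; i < in_len; ) {
            if (start[i]) {
                memcpy(w, repl, repl_len);
                w += repl_len;
                i += old_len;
            } else {
                *w++ = in[i++];
            }
        }
        *w = '\0';
    }
    free(start);
    return out;
}
```

For ("banana", "ana", "ono") the left-to-right call returns bonona and the right-to-left call returns banono. A nondeterministic version in the sense described above would pick between the two at random.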
(Chorus of you say tomayto I say tomahto).
This is an interesting NEGATIVE point. It means that there are
probably bugs out there.
It's a CORRECT result without being, of course, a reasonable
SPECIFICATION for real code. But that don't mean it's not useful.
Turing's Halting Problem is True, and created software, but it's not a
spec.
The truth is something which can be applied immanently as a critical
tool to some spec, but in corporate life, the central idea is that the
employee is indemnified if she works to rule or spec. Requirements
definition is depressing because it excludes critical thought in favor
of applied Positivism.
I mean, ask a kid.
"Why a four year old could do this! Get me a four year old!"
One thing I find terribly amusing and at the same time rather sad in
this intellectual slum is the contrast between the hackneyed,
conventional, authoritarian and "mature" thinking of grownups here,
and my elementary students in my real job (I teach a range of students
from primary to uni).
I think a child would have a great deal of difficulty learning how to
manually do a replace, and would ask if clever about "banana". I think
a real mathematician thinks like a child and would not be satisfied
with a replace() applied once, deterministically and left to right. I
don't know, however, if there is any "real" work on this.
And despite the arcane flavor of this material, the failure to even
consider cases thought "imprecise" because they are non-deterministic
creates real bugs, as when the user says, "oh no, in THAT case you
need to change banana to banono." "Oh, really? Why?" "Because our
customer in Antigua wants it that way."
Children, in making "mistakes", make discoveries, and that is just the
sort of thing programmers miss. For example, Peter didn't ask himself
what would be the case if %s was in the substitution string and would
probably consider the question so quirky as to make it safe to gravely
infer that the asker of the question is a nutbar, and to Call
Security.
>
> [ snip ]
>
>
>
>
>
> > > > Suppose that in some language, the ana sound is transformed into the
> > > > ono sound to transform present into past tense (weirder things
> > > > happen), and suppose speakers do this to ALL occurrences of the three
> > > > tones a, voiced n, a. When the sounds are adjacent they are
> > > > nonetheless distinct in speech but not in writing.
>
> > > > Now, the response of most garden-variety "break room" programmers is
> > > > "that's bullshit, and can never happen". But we know that in
> > > > programming, many strange things can happen, and that as Hamlet
> > > > admonished Horatio, we must "as a stranger give it welcome". Many more
> > > > strange things can happen outside programming, and programmers, even
> > > > of the Hooters or break room ilk, better realize this when programming
> > > > is used to solve problems.
>
> > > "Whatever." I'm not convinced yet that it's possible to come
> > > up with a sensible specification for what it means to replace
> > > overlapping occurrences of selected text. Absent such a
> > > specification -- eh, whatever.
>
> > I gave it to you: a hypothetical but possible natural language in
> > which adjacent lexemes must be split and modified.
>
> You've posited a scenario in which attempting to replace
> overlapping strings would be useful or meaningful. What I'm
> not getting is an exact specification of how you think it should
> work. What should replace(banana, ana, xno) be? Or are there
> restrictions on input that would exclude it from consideration?
No, I am not trying to come up with an exact specification, only a
general approach.
>
> > And what's this "whatever"?
>
> It means I couldn't think of a graceful way to express my intended
> meaning and decided to just bail out of the sentence. Trying here:
>
> Without a clear specification of what should be done about
> overlapping matches, I don't think it makes sense try to come up
> code or even an algorithm.
Why is it that in the corporation
The so-called clear specification
Is so often very dear
Costing loads of megabucks,
And never, almost never, ever clear?
The Germans had a Schlieffen Plan
A sort of military requirements definition:
But Tuchman the histori-an
Said that the plan was typically Teuton:
Everything was she said perfectly laid on
Only to fail at the critical point:
The plan violated the neutrality of Belgium.
To "focus", to be obedient, administered and precise
Is to be inhuman, and not very nice:
Yet we fear, even in something so rigid
As computer programming, so precise and so frigid,
Our queer humanity.
And why oh why is actual clarity
So seldom treated with charity
Euclid was as clear as day,
But seems to some a fussbudget and gay
Because he was precise in what he had to say.
But as it happens...as it turns out,
In what Adorno called the administered world,
Merely extending common sense,
Is unrewarded, unsung, and without recompense.
It transgresses in the name of truth
And so it's treated without ruth.
Requirements are ersatz and miss the point
Requirements, I say, please just go away:
We don't need no steenking requirements
Let's do a daily build and use extended common sense
That is, in fact, nothing more or less than science.
>
> [ snip ]
>
>
>
>
>
> > > > > > Here's my proposed specification, in which "is not a substring of"
> > > > > and "concat" have what I hope are obvious meanings, and names
> > > > > beginning s_ denote strings:
>
> > > > > replace(s_input, s_old, s_new) yields
>
> > > > > if s_old is not a substring of s_input
>
> > > > > >   s_input
>
> > > > > > else
>
> > > > > >   concat(s_input_prefix, s_new, replace(s_input_tail, s_old, s_new))
>
> > > > > >   where s_input_prefix and s_input_tail are such that
>
> > > > > >     s_input = concat(s_input_prefix, s_old, s_input_tail)
>
> > > > > >   and
>
> > > > > >     s_old is not a substring of s_input_prefix
>
> > > > This is fine as long as we understand your concat as NOT specifying
> > > > left to right or right to left direction.
>
> > > How could it imply a direction? String concatenation seems to me to
> > > be a pretty straightforward and well-defined operation, which could
> > > even be written as an associative binary operator, no?
>
> > Well, my greater experience with object oriented development in C
> > Sharp and VB has taught me that given either an adequate OO language,
> > or sufficient intelligence and patience, concat can work either way
> > without much drama. In C, the direction has to be a crummy parameter
> > that is easy to get wrong.
>
> My usage of "concat" here was meant to indicate a mathematical/formal
> operation on strings, not a call to a function in some programming
> language, real or imagined. How can *that* imply a direction? As
> I said, it seems to me that considered as a mathematical operation
> string concatenation is associative. Maybe there's something I'm
> not getting, though.
I agree. Concat is independent of direction in the abstract. But as
Dijkstra saw, there's a problem when the theory becomes reality. The
fact is that to real developers of the corporate-slave class, the
connotation of concatenation is left to right, they being uneducated
in linguistics.
>
> > > > The bonono problem would
> > > > have to be handled by preprocessing (translating banana into banaana,
> > > > perhaps using a rule that "vowels" (in the language) can always be
> > > > duplicated because their sounds don't break.
>
> > > Well, now you seem to moving in a direction that might eventually
> > > lead to a sensible specification. "Carry on", maybe.
>
> > Don't patronize me.
>
> That was not my intent. (And really, I don't think you're in the
> strongest position to talk about not patronizing other posters.)
It's not patronizing when there is a genuine difference, in abilities
that is belied by the patronizing behavior. So don't speak to me of
what is sensible until you have established enough credibility. I
appreciate your collegiality but as yet I don't see any vast
difference in ability that would make "patronizing" inapplicable.
Real "patronizing" is only crudely inferable from a tone of voice. It
is relative to whether the patronizer is without merit adopting an *ex
cathedra* style. I think in my case it is sometimes appropriate to do
so. I would it were not so, for I would prefer to meet better
programmers here.
>
spinoza1111 (3250)
|
2/27/2010 6:27:23 AM
|
|
On Feb 27, 3:06 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2010-02-26, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>
> > blm...@myrealbox.com <blm...@myrealbox.com> writes:
> >> Yipes. Where did you encounter this one? Multiplication by ten,
> >> right?
> > No, though it is a detail. I thought Seebs was making a point ("if
> > you code like this you'll make mistakes like this one I remember") but
> > I could be wrong about that.
>
> Whoops. No, I just made the mistake. It was probably right in the original.
>
> > There's no doubt in my mind that he is. Given the abuse and invective
> > hurled at him, it is to Seebs's great credit that he has been able to
> > sit on his hands.
>
> Not really:
> 1. He's plonked, so I only see a few of them.
> 2. I'm autistic. Insults only communicate data to me in most cases. In his
> case, they communicate that he's angry but deeply incompetent.
>
> Mostly I just figure he's cheap entertainment.
Well, seebie: only in America, my experience as an expat tells me, are
people so proud of watching TV and, instead of finding sermons in
stones or books in running brooks, find pleasure without instruction
in laughing at supposed inferiors.
If you think what you say above, and if you are accessing these
dis | | | |