Hello
I am working on an rtf renderer and parser. My code is hosted at
http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
that getc() is taking a lot of time. Obviously, character read from files is
slow. So, I decided to read the whole file into memory as a char buffer. Please
feel free to comment and suggest on the following code. Now, I want to scan the
char buffer using sscanf(). However, I remember once I heard in a chat room that
sscanf() has buffer overflow vulnerabilities. I would like pointers on this and
would like to know how I can use sscanf() safely.
if(!fp){
fprintf(stderr, "File pointer uninitialized.\n");
goto close_strbuf;
}
while((count = fread(cp, 1, 4096, fp))){
if(feof(fp))break;
strbuf_append(buf, cp);
}
cp[count] = '\0';
strbuf_append(buf, cp);
fclose(fp);
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
Cross
|
12/23/2010 5:50:54 PM |
|
On Thu, 23 Dec 2010 23:20:54 +0530, Cross wrote:
> I am working on an rtf renderer and parser. My code is hosted at
> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
> that getc() is taking a lot of time. Obviously, character read from files is
> slow.
FWIW, with GNU libc, you can replace getc() with getc_unlocked() if you
aren't accessing the file from multiple threads. The main reason why
getc() is so slow is that it has to lock the FILE structure for each
operation. The _unlocked version is just:
#define _IO_getc_unlocked(_fp) \
(_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) \
? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)
getc() itself used to be implemented this way before threading was
available on Linux.
Alternatively, you could just implement your own version of getc based
upon the above.
> So, I decided to read the whole file into memory as a char buffer.
> Please feel free to comment and suggest on the following code. Now, I
> want to scan the char buffer using sscanf(). However, I remember once I
> heard in a chat room that sscanf() has buffer overflow vulnerabilities.
> I would like pointers on this and would like to know how I can use
> sscanf() safely.
The "%s" and "%[...]" specifiers will read as many matching characters as
are available. The buffer needs to be large enough to hold all of them. If
you don't know how many characters might be read, the buffer needs to be
able to hold the entire input.
If you're reading an unknown amount of data into a fixed-size buffer,
specify a maximum field width. E.g. "%15s" will read at most 15
characters, plus a terminating NUL byte (which is written regardless of
whether the field width was exceeded).
If a field exceeds the maximum field width, it's likely that subsequent
parsing will fail, but you won't get a buffer overflow. If you need to be
able to read data containing unlimited-length fields, you'll need to
choose a different approach.
|
|
0
|
|
|
|
Reply
|
Nobody
|
12/23/2010 8:17:49 PM
|
|
Cross <X@x.tv> wrote:
> I am working on an rtf renderer and parser. My code is hosted at
> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
> that getc() is taking a lot of time. Obviously, character read from files is
> slow. So, I decided to read the whole file into memory as a char buffer.
> Please feel free to comment and suggest on the following code. Now, I want
> to scan the char buffer using sscanf().
I guess you mean fscanf() here since you seem to want to read
from a file and not from a string - which is what sscanf() is
for,
> However, I remember once I heard in
> a chat room that sscanf() has buffer overflow vulnerabilities.
Only when used in an unsafe fashion, that is reaing with e.g.
an unadorned '%s' since you can't forsee the number if chars
that can be read in and thus you can't supply a large enough
buffer. But when you specify the maximum number of chars that
fscanf() is allowed to read it's perfectlly safe. So for example
sscanf( fp, "%4096[^\x01-\0xFF]", buf );
will read up to but never more than 4096 chars from the file
and put them into 'buf' (and it will put a trailing '\0' at
the end). It also will stop at a '\0' in the file, but then
this is a binary file and in that case fscanf() isn't suit-
able anyway.
But if you want to read in the whole file anyway and you don't
want to do any conversion of the data then fread() is probably
the better choice anyway.
> I would like
> pointers on this and would like to know how I can use sscanf() safely.
> if(!fp){
> fprintf(stderr, "File pointer uninitialized.\n");
> goto close_strbuf;
> }
> while((count = fread(cp, 1, 4096, fp))){
> if(feof(fp))break;
Checking for feof() is rather useless - if you hit the end of the
file and there are no chars left to be read the return value of
fread() will be 0. That's a good time to stop;-)
> strbuf_append(buf, cp);
No idea what this function is supposed to do (it's not a standard
C function and thus shouldn't have a name starting with 'str'),
but I am rather sure that this use is unsafe - what you got from
fread() has rather likely no '\0' at the end, so the function
has no chance to determine where to stop when copying from 'cp'
to 'buf', you would also have to pass 'count' to it...
> }
> cp[count] = '\0';
> strbuf_append(buf, cp);
> fclose(fp);
The function (or, the snippet from some function) doesn't make
too much sense to me the way its written. How large is 'buf'?
Do you want to read never more than 4096 chars from a file and
thus 'buf' has (at least) 4097 elements? In that case why use
the while() loop? Just
count = fread( buf, 1, 4096, fp );
buf[ count ] = '\0';
would do fine.
Otherwise where do you check that you can still append to 'buf'?
That's assuming that you don't know the length of the file in
advance - but if you did I would expect that you would try to
read in everything at once instead of in chunks of 4096 bytes.
So, please give a bit more of information about what you're
trying to do, otherwise it's impossible to tell what you
may need.
Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de
|
|
0
|
|
|
|
Reply
|
jt
|
12/23/2010 8:34:14 PM
|
|
On 12/24/2010 02:04 AM, Jens Thoms Toerring wrote:
> Cross<X@x.tv> wrote:
>
>> I am working on an rtf renderer and parser. My code is hosted at
>> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
>> that getc() is taking a lot of time. Obviously, character read from files is
>> slow. So, I decided to read the whole file into memory as a char buffer.
>> Please feel free to comment and suggest on the following code. Now, I want
>> to scan the char buffer using sscanf().
>
> I guess you mean fscanf() here since you seem to want to read
> from a file and not from a string - which is what sscanf() is
> for,
No, I mean sscanf() only. As I said, I read the file into a char buffer and use
sscanf() to scan the char buffer.
>
>> However, I remember once I heard in
>> a chat room that sscanf() has buffer overflow vulnerabilities.
>
> Only when used in an unsafe fashion, that is reaing with e.g.
> an unadorned '%s' since you can't forsee the number if chars
> that can be read in and thus you can't supply a large enough
> buffer. But when you specify the maximum number of chars that
> fscanf() is allowed to read it's perfectlly safe. So for example
>
> sscanf( fp, "%4096[^\x01-\0xFF]", buf );
>
> will read up to but never more than 4096 chars from the file
> and put them into 'buf' (and it will put a trailing '\0' at
> the end). It also will stop at a '\0' in the file, but then
> this is a binary file and in that case fscanf() isn't suit-
> able anyway.
>
> But if you want to read in the whole file anyway and you don't
> want to do any conversion of the data then fread() is probably
> the better choice anyway.
>
>> I would like
>> pointers on this and would like to know how I can use sscanf() safely.
>
>> if(!fp){
>> fprintf(stderr, "File pointer uninitialized.\n");
>> goto close_strbuf;
>> }
>
>> while((count = fread(cp, 1, 4096, fp))){
>> if(feof(fp))break;
>
> Checking for feof() is rather useless - if you hit the end of the
> file and there are no chars left to be read the return value of
> fread() will be 0. That's a good time to stop;-)
Without feof(), there was some garbage read each time. I don't know why but with
feof() things appear fine.
>
>> strbuf_append(buf, cp);
>
> No idea what this function is supposed to do (it's not a standard
> C function and thus shouldn't have a name starting with 'str'),
> but I am rather sure that this use is unsafe - what you got from
> fread() has rather likely no '\0' at the end, so the function
> has no chance to determine where to stop when copying from 'cp'
> to 'buf', you would also have to pass 'count' to it...
Thanks for pointing that out.
>
>> }
>> cp[count] = '\0';
>> strbuf_append(buf, cp);
>> fclose(fp);
>
> The function (or, the snippet from some function) doesn't make
> too much sense to me the way its written. How large is 'buf'?
I have defined buf to be of size 4096 bytes.
> Do you want to read never more than 4096 chars from a file and
> thus 'buf' has (at least) 4097 elements? In that case why use
> the while() loop? Just
>
> count = fread( buf, 1, 4096, fp );
> buf[ count ] = '\0';
>
> would do fine.
The files sizes used in testing range from several kilobytes to several megabytes.
>
> Otherwise where do you check that you can still append to 'buf'?
> That's assuming that you don't know the length of the file in
> advance - but if you did I would expect that you would try to
> read in everything at once instead of in chunks of 4096 bytes.
The buffer buf is expandable. strbuf_append() function takes care of it.
>
> So, please give a bit more of information about what you're
> trying to do, otherwise it's impossible to tell what you
> may need.
> Regards, Jens
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
Cross
|
12/23/2010 9:10:13 PM
|
|
On 12/24/2010 01:47 AM, Nobody wrote:
> On Thu, 23 Dec 2010 23:20:54 +0530, Cross wrote:
>
>> I am working on an rtf renderer and parser. My code is hosted at
>> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
>> that getc() is taking a lot of time. Obviously, character read from files is
>> slow.
>
> FWIW, with GNU libc, you can replace getc() with getc_unlocked() if you
> aren't accessing the file from multiple threads. The main reason why
> getc() is so slow is that it has to lock the FILE structure for each
> operation. The _unlocked version is just:
>
> #define _IO_getc_unlocked(_fp) \
> (_IO_BE ((_fp)->_IO_read_ptr>= (_fp)->_IO_read_end, 0) \
> ? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)
>
> getc() itself used to be implemented this way before threading was
> available on Linux.
I shall try this.
>
> Alternatively, you could just implement your own version of getc based
> upon the above.
>
>> So, I decided to read the whole file into memory as a char buffer.
>> Please feel free to comment and suggest on the following code. Now, I
>> want to scan the char buffer using sscanf(). However, I remember once I
>> heard in a chat room that sscanf() has buffer overflow vulnerabilities.
>> I would like pointers on this and would like to know how I can use
>> sscanf() safely.
>
> The "%s" and "%[...]" specifiers will read as many matching characters as
> are available. The buffer needs to be large enough to hold all of them. If
> you don't know how many characters might be read, the buffer needs to be
> able to hold the entire input.
>
> If you're reading an unknown amount of data into a fixed-size buffer,
> specify a maximum field width. E.g. "%15s" will read at most 15
> characters, plus a terminating NUL byte (which is written regardless of
> whether the field width was exceeded).
>
> If a field exceeds the maximum field width, it's likely that subsequent
> parsing will fail, but you won't get a buffer overflow. If you need to be
> able to read data containing unlimited-length fields, you'll need to
> choose a different approach.
>
Well I shall be scanning for rtf tags which of course are not of unlimited size.
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
Cross
|
12/23/2010 9:18:53 PM
|
|
Cross <X@x.tv> wrote:
> On 12/24/2010 02:04 AM, Jens Thoms Toerring wrote:
> > Checking for feof() is rather useless - if you hit the end of the
> > file and there are no chars left to be read the return value of
> > fread() will be 0. That's a good time to stop;-)
> Without feof(), there was some garbage read each time. I don't know why but
> with feof() things appear fine.
You shouldn't program by trial and error but try understand what's
going wrong;-) Perhaps using feof() just hides the error but does
not really fix the real problem. Given the scarcity of information
you're giving that could very well be the case.
> > The function (or, the snippet from some function) doesn't make
> > too much sense to me the way its written. How large is 'buf'?
> I have defined buf to be of size 4096 bytes.
> > Do you want to read never more than 4096 chars from a file and
> > thus 'buf' has (at least) 4097 elements? In that case why use
> > the while() loop? Just
> >
> > count = fread( buf, 1, 4096, fp );
> > buf[ count ] = '\0';
> >
> > would do fine.
> The files sizes used in testing range from several kilobytes to several
> megabytes.
That may be the case but you simply don't show enough of your
code to make sense, and this reply doesn't help either, sorry.
So it's impossible to come up with sensible advice. Please show
the whole function (plus functions used from within it like
strbuf_append()) - then things might become clearer. At the
moment it's impossible to even say what you're intentions for
that function are, so it's like giving the car mechanic a photo
of your car and asking him why it won't start;-)
At least if files can be much longer than 4096 bytes then all
tests of how much still fits into 'buf' are missing (does that
happen in strbuf_append()?) as well as reallocation of 'buf' etc.
And that's were buffer overflows you seem to be concerned about
typically happen (or are avoided).
Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de
|
|
0
|
|
|
|
Reply
|
jt
|
12/23/2010 9:42:36 PM
|
|
On 12/24/2010 03:12 AM, Jens Thoms Toerring wrote:
> Cross<X@x.tv> wrote:
>> On 12/24/2010 02:04 AM, Jens Thoms Toerring wrote:
>>> Checking for feof() is rather useless - if you hit the end of the
>>> file and there are no chars left to be read the return value of
>>> fread() will be 0. That's a good time to stop;-)
>
>> Without feof(), there was some garbage read each time. I don't know why but
>> with feof() things appear fine.
>
> You shouldn't program by trial and error but try understand what's
> going wrong;-) Perhaps using feof() just hides the error but does
> not really fix the real problem. Given the scarcity of information
> you're giving that could very well be the case.
That is correct. I was just postponing it to a later time.
>
>>> The function (or, the snippet from some function) doesn't make
>>> too much sense to me the way its written. How large is 'buf'?
>> I have defined buf to be of size 4096 bytes.
>>> Do you want to read never more than 4096 chars from a file and
>>> thus 'buf' has (at least) 4097 elements? In that case why use
>>> the while() loop? Just
>>>
>>> count = fread( buf, 1, 4096, fp );
>>> buf[ count ] = '\0';
>>>
>>> would do fine.
>
>> The files sizes used in testing range from several kilobytes to several
>> megabytes.
>
> That may be the case but you simply don't show enough of your
> code to make sense, and this reply doesn't help either, sorry.
> So it's impossible to come up with sensible advice. Please show
> the whole function (plus functions used from within it like
> strbuf_append()) - then things might become clearer. At the
> moment it's impossible to even say what you're intentions for
> that function are, so it's like giving the car mechanic a photo
> of your car and asking him why it won't start;-)
>
> At least if files can be much longer than 4096 bytes then all
> tests of how much still fits into 'buf' are missing (does that
> happen in strbuf_append()?) as well as reallocation of 'buf' etc.
> And that's were buffer overflows you seem to be concerned about
> typically happen (or are avoided).
>
> Regards, Jens
I am using eina string buffer utility from EFL. http://enlightenment.org
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
Cross
|
12/23/2010 10:02:05 PM
|
|
Cross <X@x.tv> wrote:
> On 12/24/2010 03:12 AM, Jens Thoms Toerring wrote:
> > At least if files can be much longer than 4096 bytes then all
> > tests of how much still fits into 'buf' are missing (does that
> > happen in strbuf_append()?) as well as reallocation of 'buf' etc.
> > And that's were buffer overflows you seem to be concerned about
> > typically happen (or are avoided).
> >
> I am using eina string buffer utility from EFL. http://enlightenment.org
That's nice for you. But there was no call to any of them in
the code you posted. And using third-party functions doesn't
make your code inherently safer, they still need to be used
correctly (assuming they're bug-free). But if you're not pre-
pared to tell what you're doing (except some more or less
meaningless snippets) you will need a clairvoyant for help,
my crystal ball doesn't work well enough at the moment. (I
even went as far as checking the web page you gave for your
program but all you get there under "Downloads" is "This pro-
ject currently has no downloads".)
So
<volume setting="max">
POST THE REAL, COMPLETE CODE YOU'RE USING
</volume>
and don't make assumptions about where the problem is - if you
knew you wouldn't have to ask. Best write a short, compilable
program that exhibits the problems you're facing, leaving out
everthing else, and post that.
Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de
|
|
0
|
|
|
|
Reply
|
jt
|
12/23/2010 10:52:28 PM
|
|
On Thu, 23 Dec 2010 23:20:54 +0530, Cross <X@X.tv> wrote:
>Hello
>
>I am working on an rtf renderer and parser. My code is hosted at
>http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
>that getc() is taking a lot of time. Obviously, character read from files is
>slow. So, I decided to read the whole file into memory as a char buffer. Please
>feel free to comment and suggest on the following code. Now, I want to scan the
>char buffer using sscanf(). However, I remember once I heard in a chat room that
>sscanf() has buffer overflow vulnerabilities. I would like pointers on this and
>would like to know how I can use sscanf() safely.
>
>if(!fp){
> fprintf(stderr, "File pointer uninitialized.\n");
> goto close_strbuf;
> }
>
> while((count = fread(cp, 1, 4096, fp))){
> if(feof(fp))break;
There are other reasons besides end of file that will cause fread to
stop prior to reading all 4096 bytes requested.
> strbuf_append(buf, cp);
Since cp needs to be zero terminated (see assignment statement below),
how do you insure that cp[4096] is '\0'?
> }
> cp[count] = '\0';
Are you sure that the file will never contain a '\0'?
> strbuf_append(buf, cp);
> fclose(fp);
>
>--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
--
Remove del for email
|
|
0
|
|
|
|
Reply
|
Barry
|
12/24/2010 12:20:34 AM
|
|
On 12/24/2010 05:50 AM, Barry Schwarz wrote:
> On Thu, 23 Dec 2010 23:20:54 +0530, Cross<X@X.tv> wrote:
>
>> Hello
>>
>> I am working on an rtf renderer and parser. My code is hosted at
>> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
>> that getc() is taking a lot of time. Obviously, character read from files is
>> slow. So, I decided to read the whole file into memory as a char buffer. Please
>> feel free to comment and suggest on the following code. Now, I want to scan the
>> char buffer using sscanf(). However, I remember once I heard in a chat room that
>> sscanf() has buffer overflow vulnerabilities. I would like pointers on this and
>> would like to know how I can use sscanf() safely.
>>
>> if(!fp){
>> fprintf(stderr, "File pointer uninitialized.\n");
>> goto close_strbuf;
>> }
>>
>> while((count = fread(cp, 1, 4096, fp))){
>> if(feof(fp))break;
>
> There are other reasons besides end of file that will cause fread to
> stop prior to reading all 4096 bytes requested.
>
>> strbuf_append(buf, cp);
>
> Since cp needs to be zero terminated (see assignment statement below),
> how do you insure that cp[4096] is '\0'?
>
>> }
>> cp[count] = '\0';
>
> Are you sure that the file will never contain a '\0'?
Interesting posibility. I would like to know how to handle that.
>
>> strbuf_append(buf, cp);
>> fclose(fp);
>>
>> --- news://freenews.netfront.net/ - complaints: news@netfront.net ---
>
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
Cross
|
12/24/2010 5:59:51 PM
|
|
On 12/24/2010 04:22 AM, Jens Thoms Toerring wrote:
> Cross<X@x.tv> wrote:
>> On 12/24/2010 03:12 AM, Jens Thoms Toerring wrote:
>>> At least if files can be much longer than 4096 bytes then all
>>> tests of how much still fits into 'buf' are missing (does that
>>> happen in strbuf_append()?) as well as reallocation of 'buf' etc.
>>> And that's were buffer overflows you seem to be concerned about
>>> typically happen (or are avoided).
>>>
>> I am using eina string buffer utility from EFL. http://enlightenment.org
>
> That's nice for you. But there was no call to any of them in
> the code you posted. And using third-party functions doesn't
> make your code inherently safer, they still need to be used
> correctly (assuming they're bug-free). But if you're not pre-
> pared to tell what you're doing (except some more or less
> meaningless snippets) you will need a clairvoyant for help,
> my crystal ball doesn't work well enough at the moment. (I
> even went as far as checking the web page you gave for your
> program but all you get there under "Downloads" is "This pro-
> ject currently has no downloads".)
>
> So
> <volume setting="max">
> POST THE REAL, COMPLETE CODE YOU'RE USING
> </volume>
>
> and don't make assumptions about where the problem is - if you
> knew you wouldn't have to ask. Best write a short, compilable
> program that exhibits the problems you're facing, leaving out
> everthing else, and post that.
>
> Regards, Jens
http://docs.enlightenment.org/auto/eina/group__Eina__String__Buffer__Group.html
The above link is the documentation and the source code is available at
http://trac.enlightenment.org/e/browser/trunk/eina/src/lib
I also posted the site for my full code http://code.google.com/p/ertf Look in
src/lib/ertf_input.c and src/lib/ertf_rtf_to_markup.c
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
Cross
|
12/24/2010 6:07:20 PM
|
|
On Fri, 24 Dec 2010 23:29:51 +0530, Cross <X@X.tv> wrote:
>On 12/24/2010 05:50 AM, Barry Schwarz wrote:
>> On Thu, 23 Dec 2010 23:20:54 +0530, Cross<X@X.tv> wrote:
>>
>>> Hello
>>>
>>> I am working on an rtf renderer and parser. My code is hosted at
>>> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
>>> that getc() is taking a lot of time. Obviously, character read from files is
>>> slow. So, I decided to read the whole file into memory as a char buffer. Please
>>> feel free to comment and suggest on the following code. Now, I want to scan the
>>> char buffer using sscanf(). However, I remember once I heard in a chat room that
>>> sscanf() has buffer overflow vulnerabilities. I would like pointers on this and
>>> would like to know how I can use sscanf() safely.
>>>
>>> if(!fp){
>>> fprintf(stderr, "File pointer uninitialized.\n");
>>> goto close_strbuf;
>>> }
>>>
>>> while((count = fread(cp, 1, 4096, fp))){
>>> if(feof(fp))break;
>>
>> There are other reasons besides end of file that will cause fread to
>> stop prior to reading all 4096 bytes requested.
>>
>>> strbuf_append(buf, cp);
>>
>> Since cp needs to be zero terminated (see assignment statement below),
>> how do you insure that cp[4096] is '\0'?
>>
>>> }
>>> cp[count] = '\0';
>>
>> Are you sure that the file will never contain a '\0'?
>Interesting posibility. I would like to know how to handle that.
One way would be to eliminate the implicit assumption that the data
read from the file can be treated as a single string. You might need
to pass some additional or slightly modified arguments to
strbuf_append but memcpy could provide the heavy lifting.
>>
>>> strbuf_append(buf, cp);
>>> fclose(fp);
>>>
>>> --- news://freenews.netfront.net/ - complaints: news@netfront.net ---
>>
>
>
>--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
--
Remove del for email
|
|
0
|
|
|
|
Reply
|
Barry
|
12/24/2010 6:59:38 PM
|
|
Cross <X@x.tv> wrote:
> On 12/24/2010 04:22 AM, Jens Thoms Toerring wrote:
> > So
> > <volume setting="max">
> > POST THE REAL, COMPLETE CODE YOU'RE USING
> > </volume>
> >
> > and don't make assumptions about where the problem is - if you
> > knew you wouldn't have to ask. Best write a short, compilable
> > program that exhibits the problems you're facing, leaving out
> > everthing else, and post that.
> http://docs.enlightenment.org/auto/eina/group__Eina__String__Buffer__Group.html
> The above link is the documentation and the source code is available at
> http://trac.enlightenment.org/e/browser/trunk/eina/src/lib
Thanks, I am able to use Google. But in the code you posted
here no 'eina_' function was ever used. So why should I care
about it?
> I also posted the site for my full code http://code.google.com/p/ertf Look in
> src/lib/ertf_input.c and src/lib/ertf_rtf_to_markup.c
As I already told you, if you go to "Downloads" all you get is
"This project currently has no downloads.". So the only way to
get the source you have to do a svn checkout. Perhaps you should
consider that not everyone has svn and is prepared to install it.
But if you do it anyway one finds that the code snippet you pos-
ted here isn't to be found in that source tree at all - e.g. the
fread() function isn't found when you grep over the whole tree.
So it got nothing to do with the code we're supposed to discuss
here and doesn't give a clue what you're after. All one finds in
one of the files you mentioned is a single function with more
than 400 lines, with a single switch case making up more than
350 of that. Good people have been shot for less. And in the
other one one finds e.g.
int
ertf_tag_get(FILE *fp, char *s)
{
int c;
fscanf(fp, "%[^ 0123456789;\\{}\n\r]", s);
if ((c = fgetc(fp)) != ' ')
ungetc(c, fp);
CHECK_EOF(fp, "EOF encountered while obtaining control tag.\n", return 1);
return 0;
}
This function can't be used safely at all. Within the function
you have no idea how much room is left in 's', so it's impos-
sible to use fscanf() without a potential buffer overrun. You
also don't check if fscanf() was successful - but the calling
functions seem, as far as I can see, to blindly assume that it
did succeed. And the check for EOF after the fgetc() is useless
- if you hit EOF you already tried to stuff it back into the
file, which of course is not what you want since EOF is not a
char and what gets "pushed back" is whatever EOF actually is on
the system, converted to an unsigned char...
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de
|
|
0
|
|
|
|
Reply
|
jt
|
12/24/2010 11:05:08 PM
|
|
On 23 dec 2010 06:50 em ,Cross <X@X.tv> wrote:
> Hello
>
> I am working on an rtf renderer and parser. My code is hosted at
> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
> that getc() is taking a lot of time. Obviously, character read from files is
> slow. So, I decided to read the whole file into memory as a char buffer. Please
> feel free to comment and suggest on the following code. Now, I want to scan the
> char buffer using sscanf(). However, I remember once I heard in a chat room that
> sscanf() has buffer overflow vulnerabilities. I would like pointers on this and
> would like to know how I can use sscanf() safely.
>
> if(!fp){
> fprintf(stderr, "File pointer uninitialized.\n");
> goto close_strbuf;
> }
>
> while((count = fread(cp, 1, 4096, fp))){
> if(feof(fp))break;
> strbuf_append(buf, cp);
> }
> cp[count] = '\0';
> strbuf_append(buf, cp);
> fclose(fp);
>
> --- news://freenews.netfront.net/ - complaints: news@netfront.net ---
I'm not going into your code but if you want a safer alternative you can look at:
int sscanf_s(const char * restrict string, const char * restrict format, ...);
but in these days when wide characters are getting more and more common I'd suggest
looking into this instead (You'll gain valuable knowledge in the process):
int swscanf_s(const wchar_t * restrict string, const wchar_t * restrict format, ...);
//Michael
--
Posted by Mimo Usenet Browser v0.1.10
http://www.goldenfrog.com/mimo/post
|
|
0
|
|
|
|
Reply
|
Mimo
|
12/27/2010 2:43:02 AM
|
|
On 12/27/10 03:43 PM, Mimo User wrote:
> On 23 dec 2010 06:50 em ,Cross<X@X.tv> wrote:
>> Hello
>>
>> I am working on an rtf renderer and parser. My code is hosted at
>> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
>> that getc() is taking a lot of time. Obviously, character read from files is
>> slow. So, I decided to read the whole file into memory as a char buffer. Please
>> feel free to comment and suggest on the following code. Now, I want to scan the
>> char buffer using sscanf(). However, I remember once I heard in a chat room that
>> sscanf() has buffer overflow vulnerabilities. I would like pointers on this and
>> would like to know how I can use sscanf() safely.
>>
>> if(!fp){
>> fprintf(stderr, "File pointer uninitialized.\n");
>> goto close_strbuf;
>> }
>>
>> while((count = fread(cp, 1, 4096, fp))){
>> if(feof(fp))break;
>> strbuf_append(buf, cp);
>> }
>> cp[count] = '\0';
>> strbuf_append(buf, cp);
>> fclose(fp);
>>
>> --- news://freenews.netfront.net/ - complaints: news@netfront.net ---
>
> I'm not going into your code but if you want a safer alternative you can look at:
>
> int sscanf_s(const char * restrict string, const char * restrict format, ...);
>
> but in these days when wide characters are getting more and more common I'd suggest
> looking into this instead (You'll gain valuable knowledge in the process):
>
> int swscanf_s(const wchar_t * restrict string, const wchar_t * restrict format, ...);
If your platform has them...
--
Ian Collins
|
|
0
|
|
|
|
Reply
|
Ian
|
12/27/2010 2:57:51 AM
|
|
On 12/27/2010 08:13 AM, Mimo User wrote:
> On 23 dec 2010 06:50 em ,Cross<X@X.tv> wrote:
>> Hello
>>
>> I am working on an rtf renderer and parser. My code is hosted at
>> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
>> that getc() is taking a lot of time. Obviously, character read from files is
>> slow. So, I decided to read the whole file into memory as a char buffer. Please
>> feel free to comment and suggest on the following code. Now, I want to scan the
>> char buffer using sscanf(). However, I remember once I heard in a chat room that
>> sscanf() has buffer overflow vulnerabilities. I would like pointers on this and
>> would like to know how I can use sscanf() safely.
>>
>> if(!fp){
>> fprintf(stderr, "File pointer uninitialized.\n");
>> goto close_strbuf;
>> }
>>
>> while((count = fread(cp, 1, 4096, fp))){
>> if(feof(fp))break;
>> strbuf_append(buf, cp);
>> }
>> cp[count] = '\0';
>> strbuf_append(buf, cp);
>> fclose(fp);
>>
>> --- news://freenews.netfront.net/ - complaints: news@netfront.net ---
>
> I'm not going into your code but if you want a safer alternative you can look at:
>
> int sscanf_s(const char * restrict string, const char * restrict format, ...);
>
> but in these days when wide characters are getting more and more common I'd suggest
> looking into this instead (You'll gain valuable knowledge in the process):
>
> int swscanf_s(const wchar_t * restrict string, const wchar_t * restrict format, ...);
>
> //Michael
>
>
Well they are not supported on my platform it seems; but it was interesting to
know about them.
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
Cross
|
12/29/2010 6:34:41 PM
|
|
On 12/25/2010 04:35 AM, Jens Thoms Toerring wrote:
> Cross<X@x.tv> wrote:
>> http://docs.enlightenment.org/auto/eina/group__Eina__String__Buffer__Group.html
>
>> The above link is the documentation and the source code is available at
>> http://trac.enlightenment.org/e/browser/trunk/eina/src/lib
>
> Thanks, I am able to use Google. But in the code you posted
> here no 'eina_' function was ever used. So why should I care
> about it?
Well I stripped eina_strbuf_*() functions to strbuf_*() functions so as to only
convey the purpose of the function.
>
>> I also posted the site for my full code http://code.google.com/p/ertf Look in
>> src/lib/ertf_input.c and src/lib/ertf_rtf_to_markup.c
>
> As I already told you, if you go to "Downloads" all you get is
> "This project currently has no downloads.". So the only way to
> get the source you have to do a svn checkout. Perhaps you should
> consider that not everyone has svn and is prepared to install it.
Thank you for pointing it out. I will do it soon. I tried make dist; but it adds
some additional files to the package. Otherwise, I would have added it now. I
will have source tarball downloads hosted soon.
> But if you do it anyway one finds that the code snippet you pos-
> ted here isn't to be found in that source tree at all - e.g. the
> fread() function isn't found when you grep over the whole tree.
Yes, I want to replace fscanf() and fgetc() function calls in my code
appropriately with buffer scanning operations once the file is read into the
buffer as I did in the code snippet I posted.
Actually, I tried kcachegrind on my code and found that as the file size
increases 50 - 80% of the time is spent in i/o (on fgetc() specifically). So, I
came up with the idea of reading the whole file into a char buffer and scanning it.
> All one finds in
> one of the files you mentioned is a single function with more
> than 400 lines, with a single switch case making up more than
> 350 of that. Good people have been shot for less.
Honestly, I would like to know a better way of doing it.
> And in the
> other one one finds e.g.
>
> int
> ertf_tag_get(FILE *fp, char *s)
> {
> int c;
> fscanf(fp, "%[^ 0123456789;\\{}\n\r]", s);
> if ((c = fgetc(fp)) != ' ')
> ungetc(c, fp);
> CHECK_EOF(fp, "EOF encountered while obtaining control tag.\n", return 1);
> return 0;
> }
>
> This function can't be used safely at all. Within the function
> you have no idea how much room is left in 's', so it's impos-
> sible to use fscanf() without a potential buffer overrun. You
> also don't check if fscanf() was successful - but the calling
> functions seem, as far as I can see, to blindly assume that it
> did succeed. And the check for EOF after the fgetc() is useless
> - if you hit EOF you already tried to stuff it back into the
> file, which of course is not what you want since EOF is not a
> char and what gets "pushed back" is whatever EOF actually is on
> the system, converted to an unsigned char...
>
Considering this, s is declared of size 1000 in ertf_font.c. However, now I
think I should reflect this in fscanf() as something similar to "%1000s".
Thank you for the insights
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
Cross
|
1/4/2011 12:16:16 PM
|
|
Cross <X@x.tv> wrote:
> Actually, I tried kcachegrind on my code and found that as the file size
> increases 50 - 80% of the time is spent in i/o (on fgetc() specifically).
> So, I came up with the idea of reading the whole file into a char buffer
> and scanning it.
Something like
Eina_Strbuf *buf = eina_strbuf_new( );
char cp[ 4097 ];
size_t count;
while ( ( count = fread( cp, 1, 4096, fp ) ) != 0 )
{
cp[ count ] = '\0';
eina_strbuf_append( buf, cp );
}
should do the job (assuming I got the part about the eina string
buffer right and it works as I assume). Of course, you should
check the return value of eina_strbuf_append() - it may run out
of memory. And there's no need to check for EOF at all - the
loop will be left once the end of the file has been reached.
Also, you seem to like feof() a lot, but as far as my experience
goes it's hardly needed at all. All functions for reading from a
file return some kind if indication if reading succeeded or not
and if you use that calling feof() in most cases is only needed
if you have to distinguish between the EOF condition from a read
failure. I just grepped through a project of mine with about 250k
LoC and dozends of files being read in and parsed and found not a
single call of feof() (all there is are a few calls of ferror()
which come from automatically generated code).
> > All one finds in
> > one of the files you mentioned is a single function with more
> > than 400 lines, with a single switch case making up more than
> > 350 of that. Good people have been shot for less.
> Honestly, I would like to know a better way of doing it.
Split it up into functions that do simple tasks and call them
instead of putting all the functionality into a single mono-
litic function. Also consider if its possible to use a "dis-
patch table", e.g. if you have code like
if ( ! strncmp( buf, "plain", 5 )
{
/* do something with plain text */
}
else if ( ! strncmp( buf, "par", 3 ) )
{
/* do something with a paragraph */
}
....
else
/* do something if all else failed */
you could instead define a structure like
struct {
const char * keyword;
int ( *handler )( const char * );
} dispatch_table[ ] = { { "plain", handle_plain_text },
{ "f", handle_font },
... };
and use that it as in
for ( int i = 0; i < sizeof dispatch_table / sizeof *dispatch_table, i++ )
if ( ! strncmp( buf, dispatch_table[ i ].keyword,
strlen( dispatch_table[ i ].keyword ) )
{
result = dispatch_table[ i ].handler( buf );
break;
}
if ( i == sizeof dispatch_table / sizeof *dispatch_table )
result = handle_unkown_keyword( buf );
That could get rid of the endless if/else if/else blocks. Of
course you then need a single function for each of the diff-
erent keywords (don't be afraid of short functions, the com-
piler will rather likely inline them), but you should have
something like that anyway to make the code modular and give
the reader at least a fighting chance of understanding what's
going on. With the above definition of the structure they would
have to have a signature of
int handle_plain_text( const char * buf );
and you must adapt the member for the function pointer when
you use a different signature.
When you use such functions you will rather likely have to
assemble the information about the current state of the par-
ser (all those 'boldset', 'italicset' variables etc.) into
a structure that you then pass to the functions (probably as
a pointer to the structure).
Of course, instead of testing for the length of the dispatch
table you could have a sentinel element at the end (where
e.g. the keyword member is a NULL pointer) and bail out from
the loop when this is found - the trick with
sizeof dispatch_table / sizeof *dispatch_table
only works when the array of structures is defined in the
same function where it's used (or as a global array).
> > And in the
> > other one one finds e.g.
> >
> > int
> > ertf_tag_get(FILE *fp, char *s)
> > {
> > int c;
> > fscanf(fp, "%[^ 0123456789;\\{}\n\r]", s);
> > if ((c = fgetc(fp)) != ' ')
> > ungetc(c, fp);
> > CHECK_EOF(fp, "EOF encountered while obtaining control tag.\n", return 1);
> > return 0;
> > }
> >
> > This function can't be used safely at all. Within the function
> > you have no idea how much room is left in 's', so it's impos-
> > sible to use fscanf() without a potential buffer overrun. You
> > also don't check if fscanf() was successful - but the calling
> > functions seem, as far as I can see, to blindly assume that it
> > did succeed. And the check for EOF after the fgetc() is useless
> > - if you hit EOF you already tried to stuff it back into the
> > file, which of course is not what you want since EOF is not a
> > char and what gets "pushed back" is whatever EOF actually is on
> > the system, converted to an unsigned char...
> >
> Considering this, s is declared of size 1000 in ertf_font.c. However, now I
> think I should reflect this in fscanf() as something similar to "%1000s".
Yes. Of course, that assumes that you know for sure that 's' has
1000 elements, which can't be checked from within the function.
And if you do something like that make it "%999s" - you need to
count in the trailing '\0' that is always appended, so there's
only room for 999 characters.
Another thing: what you're doing here seems to be writing a
parser. They are typically ugly to write and if the "syntax"
of the input isn't too convoluted it can be worth the time
to check out tools that automatically generate the "ugly"
part of the code for you. One example would be yacc/flex.
Admittedly, it takes some time to learn how to use them, but
then parsing files becomes a lot easier and much more fun
since you don't have to care anymore about all those pesky
little details;-)
Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de
|
|
0
|
|
|
|
Reply
|
jt
|
1/9/2011 6:07:44 PM
|
|
On 01/09/2011 11:37 PM, Jens Thoms Toerring wrote:
> Cross<X@x.tv> wrote:
>> Actually, I tried kcachegrind on my code and found that as the file size
>> increases 50 - 80% of the time is spent in i/o (on fgetc() specifically).
>> So, I came up with the idea of reading the whole file into a char buffer
>> and scanning it.
>
> Something like
>
> Eina_Strbuf *buf = eina_strbuf_new( );
> char cp[ 4097 ];
> size_t count;
>
> while ( ( count = fread( cp, 1, 4096, fp ) ) != 0 )
> {
> cp[ count ] = '\0';
> eina_strbuf_append( buf, cp );
> }
>
> should do the job (assuming I got the part about the eina string
> buffer right and it works as I assume). Of course, you should
> check the return value of eina_strbuf_append() - it may run out
> of memory. And there's no need to check for EOF at all - the
> loop will be left once the end of the file has been reached.
>
> Also, you seem to like feof() a lot, but as far as my experience
> goes it's hardly needed at all. All functions for reading from a
> file return some kind if indication if reading succeeded or not
> and if you use that calling feof() in most cases is only needed
> if you have to distinguish between the EOF condition from a read
> failure. I just grepped through a project of mine with about 250k
> LoC and dozends of files being read in and parsed and found not a
> single call of feof() (all there is are a few calls of ferror()
> which come from automatically generated code).
>
I shall reflect these changes in my code soon.
>>> All one finds in
>>> one of the files you mentioned is a single function with more
>>> than 400 lines, with a single switch case making up more than
>>> 350 of that. Good people have been shot for less.
>> Honestly, I would like to know a better way of doing it.
>
> Split it up into functions that do simple tasks and call them
> instead of putting all the functionality into a single mono-
> litic function. Also consider if its possible to use a "dis-
> patch table", e.g. if you have code like
>
> if ( ! strncmp( buf, "plain", 5 )
> {
> /* do something with plain text */
> }
> else if ( ! strncmp( buf, "par", 3 ) )
> {
> /* do something with a paragraph */
> }
> ...
> else
> /* do something if all else failed */
>
> you could instead define a structure like
>
> struct {
> const char * keyword;
> int ( *handler )( const char * );
> } dispatch_table[ ] = { { "plain", handle_plain_text },
> { "f", handle_font },
> ... };
>
> and use that it as in
>
> for ( int i = 0; i< sizeof dispatch_table / sizeof *dispatch_table, i++ )
> if ( ! strncmp( buf, dispatch_table[ i ].keyword,
> strlen( dispatch_table[ i ].keyword ) )
> {
> result = dispatch_table[ i ].handler( buf );
> break;
> }
>
> if ( i == sizeof dispatch_table / sizeof *dispatch_table )
> result = handle_unkown_keyword( buf );
>
> That could get rid of the endless if/else if/else blocks. Of
> course you then need a single function for each of the diff-
> erent keywords (don't be afraid of short functions, the com-
> piler will rather likely inline them), but you should have
> something like that anyway to make the code modular and give
> the reader at least a fighting chance of understanding what's
> going on. With the above definition of the structure they would
> have to have a signature of
>
> int handle_plain_text( const char * buf );
>
> and you must adapt the member for the function pointer when
> you use a different signature.
>
> When you use such functions you will rather likely have to
> assemble the information about the current state of the par-
> ser (all those 'boldset', 'italicset' variables etc.) into
> a structure that you then pass to the functions (probably as
> a pointer to the structure).
>
> Of course, instead of testing for the length of the dispatch
> table you could have a sentinel element at the end (where
> e.g. the keyword member is a NULL pointer) and bail out from
> the loop when this is found - the trick with
>
> sizeof dispatch_table / sizeof *dispatch_table
>
> only works when the array of structures is defined in the
> same function where it's used (or as a global array).
>
>>> And in the
>>> other one one finds e.g.
>>>
>>> int
>>> ertf_tag_get(FILE *fp, char *s)
>>> {
>>> int c;
>>> fscanf(fp, "%[^ 0123456789;\\{}\n\r]", s);
>>> if ((c = fgetc(fp)) != ' ')
>>> ungetc(c, fp);
>>> CHECK_EOF(fp, "EOF encountered while obtaining control tag.\n", return 1);
>>> return 0;
>>> }
>>>
>>> This function can't be used safely at all. Within the function
>>> you have no idea how much room is left in 's', so it's impos-
>>> sible to use fscanf() without a potential buffer overrun. You
>>> also don't check if fscanf() was successful - but the calling
>>> functions seem, as far as I can see, to blindly assume that it
>>> did succeed. And the check for EOF after the fgetc() is useless
>>> - if you hit EOF you already tried to stuff it back into the
>>> file, which of course is not what you want since EOF is not a
>>> char and what gets "pushed back" is whatever EOF actually is on
>>> the system, converted to an unsigned char...
>>>
>
>> Considering this, s is declared of size 1000 in ertf_font.c. However, now I
>> think I should reflect this in fscanf() as something similar to "%1000s".
>
> Yes. Of course, that assumes that you know for sure that 's' has
> 1000 elements, which can't be checked from within the function.
> And if you do something like that make it "%999s" - you need to
> count in the trailing '\0' that is always appended, so there's
> only room for 999 characters.
Well "%999s" shall be able to get strings whose maximum length is 999, isn't it?
>
> Another thing: what you're doing here seems to be writing a
> parser. They are typically ugly to write and if the "syntax"
> of the input isn't too convoluted it can be worth the time
> to check out tools that automatically generate the "ugly"
> part of the code for you. One example would be yacc/flex.
> Admittedly, it takes some time to learn how to use them, but
> then parsing files becomes a lot easier and much more fun
> since you don't have to care anymore about all those pesky
> little details;-)
> Regards, Jens
Yes, I did suggest flex and bison but I was encouraged to design my own parser
to avoid the flex dependency.
Regards
Cross
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
|
|
0
|
|
|
|
Reply
|
Cross
|
1/10/2011 10:12:41 PM
|
|
Cross <X@x.tv> wrote:
> >> Considering this, s is declared of size 1000 in ertf_font.c. However, now I
> >> think I should reflect this in fscanf() as something similar to "%1000s".
> >
> > Yes. Of course, that assumes that you know for sure that 's' has
> > 1000 elements, which can't be checked from within the function.
> > And if you do something like that make it "%999s" - you need to
> > count in the trailing '\0' that is always appended, so there's
> > only room for 999 characters.
>
> Well "%999s" shall be able to get strings whose maximum length is 999,
> isn't it?
The "%999s" bit tells *scanf() to read in up to 999 chars -
the necessary trailing '\0' (that makes it a string) always
will be added automatically. So if you have a char array
with 1000 elements (and thus a sizeof 1000) the maximum
number of chars you can allow *scanf() to read in without
risking a buffer overrun is 999.
Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de
|
|
0
|
|
|
|
Reply
|
jt
|
1/10/2011 11:11:13 PM
|
|
|
19 Replies
318 Views
(page loaded in 0.248 seconds)
Similiar Articles: C++: Automatically generate method documentation header from ...Automatically generate method documentation header from ... sscanf() safety - comp.lang.c Header: Report as Spam ... The above link is the documentation and the source ... Converting number to std::string ("itoa()" ) - comp.lang.c++ ...|> If atoi can be explained in terms of sscanf, then itoa can be |> explained in terms of sprintf. With the same level of safety, |> similar tradeoffs, etc. Neil F-S on TV - comp.sys.acorn.miscJet skis, and how the inshore areas are divided between uses for safety. ... soft-sys.matlab... year month day hour minute second]; field.value1(count)=sscanf(in1 ... scanf(), fscanf(), fwscanf(), sscanf(), swscanf(), vfscanf ...The sscanf() and vsscanf() functions read from the specified string ... MULTITHREAD SAFETY LEVEL. scanf(), fscanf(), sscanf(): MT-Safe, with exceptions. c - Is sscanf thread-safe on iPhone OS 3.1.2? - Stack OverflowNot the answer you're looking for? Browse other questions tagged iphone c thread-safety sscanf or ask your own question. 7/28/2012 5:41:55 AM
|