Default buffers on pipes



I'm a bit confused about this... hoping somebody can help.

So I have one program writing to stdout and another reading on stdin 
(using fwrite() and fread()); I run them from the command line piping 
the output of one to the input of the other. Experimentation reveals 
that if the first program writes faster than the second reads, then 
there is a fairly small buffer which, after it fills up, the first 
program blocks on writing until the second has a chance to catch up. 
Likewise, if the reader is faster, it blocks while the pipe is empty. 
That's all logical enough. Question: How big is this buffer by default? 
Can I change it?

Also (this is the part I'm really interested in), what happens if the 
first program writes a little bit at a time, but the second one reads a 
large chunk at a time, and it tries to read chunks significantly larger 
than the size of that buffer (say, 2-3 times larger)? Will the first 
program block when it fills the buffer, and then the second program will 
block forever, because the amount of data it wants is never available? 
Conversely, what if the first program writes in large chunks... can it 
write in chunks bigger than the buffer size?

Thanks,
Josh

PS: I hope I don't offend anyone by not posting a real email address... 
I get enough spam as it is. What's the etiquette on that? Most people 
seem to include their email... is it considered impolite not to?
Reply jh 2/4/2006 6:53:36 AM

jh <no@thanks.com> writes:

> Question: How big is this buffer by default? 

$ grep PIPE_BUF /usr/include/*/*.h
/usr/include/bits/posix1_lim.h:#define  _POSIX_PIPE_BUF         512
/usr/include/linux/limits.h:#define PIPE_BUF        4096        /* # bytes in atomic write to a pipe */
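
Note that PIPE_BUF, as the comment in that header says, is the largest write that is guaranteed to be atomic; it is not necessarily the total capacity of the pipe. A rough sketch for checking both at run time follows. fpathconf() with _PC_PIPE_BUF reports the atomic-write limit, and filling a fresh pipe through a non-blocking descriptor until write() refuses more gives an estimate of the capacity. The function names are made up for illustration, and the capacity figure is whatever your kernel happens to use.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Largest write to this pipe that is guaranteed atomic, or -1 if unknown. */
static long atomic_write_limit(int pipe_fd)
{
    return fpathconf(pipe_fd, _PC_PIPE_BUF);
}

/* Estimate the capacity of a fresh pipe by filling it without blocking.
 * Returns the number of bytes accepted, or -1 if the pipe cannot be set up. */
static long pipe_capacity(void)
{
    int fds[2];
    char chunk[512] = {0};   /* 512 <= PIPE_BUF, so each write is all-or-nothing */
    long total = 0;
    int flags;

    if (pipe(fds) != 0)
        return -1;

    /* Make the write end non-blocking so a full pipe reports an error
     * instead of putting us to sleep. */
    flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0)
            total += n;
        else if (n < 0 && errno == EINTR)
            continue;        /* interrupted by a signal, try again */
        else
            break;           /* normally EAGAIN: the pipe is full */
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Because the chunks are 512 bytes, the estimate is only accurate to that granularity.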

> Can I change it?

No.

> Also (this is the part I'm really interested in), what happens if the 
> first program writes a little bit at a time, but the second one reads a 
> large chunk at a time, and it tries to read chunks significantly larger 
> than the size of that buffer (say, 2-3 times larger)? 

Reads from a pipe can be "short" (return less data than the program
requested).

> PS: I hope I don't offend anyone by not posting a real email address... 
> I get enough spam as it is. What's the etiquette on that? Most people 
> seem to include their email... is it considered impolite not to?

Most people nowadays obfuscate their e-mail (as I have done), and
provide instructions that are easy for humans but hard for e-mail
bots to decode.

Cheers,
-- 
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Reply Paul 2/4/2006 8:05:57 AM


"jh" <no@thanks.com> wrote in message
news:no-7CBD72.01533604022006@wonka.hampshire.edu...
> So I have one program writing to stdout and another reading on stdin
> (using fwrite() and fread()); I run them from the command line piping
> the output of one to the input of the other.
[snip]
> Question: How big is this buffer by default? Can I change it?

Which buffer? There are three: the output stdio buffer in the process with
the write end of the pipe, the pipe buffer itself, and the input stdio
buffer in the process with the read end of the pipe.

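The stdio buffer sizes are implementation defined, and can be changed with
setvbuf(). The capacity of the pipe itself is also implementation defined.
PIPE_BUF, which is at least 512 bytes, is the largest write guaranteed to be
atomic rather than the size of the pipe buffer.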

> Also (this is the part I'm really interested in), what happens if the
> first program writes a little bit at a time, but the second one reads a
> large chunk at a time, and it tries to read chunks significantly larger
> than the size of that buffer (say, 2-3 times larger)? Will the first
> program block when it fills the buffer, and then the second program will
> block forever, because the amount of data it wants is never available?

fread() and fwrite() do not return until they have, respectively, read or
written the number of bytes implied by their arguments, unless there is an
error or EOF is reached (the latter for fread() only).

fread() and fwrite() internally call read() and write() respectively.
Ignoring signals, write() behaves like fwrite(). But read() has different
semantics: it returns as soon as data is available.

If you write a little bit at a time, fwrite() will probably copy the data in
each call to the stdio buffer, calling write() to drain that buffer when it
is full.

At the other end of the pipe, fread() will loop calling read(), which will
block if no data is available at the time of the call, until enough has been
read (assuming EOF is not reached).
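
In other words, the reading side behaves roughly like the loop below. This is only a sketch of the idea, not the code of any particular C library, and read_full is a made-up name. A real fread() also goes through the stdio buffer.

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling read() until `count` bytes have arrived, EOF is seen, or a
 * real error occurs. Returns the number of bytes read (possibly fewer than
 * `count` at EOF), or -1 if an error occurred before anything was read. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t got = 0;

    while (got < count) {
        ssize_t n = read(fd, p + got, count - got);

        if (n > 0)
            got += (size_t)n;        /* short read: loop for the rest */
        else if (n == 0)
            break;                   /* EOF: the writer closed the pipe */
        else if (errno == EINTR)
            continue;                /* interrupted by a signal: retry */
        else
            return got > 0 ? (ssize_t)got : -1;
    }
    return (ssize_t)got;
}
```

Each read() only waits until some data is available, so the writer never has to fit the whole chunk into the pipe at once.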

> Conversely, what if the first program writes in large chunks... can it
> write in chunks bigger than the buffer size?

Yes; fwrite() will block in write() if necessary.

Alex


Reply Alex 2/4/2006 11:30:31 AM

On 2006-02-04, jh <no@thanks.com> wrote:
> I'm a bit confused about this... hoping somebody can help.
>
> So I have one program writing to stdout and another reading on stdin 
> (using fwrite() and fread()); I run them from the command line piping 
> the output of one to the input of the other. Experimentation reveals 
> that if the first program writes faster than the second reads, then 
> there is a fairly small buffer which, after it fills up, the first 
> program blocks on writing until the second has a chance to catch up. 
> Likewise, if the reader is faster, it blocks while the pipe is empty. 
> That's all logical enough. Question: How big is this buffer by default? 
> Can I change it?
>
> Also (this is the part I'm really interested in), what happens if the 
> first program writes a little bit at a time, but the second one reads a 
> large chunk at a time, and it tries to read chunks significantly larger 
> than the size of that buffer (say, 2-3 times larger)? Will the first 
> program block when it fills the buffer, and then the second program will 
> block forever, because the amount of data it wants is never available? 

No, it will read what it can. It may or may not then block on a second
attempt to read, but by then the buffer is empty and the writer can
write more.

> Conversely, what if the first program writes in large chunks... can it 
> write in chunks bigger than the buffer size?

No. The write will (I believe) succeed in writing what it can. It may
then block on its next attempt to write, until the buffer is drained.
> PS: I hope I don't offend anyone by not posting a real email address... 
> I get enough spam as it is. What's the etiquette on that? Most people 
> seem to include their email... is it considered impolite not to?

You should really use an address ending in .invalid for that purpose.
While posting a fake address is technically against the rules, no-one
will really care unless you're a troll using it to hide your identity.
Reply Jordan 2/4/2006 5:50:09 PM

On 04/02/2006, Alex Fraser wrote:

> Which buffer? There are three: the output stdio buffer in the process
> with the write end of the pipe, the pipe buffer itself, and the input
> stdio buffer in the process with the read end of the pipe.

I've seen some apps (e.g. the ALSA aplay utility) which use read() and
write() instead of fread() and fwrite() to talk to stdin and stdout,
getting the file descriptor from fileno(stdin) or fileno(stdout).

I can see the rationale for this: it gives more fine-grained control
over the I/O, and presumably the file descriptors can be made
non-blocking.

Does anyone think this is a particularly good/bad idea?

-- 
Simon Elliott    http://www.ctsn.co.uk
Reply Simon 2/4/2006 6:11:02 PM

"Simon Elliott" <Simon at ctsn.co.uk> wrote in message
news:43e4ee36$0$1170$bed64819@news.gradwell.net...
> I've seen some apps (eg the ALSA aplay utility) which use read() and
> write() instead of fread() and fwrite(), to talk to stdin and stdout,
> getting the file descriptor from fileno(stdin) or fileno(stdout).
>
> I can see the rationale for this: it gives more fine-grained control
> over the I/O, and presumably the file descriptors can be made non
> blocking.
>
> Does anyone think this is a particularly good/bad idea?

IMO: if you use POSIX I/O functions because the standard C I/O ("stdio")
functions can't do the job, then it is obviously a good idea. Otherwise it
is a bad idea.

There are definitely cases where the stdio functions can't do the job. If
you want to multiplex I/O using select() or poll(), including stdin/out/err,
then the underlying descriptors must be non-blocking for robustness, and the
stdio functions require blocking descriptors. If you are handling signals,
the interaction with stdio functions is unspecified, whereas the interaction
with POSIX functions is specified.

You might want to use SIGALRM or select()/poll() to implement I/O with
timeouts. This rules out stdio.

Alex


Reply Alex 2/5/2006 2:13:27 PM

"Jordan Abel" <random832@gmail.com> wrote in message
news:slrndu9qgl.lc9.random832@random.yi.org...
> On 2006-02-04, jh <no@thanks.com> wrote:
> > So I have one program writing to stdout and another reading on stdin
> > (using fwrite() and fread()); I run them from the command line piping
> > the output of one to the input of the other.
[snip]
> > Also (this is the part I'm really interested in), what happens if the
> > first program writes a little bit at a time, but the second one reads a
> > large chunk at a time, and it tries to read chunks significantly larger
> > than the size of that buffer (say, 2-3 times larger)? Will the first
> > program block when it fills the buffer, and then the second program
> > will block forever, because the amount of data it wants is never
> > available?
>
> No, it will read what it can.

This is basically true for read(), but not fread().

> It may or may not then block on a second attempt to read, but by then the
> buffer is empty and the writer can write more.

For read() on a blocking descriptor, the "may or may not" is determined by
whether or not there is (more) data available. That is, ignoring EOF and
signals, read() blocks if no bytes are available, else it returns whatever
data it can (normally the lesser of the number of bytes available and the
specified size, but theoretically anything from one byte up to that amount).
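
A small sketch of that rule, using a made-up helper: it puts a known number of bytes into a fresh pipe and then asks read() for a different amount, returning whatever read() reports.

```c
#include <unistd.h>

/* Put `available` bytes into a fresh pipe, then ask read() for `wanted`.
 * Returns what read() reported, or -1 if the setup failed. */
static ssize_t short_read_demo(size_t available, size_t wanted)
{
    char out[256] = {0};
    char in[256];
    int fds[2];
    ssize_t n;

    /* Refuse an empty pipe: read() would block forever with no writer. */
    if (available == 0 || available > sizeof out || wanted > sizeof in)
        return -1;
    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], out, available) != (ssize_t)available) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Returns as soon as some data is available, even if less than wanted. */
    n = read(fds[0], in, wanted);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

On the systems I know of, short_read_demo(3, 100) gives 3 and short_read_demo(100, 10) gives 10, i.e. the lesser of the two numbers.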

> > Conversely, what if the first program writes in large chunks... can it
> > write in chunks bigger than the buffer size?
>
> No. The write will (I believe) succeed in writing what it can.

On a blocking descriptor, write() will write as many bytes as requested
(blocking if necessary), unless there is an error or a signal causes it to
return early.
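
This can be tried with a sketch like the one below, where big_write_demo is a made-up name. The parent issues a single write() much larger than any likely pipe buffer while a forked child drains the other end, and the function returns what write() reported. It assumes no signal arrives during the write.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write `total` bytes to a pipe with one write() call while a child
 * process drains it. Returns what write() reported, or -1 if the setup
 * failed. */
static ssize_t big_write_demo(size_t total)
{
    int fds[2];
    pid_t pid;
    char *data;
    ssize_t n;

    data = calloc(total, 1);
    if (data == NULL)
        return -1;
    if (pipe(fds) != 0) {
        free(data);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        free(data);
        return -1;
    }

    if (pid == 0) {
        /* Child: the reader. Drain the pipe until EOF, then exit. */
        char sink[4096];

        close(fds[1]);               /* otherwise EOF would never arrive */
        while (read(fds[0], sink, sizeof sink) > 0)
            ;
        _exit(0);
    }

    /* Parent: the writer. One call, blocking as often as it needs to. */
    close(fds[0]);
    n = write(fds[1], data, total);
    close(fds[1]);                   /* lets the child see EOF */
    waitpid(pid, NULL, 0);
    free(data);
    return n;
}
```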

Alex


Reply Alex 2/5/2006 2:13:32 PM

Simon> I've seen some apps (eg the ALSA aplay utility) which use read()
Simon> and write() instead of fread() and fwrite(), to talk to stdin and
Simon> stdout, getting the file descriptor from fileno(stdin) or
Simon> fileno(stdout).

Isn't fileno(stdin) --- resp. fileno(stdout) --- just a fancy way to
write 0, resp. 1?

Simon> I can see the rationale for this: it gives more fine-grained
Simon> control over the I/O, and presumably the file descriptors can be
Simon> made non blocking.  Does anyone think this is a particularly
Simon> good/bad idea?

What is a bad idea is not using read or write alone, but _mixing_ read
and write with fread, fwrite and the rest of buffered I/O.

-- 
A true pessimist won't be discouraged by a little success.
Reply Ian 2/5/2006 4:11:22 PM

On 2006-02-05, Alex Fraser <me@privacy.net> wrote:
> "Jordan Abel" <random832@gmail.com> wrote in message
> news:slrndu9qgl.lc9.random832@random.yi.org...
>> On 2006-02-04, jh <no@thanks.com> wrote:
>> > So I have one program writing to stdout and another reading on stdin
>> > (using fwrite() and fread()); I run them from the command line piping
>> > the output of one to the input of the other.
> [snip]
>> > Also (this is the part I'm really interested in), what happens if the
>> > first program writes a little bit at a time, but the second one reads a
>> > large chunk at a time, and it tries to read chunks significantly larger
>> > than the size of that buffer (say, 2-3 times larger)? Will the first
>> > program block when it fills the buffer, and then the second program
>> > will block forever, because the amount of data it wants is never
>> > available?
>>
>> No, it will read what it can.
>
> This is basically true for read(), but not fread().

There is no such thing as fread(). I was talking in terms of the actual
system calls inevitably made by the program, since for these purposes it
doesn't matter what language they're actually in. In the case of
fread(), the "second attempt to read" is made in a loop within fread.

>> It may or may not then block on a second attempt to read, but by then the
>> buffer is empty and the writer can write more.
>
> For read() on a blocking descriptor, the "may or may not" is determined by
> whether or not there is (more) data available.

I was referring to after it drains a full buffer. [therefore, there's no
data left until the writer puts in more]

>> > Conversely, what if the first program writes in large chunks... can it
>> > write in chunks bigger than the buffer size?
>>
>> No. The write will (I believe) succeed in writing what it can.
>
> On a blocking descriptor, write() will write as many bytes as requested
> (blocking if necessary), unless there is an error or a signal causes it to
> return early.

Are you sure? It can't time out?

Now, if write is blocking, at least the bytes that fit in the buffer will
then be available to the reader, right?
Reply Jordan 2/5/2006 8:21:43 PM

"Jordan Abel" <random832@gmail.com> wrote in message
news:slrnducnp0.qjm.random832@random.yi.org...
> On 2006-02-05, Alex Fraser <me@privacy.net> wrote:
> > "Jordan Abel" <random832@gmail.com> wrote in message
> > news:slrndu9qgl.lc9.random832@random.yi.org...
> >> On 2006-02-04, jh <no@thanks.com> wrote:
> >> > So I have one program writing to stdout and another reading on stdin
> >> > (using fwrite() and fread()); I run them from the command line
> >> > piping the output of one to the input of the other.
> > [snip]
> >> > [if the first program writes a little bit at a time, but the second
> >> > one reads large chunks, then] the second program will block forever,
> >> > because the amount of data it wants is never available?
> >>
> >> No, it will read what it can.
> >
> > This is basically true for read(), but not fread().
>
> There is no such thing as fread().

Are you sure?

> I was talking in terms of the actual system calls inevitably made by the
> program,

But given that the OP only mentioned fread() and fwrite(), you didn't think
that fact was worth mentioning? (This was really my point.)

[snip]
> > On a blocking descriptor, write() will write as many bytes as requested
> > (blocking if necessary), unless there is an error or a signal causes it
> > to return early.
>
> Are you sure? It can't time out?

Not usually, but a timeout would constitute an error.

Alex


Reply Alex 2/5/2006 10:54:11 PM

Ian Zimmerman <nobrowser@gmail.com> wrote, on Sun, 05 Feb 2006:

> Isn't fileno(stdin) --- resp. fileno(stdout) --- just a fancy way to
> write 0, resp. 1?

They start out with those values, but there are ways they can get
changed.  E.g.:

	close(0);
	freopen(somefile, "w", stdout);

Now fileno(stdout) is 0.

-- 
Geoff Clare <netnews@gclare.org.uk>

Reply Geoff 2/7/2006 6:12:15 PM
