parallel vs. serial disk access

Hi NG,

Last week I received some very useful answers to my question about parallel 
flows when sending a file over a network, so I think it's a good idea to ask 
you again this time... (Just wanted to say thanks again :)
The background is still that I'm creating a solution for network file 
transfers. Of course, it's been done a thousand times before - but that 
doesn't mean it can't be done any better, does it?

The problem is: when a single client accesses the system, the whole system 
is "dedicated" to that client. Disk I/O, read-ahead and network transfer 
work perfectly together. But as the number of clients increases, the system 
seems to be brought to its knees a little too fast, in my opinion.
To give an example: the disk of my server can read a single uncached file at 
about 50 MB/s. Reading two uncached files (multithreaded) runs at 8 MB/s 
each, and accessing three files simultaneously runs at 4 MB/s per thread - 
so we get an aggregate speed of only 12 MB/s for three clients. (I tried 
reading with different chunk sizes, but that didn't change the result.)
I guess the explanation for this heavy performance loss is the I/O 
scheduler, which tries to strike a good balance between transfer rates and 
response times, and so ends up continuously jumping back and forth over the 
disk to read chunks of every file, adding a lot of seek time to the I/O 
operations.
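To make the "one file front to back" baseline concrete, here is a minimal sketch of a sequential chunked read, with a hint to the kernel that access will be sequential so read-ahead stays effective. The 2 MB chunk size and the file name are just illustrative; `os.posix_fadvise` is Linux-specific, so the call is guarded:

```python
import os

CHUNK = 2 * 1024 * 1024  # 2 MB chunks, as discussed below

def read_sequential(path):
    """Read one file front to back in large chunks; return bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Hint the kernel that we will read sequentially (Linux only).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            buf = os.read(fd, CHUNK)
            if not buf:
                break
            total += len(buf)
    finally:
        os.close(fd)
    return total
```

With a single reader like this, the disk head never has to jump between files, which is exactly the behavior the 50 MB/s single-client number reflects.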

My suggestion now would be simply not to read multithreaded. I mean, what is 
it good for, when it obviously tends to slow things down so extremely? I 
think that in situations where files are being read sequentially, the best 
solution would be to read the files in chunks of one or two MB (a trade-off 
between RAM usage, continuous disk access and responsiveness) and switch to 
the next file after each chunk has been read. This should reduce seek times.
But on the other hand, this would mean that a client which requests a file 
of only 1 KB has to wait until each of the other clients has got 2 MB. That 
would be unfair (at least from a user's point of view ;). So my idea here 
was to use multiple queues for the read requests, with assignment based on 
the size of a read request. The server application would then switch 
rotationally between the queues, and when there are several big requests in 
one queue and a small request in the small-request queue, the small request 
would be handled soon and wouldn't have to wait for the big requests to 
finish. This wouldn't be completely fair, but I think it would be a really 
good trade-off.
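The queue idea above can be sketched in a few lines. This simulates the scheduling order only (a real server would issue the actual reads where the `yield` is); the 64 KB "small" threshold and all names are my own illustrative assumptions, not part of any existing API:

```python
from collections import deque

CHUNK = 2 * 1024 * 1024       # one 2 MB chunk per turn for big requests
SMALL_LIMIT = 64 * 1024       # assumed threshold for the small-request queue

class ChunkedReader:
    def __init__(self):
        self.small = deque()  # small requests, served whole
        self.big = deque()    # big requests, served one chunk per turn

    def submit(self, name, size):
        q = self.small if size <= SMALL_LIMIT else self.big
        q.append([name, size])  # mutable: remaining bytes tracked in place

    def run(self):
        """Yield (name, bytes_read) events in service order."""
        turn = 0
        while self.small or self.big:
            # Rotate between the two queues; fall back to whichever
            # queue is non-empty when the other has run dry.
            q = (self.small, self.big)[turn % 2] or (self.small or self.big)
            turn += 1
            req = q.popleft()
            name, remaining = req
            n = min(CHUNK, remaining)
            yield name, n          # a real server would read n bytes here
            req[1] -= n
            if req[1] > 0:
                q.append(req)      # unfinished big request rejoins the back
```

For example, submitting two 6 MB requests and then a 1 KB request, the 1 KB request is served on the very first turn instead of waiting behind 12 MB of big reads, while the big requests still alternate chunk by chunk.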

So, let's finally come to my question, which is rather simple after all 
these pre-thoughts: it all looks too easy. Did I forget something? I mean, 
would there really be no advantage in parallel I/O, except that the I/O 
scheduler would take care of the trade-off and I would save the little time 
it takes to implement the queued reading?

Thanks in advance

7/12/2005 9:47:05 AM
comp.linux.development.system
