bundling efficiency, too good to be true?


Everyone has seen that sending one big file works much more
efficiently than many small files.  The effect is quite astounding, many
orders of magnitude. It just occurred to me that I don't think I can
account for that huge difference.  Where is all the time going?

It then occurred to me that any sort of technique to reduce the
difference could have a huge effect on the Internet as a whole.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com

If you tell a computer the same fact in more than one place, unless you have an automated mechanism to ensure they stay in sync, the versions of the fact will eventually get out of sync.
Reply Roedy 3/31/2010 9:23:04 PM

On 31-03-2010 17:23, Roedy Green wrote:
> Everyone has seen that sending one big file works much more
> efficiently than many small files.  The effect is quite astounding, many
> orders of magnitude. It just occurred to me that I don't think I can
> account for that huge difference.  Where is all the time going?

Given the lack of context, one can only guess:
- file open and file creation are rather expensive operations,
   so many small files have huge overhead
- per-file protocol overhead
- really small files cannot be compressed as efficiently
   as larger files
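
The compression point is easy to check with a small sketch. The
workload below is made up (200 tiny "files" with similar content) and
zlib stands in for whatever compressor the transfer actually uses, so
treat the numbers as illustrative only:

```python
import zlib

# Hypothetical workload: 200 small "files" with similar content.
small_files = [
    f"id={i};status=ok;message=hello from record {i}\n".encode() * 4
    for i in range(200)
]

# Compress each file on its own, as a per-file transfer would.
separate_total = sum(len(zlib.compress(data)) for data in small_files)

# Compress the concatenation, as a bundled transfer would.
bundled_total = len(zlib.compress(b"".join(small_files)))

raw_total = sum(len(data) for data in small_files)
print("raw:", raw_total, "separate:", separate_total, "bundled:", bundled_total)
```

Each separately compressed file pays its own header and checksum and
cannot reuse patterns seen in the other files, so the bundle comes out
smaller.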

Arne
Reply ISO 3/31/2010 10:38:15 PM


On Wed, 31 Mar 2010, Roedy Green wrote:

> Everyone has seen that sending one big file works much more efficiently 
> than many small files.  The effect is quite astounding, many orders of 
> magnitude. It just occurred to me that I don't think I can account for 
> that huge difference.  Where is all the time going?

TCP handshake, TCP slow start (look that one up if you don't know it), 
roundtrips for control packets at the start of the connection. Losing a 
bit of time can have a huge impact on throughput - it's all about the 
bandwidth-delay product, which on today's long, fat networks is huge.
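
A back-of-the-envelope sketch of that argument, in Python. It is a
deliberately crude model, and every number in it is an assumption: one
round trip for the handshake, an initial congestion window of 3
segments that doubles each round trip, no packet loss, a 100 ms round
trip time and a 100 Mbit/s link. The name fetch_time is made up for
the illustration.

```python
import math

def fetch_time(size_bytes, rtt, bandwidth_bps, mss=1460, init_cwnd=3):
    """Rough time in seconds to fetch one object over a fresh TCP connection."""
    remaining = math.ceil(size_bytes / mss)   # segments still to send
    cwnd = init_cwnd                          # congestion window, in segments
    total = rtt                               # handshake round trip
    while remaining > 0:
        burst = min(cwnd, remaining)
        # A round costs at least one round trip; once the window is large
        # enough, the link speed becomes the limit instead.
        total += max(rtt, burst * mss * 8 / bandwidth_bps)
        remaining -= burst
        cwnd *= 2                             # slow start: double per round
    return total

RTT = 0.1            # assumed 100 ms round trip
BANDWIDTH = 100e6    # assumed 100 Mbit/s link

# 1000 files of 10 kB fetched one after another, each on a new connection,
# versus the same 10 MB as a single file.
many_small = 1000 * fetch_time(10_000, RTT, BANDWIDTH)
one_big = fetch_time(10_000_000, RTT, BANDWIDTH)
print(f"many small: {many_small:.1f} s, one big: {one_big:.2f} s")
```

Under these assumptions each small file never gets out of slow start,
so almost all of its time is spent waiting on round trips rather than
moving bytes.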

> It then occurred to me, that any sort of technique to reduce the 
> difference could have a huge effect on the Internet as a whole.

Yes. It's called pipelining, and it's been in HTTP since 1999.

Although it's not that widely used by browsers, because of worries about 
compatibility with servers, which seems a bit of a waste.
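
For anyone who has not seen pipelining on the wire, here is a minimal
sketch using only the Python standard library. It starts a throwaway
local server, writes three GET requests on one connection before
reading anything, and then reads the responses back in order. The
handler, the paths and the helper name pipelined_get are all made up
for the demo; a real client would also have to parse the responses
properly rather than just reading to end of stream.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # keep the connection open between requests

    def do_GET(self):
        body = f"you asked for {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass

def pipelined_get(host, port, paths):
    """Send every request before reading any response, on one connection."""
    requests = b""
    for i, path in enumerate(paths):
        last = i == len(paths) - 1
        requests += (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: {'close' if last else 'keep-alive'}\r\n\r\n"
        ).encode()
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(requests)
        chunks = []
        while True:                 # the server closes after the last response
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
raw = pipelined_get("127.0.0.1", server.server_port, ["/a", "/b", "/c"])
server.shutdown()
server.server_close()
print(raw.decode())
```

The three requests cost one connection setup and roughly one round
trip of waiting, instead of three of each.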

tom

-- 
I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth. -- Umberto Eco
Reply Tom 3/31/2010 10:51:27 PM

Tom Anderson wrote:
> On Wed, 31 Mar 2010, Roedy Green wrote:
> 
>> Everyone has seen that sending one big file works much more 
>> efficiently than many small files.  The effect is quite astounding, many 
>> orders of magnitude. It just occurred to me that I don't think I can 
>> account for that huge difference.  Where is all the time going?
> 
> TCP handshake, TCP slow start (look that one up if you don't know it), 
> roundtrips for control packets at the start of the connection. Losing a 
> bit of time can have a huge impact on throughput - it's all about the 
> bandwidth-delay product, which on today's long, fat networks is huge.

"back in the day" some serial protocols had asynchronous
ACK and resend on packets.

    BugBear
Reply bugbear 4/1/2010 7:46:34 AM

In article <alpine.DEB.1.10.1003312344430.13579@urchin.earth.li>,
 Tom Anderson <twic@urchin.earth.li> wrote:

> On Wed, 31 Mar 2010, Roedy Green wrote:
> 
> > Everyone has seen that sending one big file works much more efficiently 
> > than many small files.  The effect is quite astounding, many orders of 
> > magnitude. It just occurred to me that I don't think I can account for 
> > that huge difference.  Where is all the time going?
> 
> TCP handshake, TCP slow start (look that one up if you don't know it), 
> roundtrips for control packets at the start of the connection. Losing a 
> bit of time can have a huge impact on throughput - it's all about the 
> bandwidth-delay product, which on today's long, fat networks is huge.
> 
> > It then occurred to me, that any sort of technique to reduce the 
> > difference could have a huge effect on the Internet as a whole.
> 
> Yes. It's called pipelining, and it's been in HTTP since 1999.
> 
> Although it's not that widely used by browsers, because of worries about 
> compatibility with servers, which seems a bit of a waste.
> 
> tom

Browsers don't support pipelining because the multiplexer/demultiplexer 
is too complicated for the average software engineer.  Out-of-order 
response processing requires forcing preceding responses in the pipeline 
into memory.  That's tricky, but not too bad.  Now do that and rebuild 
the pipeline when the connection closes or drops.  Ugly!
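
The bookkeeping itself can be sketched in a few lines of Python; the
hard part is everything this leaves out (locking, partial responses,
timeouts). The class and method names are invented for the
illustration and are not taken from any real client.

```python
from collections import deque

class Pipeline:
    """Tracks requests sent on one connection; responses arrive in send order."""

    def __init__(self):
        self.in_flight = deque()

    def send(self, request):
        # Record the request; actually writing it to the socket is omitted.
        self.in_flight.append(request)

    def on_response(self, response):
        # HTTP/1.1 pipelining has no request IDs: the next response
        # always belongs to the oldest unanswered request.
        return self.in_flight.popleft(), response

    def on_connection_lost(self):
        # Everything still unanswered has to be replayed, in order,
        # on a fresh connection.
        pending = list(self.in_flight)
        self.in_flight.clear()
        return pending

pipe = Pipeline()
for req in ["GET /a", "GET /b", "GET /c"]:
    pipe.send(req)

matched = pipe.on_response("200 for a")
to_retry = pipe.on_connection_lost()
print(matched, to_retry)
```

Replaying is only safe for idempotent requests, which is why RFC 2616
says clients should not pipeline non-idempotent methods.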

At least in the old Innovation HTTPClient, that results in multiple lock 
grabs on components of a linked list, which is prone to failure.  The code 
is convoluted and it's looking like a total rewrite might be easier.

Last I heard, Microsoft and Apache clients can't pipeline; WebKit can 
but it's an experimental feature.
-- 
I won't see Google Groups replies because I must filter them as spam
Reply Kevin 4/1/2010 7:47:54 AM

Kevin McMurtrie wrote:
> In article <alpine.DEB.1.10.1003312344430.13579@urchin.earth.li>,
>  Tom Anderson <twic@urchin.earth.li> wrote:
> 
>> On Wed, 31 Mar 2010, Roedy Green wrote:
>>
>>> Everyone has seen that sending one big file works much more efficiently 
>>> than many small files.  The effect is quite astounding, many orders of 
>>> magnitude. It just occurred to me that I don't think I can account for 
>>> that huge difference.  Where is all the time going?
>> TCP handshake, TCP slow start (look that one up if you don't know it), 
>> roundtrips for control packets at the start of the connection. Losing a 
>> bit of time can have a huge impact on throughput - it's all about the 
>> bandwidth-delay product, which on today's long, fat networks is huge.
>>
>>> It then occurred to me, that any sort of technique to reduce the 
>>> difference could have a huge effect on the Internet as a whole.
>> Yes. It's called pipelining, and it's been in HTTP since 1999.
>>
>> Although it's not that widely used by browsers, because of worries about 
>> compatibility with servers, which seems a bit of a waste.
>>
>> tom
> 
> Browsers don't support pipelining because the multiplexer/demultiplexer 
> is too complicated for the average software engineer.
[ SNIP ]
AHS

Simple solution - don't use average software engineers to work on this 
code. There are how many significant web browsers again? Surely it's not 
so tough to ensure that the 2 or 3 people per important browser, maybe a 
dozen people for the planet overall, are above-average.

Average software engineers (programmers in countries where you can't 
legally call a programmer an engineer, or in the US where it's usually a 
misnomer) can't reliably code chunked transfer encoding either, yet 
significant web browsers have that feature.

If 10% of all programmers are competent to work on problems like this 
(that might be generous, it might be 3 or 5 percent) then you've still 
got hundreds of thousands of people that can do it.

AHS
Reply Arved 4/1/2010 9:24:31 AM

On Thu, 1 Apr 2010, Kevin McMurtrie wrote:

> In article <alpine.DEB.1.10.1003312344430.13579@urchin.earth.li>,
> Tom Anderson <twic@urchin.earth.li> wrote:
>
>> On Wed, 31 Mar 2010, Roedy Green wrote:
>>
>>> Everyone has seen that sending one big file works much more efficiently
>>> than many small files.  The effect is quite astounding, many orders of
>>> magnitude. It just occurred to me that I don't think I can account for
>>> that huge difference.  Where is all the time going?
>>
>> TCP handshake, TCP slow start (look that one up if you don't know it),
>> roundtrips for control packets at the start of the connection. Losing a
>> bit of time can have a huge impact on throughput - it's all about the
>> bandwidth-delay product, which on today's long, fat networks is huge.
>>
>>> It then occurred to me, that any sort of technique to reduce the
>>> difference could have a huge effect on the Internet as a whole.
>>
>> Yes. It's called pipelining, and it's been in HTTP since 1999.
>>
>> Although it's not that widely used by browsers, because of worries about
>> compatibility with servers, which seems a bit of a waste.
>
> Browsers don't support pipelining because the multiplexer/demultiplexer
> is too complicated for the average software engineer.

Opera - uses pipelining
Firefox - supports pipelining, not turned on by default
Konqueror - supports pipelining, not turned on by default
IE - does not support pipelining
Safari - does not support pipelining

I guess Opera, Mozilla and KDE all have above-average engineers.

> Last I heard, Microsoft and Apache clients can't pipeline;

No surprise about MS. What's the Apache client? I'm not aware of a browser 
made by Apache - do you mean some non-browser client? HttpClient, perhaps?

tom

-- 
The question of whether computers can think is just like the question
of whether submarines can swim. -- Edsger W. Dijkstra
Reply Tom 4/1/2010 3:02:05 PM

