bundling efficiency, too good to be true?

Everyone has seen that sending one big file works much more
efficiently than many small files.  The effect is quite astounding, many
orders of magnitude. It just occurred to me that I don't think I can
account for that huge difference.  Where is all the time going?

It then occurred to me that any sort of technique to reduce the
difference could have a huge effect on the Internet as a whole.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com

If you tell a computer the same fact in more than one place, unless you have an automated mechanism to ensure they stay in sync, the versions of the fact will eventually get out of sync.
Roedy
3/31/2010 9:23:04 PM
comp.lang.java.programmer

On 31-03-2010 17:23, Roedy Green wrote:
> Everyone has seen that sending one big file works much more
> efficiently than many small files.  The effect is quite astounding, many
> orders of magnitude. It just occurred to me that I don't think I can
> account for that huge difference.  Where is all the time going?

Given the lack of context, one can only guess:
- file open and file creation are rather expensive operations
   so many small files have huge overhead
- per file protocol overhead
- really small files cannot be compressed as efficiently
   as larger files
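
The compression point in the list above can be checked with a minimal sketch. It compresses a set of made-up small records one by one and then as a single concatenated bundle, using `java.util.zip.Deflater`. The record contents, the host name and the class name `BundleCompressionDemo` are illustrative assumptions, not anything from the original post.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Sketch: compare DEFLATE output size for many small "files" compressed
// separately against the same bytes compressed as one bundle.
// The file contents are invented purely for illustration.
final class BundleCompressionDemo {

    // Returns the number of bytes DEFLATE produces for the given input.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Hypothetical small file: a short text record that differs only by index.
    static byte[] smallFile(int index) {
        String text = "record=" + index + ";status=ok;host=example.invalid\n";
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Total compressed size when every file is compressed on its own.
    static int separateTotal(int fileCount) {
        int total = 0;
        for (int i = 0; i < fileCount; i++) {
            total += deflatedSize(smallFile(i));
        }
        return total;
    }

    // Compressed size when all files are concatenated before compression.
    static int bundledTotal(int fileCount) {
        ByteArrayOutputStream bundle = new ByteArrayOutputStream();
        for (int i = 0; i < fileCount; i++) {
            byte[] file = smallFile(i);
            bundle.write(file, 0, file.length);
        }
        return deflatedSize(bundle.toByteArray());
    }

    public static void main(String[] args) {
        int n = 200;
        System.out.println("separate: " + separateTotal(n) + " bytes");
        System.out.println("bundled:  " + bundledTotal(n) + " bytes");
    }
}
```

Each separately compressed record pays the stream header and checksum again and cannot reuse anything seen in earlier records, which is why the bundled total comes out smaller for repetitive data like this.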

Arne
Arne
3/31/2010 10:38:15 PM
On Wed, 31 Mar 2010, Roedy Green wrote:

> Everyone has seen that sending one big file works much more efficiently 
> than many small files.  The effect is quite astounding, many orders of
> magnitude. It just occurred to me that I don't think I can account for 
> that huge difference.  Where is all the time going?

TCP handshake, TCP slow start (look that one up if you don't know it), 
roundtrips for control packets at the start of the connection. Losing a 
bit of time can have a huge impact on throughput - it's all about the 
bandwidth-delay product, which on today's long, fat networks is huge.
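
A back-of-envelope model makes the round-trip argument above concrete. This is a sketch of idealised slow start, not a TCP implementation: it assumes no loss, no receive-window cap, a congestion window that doubles every round trip, and one extra round trip for the handshake. The segment size, initial window, link speed and RTT used below are illustrative assumptions.

```java
// Idealised model of connection setup plus slow start.
// All numeric parameters are assumptions chosen for illustration.
final class SlowStartModel {

    // Bandwidth-delay product in bytes: the amount of data that must be
    // in flight to keep a link of this speed and round-trip time full.
    static long bandwidthDelayProductBytes(long bitsPerSecond, long rttMillis) {
        return bitsPerSecond / 8 * rttMillis / 1000;
    }

    // Round trips needed to deliver `bytes` on a fresh connection:
    // one for the handshake, then one per congestion window, where the
    // window starts at `initialSegments` and doubles each round trip.
    static int roundTrips(long bytes, int segmentSize, int initialSegments) {
        long segmentsLeft = (bytes + segmentSize - 1) / segmentSize;
        long window = initialSegments;
        int trips = 1; // SYN / SYN-ACK
        while (segmentsLeft > 0) {
            segmentsLeft -= window;
            window *= 2;
            trips++;
        }
        return trips;
    }

    public static void main(String[] args) {
        int segment = 1460;
        int initialWindow = 3;
        int oneBigFile = roundTrips(1_000_000, segment, initialWindow);
        int hundredSmall = 100 * roundTrips(10_000, segment, initialWindow);
        System.out.println("1 x 1,000,000 bytes: " + oneBigFile + " round trips");
        System.out.println("100 x 10,000 bytes:  " + hundredSmall + " round trips");
        System.out.println("BDP at 100 Mbit/s, 100 ms: "
                + bandwidthDelayProductBytes(100_000_000L, 100) + " bytes");
    }
}
```

Under these assumptions the same megabyte costs 9 round trips as one file and 300 round trips as a hundred files fetched one after another on fresh connections, so at a 100 ms RTT the difference is roughly 1 second against 30.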

> It then occurred to me that any sort of technique to reduce the
> difference could have a huge effect on the Internet as a whole.

Yes. It's called pipelining, and it's been in HTTP since 1999.

Although it's not that widely used by browsers, because of worries about 
compatibility with servers, which seems a bit of a waste.
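
For readers who have not seen pipelining on the wire, here is a minimal sketch of what the client side sends: several HTTP/1.1 GET requests written back to back on one persistent connection, without waiting for each response. The host `example.invalid`, the paths and the class name `PipelinedRequests` are placeholders. The sketch only builds the bytes and does not open a socket.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch: build the byte sequence a pipelining client would write to a
// single persistent HTTP/1.1 connection. Host and paths are placeholders.
final class PipelinedRequests {

    // One GET request per path, concatenated with no gaps between them.
    static byte[] build(String host, List<String> paths) {
        StringBuilder sb = new StringBuilder();
        for (String path : paths) {
            sb.append("GET ").append(path).append(" HTTP/1.1\r\n")
              .append("Host: ").append(host).append("\r\n")
              .append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] wire = build("example.invalid", Arrays.asList("/a.css", "/b.js", "/c.png"));
        System.out.print(new String(wire, StandardCharsets.US_ASCII));
    }
}
```

Writing the whole buffer at once means the three requests share one round trip of latency instead of paying one each, which is the saving pipelining is after.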

tom

-- 
I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth. -- Umberto Eco
Tom
3/31/2010 10:51:27 PM
Tom Anderson wrote:
> On Wed, 31 Mar 2010, Roedy Green wrote:
> 
>> Everyone has seen that sending one big file works much more 
>> efficiently than many small files.  The effect is quite astounding, many
>> orders of magnitude. It just occurred to me that I don't think I can 
>> account for that huge difference.  Where is all the time going?
> 
> TCP handshake, TCP slow start (look that one up if you don't know it), 
> roundtrips for control packets at the start of the connection. Losing a 
> bit of time can have a huge impact on throughput - it's all about the 
> bandwidth-delay product, which on today's long, fat networks is huge.

"Back in the day", some serial protocols had asynchronous
ACK and resend on packets.

    BugBear
bugbear
4/1/2010 7:46:34 AM
In article <alpine.DEB.1.10.1003312344430.13579@urchin.earth.li>,
 Tom Anderson <twic@urchin.earth.li> wrote:

> On Wed, 31 Mar 2010, Roedy Green wrote:
> 
> > Everyone has seen that sending one big file works much more efficiently 
> > than many small files.  The effect is quite astounding, many orders of
> > magnitude. It just occurred to me that I don't think I can account for 
> > that huge difference.  Where is all the time going?
> 
> TCP handshake, TCP slow start (look that one up if you don't know it), 
> roundtrips for control packets at the start of the connection. Losing a 
> bit of time can have a huge impact on throughput - it's all about the 
> bandwidth-delay product, which on today's long, fat networks is huge.
> 
> > It then occurred to me that any sort of technique to reduce the
> > difference could have a huge effect on the Internet as a whole.
> 
> Yes. It's called pipelining, and it's been in HTTP since 1999.
> 
> Although it's not that widely used by browsers, because of worries about 
> compatibility with servers, which seems a bit of a waste.
> 
> tom

Browsers don't support pipelining because the multiplexer/demultiplexer 
is too complicated for the average software engineer.  Out-of-order 
response processing requires forcing preceding responses in the pipeline 
into memory.  That's tricky, but not too bad.  Now do that and rebuild 
the pipeline when the connection closes or drops.  Ugly!
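
The bookkeeping described above can be sketched in a few lines. This is not the HTTPClient code mentioned in this post. It is a simplified illustration that assumes responses arrive in request order, as HTTP/1.1 requires of servers on a pipelined connection, and that every queued request is safe to resend. The class name `PipelineTracker` and the string request ids are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: track requests in flight on one pipelined connection and
// recover the unanswered ones when the connection drops.
// Assumes in-order responses and idempotent requests.
final class PipelineTracker {

    private final Deque<String> inFlight = new ArrayDeque<>();

    // Record a request that has just been written to the connection.
    synchronized void sent(String requestId) {
        inFlight.addLast(requestId);
    }

    // The next response belongs to the oldest unanswered request.
    // Returns null if nothing was outstanding.
    synchronized String responseArrived() {
        return inFlight.pollFirst();
    }

    // Connection closed or dropped: hand back everything still
    // unanswered, in order, so the caller can resend it elsewhere.
    synchronized List<String> connectionLost() {
        List<String> retry = new ArrayList<>(inFlight);
        inFlight.clear();
        return retry;
    }

    public static void main(String[] args) {
        PipelineTracker tracker = new PipelineTracker();
        tracker.sent("GET /a");
        tracker.sent("GET /b");
        tracker.sent("GET /c");
        System.out.println("answered: " + tracker.responseArrived());
        System.out.println("to resend: " + tracker.connectionLost());
    }
}
```

A single lock around one queue keeps the sketch simple. A real client also has to buffer partially read responses and decide which requests are safe to repeat, which is where the complexity complained about above comes from.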

At least in the old Innovation HTTPClient, that results in multiple lock 
grabs on components of a linked list, which is prone to failure.  The code is 
convoluted and it's looking like a total rewrite might be easier.

Last I heard, Microsoft and Apache clients can't pipeline; WebKit can 
but it's an experimental feature.
-- 
I won't see Google Groups replies because I must filter them as spam
Kevin
4/1/2010 7:47:54 AM
Kevin McMurtrie wrote:
> In article <alpine.DEB.1.10.1003312344430.13579@urchin.earth.li>,
>  Tom Anderson <twic@urchin.earth.li> wrote:
> 
>> On Wed, 31 Mar 2010, Roedy Green wrote:
>>
>>> Everyone has seen that sending one big file works much more efficiently 
>>> than many small files.  The effect is quite astounding, many orders of
>>> magnitude. It just occurred to me that I don't think I can account for 
>>> that huge difference.  Where is all the time going?
>> TCP handshake, TCP slow start (look that one up if you don't know it), 
>> roundtrips for control packets at the start of the connection. Losing a 
>> bit of time can have a huge impact on throughput - it's all about the 
>> bandwidth-delay product, which on today's long, fat networks is huge.
>>
>>> It then occurred to me that any sort of technique to reduce the
>>> difference could have a huge effect on the Internet as a whole.
>> Yes. It's called pipelining, and it's been in HTTP since 1999.
>>
>> Although it's not that widely used by browsers, because of worries about 
>> compatibility with servers, which seems a bit of a waste.
>>
>> tom
> 
> Browsers don't support pipelining because the multiplexer/demultiplexer 
> is too complicated for the average software engineer.
[ SNIP ]
AHS

Simple solution - don't use average software engineers to work on this 
code. There are how many significant web browsers again? Surely it's not 
so tough to ensure that the 2 or 3 people per important browser, maybe a 
dozen people for the planet overall, are above-average.

Average software engineers (programmers in countries where you can't 
legally call a programmer an engineer, or in the US where it's usually a 
misnomer) can't reliably code chunked transfer encoding either, yet 
significant web browsers have that feature.
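
As a reference point for the comparison above, here is a minimal sketch of decoding an HTTP/1.1 chunked body that is already fully in memory. It ignores chunk extensions and trailers and does no streaming, so it shows the format rather than the hard parts of a production decoder. The class name `ChunkedDecoder` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch: decode a complete chunked-encoded body held in a byte array.
// Chunk extensions are skipped and trailers after the last chunk are ignored.
final class ChunkedDecoder {

    static byte[] decode(byte[] raw) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            // Each chunk starts with its size in hex, ended by CRLF.
            int lineEnd = indexOfCrlf(raw, pos);
            String sizeLine = new String(raw, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size < 0) {
                throw new IllegalArgumentException("negative chunk size");
            }
            pos = lineEnd + 2;
            if (size == 0) {
                break; // last chunk
            }
            if (pos + size + 2 > raw.length) {
                throw new IllegalArgumentException("truncated chunk");
            }
            body.write(raw, pos, size);
            pos += size + 2; // skip the data and the CRLF that follows it
        }
        return body.toByteArray();
    }

    // Index of the next CRLF at or after `from`.
    private static int indexOfCrlf(byte[] raw, int from) {
        for (int i = from; i + 1 < raw.length; i++) {
            if (raw[i] == '\r' && raw[i + 1] == '\n') {
                return i;
            }
        }
        throw new IllegalArgumentException("missing CRLF");
    }

    public static void main(String[] args) {
        byte[] wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(wire), StandardCharsets.US_ASCII));
    }
}
```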

If 10% of all programmers are competent to work on problems like this 
(that might be generous, it might be 3 or 5 percent) then you've still 
got hundreds of thousands of people that can do it.

AHS
Arved
4/1/2010 9:24:31 AM
On Thu, 1 Apr 2010, Kevin McMurtrie wrote:

> In article <alpine.DEB.1.10.1003312344430.13579@urchin.earth.li>,
> Tom Anderson <twic@urchin.earth.li> wrote:
>
>> On Wed, 31 Mar 2010, Roedy Green wrote:
>>
>>> Everyone has seen that sending one big file works much more efficiently
>>> than many small files.  The effect is quite astounding, many orders of
>>> magnitude. It just occurred to me that I don't think I can account for
>>> that huge difference.  Where is all the time going?
>>
>> TCP handshake, TCP slow start (look that one up if you don't know it),
>> roundtrips for control packets at the start of the connection. Losing a
>> bit of time can have a huge impact on throughput - it's all about the
>> bandwidth-delay product, which on today's long, fat networks is huge.
>>
>>> It then occurred to me that any sort of technique to reduce the
>>> difference could have a huge effect on the Internet as a whole.
>>
>> Yes. It's called pipelining, and it's been in HTTP since 1999.
>>
>> Although it's not that widely used by browsers, because of worries about
>> compatibility with servers, which seems a bit of a waste.
>
> Browsers don't support pipelining because the multiplexer/demultiplexer
> is too complicated for the average software engineer.

Opera - uses pipelining
Firefox - supports pipelining, not turned on by default
Konqueror - supports pipelining, not turned on by default
IE - does not support pipelining
Safari - does not support pipelining

I guess Opera, Mozilla and KDE all have above-average engineers.

> Last I heard, Microsoft and Apache clients can't pipeline;

No surprise about MS. What's the Apache client? I'm not aware of a browser 
made by Apache - do you mean some non-browser client? HttpClient, perhaps?

tom

-- 
The question of whether computers can think is just like the question
of whether submarines can swim. -- Edsger W. Dijkstra
Tom
4/1/2010 3:02:05 PM