rep performance
I want to know how efficiently the rep prefix works. For example, if "movsb" takes 5
cycles, does "rep movsb" take (ecx * 5) cycles? If so, and I
want to move 1 GB of memory (assuming no page faults), it will take 5G cycles, which is 2
seconds on a 2.5 GHz CPU. Is that correct?
Thanks,
Gary
8/3/2005 6:44:39 PM

Gary,
The days of having a simple formula for timings are over. You schedule
instructions by ordering them so they go through multiple pipelines
without stalls. Later Intel hardware has special-case circuitry for REP
used with MOVS and STOS, but on hardware without it they are very slow.
Moving data is faster with instructions that use a larger data size, as
long as you ALIGN the data correctly. If you are working on gigabytes of
data, look at either MMX or SSE(2) instructions to get additional speed
gains.
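As a rough illustration of the aligned, wide moves suggested here, the following C sketch copies 16 bytes per iteration using SSE2 intrinsics. The function name `copy_aligned16` is hypothetical, and the sketch assumes both pointers are 16-byte aligned and the length is a multiple of 16. On compilers that do not define `__SSE2__` it simply falls back to `memcpy`. It is not tuned code, only a picture of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_load_si128, _mm_store_si128 */
#endif

/* Hypothetical helper (not from the thread): copy n bytes in 16-byte
   chunks. Assumes dst and src are 16-byte aligned and n % 16 == 0. */
void copy_aligned16(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(((uintptr_t)src % 16) == 0);
    assert(n % 16 == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n / 16; i++) {
        /* Aligned 128-bit load followed by an aligned 128-bit store. */
        _mm_store_si128(d + i, _mm_load_si128(s + i));
    }
#else
    /* No SSE2 available at compile time: let the library do the copy. */
    memcpy(dst, src, n);
#endif
}
```

The aligned load and store intrinsics fault on misaligned addresses, which is why the alignment is asserted up front. Unaligned data would need `_mm_loadu_si128` and `_mm_storeu_si128` instead.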
Regards,
hutch at movsd dot com
hutch
8/4/2005 12:12:24 AM

Gary wrote:
> I want to know how effective rep works. For example, if "movsb" takes 5
> cycles, does "rep movsb" take (ecx * 5) cycles? If this is the case, and I
> want to move 1G memory(assume no PF), it'll take 5G cycles which is 2
> seconds on a 2.5GHz cpu. Is it correct?
>
> Thanks,
MOVSB is somewhat optimized, though you should still try to use movsd
when moving more than 4 bytes of data around (with a movsb following to
copy any leftover bytes).
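To make the movsd-then-movsb split concrete, here is a portable C analogue rather than actual assembly. The function name `copy_dwords_then_bytes` is hypothetical. The two counts correspond to what one would load into ecx before "rep movsd" (n / 4) and before the trailing "rep movsb" (n % 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy n bytes as 4-byte
   units first, then finish with the 0..3 leftover bytes. */
void copy_dwords_then_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t dwords = n >> 2;  /* count for the 4-byte moves, like ecx = n / 4 */
    size_t tail   = n & 3;   /* leftover byte count, like ecx = n % 4 */
    assert(dwords * 4 + tail == n);

    for (size_t i = 0; i < dwords; i++) {
        uint32_t w;
        memcpy(&w, src, sizeof w);  /* one 4-byte load, safe for any alignment */
        memcpy(dst, &w, sizeof w);  /* one 4-byte store */
        src += 4;
        dst += 4;
    }
    while (tail--) {
        *dst++ = *src++;            /* byte-by-byte copy of the remainder */
    }
}
```

For an 11-byte copy this performs two 4-byte moves followed by three single-byte moves.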
And for *really* large data structures (I've heard >512 bytes) you
should consider creating your own block move code using XMM (128 bit)
registers.
Check this out:
http://www.masmforum.com/simple/index.php?topic=1637.0
Cheers,
Randy Hyde
randyhyde
8/4/2005 12:12:28 AM

Gary wrote:
>
>I want to know how effective rep works. For example, if "movsb" takes 5
>cycles, does "rep movsb" take (ecx * 5) cycles? If this is the case, and I
>want to move 1G memory(assume no PF), it'll take 5G cycles which is 2
>seconds on a 2.5GHz cpu. Is it correct?
On a Pentium, "rep movs" takes (ecx + 7) cycles: after a 7-cycle startup, you
get one move per cycle. That means that "rep movsd" moves 4 times as much
data as "rep movsb" in the same number of cycles.
However, when you are moving very large blocks, memory and cache latency
become the limiting factor. The CPU instructions do not matter.
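The two cycle models in this thread can be compared with a little arithmetic, sketched here in C. The function names are hypothetical, and "1G" is assumed to mean 2^30 bytes. The figures count instruction cycles only and ignore the memory and cache effects mentioned above, so they are lower bounds at best.

```c
#include <assert.h>
#include <stdint.h>

/* The model from the question: every iteration costs a fixed number of cycles. */
uint64_t cycles_linear(uint64_t count, uint64_t cycles_per_move)
{
    return count * cycles_per_move;
}

/* The Pentium figure quoted above: 7-cycle startup, then one move per cycle. */
uint64_t cycles_pentium_rep(uint64_t count)
{
    return count + 7;
}

/* Convert a cycle count to seconds at a given clock rate in Hz. */
double seconds_at(uint64_t cycles, double hz)
{
    assert(hz > 0.0);
    return (double)cycles / hz;
}
```

With count = 2^30 bytes at 2.5 GHz, the (ecx * 5) model gives 5,368,709,120 cycles, about 2.15 seconds. The (ecx + 7) model gives 1,073,741,831 cycles for "rep movsb", about 0.43 seconds. For "rep movsd" the count drops to 2^28 moves, giving 268,435,463 cycles, about 0.11 seconds.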
--
- Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
Tim
8/5/2005 6:54:16 AM