rep performance

I want to know how efficient rep is. For example, if "movsb" takes 5
cycles, does "rep movsb" take (ecx * 5) cycles? If so, moving 1G of memory
(assuming no page faults) would take 5G cycles, which is 2 seconds on a
2.5 GHz CPU. Is that correct?

Thanks, 

Gary 8/3/2005 6:44:39 PM

Gary,

The days of having a simple formula for timings are over. You schedule
instructions by ordering them so they go through multiple pipelines
without stalls. Later Intel hardware has special-case circuitry for REP
used with MOVS and STOS; without it, those instructions are very slow.

Moving data is faster with instructions that use larger data sizes, as
long as you ALIGN the data correctly. If you are working on gigabytes of
data, look at MMX or SSE(2) instructions for additional speed gains.

Regards,

hutch at movsd dot com

hutch 8/4/2005 12:12:24 AM

MOVSB is somewhat optimized, though you should still try to use movsd
when moving more than 4 bytes of data around (followed by a movsb to
copy any leftover bytes).

And for *really* large data structures (I've heard >512 bytes), you
should consider writing your own block move code using the XMM (128-bit)
registers.

Check this out:
http://www.masmforum.com/simple/index.php?topic=1637.0
Cheers,
Randy Hyde

randyhyde 8/4/2005 12:12:28 AM

On a Pentium, "rep movs" takes (ecx + 7) cycles.  After a 7-cycle startup,
you get one move per cycle.  That means "rep movsd" moves 4 times as much
data per cycle as "rep movsb".

However, when you are moving very large blocks, memory and cache latency
become the limiting factor.  The CPU instructions do not matter.
-- 
- Tim Roberts, timr@probo.com
  Providenza & Boekelheide, Inc.

Tim 8/5/2005 6:54:16 AM
