Why is it necessary to use ALIGN 4 for BYTE and WORD? If I have to use
ALIGN 4 for a BYTE, I will have to zero the high word of EAX and the high
byte AH, which leaves the low byte AL for data. For a WORD, I will have to
zero the high word of EAX, which leaves the low word AX for data.
If I use a double word such as DD, I don't need ALIGN. Is ALIGN used
for data? Is it there to prevent cache misses on data?
I use MASM 6.13, but I have not patched to 6.14 yet.
Matt Taylor states that if I want to use AX or AL for data, I must
always put XOR EAX, EAX to avoid register stall. If I want to reuse EAX
register, should I still put XOR EAX, EAX again?
For example:
xor eax, eax
xor ecx, ecx
mov ax, 0ffh
mov cx, 01h
add al, cl
adc ah, ch
mov word ptr [variable],ax
xor eax, eax
......
......
......
......
Is it best practice to replace mov ax, 0ffh with mov eax, 0ffh and
mov cx, 01h with mov ecx, 01h so that I don't need to use xor eax, eax?
If I choose to use partial registers, it is always good practice to use xor eax,
eax first. Is it a bad idea to use xor ax, ax if I don't use the high word of EAX,
but only use the low word (AX) for data?
Please let me know what you think if ALIGN is best that will avoid
partial register.
It looks like below.
mov eax, dword ptr [var1]
mov ecx, dword ptr [var2]
add al, cl
adc ah, ch
mov dword ptr [variable], eax
It is shortest assembly list that it does not need to use xor eax, eax
because it is already overcoming partial register by filling zero in high
word of AX that ALIGN 4 did for them.
Can you please explain what does sub eax, eax or sbb eax, eax mean? It
does not make sense if I ask to define mov al, al. It is like test al, al
before setz cl. CL will show 1 because test al, al is supposed to be zero.
The problem is that I must use CMP instead of TEST because it has to deal
with >, <, >=, and <=, but I can only use TEST to test register's 0 bit to 7
bit. Please mention what you think.
Do you think that latest version of MASM will be patched over 6.14 by
adding new instruction support and 64 bit integer register support for
IA-32? I would still want to use 64 bit general register instead of 32 bit
general register so I don't have to pair two registers (such as EAX:EDX)
into one register, but IA-32 has a support of 64 bit data bus already.
--
Bryan Parkoff
Bryan, 11/19/2003 8:13:12 PM

"Bryan Parkoff" <nospam@nospam.com> wrote in message
news:s9Qub.10989$Ek.9579@twister.austin.rr.com...
> Why is it necessary to use ALIGN 4 for BYTE and WORD? If I have to use
> ALIGN 4 for a BYTE, I will have to zero the high word of EAX and the high
> byte AH, which leaves the low byte AL for data. For a WORD, I will have to
> zero the high word of EAX, which leaves the low word AX for data.
It's not necessary to align 4 for byte or word. Where did you get this idea?
Even if you align bytes and words, you will still have to clear the full
register to avoid partial-register stalls.
> If I use a double word such as DD, I don't need ALIGN. Is ALIGN used
> for data? Is it there to prevent cache misses on data?
> I use MASM 6.13, but I have not patched to 6.14 yet.
The align directive is generally used to align code to cache lines. I have
not heard of it being used to align variables, but if it were, it would be
to prevent misaligned memory access. Usually variables are placed at
addresses aligned for their size by the linker.
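
To make the point about alignment concrete, here is a minimal sketch in C rather than MASM. C11 `alignas` stands in for the ALIGN directive, and `byte_var`, `word_var`, and `is_aligned` are names invented for this illustration. It only shows what ALIGN 4 asks for, an address that is a multiple of 4, and assumes nothing about how any particular compiler or linker lays out variables:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical variables, named only for this sketch. */
static alignas(4) uint8_t  byte_var = 0xFF;  /* roughly: ALIGN 4 / byte_var DB 0FFh */
static alignas(4) uint16_t word_var = 0x01;  /* roughly: ALIGN 4 / word_var DW 01h  */

/* Returns 1 when the address p is a multiple of n (n must be nonzero). */
static int is_aligned(const void *p, uintptr_t n)
{
    assert(n != 0);
    return ((uintptr_t)p % n) == 0;
}
```

Alignment decides where a variable sits in memory; whether a later register access stalls is a separate question.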
> Matt Taylor states that if I want to use AX or AL for data, I must
> always put XOR EAX, EAX to avoid register stall. If I want to reuse EAX
> register, should I still put XOR EAX, EAX again?
It depends on what you do:
xor eax, eax
mov al, [bytevar]
add eax, ecx ; no partial-register stall
inc ax
sub ecx, eax ; partial-register stall!
The xor eax, eax marks the eax register as clear. Since the upper 3-bytes
are clear, the add eax, ecx does not stall. The add eax, ecx marks the eax
register as not clear, so the inc ax causes the eax access in the last
instruction to stall.
> For example:
>
> xor eax, eax
> xor ecx, ecx
> mov ax, 0ffh
> mov cx, 01h
> add al, cl
> adc ah, ch
Why don't you simply add ax, cx?
> mov word ptr [variable],ax
This will stall.
> xor eax, eax
>
> Is it best practice to replace mov ax, 0ffh with mov eax, 0ffh and
> mov cx, 01h with mov ecx, 01h so that I don't need to use xor eax, eax?
> If I choose to use partial registers, it is always good practice to use xor eax,
> eax first. Is it a bad idea to use xor ax, ax if I don't use the high word of EAX,
> but only use the low word (AX) for data?
Mostly correct. If you never use the high half of eax, then xor ax, ax is
fine for avoiding partial stalls. The P6-core processor can stall in certain
cases if you use 16-bit registers in 32-bit mode. It is generally best to
avoid mixing 16-bit and 32-bit code.
> Please let me know what you think if ALIGN is best that will avoid
> partial register.
> It looks like below.
> mov eax, dword ptr [var1]
> mov ecx, dword ptr [var2]
> add al, cl
> adc ah, ch
No stall here.
> mov dword ptr [variable], eax
It stalls here.
> It is shortest assembly list that it does not need to use xor eax, eax
> because it is already overcoming partial register by filling zero in high
> word of AX that ALIGN 4 did for them.
The above code is more efficiently rewritten as:
mov ax, [var1]
mov cx, [var2]
add ax, cx
mov [var3], ax
which is shorter and faster.
> Can you please explain what does sub eax, eax or sbb eax, eax mean? It
> does not make sense if I ask to define mov al, al. It is like test al, al
> before setz cl. CL will show 1 because test al, al is supposed to be zero.
> The problem is that I must use CMP instead of TEST because it has to deal
> with >, <, >=, and <=, but I can only use TEST to test register's 0 bit to 7
> bit. Please mention what you think.
sbb eax, eax will set eax to either 0 or -1.
If CF=0, then eax = eax - eax - 0 = 0.
If CF=1, then eax = eax - eax - 1 = -1.
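
The arithmetic above can be modeled in a few lines of C. This is only a sketch: `sbb_same` and `sub_same` are hypothetical helper names, and `uint32_t` wraparound stands in for the 32-bit register, so -1 shows up as 0xFFFFFFFF:

```c
#include <assert.h>
#include <stdint.h>

/* Models sbb eax, eax: eax = eax - eax - CF, with 32-bit wraparound.
   cf is the incoming carry flag and must be 0 or 1. */
static uint32_t sbb_same(uint32_t eax, unsigned cf)
{
    assert(cf <= 1u);
    return eax - eax - (uint32_t)cf;
}

/* Models sub eax, eax: the result is 0 whatever eax held before. */
static uint32_t sub_same(uint32_t eax)
{
    return eax - eax;
}
```

In this model the old contents of eax never affect the result; only the carry flag does.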
The test instruction works perfectly fine on 32-bit registers, and it sets ZF
only when the result is zero:
test eax, eax
setz cl ; cl = (eax == 0 ? 1 : 0)
For greater-than/less-than comparisons, the same logic applies:
cmp eax, edx
setl cl ; cl = (eax < edx ? 1 : 0)
cmp eax, edx
sete cl ; cl = (eax == edx ? 1 : 0)
setz bl ; bl = cl = (eax == edx ? 1 : 0)
cmp eax, edx
setge cl ; cl = (eax >= edx ? 1 : 0)
> Do you think that latest version of MASM will be patched over 6.14 by
> adding new instruction support and 64 bit integer register support for
> IA-32? I would still want to use 64 bit general register instead of 32 bit
> general register so I don't have to pair two registers (such as EAX:EDX)
> into one register, but IA-32 has a support of 64 bit data bus already.
IA-32 doesn't have 64-bit registers. AMD-64 (AKA x86-64) does, and the ml64
program which is floating around supports it. I believe it claims to be MASM
8, but I could be confusing it with Visual C++ 8 which also supports AMD-64.
Don't expect your IA-32 source to work unchanged as 64-bit code on AMD-64. It
won't. AMD-64 uses a different calling convention. I don't know all the
details, so all I can tell you is to open http://www.google.com/ and start
searching.
-Matt
Matt, 11/20/2003 3:59:59 AM

Matt,
> It's not necessary to align 4 for byte or word. Where did you get this idea?
>
I got it from the *.asm files that the C/C++ compiler generates. ALIGN appears
before each variable there.
> The xor eax, eax marks the eax register as clear. Since the upper 3-bytes
> are clear, the add eax, ecx does not stall. The add eax, ecx marks the eax
> register as not clear, so the inc ax causes the eax access in the last
> instruction to stall.
>
> > For example:
> >
> > xor eax, eax
> > xor ecx, ecx
> > mov ax, 0ffh
> > mov cx, 01h
> > add al, cl
> > adc ah, ch
>
> Why don't you simply add ax, cx?
I choose to use 8 bit instead of 16 bit because I want carry flag to be
modified. It is why I use add al, cl before carry flag is turned on. adc
gets carry flag to be added to high byte. It may be one extra cycle if I
use add and adc together.
>
> > mov word ptr [variable],ax
>
> This will stall.
Should it replace to mov dword ptr [variable],eax? I know that you will say
no. Look at another example deep below.
> Mostly correct. If you never use the high half of eax, then xor ax, ax is
> fine for avoiding partial stalls. The P6-core processor can stall in certain
> cases if you use 16-bit registers in 32-bit mode. It is generally best to
> avoid mixing 16-bit and 32-bit code.
>
> > Please let me know what you think if ALIGN is best that will avoid
> > partial register.
> > It looks like below.
> > mov eax, dword ptr [var1]
> > mov ecx, dword ptr [var2]
> > add al, cl
> > adc ah, ch
>
> No stall here.
If it is the case that I don't use xor eax, eax, two high bytes are filled
zero. Should 000000ffH be written to EAX first before AL and AH can be
written back to the memory address. It will be no stall. You said that if
EAX is written back to memory address that will be stall because 32 bit and
8 bit are not supposed to be mixed.
>
> > mov dword ptr [variable], eax
>
> It stalls here.
if it replaces from mov dword ptr [variable],eax to mov word ptr
[variable],ax then it will be no stall? Or..should be mov byte ptr
[high_variable],ah and mov byte ptr [low_variable],al?
>
> The above code is more efficiently rewritten as:
>
> mov ax, [var1]
> mov cx, [var2]
> add ax, cx
> mov [var3], ax
>
> which is shorter and faster.
Your point is correct because there is no 8 bit and 16 bit mixed.
I agree, but I still have to add one extra cycle by adding ADC in order
to use 8 bit carry flag. You said that 16 bit and 8 bit cannot be mixed.
Try another example if it is better.
xor ah, ah
xor ch, ch
mov al, byte ptr [var1]
mov cl, byte ptr [var2]
add al, cl
adc ah, ch
mov word ptr [var3], ax
or
mov ax, word ptr [var1]
mov cx, word ptr [var2]
add al, cl
adc ah, ch
mov word ptr [var3], ax -- will stall
remove mov ptr [var3], ax and replace
mov byte ptr [high_var3], ah
mov byte ptr [low_var3], al -- no stall
It will stall because 16 bit and 8 bit are mixed. If I change from mov word
ptr [var3], ax to mov byte ptr [high_var3], ah and mov byte ptr [low_var3],
al. It will avoid stall.
Bryan Parkoff
Bryan, 11/20/2003 5:32:31 PM

"Bryan Parkoff" <nospam@nospam.com> wrote in message
news:PU6vb.8463$Vs1.7199@twister.austin.rr.com...
<snip>
> > > For example:
> > >
> > > xor eax, eax
> > > xor ecx, ecx
> > > mov ax, 0ffh
> > > mov cx, 01h
> > > add al, cl
> > > adc ah, ch
> >
> > Why don't you simply add ax, cx?
> I choose to use 8 bit instead of 16 bit because I want carry flag to be
> modified. It is why I use add al, cl before carry flag is turned on. adc
> gets carry flag to be added to high byte. It may be one extra cycle if I
> use add and adc together.
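Both the add and adc will change the carry flag. The carry flag (and the sign
and overflow flags) after the adc are set *exactly* as they would be if you did
add ax, cx. Only ZF, PF, and AF can differ, because after the adc they reflect
the high byte alone.
This will not only be an extra cycle, it will probably cause stalls on
P6-core and Pentium 4 processors.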
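
One way to check the claim about add/adc is to model both sequences in C. This is a sketch under stated assumptions: `Result`, `add_word`, `add_bytes`, and `agree_on_sample` are invented names, only CF and ZF are modeled, and the sweep samples the 16-bit input space rather than covering every pair:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t ax;   /* 16-bit result left in ax */
    unsigned cf;   /* carry flag after the last instruction */
    unsigned zf;   /* zero flag after the last instruction */
} Result;

/* Models: add ax, cx */
static Result add_word(uint16_t ax, uint16_t cx)
{
    uint32_t sum = (uint32_t)ax + cx;
    Result r = { (uint16_t)sum, (sum >> 16) & 1u, (uint16_t)sum == 0 };
    return r;
}

/* Models: add al, cl  followed by  adc ah, ch */
static Result add_bytes(uint16_t ax, uint16_t cx)
{
    unsigned lo = (ax & 0xFFu) + (cx & 0xFFu);            /* add al, cl */
    unsigned hi = (ax >> 8) + (cx >> 8) + (lo >> 8);      /* adc ah, ch */
    Result r = { (uint16_t)(((hi & 0xFFu) << 8) | (lo & 0xFFu)),
                 (hi >> 8) & 1u,
                 (hi & 0xFFu) == 0 };                     /* ZF sees ah only */
    return r;
}

/* Returns 1 if the result and CF agree on a strided sample of inputs. */
static int agree_on_sample(void)
{
    for (uint32_t a = 0; a <= 0xFFFFu; a += 0x55u) {
        for (uint32_t c = 0; c <= 0xFFFFu; c += 0x33u) {
            Result w = add_word((uint16_t)a, (uint16_t)c);
            Result b = add_bytes((uint16_t)a, (uint16_t)c);
            if (w.ax != b.ax || w.cf != b.cf)
                return 0;
        }
    }
    return 1;
}
```

In this model the 16-bit result and CF match for the two sequences. ZF is a different matter: after the adc it reflects only the high byte, so 0001h + 0001h leaves ZF set after add/adc but clear after add ax, cx.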
> > > mov word ptr [variable],ax
> >
> > This will stall.
>
> Should it replace to mov dword ptr [variable],eax? I know that you will say
> no. Look at another example deep below.
You're right that I will say no. The problem is that you modified ah and al
independently. Storing eax has the same problem that storing ax has. Part of
the register was modified without the rest being clear, so the processor
will stall.
> > Mostly correct. If you never use the high half of eax, then xor ax, ax is
> > fine for avoiding partial stalls. The P6-core processor can stall in certain
> > cases if you use 16-bit registers in 32-bit mode. It is generally best to
> > avoid mixing 16-bit and 32-bit code.
> >
> > > Please let me know what you think if ALIGN is best that will avoid
> > > partial register.
> > > It looks like below.
> > > mov eax, dword ptr [var1]
> > > mov ecx, dword ptr [var2]
> > > add al, cl
> > > adc ah, ch
> >
> > No stall here.
> If it is the case that I don't use xor eax, eax, two high bytes are filled
> zero. Should 000000ffH be written to EAX first before AL and AH can be
> written back to the memory address. It will be no stall. You said that if
> EAX is written back to memory address that will be stall because 32 bit and
> 8 bit are not supposed to be mixed.
No, 8-bit and 32-bit can be mixed if the register is clear. The only way to
mark it clear is to xor eax, eax or sub eax, eax. Once you modify the register,
the parts you modify are no longer considered clear.
> > > mov dword ptr [variable], eax
> >
> > It stalls here.
> if it replaces from mov dword ptr [variable],eax to mov word ptr
> [variable],ax then it will be no stall? Or..should be mov byte ptr
> [high_variable],ah and mov byte ptr [low_variable],al?
Yes, if you write the byte registers out separately then there will be no
stall. If you write the ax register and use add ax, cx instead of an add/adc
pair, there will also be no stall.
> > The above code is more efficiently rewritten as:
> >
> > mov ax, [var1]
> > mov cx, [var2]
> > add ax, cx
> > mov [var3], ax
> >
> > which is shorter and faster.
> Your point is correct because there is no 8 bit and 16 bit mixed.
> I agree, but I still have to add one extra cycle by adding ADC in order
> to use 8 bit carry flag. You said that 16 bit and 8 bit cannot be mixed.
As I said above, the carry flag and the ax & cx registers will be identical
whether you execute an add/adc pair on partial registers or a single add on
the ax and cx registers.
You've been using word operands, but I think you meant to add bytes, right?
If so, then you should do something like this:
xor eax, eax ; prevents possible stalls after this code executes
mov al, [var1]
add al, [var2]
mov [var3], al
sbb al, al ; al = (CF == 0 ? 0 : -1)
This will not stall. If you want to get 0 or 1, use setc instead of sbb.
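
As a rough model in C of the byte add above (`ByteAdd` and `add_bytes_with_carry` are hypothetical names for this sketch; the struct fields show what al would hold after `sbb al, al` versus `setc al`):

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t sum;       /* value stored to var3 */
    uint8_t sbb_mask;  /* al after sbb al, al: 0x00 or 0xFF (-1) */
    uint8_t setc_bit;  /* al after setc al:    0 or 1 */
} ByteAdd;

/* Models: mov al, [var1] / add al, [var2], then captures the carry
   both ways the post mentions. */
static ByteAdd add_bytes_with_carry(uint8_t var1, uint8_t var2)
{
    unsigned wide = (unsigned)var1 + var2;  /* 9-bit sum */
    unsigned cf = wide >> 8;                /* carry out of bit 7 */
    ByteAdd r;
    r.sum = (uint8_t)wide;
    r.sbb_mask = (uint8_t)(0u - cf);        /* al - al - CF */
    r.setc_bit = (uint8_t)cf;
    return r;
}
```

The mask form is handy when the carry is to be ANDed with something; the 0/1 form is what a high byte of the sum would need.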
> Try another example if it is better.
> xor ah, ah
> xor ch, ch
> mov al, byte ptr [var1]
> mov cl, byte ptr [var2]
> add al, cl
> adc ah, ch
> mov word ptr [var3], ax
No. The adc modifies the ah register which is why you stall when you access
ax. You cleared ah, but the adc makes it no longer clear.
> or
>
> mov ax, word ptr [var1]
> mov cx, word ptr [var2]
> add al, cl
> adc ah, ch
> mov word ptr [var3], ax -- will stall
This has the same problem as above.
> remove mov ptr [var3], ax and replace
> mov byte ptr [high_var3], ah
> mov byte ptr [low_var3], al -- no stall
Yes, this won't stall because you don't use ax or eax after using ah.
> It will stall because 16 bit and 8 bit are mixed. If I change from mov word
> ptr [var3], ax to mov byte ptr [high_var3], ah and mov byte ptr [low_var3],
> al. It will avoid stall.
Correct. The problem isn't that 8-bit and 16-bit are being mixed. As I
understand it, the problem is that the processor will treat al and ah as
separate registers internally. When you access ax, the processor has to
recombine al and ah into ax. When the register is marked clear, the
processor realizes it doesn't have to combine ah and al, so there is no
stall.
-Matt
Matt, 11/20/2003 8:49:48 PM

On Thu, 20 Nov 2003 03:59:59 GMT, "Matt Taylor" <para@tampabay.rr.com>
wrote:
<snip>
>The align directive is generally used to align code to cache lines. I have
>not heard of it being used to align variables, but if it were, it would be
>to prevent misaligned memory access. Usually variables are placed at
>addresses aligned for their size by the linker.
The ONLY variables that the linker can do anything with would be
'comm' variables. And I don't know if that is true for any other
linker than MS's. For ANY other variable, if it is misaligned from
the compiler|assembler|whatever, it stays misaligned.
<snip>
--
Arargh311 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html
To reply by email, remove the garbage from the reply address.
arargh311NOSPAM, 11/21/2003 12:10:33 AM