ADD Vs. INC

  • Follow


I have a video for you to watch on YouTube.

Basically,

I have the following code;
   mov edx, Message_1
   inc edx
   inc edx
   inc edx
   inc edx
   mov bl, [edx]
   mov [si], bl
   xor ebx, ebx
   mov bx, 2
   call SecPutC

This code is the only way this code will actually work in VMWare.
Using nasm to assemble the code btw.

If I were to replace the four INC's with an add edx, 4 all goes to
pot. And basically, the SecPutC starts a recursive call. I use lodsb
in SecPutC which is all I can state for now.

But, obviously SecPutC has nothing wrong with it! It's some weird bug
of either nasm or VMWare Player.

What do you think?

Here is the video;

http://www.youtube.com/watch?v=902jHkPXkxI
0
Reply obrzut.alex (64) 12/13/2010 4:47:44 PM

I always use this 'AddInts' function to overcome the limitations of
the built-in ADD instruction.  One has a lot less headaches at the end
of the day.

# AddInts - function to add integers
#
# Parameter One - passed via EAX
# Parameter Two - passed via ECX
#
# Returns: Result of ( EAX + ECX ) is in EAX
#

AddInts:
    cmp eax, 0
    jnz checkother
    mov eax, ecx
    ret
checkother:
    cmp ecx, 0
    jnz docalc
    ret
docalc:
    inc eax
    dec ecx
    jnz docalc
    ret

Nathan.
0
Reply Nathan 12/13/2010 11:55:35 PM


On Dec 13, 11:55=A0pm, Nathan <nathancba...@nospicedham.gmail.com>
wrote:
> I always use this 'AddInts' function to overcome the limitations of
> the built-in ADD instruction. =A0One has a lot less headaches at the end
> of the day.
>
> # AddInts - function to add integers
> #
> # Parameter One - passed via EAX
> # Parameter Two - passed via ECX
> #
> # Returns: Result of ( EAX + ECX ) is in EAX
> #
>
> AddInts:
> =A0 =A0 cmp eax, 0
> =A0 =A0 jnz checkother
> =A0 =A0 mov eax, ecx
> =A0 =A0 ret
> checkother:
> =A0 =A0 cmp ecx, 0
> =A0 =A0 jnz docalc
> =A0 =A0 ret
> docalc:
> =A0 =A0 inc eax
> =A0 =A0 dec ecx
> =A0 =A0 jnz docalc
> =A0 =A0 ret
>
> Nathan.

What are the limitations of the ADD instruction?
0
Reply NadCixelsyd 12/14/2010 12:28:50 AM

"NadCixelsyd" <nadcixelsyd@nospicedham.aol.com> wrote in message 
news:87308215-39c9-4c9f-944c-b3173777b75f@v17g2000vbo.googlegroups.com...

What are the limitations of the ADD instruction?

----

Well, for starters, ADD will occasionally molest the 'carry' flag.  INC, on 
the other hand, maintains a "hands off carry" policy.

Also, ADD is just too far abstracted away from the hardware for my tastes. 
Remember in elementary school when we were taught to count on our fingers 
'one-by-one' to add two numbers together.  *That* seems much more 
comfortable, therefore the INC instruction provides a more intuitive, 
elementary method for the addition of integers.  ADD is like a big "black 
box" -- who knows *what* it is doing behind our backs?

Nathan.


0
Reply Nathan 12/14/2010 1:02:39 AM

"catcalls" <obrzut.alex@nospicedham.gmail.com> wrote in message
news:1ab242f6-ef8f-4c03-b0f8-caa1036dfa66@i32g2000pri.googlegroups.com...
> I have a video for you to watch on YouTube.
>
> Basically,
>
> I have the following code;
>    mov edx, Message_1
>    inc edx
>    inc edx
>    inc edx
>    inc edx
>    mov bl, [edx]
>    mov [si], bl
>

In response to your question at the bottom of the post, what have you
checked?  Have you checked these?

1) Does ds have a proper selector?
2) Is si loaded with a correct value?  I don't see that here.
3) Don't you want esi instead of si?  You're in 32-bit code, not 16-bit.  If
not, is the upper 16-bits of esi correctly set?
4) You've mentioned lodsb below.  It uses esi in 32-bit mode, not si.  So,
you are modifying esi in SecPutC with lodsb.  Are you modifying si too?
5) You've mentioned lodsb below.  Is there something wrong with using al and
stosb here?
6) Is an instruction in SecPutC or the rest of your code dependent on the
byte size of this code sequence, or on the code's location?  add, 4 will
either be shorter (byte form) or longer (dword form) than the four inc's.
7) Is an instruction in SecPutC dependent on the CF?

>    xor ebx, ebx
>    mov bx, 2

mov bl,2  ?

You have 8-bit and 32-bit integer registers by default in 32-bit mode.
Using 16-bit integer registers in 32-bit mode, such as bx here and si above,
will add an override prefix to the instruction, in general.  NASM generates
code for a specific cpu and has a default cpu which causes generation of the
overrides.  You can change which cpu the code is generated for.  The older
cpu generations don't support override prefixes, and therefore no mixing of
code, i.e., no 16-bit registers in 32-bit code.  Is this what you intended?

>    call SecPutC
>
> This code is the only way this code will actually work in VMWare.
> Using nasm to assemble the code btw.
>
> If I were to replace the four INC's with an add edx, 4 all goes to
> pot. And basically, the SecPutC starts a recursive call. I use lodsb
> in SecPutC which is all I can state for now.
>
> But, obviously SecPutC has nothing wrong with it! It's some weird bug
> of either nasm or VMWare Player.
>

Comments to this are above...  Without having all available info, my guess
would be it has something to do with the mixing of 16-bit and 32-bit code.
I.e., modifying si while using esi with lodsb, or VMWare Player not
correctly supporting overrides for 16-bit register use.  Sorry, I didn't
view the video.


Rod Pemberton






0
Reply Rod 12/14/2010 7:05:02 AM

On Dec 14, 7:05=A0am, "Rod Pemberton"
<do_not_h...@nospicedham.notreplytome.cmm> wrote:
> "catcalls" <obrzut.a...@nospicedham.gmail.com> wrote in message
>
> news:1ab242f6-ef8f-4c03-b0f8-caa1036dfa66@i32g2000pri.googlegroups.com...
>
> > I have a video for you to watch on YouTube.
>
> > Basically,
>
> > I have the following code;
> > =A0 =A0mov edx, Message_1
> > =A0 =A0inc edx
> > =A0 =A0inc edx
> > =A0 =A0inc edx
> > =A0 =A0inc edx
> > =A0 =A0mov bl, [edx]
> > =A0 =A0mov [si], bl
>
> In response to your question at the bottom of the post, what have you
> checked? =A0Have you checked these?
>
> 1) Does ds have a proper selector?
> 2) Is si loaded with a correct value? =A0I don't see that here.
> 3) Don't you want esi instead of si? =A0You're in 32-bit code, not 16-bit=
.. =A0If
> not, is the upper 16-bits of esi correctly set?
> 4) You've mentioned lodsb below. =A0It uses esi in 32-bit mode, not si. =
=A0So,
> you are modifying esi in SecPutC with lodsb. =A0Are you modifying si too?
> 5) You've mentioned lodsb below. =A0Is there something wrong with using a=
l and
> stosb here?
> 6) Is an instruction in SecPutC or the rest of your code dependent on the
> byte size of this code sequence, or on the code's location? =A0add, 4 wil=
l
> either be shorter (byte form) or longer (dword form) than the four inc's.
> 7) Is an instruction in SecPutC dependent on the CF?
>
> > =A0 =A0xor ebx, ebx
> > =A0 =A0mov bx, 2
>
> mov bl,2 =A0?
>
> You have 8-bit and 32-bit integer registers by default in 32-bit mode.
> Using 16-bit integer registers in 32-bit mode, such as bx here and si abo=
ve,
> will add an override prefix to the instruction, in general. =A0NASM gener=
ates
> code for a specific cpu and has a default cpu which causes generation of =
the
> overrides. =A0You can change which cpu the code is generated for. =A0The =
older
> cpu generations don't support override prefixes, and therefore no mixing =
of
> code, i.e., no 16-bit registers in 32-bit code. =A0Is this what you inten=
ded?
>
> > =A0 =A0call SecPutC
>
> > This code is the only way this code will actually work in VMWare.
> > Using nasm to assemble the code btw.
>
> > If I were to replace the four INC's with an add edx, 4 all goes to
> > pot. And basically, the SecPutC starts a recursive call. I use lodsb
> > in SecPutC which is all I can state for now.
>
> > But, obviously SecPutC has nothing wrong with it! It's some weird bug
> > of either nasm or VMWare Player.
>
> Comments to this are above... =A0Without having all available info, my gu=
ess
> would be it has something to do with the mixing of 16-bit and 32-bit code=
..
> I.e., modifying si while using esi with lodsb, or VMWare Player not
> correctly supporting overrides for 16-bit register use. =A0Sorry, I didn'=
t
> view the video.
>
> Rod Pemberton

Do not worry about viewing the video. It is just a visual for the
information in this post.

Also, this is very interesting to me. You speak of ESI and SI and 32-
bit and 16-bit code with overrides.

The fact is, I have soundly set up a stack etc...at the beginning of
booting into 16-bit mode in VMWare Player. This is why I am using SI,
because I'm in 16-bit mode. But, that is a red flag for the remaining
code because I've mixed 32-bit registers in 16-bit mode. I see your
point, and I will rewrite the code to reflect the 16-bit code (bx
instead of ebx etc...) and see if I make any progress.

I believe si to be loaded with an address of the beginning of a string
in RAM. SecPutC outputs strings using 16-bit Interrupts to the console
screen. This is pretty much BIOS written code. Nothing to do with the
Operating System I am writing!

Actually, this code started out as a YouTube video shoot. I was
recording live my assembly language programming, in an attempt to
'order' a string in alphabetical order then output it to the screen.
This was really fun to do in ASM! I loved the project, because it
turned into a nested loop, and was quite challenging. I also found
some annoying aspects of ASM along the way. When trying to extract
characters from SI. For instance,

mov ebx, Message
mov al, [bl]

Does not work! This causes an invalid address error. Haha. I've found
lots of errors when writing live the program to order a string. Mostly
to do with attempts to extract characters from the string when I
loaded the string into a 32-bit register. I discovered, that loads 4
chars into EBX for example, because a char is 8-bits, and EBX is a 32-
bit register.

Also, there was an issue with getting each character compared either
higher or lower to all the other characters. In a loop using just
registers and no data holdings at all. That was really fun! I gave up
making the video three times. First time was a hack, second had a
plan, third was just plain silly.

But, I'm persevering with writing the tutorial for youTube. That is
why I broke the code down to a simple char advancement program. Once I
am able to traverse a string char by char, I am able to create a loop
and finish the alphabetical sort program.

Btw, don't post any alphabetical sort programs to this mailing list!
You'll ruin the fun for me! *smiles*

Kind regards,

Alex
0
Reply catcalls 12/14/2010 9:47:34 AM

   mov edx, Message_1
   add edx, 4
   mov bx, [edx]
   mov [si], bl
   xor bx, bx
   mov bx, 2
   call SecPutC

This code works | and does not cause the recursive entropy.
Interesting that one line effects this, and that is the xor bx, bx
line.

If I were to put;

xor ebx, ebx

Then I get a recursion. Where as, how it stands in the above code, it
is recursion free.

Thanks for the pointers in writing 16-bit code in a 16-bit code
segment. That made me re-write the entire program.

One question;

How does one get a char out of a string in 16-bit mode?

I attempted to load bx with the effective address of a string, but I
get an illegal address error when attempting to use that address!
0
Reply catcalls 12/14/2010 11:25:05 AM

On Tue, 14 Dec 2010 01:47:34 -0800 (PST), catcalls
<obrzut.alex@nospicedham.gmail.com> wrote:

<snip>

>mov ebx, Message
>mov al, [bl]
>
>Does not work! This causes an invalid address error. Haha. 

BL is not a valid address, since an address must be 32 bits
long in 32-bit mode, or 16 bits in 16-bit mode.  Check the
Intel opcodes for MOV and you will see there is no such
thing as a byte address.  (It wouldn't be much use anyway,
since it would only work on the first 256 bytes of memory.)

Best regards,


Bob Masta
 
              DAQARTA  v5.10
   Data AcQuisition And Real-Time Analysis
              www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
    Frequency Counter, FREE Signal Generator
           Pitch Track, Pitch-to-MIDI 
         DaqMusic - FREE MUSIC, Forever!
             (Some assembly required)
     Science (and fun!) with your sound card!
0
Reply N0Spam 12/14/2010 1:21:57 PM

"Bob Masta" <N0Spam@nospicedham.daqarta.com> wrote in message 
news:4d076d9d.1236834@news.eternal-september.org...

> BL is not a valid address, since an address must be 32 bits
> long in 32-bit mode, or 16 bits in 16-bit mode.  Check the
> Intel opcodes for MOV and you will see there is no such
> thing as a byte address.  (It wouldn't be much use anyway,
> since it would only work on the first 256 bytes of memory.)

xlat?

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


0
Reply James 12/14/2010 3:33:28 PM

Nathan posted:
> I always use this 'AddInts' function to overcome the limitations of
> the built-in ADD instruction.  One has a lot less headaches at the end
> of the day.

> # AddInts - function to add integers
> #
> # Parameter One - passed via EAX
> # Parameter Two - passed via ECX
> #
> # Returns: Result of ( EAX + ECX ) is in EAX
> #

> AddInts:
>    cmp eax, 0
>    jnz checkother
>    mov eax, ecx
>    ret
> checkother:
>    cmp ecx, 0
>    jnz docalc
>    ret
> docalc:
>    inc eax
>    dec ecx
>    jnz docalc
>    ret

Oh well, that's a good one Nate, and after this calculation
we should perform a carry-update:

PROC_Far(op1,op2,void)
Set_carry:
 push ebp
 mov ebp,esp
 add esp,64
 push eax
 push ecx
 mov eax,op1
 mov ecx,op2
 lea eax,[eax+ecx]
 stc
 pop ecx
 pop eax
 mov esp,ebp
 xor eax,eax
 retf 12
END_PROC


Sad if someone see this as a joke,
it actually exists in hw-drivers and may even be found in BIOS.
It must be one of these vital mandatory famous HLL-Lib functions.
__
wolfgang


0
Reply wolfgang 12/14/2010 6:13:57 PM

>>  AddInts:
>>     cmp eax, 0
>>     jnz checkother
>>     mov eax, ecx
>>     ret
>>  checkother:
>>     cmp ecx, 0
>>     jnz docalc
>>     ret
>>  docalc:
>>     inc eax
>>     dec ecx
>>     jnz docalc
>>     ret

> PROC_Far(op1,op2,void)
> Set_carry:
>   push ebp
>   mov ebp,esp
>   add esp,64
>   push eax
>   push ecx
>   mov eax,op1
>   mov ecx,op2
>   lea eax,[eax+ecx]
>   stc
>   pop ecx
>   pop eax
>   mov esp,ebp
>   xor eax,eax
>   retf 12
> END_PROC

Stop it, both of you!


Bjarni
-- 

                        INFORMATION WANTS TO BE FREE
0
Reply Bjarni 12/14/2010 7:28:18 PM

On 12/14/2010 12:25 PM, catcalls wrote:
>     mov edx, Message_1
>     add edx, 4
>     mov bx, [edx]
>     mov [si], bl
>     xor bx, bx
>     mov bx, 2
>     call SecPutC

The code above fetches the fifth and sixth bytes from the string at 
Message_1, stores the fifth one where si points and tosses the sixth 
one, then first puts 0 into bx and then 2, and then calls a mystery 
function that you have not shown us.

I have my suspicions about that function.

> One question;
>
> How does one get a char out of a string in 16-bit mode?

   section .data
   the_string: db 'Tin!', 0

   section .text
   mov si, 0                # to pick character number 0
   mov al, [the_string+si]

Something like that?

> I attempted to load bx with the effective address of a string, but I
> get an illegal address error when attempting to use that address!

You mentioned this code gets booted with no operating system in place. 
Do you in fact set the segment registers up at all?


Bjarni
-- 

                        INFORMATION WANTS TO BE FREE
0
Reply Bjarni 12/14/2010 7:54:51 PM

On Dec 13, 5:02=A0pm, "Nathan Baker"
<nathancba...@nospicedham.gmail.com> wrote:
> "NadCixelsyd" <nadcixel...@nospicedham.aol.com> wrote in message
>
> news:87308215-39c9-4c9f-944c-b3173777b75f@v17g2000vbo.googlegroups.com...
>
> What are the limitations of the ADD instruction?
>
> ----
>
> Well, for starters, ADD will occasionally molest the 'carry' flag. =A0INC=
, on
> the other hand, maintains a "hands off carry" policy.

I totally agree with you, sir. The indecent behavior of ADD is a
disgrace to all CPU instructions. Sadly, the unsuspecting carry flag
is acted lewdly upon in the very first instruction of your ADD-
replacement function, the CMP. Turns out, CMP is ADD in disguise (the
cunning!). If, however, we replace the CMP to zero with one INC and
one DEC, the decency of the carry flag is maintained.
0
Reply Cyril 12/14/2010 9:06:27 PM

On Dec 14, 1:02=A0am, "Nathan Baker"
<nathancba...@nospicedham.gmail.com> wrote:
> "NadCixelsyd" <nadcixel...@nospicedham.aol.com> wrote in message
>
> news:87308215-39c9-4c9f-944c-b3173777b75f@v17g2000vbo.googlegroups.com...
>
> What are the limitations of the ADD instruction?
>
> ----
>
> Well, for starters, ADD will occasionally molest the 'carry' flag. =A0INC=
, on
> the other hand, maintains a "hands off carry" policy.
>
> Also, ADD is just too far abstracted away from the hardware for my tastes=
..
> Remember in elementary school when we were taught to count on our fingers
> 'one-by-one' to add two numbers together. =A0*That* seems much more
> comfortable, therefore the INC instruction provides a more intuitive,
> elementary method for the addition of integers. =A0ADD is like a big "bla=
ck
> box" -- who knows *what* it is doing behind our backs?
>
> Nathan.

The status of carry flag is not relevant in this case because the XOR
instruction (three instructions later) will clear it regardless of its
setting.  Even without the XOR instruction to clear the carry
flag ...  If you don't know the status of the carry flag, then don't
use any instructions that depend on the setting one way or another.

How can  ADD be abstracted?  It adds two numbers.  Period.  Isn't it
far faster to add two numbers rather with one instruction rather than
sit in a loop?  Are you advocating nested loops to do multiplies?
0
Reply NadCixelsyd 12/15/2010 12:22:49 AM

"NadCixelsyd" <nadcixelsyd@nospicedham.aol.com> wrote in message 
news:6ce3e244-c832-400a-985d-e907e9d84680@k11g2000vbf.googlegroups.com...
,---
How can  ADD be abstracted?  It adds two numbers.  Period.  Isn't it
far faster to add two numbers rather with one instruction rather than
sit in a loop?  Are you advocating nested loops to do multiplies?
`---

Just having a bit of Holiday fun.  ;)

Nathan.


0
Reply Nathan 12/15/2010 2:23:07 AM

On Tue, 14 Dec 2010 08:33:28 -0700, "James Van Buskirk"
<not_valid@nospicedham.comcast.net> wrote:

>"Bob Masta" <N0Spam@nospicedham.daqarta.com> wrote in message 
>news:4d076d9d.1236834@news.eternal-september.org...
>
>> BL is not a valid address, since an address must be 32 bits
>> long in 32-bit mode, or 16 bits in 16-bit mode.  Check the
>> Intel opcodes for MOV and you will see there is no such
>> thing as a byte address.  (It wouldn't be much use anyway,
>> since it would only work on the first 256 bytes of memory.)
>
>xlat?
>

XLAT uses AL as an offset into a table pointed to by BX or
EBX, so it still uses 16- or 32-bit addressing.

Best regards,


Bob Masta
 
              DAQARTA  v5.10
   Data AcQuisition And Real-Time Analysis
              www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
    Frequency Counter, FREE Signal Generator
           Pitch Track, Pitch-to-MIDI 
         DaqMusic - FREE MUSIC, Forever!
             (Some assembly required)
     Science (and fun!) with your sound card!
0
Reply N0Spam 12/15/2010 1:00:45 PM

Cyril Novikov figured:

>> What are the limitations of the ADD instruction?

> Well, for starters, ADD will occasionally molest the 'carry' flag. INC, on
> the other hand, maintains a "hands off carry" policy.

|I totally agree with you, sir. The indecent behavior of ADD is a
|disgrace to all CPU instructions. Sadly, the unsuspecting carry flag
|is acted lewdly upon in the very first instruction of your ADD-
|replacement function, the CMP. Turns out, CMP is ADD in disguise (the
|cunning!). If, however, we replace the CMP to zero with one INC and
|one DEC, the decency of the carry flag is maintained.

well observed. To preserve Carry we must use the hidden ADD-ladder.
more serious ment for newbies info yet:

LEA eax,[eax+ecx]   ;will ADD ecx to eax without touching flags.

__
wolfgang







0
Reply wolfgang 12/15/2010 1:06:30 PM

On Dec 14, 5:25=A0am, catcalls <obrzut.a...@nospicedham.gmail.com>
wrote:

[snip]
>
> One question;
>
> How does one get a char out of a string in 16-bit mode?
>

NASM example:

;; File: strpick.nsm
;; string pick, get a char from a string.
;; 16 bit print string example using RomBios.

[SECTION .text]

Start:
	mov  bx, MyStr	;; address of string
	LEA  bx, [MyStr]	;; address of string, alternate method

myloop:
	mov  al, [bx]	;; get character from there
	inc  bx		;; increase ptr by char size.
			;;  points to next char for looping.
	cmp  al, 0	;; test for end of string.
	je   Method2	;; done, do next method.

	push bx		;; save, protect, string pointer
	call RomBiosPrt	;; print character
	pop  bx
	jmp  myloop	;; fetch next character from string

Method2:	;; * use string instructions. *
	cld		;; affirm index increments.
	mov  SI, MyStr	;; address of string
	LEA  SI, [MyStr]	;; address of string, alternate method

myloop2:
	LODSB		;; [SI++] -> AL
	cmp  al, 0	;; test for end of string.
	je   Quits	;; done.

	call RomBiosPrt	;; print character
	jmp  myloop2	;; fetch next character from string

Quits:
	mov  ax, 0	;; pause for keypress
	int  16h
	int  19h		;; reboot

RomBiosPrt:	;; pass chr in AL, BX is overwritten with 0
	mov  ah, 0eh
	mov  bx, 0

	int  10h
	RET

ALIGN 16	;; easier to find in a debugger at a paragraph address.
[SECTION .data]

MyStr: db 'This is my string.',0Dh,0Ah,0

ALIGN 16	;; pad to next paragraph boundry, for readability.

;; -=3D eof =3D-

The makefile contents or the command string to nasm:

-f bin
-l strpick.LST
-o strpick.BIN
strpick.NSM


strpick.bin is the file I load and run, with a debugger for msdos in
XP's cmd.exe.

Do me a favor and see if this will run for you in VMWare.

The output should be:

This is my string.
This is my string.

[awaits for you to press a key, so you can see the above output.]

Steve

> I attempted to load bx with the effective address of a string, but I
> get an illegal address error when attempting to use that address!

0
Reply s_dubrovich 12/15/2010 5:29:01 PM

"wolfgang kern" <nowhere@never.at> wrote in message
news:ieaeh3$a4q$1@newsreader2.utanet.at...
>
> Cyril Novikov figured:
>
> >> What are the limitations of the ADD instruction?
>
> > Well, for starters, ADD will occasionally molest the 'carry' flag. INC,
> > on the other hand, maintains a "hands off carry" policy.
>
> |I totally agree with you, sir. The indecent behavior of ADD is a
> |disgrace to all CPU instructions. Sadly, the unsuspecting carry flag
> |is acted lewdly upon in the very first instruction of your ADD-
> |replacement function, the CMP. Turns out, CMP is ADD in disguise (the
> |cunning!). If, however, we replace the CMP to zero with one INC and
> |one DEC, the decency of the carry flag is maintained.
>
> well observed. To preserve Carry we must use the hidden ADD-ladder.
> more serious ment for newbies info yet:
>
> LEA eax,[eax+ecx]   ;will ADD ecx to eax without touching flags.
>

It's important not to touch flags.  So, we should try to not touch flags.

These arithmetic instructions do not update the CF:

inc, dec, div, idiv, not


The following list of 16/32-bit instructions doesn't touch flags - any of
them, AFAIK.  Although, there are other reasons not to use some of them,
e.g., obsolete or slow.  IMO, most are not of much use within code that is
modifying flags.  If we disregard branches, lea is the useful one, perhaps
movsx/zx, or maybe cmovcc.  xlat/b and bswap are of low usage, as are the
convert instructions and string instructions.

bound bswap call cbw cdq clts cmovcc cpuid cwd cwde enter esc fcmovcc hlt in
ins invd invlpg jcc jcxz jmp lahf lds lea leave les lfs lgdt lgs lidt lldt
lmsw lock lods loop loope loopne loopnz loopz lss ltr monitor mov movs movsx
movzx mwait nop not out outs pop popa push pusha pushf rdmsr rdpmc rdtsc rep
repe repne ret setcc sgdt sidt sldt smsw stos str ud2 wait wbinvd wrmsr xchg
xlat xlatb


Good instructions for minimizing changes of registers or flags or for
setting specific flags are:

lea, cmp, test, or, xor, push, pop, inc, dec, xchg
mov reg, imm
push memory
pop memory
movs, movsb, movsw, movsd
cmps, cmpsb, cmpsw, cmpsd


In addition to those that preserve flags, there are, of course, other
classes of instructions, such as those that modify flags, or
set/clear/modify the CF, or set/clear/modify the ZF flag, or won't generate
partial flags stalls, or will generate flags stalls, or override restricted,
or atomic, or non-atomic, or can use lock prefix, or are serialzing, or are
memory ordering, or obsolete or 64-bit obsolete, or have a specific form
such as 1 byte, or 2 byte, or hardcoded accumulator, or other hardcoded
register, or hardcoded 0 or 1, or load/store segment registers, or can use
string prefixes, or that support mixed-size instructions, or that support
imm8, or that have a +r form, or that support 8-bit relative offset, or that
support 16/32-bit relative offset, or that can perform sign-extension or
zero-extension, or that requires a sign-extended argument, or that supports
certain register sizes but not other register sizes.


Rod Pemberton



0
Reply Rod 12/15/2010 5:58:24 PM

"Bob Masta" <N0Spam@nospicedham.daqarta.com> wrote in message 
news:4d08bb0a.338708@news.eternal-september.org...

> On Tue, 14 Dec 2010 08:33:28 -0700, "James Van Buskirk"
> <not_valid@nospicedham.comcast.net> wrote:

>>"Bob Masta" <N0Spam@nospicedham.daqarta.com> wrote in message
>>news:4d076d9d.1236834@news.eternal-september.org...

>>> BL is not a valid address, since an address must be 32 bits
>>> long in 32-bit mode, or 16 bits in 16-bit mode.  Check the
>>> Intel opcodes for MOV and you will see there is no such
>>> thing as a byte address.  (It wouldn't be much use anyway,
>>> since it would only work on the first 256 bytes of memory.)

>>xlat?

> XLAT uses AL as an offset into a table pointed to by BX or
> EBX, so it still uses 16- or 32-bit addressing.

OK, it seems that we are thinking about different things as defining
a byte address.  I guess that's quite reasonable since I have also
seen people mean different things when they say 32-bit computing vs.
16-bit computing.  I was thinking about the necessity to clean up
the high bits of a register to address off of it, and you really
can make it a requirement or not, as by:

mov al, [rdi]
vs.
mov al, [edi]

or:

mov al, [edi]
vs.
mov al, [di]

or:
mov al, [di]
vs.
xlat

You seem to be focusing on the limited offset you can get by using
an addressing mode of lower bitness.  If you aren't in 64-bit mode
all the virtual addresses get some kind of base address added to them
from either the contents of the segment register involved in forming
the virtual address or the descriptor that it points to.  Thus, you
could address all of memory using a byte address even if [e]bx
weren't always part of the equation.

I think I can see more now what you are talking about, and am hoping
that my point has been clarified somewhat to you.

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


0
Reply James 12/15/2010 6:46:03 PM

On Dec 15, 5:29=A0pm, s_dubrov...@nospicedham.yahoo.com wrote:
> On Dec 14, 5:25=A0am, catcalls <obrzut.a...@nospicedham.gmail.com>
> wrote:
>
> [snip]
>
>
>
> > One question;
>
> > How does one get a char out of a string in 16-bit mode?
>
> NASM example:
>
> ;; File: strpick.nsm
> ;; string pick, get a char from a string.
> ;; 16 bit print string example using RomBios.
>
> [SECTION .text]
>
> Start:
> =A0 =A0 =A0 =A0 mov =A0bx, MyStr =A0;; address of string
> =A0 =A0 =A0 =A0 LEA =A0bx, [MyStr] =A0 =A0 =A0 =A0;; address of string, a=
lternate method
>
> myloop:
> =A0 =A0 =A0 =A0 mov =A0al, [bx] =A0 ;; get character from there
> =A0 =A0 =A0 =A0 inc =A0bx =A0 =A0 =A0 =A0 ;; increase ptr by char size.
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; =A0points to next char=
 for looping.
> =A0 =A0 =A0 =A0 cmp =A0al, 0 =A0 =A0 =A0;; test for end of string.
> =A0 =A0 =A0 =A0 je =A0 Method2 =A0 =A0;; done, do next method.
>
> =A0 =A0 =A0 =A0 push bx =A0 =A0 =A0 =A0 ;; save, protect, string pointer
> =A0 =A0 =A0 =A0 call RomBiosPrt ;; print character
> =A0 =A0 =A0 =A0 pop =A0bx
> =A0 =A0 =A0 =A0 jmp =A0myloop =A0 =A0 ;; fetch next character from string
>
> Method2: =A0 =A0 =A0 =A0;; * use string instructions. *
> =A0 =A0 =A0 =A0 cld =A0 =A0 =A0 =A0 =A0 =A0 ;; affirm index increments.
> =A0 =A0 =A0 =A0 mov =A0SI, MyStr =A0;; address of string
> =A0 =A0 =A0 =A0 LEA =A0SI, [MyStr] =A0 =A0 =A0 =A0;; address of string, a=
lternate method
>
> myloop2:
> =A0 =A0 =A0 =A0 LODSB =A0 =A0 =A0 =A0 =A0 ;; [SI++] -> AL
> =A0 =A0 =A0 =A0 cmp =A0al, 0 =A0 =A0 =A0;; test for end of string.
> =A0 =A0 =A0 =A0 je =A0 Quits =A0 =A0 =A0;; done.
>
> =A0 =A0 =A0 =A0 call RomBiosPrt ;; print character
> =A0 =A0 =A0 =A0 jmp =A0myloop2 =A0 =A0;; fetch next character from string
>
> Quits:
> =A0 =A0 =A0 =A0 mov =A0ax, 0 =A0 =A0 =A0;; pause for keypress
> =A0 =A0 =A0 =A0 int =A016h
> =A0 =A0 =A0 =A0 int =A019h =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0;; reboot
>
> RomBiosPrt: =A0 =A0 ;; pass chr in AL, BX is overwritten with 0
> =A0 =A0 =A0 =A0 mov =A0ah, 0eh
> =A0 =A0 =A0 =A0 mov =A0bx, 0
>
> =A0 =A0 =A0 =A0 int =A010h
> =A0 =A0 =A0 =A0 RET
>
> ALIGN 16 =A0 =A0 =A0 =A0;; easier to find in a debugger at a paragraph ad=
dress.
> [SECTION .data]
>
> MyStr: db 'This is my string.',0Dh,0Ah,0
>
> ALIGN 16 =A0 =A0 =A0 =A0;; pad to next paragraph boundry, for readability=
..
>
> ;; -=3D eof =3D-
>
> The makefile contents or the command string to nasm:
>
> -f bin
> -l strpick.LST
> -o strpick.BIN
> strpick.NSM
>
> strpick.bin is the file I load and run, with a debugger for msdos in
> XP's cmd.exe.
>
> Do me a favor and see if this will run for you in VMWare.
>
> The output should be:
>
> This is my string.
> This is my string.
>
> [awaits for you to press a key, so you can see the above output.]
>
> Steve
>
> > I attempted to load bx with the effective address of a string, but I
> > get an illegal address error when attempting to use that address!
>
>

Interesting! It works for real. I'm quite pleased with your code
*Smiles*

I have yet to test the LEA instruction at the top, as an alternate
means of addressing the string, but I'm pretty sure it all works.

On a side note, when I first tried to access a string in 16-bit mode,
it did not work. And that discouraged me from doing so. But, now, I
assume it was the code I wrote that was at fault, and not 16-bit mode.

Thanks,

Catcalls
0
Reply catcalls 12/16/2010 1:47:01 AM

On Dec 15, 5:29=A0pm, s_dubrov...@nospicedham.yahoo.com wrote:
> On Dec 14, 5:25=A0am, catcalls <obrzut.a...@nospicedham.gmail.com>
> wrote:
>
> [snip]
>
>
>
> > One question;
>
> > How does one get a char out of a string in 16-bit mode?
>
> NASM example:
>
> ;; File: strpick.nsm
> ;; string pick, get a char from a string.
> ;; 16 bit print string example using RomBios.
>
> [SECTION .text]
>
> Start:
> =A0 =A0 =A0 =A0 mov =A0bx, MyStr =A0;; address of string
> =A0 =A0 =A0 =A0 LEA =A0bx, [MyStr] =A0 =A0 =A0 =A0;; address of string, a=
lternate method
>
> myloop:
> =A0 =A0 =A0 =A0 mov =A0al, [bx] =A0 ;; get character from there
> =A0 =A0 =A0 =A0 inc =A0bx =A0 =A0 =A0 =A0 ;; increase ptr by char size.
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; =A0points to next char=
 for looping.
> =A0 =A0 =A0 =A0 cmp =A0al, 0 =A0 =A0 =A0;; test for end of string.
> =A0 =A0 =A0 =A0 je =A0 Method2 =A0 =A0;; done, do next method.
>
> =A0 =A0 =A0 =A0 push bx =A0 =A0 =A0 =A0 ;; save, protect, string pointer
> =A0 =A0 =A0 =A0 call RomBiosPrt ;; print character
> =A0 =A0 =A0 =A0 pop =A0bx
> =A0 =A0 =A0 =A0 jmp =A0myloop =A0 =A0 ;; fetch next character from string
>
> Method2: =A0 =A0 =A0 =A0;; * use string instructions. *
> =A0 =A0 =A0 =A0 cld =A0 =A0 =A0 =A0 =A0 =A0 ;; affirm index increments.
> =A0 =A0 =A0 =A0 mov =A0SI, MyStr =A0;; address of string
> =A0 =A0 =A0 =A0 LEA =A0SI, [MyStr] =A0 =A0 =A0 =A0;; address of string, a=
lternate method
>
> myloop2:
> =A0 =A0 =A0 =A0 LODSB =A0 =A0 =A0 =A0 =A0 ;; [SI++] -> AL
> =A0 =A0 =A0 =A0 cmp =A0al, 0 =A0 =A0 =A0;; test for end of string.
> =A0 =A0 =A0 =A0 je =A0 Quits =A0 =A0 =A0;; done.
>
> =A0 =A0 =A0 =A0 call RomBiosPrt ;; print character
> =A0 =A0 =A0 =A0 jmp =A0myloop2 =A0 =A0;; fetch next character from string
>
> Quits:
> =A0 =A0 =A0 =A0 mov =A0ax, 0 =A0 =A0 =A0;; pause for keypress
> =A0 =A0 =A0 =A0 int =A016h
> =A0 =A0 =A0 =A0 int =A019h =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0;; reboot
>
> RomBiosPrt: =A0 =A0 ;; pass chr in AL, BX is overwritten with 0
> =A0 =A0 =A0 =A0 mov =A0ah, 0eh
> =A0 =A0 =A0 =A0 mov =A0bx, 0
>
> =A0 =A0 =A0 =A0 int =A010h
> =A0 =A0 =A0 =A0 RET
>
> ALIGN 16 =A0 =A0 =A0 =A0;; easier to find in a debugger at a paragraph ad=
dress.
> [SECTION .data]
>
> MyStr: db 'This is my string.',0Dh,0Ah,0
>
> ALIGN 16 =A0 =A0 =A0 =A0;; pad to next paragraph boundry, for readability=
..
>
> ;; -=3D eof =3D-
>
> The makefile contents or the command string to nasm:
>
> -f bin
> -l strpick.LST
> -o strpick.BIN
> strpick.NSM
>
> strpick.bin is the file I load and run, with a debugger for msdos in
> XP's cmd.exe.
>
> Do me a favor and see if this will run for you in VMWare.
>
> The output should be:
>
> This is my string.
> This is my string.
>
> [awaits for you to press a key, so you can see the above output.]
>
> Steve
>
> > I attempted to load bx with the effective address of a string, but I
> > get an illegal address error when attempting to use that address!
>
>

Hi Steve,

Big headache here. Why is it when I write your code replacing bx with
dx, I get an error message: Invalid Effective Address

?

When I do the following;

mov dx, Values
mov al, [dx]

The nasm assembler really hates this code. Why?
0
Reply catcalls 12/16/2010 2:34:48 AM

catcalls wrote:
> On Dec 15, 5:29 pm, s_dubrov...@nospicedham.yahoo.com wrote:
>> On Dec 14, 5:25 am, catcalls <obrzut.a...@nospicedham.gmail.com>
>> wrote:
>>
>> [snip]
>>
>>
>>
>>> One question;
>>> How does one get a char out of a string in 16-bit mode?
>> NASM example:
>>
>> ;; File: strpick.nsm
>> ;; string pick, get a char from a string.
>> ;; 16 bit print string example using RomBios.
>>
>> [SECTION .text]
>>
>> Start:
>>         mov  bx, MyStr  ;; address of string
>>         LEA  bx, [MyStr]        ;; address of string, alternate method
>>
>> myloop:
>>         mov  al, [bx]   ;; get character from there
>>         inc  bx         ;; increase ptr by char size.
>>                         ;;  points to next char for looping.
>>         cmp  al, 0      ;; test for end of string.
>>         je   Method2    ;; done, do next method.
>>
>>         push bx         ;; save, protect, string pointer
>>         call RomBiosPrt ;; print character
>>         pop  bx
>>         jmp  myloop     ;; fetch next character from string
>>
>> Method2:        ;; * use string instructions. *
>>         cld             ;; affirm index increments.
>>         mov  SI, MyStr  ;; address of string
>>         LEA  SI, [MyStr]        ;; address of string, alternate method
>>
>> myloop2:
>>         LODSB           ;; [SI++] -> AL
>>         cmp  al, 0      ;; test for end of string.
>>         je   Quits      ;; done.
>>
>>         call RomBiosPrt ;; print character
>>         jmp  myloop2    ;; fetch next character from string
>>
>> Quits:
>>         mov  ax, 0      ;; pause for keypress
>>         int  16h
>>         int  19h                ;; reboot
>>
>> RomBiosPrt:     ;; pass chr in AL, BX is overwritten with 0
>>         mov  ah, 0eh
>>         mov  bx, 0
>>
>>         int  10h
>>         RET
>>
>> ALIGN 16        ;; easier to find in a debugger at a paragraph address.
>> [SECTION .data]
>>
>> MyStr: db 'This is my string.',0Dh,0Ah,0
>>
>> ALIGN 16        ;; pad to next paragraph boundry, for readability.
>>
>> ;; -= eof =-
>>
>> The makefile contents or the command string to nasm:
>>
>> -f bin
>> -l strpick.LST
>> -o strpick.BIN
>> strpick.NSM
>>
>> strpick.bin is the file I load and run, with a debugger for msdos in
>> XP's cmd.exe.
>>
>> Do me a favor and see if this will run for you in VMWare.
>>
>> The output should be:
>>
>> This is my string.
>> This is my string.
>>
>> [awaits for you to press a key, so you can see the above output.]
>>
>> Steve
>>
>>> I attempted to load bx with the effective address of a string, but I
>>> get an illegal address error when attempting to use that address!
>>
> 
> Hi Steve,
> 
> Big headache here. Why is it when I write your code replacing bx with
> dx, I get an error message: Invalid Effective Address
> 
> ?
> 
> When I do the following;
> 
> mov dx, Values
> mov al, [dx]
> 
> The nasm assembler really hates this code. Why?

Using 16-bit instructions, your "base" registers are bx and bp, your 
"index" registers are si and di... and that's it! An Effective Address 
consists of an offset, a base register, and an index register... all 
optional. Using 32-bit instructions, any register can be a "base", and 
any but esp can be an "index", and you can have a "scale" of 2, 4, or 8 
applied to the "index". 16-bit addressing is a real PITA.

However... you can use 32-bit instructions in 16-bit code(!). "mov al, 
[edx]" will assemble (Nasm will add an "address size override prefix", 
67h), and will "probably" work. The upper 16 bits of edx have to be 
clear! If the proposed address exceeds the "limit" for the segment - 
usually 0FFFFh in 16-bit modes - it causes a "segment overrun 
exception", crashing your program. To be safe...

xor edx, edx
mov dx, Values
mov al, [edx]

Nasm should like that, and hopefully your CPU will too!

Best,
Frank
0
Reply Frank 12/16/2010 3:38:01 AM

On Dec 15, 9:38=A0pm, Frank Kotler
<fbkot...@nospicedham.myfairpoint.net> wrote:
> catcalls wrote:
> > On Dec 15, 5:29 pm, s_dubrov...@nospicedham.yahoo.com wrote:
> >> On Dec 14, 5:25 am, catcalls <obrzut.a...@nospicedham.gmail.com>
> >> wrote:
>

> >> [snip]

> > Hi Steve,
>
> > Big headache here. Why is it when I write your code replacing bx with
> > dx, I get an error message: Invalid Effective Address
>
> > ?
>
> > When I do the following;
>
> > mov dx, Values
> > mov al, [dx]
>
> > The nasm assembler really hates this code. Why?
>
> Using 16-bit instructions, your "base" registers are bx and bp, your
> "index" registers are si and di... and that's it! An Effective Address
> consists of an offset, a base register, and an index register... all
> optional. Using 32-bit instructions, any register can be a "base", and
> any but esp can be an "index", and you can have a "scale" of 2, 4, or 8
> applied to the "index". 16-bit addressing is a real PITA.
>
> However... you can use 32-bit instructions in 16-bit code(!). "mov al,
> [edx]" will assemble (Nasm will add an "address size override prefix",
> 67h), and will "probably" work. The upper 16 bits of edx have to be
> clear! If the proposed address exceeds the "limit" for the segment -
> usually 0FFFFh in 16-bit modes - it causes a "segment overrun
> exception", crashing your program. To be safe...
>
> xor edx, edx
> mov dx, Values
> mov al, [edx]
>
> Nasm should like that, and hopefully your CPU will too!
>
> Best,
> Frank- Hide quoted text -
>
> - Show quoted text -

Gee Frank, now he know the truth, the general purpose registers aren't
general purpose! -in regards to BX as the Base Register in Effective
Address calculations.

The question made me wonder if the directive CPU P6, or some such,
might make a difference, but it doesn't.  Your 32-bit instruction
variation is a nice finesse.

I guess we should also point out that the default segment for [BX] is
DS whereas the default segment for [BP] is SS.

Steve
0
Reply s_dubrovich 12/16/2010 4:34:12 AM

On Dec 16, 3:38=A0am, Frank Kotler
<fbkot...@nospicedham.myfairpoint.net> wrote:
> catcalls wrote:
> > On Dec 15, 5:29 pm, s_dubrov...@nospicedham.yahoo.com wrote:
> >> On Dec 14, 5:25 am, catcalls <obrzut.a...@nospicedham.gmail.com>
> >> wrote:
>
> >> [snip]
>
> >>> One question;
> >>> How does one get a char out of a string in 16-bit mode?
> >> NASM example:
>
> >> ;; File: strpick.nsm
> >> ;; string pick, get a char from a string.
> >> ;; 16 bit print string example using RomBios.
>
> >> [SECTION .text]
>
> >> Start:
> >> =A0 =A0 =A0 =A0 mov =A0bx, MyStr =A0;; address of string
> >> =A0 =A0 =A0 =A0 LEA =A0bx, [MyStr] =A0 =A0 =A0 =A0;; address of string=
, alternate method
>
> >> myloop:
> >> =A0 =A0 =A0 =A0 mov =A0al, [bx] =A0 ;; get character from there
> >> =A0 =A0 =A0 =A0 inc =A0bx =A0 =A0 =A0 =A0 ;; increase ptr by char size=
..
> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; =A0points to next c=
har for looping.
> >> =A0 =A0 =A0 =A0 cmp =A0al, 0 =A0 =A0 =A0;; test for end of string.
> >> =A0 =A0 =A0 =A0 je =A0 Method2 =A0 =A0;; done, do next method.
>
> >> =A0 =A0 =A0 =A0 push bx =A0 =A0 =A0 =A0 ;; save, protect, string point=
er
> >> =A0 =A0 =A0 =A0 call RomBiosPrt ;; print character
> >> =A0 =A0 =A0 =A0 pop =A0bx
> >> =A0 =A0 =A0 =A0 jmp =A0myloop =A0 =A0 ;; fetch next character from str=
ing
>
> >> Method2: =A0 =A0 =A0 =A0;; * use string instructions. *
> >> =A0 =A0 =A0 =A0 cld =A0 =A0 =A0 =A0 =A0 =A0 ;; affirm index increments=
..
> >> =A0 =A0 =A0 =A0 mov =A0SI, MyStr =A0;; address of string
> >> =A0 =A0 =A0 =A0 LEA =A0SI, [MyStr] =A0 =A0 =A0 =A0;; address of string=
, alternate method
>
> >> myloop2:
> >> =A0 =A0 =A0 =A0 LODSB =A0 =A0 =A0 =A0 =A0 ;; [SI++] -> AL
> >> =A0 =A0 =A0 =A0 cmp =A0al, 0 =A0 =A0 =A0;; test for end of string.
> >> =A0 =A0 =A0 =A0 je =A0 Quits =A0 =A0 =A0;; done.
>
> >> =A0 =A0 =A0 =A0 call RomBiosPrt ;; print character
> >> =A0 =A0 =A0 =A0 jmp =A0myloop2 =A0 =A0;; fetch next character from str=
ing
>
> >> Quits:
> >> =A0 =A0 =A0 =A0 mov =A0ax, 0 =A0 =A0 =A0;; pause for keypress
> >> =A0 =A0 =A0 =A0 int =A016h
> >> =A0 =A0 =A0 =A0 int =A019h =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0;; reboot
>
> >> RomBiosPrt: =A0 =A0 ;; pass chr in AL, BX is overwritten with 0
> >> =A0 =A0 =A0 =A0 mov =A0ah, 0eh
> >> =A0 =A0 =A0 =A0 mov =A0bx, 0
>
> >> =A0 =A0 =A0 =A0 int =A010h
> >> =A0 =A0 =A0 =A0 RET
>
> >> ALIGN 16 =A0 =A0 =A0 =A0;; easier to find in a debugger at a paragraph=
 address.
> >> [SECTION .data]
>
> >> MyStr: db 'This is my string.',0Dh,0Ah,0
>
> >> ALIGN 16 =A0 =A0 =A0 =A0;; pad to next paragraph boundry, for readabil=
ity.
>
> >> ;; -=3D eof =3D-
>
> >> The makefile contents or the command string to nasm:
>
> >> -f bin
> >> -l strpick.LST
> >> -o strpick.BIN
> >> strpick.NSM
>
> >> strpick.bin is the file I load and run, with a debugger for msdos in
> >> XP's cmd.exe.
>
> >> Do me a favor and see if this will run for you in VMWare.
>
> >> The output should be:
>
> >> This is my string.
> >> This is my string.
>
> >> [awaits for you to press a key, so you can see the above output.]
>
> >> Steve
>
> >>> I attempted to load bx with the effective address of a string, but I
> >>> get an illegal address error when attempting to use that address!
>
> > Hi Steve,
>
> > Big headache here. Why is it when I write your code replacing bx with
> > dx, I get an error message: Invalid Effective Address
>
> > ?
>
> > When I do the following;
>
> > mov dx, Values
> > mov al, [dx]
>
> > The nasm assembler really hates this code. Why?
>
> Using 16-bit instructions, your "base" registers are bx and bp, your
> "index" registers are si and di... and that's it! An Effective Address
> consists of an offset, a base register, and an index register... all
> optional. Using 32-bit instructions, any register can be a "base", and
> any but esp can be an "index", and you can have a "scale" of 2, 4, or 8
> applied to the "index". 16-bit addressing is a real PITA.
>
> However... you can use 32-bit instructions in 16-bit code(!). "mov al,
> [edx]" will assemble (Nasm will add an "address size override prefix",
> 67h), and will "probably" work. The upper 16 bits of edx have to be
> clear! If the proposed address exceeds the "limit" for the segment -
> usually 0FFFFh in 16-bit modes - it causes a "segment overrun
> exception", crashing your program. To be safe...
>
> xor edx, edx
> mov dx, Values
> mov al, [edx]
>
> Nasm should like that, and hopefully your CPU will too!
>
> Best,
> Frank

Damn, how did people code during the 16-bit days!?

Kind regards,

catcalls
0
Reply catcalls 12/16/2010 4:38:41 AM

On 12/15/2010 08:38 PM, catcalls wrote:
> 
> Damn, how did people code during the 16-bit days!?
> 

Very carefully.  The 8-bit days were even worse; the 8080 for example,
only has a single register (pair) usable for addressing.

	-hpa
0
Reply H 12/16/2010 5:00:03 AM

On Dec 16, 4:34=A0am, s_dubrov...@nospicedham.yahoo.com wrote:
> On Dec 15, 9:38=A0pm, Frank Kotler
>
>
>
> <fbkot...@nospicedham.myfairpoint.net> wrote:
> > catcalls wrote:
> > > On Dec 15, 5:29 pm, s_dubrov...@nospicedham.yahoo.com wrote:
> > >> On Dec 14, 5:25 am, catcalls <obrzut.a...@nospicedham.gmail.com>
> > >> wrote:
>
> > >> [snip]
> > > Hi Steve,
>
> > > Big headache here. Why is it when I write your code replacing bx with
> > > dx, I get an error message: Invalid Effective Address
>
> > > ?
>
> > > When I do the following;
>
> > > mov dx, Values
> > > mov al, [dx]
>
> > > The nasm assembler really hates this code. Why?
>
> > Using 16-bit instructions, your "base" registers are bx and bp, your
> > "index" registers are si and di... and that's it! An Effective Address
> > consists of an offset, a base register, and an index register... all
> > optional. Using 32-bit instructions, any register can be a "base", and
> > any but esp can be an "index", and you can have a "scale" of 2, 4, or 8
> > applied to the "index". 16-bit addressing is a real PITA.
>
> > However... you can use 32-bit instructions in 16-bit code(!). "mov al,
> > [edx]" will assemble (Nasm will add an "address size override prefix",
> > 67h), and will "probably" work. The upper 16 bits of edx have to be
> > clear! If the proposed address exceeds the "limit" for the segment -
> > usually 0FFFFh in 16-bit modes - it causes a "segment overrun
> > exception", crashing your program. To be safe...
>
> > xor edx, edx
> > mov dx, Values
> > mov al, [edx]
>
> > Nasm should like that, and hopefully your CPU will too!
>
> > Best,
> > Frank- Hide quoted text -
>
> > - Show quoted text -
>
> Gee Frank, now he know the truth, the general purpose registers aren't
> general purpose! -in regards to BX as the Base Register in Effective
> Address calculations.
>
> The question made me wonder if the directive CPU P6, or some such,
> might make a difference, but it doesn't. =A0Your 32-bit instruction
> variation is a nice finesse.
>
> I guess we should also point out that the default segment for [BX] is
> DS whereas the default segment for [BP] is SS.
>
> Steve

I'm telling!
0
Reply catcalls 12/16/2010 5:30:55 AM

In article  <63dd36cc-fcdf-4ad3-a04b-9d787345ebb1@w17g2000yqh.googlegroups.com>
           obrzut.alex@nospicedham.gmail.com "catcalls" writes:
[..]
> 
> Damn, how did people code during the 16-bit days!?
> 
> Kind regards,
> 
> catcalls

With a well-thumbed copy of "The 8086 Book" (Rector & Alexy) sitting 
beside the keyboard!  But to be honest, those of us brought up on the 
8080/Z80 were used to the non-orthogonal instruction set and "special" 
registers for specific uses, so the 8086 came as little surprise.

The real PITA for me was the segmentation mechanism when one strayed 
out of "small" model but hey -- we managed somehow :)

Pete
-- 
   "We have not inherited the earth from our ancestors,
    we have borrowed it from our descendants."
0
Reply pete 12/16/2010 6:18:47 AM

On Wed, 15 Dec 2010 12:58:24 -0500, "Rod Pemberton"
<do_not_have@nospicedham.notreplytome.cmm> wrote:


>These arithmetic instructions do not update the CF:
>
>inc, dec, div, idiv, not

According to the original Intel "iAPX 86/88, 186/188
Programmer's Reference", content of CF (as well as AF, OF,
PF, SF, ZF) is undefined after DIV or IDIV.  Has this been
formally changed on later processors, or is it just the case
that they have not been found to change CF?

Best regards,


Bob Masta
 
              DAQARTA  v5.10
   Data AcQuisition And Real-Time Analysis
              www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
    Frequency Counter, FREE Signal Generator
           Pitch Track, Pitch-to-MIDI 
         DaqMusic - FREE MUSIC, Forever!
             (Some assembly required)
     Science (and fun!) with your sound card!
0
Reply N0Spam 12/16/2010 1:21:16 PM

On Dec 15, 2:23=A0am, "Nathan Baker"
<nathancba...@nospicedham.gmail.com> wrote:
> "NadCixelsyd" <nadcixel...@nospicedham.aol.com> wrote in message
>
> news:6ce3e244-c832-400a-985d-e907e9d84680@k11g2000vbf.googlegroups.com...
> ,---
> How can =A0ADD be abstracted? =A0It adds two numbers. =A0Period. =A0Isn't=
 it
> far faster to add two numbers rather with one instruction rather than
> sit in a loop? =A0Are you advocating nested loops to do multiplies?
> `---
>
> Just having a bit of Holiday fun. =A0;)
>
> Nathan.

That's a very immature thing to do, especially for a moderator.
0
Reply NadCixelsyd 12/16/2010 1:58:14 PM

Rod Pemberton wrote:
> It's important not to touch flags.  So, we should try to not touch flags.
>
> These arithmetic instructions do not update the CF:
>
> inc, dec, div, idiv, not

These instructions are interesting, but somewhat evil:

On all modern OoO cpus they cause an extra problem, in that the 
instruction must merge the flag results of the operation with the flags 
from the previous instruction.

This can be a pipeline problem, of the same type as

MOV EAX,0

vs

XOR EAX,EAX


Most x86 cores will recognize the second operation as being equivalent 
to the first, breaking any dependency of previous operations with EAX as 
the target, but there have been cpus where the first form could be faster.

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 12/16/2010 2:12:22 PM

"catcalls" asked Steve,
....
|Big headache here. Why is it when I write your code replacing bx with
|dx, I get an error message: Invalid Effective Address
|?
|When I do the following;
|mov dx, Values
|mov al, [dx]

|The nasm assembler really hates this code. Why?

It's not NASM's fault, the CPU just can't do it :)
16-bit addressing is limited to (in opcode-bits order):

defaultseg; mod_00  ;mod_01     ;mod_10       ;mod_11
r/m
000  ds:   [bx+si]  ;[bx+si+d8] ;[bx+si+d16]  ;AL|AX
001  ds:   [bx+di]  ;and so on..              ;CL|CX
010  ss:   [bp+si]  ;                         ;DL|DX
011  ss:   [bp+di]  ;                         ;BL|BX
100  ds:   [si]     ;                         ;AH|SP
101  ds:   [di]     ;                         ;CH|BP
110  ds:   [mem16]  ;ss:[bp+d8] ;ss:[bp+d16]  ;DH|SI
111  ds:   [bx]     ;ds:[bx+d8] ;ds:[bx+d16]  ;BH|DI

segment override prefix work for all of them.

But CPUs >i286 can use prefix 67h to use 32-bit addressing
while still in 16-bit code. And NASM is aware of this.

32-bit addressing is quite different to the above, Intel-
manuals show all possible variants on a single page.

__
wolfgang


0
Reply wolfgang 12/16/2010 4:18:41 PM

On 12/16/2010 05:38 AM, catcalls wrote:
> Damn, how did people code during the 16-bit days!?

As others have pointed out, the 8-bit days were even worse. I learned to 
program on the 6502, which gave you one single 8-bit accumulator and two 
8-bit index registers, which provided non-orthogonal addressing modes. 
The stack held 256 bytes and wrapped.

The PDP-8 was also an interesting machine to program, with no stack, one 
accumulator, and rather inconvenient memory addressing.


Bjarni
-- 

                        INFORMATION WANTS TO BE FREE
0
Reply Bjarni 12/16/2010 6:46:58 PM

;Binary Trees > Nodes
; Value
; Greater Than Node
; Less Than Node
START:
	mov eax, [value]
	mov edx, [value+4]
	cmp eax, edx
	jae LOWER
	jmp DONE
LOWER:
	push edx
	pop eax
	jmp DONE
	nop
DONE:
	mov ebx, 10
	call writeInt
hlt

This is some code I wrote. But, the annoying thing, is, that when I
take that nop out, I get the recursive call of writeInt. I've tried
modifying writeInt not to use ECX and LOOP, to DEC ECX and CMP ECX, 0
and finally a JNE etc...

But, my question is; Why is that nop stopping recursive calling of the
WriteInt function!?

Weird if you ask me.

Btw, I did not require the need to xor anything, as I replaced all the
xor's with NOP.
0
Reply catcalls 12/17/2010 1:08:29 PM

On Dec 17, 1:08=A0pm, catcalls <obrzut.a...@nospicedham.gmail.com>
wrote:
> ;Binary Trees > Nodes
> ; Value
> ; Greater Than Node
> ; Less Than Node
> START:
> =A0 =A0 =A0 =A0 mov eax, [value]
> =A0 =A0 =A0 =A0 mov edx, [value+4]
> =A0 =A0 =A0 =A0 cmp eax, edx
> =A0 =A0 =A0 =A0 jae LOWER
> =A0 =A0 =A0 =A0 jmp DONE
> LOWER:
> =A0 =A0 =A0 =A0 push edx
> =A0 =A0 =A0 =A0 pop eax
> =A0 =A0 =A0 =A0 jmp DONE
> =A0 =A0 =A0 =A0 nop
> DONE:
> =A0 =A0 =A0 =A0 mov ebx, 10
> =A0 =A0 =A0 =A0 call writeInt
> hlt
>
> This is some code I wrote. But, the annoying thing, is, that when I
> take that nop out, I get the recursive call of writeInt. I've tried
> modifying writeInt not to use ECX and LOOP, to DEC ECX and CMP ECX, 0
> and finally a JNE etc...
>
> But, my question is; Why is that nop stopping recursive calling of the
> WriteInt function!?
>
> Weird if you ask me.
>
> Btw, I did not require the need to xor anything, as I replaced all the
> xor's with NOP.

That said, it appears when there is an 'odd' number of instructions, I
get the recursive call. If you can see, there are 12 instructions in
this code, including the nop...
0
Reply catcalls 12/17/2010 4:23:44 PM

catcalls wrote:
> On Dec 17, 1:08 pm, catcalls <obrzut.a...@nospicedham.gmail.com>
> wrote:
>> ;Binary Trees > Nodes
>> ; Value
>> ; Greater Than Node
>> ; Less Than Node
>> START:
>>         mov eax, [value]
>>         mov edx, [value+4]
>>         cmp eax, edx
>>         jae LOWER
>>         jmp DONE
>> LOWER:
>>         push edx
>>         pop eax

"mov eax, edx" would be faster(?).

>>         jmp DONE
>>         nop
>> DONE:
>>         mov ebx, 10
>>         call writeInt
>> hlt
>>
>> This is some code I wrote. But, the annoying thing, is, that when I
>> take that nop out, I get the recursive call of writeInt. I've tried
>> modifying writeInt not to use ECX and LOOP, to DEC ECX and CMP ECX, 0
>> and finally a JNE etc...
>>
>> But, my question is; Why is that nop stopping recursive calling of the
>> WriteInt function!?
>>
>> Weird if you ask me.
>>
>> Btw, I did not require the need to xor anything, as I replaced all the
>> xor's with NOP.
> 
> That said, it appears when there is an 'odd' number of instructions, I
> get the recursive call. If you can see, there are 12 instructions in
> this code, including the nop...

"hlt" only stops the CPU until an interrupt occurs.

cli
hlt

might help... although I suspect this isn't the entire code...

This, and the original problem in this thread, lead me to suspect that 
there may be errors in the code you're not showing...

Best,
Frank

0
Reply Frank 12/17/2010 5:11:32 PM

"catcalls" wrote:
> ;Binary Trees > Nodes
> ; Value
> ; Greater Than Node
> ; Less Than Node
> START:
> mov eax, [value]
> mov edx, [value+4]
> cmp eax, edx
> jae LOWER
> jmp DONE
> LOWER:

have you tried to replace such two jumps by one ?  jb DONE

> push edx
> pop eax

Why not 'mov eax,edx' ?  (faster and also only two bytes)

> jmp DONE
> nop

with or without the NOP, what's the reason for this jump here ?

> DONE:
> mov ebx, 10
> call writeInt
> hlt

> This is some code I wrote. But, the annoying thing, is, that when I
> take that nop out, I get the recursive call of writeInt. I've tried
> modifying writeInt not to use ECX and LOOP, to DEC ECX and CMP ECX, 0
> and finally a JNE etc...

> But, my question is; Why is that nop stopping recursive calling of the
> WriteInt function!?

I can't see NOP as guilty here at all.
Your not shown writeInt is probably located right after the HLT
and is executed a second time after any IRQ ie: a timer-tick.
And the second executed return in writeInt will put your code-flow
somewhere ...
Does your writeInt also contain a label named DONE ?
It could also be just a wrong stack treatment (like pushed arguments).

> Weird if you ask me.
> Btw, I did not require the need to xor anything, as I replaced all the
> xor's with NOP.

|That said, it appears when there is an 'odd' number of instructions, I
|get the recursive call. If you can see, there are 12 instructions in
|this code, including the nop...

I think the described behaviour got nothing to do with the
posted snippet except for the missing end.

END:
 ;cli        ;uncomment it if you like a 'dead end'
             ;sti may allow hotkeys ie: alt-ctrl-DEL.
 hlt
 jmp END

__
wolfgang


0
Reply wolfgang 12/17/2010 6:15:21 PM

On Dec 17, 6:15=A0pm, "wolfgang kern" <nowh...@never.at> wrote:
> "catcalls" wrote:
> > ;Binary Trees > Nodes
> > ; Value
> > ; Greater Than Node
> > ; Less Than Node
> > START:
> > mov eax, [value]
> > mov edx, [value+4]
> > cmp eax, edx
> > jae LOWER
> > jmp DONE
> > LOWER:
>
> have you tried to replace such two jumps by one ? =A0jb DONE
>
> > push edx
> > pop eax
>
> Why not 'mov eax,edx' ? =A0(faster and also only two bytes)
>
> > jmp DONE
> > nop
>
> with or without the NOP, what's the reason for this jump here ?
>
> > DONE:
> > mov ebx, 10
> > call writeInt
> > hlt
> > This is some code I wrote. But, the annoying thing, is, that when I
> > take that nop out, I get the recursive call of writeInt. I've tried
> > modifying writeInt not to use ECX and LOOP, to DEC ECX and CMP ECX, 0
> > and finally a JNE etc...
> > But, my question is; Why is that nop stopping recursive calling of the
> > WriteInt function!?
>
> I can't see NOP as guilty here at all.
> Your not shown writeInt is probably located right after the HLT
> and is executed a second time after any IRQ ie: a timer-tick.
> And the second executed return in writeInt will put your code-flow
> somewhere ...
> Does your writeInt also contain a label named DONE ?
> It could also be just a wrong stack treatment (like pushed arguments).
>
> > Weird if you ask me.
> > Btw, I did not require the need to xor anything, as I replaced all the
> > xor's with NOP.
>
> |That said, it appears when there is an 'odd' number of instructions, I
> |get the recursive call. If you can see, there are 12 instructions in
> |this code, including the nop...
>
> I think the described behaviour got nothing to do with the
> posted snippet except for the missing end.
>
> END:
> =A0;cli =A0 =A0 =A0 =A0;uncomment it if you like a 'dead end'
> =A0 =A0 =A0 =A0 =A0 =A0 =A0;sti may allow hotkeys ie: alt-ctrl-DEL.
> =A0hlt
> =A0jmp END
>
> __
> wolfgang

Taken your advice...

     95 ;Binary Trees > Nodes
     96 ; Value
     97 ; Greater Than Node
     98 ; Less Than Node
     99 START:
    100         mov eax, [value]
    101         mov edx, [value+4]
    102         cmp eax, edx
    103         jae DONE
    104 LOWER:
    105         push edx
    106         pop eax
    107 DONE:
    108         mov ebx, 10
    109         call writeInt
    110
    111 END:
    112 cli        ;uncomment it if you like a 'dead end'
    113              ;sti may allow hotkeys ie: alt-ctrl-DEL.
    114 hlt
    115 jmp END


This works! Well done, kind sir. For taking the time to correct my
obvious mistakes.

Sometimes, it takes a professional coder like yourself to help out the
ASM Noobs. I used to help people with gardening tips, and I know the
rewards are good for helping others.

Thank you,

Catcalls
0
Reply catcalls 12/18/2010 4:58:09 AM

catcalls replied:
....
>> push edx
>> pop eax
> Why not 'mov eax,edx' ? (faster and also only two bytes)
....
<quote_____________>
Taken your advice...

     95 ;Binary Trees > Nodes
     96 ; Value
     97 ; Greater Than Node
     98 ; Less Than Node
     99 START:
    100         mov eax, [value]
    101         mov edx, [value+4]
    102         cmp eax, edx
    103         jae DONE
    104 LOWER:
    105         push edx
    106         pop eax
    107 DONE:
    108         mov ebx, 10
    109         call writeInt
    110
    111 END:
    112 cli        ;uncomment it if you like a 'dead end'
    113              ;sti may allow hotkeys ie: alt-ctrl-DEL.
    114 hlt
    115 jmp END


This works! Well done, kind sir. For taking the time to correct my
obvious mistakes.

Sometimes, it takes a professional coder like yourself to help out the
ASM Noobs. I used to help people with gardening tips, and I know the
rewards are good for helping others.

Thank you,
<__________/quote>

glad to be of some help, but please dont call me sir :)
If you like this version to act the same way as your previous
attempt then you should replace JAE (>=; NC or Z) with JB (<; C).

__
wolfgang


0
Reply wolfgang 12/18/2010 3:49:14 PM

catcalls wrote:

<snip>
>       95 ;Binary Trees>  Nodes
>       96 ; Value
>       97 ; Greater Than Node
>       98 ; Less Than Node
>       99 START:
>      100         mov eax, [value]
>      101         mov edx, [value+4]
>      102         cmp eax, edx
>      103         jae DONE
>      104 LOWER:
>      105         push edx
>      106         pop eax
>      107 DONE:
>      108         mov ebx, 10
>      109         call writeInt
>      110
>      111 END:
>      112 cli        ;uncomment it if you like a 'dead end'
>      113              ;sti may allow hotkeys ie: alt-ctrl-DEL.
>      114 hlt
>      115 jmp END


START:
mov eax,[value]
mov edx,[value + 4]
cmp eax,edx
cmovae eax,edx

END:
cli
hlt
jmp END


saves the conditional jump completely.


Greetings from Augsburg

Bernhard Schornak
0
Reply Bernhard 12/18/2010 7:20:28 PM

On Dec 18, 7:20=A0pm, Bernhard Schornak <schor...@nospicedham.web.de>
wrote:
> cmovae eax,edx
>
> saves the conditional jump completely.

But previously the code was 386-compatible; with that change it will
only run on a Pentium or later (I can't remember exactly when the
conditional move instructions first appeared).

Linus Torvalds argues here that conditional moves are rarely a good
idea:

 http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus

Richard.
http://www.rtrussell.co.uk/
0
Reply Richard 12/18/2010 10:33:13 PM

Bernhard Schornak wrote:
> catcalls wrote:
> 
> <snip>
>>       95 ;Binary Trees>  Nodes
>>       96 ; Value
>>       97 ; Greater Than Node
>>       98 ; Less Than Node
>>       99 START:
>>      100         mov eax, [value]
>>      101         mov edx, [value+4]
>>      102         cmp eax, edx
>>      103         jae DONE
>>      104 LOWER:
>>      105         push edx
>>      106         pop eax
>>      107 DONE:
>>      108         mov ebx, 10
>>      109         call writeInt
>>      110
>>      111 END:
>>      112 cli        ;uncomment it if you like a 'dead end'
>>      113              ;sti may allow hotkeys ie: alt-ctrl-DEL.
>>      114 hlt
>>      115 jmp END
> 
> 
> START:
> mov eax,[value]
> mov edx,[value + 4]
> cmp eax,edx
> cmovae eax,edx
> 
> END:
> cli
> hlt
> jmp END
> 
> 
> saves the conditional jump completely.

Before we optimize this too much more... In the binary trees I'm 
familiar with, I would expect a node to be:

dd Value
dd Pointer to greater-than node
dd Pointer to less-than node

If we have:

dd Value
dd value from greater-than node
....

Then the above code makes sense. If it is as I "expect", we want to do 
something like:

mov eax, [Value]
mov edx, [Value + 4]  ; we can give "4" a name, if we wish...
cmp eax, [edx] ; "dereference the pointer"?
....

Just a thought...

Greetings from Raymond,
Frank

0
Reply Frank 12/18/2010 11:21:19 PM

On Dec 19, 7:21=A0am, Frank Kotler
<fbkot...@nospicedham.myfairpoint.net> wrote:
> Bernhard Schornak wrote:
> > catcalls wrote:
>
> > <snip>
> >> =A0 =A0 =A0 95 ;Binary Trees> =A0Nodes
> >> =A0 =A0 =A0 96 ; Value
> >> =A0 =A0 =A0 97 ; Greater Than Node
> >> =A0 =A0 =A0 98 ; Less Than Node
> >> =A0 =A0 =A0 99 START:
> >> =A0 =A0 =A0100 =A0 =A0 =A0 =A0 mov eax, [value]
> >> =A0 =A0 =A0101 =A0 =A0 =A0 =A0 mov edx, [value+4]
> >> =A0 =A0 =A0102 =A0 =A0 =A0 =A0 cmp eax, edx
> >> =A0 =A0 =A0103 =A0 =A0 =A0 =A0 jae DONE
> >> =A0 =A0 =A0104 LOWER:
> >> =A0 =A0 =A0105 =A0 =A0 =A0 =A0 push edx
> >> =A0 =A0 =A0106 =A0 =A0 =A0 =A0 pop eax
> >> =A0 =A0 =A0107 DONE:
> >> =A0 =A0 =A0108 =A0 =A0 =A0 =A0 mov ebx, 10
> >> =A0 =A0 =A0109 =A0 =A0 =A0 =A0 call writeInt
> >> =A0 =A0 =A0110
> >> =A0 =A0 =A0111 END:
> >> =A0 =A0 =A0112 cli =A0 =A0 =A0 =A0;uncomment it if you like a 'dead en=
d'
> >> =A0 =A0 =A0113 =A0 =A0 =A0 =A0 =A0 =A0 =A0;sti may allow hotkeys ie: a=
lt-ctrl-DEL.
> >> =A0 =A0 =A0114 hlt
> >> =A0 =A0 =A0115 jmp END
>
> > START:
> > mov eax,[value]
> > mov edx,[value + 4]
> > cmp eax,edx
> > cmovae eax,edx
>
> > END:
> > cli
> > hlt
> > jmp END
>
> > saves the conditional jump completely.
>
> Before we optimize this too much more... In the binary trees I'm
> familiar with, I would expect a node to be:
>
> dd Value
> dd Pointer to greater-than node
> dd Pointer to less-than node
>
> If we have:
>
> dd Value
> dd value from greater-than node
> ...
>
> Then the above code makes sense. If it is as I "expect", we want to do
> something like:
>
> mov eax, [Value]
> mov edx, [Value + 4] =A0; we can give "4" a name, if we wish...
> cmp eax, [edx] ; "dereference the pointer"?
> ...
>
> Just a thought...
>
> Greetings from Raymond,
> Frank


Before going too deep in computation optimization, as modern computer
is too fast while most of the time the CPU is starved for data
I suspect that a memory overhead of 2 pointers per each dd is a bad
idea.

I would have the original value /16 (or 256, or anything optimal for
practical data)
in each btree node we store an array or sorted list of that 16 values
(and in some case you may not need to sort them)

It may take a few more instruction count but it should be more memory/
cache friendly
0
Reply Bluemoon 12/19/2010 7:20:07 AM

Richard Russell wrote:
> On Dec 18, 7:20 pm, Bernhard Schornak<schor...@nospicedham.web.de>
> wrote:
>> cmovae eax,edx
>>
>> saves the conditional jump completely.
>
> But previously the code was 386-compatible; with that change it will
> only run on a Pentium or later (I can't remember exactly when the
> conditional move instructions first appeared).
>
> Linus Torvalds argues here that conditional moves are rarely a good
> idea:
>
>   http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus

That thread is from 2007, and mainly concerns itself with the P4, which 
had many ugly stumbling blocks, not just slow mispredicts (due to 
excessively long pipeline) and many microcoded instructions.

Linus does mention the key issue though:

CMOVcc is _only_ a win if/when the branch is really very close to 
unpredictable, or when you have so many alternate code paths that the 
Jcc's would trash the branch predictor table(s).

What this means is that it is close to impossible to measure the Jcc vs 
CMOVcc difference in isolation, i.e. microbenchmarks don't really work 
here. :-(

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 12/19/2010 10:22:17 AM

Frank Kotler wrote:


> Bernhard Schornak wrote:
>> catcalls wrote:
>>
>> <snip>
>>> 95 ;Binary Trees> Nodes
>>> 96 ; Value
>>> 97 ; Greater Than Node
>>> 98 ; Less Than Node
>>> 99 START:
>>> 100 mov eax, [value]
>>> 101 mov edx, [value+4]
>>> 102 cmp eax, edx
>>> 103 jae DONE
>>> 104 LOWER:
>>> 105 push edx
>>> 106 pop eax
>>> 107 DONE:
>>> 108 mov ebx, 10
>>> 109 call writeInt
>>> 110
>>> 111 END:
>>> 112 cli ;uncomment it if you like a 'dead end'
>>> 113 ;sti may allow hotkeys ie: alt-ctrl-DEL.
>>> 114 hlt
>>> 115 jmp END
>>
>>
>> START:
>> mov eax,[value]
>> mov edx,[value + 4]
>> cmp eax,edx
>> cmovae eax,edx
>>
>> END:
>> cli
>> hlt
>> jmp END
>>
>>
>> saves the conditional jump completely.
>
> Before we optimize this too much more... In the binary trees I'm
> familiar with, I would expect a node to be:
>
> dd Value
> dd Pointer to greater-than node
> dd Pointer to less-than node
>
> If we have:
>
> dd Value
> dd value from greater-than node
> ...
>
> Then the above code makes sense. If it is as I "expect", we want to do
> something like:
>
> mov eax, [Value]
> mov edx, [Value + 4] ; we can give "4" a name, if we wish...
> cmp eax, [edx] ; "dereference the pointer"?
> ...
>
> Just a thought...


Actually, I just stumbled upon that 'branch, PUSH/POP,
go on with something else' sequence - which shouts for
some code reduction. ;)


If I got it right, your array looks like this:

00[value] = data
04[value] = address greater_than_node
08[value] = address less_than_node

Might be a good idea to preload both addresses, before
we start any comparison:

....
mov eax,[value]
mov ebx,[value + 4]
mov ecx,[value + 8]
cmp eax,[ebx + index * 4]
....

The comparison is one clock slower, but saves a tempo-
rary register to store the current value (saving three
clocks to load those data into EDX). I don't know what
has to be put to which array, so I leave the remaining
part open for now...


Greetings from Augsburg

Bernhard
0
Reply Bernhard 12/19/2010 10:33:23 AM

Richard Russell wrote:


> Bernhard Schornak  wrote:
>
>> cmovae eax,edx
>>
>> saves the conditional jump completely.
>
> But previously the code was 386-compatible; with that change it will
> only run on a Pentium or later (I can't remember exactly when the
> conditional move instructions first appeared).
>
> Linus Torvalds argues here that conditional moves are rarely a good
> idea:
>
>   http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus


Backwards compatibility is an issue, of course,
and my suggestion definitely does not work with
machines below the '686' (PPro, K6-2, C3).

Linus Thorwald's arguments were valid for older
machines, but, meanwhile, CMOVcc is a one clock
instruction for register to register moves. AMD
and iNTEL explicitely recommend usage of CMOVcc
as replacement for short branches like shown in
the posted code.

The bottleneck is that any branch mechanism de-
pends on loading some data into EAX and EDX. On
a Phenom, it takes about four clocks until both
are present. Meanwhile, the other pipes of this
core are idling. Nothing can be executed out of
order until both registers are loaded. With the
original code, there's a 50 percent chance that
the branch is mispredicted, and pre-loaded code
has to be flushed (10 clocks++), while a condi-
tional move always is done within one clock (it
does not matter if the move is done or not) - a
CMOVcc wins any race against PUSH/POP sequences
and performs faster if EDX holds random values,
triggering a lot of mispredictions. If you know
what EDX probably contains (and set your branch
to the most likely target), your code might win
the race, though.

Branch prediction mechanism of recent machines:

1st prediction = not taken.

2nd prediction = taken.

3rd and up prediction use the branch prediction
history for their 'guesses'. You always trigger
at least one misprediction if the piece of code
is executed more than once. That's why MOVcc or
jump tables are used.

What I didn't mention, yet: Branch targets must
(or: should) be aligned to the beginning of the
next instruction cache line. Padding the super-
fluous space between those two labels with NOPs
adds _some_ bloat to your code. This is not the
case if you use CMOVcc...


Greetings from Augsburg

Bernhard Schornak
0
Reply Bernhard 12/19/2010 10:35:33 AM

Terje Mathisen mentioned:
> Richard Russell wrote:
>> On Dec 18, 7:20 pm, Bernhard Schornak<schor...@nospicedham.web.de>
>> wrote:
>>> cmovae eax,edx
>>>
>>> saves the conditional jump completely.
>>
>> But previously the code was 386-compatible; with that change it will
>> only run on a Pentium or later (I can't remember exactly when the
>> conditional move instructions first appeared).
>>
>> Linus Torvalds argues here that conditional moves are rarely a good
>> idea:
>>
>>   http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus
>
> That thread is from 2007, and mainly concerns itself with the P4, which 
> had many ugly stumbling blocks, not just slow mispredicts (due to 
> excessively long pipeline) and many microcoded instructions.
>
> Linus does mention the key issue though:
>
> CMOVcc is _only_ a win if/when the branch is really very close to 
> unpredictable, or when you have so many alternate code paths that the 
> Jcc's would trash the branch predictor table(s).

This may only apply to P4 and a few other Intels from this era.
My first 486 (bought 1992) was an AMD and it already had faster
than Intel Shift-instructions. Fast CMOV entered my world with
AMD K7 (got it 1998). Both already Rest In Pieces now.

> What this means is that it is close to impossible to measure the Jcc vs 
> CMOVcc difference in isolation, i.e. microbenchmarks don't really work 
> here. :-(

Yeah, it heavy depends on the surrounding code, but after I replaced
jcc with cmov wherever possible, the overall speed-performance of my
code increased about 10% while single-stepping often dont show any
difference in timing.

wonder if there are still 386 in use ...
__
wolfgang


0
Reply wolfgang 12/19/2010 1:15:50 PM

"Bob Masta" asked:
> "Rod Pemberton" wrote:

>>These arithmetic instructions do not update the CF:
>>inc, dec, div, idiv, not

> According to the original Intel "iAPX 86/88, 186/188
> Programmer's Reference", content of CF (as well as AF, OF,
> PF, SF, ZF) is undefined after DIV or IDIV.  Has this been
> formally changed on later processors, or is it just the case
> that they have not been found to change CF?

No, DIV and IDIV are still reported to leave O,S,Z,A,P and C
undefined after execution. I didn't check on my newest AMDs,
K7 and K8 showed at least some repeatable flag-behaviour for
certain values, but I couldn't see any usefull sense in it.

__
wolfgang


0
Reply wolfgang 12/19/2010 3:35:34 PM

wolfgang kern wrote:
> "Bob Masta" asked:
>> "Rod Pemberton" wrote:
>
>>> These arithmetic instructions do not update the CF:
>>> inc, dec, div, idiv, not
>
>> According to the original Intel "iAPX 86/88, 186/188
>> Programmer's Reference", content of CF (as well as AF, OF,
>> PF, SF, ZF) is undefined after DIV or IDIV.  Has this been
>> formally changed on later processors, or is it just the case
>> that they have not been found to change CF?
>
> No, DIV and IDIV are still reported to leave O,S,Z,A,P and C
> undefined after execution. I didn't check on my newest AMDs,
> K7 and K8 showed at least some repeatable flag-behaviour for
> certain values, but I couldn't see any usefull sense in it.

 From the vendor's viewpoint, having _undefined_ result flags is much to 
be preferred to having them defined to be partially updated.

The INC/DEC update everything except Carry setup introduces a partial 
flags register hazard, just like the PentiumPro got hammered by partial 
register updates:

; BL holds the previous state, BL+BH=BX used as lookup table index
   mov bh,[si]
   add dx,[bx]

This sort of code was fast on a 486 and pretty good on Pentium, then 
lost 10-20 cycles per iteration on a PPro.

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 12/19/2010 4:12:53 PM

"wolfgang kern" <nowhere@never.at> wrote in message
news:iel0ig$bic$1@newsreader2.utanet.at...
>
> wonder if there are still 386 in use ...
>

In use?

The 486 goes back about two decades now.  I'm sure you could find a 386 in
industry somewhere...

IIRC, the oldest cpu I have is a 486 DX2-50.  It's packed away somewhere and
not in use.  Oldest in use cpu would be a 233Mhz, and it's turned on maybe a
couple times a year.  What about you guys?

And, what is the oldest generation of x86 cpu you think should be supported?

I had some older cpu's but gave them to a friend who was "collecting" them
around '00.  He probably sold them on Ebay...


Rod Pemberton


0
Reply Rod 12/19/2010 4:57:12 PM

In article  <ieldck$vn4$1@speranza.aioe.org>
           do_not_have@nospicedham.notreplytome.cmm "Rod Pemberton" writes:

> "wolfgang kern" <nowhere@never.at> wrote in message
> news:iel0ig$bic$1@newsreader2.utanet.at...
> >
> > wonder if there are still 386 in use ...
> >
> 
> In use?
> 
> The 486 goes back about two decades now.  I'm sure you could find a 386 in
> industry somewhere...
> 
> IIRC, the oldest cpu I have is a 486 DX2-50.  It's packed away somewhere and
> not in use.  Oldest in use cpu would be a 233Mhz, and it's turned on maybe a
> couple times a year.  What about you guys?

This machine (the one I'm using to post this) is a 486 DX2/66 running 
DOS 6.22.  It's in daily use for news and email (using a KA9Q 
derivative) and has never been troubled by W32 malware ;-)  Or any 
other malware come to that, and is still going strong.

> And, what is the oldest generation of x86 cpu you think should be supported?

8088/86?  That's what the newsgroup exists for, and I'd hazard a guess 
that most of the regulars here cut their teeth on the 80[2]86, 
learning their optimisation "tricks" to get the maximum performance 
out of slow (by modern standards) processors.

Pete
-- 
   "We have not inherited the earth from our ancestors,
    we have borrowed it from our descendants."
0
Reply pete 12/19/2010 7:30:32 PM

pete answered to Rod Pemberton:

>> I wrote:
>>> wonder if there are still 386 in use ...
>> In use?

>> The 486 goes back about two decades now.  I'm sure you could find a
>> 386 in industry somewhere...

could be, even all my clients already upgraded to newer hardware.

>> IIRC, the oldest cpu I have is a 486 DX2-50.
>> It's packed away somewhere and not in use.
>> Oldest in use cpu would be a 233Mhz, and it's turned on maybe a
>> couple times a year.  What about you guys?

my oldies are gone to the recycler to once become new hardware :)

> This machine (the one I'm using to post this) is a 486 DX2/66 running
> DOS 6.22.  It's in daily use for news and email (using a KA9Q
> derivative) and has never been troubled by W32 malware ;-)  Or any
> other malware come to that, and is still going strong.

:) good olde hardware ... my favorite of OLD were DOS 6.00, it couldn't
 do something against me and my own way of using hardware ... :)

>> And, what is the oldest generation of x86 cpu you think should
>> be supported?

Methink 486 is old enough to be placed into the museum, but may be
worth to support because of its current population-count.
386 and before are just historical relicts in my view and I wont
waste my time to write code for it or even think about it.

> 8088/86?  That's what the newsgroup exists for, and I'd hazard a guess
> that most of the regulars here cut their teeth on the 80[2]86,
> learning their optimisation "tricks" to get the maximum performance
> out of slow (by modern standards) processors.

I'm afraid you missed some time already passed yet, 2011 is soon!
so I cannot imagine that someone still play with 86/88/186/286/386.
Optimisation for 286 is totally obsolete for 386 and so on ...
Even I count myself to the 'aged' programmers, the future is more
near to me and I eagerly wait for the release of AMDs BullDoze :)

__
wolfgang


0
Reply wolfgang 12/19/2010 8:39:17 PM

Terje Mathisen replied:
> wolfgang kern wrote:
>> Bob Masta asked:
>>> Rod Pemberton wrote:

>>>> These arithmetic instructions do not update the CF:
>>>> inc, dec, div, idiv, not

>>> According to the original Intel "iAPX 86/88, 186/188
>>> Programmer's Reference", content of CF (as well as AF, OF,
>>> PF, SF, ZF) is undefined after DIV or IDIV.  Has this been
>>> formally changed on later processors, or is it just the case
>>> that they have not been found to change CF?

>> No, DIV and IDIV are still reported to leave O,S,Z,A,P and C
>> undefined after execution. I didn't check on my newest AMDs,
>> K7 and K8 showed at least some repeatable flag-behaviour for
>> certain values, but I couldn't see any usefull sense in it.

> From the vendor's viewpoint, having _undefined_ result flags is much to be 
> preferred to having them defined to be partially updated.

perhaps just to have 'foolproof/safe' statements in the manuals ? :)

> The INC/DEC update everything except Carry setup introduces a partial 
> flags register hazard, just like the PentiumPro got hammered by partial 
> register updates:

> ; BL holds the previous state, BL+BH=BX used as lookup table index
>   mov bh,[si]
>   add dx,[bx]

> This sort of code was fast on a 486 and pretty good on Pentium, then lost 
> 10-20 cycles per iteration on a PPro.

So I think my decision for using only AMD-CPUs wasn't bad at all :)

Anyway also LOOP and friends became slow because of (vectored)
special flag-treatment by the use of the internal ecxZ-flag apart
from the common efl-Z.
By any fortune we don't encounter too many penalties with
'REP string'-codes which also use this internal ecxZ flag.

__
wolfgang


0
Reply wolfgang 12/19/2010 9:36:35 PM

On Dec 19, 1:15=A0pm, "wolfgang kern" <nowh...@never.at> wrote:
> Yeah, it heavy depends on the surrounding code, but after I replaced
> jcc with cmov wherever possible, the overall speed-performance of my
> code increased about 10%

On an AMD CPU, yes?  As with the issue of aligning jump targets on
multiple-of-32 addresses, this is a divergence between AMD and Intel.
If it's true that on a modern Intel CPU a Jcc will often beat a CMOVcc
(unless the jump is highly unpredictable) that leaves programmers who
care about speed with a dilemma.

My personal approach, and I know you won't agree with this, is to
optimise my code for Intel architectures and tell my users not to use
AMD if they want the best performance!  That avoids having to add
padding NOPs all over the place, which would be a major pain for me.

Richard.
http://www.rtrussell.co.uk/
0
Reply Richard 12/19/2010 11:41:38 PM

On Dec 19, 4:57=A0pm, "Rod Pemberton"
<do_not_h...@nospicedham.notreplytome.cmm> wrote:
> "wolfgang kern" <nowh...@never.at> wrote in message
> news:iel0ig$bic$1@newsreader2.utanet.at...

....

> > wonder if there are still 386 in use ...
>
> In use?
>
> The 486 goes back about two decades now. =A0I'm sure you could find a 386=
 in
> industry somewhere...
>
> IIRC, the oldest cpu I have is a 486 DX2-50. =A0It's packed away somewher=
e and
> not in use. =A0Oldest in use cpu would be a 233Mhz, and it's turned on ma=
ybe a
> couple times a year. =A0What about you guys?
>
> And, what is the oldest generation of x86 cpu you think should be support=
ed?

Maybe it depends on the intended application. In most cases a Pentium
3 or AMD K7, perhaps. (I'm not sure of the exact Intel/AMD
correspondence.)

What were the key feature changes? Something like

* 80386 - 32-bit protected mode
* Pentium 1 - time stamp counter, cmov (IIRC)
* Pentium 1 MMX - er, MMX
* many individual later instruction set additions
* 64-bit

Surprisingly, apart from older machines still in use there are still
some 80386 systems on sale! Try a search for PC/104 and 80386.

James
0
Reply James 12/20/2010 12:01:55 AM

On Dec 19, 11:41=A0pm, Richard Russell
<n...@nospicedham.rtrussell.co.uk> wrote:

....

> My personal approach, and I know you won't agree with this, is to
> optimise my code for Intel architectures and tell my users not to use
> AMD if they want the best performance! =A0That avoids having to add
> padding NOPs all over the place, which would be a major pain for me.

Yuck. You'll have people thinking AMD is intrinsically inferior.

James
0
Reply James 12/20/2010 12:06:46 AM

On Dec 19, 10:57=A0am, "Rod Pemberton"
<do_not_h...@nospicedham.notreplytome.cmm> wrote:
> "wolfgang kern" <nowh...@never.at> wrote in message
>
> news:iel0ig$bic$1@newsreader2.utanet.at...
>
>
>
> > wonder if there are still 386 in use ...
>
> In use?
>
> The 486 goes back about two decades now. =A0I'm sure you could find a 386=
 in
> industry somewhere...
>
> IIRC, the oldest cpu I have is a 486 DX2-50. =A0It's packed away somewher=
e and
> not in use. =A0Oldest in use cpu would be a 233Mhz, and it's turned on ma=
ybe a
> couple times a year. =A0What about you guys?
>

Bootable, useable..

12 mhz xt-clone, actually an st with a nec v-20 (80186 clone).  It has
2 5 1/4 DSDD drives, it gets fired up a couple of times a year.  I had
to swipe out its laptop ide hd this year :/ -we backup our hd data
often don't we? :(  I keep it 'cause it is such an eclectic specimen.
Dual monitor, vga and monachrome, ide hd with a whopping 8k buffer, io
off the 8k buffer is quick, a read or write beyond that is much
slower, but still a big improvement over the original scsi hd.
The old scsi has win3.0 on it, been years since used that hd tho.

A clone ASUS 486 33mhz EISA which holds the bulk of the old software,
DR-DOS 6 and WIN3.3 - about the same use, maybe a couple of times a
year.

The dell pentium mmx laptop sees duty for my floppy bootstrap stuff
yet.

3 desktops pentium I & II mmx's  w/ win95,98, Debian Linux.

2 desktops pentium 4's HT's which I use regularly, XP sp3.

Been saving my pennies for one of those newfangled, 64bit, 3yr
thowaways, laptops - probably one of those is next.

> And, what is the oldest generation of x86 cpu you think should be support=
ed?
>

Well, Pentium class.

> I had some older cpu's but gave them to a friend who was "collecting" the=
m
> around '00. =A0He probably sold them on Ebay...
>

Hmm, collector or packrat, yup that's me, I hate to thow the stuff
out, I might need one to hold the door open someday..

Steve

> Rod Pemberton

0
Reply s_dubrovich 12/20/2010 5:19:57 AM

In article  <ielqi0$fpa$1@newsreader2.utanet.at>
           nowhere@never.at "wolfgang kern" writes:
[..]
> pete answered to Rod Pemberton:
> ...
> >> And, what is the oldest generation of x86 cpu you think should
> >> be supported?
> 
> Methink 486 is old enough to be placed into the museum, but may be
> worth to support because of its current population-count.
> 386 and before are just historical relicts in my view and I wont
> waste my time to write code for it or even think about it.
> 
> > 8088/86?  That's what the newsgroup exists for, and I'd hazard a guess
> > that most of the regulars here cut their teeth on the 80[2]86,
> > learning their optimisation "tricks" to get the maximum performance
> > out of slow (by modern standards) processors.
> 
> I'm afraid you missed some time already passed yet, 2011 is soon!
> so I cannot imagine that someone still play with 86/88/186/286/386.
> ...

Misunderstanding on my part :(  In my post-supper haze I interpreted 
"supported" to mean supported here in clax86 -- there has been the 
occasional 8086 thread here after all.

As for "supported for new production code" I'd doubt one would want to 
go back much further than P3, maybe the original Pentium at a push (if 
there are still enough of these around to justify that).  Having said 
that, there are 386/486 processors around in embedded systems but code 
written for these will likely be bespoke.

Sorry for the confusion
Pete
-- 
   "We have not inherited the earth from our ancestors,
    we have borrowed it from our descendants."
0
Reply pete 12/20/2010 6:03:37 AM

James Harris wrote:
> What were the key feature changes? Something like
>
> * 80386 - 32-bit protected mode

486 - pipelined, capable of one instruction/cycle.

> * Pentium 1 - time stamp counter, cmov (IIRC)

The dual pipelines were _far_ more important!

> * Pentium 1 MMX - er, MMX

Just a small blip, but it did allow Zoran's SoftDVD to handle playback 
with no frame drops.

> * many individual later instruction set additions
> * 64-bit

That's the next big one, and the one that have saved us from Itanic.

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 12/20/2010 7:46:43 AM

On Dec 20, 7:46 am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> James Harris wrote:
> > What were the key feature changes? Something like
>
> > * 80386 - 32-bit protected mode
>
> 486 - pipelined, capable of one instruction/cycle.

Rod asked what is the oldest generation we thought should be
supported. I guess I didn't make it very clear but I was listing some
key feature upgrades in the x86 line that a programmer might write
code for. Such a CPU would be a minimum target for code that would
simply fail to run on an earlier processor.

> > * Pentium 1 - time stamp counter, cmov (IIRC)
>
> The dual pipelines were _far_ more important!

To performance, yes, and that can be essential in some applications,
but not to the arguably more common goal of executability.

> > * Pentium 1 MMX - er, MMX
>
> Just a small blip, but it did allow Zoran's SoftDVD to handle playback
> with no frame drops.

Again, someone who coded MMX instructions would know they wouldn't
execute on an earlier CPU.

> > * many individual later instruction set additions
> > * 64-bit
>
> That's the next big one, and the one that have saved us from Itanic.

No bad thing, to be saved from the Itanic!

James
0
Reply James 12/20/2010 9:25:29 AM

On Dec 20, 5:19=A0am, s_dubrov...@nospicedham.yahoo.com wrote:

....

> Hmm, collector or packrat, yup that's me, I hate to thow the stuff
> out,

You sound like me! Keeping the old stuff may not be a bad idea. I've
been looking for some specific old 386 machines on ebay to test a
certain piece of code and have been amazed at the prices the sellers
expect to get for them.

James
0
Reply James 12/20/2010 9:31:01 AM

Bluemoon wrote:
> On Dec 19, 7:21 am, Frank Kotler
> <fbkot...@nospicedham.myfairpoint.net> wrote:
>> Bernhard Schornak wrote:
>>> catcalls wrote:
>>> <snip>
>>>>       95 ;Binary Trees>  Nodes
>>>>       96 ; Value
>>>>       97 ; Greater Than Node
>>>>       98 ; Less Than Node
>>>>       99 START:
>>>>      100         mov eax, [value]
>>>>      101         mov edx, [value+4]
>>>>      102         cmp eax, edx
>>>>      103         jae DONE
>>>>      104 LOWER:
>>>>      105         push edx
>>>>      106         pop eax
>>>>      107 DONE:
>>>>      108         mov ebx, 10
>>>>      109         call writeInt
>>>>      110
>>>>      111 END:
>>>>      112 cli        ;uncomment it if you like a 'dead end'
>>>>      113              ;sti may allow hotkeys ie: alt-ctrl-DEL.
>>>>      114 hlt
>>>>      115 jmp END
>>> START:
>>> mov eax,[value]
>>> mov edx,[value + 4]
>>> cmp eax,edx
>>> cmovae eax,edx
>>> END:
>>> cli
>>> hlt
>>> jmp END
>>> saves the conditional jump completely.
>> Before we optimize this too much more... In the binary trees I'm
>> familiar with, I would expect a node to be:
>>
>> dd Value
>> dd Pointer to greater-than node
>> dd Pointer to less-than node
>>
>> If we have:
>>
>> dd Value
>> dd value from greater-than node
>> ...
>>
>> Then the above code makes sense. If it is as I "expect", we want to do
>> something like:
>>
>> mov eax, [Value]
>> mov edx, [Value + 4]  ; we can give "4" a name, if we wish...
>> cmp eax, [edx] ; "dereference the pointer"?
>> ...
>>
>> Just a thought...
>>
>> Greetings from Raymond,
>> Frank
> 
> 
> Before going too deep in computation optimization, as modern computer
> is too fast while most of the time the CPU is starved for data
> I suspect that a memory overhead of 2 pointers per each dd is a bad
> idea.

Ummm... okay. I don't see how you're going to do a "binary tree" with 
less. In fact, I'd modify my above suggestion to make a node:

pointer to "the data"
pointer to greater-than node
pointer to less-than node

This would allow the same code to manage data other than dwords - bytes, 
strings, double-precision floats, structures, whatever you like. 
Further, I think I'd pass, as a parameter to "addnode" etc. functions, a 
"comparison routine" rather than use just "cmp". This would result in 
greater "memory overhead" (is that a problem?), but much more "flexible" 
code.

> I would have the original value /16 (or 256, or anything optimal for
> practical data)
> in each btree node we store an array or sorted list of that 16 values
> (and in some case you may not need to sort them)

I'm not clear what you've got in mind here, Bluemoon. Can you explain 
further... perhaps an example?

> It may take a few more instruction count but it should be more memory/
> cache friendly

"Cache friendly" is good. As I see it, first we need an algorithm that 
does what we intend, then we optimize the algorithm, then we worry about 
instruction count/speed. Keeping "cache-friendly" in mind, right from 
the top, is probably good...

I should make clear that I don't know too much about "binary trees". 
There's an example in K&R (in C, of course) that I'm mostly "going by". 
You guys may have something entirely different in mind...

Best,
Frank
0
Reply Frank 12/20/2010 12:23:01 PM

On Sun, 19 Dec 2010 14:15:50 +0100, "wolfgang kern"
<nowhere@never.at> wrote:

<snip>

>wonder if there are still 386 in use ...

Funny you should ask that.  Recently I was approached by
someone looking for DAS-20 data acquisition boards.
These are ISA-bus boards that my old Daqarta for DOS
supported, so they were mentioned on my site.  This person
wondered if I had any to sell, to be used as replacement
units for his client.  I told him that I only had a couple
of used boards, but I would be glad to test them to make
sure they were still up to spec.

My old 486 system wouldn't boot, since the CMOS battery had
died and the hard drive info was thus no longer valid.
(Ahh, the "good old days" before Plug N Play!)  Rather than
pull the drive to read the heads, cylinders, etc, I decided
to try the even-older 386 machine.  It fired right up just
fine, and I ran my tests and verified that the boards were
still OK.

Sometimes it pays to be a junk collector!  

....But not this time, since the guy's customer wouldn't buy
unless I agreed to offer a money-back guarantee that the
boards would work in his setup.  I figured that if he is
looking for hardware this old, he had probably damaged his
original boards.  I concluded he probably didn't know what
he was doing, and politely declined.

But next time I'll be ready!

Best regards,

 
Bob Masta
 
              DAQARTA  v5.10
   Data AcQuisition And Real-Time Analysis
              www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
    Frequency Counter, FREE Signal Generator
           Pitch Track, Pitch-to-MIDI 
         DaqMusic - FREE MUSIC, Forever!
             (Some assembly required)
     Science (and fun!) with your sound card!
0
Reply N0Spam 12/20/2010 2:27:10 PM

On Dec 20, 12:06=A0am, James Harris
<james.harri...@nospicedham.googlemail.com> wrote:
> Yuck. You'll have people thinking AMD is intrinsically inferior.

IMHO needing to align jump targets on multiple-of-32 addresses (for
best performance) is a highly undesirable and retrograde feature.  I
probably would call AMD "intrinsically inferior" for that reason
alone.

Richard.
http://www.rtrussell.co.uk/
0
Reply Richard 12/20/2010 3:48:08 PM

"Frank Kotler" <fbkotler@nospicedham.myfairpoint.net> wrote in message
news:ienhud$5fi$1@speranza.aioe.org...
> Bluemoon wrote:
> > On Dec 19, 7:21 am, Frank Kotler
> > <fbkot...@nospicedham.myfairpoint.net> wrote:
> >> Bernhard Schornak wrote:
> >>> catcalls wrote:
....
<snip>
> >>
> >> dd Value
> >> dd Pointer to greater-than node
> >> dd Pointer to less-than node
> >>
<snip>
>
> Ummm... okay. I don't see how you're going to do a "binary tree" with
> less. In fact, I'd modify my above suggestion to make a node:
>
> pointer to "the data"
> pointer to greater-than node
> pointer to less-than node
>
> This would allow the same code to manage data other than dwords - bytes,
> strings, double-precision floats, structures, whatever you like.

If the tree supports multiple data types, you'd also need to know what data
type is that the location where the "pointer to 'the data'" points to.  It's
likely you are traversing the tree searching it for just one type of data:
dword, float, string, etc.  So, maybe,

pointer to greater-than node
pointer to less-than node
pointer to "the data"
integer for data type, e.g., dd Type

dd Type could be reduced in size to save space, but that'll introduce
mis-alignment of data.  The pointer size is dependent on the cpu mode on
x86, unless you choose to artificially limit pointer size, e.g., 16-bit
pointers while in 32-bit/64-bit code.  That'll add a very slight addition
overhead, but will reduce size.  Is a 64KB array large enough?

The earlier format with "dd Value" has a single data type (or structure) and
has a fixed size.  This allows many nodes to be stored as an array in any
available block of memory (declared array, malloc, etc.).  Fixed size nodes
are convenient if you have sufficient memory, or are using an appendable
read-write file.  You can allocate new nodes at the end of the list simply
by incrementing a pointer to the end-of-array by the fixed size of the node.
You can leave deleted nodes in place until you need more space, i.e., a
deleted node remains allocated but is unused since no other nodes point to
that node anymore.  If needed, a simple routine can be executed to compress
the entire array, er... binary tree, to eliminate the deleted nodes, i.e.,
garbage collection.  This is just a loop that copies the in-use nodes down
onto the unused and earlier nodes.  I've used this method in C a few times.
It's very fast.  It's low overhead.  It's low space.  It's simple.


Rod Pemberton


0
Reply Rod 12/20/2010 4:17:05 PM

"Richard Russell" <news@nospicedham.rtrussell.co.uk> wrote in message
news:5e4affcc-27e6-461e-9c84-49843fd35a4e@k30g2000vbn.googlegroups.com...
> My personal approach, and I know you won't agree with this, is to
> optimise my code for Intel architectures and tell my users not to use
> AMD if they want the best performance!  That avoids having to add
> padding NOPs all over the place, which would be a major pain for me.
>

Why are you manually padding with NOPs?  Don't most assemblers have an
instruction alignment directive?

E.g.,

ALIGN or ALIGNB for NASM
..align or .balign[wl] or .p2align[wl] for GAS (GNU AS)
ALIGN for MASM


RP


0
Reply Rod 12/20/2010 4:34:16 PM

Rod Pemberton wrote:
> pointer to greater-than node
> pointer to less-than node
> pointer to "the data"
> integer for data type, e.g., dd Type

Assuming this is speed-critical you'd want more or less all decisions to 
be made without having to follow pointers to a separate data storage, 
the easiest method is probably to allocate a fixed size prefix area, 
plus a pointer to the possibly much longer data (text strings?).

With a 32-bit prefix you could have code like this (assume EBX -> 
current node, EAX has target prefix key.

Each node contains:
   prefix
   data_pointer
   right_subtree

The left subtree follows directly after this node!

next:
   cmp eax,[ebx.prefix]
   mov edx,[ebx.right_subtree]
   lea ebx,[ebx+12]	; Assume left
    ja next
   xchg ebx,edx		; Go right
    jb next

check_keys:
; We get here if the prefix is identical, if so check the full keys!
; [EDX.data_pointer-12] points to the actual data

; Go back up if not a match.

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 12/20/2010 5:32:50 PM

On 2010-12-19 16:57, Rod Pemberton wrote:
> "wolfgang kern"<nowhere@never.at>  wrote in message
> news:iel0ig$bic$1@newsreader2.utanet.at...
>>
>> wonder if there are still 386 in use ...
>>
>
> In use?
>
> The 486 goes back about two decades now.  I'm sure you could find a 386 in
> industry somewhere...

My father still *uses* an original 4.77 MHz IBM PC with 640K and a 8088, 
and does so on an almost daily basis...

Robert

0
Reply Robert 12/20/2010 8:57:23 PM

Robert AH Prins <spamtrap@prino.org> writes:
>On 2010-12-19 16:57, Rod Pemberton wrote:
>> "wolfgang kern"<nowhere@never.at>  wrote in message
>> news:iel0ig$bic$1@newsreader2.utanet.at...
>>>
>>> wonder if there are still 386 in use ...
>>>
>>
>> In use?
>>
>> The 486 goes back about two decades now.  I'm sure you could find a 386 in
>> industry somewhere...
>
>My father still *uses* an original 4.77 MHz IBM PC with 640K and a 8088, 
>and does so on an almost daily basis...

Hi,

   Beats me!  I have parts for a Z100 8086/8085, but no working
systems.  I do use a Pentium laptop.  Was using a Pentium
desktop until this year when something scary happened.  It
still fires up, but...  And an 80186 palmtop gets placed in a pocket
about half time.

Cheers,

Steve N.
0
Reply Bogus 12/20/2010 9:58:06 PM

On Dec 20, 4:34=A0pm, "Rod Pemberton"
<do_not_h...@nospicedham.notreplytome.cmm> wrote:
> Why are you manually padding with NOPs? =A0Don't most assemblers have an
> instruction alignment directive?

Who said I would have to *manually* pad with NOPs?  Of course, the
assembler I use (NASM) has align directives I could use, but there are
a number of reasons why doing that would be a pain:

1.  Aligning every jump target would significantly increase the
overall size of the code, and that alone could impair performance by
reducing the efficiency of the instruction cache (let alone the sheer
inelegance of it).

2.  The padding would make many 'short' jumps go out of range,
including some 'loop' and 'jecxz' instructions which don't have a
'near' equivalent.

3.  The natural *module* alignment is 16 bytes, so if I'm not careful
earlier modules in the build could make all my nice 'align 32'
directives actually place the jump targets exactly half way between
the desired addresses!  (Maybe there is some linker flag I could use
to change the module alignment, but I haven't investigated that).

Why should I put myself through all that hassle, when on an Intel CPU
it gains me absolutely nothing (and indeed will slightly worsen
performance)?

Richard.
http://www.rtrussell.co.uk/

0
Reply Richard 12/20/2010 10:44:03 PM

On Dec 20, 3:48 pm, Richard Russell
<n...@nospicedham.rtrussell.co.uk>
> On Dec 20, 12:06 am, James Harris
> <james.harri...@nospicedham.googlemail.com> wrote:

> > Yuck. You'll have people thinking AMD is intrinsically inferior.
>
> IMHO needing to align jump targets on multiple-of-32 addresses (for
> best performance) is a highly undesirable and retrograde feature.  I
> probably would call AMD "intrinsically inferior" for that reason
> alone.

I think you are missing the point. Some AMDs fetch 32 bytes of
instruction memory in a single cycle! That's a good thing. It means
that if you jump to a 16-byte aligned target the AMD CPU will
immediately fetch 16 or 32 bytes ahead, depending on whether the
address is just 16- or is also 32-byte aligned. Presumably an Intel
CPU only fetches in blocks of 16 bytes...? If so AMD will match it or
beat it at this metric. :-)

James
0
Reply James 12/21/2010 12:15:01 AM

On Dec 20, 10:44=A0pm, Richard Russell
<n...@nospicedham.rtrussell.co.uk> wrote:

....

> 3. =A0The natural *module* alignment is 16 bytes, so if I'm not careful
> earlier modules in the build could make all my nice 'align 32'
> directives actually place the jump targets exactly half way between
> the desired addresses! =A0(Maybe there is some linker flag I could use
> to change the module alignment, but I haven't investigated that).

No need. There are subtleties but in general "half-way between" is not
as bad as you seem to think. You should avoid jumping to just before
the ends of I-fetch lines.

> Why should I put myself through all that hassle, when on an Intel CPU
> it gains me absolutely nothing (and indeed will slightly worsen
> performance)?

You don't have to. Stick to 16-byte for most cases. Relax it to less
alignment in some instances. And save 32-byte alignment for innermost
loops and other hotspots. On average half your 16-byte alignments will
be 32-byte aligned anyway.

James
0
Reply James 12/21/2010 12:26:21 AM

On Dec 21, 12:15=A0am, James Harris
<james.harri...@nospicedham.googlemail.com> wrote:
> No need. There are subtleties but in general "half-way between" is not
> as bad as you seem to think. You should avoid jumping to just before
> the ends of I-fetch lines.

I don't want to add *any* padding at all, whether to align on 16-byte
or 32-byte boundaries.

> Presumably an Intel CPU only fetches in blocks of 16 bytes...?
> If so AMD will match it or beat it at this metric. :-)

I don't believe that how code is fetched is the issue.  Intel CPUs
don't seem to have any alignment requirements in terms of optimising
performance: a jump takes the same time whether it crosses a 'fetch
boundary' or not (assuming the code is cached).

Even the AMD Optimization Guide admits that the issue arises from
"architectural restrictions".  It also says "Perform this alignment by
rearranging code; it is not beneficial to align branches using padding
sequences".  If NASM could do that automatically I'd be impressed!

Richard.
http://www.rtrussell.co.uk/
0
Reply Richard 12/21/2010 9:51:51 AM

On Dec 21, 9:51 am, Richard Russell <n...@nospicedham.rtrussell.co.uk>
wrote:
> On Dec 21, 12:15 am, James Harris
> <james.harri...@nospicedham.googlemail.com> wrote:

> > No need. There are subtleties but in general "half-way between" is not
> > as bad as you seem to think. You should avoid jumping to just before
> > the ends of I-fetch lines.
>
> I don't want to add *any* padding at all, whether to align on 16-byte
> or 32-byte boundaries.

Well, this is assembler!

> > Presumably an Intel CPU only fetches in blocks of 16 bytes...?
> > If so AMD will match it or beat it at this metric. :-)
>
> I don't believe that how code is fetched is the issue.

There are a few adverse effects of unaligned code fetch on CPUs from
both manufacturers. Some are obvious such as keeping the instruction
decoders fed: a single cycle can pass to decode only as many bytes as
are in the cache line from the jump target to the end of the line; but
the next cycle can get the full 16 or 32 bytes so it's not a big delay
and becomes a non-issue if the instructions are not cached since then
everything becomes a lot slower. Branch target alignment would,
however, be a good optimisation for an inner loop.

Other jump-target effects are more subtle or indirect such as possible
unnecessary consumption of I-cache space, and slightly increased
likelihood of cache collisions elsewhere in the code.

The Intel Pentium 4 Netburst, AIUI, isn't bothered too much with
branch target alignment because it caches decoded instructions (one of
the very few good effects of Netburst!) but the issue returned in the
Pentium M. Not sure about later CPUs. I can't recall there being
another trace cache. Anyone correct me on that?

> Intel CPUs
> don't seem to have any alignment requirements in terms of optimising
> performance: a jump takes the same time whether it crosses a 'fetch
> boundary' or not (assuming the code is cached).

It applies to Intel too. According to their IA-32 Intel Architecture
Optimization Reference Manual order number 248966-009 which discusses
machines to Netburst and Pentium M: "Assembly/Compiler Coding Rule 30.
(M impact, H generality) All branch targets should be 16-byte
aligned."

> Even the AMD Optimization Guide admits that the issue arises from
> "architectural restrictions".  It also says "Perform this alignment by
> rearranging code; it is not beneficial to align branches using padding
> sequences".

Their "architectural restrictions" comment is probably something the
PR boys would jump on and call it architectural characteristics! - and
I think it's just a choice of words. Architectural restrictions apply
to Intel too.

I guess the AMD manual says to avoid padding sequences because the
effect of a misaligned branch is not high, i.e. not as high as
consuming multiple nops.

BTW, as a bit of an aside, padding sequences longer than one byte
should not be a series of NOPs such as I think one would get from a
Nasm align directive. IIRC, when I tested it a 3-issue CPU could
consume 3 nops per cycle, i.e. they were treated like separate
instructions. Understandable really. But the same CPU could consume a
*3-byte* nop in 1/3 cycle, if you see what I mean. IOW the instruction
decode can consume multiple padding bytes in one go if they form a
single instruction.

> If NASM could do that automatically I'd be impressed!

This raises an intriguing idea. Because different manufacturers *and*
their different CPUs have different optimisation requirements maybe
there is a need for a post-assembler optimizer. IOW once 'asm-like'
output has been generated it could be tailored to different target
CPUs. Then the various idiosyncrasies of each CPU model could be
accommodated. Hmm....

James
0
Reply James 12/21/2010 12:21:41 PM

On Dec 21, 12:21=A0pm, James Harris
<james.harri...@nospicedham.googlemail.com> wrote:
> It applies to Intel too. According to their IA-32 Intel Architecture
> Optimization Reference Manual order number 248966-009 which discusses
> machines to Netburst and Pentium M: "Assembly/Compiler Coding Rule 30.
> (M impact, H generality) All branch targets should be 16-byte
> aligned."

That's very interesting.  I'd not previously spotted it, but it is in
248966-020 as Rule 12.  However in reading the detail in section
2.1.2.2 (Instruction Fetch Unit) my interpretation is that it's a less
significant issue than on AMD CPUs.

AMD CPUs seem to have alignment issues that are more complex than
simple code starvation ("Due to architectural restrictions, a branch
that is split across a 16-byte boundary cannot dispatch with any other
instructions when it is predicted taken") which could explain why I
have found the impact to be more noticeable than with Intel CPUs.

However it's obviously quite a subtle and complicated issue.

> I guess the AMD manual says to avoid padding sequences because the
> effect of a misaligned branch is not high, i.e. not as high as
> consuming multiple nops.

All I know, from my own practical experience, is that on occasion when
I've made a trivial code change (perhaps shifting the remainder of the
module by a couple of bytes) it has had a very significant adverse
impact on overall execution speed on an AMD Athlon.  Probably a few
carefully placed ALIGN directives would have solved that, but finding
out where to put them is not trivial!

Richard.
http://www.rtrussell.co.uk/
0
Reply Richard 12/21/2010 2:50:15 PM

"Richard Russell" <news@nospicedham.rtrussell.co.uk> wrote in message
news:eda35670-3c49-4d62-9752-213e6708dcb8@e20g2000vbn.googlegroups.com...
>
> AMD CPUs <snip>
> "Due to architectural restrictions, a branch
> that is split across a 16-byte boundary cannot dispatch
> with any other instructions when it is predicted taken"
>

Uh...  Where are these 16-byte boundaries _exactly_?

Just how do I determine where the 16-byte boundaries are?  Does it start
from the start of my assembly code segment?  If so, what guarantees the OS
will load it to a properly aligned memory area?  Does it start from the base
address of the segment? e.g., segment_base+16*n .  Or, is it when the low
nybble of an address is zero?  For either, which address has the alignment
requirement: the physical memory address or the address relative to the
segment base address?  If paging is involved, which address do I use:
physical address or paging address?  From what starting location do I start
counting to determine where the next 16-byte boundary is?  Is it affected by
paging and the segment base address?  If I intend to load a block of
pre-compiled aligned code, how do I make sure it's loaded to an address with
correct 16-byte boundary alignment when physical addresses, base addresses,
and paging are all involved and they change the address values available to
my code?


Rod Pemberton



0
Reply Rod 12/21/2010 6:53:17 PM

On Dec 21, 12:53=A0pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> "Richard Russell" <n...@nospicedham.rtrussell.co.uk> wrote in message
>
> news:eda35670-3c49-4d62-9752-213e6708dcb8@e20g2000vbn.googlegroups.com...
>
>
>
> > AMD CPUs <snip>
> > "Due to architectural restrictions, a branch
> > that is split across a 16-byte boundary cannot dispatch
> > with any other instructions when it is predicted taken"
>
> Uh... =A0Where are these 16-byte boundaries _exactly_?
>
> Just how do I determine where the 16-byte boundaries are? =A0Does it star=
t
> from the start of my assembly code segment? =A0If so, what guarantees the=
 OS
> will load it to a properly aligned memory area? =A0Does it start from the=
 base
> address of the segment? e.g., segment_base+16*n . =A0Or, is it when the l=
ow
> nybble of an address is zero? =A0For either, which address has the alignm=
ent
> requirement: the physical memory address or the address relative to the
> segment base address? =A0If paging is involved, which address do I use:
> physical address or paging address? =A0From what starting location do I s=
tart
> counting to determine where the next 16-byte boundary is? =A0Is it affect=
ed by
> paging and the segment base address? =A0If I intend to load a block of
> pre-compiled aligned code, how do I make sure it's loaded to an address w=
ith
> correct 16-byte boundary alignment when physical addresses, base addresse=
s,
> and paging are all involved and they change the address values available =
to
> my code?


The physical address.  All sane OSs align all segments to (at least)
16B boundaries if they're using segments, and programs are loaded in
page aligned chunks in flat model (it would also be OK if they loaded
code in flat model on 16/32B boundaries, but I'm not aware of any OS's
that do so).  In neither case do you need to worry about paging, since
virtual pages are always aligned on physical page (4KB) boundaries, so
if the linear address is aligned, so is the physical address.

Linkers also generally align (object) segments of program text on 16
byte boundaries in executables, so if you objects are linked together,
both objects will stay on 16B boundaries in the resulting executable.

The segment alignment thing is recommended explicitly by Intel: (from
the description of segment descriptor contents in the System
Programming Guide):

"Base address fields:
Defines the location of byte 0 of the segment within the 4-GByte
linear address space. The processor puts together the three base
address fields to form a single 32-bit value. Segment base addresses
should be aligned to 16-byte boundaries. Although 16-byte alignment
is not required, this alignment allows programs to maximize
performance
by aligning code and data on 16-byte boundaries."
0
Reply robertwessel2 12/21/2010 8:08:55 PM

In article <ajg4u7-tkl.ln1@ntp6.tmsw.no>,
	Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> writes:
> Rod Pemberton wrote:
> > pointer to greater-than node
> > pointer to less-than node
> > pointer to "the data"
> > integer for data type, e.g., dd Type
>
> Assuming this is speed-critical you'd want more or less all decisions to
> be made without having to follow pointers to a separate data storage,
> the easiest method is probably to allocate a fixed size prefix area,
> plus a pointer to the possibly much longer data (text strings?).
>
> With a 32-bit prefix you could have code like this (assume EBX ->
> current node, EAX has target prefix key.
>
> Each node contains:
>    prefix
>    data_pointer
>    right_subtree
>
> The left subtree follows directly after this node!

This makes inserting a new node between a node and its left child
expensive.
Also, the tree is now lay out as a sorted array and the pointer to
the right subtree can therefore be eliminated as well.
0
Reply free 12/21/2010 9:17:08 PM

Dick Wesseling wrote:
> In article<ajg4u7-tkl.ln1@ntp6.tmsw.no>,
> 	Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com>  writes:
>> Each node contains:
>>     prefix
>>     data_pointer
>>     right_subtree
>>
>> The left subtree follows directly after this node!
>
> This makes inserting a new node between a node and its left child
> expensive.
> Also, the tree is now lay out as a sorted array and the pointer to
> the right subtree can therefore be eliminated as well.

You are absolutely right, I had simply forgotten my theory. :-(

The structure above, with no explicit pointers, was called a binary 
ladder (in Norwegian) maybe 25-30 years ago, and it is only really 
useful as a read-only structure.

Thanks!

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 12/22/2010 9:53:57 AM

On Dec 22, 3:53=A0am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> Dick Wesseling wrote:
> > In article<ajg4u7-tkl....@ntp6.tmsw.no>,
> > =A0 =A0Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com> =A0writ=
es:
> >> Each node contains:
> >> =A0 =A0 prefix
> >> =A0 =A0 data_pointer
> >> =A0 =A0 right_subtree
>
> >> The left subtree follows directly after this node!
>
> > This makes inserting a new node between a node and its left child
> > expensive.
> > Also, the tree is now lay out as a sorted array and the pointer to
> > the right subtree can therefore be eliminated as well.
>
> You are absolutely right, I had simply forgotten my theory. :-(
>
> The structure above, with no explicit pointers, was called a binary
> ladder (in Norwegian) maybe 25-30 years ago, and it is only really
> useful as a read-only structure.
>
> Thanks!
>
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Seems like RO would be fine for match_keyword_string -> <action>.
But the tokenizer for 'prefix' has work to do.  What to do about short
strings?  Pad with spaces?  Also, strings are little endian so
grouping into dwords for 'prefix' isn't straight forward..

     1                                  ;; string search among C
keywords
     2                                  ;; auto,break,case,char,...,
do<spsp>,...,if<spsp>,...
     3
     4 00000000 6175746F`62726561`63-     db
'auto','brea','case','char','do  ','if  ',0
     5 00000009 617365`63686172`646F-
     6 00000012 2020`69662020`00
     7
     8 00000019 20<rept>                align 80h, db 20h
     9
    10                                  ;; -=3Deof=3D-


So dword integer cmp of the above will fail in this current form; 'if
' < 'brea' < 'auto'.

So the tokenizer has to 'accumulate' characters in a stream into a
temp four character buffer, filo, for a value suitable for 'prefix' -
there is probably a better way?

Steve
0
Reply s_dubrovich 12/22/2010 2:18:32 PM

s_dubrovich@nospicedham.yahoo.com wrote:
> On Dec 22, 3:53 am, Terje Mathisen<"terje.mathisen at
> tmsw.no"@giganews.com>  wrote:
>> Dick Wesseling wrote:
>>> In article<ajg4u7-tkl....@ntp6.tmsw.no>,
>>>     Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com>    writes:
>>>> Each node contains:
>>>>      prefix
>>>>      data_pointer
>>>>      right_subtree
>>
>>>> The left subtree follows directly after this node!
>>
>>> This makes inserting a new node between a node and its left child
>>> expensive.
>>> Also, the tree is now lay out as a sorted array and the pointer to
>>> the right subtree can therefore be eliminated as well.
>>
>> You are absolutely right, I had simply forgotten my theory. :-(
>>
>> The structure above, with no explicit pointers, was called a binary
>> ladder (in Norwegian) maybe 25-30 years ago, and it is only really
>> useful as a read-only structure.
>>
>> Thanks!
>>
>> Terje
>>
>> --
>> -<Terje.Mathisen at tmsw.no>
>> "almost all programming can be viewed as an exercise in caching"
>
> Seems like RO would be fine for match_keyword_string ->  <action>.
> But the tokenizer for 'prefix' has work to do.  What to do about short
> strings?  Pad with spaces?  Also, strings are little endian so
> grouping into dwords for 'prefix' isn't straight forward..

I was waiting for that one!
:-)

You convert strings into a prefix by taking the first 4 (or 8) bytes, 
padding with zeros if the length was less than 3, and then do a BSWAP.

At this point you can compare the DWORDs without having to care about 
the actual data type.

I have used somewhat similar code for the Huffmann part of the world's 
fastest ogg vorbis decoder:

Lookup the next 8 input bits in a table, each entry contains the output 
token and # of bits used, or a pointer to a second-level table. (The 
pointers are actually table offset values, biased by 0x80000000, so any 
negative table entry means that you need another table lookup.

The second-level tables are all quite small, using just two input bits, 
so the decoder starts by taking the first 8 bits and do the lookup.

If the result is a final token (very probable!) I decrement the nr of 
available bits by the length field (bits 29-31: 0-7 corresponding to 
1-8) and shift the bit accumulator by the same count.

Otherwise I know that I've just used 8 bits, so I decrement by 8, shift 
by 8 bits, adjust the table base pointer by the offset value (minus that 
0x8... bias) and drop into the two-bit lookup code.

This code is otherwise identical to the initial 8-bit lookup, except 
that since I only need one bit to indicate 1 or 2 used bits, I have room 
for tokens of up to 30 bits in length. (Afair the encoder definition 
guarantees that tokens cannot be longer than 26 bits...)

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 12/22/2010 5:24:26 PM

On Dec 22, 11:24=A0am, Terje Mathisen <"terje.mathisen at
tmsw.no"@nospicedham.giganews.com> wrote:
> s_dubrov...@nospicedham.yahoo.com wrote:
> > On Dec 22, 3:53 am, Terje Mathisen<"terje.mathisen at
> > tmsw.no"@giganews.com> =A0wrote:
> >> Dick Wesseling wrote:
> >>> In article<ajg4u7-tkl....@ntp6.tmsw.no>,
> >>> =A0 =A0 Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com> =A0 =
=A0writes:
> >>>> Each node contains:
> >>>> =A0 =A0 =A0prefix
> >>>> =A0 =A0 =A0data_pointer
> >>>> =A0 =A0 =A0right_subtree
>
> >>>> The left subtree follows directly after this node!
>
> >>> This makes inserting a new node between a node and its left child
> >>> expensive.
> >>> Also, the tree is now lay out as a sorted array and the pointer to
> >>> the right subtree can therefore be eliminated as well.
>
> >> You are absolutely right, I had simply forgotten my theory. :-(
>
> >> The structure above, with no explicit pointers, was called a binary
> >> ladder (in Norwegian) maybe 25-30 years ago, and it is only really
> >> useful as a read-only structure.
>
> >> Thanks!
>
> >> Terje
>
> >> --
> >> -<Terje.Mathisen at tmsw.no>
> >> "almost all programming can be viewed as an exercise in caching"
>
> > Seems like RO would be fine for match_keyword_string -> =A0<action>.
> > But the tokenizer for 'prefix' has work to do. =A0What to do about shor=
t
> > strings? =A0Pad with spaces? =A0Also, strings are little endian so
> > grouping into dwords for 'prefix' isn't straight forward..
>
> I was waiting for that one!
> :-)
>

Good!  Half of the effort is to ask the right questions, the other
half is to get/develop good help/solutions to arrive at good
conclusions. :)

> You convert strings into a prefix by taking the first 4 (or 8) bytes,
> padding with zeros if the length was less than 3, and then do a BSWAP.
>
> At this point you can compare the DWORDs without having to care about
> the actual data type.
>

Right, I'll have to play with this some.  If LODSD is used to load 4
byte characters from a byte text input buffer-

a. I suppose the buffer should be sized at least three bytes longer
that the allowed input string chunk, and padded with 0?

b. Is LODSD subject to alignment checks on unaligned addresses?

c. What is an efficient way to check for whitespace termination of a
keyword token?  i.e. looking for 'if' in 'if (' .. with the notion
that the "prefix" key for it would be db "if",0,0 ..
--1. should EAX's bytes be tested, or should the test for whitspace
termination occur before LODSD?

e. Perhaps in the end it is more efficient to LODSB into AL, test for
whitespace, shr EAX,4 and loop to LODSB? -to arrive at a token for the
"prefix" key.

Steve

[snipped]
>
> Terje
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"- Hide qu=
oted text -
>
> - Show quoted text -

0
Reply s_dubrovich 12/23/2010 3:04:40 AM

Il 23.12.2010 04:04, s_dubrovich@nospicedham.yahoo.com ha scritto:
> On Dec 22, 11:24 am, Terje Mathisen<"terje.mathisen at
> tmsw.no"@nospicedham.giganews.com>  wrote:
>> s_dubrov...@nospicedham.yahoo.com wrote:
>>> On Dec 22, 3:53 am, Terje Mathisen<"terje.mathisen at
>>> tmsw.no"@giganews.com>    wrote:
>>>> Dick Wesseling wrote:
>>>>> In article<ajg4u7-tkl....@ntp6.tmsw.no>,
>>>>>      Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com>      writes:
>>>>>> Each node contains:
>>>>>>       prefix
>>>>>>       data_pointer
>>>>>>       right_subtree
>>
>>>>>> The left subtree follows directly after this node!
>>
>>
>>>> Terje
>>
>>>> --
>>>> -<Terje.Mathisen at tmsw.no>
>>>> "almost all programming can be viewed as an exercise in caching"
>>
>>> Seems like RO would be fine for match_keyword_string ->    <action>.
>>> But the tokenizer for 'prefix' has work to do.  What to do about short
>>> strings?  Pad with spaces?  Also, strings are little endian so
>>> grouping into dwords for 'prefix' isn't straight forward..
>>
>> I was waiting for that one!
>> :-)
>>
>
>> You convert strings into a prefix by taking the first 4 (or 8) bytes,
>> padding with zeros if the length was less than 3, and then do a BSWAP.
>>
>> At this point you can compare the DWORDs without having to care about
>> the actual data type.
>>
>
> Right, I'll have to play with this some.  If LODSD is used to load 4
> byte characters from a byte text input buffer-
>
> a. I suppose the buffer should be sized at least three bytes longer
> that the allowed input string chunk, and padded with 0?
>
> b. Is LODSD subject to alignment checks on unaligned addresses?
>
> c. What is an efficient way to check for whitespace termination of a
> keyword token?  i.e. looking for 'if' in 'if (' .. with the notion
> that the "prefix" key for it would be db "if",0,0 ..
> --1. should EAX's bytes be tested, or should the test for whitspace
> termination occur before LODSD?
>
> e. Perhaps in the end it is more efficient to LODSB into AL, test for
> whitespace, shr EAX,4 and loop to LODSB? -to arrive at a token for the
> "prefix" key.
>
> Steve
>

Hi,Steve,
LODSB or LODSB is out of scope here, because at this stage
(presumably tokenizing) you should not worry about
performances at the moment. Consider for example an unicode text;
parsing should be done codepoint-by-codepoint: if you consider
utf-8, well, it could be basically till up 4 bytes. by utf-16
it is basically 2 bytes. But unicode will break the theory in all
cases, whether you use LODSB or LODSD.

BTW, for the record, i am experiencing it because i am studying/writing 
a 64bit native unicode macro-assembler.

Now for example, the "if" token followed by " " space; the space
is not necessairly a discarded part after the "if" token. Like
other tokens, as for example "," or ";" or ":", it could be
considered like a special attribute belonging to it, modifying
it.
IMHO this is very important for your designing goal more than, for 
example, how you allocate mem for the tokens, or what method/hash 
function you use.

Cheers,

-- 

..::mrk::.
  x64 Assembly Lab
  http://sites.google.com/site/x64lab
0
Reply hopcode 12/23/2010 9:04:10 AM

Il 22.12.2010 18:24, Terje Mathisen ha scritto:
> s_dubrovich@nospicedham.yahoo.com wrote:
>> On Dec 22, 3:53 am, Terje Mathisen<"terje.mathisen at
>> tmsw.no"@giganews.com> wrote:
>>> Dick Wesseling wrote:
>>>> In article<ajg4u7-tkl....@ntp6.tmsw.no>,
>>>> Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com> writes:
>>>>> Each node contains:
>>>>> prefix
>>>>> data_pointer
>>>>> right_subtree
>>>
>>>>> The left subtree follows directly after this node!
>>>
>>>> This makes inserting a new node between a node and its left child
>>>> expensive.
>>>> Also, the tree is now lay out as a sorted array and the pointer to
>>>> the right subtree can therefore be eliminated as well.
>>>
>>> You are absolutely right, I had simply forgotten my theory. :-(
>>>
>>> The structure above, with no explicit pointers, was called a binary
>>> ladder (in Norwegian) maybe 25-30 years ago, and it is only really
>>> useful as a read-only structure.
>>>
>>> Thanks!
>>>
>>> Terje
>>>
>>> --
>>> -<Terje.Mathisen at tmsw.no>
>>> "almost all programming can be viewed as an exercise in caching"
>>
>> Seems like RO would be fine for match_keyword_string -> <action>.
>> But the tokenizer for 'prefix' has work to do. What to do about short
>> strings? Pad with spaces? Also, strings are little endian so
>> grouping into dwords for 'prefix' isn't straight forward..
>
> I was waiting for that one!
> :-)
>
> You convert strings into a prefix by taking the first 4 (or 8) bytes,
> padding with zeros if the length was less than 3, and then do a BSWAP.
>
> At this point you can compare the DWORDs without having to care about
> the actual data type.
>
> I have used somewhat similar code for the Huffmann part of the world's
> fastest ogg vorbis decoder:
>
> Lookup the next 8 input bits in a table, each entry contains the output
> token and # of bits used, or a pointer to a second-level table. (The
> pointers are actually table offset values, biased by 0x80000000, so any
> negative table entry means that you need another table lookup.
>
 >
 > Terje

Cool! ;-) Biasing is in many cases the solution. I apply it intensively,
for example in a heap allocator i wrote and currently in use.
Referring to your "binary ladder" above, (curiosity: the norwegian 
name?) and having a predefined alignment for the "steps" of the ladder 
in memory, say 16 byte align, well, you have the low 4 bits available as 
flags. I can store them as they are in each field of the "step",
having in total 4+4+4 bits available.

Cheers,

-- 

..::mrk::.
  x64 Assembly Lab
  http://sites.google.com/site/x64lab
0
Reply hopcode 12/23/2010 9:21:21 AM

s_dubrovich@nospicedham.yahoo.com wrote:
> On Dec 22, 11:24 am, Terje Mathisen<"terje.mathisen at
>> I was waiting for that one!
>> :-)
>>
>
> Good!  Half of the effort is to ask the right questions, the other
> half is to get/develop good help/solutions to arrive at good
> conclusions. :)
>
>> You convert strings into a prefix by taking the first 4 (or 8) bytes,
>> padding with zeros if the length was less than 3, and then do a BSWAP.
>>
>> At this point you can compare the DWORDs without having to care about
>> the actual data type.
>
> Right, I'll have to play with this some.  If LODSD is used to load 4
> byte characters from a byte text input buffer-
>
> a. I suppose the buffer should be sized at least three bytes longer
> that the allowed input string chunk, and padded with 0?

That's a good idea.
>
> b. Is LODSD subject to alignment checks on unaligned addresses?

Normally not, unless you've managed to set the AC bit (which no x86 OS 
is currently doing).
>
> c. What is an efficient way to check for whitespace termination of a
> keyword token?  i.e. looking for 'if' in 'if (' .. with the notion
> that the "prefix" key for it would be db "if",0,0 ..
> --1. should EAX's bytes be tested, or should the test for whitspace
> termination occur before LODSD?

When scanning for tokens in an input stream you really want your code to 
be fast, right?

In a programming language things are relatively easy, i.e. you have a 
limited character set which is valid as the first char in a token, then 
there's often a somewhat larger set which can follow.

I'd use code which simply loads the next char, does a table lookup and 
uses the result to determine if this could be the beginning of a token.

When you've found the start you use a (potentially) different table to 
verify the next chars.

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 12/23/2010 1:40:10 PM

Il 23.12.2010 14:40, Terje Mathisen ha scritto:
>
> In a programming language things are relatively easy, i.e. you have a
> limited character set which is valid as the first char in a token, then
> there's often a somewhat larger set which can follow.
>
Yes,relatively easy, after considering these factors:

1) Backus-Naur or somewhat formalism
2) ASCII english-instructions,english-variable names,strings,etc
3) if your goal is reinventing the well known wheel

No, quite obviously, because

1) for example, i have not yet read about a complete formalism for
    unicode larger charsets.

2) using unicode, tokenizer should recognize cpt even when
    in other stages you use an external library to manipulate
    strings. But *the next* cpt (codepoint) could lead to invoke
    a string manipulation function *before* ending the current
    tokenizing process. Also, speaking about a macro-featured
    programming language it is not so simple,imho.

3) idea (in my case) is to create an "unreal" but 64bit
    native assembler.


> I'd use code which simply loads the next char, does a table lookup and
> uses the result to determine if this could be the beginning of a token.
>
> When you've found the start you use a (potentially) different table to
> verify the next chars.
>
> Terje

Both could be as good as the preferred methods, in some cases.
Wwhen i see people wanting to start a new programmin language,
i would suggest them to consider unicode,under the motto
  "no unicode, no party" ;-)

Cheers,

-- 

..::mrk::.
  x64 Assembly Lab
  http://sites.google.com/site/x64lab
0
Reply hopcode 12/23/2010 2:36:48 PM

On Dec 23, 7:40=A0am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> When scanning for tokens in an input stream you really want your code to
> be fast, right?
>
> In a programming language things are relatively easy, i.e. you have a
> limited character set which is valid as the first char in a token, then
> there's often a somewhat larger set which can follow.
>
> I'd use code which simply loads the next char, does a table lookup and
> uses the result to determine if this could be the beginning of a token.
>
> When you've found the start you use a (potentially) different table to
> verify the next chars.


That's true for many languages, but not for all.

More than a few languages (particularly older ones) don't cleanly
separate the parsing layers like that.  Cobol is a good example, where
(amongst other things) the "picture clause", which defines the format
of a data item, can contain (certain) numbers and letters, and even
many punctuation characters including parenthesis, commas and periods
(the latter are, more-or-less, statement delimiters in Cobol.  In fact
you can say "05 DATA-ITEM PICTURE 999.99.", and the 999.99 is not a
number, it's a definition of a display formatted decimal field (named
DATA-ITEM, and part of a structure - the "05") with three digits
before the decimal point, two after, and an explicit decimal point
between.  And the period at the end is the end of statement.  And you
only know that you have to apply "PICTURE" tokenization rules when
you've determined that you're in a a PICTURE subclause of a data
definition statement.

A special joy if you=92re writing a parser for that=85
0
Reply robertwessel2 12/23/2010 7:29:18 PM

On Dec 23, 3:04=A0am, hopcode <hopc...@nospicedham.nullnichtsnada.com>
wrote:

>
> Hi,Steve,
> LODSB or LODSB is out of scope here, because at this stage
> (presumably tokenizing) you should not worry about
> performances at the moment. Consider for example an unicode text;
> parsing should be done codepoint-by-codepoint: if you consider
> utf-8, well, it could be basically till up 4 bytes. by utf-16
> it is basically 2 bytes. But unicode will break the theory in all
> cases, whether you use LODSB or LODSD.
>
> BTW, for the record, i am experiencing it because i am studying/writing
> a 64bit native unicode macro-assembler.
>
> Now for example, the "if" token followed by " " space; the space
> is not necessairly a discarded part after the "if" token. Like
> other tokens, as for example "," or ";" or ":", it could be
> considered like a special attribute belonging to it, modifying
> it.
> IMHO this is very important for your designing goal more than, for
> example, how you allocate mem for the tokens, or what method/hash
> function you use.
>
> Cheers,
>
> --
>
> .::mrk::.
> =A0 x64 Assembly Lab
> =A0http://sites.google.com/site/x64lab- Hide quoted text -

Yes, the data drives the algorithm.  No, I handn't considered unicode
for this discussion.  Mostly I was just considering Terje's binary
tree, it's 'prefix' or perhaps 'shortkey' is a good word, and a parse
for 'shortkey' on a buffered input stream. -actual application aside.

Steve
0
Reply s_dubrovich 12/23/2010 9:23:47 PM

"Terje Mathisen" <"terje.mathisen at tmsw.no"@giganews.com> wrote in message
news:t20cu7-cmh1.ln1@ntp6.tmsw.no...
> s_dubrovich@nospicedham.yahoo.com wrote:
> >
> > Right, I'll have to play with this some.  If LODSD is used to load 4
> > byte characters from a byte text input buffer-
> >
> > a. I suppose the buffer should be sized at least three bytes longer
> > that the allowed input string chunk, and padded with 0?
>
> That's a good idea.
>

Well, I think it's a good idea to byte pad the string out to the next DWORD.
That may or may not be 3 bytes in length depending on the string length.  If
the string ends on a DWORD boundary, you'll need a DWORD zero pad.  It might
be useful to always have a full DWORD as zero, even if the string doesn't
end on a DWORD boundary.  Then, a search for a zero dword will find the end
of the string or 1 to 3 zero bytes past...  This may be useful with a REPNE
SCASD for DWORD zero, then backup by bytes to the first zero byte, i.e., end
of string.  It wastes space, but it's possible, that a REPNE SCASD is faster
than a REPNE SCASB.  Yes?  No?


Rod Pemberton


0
Reply Rod 12/23/2010 9:52:59 PM

"Terje Mathisen" <"terje.mathisen at tmsw.no"@giganews.com> wrote in message
news:t20cu7-cmh1.ln1@ntp6.tmsw.no...
>
> Normally not, unless you've managed to set the AC bit (which no x86 OS
> is currently doing).
>

Could the AC bit be used as an OS security feature?


Rod Pemberton
Follow-ups set to alt.os.development from comp.lang.asm.x86.


0
Reply Rod 12/23/2010 9:56:52 PM

On Dec 23, 7:40=A0am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> s_dubrov...@nospicedham.yahoo.com wrote:
> > On Dec 22, 11:24 am, Terje Mathisen<"terje.mathisen at
> >> I was waiting for that one!
> >> :-)
>
> > Good! =A0Half of the effort is to ask the right questions, the other
> > half is to get/develop good help/solutions to arrive at good
> > conclusions. :)
>
> >> You convert strings into a prefix by taking the first 4 (or 8) bytes,
> >> padding with zeros if the length was less than 3, and then do a BSWAP.
>
> >> At this point you can compare the DWORDs without having to care about
> >> the actual data type.
>
> > Right, I'll have to play with this some. =A0If LODSD is used to load 4
> > byte characters from a byte text input buffer-
>
> > a. I suppose the buffer should be sized at least three bytes longer
> > that the allowed input string chunk, and padded with 0?
>
> That's a good idea.
>
>
>
> > b. Is LODSD subject to alignment checks on unaligned addresses?
>
> Normally not, unless you've managed to set the AC bit (which no x86 OS
> is currently doing).
>
Thanks, I'm afraid I'm alittle fuzzy on this point as I thought #AC
was automatic on ring3, PL3, code.  I think it's about time to retire
this `486 PRM. :)

>
>
> > c. What is an efficient way to check for whitespace termination of a
> > keyword token? =A0i.e. looking for 'if' in 'if (' .. with the notion
> > that the "prefix" key for it would be db "if",0,0 ..
> > --1. should EAX's bytes be tested, or should the test for whitspace
> > termination occur before LODSD?
>
> When scanning for tokens in an input stream you really want your code to
> be fast, right?

<sheepish shrug> [I think I'm supposed to say yes.] ( The teacher asks
the errant student a leading question.  But what the student really
wants fast, is a pitcher of beer and a deep dish pizza! )

As a practical matter, I'm thinking buffered file I/O is likely to be
slower than my spagetti code. (I hope.)  So, slow may be fast enough.

Hmm, deep dish, spagetti, - I think Italian food should be on for
tonight :)

>
> In a programming language things are relatively easy, i.e. you have a
> limited character set which is valid as the first char in a token, then
> there's often a somewhat larger set which can follow.
>
> I'd use code which simply loads the next char, does a table lookup and
> uses the result to determine if this could be the beginning of a token.
>
> When you've found the start you use a (potentially) different table to
> verify the next chars.
>

That sounds good to me, but once I have a token, I may want to use
it's 'shortkey' to pattern match it to a grammar rule (as evidenced as
a part of a syntax tree.)

Thxs,

Steve

> Terje
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"- Hide qu=
oted text -
>
> - Show quoted text -

0
Reply s_dubrovich 12/23/2010 10:18:52 PM

On Dec 23, 4:18=A0pm, s_dubrov...@nospicedham.yahoo.com wrote:
> On Dec 23, 7:40=A0am, Terje Mathisen <"terje.mathisen at
> > Normally not, unless you've managed to set the AC bit (which no x86 OS
> > is currently doing).
>
> Thanks, I'm afraid I'm alittle fuzzy on this point as I thought #AC
> was automatic on ring3, PL3, code. =A0I think it's about time to retire
> this `486 PRM. :)


No.  It's only *available* in ring 3.  If the OS sets AC (and it's off
by default), it only traps on unaligned accesses in ring 3, unaligned
accesses at higher privilege levels are unaffected.
0
Reply robertwessel2 12/23/2010 11:14:05 PM

On Dec 19, 11:57=A0am, "Rod Pemberton"
<do_not_h...@nospicedham.notreplytome.cmm> wrote:
> "wolfgang kern" <nowh...@never.at> wrote in message
>
> news:iel0ig$bic$1@newsreader2.utanet.at...
>
>
>
> > wonder if there are still 386 in use ...
>
> In use?
>
> The 486 goes back about two decades now. =A0I'm sure you could find a 386=
 in
> industry somewhere...
>
> IIRC, the oldest cpu I have is a 486 DX2-50. =A0It's packed away somewher=
e and
> not in use. =A0Oldest in use cpu would be a 233Mhz, and it's turned on ma=
ybe a
> couple times a year. =A0What about you guys?
>

A Pentium M is turned-on maybe once a year.

A Celeron machine gets turned-on maybe once or twice a week.  I might
make it into a server and leave it on.

> And, what is the oldest generation of x86 cpu you think should be support=
ed?
>

Depends on the expected user-base.  Terence Wright (from a.l.a) once
told me where (and on what antiquated machines) his Windows-based TUI
[all DOS-only software] was expected to run.  Under *what* conditions
does a software-vendor have the right to dictate to the client what
hardware they should use??

Nathan.
0
Reply Nathan 12/24/2010 8:26:12 AM

On Dec 24, 8:26=A0am, Nathan <nathancba...@gmail.com> wrote:
> On Dec 19, 11:57=A0am, "Rod Pemberton"
>
>
>
> <do_not_h...@nospicedham.notreplytome.cmm> wrote:
> > "wolfgang kern" <nowh...@never.at> wrote in message
>
> >news:iel0ig$bic$1@newsreader2.utanet.at...
>
> > > wonder if there are still 386 in use ...
>
> > In use?
>
> > The 486 goes back about two decades now. =A0I'm sure you could find a 3=
86 in
> > industry somewhere...
>
> > IIRC, the oldest cpu I have is a 486 DX2-50. =A0It's packed away somewh=
ere and
> > not in use. =A0Oldest in use cpu would be a 233Mhz, and it's turned on =
maybe a
> > couple times a year. =A0What about you guys?
>
> A Pentium M is turned-on maybe once a year.
>
> A Celeron machine gets turned-on maybe once or twice a week. =A0I might
> make it into a server and leave it on.
>
> > And, what is the oldest generation of x86 cpu you think should be suppo=
rted?
>
> Depends on the expected user-base. =A0Terence Wright (from a.l.a) once
> told me where (and on what antiquated machines) his Windows-based TUI
> [all DOS-only software] was expected to run. =A0Under *what* conditions
> does a software-vendor have the right to dictate to the client what
> hardware they should use??
>
> Nathan.

Interesting. As a programmer, I find that question posed on every
project I undertake.

For example, for a client I used a specific .Net Framework, that was
not supported on Windows XP without a patch. He obviously used XP
throughout his office, and had to patch every machine in order for the
software I wrote to run.

It is not just Hardware nowadays that is the issue as one can see from
the above example.

But, in writing Lita OS, I am aiming the OS at the Intel market. And I
want to be able to use MMX and later command sets (SIMD, no?)

A lot of the older machines will not run Lita OS after being optimised
with these commands and registers.

There will be a base Intel Atom spec I believe, and perhaps as low as
a P4, but even that may not support all the new features. I'm a little
rusty when MMX came out (I'd have to check my 32-bit Assembly Language
Manuals!)
0
Reply catcalls 12/24/2010 10:35:02 AM

s_dubrovich@nospicedham.yahoo.com wrote:
> On Dec 23, 7:40 am, Terje Mathisen<"terje.mathisen at
>> I'd use code which simply loads the next char, does a table lookup and
>> uses the result to determine if this could be the beginning of a token.
>>
>> When you've found the start you use a (potentially) different table to
>> verify the next chars.
>>
>
> That sounds good to me, but once I have a token, I may want to use
> it's 'shortkey' to pattern match it to a grammar rule (as evidenced as
> a part of a syntax tree.)

Oh, absolutely!

Particularly since the type of token determines how to parse the 
following text, i.e. you have to recognize each token as it occurs.

Terje

PS. Try to search for 'perfect hash'!

With a given, relatively small, set of keywords you can define a hash 
function that results in zero collisions, so that you can simply 
generate the hash as you read in characters, then go directly to the 
corresponding table entry.

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 12/24/2010 11:18:55 AM

On Dec 24, 10:35=A0am, catcalls <obrzut.a...@nospicedham.gmail.com>
wrote:
> On Dec 24, 8:26=A0am, Nathan <nathancba...@gmail.com> wrote:
> > On Dec 19, 11:57=A0am, "Rod Pemberton"
> > <do_not_h...@nospicedham.notreplytome.cmm> wrote:

> > > And, what is the oldest generation of x86 cpu you think should be sup=
ported?
>
> > Depends on the expected user-base. =A0Terence Wright (from a.l.a) once
> > told me where (and on what antiquated machines) his Windows-based TUI
> > [all DOS-only software] was expected to run. =A0Under *what* conditions
> > does a software-vendor have the right to dictate to the client what
> > hardware they should use??
>
> > Nathan.
>
> Interesting. As a programmer, I find that question posed on every
> project I undertake.
>
> For example, for a client I used a specific .Net Framework, that was
> not supported on Windows XP without a patch. He obviously used XP
> throughout his office, and had to patch every machine in order for the
> software I wrote to run.
>
> It is not just Hardware nowadays that is the issue as one can see from
> the above example.
>
> But, in writing Lita OS, I am aiming the OS at the Intel market. And I
> want to be able to use MMX and later command sets (SIMD, no?)

You plan to use floating point in your OS? What for? Please don't say,
because it's there and groovy!

>
> A lot of the older machines will not run Lita OS after being optimised
> with these commands and registers.
>
> There will be a base Intel Atom spec I believe, and perhaps as low as
> a P4, but even that may not support all the new features. I'm a little
> rusty when MMX came out (I'd have to check my 32-bit Assembly Language
> Manuals!)

IMHO that's fine as long as you both state the minimum spec on the
documentation and check for it when installing and/or when being
invoked. If you don't then some of your users will be unimpressed with
an OS that crashes rather than an OS which reports "CPU too old. I
need a minimum of...."

BTW, Intel recommend checking for SIMD features and the like by
checking the feature bits from CPUID which is a slightly different
test from a CPU type.

James
0
Reply James 12/24/2010 12:28:10 PM

On Dec 21, 2:50=A0pm, Richard Russell <n...@nospicedham.rtrussell.co.uk>
wrote:
> On Dec 21, 12:21=A0pm, James Harris
> <james.harri...@nospicedham.googlemail.com> wrote:

> > It applies to Intel too. According to their IA-32 Intel Architecture
> > Optimization Reference Manual order number 248966-009 which discusses
> > machines to Netburst and Pentium M: "Assembly/Compiler Coding Rule 30.
> > (M impact, H generality) All branch targets should be 16-byte
> > aligned."
>
> That's very interesting. =A0I'd not previously spotted it, but it is in
> 248966-020 as Rule 12. =A0However in reading the detail in section
> 2.1.2.2 (Instruction Fetch Unit) my interpretation is that it's a less
> significant issue than on AMD CPUs.
>
> AMD CPUs seem to have alignment issues that are more complex than
> simple code starvation ("Due to architectural restrictions, a branch
> that is split across a 16-byte boundary cannot dispatch with any other
> instructions when it is predicted taken") which could explain why I
> have found the impact to be more noticeable than with Intel CPUs.
>
> However it's obviously quite a subtle and complicated issue.

Agreed. A CPU's innards can be fiendish. With fast CPUs long gone are
the simple cycle counting days.

I'm not sure about the impact of the above. It sounds like a single-
cycle penalty (though I accept that the wording is vague and the
penalty may be greater). There is an issue other than alignment which
might be the cause of the problem. Here is an except from Agner Fog's
microarchitecture document on AMD.

"A drawback of this design is that there can be no more than three
control transfer instructions for every aligned 16-bytes block of
code, except for branches that are never taken. If there are more than
three taken branches in the same 6-bytes block of code then they will
keep stealing branch selectors and BTB entries from each other and
cause two mispredictions for every execution. It is therefore
important to avoid having more than three jumps, branches, calls and
returns in a single aligned 16-bytes block of code. Branches that are
never taken do not count. The three branch limit can easily be
exceeded if for example a switch/case statement is implemented as a
sequence of dec / jz instructions."

That sounds like a more serious issue. If shifting code up or down by
a few bytes brings four jumps within a 16-byte block (or is it a 32-
byte block?) and those jumps are part of an important loop, two
mispredictions for every execution could make a massive difference.

> > I guess the AMD manual says to avoid padding sequences because the
> > effect of a misaligned branch is not high, i.e. not as high as
> > consuming multiple nops.
>
> All I know, from my own practical experience, is that on occasion when
> I've made a trivial code change (perhaps shifting the remainder of the
> module by a couple of bytes) it has had a very significant adverse
> impact on overall execution speed on an AMD Athlon.

This is very interesting. Are you in a position to say which AMD model
or models you found the problem on? I only have one AMD but if I get
time I may try some similar tests.

> Probably a few
> carefully placed ALIGN directives would have solved that, but finding
> out where to put them is not trivial!

Unless it's the 4-jumps in a i-cache line issue my guess is that
aligns by 16 (probably enough; 32 not needed) around the innermost
loops would do. Perhaps as asm programmer we all should align our
inner loops anyway! It's not something one thinks about in general
programming but there is a slew of recommendations from both Intel and
AMD that, as assembler programmers, we may want to be aware of.
Compilers do this all for us, don't they...? (Rhetorical and ironic.
From recent posts it seems some - popular and expensive - compilers
have far less of a clue than we do!)

James
0
Reply James 12/24/2010 12:48:07 PM

On Dec 22, 9:53=A0am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> Dick Wesseling wrote:
> > In article<ajg4u7-tkl....@ntp6.tmsw.no>,
> > =A0 =A0Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com> =A0writ=
es:
> >> Each node contains:
> >> =A0 =A0 prefix
> >> =A0 =A0 data_pointer
> >> =A0 =A0 right_subtree
>
> >> The left subtree follows directly after this node!
>
> > This makes inserting a new node between a node and its left child
> > expensive.
> > Also, the tree is now lay out as a sorted array and the pointer to
> > the right subtree can therefore be eliminated as well.
>
> You are absolutely right, I had simply forgotten my theory. :-(
>
> The structure above, with no explicit pointers, was called a binary
> ladder (in Norwegian) maybe 25-30 years ago, and it is only really
> useful as a read-only structure.

If that's what I think it is it's OK for read-write as long as the
tree is balanced (strictly, as long as the memory reserved allows for
the maximum depth) isn't it? In either case, unless the tree is fully
populated (a power of 2 minus 1 nodes) there needs to be a way to
detect whether a node is present or absent.

James
0
Reply James 12/24/2010 12:58:15 PM

On Dec 24, 5:18=A0am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> s_dubrov...@nospicedham.yahoo.com wrote:
> > On Dec 23, 7:40 am, Terje Mathisen<"terje.mathisen at
> >> I'd use code which simply loads the next char, does a table lookup and
> >> uses the result to determine if this could be the beginning of a token=
..
>
> >> When you've found the start you use a (potentially) different table to
> >> verify the next chars.
>
> > That sounds good to me, but once I have a token, I may want to use
> > it's 'shortkey' to pattern match it to a grammar rule (as evidenced as
> > a part of a syntax tree.)
>
> Oh, absolutely!
>
> Particularly since the type of token determines how to parse the
> following text, i.e. you have to recognize each token as it occurs.
>
> Terje
>
> PS. Try to search for 'perfect hash'!
>

I did, thanks, wiki has..
http://en.wikipedia.org/wiki/Hash_function#Perfect_hashing

The books I have on programming/computer science that discuss the
subject are vague on algorithm development and testing..

> With a given, relatively small, set of keywords you can define a hash
> function that results in zero collisions, so that you can simply
> generate the hash as you read in characters, then go directly to the
> corresponding table entry.
>

I reasoned an approach to a hash function for strings, but how do I
proof that it is a perfect hash algorithm? -also at first glance, the
best goal seems to be a 'minimal perfect hash' where the hash value is
a member of a set of dense indexes, allowing direct table lookup.

Previously discussed, a 'prefix' of the first 4 characters of string
was considered as a 'hash'.

I thought briefly about summing each character in an input stream, but
collisions are obvious: IF & FI (endif in intel's pseudo code) came to
mind, as they would sum to the same value.  In thinking of 123, one
hundred twenty-three, each digit has a rank according to it position
of: (3 x 10**0) + (2 x 10**1) + (1 x 10**2).

T thought I would try to express rank of a character in a string by
bit-shift left for each ranking. - then summing in the next
character..

LODSB		;; [SI++] -> AL
add  edx, eax	;; hash function, addin new,
shl  edx, 1	;;  shl result.

... this is simple enough, so someone else has surely already done
this.

Here's the code.. the data set is the comman ansi C keywords.

;; File: ADD_SHL.NSM
;; Last: 29-Dec-10 10:41:53 PM
;; Init: 28-Dec-10 07:57:47 AM
;; Vers: 0r5
;;
;; ----=3D=3D=3D*******=3D=3D=3D----
	%INCLUDE 'PROLOG.NSM'

  [SECTION .cseg]
;/*
;** --=3D=3D Main Startup =3D=3D--
;*/
MaxLps EQU 28h

main:
	clc			;; clear carry
	cld			;; clear direction, inc.
	mov  ecx, MaxLps	;; trial maximum count
	mov  edx, 0		;; clear virtual accumulator
	mov  eax, 0		;; clear accumulator
	mov  ESI, KeyWords	;; keyword strings, input
	mov  EDI, HashNumb	;; output to hash list

myloop:
	LODSB			;; [SI++] -> AL
	cmp  al, 0		;; ident termination
	je   NextIdent
	cmp  al, 0FFh	;; ident list termination
	je   Quit

	add  edx, eax	;; hash function, addin new,
	shl  edx, 1		;;  shl result.

	jc   NextIdent
	loop myloop

NextIdent:
	mov  [di], edx	;; store hash value into table
	add  di, 4		;; index to next list position.
	mov  ecx, MaxLps
	mov  edx, 0
	mov  eax, 0
	jmp  myloop		;; restart hash

Quit:
	RET

  [SECTION .dseg]
ALIGN 16
;; 32 ansi C keywords.. plus 2 extra.
KeyWords:  db 'auto',0,'break',0,'case',0,'char',0,'const',0
	db 'continue',0,'default',0,'do',0,'double',0,'else',0
	db 'enum',0,'extern',0,'float',0,'for',0,'goto',0
	db 'if',0,'int',0,'long',0,'register',0,'return',0
	db 'short',0,'signed',0,'sizeof',0,'static',0
	db 'struct',0,'switch',0,'typedef',0,'union',0
	db 'unsigned',0,'void',0,'volatile',0,'while',0
	db '#asm',0,'#endasm',0
LmtKeyWords:  dw 0FFFFh
ALIGN 16
HashNumb: TIMES 40 dd 0
LmtHashNumb:

	%INCLUDE 'EPILOG.NSM'
; --- End of Compilation ---

Running the above in a debugger and collecting the results at
HashNumb:

KEYWORDS	Hash Value

auto     00000C66
break	000018E2
case	00000BCE
char	00000BD8

const	00001974
continue	0000D11E
default	00006450
do	0000026E

double	000033BA
else	00000C46
enum	00000C6E
extern	0000354C

float	00001964
for	000005D0
goto	00000C96
if	00000270

int	000005E8
long	00000CBE
register	0000D8A8
return	000036AC

short	00001B08
signed	0000361C
sizeof	00003730
static	0000375A

struct	0000387C
switch	0000382C
typedef	00007210
union	00001B60

unsigned	0000E21C
void	00000D44
volatile	0000E1A2
while	00001B22

#asm	000007DE
#endasm	0000446E

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
Order By Key
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

do	0000026E
if	00000270
for	000005D0
int	000005E8

#asm	000007DE
case	00000BCE
char	00000BD8
else	00000C46

auto	00000C66
enum	00000C6E
goto	00000C96
long	00000CBE

void	00000D44
break	000018E2
float	00001964
const	00001974

short	00001B08
while	00001B22
union	00001B60
double	000033BA

extern	0000354C
signed	0000361C
return	000036AC
sizeof	00003730

static	0000375A
switch	0000382C
struct	0000387C
#endasm	0000446E

default	00006450
typedef	00007210
continue	0000D11E
register	0000D8A8

volatile	0000E1A2
unsigned	0000E21C

I wanted to see what length of string could be handled without carry
out of the double word, so I used ' ' for the low value limit and '~'
for the high value limit.

24 characters of '~' can be hashed into a Double-Word *or*
26 characters of ' ' can be hashed into a Double-Word, the 27th shl
causes CF overflow.

The hash values are not dense, so a list of them requires a search on
the list for a match.  But this could be a benefit because an unknown
token is found not to be a keyword if its hash is greater than one
keyword hash but less than the next keyword hash.

But the real question is: is ADD; SHL; a true 'perfect hash' (no
collisions) algorithm over the set of 'limit length' strings?

Steve

> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

0
Reply s_dubrovich 12/31/2010 4:34:11 PM

<s_dubrovich@nospicedham.yahoo.com> wrote in message
news:94e86589-fe6a-47f3-8b79-bc88d19e467a@l32g2000yqc.googlegroups.com...
> On Dec 24, 5:18 am, Terje Mathisen <"terje.mathisen at
> tmsw.no"@giganews.com> wrote:
> >  ...

> The books I have on programming/computer science that discuss
> the subject are vague on algorithm development and testing..

That's because most hash algortihms are [really ineffective]...  Insert
typical curse words and phrases at your leisure or due to your exposure
level with hashes.

Hashes can fail in many ways.  The first, IMO, is speed.  Many are just too
slow to use.  The second, IMO, is collisions.  Almost all generate way too
many collisions.  The third issue is the type of data.  Most are "designed"
for binary data.  Yet, most hashes are used for string data.  There is far
less randomness in string data than in binary data, i.e., perform worse...
And, then, of course, are the types of failures that those experts who do
design hashes deem to be important...  Such as stuff you've never heard of
or care about, e.g., changing a single bit at a time, 3d-plots, encryption,
or "avalanche" etc.

There are a few ways to find them: 1) design, 2) trial and error, and 3)
look them up.  Design doesn't seem to work too well even for most experts.
Looking them up is a bad idea also since most [insert appropriate failure
mode curse words].  So, that leaves trial and error.  IMO, the two best
general hashes I've tested are:

Austin Appleby's MurmurHash2
Bob Jenkins' hashlittle() from lookup3.c

> > With a given, relatively small, set of keywords you can define a hash
> > function that results in zero collisions, so that you can simply
> > generate the hash as you read in characters, then go directly to the
> > corresponding table entry.
> >
>
> I reasoned an approach to a hash function for strings, but how do I
> proof that it is a perfect hash algorithm?

To be a perfect hash, it must not have any collisions, or they must be
resolvable.  Collisions occur when different inputs generate the same hash
value.  To not have any collisions or to remediate them, the entire input
set must be known in advance.  If any input is unknown, it's possible to
have a collision.  In which case, one must use some other method to
separate the inputs.

> Previously discussed, a 'prefix' of the first 4 characters of
> string was considered as a 'hash'.
>
> I thought briefly about summing each character in an input stream, but
> collisions are obvious: IF & FI (endif in intel's pseudo code) came to
> mind, as they would sum to the same value.  In thinking of 123, one
> hundred twenty-three, each digit has a rank according to it position
> of: (3 x 10**0) + (2 x 10**1) + (1 x 10**2).

It's difficult to reduce collisions.  You usually need a larger range of
output values.  Alternately, you can increase the randomness of the data.
ASCII characters and string data using them are not random enough to
generate good hashes.  One can translate ASCII characters to another
character set consisting of random data, then generate the hash from that.

> T thought I would try to express rank of a character in a string by
> bit-shift left for each ranking. - then summing in the next character..
>

What you need is a large dictionary of words or your inputs.  Feed them into
the algorithm.  Save the output.  Look at the collisions.  Change the
algorithm.  Repeat.  Again.  Again.  Again.  Again. Again.  Again.  Again.
Simple stuff doesn't work...

When you like the algorithm, you can see it catastrophically fail in
numerous ways with the test routines created by Bob Jenkins.

One of the odd things you'll come across is that by adding certain starting
values, "seeds", you'll generate better results than others.

> But the real question is: is ADD; SHL; a true 'perfect hash' (no
> collisions) algorithm over the set of 'limit length' strings?
>

Is the complete input set known?  Are the hash values assigned or computed?
If "no" and "computed", the answer is usually "no" since that situation will
generate collisions, but they can be resolved.


Rod Pemberton


0
Reply Rod 1/1/2011 6:09:13 AM

On Jan 1, 12:09=A0am, "Rod Pemberton"
<do_not_h...@nospicedham.notreplytome.cmm> wrote:
> <s_dubrov...@nospicedham.yahoo.com> wrote in message
>
> news:94e86589-fe6a-47f3-8b79-bc88d19e467a@l32g2000yqc.googlegroups.com...
>
> > On Dec 24, 5:18 am, Terje Mathisen <"terje.mathisen at
> > tmsw.no"@giganews.com> wrote:
> > > =A0...
> > The books I have on programming/computer science that discuss
> > the subject are vague on algorithm development and testing..
>
> That's because most hash algortihms are [really ineffective]... =A0Insert
> typical curse words and phrases at your leisure or due to your exposure
> level with hashes.
>
> Hashes can fail in many ways. =A0The first, IMO, is speed. =A0Many are ju=
st too
> slow to use. =A0The second, IMO, is collisions. =A0Almost all generate wa=
y too
> many collisions. =A0The third issue is the type of data. =A0Most are "des=
igned"
> for binary data. =A0Yet, most hashes are used for string data. =A0There i=
s far
> less randomness in string data than in binary data, i.e., perform worse..=
..
> And, then, of course, are the types of failures that those experts who do
> design hashes deem to be important... =A0Such as stuff you've never heard=
 of
> or care about, e.g., changing a single bit at a time, 3d-plots, encryptio=
n,
> or "avalanche" etc.
>
> There are a few ways to find them: 1) design, 2) trial and error, and 3)
> look them up. =A0Design doesn't seem to work too well even for most exper=
ts.
> Looking them up is a bad idea also since most [insert appropriate failure
> mode curse words]. =A0So, that leaves trial and error. =A0IMO, the two be=
st
> general hashes I've tested are:
>
> Austin Appleby's MurmurHash2
> Bob Jenkins' hashlittle() from lookup3.c
>
> > > With a given, relatively small, set of keywords you can define a hash
> > > function that results in zero collisions, so that you can simply
> > > generate the hash as you read in characters, then go directly to the
> > > corresponding table entry.
>
> > I reasoned an approach to a hash function for strings, but how do I
> > proof that it is a perfect hash algorithm?
>
> To be a perfect hash, it must not have any collisions, or they must be
> resolvable. =A0Collisions occur when different inputs generate the same h=
ash
> value. =A0To not have any collisions or to remediate them, the entire inp=
ut
> set must be known in advance. =A0If any input is unknown, it's possible t=
o
> have a collision. =A0In which case, one must use some other method to
> separate the inputs.
>
> > Previously discussed, a 'prefix' of the first 4 characters of
> > string was considered as a 'hash'.
>
> > I thought briefly about summing each character in an input stream, but
> > collisions are obvious: IF & FI (endif in intel's pseudo code) came to
> > mind, as they would sum to the same value. =A0In thinking of 123, one
> > hundred twenty-three, each digit has a rank according to it position
> > of: (3 x 10**0) + (2 x 10**1) + (1 x 10**2).
>
> It's difficult to reduce collisions. =A0You usually need a larger range o=
f
> output values. =A0Alternately, you can increase the randomness of the dat=
a.
> ASCII characters and string data using them are not random enough to
> generate good hashes. =A0One can translate ASCII characters to another
> character set consisting of random data, then generate the hash from that=
..
>
> > T thought I would try to express rank of a character in a string by
> > bit-shift left for each ranking. - then summing in the next character..
>
> What you need is a large dictionary of words or your inputs. =A0Feed them=
 into
> the algorithm. =A0Save the output. =A0Look at the collisions. =A0Change t=
he
> algorithm. =A0Repeat. =A0Again. =A0Again. =A0Again. =A0Again. Again. =A0A=
gain. =A0Again.
> Simple stuff doesn't work...
>
> When you like the algorithm, you can see it catastrophically fail in
> numerous ways with the test routines created by Bob Jenkins.
>
> One of the odd things you'll come across is that by adding certain starti=
ng
> values, "seeds", you'll generate better results than others.
>
> > But the real question is: is ADD; SHL; a true 'perfect hash' (no
> > collisions) algorithm over the set of 'limit length' strings?
>
> Is the complete input set known? =A0Are the hash values assigned or compu=
ted?
> If "no" and "computed", the answer is usually "no" since that situation w=
ill
> generate collisions, but they can be resolved.
>
> Rod Pemberton

Well, I don't need uniform distribution of hash values.  The important
property for what I am doing is just to be collision free.  And I'm
not confident that the method I used gives a perfect hash either, so
I'm left with brute force checking against a dictionary, as you
suggested as an option.  I was thinking I could infer by induction
that for the set of unique character pairs, that if no collisions were
found, then the algorithm produced a 'perfect hash'.  However, it may
be that a string of three low value characters collide with a string
of two higher value characters.

Steve
0
Reply s_dubrovich 1/2/2011 4:55:31 AM

s_dubrovich@nospicedham.yahoo.com wrote:
> Well, I don't need uniform distribution of hash values.  The important
> property for what I am doing is just to be collision free.  And I'm
> not confident that the method I used gives a perfect hash either, so
> I'm left with brute force checking against a dictionary, as you
> suggested as an option.  I was thinking I could infer by induction
> that for the set of unique character pairs, that if no collisions were
> found, then the algorithm produced a 'perfect hash'.  However, it may
> be that a string of three low value characters collide with a string
> of two higher value characters.

I'm sorry I sent you off on a side track:

For a parser/compiler you really cannot depend on a perfect hash, since 
you'll have to detect/handle any form of mis-spelling as well!

I.e. you can use a relatively small/fast hash to get to the right 
bucket, then do full string compares (preferably in full word chunks) to 
verify which actual key word you've located.

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 1/2/2011 9:51:41 AM

Steve in discussion about Hash:
....
|Well, I don't need uniform distribution of hash values.  The important
|property for what I am doing is just to be collision free.  And I'm
|not confident that the method I used gives a perfect hash either, so
|I'm left with brute force checking against a dictionary, as you
|suggested as an option.  I was thinking I could infer by induction
|that for the set of unique character pairs, that if no collisions were
|found, then the algorithm produced a 'perfect hash'.  However, it may
|be that a string of three low value characters collide with a string
|of two higher value characters.

With the posted set of ASCII-strings you may not even need any
algo to differentiate between these literals :)
Because first four byte are enough anyway. So why detour with
shift-add or whatsoever when there are no words occure that
have first 32-bits all equal ?

I always wonder why or what any hash-algo could gain on speed
or performance. I use sorted lists and seem to be on a faster
track in terms of searching (better call it 'find fast').
Of course there is some overhaed with inserts and resorting,
but this usually wont be needed too often in most case.

Sure any fixed lookup by index is faster, but I use this only
on a fixed set of strings like error-messages by err-numbers.
And for things like this there is no hash algo required at all.

__
wolfgang


0
Reply wolfgang 1/2/2011 4:48:09 PM

On Jan 2, 3:51=A0am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> s_dubrov...@nospicedham.yahoo.com wrote:
> > Well, I don't need uniform distribution of hash values. =A0The importan=
t
> > property for what I am doing is just to be collision free. =A0And I'm
> > not confident that the method I used gives a perfect hash either, so
> > I'm left with brute force checking against a dictionary, as you
> > suggested as an option. =A0I was thinking I could infer by induction
> > that for the set of unique character pairs, that if no collisions were
> > found, then the algorithm produced a 'perfect hash'. =A0However, it may
> > be that a string of three low value characters collide with a string
> > of two higher value characters.
>
> I'm sorry I sent you off on a side track:

Don't be sorry, the whole of life is a side track and a compromise.  I
enjoyed the jaunt.

>
> For a parser/compiler you really cannot depend on a perfect hash, since
> you'll have to detect/handle any form of mis-spelling as well!
>
> I.e. you can use a relatively small/fast hash to get to the right
> bucket, then do full string compares (preferably in full word chunks) to
> verify which actual key word you've located.
>

Yes, although Wolfgang makes his point.

I was leaning in your direction, if a hash is used that is not proven
to be 'perfect', because that should handle collisions.

Steve

> Terje
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

0
Reply s_dubrovich 1/2/2011 8:37:41 PM

On Jan 2, 10:48=A0am, "wolfgang kern" <nowh...@never.at> wrote:
> Steve in discussion about Hash:
> ...
> |Well, I don't need uniform distribution of hash values. =A0The important
> |property for what I am doing is just to be collision free. =A0And I'm
> |not confident that the method I used gives a perfect hash either, so
> |I'm left with brute force checking against a dictionary, as you
> |suggested as an option. =A0I was thinking I could infer by induction
> |that for the set of unique character pairs, that if no collisions were
> |found, then the algorithm produced a 'perfect hash'. =A0However, it may
> |be that a string of three low value characters collide with a string
> |of two higher value characters.
>
> With the posted set of ASCII-strings you may not even need any
> algo to differentiate between these literals :)
> Because first four byte are enough anyway. So why detour with
> shift-add or whatsoever when there are no words occure that
> have first 32-bits all equal ?
>

Well because the test data set, ansi-c keywords, may be substituted if
the method is to be generic.

> I always wonder why or what any hash-algo could gain on speed
> or performance. I use sorted lists and seem to be on a faster
> track in terms of searching (better call it 'find fast').
> Of course there is some overhaed with inserts and resorting,
> but this usually wont be needed too often in most case.
>
> Sure any fixed lookup by index is faster, but I use this only
> on a fixed set of strings like error-messages by err-numbers.
> And for things like this there is no hash algo required at all.
>

No, not required here, but perhaps kept in mind for (or against) in
another situation.

Steve

> __
> wolfgang

0
Reply s_dubrovich 1/2/2011 8:45:33 PM

Il 02.01.2011 10:51, Terje Mathisen ha scritto:
> s_dubrovich@nospicedham.yahoo.com wrote:
>> Well, I don't need uniform distribution of hash values. The important
>> property for what I am doing is just to be collision free. And I'm
>> not confident that the method I used gives a perfect hash either, so
>> I'm left with brute force checking against a dictionary, as you
>> suggested as an option. I was thinking I could infer by induction
>> that for the set of unique character pairs, that if no collisions were
>> found, then the algorithm produced a 'perfect hash'. However, it may
>> be that a string of three low value characters collide with a string
>> of two higher value characters.
>
> I'm sorry I sent you off on a side track:
>
> For a parser/compiler you really cannot depend on a perfect hash, since
> you'll have to detect/handle any form of mis-spelling as well!
>
> I.e. you can use a relatively small/fast hash to get to the right
> bucket, then do full string compares (preferably in full word chunks) to
> verify which actual key word you've located.
>
> Terje

If you want a  good "collision-free" hash than use a simple one: the 
*sdbm* in my tests (compiler numbered/labels) wins on bernstein djb2.

Once i was a fan of djb2, but now, after some casual lucky :-) 
sdbm-tuning i use only the latter.
sdbm works good and speedly, with or without tuning.

Algos:
http://www.cse.yorku.ca/~oz/hash.html

/for the record/
Numbers from my personal tunings:
100% collision free on 4.000.000 compiler-like labels
100% collision-free on 47000~ compounded directory names
40 duplicated on 626182 english dictionary words.

1) there are alot of methods; creativity is the limit. as example,
    using 2 or more functions, *adapting* the Cuckoo hashing may be an
    idea. see  http://en.wikipedia.org/wiki/Cuckoo_hashing

2) you dont need necessairly a "uniform distribution";
    what the practical meanining of u.d. related to your program ?

3) you dont need a perfect hash;
    a simple, clever structured lookup-table could give you in few (a
    dozen,2 dozens) comparisons all keywords you need. (some thousands)

4) avoid using the known hash functions (Murmur,Jenkins,etc);
    all of these start from the point of view of the good
    uniform distribution. It seems a religious song for the prayers:
    "Every hasheth function must do have the good old
    distribution...blah,blah.."

5) dont trust FNV without or *with* biasing/ajustment. it doesnt work!
    see, http://home.comcast.net/~bretm/hash/6.html



-- 

..::mrk::.
  x64 Assembly Lab
  http://sites.google.com/site/x64lab
0
Reply hopcode 1/2/2011 11:03:07 PM

Il 02.01.2011 21:45, s_dubrovich@nospicedham.yahoo.com ha scritto:
> On Jan 2, 10:48 am, "wolfgang kern"<nowh...@never.at>  wrote:
>> Steve in discussion about Hash:
>> ...
>>
>> With the posted set of ASCII-strings you may not even need any
>> algo to differentiate between these literals :)
>> Because first four byte are enough anyway. So why detour with
>> shift-add or whatsoever when there are no words occure that
>> have first 32-bits all equal ?
>>
>
> Well because the test data set, ansi-c keywords, may be substituted if
> the method is to be generic.
>
> Steve
>
>> __
>> wolfgang
>

A tip may sound this way: do not store ASCII-keywords-strings in your
compiler at all. Store special pre-calculated values. During tokenizing
you read already things one-by-one. This give the tokenizer a powerful
separation/abstraction/modular/pluggable capability on the used language 
(of your keywords etc.).
I hope it helps, ;-)

Cheers,

-- 

..::mrk::.
  x64 Assembly Lab
  http://sites.google.com/site/x64lab
0
Reply hopcode 1/2/2011 11:21:36 PM

On Jan 2, 5:21=A0pm, hopcode <hopc...@nospicedham.nullnichtsnada.com>
wrote:
> Il 02.01.2011 21:45, s_dubrov...@nospicedham.yahoo.com ha scritto:
>
>
>
>
>
> > On Jan 2, 10:48 am, "wolfgang kern"<nowh...@never.at> =A0wrote:
> >> Steve in discussion about Hash:
> >> ...
>
> >> With the posted set of ASCII-strings you may not even need any
> >> algo to differentiate between these literals :)
> >> Because first four byte are enough anyway. So why detour with
> >> shift-add or whatsoever when there are no words occure that
> >> have first 32-bits all equal ?
>
> > Well because the test data set, ansi-c keywords, may be substituted if
> > the method is to be generic.
>
> > Steve
>
> >> __
> >> wolfgang
>
> A tip may sound this way: do not store ASCII-keywords-strings in your
> compiler at all. Store special pre-calculated values. During tokenizing
> you read already things one-by-one. This give the tokenizer a powerful
> separation/abstraction/modular/pluggable capability on the used language
> (of your keywords etc.).
> I hope it helps, ;-)
>

Sure another idea.
And thanks for the links above, interesting.

thanks again to all,

Steve


> Cheers,
>
> --
>
> .::mrk::.
> =A0 x64 Assembly Lab
> =A0http://sites.google.com/site/x64lab- Hide quoted text -
>
> - Show quoted text -

0
Reply s_dubrovich 1/3/2011 10:16:26 PM

107 Replies
205 Views

(page loaded in 1.648 seconds)

Similiar Articles:


















7/23/2012 7:20:50 PM


Reply: