What causes this infinite loop?

I hope y'all don't mind my frequent questions, so here's another...

In emu8086 I wanted to see what would happen if I left some data on the
stack and exited the program with the following:

;begin
mov ax, 0ffh ; Is it a good practice to start all hex numbers with a zero?
push ax
;pop ax ; If I pop ax, the program exits and all appears aok.
ret
end
;end

I thought the program would stop when IP reached ffffh, but it started
over at 0 and kept chugging along. The emulator jumps to "ret" and
starts executing, "ADD [BX + SI], AL". The memory locations start
incrementing filling with NULLs that look like this, "0E711: 00 000
NULL" (memory hex decimal ascii). BX and SI are both "0000", so the
code is trying to put ffh into offset zero. IIRC, each zero offset of
every assembly program I have run has always been NULL (not sure why,
but it's probably important). So, since the code is trying to put a
value into what is probably an important memory location, there is a
disturbance in the force.

If I "pop ax" after the push, the program exits at "10F00: F4 244
<ascii character>" (memory hex decimal ascii) with the next memory
location being "10F01: 00 000 NULL" (memory hex decimal ascii) and
associated with the instruction "ADD [BX + SI], AL".

Is it always the case that a program will "blow up" if something is
left on the stack?
--
Sam

Reply spamtrap2 (1628) 8/8/2006 5:14:47 PM

Sam wrote:
> I hope y'all don't mind my frequent questions, so here's another...
> 
> In emu8086 I wanted to see what would happen if I left some data on the
> stack and exited the program with the following:

What environment are you running this code in? At what address is it 
loaded and what calls it? This determines the meaning of "exiting" the 
program.

> ;begin
> mov ax, 0ffh ; Is it a good practice to start all hex numbers with a zero?
> push ax
> ;pop ax ; If I pop ax, the program exits and all appears aok.
> ret
> end
> ;end

The ret instruction simply pops one word off the top of the stack and 
jumps to that address, so in this example it jumps to offset 0FFh in the 
current code segment when it reaches the ret instruction.

> IIRC, each zero offset of
> every assembly program I have run has always been NULL (not sure why,
> but it's probably important). So, since the code is trying to put a
> value into what is probably an important memory location, there is a
> disturbance in the force.

What is at offset 0 depends on what else besides your code is running. 
If you start the program from MSDOS, for instance, offset 0 is the start 
of the Program Segment Prefix, whose first two bytes are CD 20 hex, an 
int 20h instruction that terminates the process. The PSP is 256 bytes 
long, which is why ORG is always 100 hex in MSDOS .com programs.

If your code is the only code running, you decide what is at offset 0. 
However, trying to return from a piece of code with ret does not make 
sense if that code was not called from somewhere.

> Is it always the case that a program will "blow up" if something is
> left on the stack?

As stated, the ret instruction simply jumps to whatever address is on 
top of the stack. If this is not really an address but some forgotten 
data, the program will jump to somewhere unintended and likely blow up, yes.


Bjarni
-- 

                        INFORMATION WANTS TO BE FREE

Reply Bjarni 8/8/2006 8:54:04 PM


Sam wrote:
> I hope y'all don't mind my frequent questions, so here's another...
> 
> In emu8086 I wanted to see what would happen if I left some data on the
> stack and exited the program with the following:
> 
> ;begin
> mov ax, 0ffh ; Is it a good practice to start all hex numbers with a zero?

With most assemblers, "It's not just a good idea, it's the law!" 
Usually, an "identifier" must not start with a decimal digit, a number 
must start with a decimal digit. If the "hex number" starts with a 
decimal digit, you don't have to add a zero. There isn't much point to 
"mov ah, 09h" :)

cmp eax, 0DEADh
jz dead
dead:

Without the zero, how would the assembler know which was the number and 
which was the label?

Nasm will let you indicate hex numbers three ways, 0FFFFh, 0xFFFF, and 
$0FFFF (note that the "$" notation still needs the zero, 'cause of other 
ways Nasm uses "$"). A leading zero does *not* indicate octal, as in C, 
Nasm does octal as 777q. Case is optional - the uppercase hex digits and 
lowercase 'h' is my preference... Starting with "0x" is compatible with 
C, and that might be an advantage... Not all assemblers use the same 
syntax...
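
A minimal sketch of those spellings side by side, assuming Nasm and a flat 
16-bit binary (built with something like "nasm -f bin"); the choice of 
registers is arbitrary and only there for illustration:

```nasm
; Sketch only: assumes NASM syntax and a DOS .com-style flat binary.
bits 16
org 100h

    mov ax, 0FFFFh      ; "h" suffix: leading zero needed, F is a letter
    mov bx, 0xFFFF      ; C-style "0x" prefix, same value
    mov cx, $0FFFF      ; "$" prefix, leading zero still required
    mov dx, 777q        ; octal via "q" suffix (511 decimal), not "0777"
    mov ah, 9h          ; already starts with a digit, no extra zero needed
    int 20h             ; exit to DOS
```

Other assemblers may reject some of these forms, so check the manual for 
the one you actually use.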

> push ax
> ;pop ax ; If I pop ax, the program exits and all appears aok.
> ret

Oh yes, your question! :) If you've got data on the stack, you aren't 
exiting the program. The "ret" instruction transfers control to the 
address on the stack, and removes that item - much like "pop ip", if 
there were such an instruction. Ordinarily, that return address would 
have been put on the stack by the "call" instruction... Equivalent to:

push ret_add
jmp subroutine
ret_add:

In the case of a .com file, dos has pushed a zero on the stack when it 
loads the file. At cs:0000, you'll find - as part of the "Program 
Segment Prefix" that dos created - "CD 20"(hex) - int 20h, the real 
"exit to system" interrupt. Note that this is at the bottom of your 
segment, not the bottom of memory. At 0000:0000 you'll find the 
"Interrupt Vector Table" - the list of addresses for the interrupts - 
not just the interrupts *you* can use, but hardware interrupts like the 
timer that interrupts 18.2 times per second, and the keyboard interrupt 
(on key press *and* key release). Scribbling in this area is bad! :)
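
As a rough sketch of why the balanced version exits cleanly, assuming a 
real DOS .com program assembled with Nasm (an emulator may set things up 
differently):

```nasm
; Sketch only: assumes DOS has loaded this as a .com file, so the stack
; starts with the word 0000h and cs:0000 holds CD 20 (int 20h).
bits 16
org 100h                ; the 256-byte PSP occupies offsets 0..0FFh

start:
    mov  ax, 0FFh
    push ax             ; stack (top first): 00FFh, 0000h
    pop  ax             ; stack (top first): 0000h
    ret                 ; pops 0000h into IP, runs the int 20h at cs:0000
```

Comment out the "pop ax" and the "ret" pops 00FFh instead, so execution 
continues at cs:00FFh, the last byte of the PSP, which was never meant to 
be code.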

.....
> IIRC, each zero offset of
> every assembly program I have run has always been NULL (not sure why,
> but it's probably important)

That's weird. I'd expect CD 20 .. .. .. Well... this is in an emulator, 
right?

> So, since the code is trying to put a
> value into what is probably an important memory location, there is a
> disturbance in the force.

That's one way to put it! :)

> If I "pop ax" after the push, the program exits at "10F00: F4 244
> <ascii character>" (memory hex decimal ascii) with the next memory
> location being "10F01: 00 000 NULL" (memory hex decimal ascii) and
> associated with the instruction "ADD [BX + SI], AL".

Yeah... when you see that, you're "executing zeros". Generally an 
indication that your program's run off to somewhere you didn't intend.

> Is it always the case that a program will "blow up" if something is
> left on the stack?

No, you could "int 20h" or "mov ah, 4Ch"/"int 21h" with any kind of 
garbage on the stack. Or you could put something on the stack that was a 
valid address to real code... The problem is that "ret" uses whatever it 
finds on the stack as the address to "ret" to. The CPU will execute 
whatever it finds there - if it can - whether it was intended to be code 
or not!
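
For instance, a sketch like this (again assuming Nasm and a DOS .com 
file) leaves a word on the stack and still exits normally, because 
nothing ever uses that word as a return address:

```nasm
; Sketch only: assumes a DOS .com program; the pushed value is junk on purpose.
bits 16
org 100h

    mov  ax, 0FFh
    push ax             ; left on the stack deliberately
    mov  ax, 4C00h      ; AH = 4Ch (terminate process), AL = 0 (return code)
    int  21h            ; DOS ends the program; no "ret" is involved
```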

Best,
Frank

Reply Frank 8/9/2006 9:00:21 AM

Frank Kotler <spamtrap@crayne.org> wrote in part:
> Sam wrote:
>> mov ax, 0ffh ; Is it a good practice to start all hex numbers with a zero?
> 
> With most assemblers, "It's not just a good idea, it's the law!" 
> Usually, an "identifier" must not start with a decimal digit, a number 
> must start with a decimal digit. If the "hex number" starts with a 
> decimal digit, you don't have to add a zero. There isn't much point to 
> "mov ah, 09h" :)

With some old-time assemblers, if you use a leading zero, octal
notation is assumed.  They then usually use 0x to indicate hex.

-- Robert

Reply Robert 8/9/2006 1:30:56 PM

Sam <spamtrap@crayne.org> wrote in part:
> Is it always the case that a program will "blow up" if
> something is left on the stack?  -- Sam

This is entirely OS dependent, and the pgm entry and exit
sections match the OS requirements.  For simple MS-DOS *.COM
pgms, the loader pushes 0000h onto the stack, and loads the
PSP with `int 20h` at CS:0 so the pgm can be conveniently
terminated with `ret` if the stack is kept balanced.  But notice
the `ret` doesn't terminate anything even under this most
rudimentary of OSes.  `int 20h` does the work.

This is generally true for more advanced [modern] OSes.
There is a syscall for terminating a pgm.  This does
important things like close files and free memory.

In general, the stack can be left with garbage so
long as `ret` is not used.  Some people use no stack,
often to use [E]SP as a general purpose register.

-- Robert

Reply Robert 8/9/2006 1:45:12 PM

In article <NIOdneYyOeh3OUTZnZ2dnUVZ_tKdnZ2d@comcast.com>, 
spamtrap@crayne.org says...

[ ... ]

> With most assemblers, "It's not just a good idea, it's the law!" 
> Usually, an "identifier" must not start with a decimal digit, a number 
> must start with a decimal digit. If the "hex number" starts with a 
> decimal digit, you don't have to add a zero. There isn't much point to 
> "mov ah, 09h" :)

I'm not at all sure that's true of most assemblers. Most of the 
assemblers I've used on non-Intel platforms have used what I'd consider 
more sensible rules. There's virtually nothing sensible about using a 
suffix to signal a number's base -- a prefix makes far more sense. It 
makes the code easier to read and the assembler easier to write.

In a typical case, a hexadecimal number is signalled by a '$' as a 
prefix:

lda #14	; loads accumulator with 14 decimal
lda #$14	; loads accumulator with 14 hexadecimal

Simple, straightforward, and no confusion. Oh, in case you wondered, 
those same assemblers typically signal an immediate value with '#':

lda #14	; the value 14
lda 14		; the value from address 14
lda (14)	; the value pointed to by address 14
		; some use '[14]' or '*14' instead.

Since assemblers for Intel platforms generally don't require anything to 
signal an immediate value, there's nearly constant confusion in this 
area. Specifically, there are three distinct cases, but they provide 
only two notations to differentiate between them.

Some assemblers (e.g. NASM) have consistent rules, but from the number 
of times questions about this arise here, it's pretty clear that most 
people don't quite understand them. Even when you do understand them, 
it's annoying to use (IMO, obviously).

MASM goes the opposite direction -- it attempts to be less annoying to 
use (mostly successfully, IMO), but its rules about what means what are 
so convoluted that I doubt even its authors are sure of them.

-- 
    Later,
    Jerry.

The universe is a figment of its own imagination.

Reply Jerry 8/9/2006 3:03:06 PM

Jerry Coffin wrote:
> In article <NIOdneYyOeh3OUTZnZ2dnUVZ_tKdnZ2d@comcast.com>, 
> spamtrap@crayne.org says...
> 
> [ ... ]
> 
> 
>>With most assemblers, "It's not just a good idea, it's the law!" 
>>Usually, an "identifier" must not start with a decimal digit, a number 
>>must start with a decimal digit. If the "hex number" starts with a 
>>decimal digit, you don't have to add a zero. There isn't much point to 
>>"mov ah, 09h" :)
> 
> 
> I'm not at all sure that's true of most assemblers. Most of the 
> assemblers I've used on non-Intel platforms have used what I'd consider 
> more sensible rules. There's virtually nothing sensible about using a 
> suffix to signal a number's base -- a prefix makes far more sense. It 
> makes the code easier to read and the assembler easier to write.

Okay, I'm not familiar with that many assemblers - especially non-x86. I 
go back far enough that "x" used to mean "times" - multiplication - and 
whenever I see "zero times ???" my brain goes "BZZZZT", so I really 
don't like the "0x" notation. I'm gradually getting used to it, but it's 
taking a long time! :)

> In a typical case, a hexadecimal number is signalled by a '$' as a 
> prefix:
> 
> lda #14	; loads accumulator with 14 decimal
> lda #$14	; loads accumulator with 14 hexadecimal
> 
> Simple, straightforward, and no confusion. Oh, in case you wondered, 
> those same assemblers typically signal an immediate value with '#':
> 
> lda #14	; the value 14
> lda 14		; the value from address 14
> lda (14)	; the value pointed to by address 14
> 		; some use '[14]' or '*14' instead.
> 
> Since assemblers for Intel platforms generally don't require anything to 
> signal an immediate value,

..... but many do...

> there's nearly constant confusion in this 
> area. Specifically, there are three distinct cases, but they provide 
> only two notations to differentiate between them.
> 
> Some assemblers (e.g. NASM) have consistent rules, but from the number 
> of times questions about this arise here, it's pretty clear that most 
> people don't quite understand them. Even when you do understand them, 
> it's annoying to use (IMO, obviously).

Not so obvious to me... Seems quite unintuitive to require some "marker" 
to indicate an immediate. The "[mem]" notation visually suggests 
"contents" to me. What could be simpler? Obviously, preferences differ - 
different assemblers don't do it differently by accident!

Best,
Frank

Reply Frank 8/9/2006 7:05:31 PM
