Disassembler questions

Hello,
  I have recently been trying to see how some simple C programs translate
  into assembly, but my efforts have led to endless segfaults after
  reassembling them.

So I started with a simple hello world program:
	#include <stdio.h>

	int main(void){
		printf("Hello, World!\n");
		return 0;
	}
 To my surprise, "ndisasm -b 32" returned an asm file that was 30000+
 lines long.
 I thought that couldn't be right, so I wrote the same simple hello world
 with nasm:
	section .data
		msg db "Hello, World!","$"
		len equ $ - msg
	section .code
	global _start
	
	_start:
		mov edx,len
		mov ecx,msg
		mov ebx,1	
		mov eax,4
		int 21h
	
	mov eax,1
	mov ebx,0
	int 21h

This also returned code several thousand lines long (mostly 'add [eax], al').

Anyway, I was wondering if there is a disassembler whose output, once
reassembled, works like the original program.

Also, I get stuff that doesn't work right, like 'loopne 0x154f38'.

I also tried udis, but that worked less well than ndisasm.

-dylan

Reply thiesd 7/26/2008 5:35:39 PM

"thiesd" <spamtrap@crayne.org> wrote in message 
news:g6fn9h$f6o$1@misc-cct.server.rpi.edu...
> Hello,
>  I have been recently trying to see how some simple c programs translate
>  into assembly but my efforts have led to endless segfaults after
>  reassembling them
>
> so i started with a simple hello world program
> int main(){
> printf("Hello, World!\n");
> }
> which to my surprise "ndisasm -b 32" returned a asm file that was 30000+
> lines long

Depending on the compiler, it will most likely add all of the startup code,
including the PE header (for Windows), etc.  Also, see below for more.

> so i thought hmm that wasn't right
> so i did the simple hello world  with nasm
> section .data
> msg db "Hello, World!","$"
> len equ $ - msg
> section .code
> global _start
>
> _start:
> mov edx,len
> mov ecx,msg
> mov ebx,1
> mov eax,4
> int 21h
>
> mov eax,1
> mov ebx,0
> int 21h

The code looks like Linux, but the int 21h looks like DOS.
I haven't done any Linux programming, so I may be wrong here,
but shouldn't the int 21h be int 80h?

> also returned code with several thousands of line (mostly 'add [eax], al')

If you will notice, the binary form for  add [eax],al is 00 00.

Most likely, nasm is adding padding to the end of the section.  I don't
know much about nasm, but this would be my guess.  Now, if a section
is 4096 bytes, and it takes 2 bytes per add [eax],al, then you would
have 2048 lines.  Not several thousand, but a couple thousand nonetheless.
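
To sanity-check that arithmetic, here is a small Python sketch. The helper
name padding_as_add_lines is made up for illustration, and it assumes the
padding is a run of zero bytes at the end of the section and that the
disassembler decodes every 00 00 pair as one add [eax],al.

```python
def padding_as_add_lines(section: bytes) -> int:
    """Estimate how many 'add [eax],al' lines trailing zero padding produces."""
    # Everything after the last non-zero byte is treated as padding.
    used = section.rstrip(b"\x00")
    zero_bytes = len(section) - len(used)
    # Each 'add [eax],al' consumes the two bytes 00 00.
    return zero_bytes // 2


if __name__ == "__main__":
    # A hypothetical 4096-byte section that contains nothing but zeros.
    print(padding_as_add_lines(bytes(4096)))
```

For an all-zero 4096-byte section the helper returns 2048; real code at the
front of the section lowers the count accordingly.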

> any way i was wondering if there was a disassembler that if you reassemble
> it it works like the first program

A disassembler simply takes the binary bytes and translates them
to the corresponding mnemonics.  It doesn't know the difference between
data or code.

From your post, I get that you want a "smart" disassembler.  There are
a few debuggers that do a better job, but the better the job, the more
expensive the tool.

> also i get stuff that doesn't work right like 'loopne 0x154f38'

What is wrong with  loopne 0x154f38 ?

It is the same as loopnz 0x154f38, which decrements ecx and then jumps
if ecx is still nonzero and the zero flag is clear.
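
For reference, the behaviour of loopne/loopnz can be modelled in a few lines
of Python. This is only a sketch of the 32-bit semantics, and the function
name loopne_step is hypothetical: ecx is decremented first, and the jump is
taken only when the new ecx is nonzero and ZF is clear. One thing that can
go wrong when reassembling: loopne encodes its target as a signed 8-bit
displacement, so a target like 0x154f38 only assembles if the instruction
itself ends up within roughly 128 bytes of that address.

```python
def loopne_step(ecx: int, zf: bool):
    """Model one execution of loopne in 32-bit mode.

    Returns (new_ecx, jump_taken).
    """
    # The counter is decremented first, wrapping around at 32 bits.
    new_ecx = (ecx - 1) & 0xFFFFFFFF
    # The branch is taken only if the counter is nonzero and ZF is clear.
    taken = new_ecx != 0 and not zf
    return new_ecx, taken


if __name__ == "__main__":
    print(loopne_step(2, False))
    print(loopne_step(1, False))
```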

> i also tried udis but that worked less than ndisasm

I have never tried udis, so I have no comment here.  Maybe someone
else has and can comment.

Ben

Reply Benjamin 7/26/2008 7:58:07 PM

On 26-Jul-2008, "Benjamin David Lunt"  <spamtrap@crayne.org> wrote:

> > section .data
> > msg db "Hello, World!","$"
> > len equ $ - msg
> > section .code
> > global _start
> >
> > _start:
> > mov edx,len
> > mov ecx,msg
> > mov ebx,1
> > mov eax,4
> > int 21h
> >
> > mov eax,1
> > mov ebx,0
> > int 21h
>
> The code looks like Linux, but the int 21h looks like DOS.
> I haven't done any Linux programming, so I may be wrong here,
> but shouldn't the int 21h be int 80h?

That is DOS code ^_^ but yeah, you are right: to work in Linux it needs to be
80h (I've tried on both OSes).

I have been messing around with different methods and found that objdump (I
think it is a Linux tool, but it also comes with DJGPP) gives the best results
so far, although its output requires some cleanup.
I can figure out the trivial stuff like "test.o: \n\n\t Disassembly of
`_main':", but there is some syntax that I cannot understand, such
as "call   4004e0 <_start+0x90>". Well, I understand it, but I don't know how
to make it assemble correctly.

Reply Dylan 7/26/2008 8:33:22 PM

thiesd wrote:
> Hello,
>   I have been recently trying to see how some simple c programs translate
>   into assembly but my efforts have led to endless segfaults after
>   reassembling them
> 
> so i started with a simple hello world program
> 	int main(){
> 		printf("Hello, World!\n");
> 	}
>  which to my surprise "ndisasm -b 32" returned a asm file that was 30000+
>  lines long
>  so i thought hmm that wasn't right

:)

Linked with the "--static" switch? Well, anyway, there's a lot of 
"cruft"  in a C-generated file (IME). Giving ndisasm a long and 
complicated command line may help (RTFM)... "gcc -S" or "objdump -d" may 
produce more useful results (although not in Nasm syntax).

>  so i did the simple hello world  with nasm
> 	section .data
> 		msg db "Hello, World!","$"
> 		len equ $ - msg
> 	section .code
> 	global _start
> 	
> 	_start:
> 		mov edx,len
> 		mov ecx,msg
> 		mov ebx,1	
> 		mov eax,4
> 		int 21h

Say what???

> 	mov eax,1
> 	mov ebx,0
> 	int 21h

I suppose this is a "posto", and the real file is int 80h? If not... I 
have *no* idea! :)

> also returned code with several thousands of line (mostly 'add [eax], al')

Lots of zero-padding... (giving ld the "-s" switch, or better yet, 
"strip -R.comment myfile" will help some)

> any way i was wondering if there was a disassembler that if you reassemble
> it it works like the first program

Try Jeff Owens' "asmsrc":

http://linuxasmtools.net/

No promises - a "perfect" disassembly (of "any arbitrary file") is 
theoretically "impossible", but at least "asmsrc" is intended to do what 
you want. It *does* work (imperfectly) on your example file (with the 
int 21h's "promoted" to int 80h)...

;Input file: hw.src

(no, Jeff, the input file was "hw", *this* file is "hw.src"... Close 
enough for asm :)

;Dynamic Libraries found: no
;Lib startup code wrapper found: no
;Symbol table found: yes
;Debug symbols found: no

;static load file
; Compile with:  nasm -felf xxxx.asm -o xxxx.o
;                ld xxxx.o -o xxxx
; (xxxx = filename)


  global _start

   [section .text]
_start:
    mov        eax,04H
    mov        ebx,01H
    mov        ecx,msg
    mov        edx,0DH
    int        byte 080H
    mov        eax,01H
    int        byte 080H
msg:
    dec        eax
    db "ello, world"
    db 00Ah


Dunno where the "dec eax" came from - asmsrc *does* seem to understand 
that it's doing data (my error - I put "msg" in .text - works right with 
it in .data)... Works as intended anyway. Good Luck!!!
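
A plausible explanation for the stray "dec eax": in 32-bit code the single
byte 0x48 is the opcode for dec eax, and 0x48 is also ASCII 'H'. With msg
sitting in .text, the first byte of the string was presumably decoded as an
instruction before the tool switched to treating the rest as data. The
Python sketch below shows the overlap; the names are made up for
illustration, and it only covers the one-byte inc/dec/push/pop opcodes in
the range 0x40-0x5F.

```python
# 32-bit register order used by the one-byte inc/dec/push/pop opcodes.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def ascii_as_opcode(byte: int):
    """Return the IA-32 instruction a byte in 0x40-0x5F decodes to, else None."""
    for base, mnemonic in ((0x40, "inc"), (0x48, "dec"), (0x50, "push"), (0x58, "pop")):
        if base <= byte < base + 8:
            return f"{mnemonic} {REGS[byte - base]}"
    return None


if __name__ == "__main__":
    # Show how each character of the string would look to a naive decoder.
    for ch in "Hello":
        print(repr(ch), hex(ord(ch)), ascii_as_opcode(ord(ch)))
```

The entry for 'H' (0x48) comes back as "dec eax", which matches the line in
the listing above.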

Best,
Frank

Reply Frank 7/26/2008 8:46:50 PM

On Sat, 26 Jul 2008 12:58:07 -0700
"Benjamin David Lunt"  <spamtrap@crayne.org> wrote:

> The code looks like Linux, but the int 21h looks like DOS.
> I haven't done any Linux programming, so I may be wrong here,
> but shouldn't the int 21h be int 80h?

Indeed it should.

> Most likely, nasm is adding padding to the end of the section.  I
> don't know much about nasm, but this would be my guess

NASM itself adds an ELF header before the user code, and a lot of
system information following it, but it does not pad out the section(s).
However, if one is disassembling the executable module, then one must
also consider what the linker has done.

-- 
Chuck 
http://www.pacificsites.com/~ccrayne/charles.html

Reply Charles 7/26/2008 8:58:30 PM

On Sat, 26 Jul 2008 17:35:39 GMT
thiesd <spamtrap@crayne.org> wrote:

> any way i was wondering if there was a disassembler that if you
> reassemble it it works like the first program

The NASM disassembler is designed for files in binary format, and needs
some guidance from the user about where the user code begins and ends
in other formats. As it appears that you are using Linux, read the man
pages for readelf and objdump.
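
As a rough illustration of the "guidance" involved, the Python sketch below
reads a little-endian ELF32 header and works out where the entry point sits
in the file. The function name is hypothetical and the code handles only
the simple case of an entry point inside a PT_LOAD segment; readelf reports
the same information. The two numbers it returns are the kind of values one
would give to ndisasm's -o (origin) and -e (header bytes to skip) options.

```python
import struct

PT_LOAD = 1  # program header type for a loadable segment


def entry_file_offset(elf: bytes):
    """Return (entry_vaddr, file_offset_of_entry) for a little-endian ELF32 image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("not a little-endian ELF32 file")
    # e_entry and e_phoff live at offsets 24 and 28 of the ELF32 header.
    e_entry, e_phoff = struct.unpack_from("<II", elf, 24)
    # e_phentsize and e_phnum live at offsets 42 and 44.
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 42)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # First five fields of an ELF32 program header.
        p_type, p_offset, p_vaddr, _p_paddr, p_filesz = struct.unpack_from(
            "<IIIII", elf, base
        )
        if p_type == PT_LOAD and p_vaddr <= e_entry < p_vaddr + p_filesz:
            return e_entry, p_offset + (e_entry - p_vaddr)
    raise ValueError("entry point is not inside any PT_LOAD segment")
```

Everything after the entry point still gets decoded as code, so data that
follows the instructions will show up as nonsense instructions.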

-- 
Chuck 
http://www.pacificsites.com/~ccrayne/charles.html

Reply Charles 7/26/2008 9:02:57 PM

"Charles Crayne" <spamtrap@crayne.org> wrote in message 
news:20080726135830.39e8d8d6@thor.crayne.org...
> On Sat, 26 Jul 2008 12:58:07 -0700
> "Benjamin David Lunt"  <spamtrap@crayne.org> wrote:
>
>> The code looks like Linux, but the int 21h looks like DOS.
>> I haven't done any Linux programming, so I may be wrong here,
>> but shouldn't the int 21h be int 80h?
>
> Indeed it should.
>
>> Most likely, nasm is adding padding to the end of the section.  I
>> don't know much about nasm, but this would be my guess
>
> NASM itself adds an ELF header before the user code, and a lot of
> system information following it, but it does not pad out the section(s).
> However, if one is disassembling the executable module, then one must
> also consider what the linker has done.

Thanks Chuck,

I knew that.  I have been working too much with binary only
assembly modules.  I haven't used a linker with my assembly
in years. :-)

Thanks,
Ben 

Reply Benjamin 7/26/2008 9:10:22 PM

On Sat, 26 Jul 2008 20:33:22 GMT
"Dylan Thies"  <spamtrap@crayne.org> wrote:

> i can figure out the trivial stuff like "test.o: \n\n\t Disassembly of
> `_main':" like stuff but there is some syntax that i cannot
> understand such as "call   4004e0 <_start+0x90>", well i understand
> it but i don't know how to make it assemble correctly 

Unless the original compilation or assembly was done with debug
information requested, local labels will be lost, and you will have to
replace them. So, to make "call   4004e0 <_start+0x90>" assemble
correctly you need to change it to something like "call  L4004e0", and
also insert this same label in front of the instruction which starts
at the address which is 90 hex bytes after the label _start. Of course,
a more meaningful label would be even better.
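
The mechanical part of that rewrite can be sketched in Python. The regular
expression and the function name are assumptions made for illustration, and
the sketch only handles operands of the form "4004e0 <_start+0x90>": it
replaces the operand with a generated label and reports the label so it can
be inserted in front of the instruction at that address.

```python
import re

# Matches operands such as "4004e0 <_start+0x90>" in objdump -d output.
_TARGET = re.compile(r"\b([0-9a-fA-F]+) <[^>]+>")


def relabel(line: str):
    """Rewrite one disassembly line; return (new_line, label or None)."""
    match = _TARGET.search(line)
    if match is None:
        return line, None
    # Derive a label name from the absolute address, e.g. L4004e0.
    label = "L" + match.group(1)
    new_line = line[:match.start()] + label + line[match.end():]
    return new_line, label


if __name__ == "__main__":
    print(relabel("call   4004e0 <_start+0x90>"))
```

Collecting the returned labels gives the list of addresses that still need
a matching "L4004e0:" style definition in the reassembled source.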

-- 
Chuck 
http://www.pacificsites.com/~ccrayne/charles.html

Reply Charles 7/26/2008 11:43:40 PM

I have figured out most of what I wanted to know now.

It turns out the

> call   4004e0 <_start+0x90>
 stuff was actually objdump trying to disassemble stuff that didn't need to
 be disassembled.
I figured this out when I looked at the hex of the "message:" label. 
Funny thing is, it was "Hello, World!" (not the exact part of the code I
quoted, but the whole section).

Anyway, thanks for the insight into some of the other programs available for
disassembling.



-Dylan

Reply Dylan 7/27/2008 1:52:50 AM

Hello again,
Is there a way to get output like that of readelf and objdump for binary
files? Also, is there such a tool that runs on Win XP?

-Dylan

Reply Dylan 7/27/2008 7:41:45 AM

On Jul 27, 12:41 am, "Dylan Thies"  <spamt...@crayne.org> wrote:
> hello again,
> Is there a way to get the out put like readelf and objdump on binary files
> also is there such that runs in win xp
>
> -Dylan

For Windows, try using IDA Pro.  There is a free version available.
I'm not familiar with the ELF file format, so I cannot help you there.

Reply bwaichu 7/28/2008 6:53:47 AM

"Dylan Thies"  <spamtrap@crayne.org> wrote:
>
>Is there a way to get the out put like readelf and objdump on binary files
>also is there such that runs in win xp

Assuming you have Visual Studio, you can get a first attempt by:
    link /dump /disasm xxx.obj

It isn't as helpful as products like IDA.
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Reply Tim 7/29/2008 6:03:29 AM

On Tue, 29 Jul 2008 06:03:29 GMT, Tim Roberts  <spamtrap@crayne.org>
wrote:

>"Dylan Thies"  <spamtrap@crayne.org> wrote:
>>
>>Is there a way to get the out put like readelf and objdump on binary files
>>also is there such that runs in win xp
>
>Assuming you have Visual Studio, you can get a first attempt by:
>    link /dump /disasm xxx.obj
>
>It isn't as helpful as products like IDA.

Almost useless, IMO.  :-)
-- 
ArarghMail807 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html

To reply by email, remove the extra stuff from the reply address.

Reply ArarghMail807NOSPAM 7/29/2008 7:14:41 AM

On Tue, 29 Jul 2008 02:14:41 -0500, ArarghMail807NOSPAM
<spamtrap@crayne.org> wrote:

>On Tue, 29 Jul 2008 06:03:29 GMT, Tim Roberts  <spamtrap@crayne.org>
>wrote:
>
>>"Dylan Thies"  <spamtrap@crayne.org> wrote:
>>>
>>>Is there a way to get the out put like readelf and objdump on binary files
>>>also is there such that runs in win xp
>>
>>Assuming you have Visual Studio, you can get a first attempt by:
>>    link /dump /disasm xxx.obj
>>
>>It isn't as helpful as products like IDA.
>
>Almost useless, IMO.  :-)
The first, not IDA.  :-)
-- 
ArarghMail807 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html

To reply by email, remove the extra stuff from the reply address.

Reply ArarghMail807NOSPAM 7/29/2008 8:43:02 AM

     OK, so in my wanderings of the web and thanks to y'all, I think
     (therefore I might be?)
that for Linux objdump/readelf are amazing, and I wish they worked for non-ELF
objects,
and as far as Windows goes, ipa pro is really useful; it has its quirks but is
very useful.
So I want to thank you all for your help, and for people new to
disassembly:
IPA pro + 		for windows users
objdump/readelf ++ 	for linux users
(^.^)
-Dylan

Reply Dylan 7/29/2008 4:55:09 PM

On Jul 29, 12:14 am, ArarghMail807NOSPAM <spamt...@crayne.org> wrote:
> On Tue, 29 Jul 2008 06:03:29 GMT, Tim Roberts  <spamt...@crayne.org>
> wrote:
>
> >"Dylan Thies"  <spamt...@crayne.org> wrote:
>
> >>Is there a way to get the out put like readelf and objdump on binary files
> >>also is there such that runs in win xp
>
> >Assuming you have Visual Studio, you can get a first attempt by:
> >    link /dump /disasm xxx.obj
>
> >It isn't as helpful as products like IDA.
>
> Almost useless, IMO.  :-)

If you do that on an EXE for which you have a PDB file with symbols,
it may be useful. :)

Alex

Reply Alexei 7/30/2008 7:36:51 AM

"Dylan Thies"  <spamtrap@crayne.org> wrote:
>
>so  I want to thank you all for your help, and for people new to
>dissassembly
>IPA pro + 		for windows users
>objdump/readelf ++ 	for linux users

Although I am thoroughly in favor of the liberal application of IPA to
programming projects, you meant IDA here.

(IPA = India pale ale)
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Reply Tim 7/31/2008 5:58:57 AM

Yeah that's what I meant, it was a late night (up for 36+ hours)(<working on
fixing that bad habit)
 but yeah sometimes I think that a little ale could make the hex make more
 sense well at least look better (^_^) probably a little less bulky too

-Dylan

Reply Dylan 7/31/2008 4:19:08 PM
