object files, header files, source files?

I can't find any books describing the anatomy of C/C++ programs. What is
an object file? Can it be generated from a header file? What is linking?
Why does a source file need an object file? And so on.

I have looked in The C Programming Language, The C++ Programming
Language, Accelerated C++, GNU make, etc., but they don't describe these
basic subjects. Is there any literature that covers them?
ddd4 (17)
3/30/2008 5:32:13 PM
comp.unix.programmer

4 Replies

saneman wrote:
> I can't find any books describing the anatomy of C/C++ programs. What is 
> an object file? Can it be generated from a header file?, what is 
> linking?, why does a source file need an object file? etc.
> 
> I have looked in The C programming Language, The C++ Programming 
> Language, Accelerated C++, GNU make etc but they don't describe these 
> basic subject. Any literature that covers these subjects?

Wikipedia and Google are your friends.  I don't think there's much
literature available because there's not a lot to tell about the subject:
the compiler translates the source files into machine code and stores it
in object files, and that's about it :P
realnc (202)
3/30/2008 5:48:22 PM
saneman <ddd@sdf.com> writes:
> I can't find any books describing the anatomy of C/C++ programs. What
> is an object file? Can it be generated from a header file?, what is
> linking?, why does a source file need an object file?

A CPU executes 'machine code', ie simple instructions from an
instruction set, encoded as a sequence of integers. This sequence of
integers is usually generated from an 'assembly language' input file
by a program called an assembler. 'Assembly language' is essentially the
same as machine code, except that the instructions are represented by
so-called 'mnemonics', ie alphanumeric abbreviations intended to be
more easily understandable to humans than the machine code itself.

Below is an example of a short sequence of ARMv5 assembly instructions:

    ldr     r2, [r3]      -- load register r2 with the contents of the
                          -- memory address contained in register r3
    mov     r0, #7        -- load register r0 with the constant 7
    ldr     r2, [r2]      -- same as the 1st, address now in r2
    mov     r3, #1        -- same as the 2nd: load r3 with the constant 1
    sub     r0, r0, #1    -- subtract one from r0, store the result in r0

It is possible to write complete programs in assembly, but because the
available instructions can only accomplish very simple tasks, it is
necessary to write a lot of text even to implement simple algorithms.
That's why so-called high-level languages are used, which can describe
algorithms both more concisely and in a way more easily understandable
than assembly could. But a high-level language cannot be executed by
the CPU directly. A special program called 'a compiler' is used (on UNIX)
to translate the high-level language source code into assembly
language, which the assembler can then translate into machine code the
CPU can execute.

The assembler generates object code files, which contain the machine
code corresponding to the subroutines ('functions') defined in the
original high-level language source code, plus various tables specifying
the names of the subroutines (or variables) and the addresses of their
'entry points', ie the locations the CPU needs to jump to in order to
start executing a subroutine (or the addresses of the memory locations
reserved for holding the value of a particular variable).

Multiple object code files can (and usually will) be combined to
generate an actually executable program. This is the job of the linker
(or link editor), which reads a set of object files, assigns
(non-conflicting) addresses relative to the (platform-specific) program
start address to all entry points (and variable addresses) contained
in the object files, modifies the machine code with the help of the name
tables so that, eg, each call of a particular subroutine uses the
newly calculated 'subroutine entry point address', and finally writes
the combined and suitably modified machine code contained in all of
the object code files to an output file, which is an executable program.

NB: This is simplified to the point of being (more than) occasionally
wrong and omits A LOT of important things one would usually expect on
a modern system (eg shared objects or even libraries).

In case you are interested in the actual details, a starting point
could be

	http://www.ibm.com/developerworks/power/library/pa-spec12/index.html
rweikusat (2830)
3/30/2008 6:24:55 PM
saneman wrote:
> I can't find any books describing the anatomy of C/C++ programs. What is 
> an object file? Can it be generated from a header file?, what is 
> linking?, why does a source file need an object file? etc.

A compilers book would probably cover that, although it would also likely
cover a whole lot of other stuff you might not be interested in.  :-)

The key to understanding a lot of this is that the compiler doesn't need
to process the entire program at once.  You can have a whole bunch of
different files with source code in them, and then you can process some
pieces of your source in one run of the compiler and some other pieces
later.

C and C++ both have the concept of a preprocessor, which takes a bunch
of files on disk and produces one big piece of text.  The single big
piece of text (think of it as a stream of characters) is passed along
to the compiler.  There are commands to insert the contents of another
file into the stream, and there are commands to leave out certain parts
of the stream, or make other changes to it, as it's created.  This piece
of text that is generated is called a "translation unit".

Anyway, the point is that you create this text, and you pass it to the
compiler, and the compiler processes it.  But the compiler can't just
write out an executable at this point because it only has part of the source
code as its input.  So it needs to write out a temporary, intermediate
file that contains its part of the final program.  This intermediate
file is, for whatever reason, called an object file.

Finally, when you've processed all the source code and produced object
files, you put the object files together into an executable.  There is
a little work to do at this point because functions call each other
using numeric addresses (or relative offsets), but a function in one
object file doesn't know at what position (or how far away) another
function is if it's in another file.  So there is a process of filling
in all those last-minute details.  Basically, the object files get
pasted together (essentially just concatenated consecutively), and
then some routine goes through and finds all the functions that need
other functions and writes into a table (or even just tweaks the code
in place) so that every function knows the offset to every function
it calls (or every variable it references).

As an analogy, think of a structure that is going to be assembled in
pieces off site and then put together when it finally reaches its destination.
Each piece will be fabricated in a different factory, then shipped to
the final destination, and everything will be assembled into a whole,
probably by bolting together or welding.  In the C world, the
object files are the pieces that are built off site and shipped
to the destination, and the linker is the software that puts it
all together at the destination.

I've left out a description of header files up until now.  There
is really not much that's special about a header file.  It's just a
source file that is inserted into the stream by the preprocessor.
The reason an include file exists is that sometimes you want certain
information to be included in multiple different translation units.
For example, you might have a struct that is used all over the place.
You want the definition of the struct to be available to any piece
of code that needs it, so you put the text defining it into a header
file, and then any other file that needs that will #include the
header file.  The definition thus ends up in the translation unit,
and the compiler can see it while it is running.

Hope that helps.

   - Logan
lshaw-usenet (927)
3/30/2008 6:32:38 PM
>I can't find any books describing the anatomy of C/C++ programs. What is 
>an object file? Can it be generated from a header file?, what is 
>linking?, why does a source file need an object file? etc.

In many languages and implementations of them, a program consists
of one or more source files, which are text that programmers edit.
A source file may include headers, which are more text and tend to
include function declarations, data types and structures, global
variables (and for C and similar languages, macro definitions).
The same header is often included by multiple source files.  A
source file (and all the stuff it includes, like headers) is compiled
to produce an object file.  There's no guarantee whatsoever about what's
*IN* an object file, but it's likely to include relocatable machine
code and symbols.

Source files generally don't include other source files, as a matter
of style and convention.  Occasionally there are exceptions for
things like source files generated by another program (e.g. yacc,
lex, or some program that generates data tables and spits out code
to initialize an array to them).  And it's possible to compile the
SAME source file with different compiler options repeatedly, then
link the resulting object files all together.  This isn't usually
done except when you have a whole lot of nearly-identical "glue"
routines, like user-mode interfaces to system calls or interfaces
between one language and another, with different names.

A library is a collection of one (well, an empty library may be
possible but it's pretty useless) or more object files, possibly
in a different form.  You may or may not be able to get back the
original set of object files from the library.  System routines are
often provided in libraries for linking.  Non-open-source systems
may not provide corresponding source code.

Putting a bunch of object files together to produce an executable
file (or shared library) is called linking.  All of the object files
for a particular program are linked to each other and to system
libraries.  References in one file are matched with definitions in
another and they are connected.

"static" linking takes all of the object files needed, and perhaps parts
of the system libraries, and puts the code into an executable file.
The executable can run without libraries being present at run-time.

"dynamic" linking takes all of the object files needed, and references
to what's in shared libraries, and puts copies of the object files
into the executable file, and arranges that the executable finds the
shared libraries at runtime and uses them.  Multiple programs running
ideally use the same copy of the shared library (at least the read-only
parts, like code and constant data).  Copy-on-write allows sharing 
modifiable pages until the data is actually changed.

Common build procedure:

	For each source file, generate a corresponding object file:

	gcc -c foo.c
	produces foo.o

	Link the whole mess together (usually you use a specific list), 
	with a few libraries:

	gcc -o foo *.o -lcrypt -lm
	Typically dynamic linking is the default.
	This produces an executable foo.

This is a quick explanation.  There are lots of details not mentioned,
and plenty of compiler-specific details.

3/30/2008 9:05:38 PM