There are two problems caused by separate compilation: linking and relocation.
In addition to linking together all of the files you have assembled, gcc directs that your files be linked with certain predetermined libraries. What is a library? A library is a collection of compiled object code that is set up for the linker to "pick and choose" from. When the linker finds unresolved external symbols in your code, it looks in the libraries to see whether those symbols are defined there. If so, it copies the associated subroutine code into your program.
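The "pick and choose" step can be sketched in a few lines of Python. This is only a toy model, not how gcc or a real linker is implemented: the library is a plain dictionary, and the names LIBRARY, link_with_library, and my_helper are made up for illustration.

```python
# Hypothetical library: symbol name -> placeholder for that routine's object code.
LIBRARY = {
    "printf": b"<printf code>",
    "strlen": b"<strlen code>",
    "sqrt": b"<sqrt code>",
}

def link_with_library(unresolved, library):
    """Copy in only the routines that the program actually refers to."""
    copied = {}
    still_unresolved = []
    for symbol in unresolved:
        if symbol in library:
            copied[symbol] = library[symbol]   # pick this routine out of the library
        else:
            still_unresolved.append(symbol)    # not defined in this library
    return copied, still_unresolved

# The program refers to printf (in the library) and my_helper (not in it).
copied, missing = link_with_library(["printf", "my_helper"], LIBRARY)
print(sorted(copied))   # ['printf']
print(missing)          # ['my_helper']
```

Note that strlen and sqrt are never copied, because nothing in the program refers to them.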
The assembler always assembles each file as if it started at memory location zero. When files are combined into one program, they are placed one after the other. So when two files are combined into one program, they can't both start at zero; at least one has to be adjusted, since it will start after the other.
For linking and relocation purposes, there are THREE KINDS of symbols:
Source file
+-----------------------+
|                       |
| foo: save %sp...      |  Location: 10  <-+
|   ... subroutine...   |                  |
|                       |                  |
|                       |                  |  Distance = -490
|                       |                  |
| main:                 |                  |
|   call foo            |  Location: 500 <-+
|                       |
+-----------------------+

In this case, foo's address can be calculated by the assembler, since it is in the same file as main (the assembler processes the whole file). In addition, the address used in the "call foo" operation doesn't need to be changed, since subroutine calls and branches are PC-relative. Recall that PC-relative means that instead of storing the address of the subroutine or branch target, the instruction stores the distance to the subroutine or branch target. In this case, the argument to the call instruction would be the number -490. During execution, the processor adds this distance to the current PC to get the actual address to branch to. This means that a file can be moved around in memory without changing the stored distances to its own subroutines or branch targets.
Note that after assembly, the symbol "foo" is completely gone from the object file.
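To see why a PC-relative distance survives moving the file, here is a minimal Python sketch using the locations from the diagram above. The function name call_distance and the shift of 1000 are assumptions for illustration, and the sketch ignores how the hardware actually encodes the distance inside the instruction.

```python
def call_distance(call_location, target_location):
    """Distance stored by a PC-relative call: target minus location of the call."""
    return target_location - call_location

# Locations from the example: foo at 10, "call foo" at 500.
print(call_distance(500, 10))                 # -490

# Move the whole file up by 1000: both locations shift, the distance does not.
base = 1000
print(call_distance(500 + base, 10 + base))   # -490
```

Since both ends of the call move by the same amount, the difference between them stays the same, so nothing in the instruction needs to be patched.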
Source file A
+-----------------------+
|                       |  Location: 0
|                       |
| foo: .word 7          |  Location: 10
|                       |
|                       |
|                       |
| main:                 |
|   sethi %hi(foo), %l0 |
|                       |
+-----------------------+

The assembler can calculate the address of foo, as before, but now foo is a 32-bit pointer whose value is based on the assumption that the file starts at memory location 0. During assembly, the argument to the sethi operation here is the number 10. However, when this file is linked with another, it will probably not start at 0:
File B
+-----------------------+
| blah                  |  Location: 0
| blah                  |
| blah                  |
|                       |
|                       |
|                       |
|                       |
|                       |
+-----------------------+  Location: 1000
File A
+-----------------------+
|                       |
| foo: .word 7          |  Location: 1010
|                       |
|                       |
|                       |
|                       |
| main:                 |
|   sethi %hi(foo), %l0 |
|                       |
+-----------------------+

So during linking, the linker must change the value "10" that was originally assembled as the argument of "sethi" to the value "1010". The basic operation the linker must do is add the new location of the file (here, 1000) to each Type 2 address.
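The "add the new location of the file" operation can be sketched as follows. This is a toy model under stated assumptions: the file is represented as a dictionary from location to the address value assembled there, the function name relocate is made up, and location 40 for the sethi argument is invented for the sketch (the notes do not say where the instruction sits).

```python
def relocate(image, fixup_locations, load_address):
    """Add the file's new starting location to every absolute address in it."""
    relocated = dict(image)                  # leave the assembled file untouched
    for loc in fixup_locations:
        relocated[loc] = image[loc] + load_address
    return relocated

# Hypothetical model of File A: location -> address value assembled there.
# Location 40 (made up) holds the sethi argument, assembled as 10.
file_a = {40: 10}

# File A is placed after File B, so it now starts at 1000.
patched = relocate(file_a, fixup_locations=[40], load_address=1000)
print(patched[40])   # 1010
```

For this to work, the assembler has to leave behind a list of the locations that hold such addresses (fixup_locations here), since the linker cannot tell an address from an ordinary number just by looking at it.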
Source file A
+-----------------------+
|                       |
|                       |
|                       |
|                       |
|                       |
|                       |
| main:                 |
|   call printf         |  Location 200
|                       |
+-----------------------+

In this case, the assembler can't find printf anywhere in the source file. So, printf is added to the Unresolved References table, which is kept at the end of each object file:
Unresolved Reference    At Location
--------------------    ---------------
printf                  200

This Unresolved Reference will be resolved by the linker.
Source file A
+-----------------------+
| .global var1          |
| var1: .word 3         |  Location 16
|                       |
|                       |
|                       |
| .global main          |
| main:                 |  Location 196
|   call printf         |
|                       |
+-----------------------+

Global Symbol    At Location
-------------    -----------
main             196
var1             16
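Putting the two tables together, the linker's matching step might look like the Python sketch below. It is an illustration only: the function name resolve, the tuple layout, and the second file (which defines printf at location 40 and is placed at 1000) are all assumptions, not details from these notes.

```python
def resolve(files):
    """files: list of (load_address, global_symbols, unresolved_refs)."""
    # Pass 1: final address of each global symbol = file start + location in file.
    globals_table = {}
    for base, global_symbols, _ in files:
        for name, loc in global_symbols.items():
            globals_table[name] = base + loc

    # Pass 2: match every unresolved reference against the merged table.
    resolved, errors = [], []
    for base, _, unresolved in files:
        for name, loc in unresolved:
            if name in globals_table:
                # (symbol, final location of the reference, final address of the symbol)
                resolved.append((name, base + loc, globals_table[name]))
            else:
                errors.append(name)          # nobody defines this symbol
    return resolved, errors

# File A from the tables above, placed at 0.
file_a = (0, {"main": 196, "var1": 16}, [("printf", 200)])
# Hypothetical second file that defines printf at location 40, placed at 1000.
lib = (1000, {"printf": 40}, [])

resolved, errors = resolve([file_a, lib])
print(resolved)   # [('printf', 200, 1040)]
print(errors)     # []
```

If the second file is left out, printf ends up in errors instead, which corresponds to the linker reporting an unresolved reference.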
% size ci
78445 + 976  + 3280 = 82701
text  + data + bss  = total size in memory

If unresolved references still exist, the linker does not make the program file executable, and signals an error.