
CS:APP Chapter 7 Linking Notes

Linking


           Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.



7.1 Compiler Drivers

Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker, as needed on behalf of the user.


The driver first runs the C preprocessor (cpp), which translates the C source file main.c into an ASCII intermediate file main.i:

This part is quite interesting.

cpp [other arguments] main.c /tmp/main.i


Next, the driver runs the C compiler (cc1), which translates main.i into an ASCII assembly language file main.s.

cc1 /tmp/main.i main.c -O2 [other arguments] -o /tmp/main.s


Then, the driver runs the assembler (as), which translates main.s into a relocatable object file main.o:

as [other arguments] -o /tmp/main.o /tmp/main.s

The driver goes through the same process to generate swap.o. Finally, it runs the linker program ld, which combines main.o and swap.o, along with the necessary system object files, to create the executable object file p:

ld -o p [system object files and args] /tmp/main.o /tmp/swap.o
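To make the sequence concrete, here is a toy sketch of the driver's bookkeeping. It only formats the command line for each stage and executes nothing. The helpers plan_translation and plan_link are hypothetical. A real gcc passes many more arguments and uses its own temporary file names, and on current GCC versions preprocessing is usually folded into cc1 rather than run as a separate cpp.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CMD_LEN 128

/* Copy src into base, dropping a trailing ".c" suffix if present. */
static void strip_c_suffix(const char *src, char *base, size_t n) {
    size_t len = strlen(src);
    if (len >= 2 && strcmp(src + len - 2, ".c") == 0)
        len -= 2;
    if (len >= n)
        len = n - 1;
    memcpy(base, src, len);
    base[len] = '\0';
}

/* Hypothetical helper: build (but do not run) the cpp -> cc1 -> as
 * commands for one source file, mirroring the steps described above. */
void plan_translation(const char *src, char cmds[3][CMD_LEN]) {
    char base[64];
    assert(src != NULL);
    strip_c_suffix(src, base, sizeof base);
    snprintf(cmds[0], CMD_LEN, "cpp %s /tmp/%s.i", src, base);
    snprintf(cmds[1], CMD_LEN, "cc1 /tmp/%s.i -o /tmp/%s.s", base, base);
    snprintf(cmds[2], CMD_LEN, "as -o /tmp/%s.o /tmp/%s.s", base, base);
}

/* Hypothetical helper: build the final ld command for two modules. */
void plan_link(const char *exe, const char *a, const char *b, char cmd[CMD_LEN]) {
    char ba[64], bb[64];
    strip_c_suffix(a, ba, sizeof ba);
    strip_c_suffix(b, bb, sizeof bb);
    snprintf(cmd, CMD_LEN, "ld -o %s /tmp/%s.o /tmp/%s.o", exe, ba, bb);
}
```

To see what your real driver does, run gcc with -v, which prints each command it executes, or with -save-temps, which keeps the intermediate .i, .s and .o files.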






7.2 Static Linking


         Static linkers such as the Unix ld program take as input a collection of relocatable object files and command-line arguments and generate as output a fully linked executable object file that can be loaded and run. 
To build the executable, the linker must perform two main tasks:


 

Symbol resolution. Object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.


Relocation. Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all of the references to those symbols so that they point to this memory location.
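The "exactly one definition" requirement of symbol resolution can be sketched as a lookup over all input modules. This is a toy model that assumes each module is described by a hypothetical Module record listing the global symbols it defines. Zero matches means an undefined reference, and more than one means a duplicate definition. The strong/weak refinements in 7.6.1 are ignored here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary of a relocatable module: its name and a
 * NULL-terminated list of the global symbols it defines. */
typedef struct {
    const char *name;
    const char *const *defines;
} Module;

enum { RESOLVE_UNDEFINED = -1, RESOLVE_MULTIPLE = -2 };

/* Return the index of the single module defining sym, or an error code
 * when the reference cannot be tied to exactly one definition. */
int resolve_reference(const char *sym, const Module *mods, int n) {
    int found = RESOLVE_UNDEFINED;
    assert(sym != NULL);
    for (int i = 0; i < n; i++) {
        for (const char *const *d = mods[i].defines; *d != NULL; d++) {
            if (strcmp(*d, sym) == 0) {
                if (found != RESOLVE_UNDEFINED)
                    return RESOLVE_MULTIPLE;   /* a second definition */
                found = i;
            }
        }
    }
    return found;                              /* index, or UNDEFINED */
}
```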






7.3 Object Files


Object files come in three forms:

                 Relocatable object file. Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.


                Executable object file. Contains binary code and data in a form that can be copied directly into memory and executed.


                Shared object file. A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.



Object file formats vary from system to system. 

Modern Unix systems use the Unix Executable and Linkable Format (ELF). Although our discussion will focus on ELF, the basic concepts are similar, regardless of the particular format.
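In ELF, the three kinds of object file are distinguished by the e_type field of the ELF header. This 2-byte field sits at offset 16, right after the 16-byte identification bytes. The sketch below reads it from an in-memory header. elf_type and elf_kind_name are hypothetical helpers. Position-independent executables, which many current distributions build by default, also carry the shared-object type.

```c
#include <assert.h>
#include <stddef.h>

/* e_type values from the ELF specification. */
enum { ELF_ET_REL = 1, ELF_ET_EXEC = 2, ELF_ET_DYN = 3 };

/* Return the e_type field of an in-memory ELF header, or -1 if the
 * buffer is too short or does not start with the ELF magic number. */
int elf_type(const unsigned char *hdr, size_t len) {
    assert(hdr != NULL);
    if (len < 18 || hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return -1;
    /* e_type is 2 bytes at offset 16, stored in the byte order given
     * by e_ident[5]: 1 = little-endian, 2 = big-endian. */
    if (hdr[5] == 2)
        return hdr[16] << 8 | hdr[17];
    return hdr[17] << 8 | hdr[16];
}

/* Map e_type to the three kinds of object file listed above. */
const char *elf_kind_name(int type) {
    switch (type) {
    case ELF_ET_REL:  return "relocatable object file";
    case ELF_ET_EXEC: return "executable object file";
    case ELF_ET_DYN:  return "shared object file (or a PIE executable)";
    default:          return "other";
    }
}
```

For a real file, readelf -h prints the same field on its "Type:" line (for example REL for a .o file).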





7.4 Relocatable Object Files

           Figure 7.3 shows the format of a typical ELF relocatable object file. The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header, the object file type (e.g., relocatable, executable, or shared), the machine type (e.g., IA32), the file offset of the section header table, and the size and number of entries in the section header table.





On Linux, the readelf command can be used to read information from ELF files. Each entry of the .symtab symbol table has the following (simplified) layout:

typedef struct {
    int name;       /* String table offset */
    int value;      /* Section offset, or VM address */
    int size;       /* Object size in bytes */
    char type:4,    /* Data, func, section, or src file name (4 bits) */
         binding:4; /* Local or global (4 bits) */
    char reserved;  /* Unused */
    char section;   /* Section header index, ABS, UNDEF, */
                    /* or COMMON */
} Elf_Symbol;
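In the actual on-disk format, the type and binding fields are packed into one byte (st_info): binding in the high 4 bits, type in the low 4 bits. Bit-field layout in C is implementation-defined, so portable code uses shifts and masks instead of the bit-fields above. The constants below follow the ELF specification. The helper names are hypothetical.

```c
#include <assert.h>

/* Values from the ELF specification. */
enum { SYM_LOCAL = 0, SYM_GLOBAL = 1, SYM_WEAK = 2 };                    /* binding */
enum { SYM_NOTYPE = 0, SYM_OBJECT = 1, SYM_FUNC = 2, SYM_SECTION = 3, SYM_FILE = 4 }; /* type */

/* Binding lives in the high nibble of st_info, type in the low nibble. */
static inline unsigned sym_bind(unsigned char info) { return info >> 4; }
static inline unsigned sym_type(unsigned char info) { return info & 0xfu; }
static inline unsigned char sym_info(unsigned bind, unsigned type) {
    assert(bind <= 0xfu && type <= 0xfu);
    return (unsigned char)((bind << 4) | type);
}

/* Short label in the style of readelf's Type column. */
const char *sym_type_name(unsigned char info) {
    switch (sym_type(info)) {
    case SYM_NOTYPE:  return "NOTYPE";
    case SYM_OBJECT:  return "OBJECT";
    case SYM_FUNC:    return "FUNC";
    case SYM_SECTION: return "SECTION";
    case SYM_FILE:    return "FILE";
    default:          return "OTHER";
    }
}
```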


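Section headers of a hello world program as printed by readelf -S (this is a linked executable rather than a relocatable object file, so its allocated sections already have run-time addresses):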
There are 30 section headers, starting at offset 0x1178:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000400238  00000238
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000400254  00000254
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000400274  00000274
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000400298  00000298
       000000000000001c  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           00000000004002b8  000002b8
       0000000000000060  0000000000000018   A       6     1     8
  [ 6] .dynstr           STRTAB           0000000000400318  00000318
       000000000000003d  0000000000000000   A       0     0     1
  [ 7] .gnu.version      VERSYM           0000000000400356  00000356
       0000000000000008  0000000000000002   A       5     0     2
  [ 8] .gnu.version_r    VERNEED          0000000000400360  00000360
       0000000000000020  0000000000000000   A       6     1     8
  [ 9] .rela.dyn         RELA             0000000000400380  00000380
       0000000000000018  0000000000000018   A       5     0     8
  [10] .rela.plt         RELA             0000000000400398  00000398
       0000000000000048  0000000000000018   A       5    12     8
  [11] .init             PROGBITS         00000000004003e0  000003e0
       000000000000001a  0000000000000000  AX       0     0     4
  [12] .plt              PROGBITS         0000000000400400  00000400
       0000000000000040  0000000000000010  AX       0     0     16
  [13] .text             PROGBITS         0000000000400440  00000440
       00000000000001a4  0000000000000000  AX       0     0     16
  [14] .fini             PROGBITS         00000000004005e4  000005e4
       0000000000000009  0000000000000000  AX       0     0     4
  [15] .rodata           PROGBITS         00000000004005f0  000005f0
       0000000000000011  0000000000000000   A       0     0     4
  [16] .eh_frame_hdr     PROGBITS         0000000000400604  00000604
       0000000000000034  0000000000000000   A       0     0     4
  [17] .eh_frame         PROGBITS         0000000000400638  00000638
       00000000000000d4  0000000000000000   A       0     0     8
  [18] .init_array       INIT_ARRAY       0000000000600e10  00000e10
       0000000000000008  0000000000000000  WA       0     0     8
  [19] .fini_array       FINI_ARRAY       0000000000600e18  00000e18
       0000000000000008  0000000000000000  WA       0     0     8
  [20] .jcr              PROGBITS         0000000000600e20  00000e20
       0000000000000008  0000000000000000  WA       0     0     8
  [21] .dynamic          DYNAMIC          0000000000600e28  00000e28
       00000000000001d0  0000000000000010  WA       6     0     8
  [22] .got              PROGBITS         0000000000600ff8  00000ff8
       0000000000000008  0000000000000008  WA       0     0     8
  [23] .got.plt          PROGBITS         0000000000601000  00001000
       0000000000000030  0000000000000008  WA       0     0     8
  [24] .data             PROGBITS         0000000000601030  00001030
       0000000000000010  0000000000000000  WA       0     0     8
  [25] .bss              NOBITS           0000000000601040  00001040
       0000000000000008  0000000000000000  WA       0     0     4
  [26] .comment          PROGBITS         0000000000000000  00001040
       000000000000002a  0000000000000001  MS       0     0     1
  [27] .shstrtab         STRTAB           0000000000000000  0000106a
       0000000000000108  0000000000000000           0     0     1
  [28] .symtab           SYMTAB           0000000000000000  000018f8
       0000000000000618  0000000000000018          29    45     8
  [29] .strtab           STRTAB           0000000000000000  00001f10
       0000000000000236  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)




.data: Initialized global C variables. Local C variables are maintained at run time on the stack, and do not appear in either the .data or .bss sections.

.bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file.

I'm only quoting the descriptions of these two sections; for the others, see the book or Wikipedia.
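A small sketch of where each kind of variable typically ends up with GCC on an ELF target. This is an assumption about common toolchain behavior, not a guarantee. An uninitialized global may instead become a COMMON symbol in the .o file when compiled with -fcommon, the GCC default before GCC 10. Zero-initialized globals usually go to .bss too, since zero bytes need no disk space. Running objdump -t on the object file shows which section each symbol was assigned to.

```c
#include <assert.h>

/* Initialized global: typically placed in .data. */
int data_counter = 42;

/* Uninitialized global: typically .bss (or COMMON with -fcommon). */
int bss_counter;

/* Zero-initialized static global: usually .bss as well. */
static int zero_initialized = 0;

/* Local nonstatic variables live on the stack at run time,
 * so they get no .data/.bss space and no symbol table entry. */
int bump_all(void) {
    int on_stack = 1;
    data_counter += on_stack;
    bss_counter += on_stack;
    zero_initialized += on_stack;
    return data_counter + bss_counter + zero_initialized;
}
```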









7.5 Symbols and Symbol Tables


                   Each relocatable object module, m, has a symbol table that contains information about the symbols that are defined and referenced by m. In the context of a linker, there are three different kinds of symbols:



                  Global symbols that are defined by module m and that can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and global variables that are defined without the C static attribute.

                  Global symbols that are referenced by module m but defined by some other module. Such symbols are called externals and correspond to C functions and variables that are defined in other modules.

                  Local symbols that are defined and referenced exclusively by module m. Some local linker symbols correspond to C functions and global variables that are defined with the static attribute. These symbols are visible anywhere within module m, but cannot be referenced by other modules. The sections in an object file and the name of the source file that corresponds to module m also get local symbols.



                It is important to realize that local linker symbols are not the same as local program variables. The symbol table in .symtab does not contain any symbols that correspond to local nonstatic program variables.
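The three kinds of symbols, plus the local program variable that gets no symbol at all, can be seen in one small module. The names buf and swap echo the swap.o module mentioned in 7.1, but this code is an illustrative sketch, not the book's listing. abs stands in for an external defined in the C library, though GCC may expand it inline as a builtin. With nm on the object file (compiled without optimization, or the static helper may be inlined away), uppercase type letters mark global symbols and lowercase letters mark local ones.

```c
#include <assert.h>
#include <stdlib.h>   /* declares abs(), defined in the C library: an external */

/* Global symbol defined by this module: nonstatic global variable. */
int buf[2] = {1, 2};

/* Local linker symbols: static, so other modules cannot reference them. */
static int swap_count;
static void note_swap(void) { swap_count++; }

/* Global symbol defined by this module: nonstatic function. */
void swap(void) {
    int temp = buf[0];   /* local program variable: stack only, no .symtab entry */
    buf[0] = buf[1];
    buf[1] = temp;
    note_swap();
}

/* Global function whose body references the external abs. */
int buf_gap(void) { return abs(buf[0] - buf[1]); }

int swaps_done(void) { return swap_count; }
```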





7.6 Symbol Resolution


            When the compiler encounters a symbol (either a variable or function name) that is not defined in the current module, it assumes that it is defined in some other module, generates a linker symbol table entry, and leaves it for the linker to handle.



7.6.1 How Linkers Resolve Multiply Defined Global Symbols


          Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols.



Rule 1: Multiple strong symbols are not allowed.

Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.

Rule 3: Given multiple weak symbols, choose any of the weak symbols.


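Only the main decision rules are listed here. The book explains the concrete demos well, and the exercises that follow cover them too, so I won't paste them all.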
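The three rules can be written as a tiny decision function over all definitions of one symbol name. This is a toy sketch. The Strength enum and pick_definition are hypothetical, and "choose any" in Rule 3 is implemented as "keep the first". On a real toolchain, note that GCC 10 and later default to -fno-common. Uninitialized globals then become ordinary .bss definitions, so two of them with the same name in different modules cause a multiple-definition error unless -fcommon is given.

```c
#include <assert.h>

typedef enum { WEAK, STRONG } Strength;

/* Outcome codes for this hypothetical resolver. */
enum { LINK_ERROR = -1, NO_DEFINITION = -2 };

/* Given the strengths of every definition of ONE symbol name, return the
 * index of the definition the linker keeps, following Rules 1-3. */
int pick_definition(const Strength *defs, int n) {
    int strong = -1, weak = -1;
    for (int i = 0; i < n; i++) {
        if (defs[i] == STRONG) {
            if (strong >= 0)
                return LINK_ERROR;   /* Rule 1: two strong definitions */
            strong = i;
        } else if (weak < 0) {
            weak = i;                /* Rule 3: any weak one; keep the first */
        }
    }
    if (strong >= 0)
        return strong;               /* Rule 2: strong beats weak */
    return weak >= 0 ? weak : NO_DEFINITION;
}
```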


7.6.2 Linking with Static Libraries


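                 In practice, all compilation systems provide a mechanism for packaging related object modules into a single file called a static library, which can then be supplied as input to the linker. When it builds the executable, the linker copies only the object modules in the library that are referenced by the application program.

                  Compare this with the alternative of putting all of the standard C functions into a single relocatable object module, say libc.o, that every program links against. A big disadvantage is that every executable file in a system would then contain a complete copy of the collection of standard functions, which would be extremely wasteful of disk space.

                 Another big disadvantage is that any change to any standard function, no matter how small, would require the library developer to recompile the entire source file, a time-consuming operation that would complicate the development and maintenance of the standard functions.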









                Figure 7.7 summarizes the activity of the linker. The -static argument tells the compiler driver that the linker should build a fully linked executable object file that can be loaded into memory and run without any further linking at load time.
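The "copy only the referenced modules" behavior of a static library can be sketched as a fixed-point scan over the archive's members. A member is pulled in when it defines a symbol that the program, or an already-selected member, references. The Member type, the member names and their reference lists are all hypothetical. The sketch also simplifies by not tracking symbols the program already defines itself. Real linkers scan archives in command-line order, which is why the position of a library on the command line matters.

```c
#include <assert.h>
#include <string.h>

#define MAX_REFS 4

/* Hypothetical archive member: an object module that defines one
 * global symbol and references up to MAX_REFS others. */
typedef struct {
    const char *file;            /* e.g. "printf.o" */
    const char *defines;         /* the symbol it defines */
    const char *refs[MAX_REFS];  /* symbols it references; NULL = unused slot */
} Member;

/* Is sym referenced by the program or by an already-selected member? */
static int is_needed(const char *sym, const char *const *prog_refs,
                     const Member *m, int n, const int *picked) {
    for (int i = 0; prog_refs[i] != NULL; i++)
        if (strcmp(prog_refs[i], sym) == 0)
            return 1;
    for (int i = 0; i < n; i++)
        if (picked[i])
            for (int j = 0; j < MAX_REFS && m[i].refs[j] != NULL; j++)
                if (strcmp(m[i].refs[j], sym) == 0)
                    return 1;
    return 0;
}

/* Set picked[i] = 1 for every member the linker would copy into the
 * executable, rescanning until nothing changes. Returns how many. */
int select_members(const Member *m, int n, const char *const *prog_refs,
                   int *picked) {
    int count = 0, changed = 1;
    assert(n >= 0);
    memset(picked, 0, (size_t)n * sizeof *picked);
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (!picked[i] && is_needed(m[i].defines, prog_refs, m, n, picked)) {
                picked[i] = 1;
                count++;
                changed = 1;
            }
        }
    }
    return count;
}
```

In the test, a program that references only printf pulls in the hypothetical printf.o and, through it, vfprintf.o, while atoi.o and qsort.o stay out of the executable. A real archive is built with something like ar rcs libfoo.a a.o b.o (names hypothetical).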




7.7 Relocation

 
 
              Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type.

              Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses.
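The first step can be sketched as laying out the same-named section (say .text) from each input module back to back, then deriving each symbol's run-time address from its module's new start address. The base address, the alignment and the helper names are hypothetical. A real linker takes these from its linker script and from each section's own alignment requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Merge the same-named section from n modules starting at base.
 * Writes each module's start address into starts[] and returns the
 * end address of the aggregate section. */
uint64_t merge_sections(const uint64_t *sizes, size_t n, uint64_t base,
                        uint64_t align, uint64_t *starts) {
    uint64_t addr = base;
    for (size_t i = 0; i < n; i++) {
        addr = align_up(addr, align);
        starts[i] = addr;
        addr += sizes[i];
    }
    return addr;
}

/* Still step 1: a symbol defined at offset sym_off inside module m's
 * section gets this run-time address. Step 2 then patches references. */
uint64_t symbol_address(const uint64_t *starts, size_t m, uint64_t sym_off) {
    return starts[m] + sym_off;
}
```

With section sizes 0x1a and 0x14, base 0x1000 and 16-byte alignment, the second module starts at 0x1020, so a symbol at offset 4 inside it ends up at 0x1024.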


7.7.1 Relocation Entries


              When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables that are referenced by the module. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable.

 

typedef struct {
    int offset;     /* Offset of the reference to relocate */
    int symbol:24,  /* Symbol the reference should point to */
        type:8;     /* Relocation type */
} Elf32_Rel;
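To show what the linker does with an entry like Elf32_Rel above, here is a sketch that patches one 32-bit reference. It supports two relocation kinds modeled on IA32's absolute (R_386_32, value S + A) and PC-relative (R_386_PC32, value S + A - P) types. S is the symbol's run-time address, A is the addend already stored at the reference, and P is the reference's own run-time address. The Rel struct, its full-width fields, the enum values and apply_rel are hypothetical simplifications, not the real ELF encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation kinds; the numeric values are arbitrary. */
enum { REL_ABS32 = 1, REL_PC32 = 2 };

typedef struct {
    uint32_t offset;   /* where in the section the reference lives */
    uint32_t symbol;   /* index into the sym_addr[] table */
    uint32_t type;     /* REL_ABS32 or REL_PC32 */
} Rel;

/* Patch one 32-bit reference in a section of sec_len bytes that will be
 * loaded at run-time address sec_addr. Values are read and written in
 * host byte order (little-endian on x86). Returns 0 on success. */
int apply_rel(unsigned char *sec, size_t sec_len, uint32_t sec_addr,
              const Rel *r, const uint32_t *sym_addr) {
    uint32_t addend, value;
    if ((size_t)r->offset + sizeof addend > sec_len)
        return -1;
    memcpy(&addend, sec + r->offset, sizeof addend);
    switch (r->type) {
    case REL_ABS32:                 /* S + A */
        value = sym_addr[r->symbol] + addend;
        break;
    case REL_PC32:                  /* S + A - P */
        value = sym_addr[r->symbol] + addend - (sec_addr + r->offset);
        break;
    default:
        return -1;
    }
    memcpy(sec + r->offset, &value, sizeof value);
    return 0;
}
```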








7.9 Loading Executable Object Files

              To run an executable object file p, we can type its name to the Unix shell’s command line:
unix> ./p




How does a user-space program actually start, and how does it end?


              When the loader runs, it creates the memory image shown in Figure 7.13. Guided by the segment header table in the executable, it copies chunks of the executable into the code and data segments. Next, the loader jumps to the program’s entry point, which is always the address of the _start symbol. The startup code at the _start address is defined in the object file crt1.o and is the same for all C programs. Figure 7.14 shows the specific sequence of calls in the startup code. After calling initialization routines from the .text and .init sections, the startup code calls the atexit routine, which appends a list of routines that should be called when the application terminates normally. The exit function runs the functions registered by atexit, and then returns control to the operating system by calling _exit. Next, the startup code calls the application’s main routine, which begins executing our C code. After the application returns, the startup code calls the _exit routine, which returns control to the operating system.
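The order of events can be simulated with a toy model that never touches the real atexit, exit or _exit. Every name here is hypothetical. It registers a library cleanup handler before main, runs main, and then passes main's return value to an exit-like function. That last step follows the C standard, under which returning from the initial call to main is equivalent to calling exit with its return value. The exit-like function runs handlers in reverse order of registration, as the standard requires for atexit, before the final "_exit".

```c
#include <assert.h>
#include <string.h>

#define MAX_HANDLERS 8

static void (*handlers[MAX_HANDLERS])(void);
static int n_handlers;
static char trace[128];            /* records the order of events */

static void log_event(const char *what) {
    if (trace[0] != '\0')
        strncat(trace, " ", sizeof trace - strlen(trace) - 1);
    strncat(trace, what, sizeof trace - strlen(trace) - 1);
}

/* atexit-like: remember fn for later; returns 0 on success. */
int toy_atexit(void (*fn)(void)) {
    if (n_handlers == MAX_HANDLERS)
        return -1;
    handlers[n_handlers++] = fn;
    return 0;
}

/* exit-like: run handlers in reverse registration order, then "_exit". */
int toy_exit(int status) {
    while (n_handlers > 0)
        handlers[--n_handlers]();
    log_event("_exit");
    return status;                 /* stands in for returning to the OS */
}

static void library_cleanup(void) { log_event("cleanup"); }
static void user_handler(void) { log_event("user"); }

/* _start-like: initialize, register library cleanup, run main, exit. */
int toy_start(int (*main_fn)(void)) {
    log_event("init");
    toy_atexit(library_cleanup);
    return toy_exit(main_fn());
}

/* Stand-in for the application's main. */
int demo_main(void) {
    toy_atexit(user_handler);
    log_event("main");
    return 3;
}

const char *toy_trace(void) { return trace; }
```

Calling toy_start(demo_main) returns 3 and leaves the trace "init main user cleanup _exit". The handler registered last runs first.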



 





7.12 Position-Independent Code (PIC)


           A key purpose of shared libraries is to allow multiple running processes to share the same library code in memory and thus save precious memory resources.