1. Introduction

Every link is controlled by a linker script (usually a file with the .lds suffix). A linker script mainly specifies how the sections of the input files are mapped into the output file, and controls the layout of the output file in the program address space. Linker script commands can do other things as well.

The linker has a built-in default linker script, which you can print with ld --verbose. Linker options such as -r and -N change the default script: ld carries several built-in scripts, and these options select a different one.

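The -T option specifies your own linker script, which replaces the default one. You can also add commands to the default script by passing a script file to the linker as an ordinary input file (an implicit linker script), as the a.lds example in section 6 does.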

Unless stated otherwise, "linker" below means the static linker.

 

2. Basic concepts

The linker combines one or more input files into one output file.

Input files: object files or linker script files.

Output file: an object file or an executable file.

Object files (including executables) have a fixed format; on UNIX and GNU/Linux this is usually ELF.

The sections of the input files are called input sections, and the sections of the output file are called output sections.

Every section in an object file has at least two pieces of information: a name and a size. Most sections also have a block of associated data, called the section contents. A section can be marked "loadable" or "allocatable".

Loadable section: when the output file is run, the contents of the section are loaded into the process address space.

Allocatable section: a section with no contents can be marked allocatable. When the output file is run, an area of the section's size is reserved in the process address space, but nothing is loaded into it (this is how .bss works). In some cases this memory must be zero-filled.

A section that is neither loadable nor allocatable usually contains debugging information. Use objdump -h to view the sections of a file.

Every loadable or allocatable output section has two addresses: the VMA (virtual memory address, the address of the section in the program address space when the output file runs) and the LMA (load memory address, the address at which the section is loaded). Usually the VMA and the LMA are the same. In embedded systems they often differ: for example, the output file is loaded into the flash of a development board (at the LMA), and at run time its data is copied from flash into SDRAM (at the VMA).

One way to understand the difference between VMA and LMA is the following. Suppose:

(1) The VMA of the .data section is 0x08050000, and the section contains three 32-bit global variables i, j and k, initialized to 1, 2 and 3.

(2) The .text section contains the code generated from the fragment printf("j=%d ", j);.

Because the VMA of .data is 0x08050000 at link time, the generated printf code prints the 4 bytes at address 0x08050004 (the address of j) as an integer.

If the LMA of .data is 0x08050000, the data is loaded exactly where the code expects it, and the output is j=2.

If the LMA of .data is 0x08050004 (and nothing copies the data to its VMA), the 4 bytes at 0x08050004 hold the value of i, so the output is j=1.
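The i/j/k example can be sketched as a small hosted C simulation. It is only a model: the array memory stands in for a 64-byte window of the process address space starting at 0x08050000, and load_data and read_j are hypothetical helpers, not real loader code. read_j always reads at the address derived from the VMA, while load_data places .data at whatever LMA it is given.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a 64-byte window of the process address space
   starting at WINDOW_BASE. */
#define WINDOW_BASE 0x08050000u
#define WINDOW_SIZE 64u

static uint8_t memory[WINDOW_SIZE];

/* "Loader": place the .data contents (i, j, k) at the given LMA. */
static void load_data(uint32_t lma) {
    const int32_t data[3] = { 1, 2, 3 };   /* i, j, k */
    memset(memory, 0, sizeof memory);
    assert(lma >= WINDOW_BASE && lma - WINDOW_BASE + sizeof data <= WINDOW_SIZE);
    memcpy(memory + (lma - WINDOW_BASE), data, sizeof data);
}

/* "Compiled printf argument": the address of j was fixed at link time from
   the VMA of .data (0x08050000 + 4), regardless of where .data was loaded. */
static int32_t read_j(void) {
    const uint32_t j_addr = 0x08050000u + 4u;
    int32_t value;
    memcpy(&value, memory + (j_addr - WINDOW_BASE), sizeof value);
    return value;
}
```

Calling load_data(0x08050000) and then read_j() yields 2; calling load_data(0x08050004) first yields 1, matching the two cases above.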

The LMA can also be understood from the code side.

Suppose the .text section begins with the following two instructions (on Intel i386 they occupy 10 bytes, 5 bytes each):

    jmp  0x08048285
    movl $0x1,%eax

If the LMA of .text is 0x08048280, the process address space holds "jmp 0x08048285" at 0x08048280 and "movl $0x1,%eax" at 0x08048285. If some instruction jumps to 0x08048280, the jmp lands on the movl, so %eax is set to 1.

If the LMA of .text is 0x08048285, the process address space holds "jmp 0x08048285" at 0x08048285 and "movl $0x1,%eax" at 0x0804828a. If some instruction jumps to 0x08048285, the jmp jumps to its own address, creating an infinite loop.

Symbols: every object file has a symbol table containing defined symbols (the names of global variables, static variables and defined functions) and undefined symbols (names that are referenced but not defined in that file).

Symbol value: each symbol corresponds to an address, which is its value. This is not the value of a C variable; in some cases you can think of it as the address of the variable. Use nm to view symbols (for nm, see this blog's notes on GNU binutils).

 

3. Script format

A linker script consists of a series of commands. Each command is either a keyword (usually followed by arguments) or an assignment to a symbol. Commands can be separated by semicolons ';'.

If a file name or format name contains a semicolon or another separator character, enclose the whole name in double quotes '"'. There is no way to use the double-quote character itself in a file name.

Text between /* and */ is a comment.

 

4. A simple example

Before describing the linker script commands one by one, let's look at a simple example.

The following script places the output .text section at 0x10000 and the output .data section at 0x8000000:

SECTIONS
{
  . = 0x10000;
  .text : { *(.text) }
  . = 0x8000000;
  .data : { *(.data) }
  .bss : { *(.bss) }
}

Explanation of the example:

. = 0x10000 : set the location counter '.' to 0x10000 (if it is never set, its initial value is 0).

.text : { *(.text) } : merge the .text sections of all input files (* matches any input file) into one output .text section. Its address is the value of the location counter, i.e. 0x10000.

. = 0x8000000 : set the location counter to 0x8000000.

.data : { *(.data) } : merge the .data sections of all input files into one output .data section at address 0x8000000.

.bss : { *(.bss) } : merge the .bss sections of all input files into one output .bss section at address 0x8000000 plus the size of the .data section.

Each time the linker places an output section, it increases the location counter by the size of that section. Note: alignment constraints are ignored here.
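The location-counter bookkeeping of this example can be sketched in C. place_example and struct layout are hypothetical names; the section sizes are parameters because the script itself does not fix them, and alignment is ignored just as in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the example script: the location counter "dot"
   is bumped by each output section's size (alignment ignored). */
struct layout { uint64_t text, data, bss, end; };

static struct layout place_example(uint64_t text_size, uint64_t data_size,
                                   uint64_t bss_size) {
    struct layout l;
    uint64_t dot = 0;            /* initial value of the location counter */

    dot = 0x10000;               /* . = 0x10000;          */
    l.text = dot;                /* .text : { *(.text) }  */
    dot += text_size;

    dot = 0x8000000;             /* . = 0x8000000;        */
    l.data = dot;                /* .data : { *(.data) }  */
    dot += data_size;

    l.bss = dot;                 /* .bss : { *(.bss) }    */
    dot += bss_size;

    l.end = dot;
    assert(l.bss == l.data + data_size);
    return l;
}
```

For instance, with a 0x40-byte .data and a 0x100-byte .bss, .bss starts at 0x8000040 and the counter ends at 0x8000140.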

 

5. Simple script commands

ENTRY(SYMBOL) : set the entry address to the value of SYMBOL.

The entry point is the address, in the process address space, of the first user-space instruction the process executes.

ld can set the entry address in several ways, tried in the following order (the earlier in the list, the higher the priority):

1. the -e command-line option

2. the ENTRY(SYMBOL) command in the linker script

3. the value of the symbol start, if it is defined

4. the address of the first byte of the .text section, if there is one

5. the value 0

INCLUDE filename : include the linker script named filename.

This is like #include in a C program: it pulls another linker script into this one.

The script is searched for in the paths given with -L. INCLUDE can be nested up to a depth of 10: file 1 INCLUDEs file 2, file 2 INCLUDEs file 3, ..., file 10 INCLUDEs file 11; file 11 can then contain no further INCLUDE.

INPUT(files) : use the listed files as input files for the link.

ld first looks for each file in the current directory; if it is not found there, ld searches the paths given with -L. A file may be written as -lfile, just like the -l command-line option. If the command appears in an implicit linker script, the files are linked at the position where that script appears on the command line.

GROUP(files) : specify a group of input files that are searched repeatedly for symbol definitions.

The files must be libraries (archives). ld scans the group repeatedly until no new undefined references are created.
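The effect of repeated scanning can be illustrated with a toy model. Everything here is hypothetical: symbols are bit positions, each archive member defines one symbol, and scan is not ld's algorithm, only a sketch of "search again until nothing new is pulled in". The single-pass mode is a simplification of linking archives one after another without GROUP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy archive member: defines one symbol (a bit index) and may
   reference others (a bitmask). */
struct member {
    unsigned defines;
    unsigned long needs;
    bool linked;
};

/* Pull in every member that defines a currently undefined symbol.
   repeat == false: one pass over the members.
   repeat == true:  loop like GROUP until a pass pulls in nothing new.
   Returns the set of symbols that are still undefined. */
static unsigned long scan(struct member *m, size_t n,
                          unsigned long undefined, bool repeat) {
    unsigned long defined = 0;
    bool changed;
    do {
        changed = false;
        for (size_t i = 0; i < n; ++i) {
            assert(m[i].defines < 8 * sizeof(unsigned long));
            unsigned long bit = 1UL << m[i].defines;
            if (!m[i].linked && (undefined & bit)) {
                m[i].linked = true;                 /* pull the member in */
                defined |= bit;
                undefined &= ~bit;
                undefined |= m[i].needs & ~defined; /* its own references */
                changed = true;
            }
        }
    } while (repeat && changed);
    return undefined;
}
```

With members {defines 0, needs 1}, {defines 2}, {defines 1, needs 2}, where the first two stand for a hypothetical libA.a and the last for libB.a, one pass leaves symbol 2 undefined (libB.a needs something from libA.a that was skipped), while the GROUP-style loop resolves everything.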

OUTPUT(FILENAME) : set the name of the output file.

Equivalent to ld's -o option, but -o takes precedence, so OUTPUT can be used to define a default output file name (ld's own default is a.out).

SEARCH_DIR(PATH) : add a library search path.

Equivalent to ld's -L option, but paths given with -L are searched before the paths added by SEARCH_DIR.

STARTUP(filename) : make filename the first input file.

Input files are linked in order; this command makes filename the first of them.

OUTPUT_FORMAT(BFDNAME) : set the BFD format of the output file.

Equivalent to ld's --oformat BFDNAME option, but the command-line option takes precedence.

OUTPUT_FORMAT(DEFAULT, BIG, LITTLE) : define three output formats, chosen by byte order.

If the command-line option -EB is given, the second BFD format (BIG) is used; if -EL is given, the third (LITTLE) is used. Otherwise the first (DEFAULT) is used.
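The three-argument selection rule fits in a few lines of C. pick_output_format and enum endian_opt are hypothetical names for illustration; a typical call would pass BFD names such as elf32-bigmips and elf32-littlemips.

```c
#include <assert.h>

/* Which endianness option, if any, appeared on the command line. */
enum endian_opt { ENDIAN_NONE, ENDIAN_EB, ENDIAN_EL };

/* Mirror of OUTPUT_FORMAT(DEFAULT, BIG, LITTLE): return the BFD name to use. */
static const char *pick_output_format(const char *def, const char *big,
                                      const char *little, enum endian_opt opt) {
    assert(def != 0 && big != 0 && little != 0);
    switch (opt) {
    case ENDIAN_EB: return big;      /* -EB given  */
    case ENDIAN_EL: return little;   /* -EL given  */
    default:        return def;      /* neither    */
    }
}
```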

TARGET(BFDNAME) : set the BFD format used to read the input files.

Equivalent to ld's -b BFDNAME option. If TARGET is used but OUTPUT_FORMAT is not, the format set by the last TARGET command is also used for the output file.

ASSERT(EXP, MESSAGE) : if EXP is false (zero), print MESSAGE and abort the link with an error.

EXTERN(SYMBOL SYMBOL ...) : add the symbols to the output file as undefined symbols, like the -u linker option.

FORCE_COMMON_ALLOCATION : allocate space for common symbols even when a relocatable output file is being produced with -r.

NOCROSSREFS(SECTION SECTION ...) : check the listed output sections and report an error if any of them references another. On some systems, especially embedded systems with limited memory, certain sections are never in memory at the same time, so they must not reference each other.

OUTPUT_ARCH(BFDARCH) : set the machine architecture of the output file. BFDARCH is one of the architecture names used by the BFD library; objdump -f shows the architecture of an object file.

The ld manual page (man -S 1 ld) also describes these commands.

 

6. Symbol assignment

Symbols defined in object files can be assigned values in a linker script (note that this is different from assignment in C!). A symbol assigned this way is defined as a global symbol. Each symbol corresponds to an address, and the assignment changes that address.

For example, the following program prints the address of the variable a:

a.c:

/* a.c */
#include <stdio.h>

int a = 100;

int main(void)
{
    printf("&a=%p\n", (void *)&a);
    return 0;
}

a.lds:

/* a.lds */
a = 3;

Build without the script:

$ gcc -Wall -o a-without-lds.exe a.c

Output:

&a=0x601020

Build with the script (gcc passes a.lds to the linker, which treats it as an implicit linker script):

$ gcc -Wall -o a-with-lds.exe a.c a.lds

Output:

&a=0x3

Note: assignments to symbols only affect global variables!

For simple assignments, any of the C assignment operators can be used:

SYMBOL = EXPRESSION ;
SYMBOL += EXPRESSION ;
SYMBOL -= EXPRESSION ;
SYMBOL *= EXPRESSION ;
SYMBOL /= EXPRESSION ;
SYMBOL <<= EXPRESSION ;
SYMBOL >>= EXPRESSION ;
SYMBOL &= EXPRESSION ;
SYMBOL |= EXPRESSION ;

Except for the first form, these require SYMBOL to be already defined (for example, in the source code of some object file).

'.' is a special symbol, the location counter: a position pointer into the program address space (or an offset within a section, when it is used inside an output section description in the SECTIONS command). It can only be used inside the SECTIONS command.

Note: an assignment statement has four syntactic elements: the symbol name, the operator, the expression and the semicolon. None of them may be omitted.

After the assignment, the section the symbol belongs to is set to the section of the expression EXPRESSION (see 11. Expressions in scripts).

Assignments can appear in three places in a linker script: at global scope, directly inside the SECTIONS command, and inside an output section description within SECTIONS.

Example 1:

floating_point = 0;          /* global scope */
SECTIONS
{
  .text :
  {
    *(.text)
    _etext = .;              /* inside an output section description */
  }
  _bdata = (. + 3) & ~ 3;    /* inside the SECTIONS command */
  .data : { *(.data) }
}
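The expression (. + 3) & ~ 3 rounds the location counter up to a multiple of 4: the mask is always one less than the alignment. A small C sketch of the same arithmetic, plus the general power-of-two form (for power-of-two alignments this matches what ALIGN(n) returns); both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as `_bdata = (. + 3) & ~ 3;`: round up to a multiple of 4. */
static uint64_t round_up_4(uint64_t dot) {
    return (dot + 3) & ~(uint64_t)3;
}

/* General form for any power-of-two alignment. */
static uint64_t align_up(uint64_t dot, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    return (dot + align - 1) & ~(align - 1);
}
```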

The PROVIDE keyword

PROVIDE defines a symbol only if it is referenced in some object file but not defined by any object file.

Example 2:

SECTIONS
{
  .text :
  {
    *(.text)
    _etext = .;
    PROVIDE(etext = .);
  }
}

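Here, if an object file references etext without defining it, etext is defined as the address of the first byte after the .text section. _etext, by contrast, is defined unconditionally: if a program also defines _etext, the linker reports a multiple-definition error, whereas a program's own definition of etext is silently used instead of the PROVIDE.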
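The same mechanism can be observed from C on GNU/Linux: the default ld script for ELF targets uses PROVIDE for etext, edata and end, so a program may declare and use them without defining them (glibc documents them in man 3 end). This sketch assumes a GNU/Linux toolchain with the default linker script; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Defined by PROVIDE(...) in the default linker script, not by this file. */
extern char etext, edata, end;

static uintptr_t text_end(void) { return (uintptr_t)&etext; }
static uintptr_t data_end(void) { return (uintptr_t)&edata; }

/* Bytes between the end of initialized data and `end`:
   roughly .bss plus alignment padding. */
static uintptr_t bss_span(void) {
    uintptr_t e = (uintptr_t)&edata, n = (uintptr_t)&end;
    assert(n >= e);   /* .bss follows .data */
    return n - e;
}
```

Printing these three addresses is a quick way to see where the default script put the end of the text, data and bss areas.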

 

7. The SECTIONS command

The SECTIONS command tells ld how to map the sections of the input files into the sections of the output file: how input sections are combined into output sections, and where each output section is placed in the program address space (VMA) and in the process address space (LMA).

The command format is:

SECTIONS
{
  SECTIONS-COMMAND
  SECTIONS-COMMAND
}

There are four kinds of SECTIONS-COMMAND:

(1) an ENTRY command

(2) a symbol assignment

(3) an output section description

(4) an overlay description

If there is no SECTIONS command, ld places every input section into an output section of the same name, combining input sections with the same name in the order the linker encounters them. If an input section is not mentioned in the SECTIONS command, it is copied directly into an output section.

 

7.1 Output section description (basic)

An output section description has the following format:

SECTION-NAME [ADDRESS] [(TYPE)] : [AT(LMA)]
{
  OUTPUT-SECTION-COMMAND
  OUTPUT-SECTION-COMMAND
} [>REGION] [AT>LMA_REGION] [:PHDR HDR ...] [=FILLEXP]

Items in [ ] are optional and usually not needed.

SECTION-NAME: the section name. The whitespace around SECTION-NAME, the colon and the curly braces are required; line breaks and other whitespace are optional.

 

7.1.1 Output section name

The output section name SECTION-NAME must satisfy the requirements of the output file format. For example, the a.out format only allows the section names .text, .data and .bss. Some formats only allow numeric names, which must then be written as quoted strings of digits. Other formats allow any character sequence in section names; in that case, a name containing special characters (such as spaces or commas) must be enclosed in quotes.

 

7.1.2 Output section address

The output section address [ADDRESS] is an expression whose value sets the VMA. If it is omitted and a REGION is given, the linker sets the VMA from that region. If neither is given, the linker sets the section's VMA to the value of the location counter '.' rounded up to satisfy the output section's alignment requirement, which is the strictest alignment required by any input section included in the output section.

Example:

.text . : { *(.text) }   and   .text : { *(.text) }

These two descriptions differ subtly. The first sets the VMA of .text to exactly the value of the location counter; the second sets it to the location counter rounded up to meet the alignment requirement.

ADDRESS can be any expression. For example, ALIGN(0x10) sets the section's VMA to the location counter rounded up to a 16-byte boundary.

Note: setting ADDRESS also changes the value of the location counter.
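The default VMA rule (round the location counter up to the strictest input-section alignment) can be sketched in C. output_section_vma is a hypothetical name, and the alignments are assumed to be powers of two; with an explicit `.` as ADDRESS, as in the first form above, this rounding does not happen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* VMA of an output section without ADDRESS or REGION: the location counter
   rounded up to the largest alignment among its input sections. */
static uint64_t output_section_vma(uint64_t dot, const uint64_t *input_aligns, size_t n) {
    uint64_t align = 1;
    for (size_t i = 0; i < n; ++i) {
        assert(input_aligns[i] != 0 && (input_aligns[i] & (input_aligns[i] - 1)) == 0);
        if (input_aligns[i] > align)
            align = input_aligns[i];   /* the strictest requirement wins */
    }
    return (dot + align - 1) & ~(align - 1);
}
```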

 

7.1.3 Output section commands

Each OUTPUT-SECTION-COMMAND in an output section description is one of four kinds:

(1) a symbol assignment

(2) an input section description

(3) a directly included data value

(4) a special output section keyword

 

7.1.3.1 Symbol assignments

Symbol assignments were introduced in "Basics of lds Linker Scripts on Linux (Part 1)" (section 6 above), so they are not repeated here.

 

7.1.3.2 Input section descriptions

The most common output section command is the input section description.

Basic syntax of an input section description:

FILENAME([EXCLUDE_FILE (FILENAME1 FILENAME2 ...)] SECTION1 SECTION2 ...)

FILENAME: a file name, either a specific file name or a wildcard pattern.

SECTION: a section name, either a specific section name or a wildcard pattern.

Examples make this clearest:

*(.text) : the .text sections of all input files.

*(EXCLUDE_FILE (*crtend.o *otherfile.o) .ctors) : the .ctors sections of all input files except crtend.o and otherfile.o.

data.o(.data) : the .data section of the file data.o.

data.o : all sections of the file data.o.

*(.text .data) : the .text and .data sections of all files, interleaved in input order: the .text and .data sections of the first file (in the order they appear in that file), then those of the second file, and so on.

*(.text) *(.data) : the .text sections of all files (first file, second file, ..., last file), followed by the .data sections of all files (first file, second file, ..., last file).
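The difference between one pattern list and two separate patterns is just loop order. A hypothetical C model, assuming every input file has a .text section followed by a .data section; placements are recorded as "file:section" tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append "file:sec" to out, space-separated, checking for truncation. */
static void append(char *out, size_t cap, const char *file, const char *sec) {
    size_t len = strlen(out);
    int n = snprintf(out + len, cap - len, "%s%s:%s", len ? " " : "", file, sec);
    assert(n >= 0 && (size_t)n < cap - len);
}

/* *(.text .data): one pass over the files, both sections per file. */
static void one_pattern(const char *const *files, size_t n, char *out, size_t cap) {
    out[0] = '\0';
    for (size_t i = 0; i < n; ++i) {
        append(out, cap, files[i], ".text");
        append(out, cap, files[i], ".data");
    }
}

/* *(.text) *(.data): one full pass over the files per pattern. */
static void two_patterns(const char *const *files, size_t n, char *out, size_t cap) {
    out[0] = '\0';
    for (size_t i = 0; i < n; ++i) append(out, cap, files[i], ".text");
    for (size_t i = 0; i < n; ++i) append(out, cap, files[i], ".data");
}
```

For files a.o and b.o, one_pattern yields "a.o:.text a.o:.data b.o:.text b.o:.data" and two_patterns yields "a.o:.text b.o:.text a.o:.data b.o:.data".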

Now let's see how the linker finds the corresponding files.

When FILENAME is a specific file name, the linker checks whether it appears on the link command line or in an INPUT command. If it does not, the linker opens the file as an input file, as though it had been given on the command line.

When FILENAME is a wildcard pattern, it only matches files given explicitly on the command line or in an INPUT command.

Note: unlike INPUT, a file opened this way is not searched for in the -L search paths.

The following wildcards can be used in patterns:

* : matches any number of characters

? : matches any single character

[CHARS] : matches any one of the characters in CHARS; - denotes a range, as in [a-z]

\ : quotes the character that follows it

In file names, a wildcard does not match the directory separator '/', except that a pattern consisting of a single '*' matches any file name.
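The wildcard syntax is the same family as shell globbing, so POSIX fnmatch(3) can be used to experiment with patterns. This is a sketch, not ld's own matcher: wildcard_matches is a hypothetical helper, and it calls fnmatch with no flags, so it does not try to model the '/' rule described above.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/* True if NAME (a file or section name) matches the wildcard PATTERN. */
static bool wildcard_matches(const char *pattern, const char *name) {
    assert(pattern != 0 && name != 0);
    int rc = fnmatch(pattern, name, 0);
    assert(rc == 0 || rc == FNM_NOMATCH);   /* anything else is an error */
    return rc == 0;
}
```

For example, "[A-Z]*" matches Foo.o but not foo.o, "data?.o" matches data1.o but not data12.o, ".text.*" matches .text.startup, and "\\*.o" (the C spelling of \*.o) matches only the literal name *.o.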

Any section of any input file can be placed only once in the SECTIONS command: if several input section descriptions match it, the first one in the script gets it.

Here is an example:

SECTIONS {
  .data : { *(.data) }
  .data1 : { data.o(.data) }
}

The .data section of data.o is consumed by the first OUTPUT-SECTION-COMMAND, so the second one never gets it: the linker reports no error, but the output .data1 section receives nothing from data.o.

To repeat: the linker scans the OUTPUT-SECTION-COMMANDs in order, and each section of each file can be used only once.

Use the -M linker option to produce a map file, which shows how every input section was combined into the output sections.

Another example:

SECTIONS {
  .text : { *(.text) }
  .DATA : { [A-Z]*(.data) }
  .data : { *(.data) }
  .bss : { *(.bss) }
}

Here the .text sections of all input files form the output .text section; the .data sections of all files whose names begin with an upper-case letter form the output .DATA section, and the .data sections of the remaining files form the output .data section; the .bss sections of all input files form the output .bss section.
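"First match wins" can be modelled as a linear search over the rules in script order. The sketch below is hypothetical (struct rule and place_data are invented names, and fnmatch stands in for ld's matcher); it covers only the .data rules of the example above.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

struct rule { const char *output; const char *file_pattern; };

/* The .data rules of the example, in script order. */
static const struct rule data_rules[] = {
    { ".DATA", "[A-Z]*" },   /* .DATA : { [A-Z]*(.data) } */
    { ".data", "*" },        /* .data : { *(.data) }      */
};

/* Output section that receives FILE's .data, or NULL if no rule matches. */
static const char *place_data(const struct rule *rules, size_t n, const char *file) {
    assert(file != NULL);
    for (size_t i = 0; i < n; ++i)
        if (fnmatch(rules[i].file_pattern, file, 0) == 0)
            return rules[i].output;   /* later rules never see this section */
    return NULL;
}
```

Swapping the two rules sends every file's .data to .data, leaving .DATA empty, which is the same effect as the .data/.data1 example above.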

The SORT() keyword sorts all names matching a wildcard pattern in ascending order, e.g. *(SORT(.text*)).

 

Input sections for common symbols

In many object file formats, common symbols do not occupy a section of their own. The linker treats the common symbols of all input files as if they were in a section named COMMON.

Example:

.bss : { *(.bss) *(COMMON) }

This example places the common symbols of all input files into the output .bss section. As you can see, COMMON is used just like any other section name.

Some object file formats have more than one kind of common symbol. For example, the MIPS ELF format distinguishes standard common symbols from small common symbols; the linker then treats standard common symbols as being in COMMON and small common symbols as being in .scommon.

Older linker scripts may contain [COMMON], which is equivalent to *(COMMON); this old notation is deprecated.

 

Input sections and garbage collection

When the --gc-sections command-line option is used, the linker may discard sections it considers unused. To force the linker to keep particular sections, wrap their input section descriptions in KEEP(), e.g. KEEP(*(.text)) or KEEP(SORT(*)(.text)).
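Sections that nothing references by name are typical candidates for KEEP(). A common pattern is a registration table built from a custom section. The sketch below assumes GCC and GNU ld (or a compatible linker), which automatically define __start_SECNAME and __stop_SECNAME for an output section whose name is a valid C identifier; the section name my_table and the REGISTER macro are hypothetical. When such a table is linked with --gc-sections, a script rule like KEEP(*(my_table)) is one way to make sure the entries survive.

```c
#include <assert.h>
#include <stddef.h>

struct entry { const char *name; int value; };

/* Put each entry into the input section "my_table"; nothing refers to the
   entries by name, so `used` stops the compiler from dropping them. */
#define REGISTER(id, v) \
    static const struct entry entry_##id \
    __attribute__((used, section("my_table"))) = { #id, v }

REGISTER(alpha, 1);
REGISTER(beta, 2);
REGISTER(gamma, 3);

/* Provided by the linker, not defined anywhere in this file. */
extern const struct entry __start_my_table[];
extern const struct entry __stop_my_table[];

static size_t table_count(void) {
    size_t bytes = (size_t)((const char *)__stop_my_table - (const char *)__start_my_table);
    assert(bytes % sizeof(struct entry) == 0);   /* entries packed back to back */
    return bytes / sizeof(struct entry);
}

static int table_sum(void) {
    int sum = 0;
    for (const struct entry *e = __start_my_table; e < __stop_my_table; ++e)
        sum += e->value;
    return sum;
}
```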

Finally, a simple example involving input sections:

SECTIONS {
  outputa 0x10000 :
  {
    all.o
    foo.o (.input1)
  }
  outputb :
  {
    foo.o (.input2)
    foo1.o (.input1)
  }
  outputc :
  {
    *(.input1)
    *(.input2)
  }
}

In this example, all sections of all.o, followed by the .input1 sections of foo.o (one file can contain several sections with the same name), are placed in the output section outputa, whose VMA is 0x10000. The .input2 sections of foo.o, followed by the .input1 sections of foo1.o, are placed in the output section outputb, whose VMA is the current value of the location counter (after alignment). All remaining .input1 and .input2 sections, from any file, are placed in the output section outputc.

 

7.1.3.3 Directly included data values

You can explicitly place data of your choice in an output section (can you write a program in a linker script? A very simple one, perhaps).

BYTE(EXPRESSION) 1 byte

SHORT(EXPRESSION) 2 bytes

LONG(EXPRESSION) 4 bytes

QUAD(EXPRESSION) 8 bytes

SQUAD(EXPRESSION) 8 bytes; the same as QUAD when the linker runs on a 64-bit host, but on a 32-bit host QUAD zero-extends the value to 64 bits while SQUAD sign-extends it

The byte order (big-endian or little-endian) of the stored value is determined by the format of the output file; if the output format does not determine it, the byte order of the first input file is used.

For example: BYTE(1), LONG(addr).

Note that these commands can only appear inside an output section description, not between output section descriptions.

Wrong: SECTIONS { .text : { *(.text) } LONG(1) .data : { *(.data) } }

Correct: SECTIONS { .text : { *(.text) LONG(1) } .data : { *(.data) } }
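How a BYTE/SHORT/LONG/QUAD value becomes bytes in the output section is just width plus byte order. A minimal C sketch; emit_value is a hypothetical helper, not part of any linker API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low WIDTH bytes of VALUE to OUT in the output file's byte order. */
static void emit_value(uint8_t *out, uint64_t value, size_t width, bool big_endian) {
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    for (size_t i = 0; i < width; ++i) {
        uint8_t byte = (uint8_t)(value >> (8 * i));   /* i-th least significant byte */
        out[big_endian ? width - 1 - i : i] = byte;
    }
}
```

LONG(1) thus becomes 01 00 00 00 in a little-endian output file and 00 00 00 01 in a big-endian one.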

Parts of the current output section that are not otherwise described (for example, gaps left by alignment) can be given contents with the FILL(EXPRESSION) command, e.g. FILL(0x90909090); the pattern is repeated as often as needed to fill such areas. In current GNU ld the pattern is the four least significant bytes of the value, stored big-endian; a plain hexadecimal constant such as 0x9090 instead gives a pattern of exactly the digits written. An output section description can also carry the =FILLEXP attribute, which works like FILL() except that FILL only affects the part of the section after the FILL command, whereas =FILLEXP applies to the entire output section. If both are used, FILL takes precedence.
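Filling a gap with a repeating big-endian pattern can be sketched in C. fill_gap is a hypothetical helper; it assumes a general (non-hex-literal) expression, so the pattern is the low 32 bits of the value, and it assumes the pattern restarts at the first byte of each gap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill LEN bytes with the 4-byte big-endian pattern taken from FILL_VALUE. */
static void fill_gap(uint8_t *gap, size_t len, uint32_t fill_value) {
    const uint8_t pattern[4] = {
        (uint8_t)(fill_value >> 24), (uint8_t)(fill_value >> 16),
        (uint8_t)(fill_value >> 8),  (uint8_t)fill_value,
    };
    assert(gap != NULL || len == 0);
    for (size_t i = 0; i < len; ++i)
        gap[i] = pattern[i % 4];   /* repeat as often as needed */
}
```

A 6-byte gap filled with 0x12345678 becomes 12 34 56 78 12 34 under these assumptions.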

 

7.1.3.4 Special output section keywords

Some special output section keywords can also be used as OUTPUT-SECTION-COMMANDs.

CREATE_OBJECT_SYMBOLS : create a symbol for each input file, named after that input file. Each symbol belongs to the output section in which the keyword appears.

CONSTRUCTORS : related to C++ global constructors and destructors (of global objects).

For the a.out object file format, the linker uses an unusual set construct to support C++ global constructors and destructors.

When the output format does not support arbitrary section names, as with ECOFF and XCOFF, the linker recognizes C++ global constructors and destructors by name. For these formats, the linker collects the constructor and destructor information into the output section in which the CONSTRUCTORS keyword appears.

The symbol __CTOR_LIST__ marks the start of the global constructor information, and __CTOR_END__ marks its end.

The symbol __DTOR_LIST__ marks the start of the global destructor information, and __DTOR_END__ marks its end.

Each list starts with a word giving the number of entries, followed by the entries, and ends with a word whose value is zero.

Normally, GNU C++ runs the global constructors from the function __main, and a call to __main is automatically inserted into the startup code of main, so it runs before the body of main. Not every object file format works this way.

For object file formats that support arbitrary section names, such as COFF and ELF, GNU C++ puts the global constructor and destructor information into the .ctors and .dtors sections, and the linker script contains something like:

__CTOR_LIST__ = .;
LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)
*(.ctors)
LONG(0)
__CTOR_END__ = .;
__DTOR_LIST__ = .;
LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)
*(.dtors)
LONG(0)
__DTOR_END__ = .;

If you use the initialization priority support provided by GNU C++ (which controls the order in which global constructors are called), replace CONSTRUCTORS with SORT(CONSTRUCTORS), *(.ctors) with *(SORT(.ctors)) and *(.dtors) with *(SORT(.dtors)) in the linker script. Usually the default linker script already does this.
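The layout produced by the script fragment (count word, entries, zero word) can be modelled in plain C. Everything here is a hypothetical stand-in: union ctor_word models one list word, and run_ctor_list walks the entries from last to first, which is, to our understanding, what libgcc's generic __do_global_ctors does. Note that the real fragment's "/ 4 - 2" assumes 4-byte words.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ctor_fn)(void);

/* One word of the list: either the leading count or a function pointer. */
union ctor_word { long count; ctor_fn fn; };

static char run_order[4];
static size_t n_run;

static void ctor_a(void) { run_order[n_run++] = 'a'; }
static void ctor_b(void) { run_order[n_run++] = 'b'; }
static void ctor_c(void) { run_order[n_run++] = 'c'; }

/* Stand-in for __CTOR_LIST__ .. __CTOR_END__: count, entries, terminator. */
static const union ctor_word ctor_list[] = {
    { .count = 3 },
    { .fn = ctor_a }, { .fn = ctor_b }, { .fn = ctor_c },
    { .fn = NULL },
};

static void run_ctor_list(const union ctor_word *list) {
    long n = list[0].count;
    assert(n >= 0 && list[n + 1].fn == NULL);   /* terminator follows the entries */
    for (long i = n; i >= 1; --i)               /* last entry first */
        list[i].fn();
}
```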

Modifying the location counter

You can change the location counter by assigning a value to '.'.

Example:

SECTIONS
{
  . = SIZEOF_HEADERS;
  .text : { *(.text) }
  . = 0x10000;
  .data : { *(.data) }
  . = 0x8000000;
  .bss : { *(.bss) }
}

Discarding output sections

For .foo : { *(.foo) }, if no input file contains a .foo section, the linker does not create the .foo output section. However, if the output section description contains commands other than input section descriptions (such as a symbol assignment), the linker always creates the output section.

There is also a special output section named /DISCARD/: any input section assigned to it does not appear in the output file, which is what DISCARD means. (What would it mean for /DISCARD/ to refer to itself? Think about it.)

 

7.2 Output section description (advanced)

Let's review the syntax of an output section description:

SECTION-NAME [ADDRESS] [(TYPE)] : [AT(LMA)]
{
  OUTPUT-SECTION-COMMAND
  OUTPUT-SECTION-COMMAND
} [>REGION] [AT>LMA_REGION] [:PHDR HDR ...] [=FILLEXP]

We have covered SECTION-NAME, ADDRESS and OUTPUT-SECTION-COMMAND; next we look at the remaining attributes.

 

7.2.1 Output section type

[(TYPE)] sets the type of the output section. If no TYPE is given, the linker sets the type from the input sections the output section contains. TYPE can take one of the following five values:

NOLOAD : the section is not loaded into memory when the program runs.

DSECT, COPY, INFO, OVERLAY : these types are rarely used and are supported for backward compatibility. They all have the same effect: the section is marked as not allocatable, so no memory is allocated for it when the program runs.

So there is no fixed default type: without TYPE, the type follows from the input sections, as described above.

 

7.2.2 Output section LMA

By default the LMA equals the VMA, but it can be set with the [AT(LMA)] item, i.e. the AT() keyword.

The expression in the parentheses of AT() sets the LMA. Without AT(), you can use AT>LMA_REGION to place the section's load address in a memory region. This is mainly used to build ROM images.

Example:

SECTIONS
{
  .text 0x1000 : { *(.text) _etext = . ; }
  .mdata 0x2000 :
    AT ( ADDR (.text) + SIZEOF (.text) )
    { _data = . ; *(.data); _edata = . ; }
  .bss 0x3000 :
    { _bstart = . ; *(.bss) *(COMMON) ; _bend = . ;}
}

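The corresponding startup code copies the initialized data from ROM (stored right after .text, at its LMA) to its VMA, and zeroes .bss:

extern char _etext, _data, _edata, _bstart, _bend;
char *src = &_etext;
char *dst = &_data;

/* ROM has data at end of text; copy it. */
while (dst < &_edata)
    *dst++ = *src++;

/* Zero bss. */
for (dst = &_bstart; dst < &_bend; dst++)
    *dst = 0;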
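To try the startup logic on a host, the linker symbols can be turned into parameters. startup_copy is a hypothetical helper: in the test, a small rom array stands for the load image (.text followed by the .data image) and a ram array for the run-time .data and .bss areas.

```c
#include <assert.h>

/* Copy .data from its LMA (src) to its VMA [data, edata), then zero
   .bss [bstart, bend). Parameters replace the linker-defined symbols. */
static void startup_copy(const char *src, char *data, char *edata,
                         char *bstart, char *bend) {
    char *dst = data;
    assert(data <= edata && bstart <= bend);
    while (dst < edata)                     /* copy initialized data */
        *dst++ = *src++;
    for (dst = bstart; dst < bend; dst++)   /* zero .bss */
        *dst = 0;
}
```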

 

7.2.3 Setting the program segments of an output section

The [:PHDR HDR ...] item places an output section into one or more predefined program segments. Once an output section has been assigned to one or more segments, the output sections that follow are placed in the same segments by default, unless specified otherwise. Example:

PHDRS { text PT_LOAD ; }
SECTIONS { .text : { *(.text) } :text }

Use :NONE to tell the linker not to put the section in any segment. See the PHDRS command for details.

 

7.2.4 Setting the fill pattern of an output section

As mentioned earlier, the linker fills any unspecified memory area in an output section (such as an alignment gap) with a fill pattern. The [=FILLEXP] item sets that fill value, and the pattern is repeated as needed to fill each area: a plain hexadecimal constant such as 0x9090 is used digit for digit, while any other expression gives its four least significant bytes, big-endian. Example:

SECTIONS { .text : { *(.text) } =0x9090 }

 

7.3 Overlay descriptions

An overlay description makes two or more sections occupy the same addresses in the program address space. Overlay manager code is responsible for copying sections in and out as required. This suits, for example, a system in which one memory block is faster than the others: copying a section into the fast block before executing or accessing it improves speed. The syntax is:

SECTIONS {
  OVERLAY [START] : [NOCROSSREFS] [AT ( LDADDR )]
  {
    SECNAME1
    {
      OUTPUT-SECTION-COMMAND
      OUTPUT-SECTION-COMMAND
    } [:PHDR...] [=FILL]
    SECNAME2
    {
      OUTPUT-SECTION-COMMAND
      OUTPUT-SECTION-COMMAND
    } [:PHDR...] [=FILL]
  } [>REGION] [:PHDR...] [=FILL]
}

As the syntax shows, all sections in the same overlay have the same VMA, which is determined by [START]. The LMA of SECNAME2 is the LMA of SECNAME1 plus the size of SECNAME1, and the LMAs of SECNAME3, SECNAME4, … follow in the same way. The LMA of SECNAME1 is determined by LDADDR; if LDADDR is not specified, it is determined by START; and if START is not specified either, it is the current value of the location counter.

The NOCROSSREFS keyword states that the sections do not reference each other; if one section references another, the linker reports an error.

For each section in an OVERLAY description, the linker defines two symbols, __load_start_SECNAME and __load_stop_SECNAME, whose values are the start and end LMA addresses of SECNAME. Characters in SECNAME that are not valid in C identifiers are removed, so the section .text1 yields __load_start_text1.

After processing the OVERLAY statement, the linker sets the location counter to the start address of the overlay plus the size of its largest section.
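The layout rules above can be modeled as a small calculation. This is a hypothetical sketch, not ld's code: it assumes START and LDADDR are both given explicitly (the LDADDR → START → '.' defaulting is left to the caller) and caps an overlay at 8 sections for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the OVERLAY layout rules. */
struct overlay_layout {
    uint64_t vma;        /* shared VMA of every section (START) */
    uint64_t lma[8];     /* value of __load_start_SECNAME for each section */
    uint64_t lma_end[8]; /* value of __load_stop_SECNAME for each section */
    uint64_t dot_after;  /* location counter after the OVERLAY statement */
};

/* sizes[i] is the size of the i-th section; n must be at most 8. */
void layout_overlay(uint64_t start, uint64_t ldaddr,
                    const uint64_t *sizes, size_t n,
                    struct overlay_layout *out)
{
    assert(n <= 8 && out != NULL);
    uint64_t lma = ldaddr, max_size = 0;
    out->vma = start;
    for (size_t i = 0; i < n; i++) {
        out->lma[i] = lma;
        out->lma_end[i] = lma + sizes[i];
        lma += sizes[i];              /* next section loads right after */
        if (sizes[i] > max_size)
            max_size = sizes[i];
    }
    out->dot_after = start + max_size; /* start plus the largest section */
}
```

For instance, with START = 0x1000, LDADDR = 0x4000 and section sizes 0x200 and 0x300, both sections run at 0x1000, load at 0x4000 and 0x4200, and the location counter ends up at 0x1300.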

Example :

SECTIONS{

OVERLAY 0x1000 : AT (0x4000)

{

.text0 { o1/*.o(.text) }

.text1 { o2/*.o(.text) }

}

}

.text0 and .text1 both have the VMA 0x1000. .text0 is loaded at address 0x4000, and .text1 is loaded immediately after it.

Program code that copies the .text1 section into place:

extern char __load_start_text1, __load_stop_text1;

memcpy ((char *) 0x1000, &__load_start_text1, &__load_stop_text1 - &__load_start_text1);
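The single memcpy above generalizes to a small overlay manager that copies whichever overlay is needed into the shared run region. The sketch below is hypothetical and runnable on its own: the arrays `text0_image`, `text1_image` and `run_region` stand in for the load images and the VMA 0x1000 that a real build would get from the __load_start_/__load_stop_ symbols and the linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the load images (at their LMAs) and the shared run
 * region (the overlay VMA). In a real build the bounds would be
 * &__load_start_textN and &__load_stop_textN. */
static const char text0_image[] = "overlay-0 code";
static const char text1_image[] = "overlay-1 code";
static char run_region[32];

struct overlay_desc {
    const char *load_start;
    const char *load_stop;
};

static const struct overlay_desc overlays[] = {
    { text0_image, text0_image + sizeof text0_image },
    { text1_image, text1_image + sizeof text1_image },
};

static int current_overlay = -1;

/* Copy overlay n into the run region unless it is already resident.
 * Returns 0 on success, -1 if n is invalid or the image does not fit. */
int load_overlay(int n)
{
    if (n < 0 || (size_t)n >= sizeof overlays / sizeof overlays[0])
        return -1;
    if (n == current_overlay)
        return 0;
    assert(overlays[n].load_stop >= overlays[n].load_start);
    size_t len = (size_t)(overlays[n].load_stop - overlays[n].load_start);
    if (len > sizeof run_region)
        return -1;
    memcpy(run_region, overlays[n].load_start, len);
    current_overlay = n;
    return 0;
}
```

Tracking `current_overlay` avoids a redundant copy when code calls into the overlay that is already resident.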

 

Eight 、 Memory region command (MEMORY)

By default, the linker may allocate sections anywhere in the program address space. The > REGION attribute of an output section description restricts that output section to a particular memory region within the program address space; if the region is too small, the linker reports an error.

The MEMORY command can also be used to place input sections that are *not* referenced in the SECTIONS command into a memory region within the program address space.

Note: "memory region" below always means a region within the program address space.

The MEMORY command has the following syntax:

MEMORY {

NAME1 [(ATTR)] : ORIGIN = ORIGIN1, LENGTH = LEN1

NAME2 [(ATTR)] : ORIGIN = ORIGIN2, LENGTH = LEN2

}

NAME : The name of the memory region. It may duplicate a symbol name, file name or section name, because region names live in a separate namespace.

ATTR : Defines the attributes of the memory region, which matter for sections that the SECTIONS command does not map. When an input section is not referenced in the SECTIONS command, the linker copies it directly into an output section of the same name and places that output section in a memory region. If the region has ATTR set, the linker places such a section there only if the section's attributes match. (The attributes come from the input section's own flags in the object file, such as read-only, writable or executable, not from the output section description, which does not record them.)

The ATTR string may contain the following 7 characters:

R  Read-only section

W  Read/write section

X  Executable section

A  Allocatable section

I  Initialized section

L  Same as I

!  Inverts the sense of the attributes that follow it (matches sections that do not have them)

ORIGIN : Keyword giving the start address of the region; it may be abbreviated to org or o.

LENGTH : Keyword giving the size of the region; it may be abbreviated to len or l.

Example

MEMORY

{

rom (rx) : ORIGIN = 0, LENGTH = 256K

ram (!rx) : org = 0x40000000, l = 4M

}

In this example, input sections that are *not* referenced in the SECTIONS command and are read-only or executable are placed in the rom region; all other unreferenced input sections are placed in ram. If an output section is placed in a memory region without an explicit ADDRESS attribute, the linker places it at the next available position in that region.

 

Nine 、 PHDRS command

This command is meaningful only when generating ELF output files.

The ELF format uses program headers, each of which describes one segment, to describe how a program is loaded into memory. You can view them with the objdump -p command.

When an ELF program runs on a native ELF system, the system loader reads the program headers to learn how to load the program into memory. For details on how the system loader interprets program headers, see the ELF ABI documentation.
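Besides objdump -p, a running program can walk the program headers that the loader used for it. The sketch below assumes GNU/Linux with glibc, where dl_iterate_phdr (declared in <link.h>) reports each loaded object and, per its manual page, visits the main program first; `print_main_phdrs` and `count_main_load_segments` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Print the program headers of the first object reported (the main
 * program) and count its PT_LOAD segments. */
static int print_main_phdrs(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *load_count = data;
    assert(load_count != NULL);
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        printf("type=%u vaddr=%#lx memsz=%#lx flags=%#x\n",
               (unsigned)ph->p_type,
               (unsigned long)(info->dlpi_addr + ph->p_vaddr),
               (unsigned long)ph->p_memsz,
               (unsigned)ph->p_flags);
        if (ph->p_type == PT_LOAD)
            (*load_count)++;
    }
    return 1; /* non-zero stops the iteration after the main program */
}

int count_main_load_segments(void)
{
    int load_count = 0;
    dl_iterate_phdr(print_main_phdrs, &load_count);
    return load_count;
}
```

The vaddr printed here includes dlpi_addr, the load bias, so for a position-independent executable it differs from the p_vaddr that objdump -p shows.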

If the linker script does not use the PHDRS command, the linker creates suitable program headers on its own; but sometimes you need to describe the program headers more precisely, and that is where the PHDRS command comes in.

Note: once a linker script uses the PHDRS command, the linker creates **only** the program headers specified by it, so use it with caution.

The PHDRS command has the following syntax:

PHDRS

{

NAME TYPE [ FILEHDR ] [ PHDRS ] [ AT ( ADDRESS ) ]

[ FLAGS ( FLAGS ) ] ;

}

FILEHDR, PHDRS, AT and FLAGS are keywords.

NAME : The segment name. It may duplicate a symbol name, section name or file name, because segment names live in a separate namespace. The name can only be used inside the SECTIONS command.

A program segment can consist of several loadable sections. The :PHDR attribute of an output section description adds that output section to a program segment, where PHDR is the segment name. :PHDR can be used several times in one output section description, which places the section in several segments.

If an output section description specifies :PHDR, the output sections described after it use the same segments by default, unless they specify their own :PHDR. This simplifies the script when several output sections belong to the same segment.

TYPE can take one of the following eight forms:

PT_NULL 0

Represents an unused program segment

PT_LOAD 1

Indicates that the program segment should be loaded when the program is running

PT_DYNAMIC 2

Indicates that the program segment contains dynamic linking information

PT_INTERP 3

Indicates that the program segment contains the name of the program interpreter (loader); on Linux a common one is ld-linux.so.2 (ld-linux-x86-64.so.2 on x86-64)

PT_NOTE 4

Indicates that the program segment contains note (auxiliary) information

PT_SHLIB 5

A reserved program header type, defined but not specified by the ELF ABI

PT_PHDR 6

Indicates that the program segment contains the program header table itself.

EXPRESSION  An expression value

Each of the types above corresponds to a number; an expression gives the numeric type directly, which allows program header types not listed above.
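The numbers in the list above are the p_type values defined in the system's <elf.h>. The sketch below checks them at compile time and maps a value back to its name; `phdr_type_name` is a hypothetical helper, and any other value (such as a custom type given as an EXPRESSION) maps to NULL.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* The numbers listed above, checked against <elf.h>. */
static_assert(PT_NULL == 0 && PT_LOAD == 1 && PT_DYNAMIC == 2 &&
              PT_INTERP == 3 && PT_NOTE == 4 && PT_SHLIB == 5 &&
              PT_PHDR == 6, "unexpected PT_* values");

/* Return the name of a standard p_type, or NULL for any other value. */
const char *phdr_type_name(uint32_t type)
{
    switch (type) {
    case PT_NULL:    return "PT_NULL";
    case PT_LOAD:    return "PT_LOAD";
    case PT_DYNAMIC: return "PT_DYNAMIC";
    case PT_INTERP:  return "PT_INTERP";
    case PT_NOTE:    return "PT_NOTE";
    case PT_SHLIB:   return "PT_SHLIB";
    case PT_PHDR:    return "PT_PHDR";
    default:         return NULL;
    }
}
```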

If the FILEHDR keyword follows TYPE, the segment includes the ELF file header; if the PHDRS keyword follows it, the segment includes the ELF program headers.

The AT(ADDRESS) attribute sets the load address (LMA) of the segment, and it **overrides** the AT() attribute of the sections in the segment.

By default, the linker sets a segment's flags based on the sections that make up the segment (that is, from those sections' own flags such as writable or executable, which is why they do not appear in the output section description). The FLAGS(FLAGS) keyword sets the flags explicitly; its integer value is stored in the p_flags field of the program header.
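The p_flags value is a bit mask of PF_R (4), PF_W (2) and PF_X (1) from <elf.h>. As an illustration, the hypothetical helper below renders it in the R/W/X style that tools such as readelf use.

```c
#include <assert.h>
#include <elf.h>

/* PF_X = 1, PF_W = 2, PF_R = 4, as defined in <elf.h>. */
static_assert(PF_X == 1 && PF_W == 2 && PF_R == 4, "unexpected PF_* values");

/* Render a p_flags value as "RWX"-style text into out (4 bytes). */
void phdr_flags_str(unsigned flags, char out[4])
{
    out[0] = (flags & PF_R) ? 'R' : '-';
    out[1] = (flags & PF_W) ? 'W' : '-';
    out[2] = (flags & PF_X) ? 'X' : '-';
    out[3] = '\0';
}
```

So FLAGS(5) in a PHDRS entry would appear as R-X, the flags typically used for a text segment, and FLAGS(6) as RW-.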

Here is a typical PHDRS setup:

Example

PHDRS

{

headers PT_PHDR PHDRS ;

interp PT_INTERP ;

text PT_LOAD FILEHDR PHDRS ;

data PT_LOAD ;

dynamic PT_DYNAMIC ;

}

SECTIONS

{

. = SIZEOF_HEADERS;

.interp : { *(.interp) } :text :interp

.text : { *(.text) } :text

.rodata : { *(.rodata) } /* defaults to :text */

. = . + 0x1000; /* move to a new page in memory */

.data : { *(.data) } :data

.dynamic : { *(.dynamic) } :data :dynamic

}

 

Ten 、 VERSION command

When the ELF object file format is used, the linker supports versioned symbols. Symbol versioning is available only for the ELF format.

Symbol versions are mainly meaningful for shared libraries: the dynamic loader uses a symbol's version to select, for the application, a specific version of a function implemented in the shared library.

You can put version commands directly in the linker script, or put them in a separate version script file and pass it with the --version-script linker option.

The syntax of the command is as follows ,

VERSION { version-script-commands }

The following examples use gcc.

 

10.1、 Defining versioned symbols (shared library)

File b.c contains:

int getVersion()

{

return 1;

}

Write a version script for it, here named b.lds, with the following contents:

VER1.0{

getVersion;

};

VER2.0{

};

$gcc -c b.c

$gcc -shared -Wl,--version-script=b.lds -o libb.so b.o

The symbols to be bound go inside the {} of a version node; here getVersion is bound to VER1.0.

So if an application links against this library's getVersion symbol, it is bound to the VER1.0 version of getVersion.

Now suppose we upgrade b.c as follows:

int getVersion()

{

return 101;

}

Here getVersion() has been changed so that the meaning of its return value changes too; the new version is not compatible with the old one.

To keep existing programs safe, we change b.lds to:

VER1.0{

};

VER2.0{

getVersion;

};

Then rebuild libb.so.

If we now run app.exe (which was linked against the VER1.0 version of getVersion()), the application fails to run.

The error message is:

./app.exe: relocation error: ./app.exe: symbol getVersion, version VER1.0 not defined in file libb.so with link time reference

This happens because libb.so no longer contains a VER1.0 version of getVersion(), only a VER2.0 version.
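The failure comes down to a simple matching rule: the executable records the exact name@version it needs, and the library must define that pair. The sketch below is a simplified, hypothetical model of that rule only; the real dynamic loader reads the ELF version sections (.gnu.version_d in the library, .gnu.version_r in the executable), and `versioned_def` and `resolve_versioned` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one versioned dynamic symbol in a library. */
struct versioned_def {
    const char *name;    /* e.g. "getVersion" */
    const char *version; /* e.g. "VER2.0" */
};

/* Return the definition matching name@version, or NULL, which is the
 * case the loader reports as "version ... not defined". */
const struct versioned_def *resolve_versioned(const struct versioned_def *defs,
                                              size_t n, const char *name,
                                              const char *version)
{
    assert(defs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(defs[i].name, name) == 0 &&
            strcmp(defs[i].version, version) == 0)
            return &defs[i];
    return NULL;
}
```

With the rebuilt libb.so modeled as the single entry {"getVersion", "VER2.0"}, looking up getVersion@VER1.0 returns NULL, mirroring the relocation error above.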

 

10.2、 Viewing the version of a linked symbol

Run the following command on the app.exe built above:

nm app.exe | grep getVersion

result

U getVersion@@VER1.0

The nm output shows that app.exe is linked against the VER1.0 version of getVersion.

 

10.3、 GNU extension

GNU allows a *symbol* defined in the program source to be bound to a *versioned alias symbol*.

File b.c contains:

int old_getVersion()

{

return 1;

}

int new_getVersion()

{

return 101;

}

__asm__(".symver old_getVersion,getVersion@VER1.0");

__asm__(".symver new_getVersion,getVersion@@VER2.0");

Here, getVersion with version VER1.0 is an alias for old_getVersion;

getVersion with version VER2.0 is an alias for new_getVersion.

The @@ marks VER2.0 as the default version, so a program linked against libb.so binds to VER2.0 of getVersion by default.
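The difference between @ and @@ can be pictured as a table in which one entry per name is flagged as the default. This is a hypothetical model for illustration (the struct and `default_version` are invented names), showing only the link-time choice an unversioned reference makes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the .symver aliases in b.c. */
struct symver_alias {
    const char *impl;    /* symbol defined in b.c */
    const char *name;    /* exported name */
    const char *version; /* version node */
    bool is_default;     /* true for name@@version, false for name@version */
};

/* An unversioned reference at link time binds to the @@ entry. */
const struct symver_alias *default_version(const struct symver_alias *a,
                                           size_t n, const char *name)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (a[i].is_default && strcmp(a[i].name, name) == 0)
            return &a[i];
    return NULL;
}
```

For the b.c above, the table holds {old_getVersion, getVersion, VER1.0, false} and {new_getVersion, getVersion, VER2.0, true}, so the default lookup yields new_getVersion.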

The linker version script b.lds contains:

VER1.0{

};

VER2.0{

};

The version script must define both VER1.0 and VER2.0, because b.c refers to them.

Compile and link b.c and app.c again with the following commands:

gcc -c src/b.c

gcc -shared -Wl,--version-script=./lds/b.lds -o libb.so b.o

gcc -o app.exe ./src/app.c libb.so

Run:

./app.exe

Result:

Version=0x65

This shows that app.exe is indeed linked against the VER2.0 getVersion, i.e. new_getVersion() (0x65 is 101).

 

Now we modify app.c so that it links against the VER1.0 getVersion, i.e. old_getVersion().

app.c:

#include <stdio.h>

__asm__(".symver getVersion,getVersion@VER1.0");

extern int getVersion();

int main()

{

printf("Version=%#x\n", getVersion());

return 0;

}

Compile and link b.c and app.c again.

Run:

./app.exe

Result:

Version=0x1

This time app.exe is indeed linked against the VER1.0 getVersion, i.e. old_getVersion().

 

Eleven 、 Expressions

The expression syntax in lds is consistent with C. All expressions evaluate to integers: if both the host running ld and the target of the output file are 32-bit, expressions are 32-bit; otherwise they are 64-bit.

Here are some common expressions:

_fourk_1 = 4K; /* K and M suffixes */

_fourk_2 = 4096; /* decimal */

_fourk_3 = 0x1000; /* hexadecimal */

_fourk_4 = 010000; /* octal (leading 0) */

Note: 1K = 1024 and 1M = 1024*1024.
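The four literal forms above can be reproduced with strtoull, whose base 0 mode already understands the 0x and leading-0 prefixes. `parse_ld_number` is a hypothetical helper that mirrors only the forms shown here (and only uppercase K and M); it is not ld's lexer, which accepts more forms.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse 0x.. (hex), leading 0 (octal) or decimal, with an optional
 * K (x1024) or M (x1024*1024) suffix. Returns 0 on success, -1 if the
 * text is not a number of this form. */
int parse_ld_number(const char *s, unsigned long long *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    unsigned long long v = strtoull(s, &end, 0); /* base 0: 0x / 0 / decimal */
    if (end == s)
        return -1;
    if (*end == 'K') {
        v *= 1024ULL;
        end++;
    } else if (*end == 'M') {
        v *= 1024ULL * 1024ULL;
        end++;
    }
    if (*end != '\0')
        return -1;
    *out = v;
    return 0;
}
```

All four _fourk_ values parse to 4096, which is a quick way to confirm that the octal form needs five digits after the leading 0.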

 

11.1、 Symbol names

A symbol name not enclosed in double quotes must start with a letter, an underscore or '.', and may contain letters, digits, underscores, '.' and '-'. A name enclosed in double quotes may contain other characters, such as spaces, and may even be the same as a keyword. For example,

"SECTION" = 9;

"with a space" = "also with a space" + 10;

 

11.2、 The location counter '.'

The location counter is valid only inside the SECTIONS command, where it represents an address in the program's address space.

Note: at link time, when '.' is used inside an output section description of the SECTIONS command, it denotes the current **offset** from the start of that section, not an absolute address in the program address space. When the program is loaded, the symbol's final value is of course an absolute address in the program address space.

Example 11.2_1:

SECTIONS

{

output :

{

file1(.text)

. = . + 1000;

file2(.text)

. += 1000;

file3(.text)

} = 0x1234;

}

The gaps created by assigning to the location counter (1000 bytes after file1(.text) and another 1000 bytes after file2(.text)) are filled with 0x1234. The rest of the example should be easy to follow.

Example 11.2_2:

SECTIONS

{

. = 0x100;

.text : {

*(.text)

. = 0x200;

}

. = 0x500;

.data : {

*(.data)

. += 0x600;

}

}

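.text starts at address 0x100 of the program address space. Because '.' inside .text is an offset from the start of the section, '. = 0x200' makes .text exactly 0x200 bytes long, even if its input sections are smaller (if they are larger, the linker reports an error, since '.' cannot move backwards). .data starts at 0x500 and has an extra 0x600 bytes of space after the contents of its input sections.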

Example 11.2_3

File src/a.c:

#include <stdio.h>

int a = 100;

int b=0;

int c=0;

int d=1;

int main()

{

printf( "&a=%p\n", &a );

printf( "&b=%p\n", &b );

printf( "&c=%p\n", &c );

printf( "&d=%p\n", &d );

return 0;

}

File lds/a.lds:

a = 10; /* global scope, outside SECTIONS */

SECTIONS

{

b = 11;

.text :

{

*(.text)

c = .; /* inside an output section description */

. = 10000;

d = .;

}

_bdata = (. + 3) & ~3; /* inside SECTIONS, outside any output section */

.data : { *(.data) }

}

First, compile without a.lds:

gcc -Wall -o a-without-lds.exe ./src/a.c

Run ./a-without-lds.exe

Result:

&a=0x601020

&b=0x601038

&c=0x60103c

&d=0x601024

Then compile with a.lds:

gcc -Wall -o a-with-lds.exe ./src/a.c ./lds/a.lds

Run ./a-with-lds.exe

Result:

&a=0xa

&b=0xb

&c=0x400638

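&d=0x402b20

The printed addresses show that the assignments in a.lds take precedence over the definitions in a.c. a and b are assigned outside any output section description, so their values are absolute (0xa and 0xb). c and d are assigned inside .text, so they are offsets from the start of .text: c marks the end of the .text input sections, and d is the start of .text plus 10000 (0x2710), which implies that .text starts at 0x400410 in this run.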

 

11.3、 Expression operators

The operators in lds expressions are the standard C operators, with the following precedence (1 is the highest):

priority    associativity    operators
1           left             !  -  ~   (1)
2           left             *  /  %
3           left             +  -
4           left             >>  <<
5           left             ==  !=  >  <  <=  >=
6           left             &
7           left             |
8           left             &&
9           left             ||
10          right            ? :
11          right            &=  +=  -=  *=  /=   (2)

(1) Prefix (unary) operators. (2) Assignment operators.

 

11.4、 Evaluating expressions

The linker evaluates most expressions lazily, only when their values are actually needed.

However, expressions that the linking process itself depends on are evaluated immediately, and the linker reports an error if they cannot be computed. For example, expressions that give a section's VMA, or the origin and length of a memory region, must be evaluated immediately.

Example ,

SECTIONS

{

.text 9+this_isnt_constant :

{ *(.text) }

}

In this example, the value of 9+this_isnt_constant sets the VMA of .text, so it must be evaluated immediately; but because the value of this_isnt_constant is not yet known at that point, the linker cannot compute the expression and reports an error.

 

11.5、 Relative and absolute values

For expressions inside an output section description, the linker uses relative values, i.e. offsets from the start of the section.

For expressions inside the SECTIONS command but outside any output section description, the linker uses absolute values.

The ABSOLUTE keyword converts a relative value to an absolute one, i.e. it adds the section's VMA to the original relative value.

Example

SECTIONS

{

.data : { *(.data) _edata = ABSOLUTE(.); }

}

In this example, _edata is the end address of the .data section, as an absolute address in the program address space.
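Symbols of this kind are visible from C as ordinary extern declarations. On GNU/Linux, the symbols etext, edata and end documented in end(3) are defined by the default linker script in the same way as _edata above; the sketch assumes such a toolchain, and `print_segment_ends` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Documented in end(3); only their addresses are meaningful. */
extern char etext, edata, end;

void print_segment_ends(void)
{
    printf("etext=%p edata=%p end=%p\n",
           (void *)&etext, (void *)&edata, (void *)&end);
    /* Text ends before initialized data, which ends no later than bss. */
    assert((uintptr_t)&etext < (uintptr_t)&edata);
    assert((uintptr_t)&edata <= (uintptr_t)&end);
}
```

Only the address of each symbol matters; reading the char stored there would be meaningless, which is why the declarations use char just to give them a type.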

 

11.6、 Built-in functions

lds provides the following built-in functions:

ABSOLUTE(EXP) : Returns the absolute value of EXP.

ADDR(SECTION) : Returns the VMA of SECTION.

ALIGN(EXP) : Returns the location counter '.' aligned to the next EXP boundary, computed as (. + EXP - 1) & ~(EXP - 1). This formula assumes EXP is a power of two.

BLOCK(EXP) : A synonym for ALIGN(EXP), kept for compatibility with older linker scripts.

DEFINED(SYMBOL) : Returns 1 if SYMBOL is in the linker's global symbol table and is defined before the statement that uses DEFINED; otherwise returns 0.

Example:

SECTIONS { …

.text : {

begin = DEFINED(begin) ? begin : . ;

}

}

LOADADDR(SECTION) : Returns the LMA of SECTION.

MAX(EXP1,EXP2) : Returns the larger of the two values.

MIN(EXP1,EXP2) : Returns the smaller of the two values.

NEXT(EXP) : Returns the next unallocated address that is a multiple of EXP. It is similar to ALIGN(EXP); unless the MEMORY command defines discontinuous memory, NEXT(EXP) and ALIGN(EXP) are equivalent.

SIZEOF(SECTION) : Returns the size of SECTION in bytes. If SECTION has not yet been allocated when this is evaluated, the linker reports an error.

SIZEOF_HEADERS : Returns the size in bytes of the output file's headers, which appear at the beginning of the output file. If you wish, you can use this number as the start address of the first section to facilitate paging. When producing an ELF output file, if the linker script uses SIZEOF_HEADERS, the linker must compute the number of program headers before it has determined all section addresses and sizes. If it later discovers that it needs additional program headers, it reports the error "not enough room for program headers". To avoid this error, either avoid using SIZEOF_HEADERS, rearrange the linker script so that the linker does not need additional program headers, or define the program headers yourself with the PHDRS command.
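The ALIGN formula from the list above translates directly into C. `ld_align` is a hypothetical name; the assertion makes the power-of-two assumption explicit, since the mask trick gives wrong results otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* (. + EXP - 1) & ~(EXP - 1): round dot up to a multiple of align.
 * Only valid when align is a non-zero power of two. */
uint64_t ld_align(uint64_t dot, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (dot + align - 1) & ~(align - 1);
}
```

An already aligned value is returned unchanged, and the _bdata assignment in Example 11.2_3, (. + 3) & ~3, is the same computation as ALIGN(4).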

 

Twelve 、 Implicit linker scripts

An input file can be an object file or a linker script; a linker script given as an input file is called an implicit linker script.

If the linker cannot recognize an input file as an object file or archive, it tries to parse the file as a linker script. If the file cannot be parsed as a linker script either, the linker reports an error.

An implicit linker script does not replace the default linker script; it only adds to it.

Typically, an implicit linker script contains only symbol assignments, or the INPUT, GROUP and VERSION commands.

The input files on the link command line are processed in order. An implicit linker script occupies a position on the command line, and that position determines where the input files it names are read during the link.

A typical implicit linker script is the libc.so file, which on GNU/Linux usually lives under /usr/lib or an architecture-specific library directory.
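The first step of that fallback can be sketched by checking for the 4-byte ELF magic number (0x7f 'E' 'L' 'F'). This is a rough, hypothetical illustration: `looks_like_elf` and `file_looks_like_elf` are invented names, and the real linker also recognizes archives and the other object formats supported by its BFD library before falling back to script parsing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the buffer starts with the ELF magic number. */
bool looks_like_elf(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    return n >= 4 && memcmp(buf, "\177ELF", 4) == 0;
}

/* Read the first bytes of path and apply looks_like_elf; an unreadable
 * file is reported as not ELF. */
bool file_looks_like_elf(const char *path)
{
    unsigned char buf[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return false;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return looks_like_elf(buf, n);
}
```

Applied to the libc.so text file mentioned above, this check typically returns false, while the real shared library it refers to starts with the ELF magic.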