From fb6e98a90cb253df6e0841f27ee83af2df446993 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Mon, 26 Feb 2024 17:44:34 +0800 Subject: [PATCH 01/22] Add program property for CFI extension Define two bits for landing pad and shadow stack, and we plan to define a third bit `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_COMPLEX` for the complex labeling scheme. --- riscv-elf.adoc | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 7202a7d5..a7d6e6fa 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1442,6 +1442,43 @@ that a linker or runtime loader needs to check for compatibility. The linker should ignore and discard unknown bits in program properties, and issue warnings or errors. +[[rv-prog-prop-type]] +.RISC-V-specific program property types +[cols="3,2,2,3"] +[width=80%] +|=== +| Name | Value | Size | Meaning + +| GNU_PROPERTY_RISCV_FEATURE_1_AND | 0xc0000000 | 4-bytes | RISC-V processor-specific features used in program. +|=== + +==== GNU_PROPERTY_RISCV_FEATURE_1_AND + +`GNU_PROPERTY_RISCV_FEATURE_1_AND` describe a set of features, each bit describe +a different features. + +[%autowidth] +|=== +| Bit | Bit Name +| 0 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE +| 1 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS +|=== + +`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE` This bit indicate that all executable +sections are built to be compatible with the landing pad mechanism provided by +the `Zicfilp` extension. An executable or shared library with this bit set is +required to generate PLTs with the landing pad (`lpad`) instruction, and all +label are set to `1`. + +`GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS`: This bit indicate that all executable +sections are built to be compatible with the shadow stack mechanism provided by +the `Zicfiss` extension. Loading an executable or shared library with this bit +set requires the execution environment to provide either the `Zicfiss` extension +or the `Zimop` extension.
When the executable or shared library is compiled with +compressed instructions then loading an executable with this bit set requires +the execution environment to provide the `Zicfiss` extension or to provide both +the `Zcmop` and `Zimop` extensions. + === Mapping Symbol The section can have a mixture of code and data or code with different ISAs. From 34d500e5c4d65ea5e8d68a516b3ee01424dbfcf1 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Mon, 26 Feb 2024 17:56:08 +0800 Subject: [PATCH 02/22] Tweak table layout --- riscv-elf.adoc | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index a7d6e6fa..bc5cd192 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1442,14 +1442,27 @@ that a linker or runtime loader needs to check for compatibility. The linker should ignore and discard unknown bits in program properties, and issue warnings or errors. +<> provides details of the RISC-V ELF program property; the +meaning of each column is given below: + + +Name:: The name of the program property type, omitting the prefix of `GNU_PROPERTY_RISCV_`. + +Value:: The type value for the program property type. + +Size:: The data type size hold within this program property type. + +Description:: Additional information about the program property type. + + [[rv-prog-prop-type]] .RISC-V-specific program property types -[cols="3,2,2,3"] -[width=80%] +[cols="3,3,2,5"] +[width=100%] |=== -| Name | Value | Size | Meaning +| Name | Value | Size | Description -| GNU_PROPERTY_RISCV_FEATURE_1_AND | 0xc0000000 | 4-bytes | RISC-V processor-specific features used in program. +| FEATURE_1_AND | 0xc0000000 | 4-bytes | RISC-V processor-specific features used in program. 
|=== ==== GNU_PROPERTY_RISCV_FEATURE_1_AND From 66073198412bacad3d3f4b19ebd4f6fe9308c7ec Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Mon, 26 Feb 2024 20:42:30 +0800 Subject: [PATCH 03/22] Update simple labeling scheme to use 0 --- riscv-elf.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index bc5cd192..936efc2a 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1481,7 +1481,7 @@ a different features. sections are built to be compatible with the landing pad mechanism provided by the `Zicfilp` extension. An executable or shared library with this bit set is required to generate PLTs with the landing pad (`lpad`) instruction, and all -label are set to `1`. +label are set to `0`. `GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS`: This bit indicate that all executable sections are built to be compatible with the shadow stack mechanism provided by From c9978bdf810913256404eda06ddf3193c40edf64 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Mon, 26 Feb 2024 21:49:08 +0800 Subject: [PATCH 04/22] Add simple landing pad PLT --- riscv-elf.adoc | 57 ++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 51 insertions(+), 6 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 936efc2a..93cdcc18 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -724,6 +724,19 @@ The PLT (Procedure Linkage Table) exists to allow function calls between dynamically linked shared objects. Each dynamic object has its own GOT (Global Offset Table) and PLT (Procedure Linkage Table). +RISC-V has defined several PLT styles, which used for different situation, +the default PLT sytle should be used if the program is not met the condition for +using all other PLT sytle. + +[[plt-style]] +.PLT styles +[cols="1,2"] +[width=70%] +|=== +| Default PLT | - +| Simple landing pad PLT | Must use this PLT style when `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE` is set. 
+|=== + The first entry of a shared object PLT is a special entry that calls `_dl_runtime_resolve` to resolve the GOT offset for the called function. The `_dl_runtime_resolve` function in the dynamic loader resolves the @@ -731,8 +744,9 @@ GOT offsets lazily on the first call to any function, except when `LD_BIND_NOW` is set in which case the GOT entries are populated by the dynamic linker before the executable is started. Lazy resolution of GOT entries is intended to speed up program loading by deferring symbol -resolution to the first time the function is called. The first entry -in the PLT occupies two 16 byte entries: +resolution to the first time the function is called. + +The first entry in the PLT occupies two 16 byte entries for the default PLT style: [,asm] ---- @@ -746,11 +760,28 @@ in the PLT occupies two 16 byte entries: jr t3 ---- -Subsequent function entry stubs in the PLT take up 16 bytes and load a -function pointer from the GOT. On the first call to a function, the -entry redirects to the first PLT entry which calls `_dl_runtime_resolve` -and fills in the GOT entry for subsequent calls to the function: +And occupies three 16 byte entries for the simple landing pad PLT style: +[,asm] +---- +1: lpad 0 + auipc t2, %pcrel_hi(.got.plt) + sub t1, t1, t3 # shifted .got.plt offset + hdr size + 12 + l[w|d] t3, %pcrel_lo(1b)(t2) # _dl_runtime_resolve + addi t1, t1, -(hdr size + 12) # shifted .got.plt offset + addi t0, t2, %pcrel_lo(1b) # &.got.plt + srli t1, t1, log2(16/PTRSIZE) # .got.plt offset + l[w|d] t0, PTRSIZE(t0) # link map + jr t3 + nop + nop +---- + +Subsequent function entry stubs in the PLT take up 16 bytes. +On the first call to a function, the entry redirects to the first PLT entry +which calls `_dl_runtime_resolve` and fills in the GOT entry for subsequent +calls to the function. 
+The code sequences of the PLT entry for the default PLT style: [,asm] ---- 1: auipc t3, %pcrel_hi(function@.got.plt) @@ -759,6 +790,15 @@ and fills in the GOT entry for subsequent calls to the function: nop ---- +The code sequences of the PLT entry for the the simple landing pad PLT style: +[,asm] +---- +1: lpad 0 + auipc t3, %pcrel_hi(function@.got.plt) + l[w|d] t3, %pcrel_lo(1b)(t3) + jalr t1, t3 +---- + ==== Procedure Calls `R_RISCV_CALL` and `R_RISCV_CALL_PLT` relocations are associated with @@ -1204,12 +1244,17 @@ The defined processor-specific dynamic array tags are listed in <>. | Name | Value | d_un | Executable | Shared Object | DT_RISCV_VARIANT_CC | 0x70000001 | d_val | Platform specific | Platform specific +| DT_RISCV_SIMPLE_LP_PLT | 0x70000003 | d_val | Platform specific | Platform specific |=== An object must have the dynamic tag `DT_RISCV_VARIANT_CC` if it has one or more `R_RISCV_JUMP_SLOT` relocations against symbols with the `STO_RISCV_VARIANT_CC` attribute. +`DT_RISCV_SIMPLE_LP_PLT` indicate PLTs enabled landing pad with simple labeling +scheme, an object must have the dynamic tag `DT_RISCV_SIMPLE_LP_PLT` if +`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE` has set in the output. + `DT_INIT` and `DT_FINI` are not required to be supported and should be avoided in favour of `DT_PREINIT_ARRAY`, `DT_INIT_ARRAY` and `DT_FINI_ARRAY`. From 8370ecd411abc047868394d166dc42cc2e172ad5 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Fri, 12 Apr 2024 16:12:09 +0800 Subject: [PATCH 05/22] Drop DT_RISCV_SIMPLE_LP_PLT --- riscv-elf.adoc | 5 ----- 1 file changed, 5 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 93cdcc18..dfa409f7 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1244,17 +1244,12 @@ The defined processor-specific dynamic array tags are listed in <>. 
| Name | Value | d_un | Executable | Shared Object | DT_RISCV_VARIANT_CC | 0x70000001 | d_val | Platform specific | Platform specific -| DT_RISCV_SIMPLE_LP_PLT | 0x70000003 | d_val | Platform specific | Platform specific |=== An object must have the dynamic tag `DT_RISCV_VARIANT_CC` if it has one or more `R_RISCV_JUMP_SLOT` relocations against symbols with the `STO_RISCV_VARIANT_CC` attribute. -`DT_RISCV_SIMPLE_LP_PLT` indicate PLTs enabled landing pad with simple labeling -scheme, an object must have the dynamic tag `DT_RISCV_SIMPLE_LP_PLT` if -`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE` has set in the output. - `DT_INIT` and `DT_FINI` are not required to be supported and should be avoided in favour of `DT_PREINIT_ARRAY`, `DT_INIT_ARRAY` and `DT_FINI_ARRAY`. From 3abd293ec53ccbe52d1694d4ae1abd7848540deb Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Fri, 12 Apr 2024 16:13:29 +0800 Subject: [PATCH 06/22] Add one more nop for alignment --- riscv-elf.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index dfa409f7..18607812 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -774,6 +774,7 @@ And occupies three 16 byte entries for the simple landing pad PLT style: jr t3 nop nop + nop ---- Subsequent function entry stubs in the PLT take up 16 bytes. From ecf913c077776cb4b7461f1471f3c864fd888af6 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Wed, 17 Jul 2024 15:59:00 +0800 Subject: [PATCH 07/22] Minor revision Changes: - Rename `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE` to `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED` - Fix wrong offset in the first PLT stubs for the simple landing pad PLT. --- riscv-elf.adoc | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 18607812..55c93458 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -733,8 +733,8 @@ using all other PLT sytle. 
[cols="1,2"] [width=70%] |=== -| Default PLT | - -| Simple landing pad PLT | Must use this PLT style when `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE` is set. +| Default PLT | - +| Unlabeled landing pad PLT | Must use this PLT style when `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED` is set. |=== The first entry of a shared object PLT is a special entry that calls @@ -765,9 +765,9 @@ And occupies three 16 byte entries for the simple landing pad PLT style: ---- 1: lpad 0 auipc t2, %pcrel_hi(.got.plt) - sub t1, t1, t3 # shifted .got.plt offset + hdr size + 12 + sub t1, t1, t3 # shifted .got.plt offset + hdr size + 16 l[w|d] t3, %pcrel_lo(1b)(t2) # _dl_runtime_resolve - addi t1, t1, -(hdr size + 12) # shifted .got.plt offset + addi t1, t1, -(hdr size + 16) # shifted .got.plt offset addi t0, t2, %pcrel_lo(1b) # &.got.plt srli t1, t1, log2(16/PTRSIZE) # .got.plt offset l[w|d] t0, PTRSIZE(t0) # link map @@ -1508,21 +1508,23 @@ Description:: Additional information about the program property type. ==== GNU_PROPERTY_RISCV_FEATURE_1_AND -`GNU_PROPERTY_RISCV_FEATURE_1_AND` describe a set of features, each bit describe -a different features. + +`GNU_PROPERTY_RISCV_FEATURE_1_AND` describes a set of features, where each bit +represents a different feature. The linker should perform a bitwise AND +operation when merging different objects. [%autowidth] |=== | Bit | Bit Name -| 0 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE +| 0 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED | 1 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS |=== -`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE` This bit indicate that all executable -sections are built to be compatible with the landing pad mechanism provided by -the `Zicfilp` extension. An executable or shared library with this bit set is -required to generate PLTs with the landing pad (`lpad`) instruction, and all -label are set to `0`. 
+`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED` This bit indicate that all +executable sections are built to be compatible with the landing pad mechanism +provided by the `Zicfilp` extension. An executable or shared library with this +bit set is required to generate PLTs with the landing pad (`lpad`) instruction, +and all label are set to `0`. `GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS`: This bit indicate that all executable sections are built to be compatible with the shadow stack mechanism provided by From bf242f28f2de52d1aaa22e40d2e7821b1a95527b Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Fri, 14 Mar 2025 15:44:04 +0800 Subject: [PATCH 08/22] Apply mylai-mtk's comment --- riscv-elf.adoc | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 55c93458..4acbb3db 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -724,9 +724,9 @@ The PLT (Procedure Linkage Table) exists to allow function calls between dynamically linked shared objects. Each dynamic object has its own GOT (Global Offset Table) and PLT (Procedure Linkage Table). -RISC-V has defined several PLT styles, which used for different situation, -the default PLT sytle should be used if the program is not met the condition for -using all other PLT sytle. +RISC-V defines several PLT styles, which are used in different situations. +The default PLT style should be used if the program does not meet the conditions +for using all other PLT sytles. 
[[plt-style]] .PLT styles @@ -760,11 +760,11 @@ The first entry in the PLT occupies two 16 byte entries for the default PLT styl jr t3 ---- -And occupies three 16 byte entries for the simple landing pad PLT style: +And occupies three 16 byte entries for the unlabeled landing pad PLT style: [,asm] ---- -1: lpad 0 - auipc t2, %pcrel_hi(.got.plt) + lpad 0 +1: auipc t2, %pcrel_hi(.got.plt) sub t1, t1, t3 # shifted .got.plt offset + hdr size + 16 l[w|d] t3, %pcrel_lo(1b)(t2) # _dl_runtime_resolve addi t1, t1, -(hdr size + 16) # shifted .got.plt offset @@ -791,11 +791,11 @@ The code sequences of the PLT entry for the default PLT style: nop ---- -The code sequences of the PLT entry for the the simple landing pad PLT style: +The code sequences of the PLT entry for the unlabeled landing pad PLT style: [,asm] ---- -1: lpad 0 - auipc t3, %pcrel_hi(function@.got.plt) + lpad 0 +1: auipc t3, %pcrel_hi(function@.got.plt) l[w|d] t3, %pcrel_lo(1b)(t3) jalr t1, t3 ---- @@ -1489,9 +1489,10 @@ meaning of each column is given below: Name:: The name of the program property type, omitting the prefix of `GNU_PROPERTY_RISCV_`. -Value:: The type value for the program property type. +Value:: The `pr_type` value for the program property type. -Size:: The data type size hold within this program property type. +Size:: The size (`pr_datasz`) of data type held within this program property + type. Description:: Additional information about the program property type. @@ -1508,7 +1509,6 @@ Description:: Additional information about the program property type. ==== GNU_PROPERTY_RISCV_FEATURE_1_AND - `GNU_PROPERTY_RISCV_FEATURE_1_AND` describes a set of features, where each bit represents a different feature. The linker should perform a bitwise AND operation when merging different objects. @@ -1520,20 +1520,20 @@ operation when merging different objects. 
| 1 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS |=== -`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED` This bit indicate that all +`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED`: This bit indicates that all executable sections are built to be compatible with the landing pad mechanism -provided by the `Zicfilp` extension. An executable or shared library with this -bit set is required to generate PLTs with the landing pad (`lpad`) instruction, -and all label are set to `0`. +provided by the Zicfilp extension in the unlabeled scheme: Executables and +shared libraries with this bit set are required to generate PLTs in the +unlabeled landing pad PLT style, and all of the labels of lpad instructions are +set to 0, i.e. unlabeled. -`GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS`: This bit indicate that all executable +`GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS`: This bit indicates that all executable sections are built to be compatible with the shadow stack mechanism provided by the `Zicfiss` extension. Loading an executable or shared library with this bit set requires the execution environment to provide either the `Zicfiss` extension or the `Zimop` extension. When the executable or shared library is compiled with -compressed instructions then loading an executable with this bit set requires -the execution environment to provide the `Zicfiss` extension or to provide both -the `Zcmop` and `Zimop` extensions. +compressed instructions then loading it with this bit set requires the execution +environment to provide the `Zicfiss` extension or the `Zimop` extensions. === Mapping Symbol From 8789f2a705bba1c45ed25793a019d9c87f0ee008 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Mon, 26 Feb 2024 21:01:12 +0800 Subject: [PATCH 09/22] Add complex labeling scheme for landing pad Function signature based labeling scheme, following the "Function types" mangling rule defined in the Itanium C++ ABI.
With a few specific rules: - The `main` function uses the signature `(int, pointer to pointer to char) returning int` (`FiiPPcE`). - `_dl_runtime_resolve` uses zero for the landing pad. - {Cpp} member functions should use the "Pointer-to-member types" mangling rule defined in the _Itanium {Cpp} ABI_ <>. - Virtual functions in {Cpp} should use the member function type of the base class that first defined the virtual function. - If a virtual function is inherited from more than one base class, it should use the type of the first base class. Thunk functions will use the type of the corresponding base class. Co-authored-by: Ming-Yi Lai --- riscv-elf.adoc | 117 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 115 insertions(+), 2 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 4acbb3db..e0630cbc 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -733,8 +733,9 @@ for using all other PLT sytles. [cols="1,2"] [width=70%] |=== -| Default PLT | - +| Default PLT | - | Unlabeled landing pad PLT | Must use this PLT style when `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED` is set. +| Complex landing pad PLT | Must use this PLT style when `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_COMPLEX` is set. |=== The first entry of a shared object PLT is a special entry that calls @@ -746,6 +747,9 @@ dynamic linker before the executable is started. Lazy resolution of GOT entries is intended to speed up program loading by deferring symbol resolution to the first time the function is called. +The PLT entry is 16 bytes for the default PLT style and the simple landing pad +PLT style, and 32 bytes for the complex landing pad PLT style. + The first entry in the PLT occupies two 16 byte entries for the default PLT style: [,asm] @@ -777,7 +781,41 @@ And occupies three 16 byte entries for the unlabeled landing pad PLT style: nop ---- -Subsequent function entry stubs in the PLT take up 16 bytes.
+The complex landing pad PLT style occupies two 32 byte entries: + +[,asm] +---- +1: lpad 0 + sub t1, t1, t3 # shifted .got.plt offset + hdr size + 24 + auipc t2, %pcrel_hi(.got.plt) + addi t0, t2, %pcrel_lo(1b) # &.got.plt + l[w|d] t3, %pcrel_lo(1b)(t2) # _dl_runtime_resolve + addi t1, t1, -(hdr size + 24) # shifted .got.plt offset + srli t1, t1, log2(32/PTRSIZE) # .got.plt offset + l[w|d] t0, PTRSIZE(t0) # link map + jr t3 + nop + nop +---- + + +[,asm] +---- +1: lpad 0 + auipc t2, %pcrel_hi(.got.plt) + sub t1, t1, t3 # shifted .got.plt offset + hdr size + 24 + l[w|d] t3, %pcrel_lo(1b)(t2) # _dl_runtime_resolve + addi t1, t1, -(hdr size + 24) # shifted .got.plt offset + addi t0, t2, %pcrel_lo(1b) # &.got.plt + srli t1, t1, log2(32/PTRSIZE) # .got.plt offset + l[w|d] t0, PTRSIZE(t0) # link map + jr t3 + nop + nop +---- + +Subsequent function entry stubs in the PLT take up 16 bytes or 32 bytes depends +on the style. On the first call to a function, the entry redirects to the first PLT entry which calls `_dl_runtime_resolve` and fills in the GOT entry for subsequent calls to the function. @@ -800,6 +838,19 @@ The code sequences of the PLT entry for the unlabeled landing pad PLT style: jalr t1, t3 ---- +The code sequences of the PLT entry for the the complex landing pad PLT style: +[,asm] +---- +1: lpad + auipc t3, %pcrel_hi(function@.got.plt) + l[w|d] t3, %pcrel_lo(1b)(t3) + lui t2, + jalr t1, t3 + nop + nop + nop +---- + ==== Procedure Calls `R_RISCV_CALL` and `R_RISCV_CALL_PLT` relocations are associated with @@ -1518,6 +1569,7 @@ operation when merging different objects. | Bit | Bit Name | 0 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED | 1 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS +| 2 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_COMPLEX |=== `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED`: This bit indicates that all @@ -1535,6 +1587,12 @@ or the `Zimop` extension. 
When the executable or shared library is compiled with compressed instructions then loading it with this bit set requires the execution environment to provide the `Zicfiss` extension or the `Zimop` extensions. +`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_COMPLEX` This bit indicate that all executable +sections are built to be compatible with the landing pad mechanism provided by +the `Zicfilp` extension. An executable or shared library with this bit set is +required to generate PLTs with the landing pad (`lpad`) instruction, and all +label are set to a value which hashed from its function signature. + === Mapping Symbol The section can have a mixture of code and data or code with different ISAs. @@ -1579,6 +1637,61 @@ is not enough for the disassembler to disassemble the `rv64gcv` version correctly. Specifying ISA string appropriately with the two memcpy instruction mapping symbols helps the disassembler to disassemble instructions correctly. +== Label Value Compuatation for Complex Labeling Scheme Landing Pad + +The label value for the complex labeling scheme landing pad is computed from the +hash of the function signature string, which uses the same scheme as the +"Function types" mangling rule defined in the _Itanium {Cpp} ABI_ +<>, the value is taken from the lower 20 bits of the MD5 +hash result of the function signature string. + +Additionally, here are a few specific rules for {Cpp} member functions: + +- {Cpp} member functions should use the "Pointer-to-member types" mangling rule + defined in the _Itanium {Cpp} ABI_ <>. +- Virtual functions in {Cpp} should use the member function type of the base + class that first defined the virtual function. 
+ + +Example: + +[,cxx] +---- + +double foo(int, float *); + +class Base +{ +public: + virtual void memfunc1(); + virtual void memfunc2(int); +}; + +class Derived : public Base +{ +public: + virtual void memfunc1(); + virtual void memfunc3(double); + void memfunc4(); +}; + +class DerivedDerived : public Derived +{ +public: + virtual void memfunc2(int); + virtual void memfunc3(double); +}; + +---- + +The function signatures for the above functions are described below: + +- `foo` is encoded as `FdiPfE`. +- `Base::memfunc1` and `Derived::memfunc1` are both encoded as `M4BaseFvvE`. +- `Base::memfunc2` and `DerivedDerived::memfunc2` are both encoded as `M4BaseFviE`. +- `Derived::memfunc3` and `DerivedDerived::memfunc3` are both encoded as `M7DerivedFvdE`. +- `Derived::memfunc4` is encoded as `M7DerivedFvvE`. + == Linker Relaxation At link time, when all the memory objects have been resolved, the code sequence From 196b64acec58b28d5ac0190003f155d1c71e915e Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Fri, 10 May 2024 17:47:17 +0800 Subject: [PATCH 10/22] Several updates for the function signature based labeling scheme Changes: - Rename complex labeling scheme to function signature based labeling scheme - Fix the PLT stubs - Add labeling rule for `main` and `_dl_runtime_resolve`. - Clarify the rule for virtual functions inherited from more than one base class. --- riscv-elf.adoc | 67 +++++++++++++++++++++++++------------------------- 1 file changed, 34 insertions(+), 33 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index e0630cbc..94233707 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -735,7 +735,7 @@ for using all other PLT sytles. |=== | Default PLT | - | Unlabeled landing pad PLT | Must use this PLT style when `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED` is set.
+| Function signature based landing pad PLT | Must use this PLT style when `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_FUNC_SIG` is set. |=== The first entry of a shared object PLT is a special entry that calls @@ -748,7 +748,7 @@ entries is intended to speed up program loading by deferring symbol resolution to the first time the function is called. The PLT entry is 16 bytes for the default PLT style and the simple landing pad -PLT style, and 32 bytes for the complex landing pad PLT style. +PLT style, and 32 bytes for the function signature based landing pad PLT style. The first entry in the PLT occupies two 16 byte entries for the default PLT style: @@ -781,36 +781,21 @@ And occupies three 16 byte entries for the unlabeled landing pad PLT style: nop ---- -The complex landing pad PLT style occupies two 32 byte entries: +The function signature based landing pad PLT style occupies two 32 byte entries: [,asm] ---- 1: lpad 0 sub t1, t1, t3 # shifted .got.plt offset + hdr size + 24 - auipc t2, %pcrel_hi(.got.plt) - addi t0, t2, %pcrel_lo(1b) # &.got.plt - l[w|d] t3, %pcrel_lo(1b)(t2) # _dl_runtime_resolve + auipc t3, %pcrel_hi(.got.plt) + addi t0, t3, %pcrel_lo(1b) # &.got.plt + l[w|d] t3, %pcrel_lo(1b)(t3) # _dl_runtime_resolve addi t1, t1, -(hdr size + 24) # shifted .got.plt offset srli t1, t1, log2(32/PTRSIZE) # .got.plt offset l[w|d] t0, PTRSIZE(t0) # link map jr t3 nop nop ----- - - -[,asm] ----- -1: lpad 0 - auipc t2, %pcrel_hi(.got.plt) - sub t1, t1, t3 # shifted .got.plt offset + hdr size + 24 - l[w|d] t3, %pcrel_lo(1b)(t2) # _dl_runtime_resolve - addi t1, t1, -(hdr size + 24) # shifted .got.plt offset - addi t0, t2, %pcrel_lo(1b) # &.got.plt - srli t1, t1, log2(32/PTRSIZE) # .got.plt offset - l[w|d] t0, PTRSIZE(t0) # link map - jr t3 - nop nop ---- @@ -838,7 +823,7 @@ The code sequences of the PLT entry for the unlabeled landing pad PLT style: jalr t1, t3 ---- -The code sequences of the PLT entry for the the complex landing pad PLT style: +The code sequences of the 
PLT entry for the the function signature based landing pad PLT style: [,asm] ---- 1: lpad @@ -1569,7 +1554,7 @@ operation when merging different objects. | Bit | Bit Name | 0 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED | 1 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_SS -| 2 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_COMPLEX +| 2 | GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_FUNC_SIG |=== `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED`: This bit indicates that all @@ -1587,7 +1572,7 @@ or the `Zimop` extension. When the executable or shared library is compiled with compressed instructions then loading it with this bit set requires the execution environment to provide the `Zicfiss` extension or the `Zimop` extensions. -`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_COMPLEX` This bit indicate that all executable +`GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_FUNC_SIG` This bit indicate that all executable sections are built to be compatible with the landing pad mechanism provided by the `Zicfilp` extension. An executable or shared library with this bit set is required to generate PLTs with the landing pad (`lpad`) instruction, and all @@ -1637,21 +1622,27 @@ is not enough for the disassembler to disassemble the `rv64gcv` version correctly. Specifying ISA string appropriately with the two memcpy instruction mapping symbols helps the disassembler to disassemble instructions correctly. -== Label Value Compuatation for Complex Labeling Scheme Landing Pad +== Label Value Compuatation for Function Signature based Scheme Landing Pad -The label value for the complex labeling scheme landing pad is computed from the +The label value for the function signature based labeling scheme landing pad is computed from the hash of the function signature string, which uses the same scheme as the "Function types" mangling rule defined in the _Itanium {Cpp} ABI_ -<>, the value is taken from the lower 20 bits of the MD5 -hash result of the function signature string. 
+<>, and the function signature will use the "Compression" rule +defined in _Itanium {Cpp} ABI_, the value is taken from the lower 20 bits of +the MD5 hash result of the function signature string. -Additionally, here are a few specific rules for {Cpp} member functions: +Additionally, here are a few specific rules: +- `main` funciton is using signature of + `(int, pointer to pointer to char) returning int` (`FiiPPcE`). +- `_dl_runtime_resolve` use zero for the landing pad. - {Cpp} member functions should use the "Pointer-to-member types" mangling rule defined in the _Itanium {Cpp} ABI_ <>. - Virtual functions in {Cpp} should use the member function type of the base class that first defined the virtual function. - +- If a virtual function is inherited from more than one base class, it should + use the type of the first base class. Thunk functions will use the type of + the corresponding base class. Example: @@ -1675,7 +1666,13 @@ public: void memfunc4(); }; -class DerivedDerived : public Derived +class OtherBase +{ +public: + virtual void memfunc2(int); +} + +class DerivedDerived : public Derived, OtherBase { public: virtual void memfunc2(int); @@ -1688,9 +1685,13 @@ The function signatures for the above functions are described below: - `foo` is encoded as `FdiPfE`. - `Base::memfunc1` and `Derived::memfunc1` are both encoded as `M4BaseFvvE`. -- `Base::memfunc2` and `DerivedDerived::memfunc2` are both encoded as `M4BaseFviE`. -- `Derived::memfunc3` and `DerivedDerived::memfunc3` are both encoded as `M7DerivedFvdE`. +- `Base::memfunc2` is encoded as `M4BaseFviE`. +- `OtherBase::memfunc2` is encoded as `M9OtherBaseFviE`. +- `Derived::memfunc3` and `DerivedDerived::memfunc3` are both encoded as + `M7DerivedFvdE`. - `Derived::memfunc4` is encoded as `M7DerivedFvvE`. +- `DerivedDerived::memfunc2` is encoded as `M4BaseFviE`, and the thunk function + for `OtherBase::memfunc2` will be `M9OtherBaseFviE`. 
== Linker Relaxation At link time, when all the memory objects have been resolved, the code sequence From a22679ab290457d06fb15452d859beb04d72f4bf Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Fri, 14 Jun 2024 15:58:38 +0800 Subject: [PATCH 11/22] Revise rule - Special rule for return type of member function. - Special rule for class destructors - should be ignored. - Static functions should follow the same rules as normal functions. - wchar_t is platform dependent. - Functions with an empty parameter list are treated as `void` (`v`). --- riscv-elf.adoc | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 94233707..df884a7a 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1635,14 +1635,33 @@ Additionally, here are a few specific rules: - `main` funciton is using signature of `(int, pointer to pointer to char) returning int` (`FiiPPcE`). -- `_dl_runtime_resolve` use zero for the landing pad. +- `_dl_runtime_resolve` uses zero for the landing pad. - {Cpp} member functions should use the "Pointer-to-member types" mangling rule - defined in the _Itanium {Cpp} ABI_ <>. -- Virtual functions in {Cpp} should use the member function type of the base - class that first defined the virtual function. -- If a virtual function is inherited from more than one base class, it should - use the type of the first base class. Thunk functions will use the type of - the corresponding base class. + defined in the Itanium C++ ABI <> with the following + additional rules: + - Virtual functions in {Cpp} should use the member function type of the base + class that first defined the virtual function. + - If a virtual function is inherited from more than one base class, it should + use the type of the first base class. Thunk functions will use the type of + the corresponding base class. + - The return type of a class member function should mangle to `void *` if it + is a pointer or reference to a non-primitive type.
A pointer to a pointer + of a non-primitive type is not included in this rule. + - Class destructors should use the signature `void (*)(void*)` (`FvPvE`). + - `` should be ignored. + - Static function is following the rule as normal function. +- `wchar_t` should match the type of the target platform. For example, on + Linux, it uses int, so it mangles to `i` rather than `w`. +- Function with an empty parameter list are treated as `void` (`v`). + + +NOTE: Class destructors generally should not be called via indirect call, but + they may be registered as program destructors via `__cxa_atexit`. + Therefore, they must match the signature of the argument of + `__cxa_atexit`, which is `void (*)(void*)`. + +NOTE: `` is ignored due to C++ standard backward compatibility, + as it was introduced after C++17. Example: From 6610a9627477a701e99047b08ce7c32ece7b36c6 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Fri, 14 Jun 2024 17:07:18 +0800 Subject: [PATCH 12/22] Minor tweak - Add note to mention covariant return types --- riscv-elf.adoc | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index df884a7a..42ec2047 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1651,9 +1651,11 @@ Additionally, here are a few specific rules: - `` should be ignored. - Static function is following the rule as normal function. - `wchar_t` should match the type of the target platform. For example, on - Linux, it uses int, so it mangles to `i` rather than `w`. + Linux, it uses `int`, so it mangles to `i` rather than `w` for {Cpp}. - Function with an empty parameter list are treated as `void` (`v`). +NOTE: The special rule for the return type of class member functions is defined + to handle covariant return types. NOTE: Class destructors generally should not be called via indirect call, but they may be registered as program destructors via `__cxa_atexit`. 
@@ -1661,7 +1663,7 @@ NOTE: Class destructors generally should not be called via indirect call, but `__cxa_atexit`, which is `void (*)(void*)`. NOTE: `` is ignored due to C++ standard backward compatibility, - as it was introduced after C++17. + as it was introduced after {Cpp}17. Example: From aa65c9b6dab09095a7d7f40b74713e7c801222a0 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Fri, 14 Jun 2024 18:04:31 +0800 Subject: [PATCH 13/22] Tweak return type rule --- riscv-elf.adoc | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 42ec2047..f977ab3f 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1639,23 +1639,20 @@ Additionally, here are a few specific rules: - {Cpp} member functions should use the "Pointer-to-member types" mangling rule defined in the Itanium C++ ABI <> with the following additional rules: - - Virtual functions in {Cpp} should use the member function type of the base - class that first defined the virtual function. - - If a virtual function is inherited from more than one base class, it should - use the type of the first base class. Thunk functions will use the type of - the corresponding base class. - - The return type of a class member function should mangle to `void *` if it - is a pointer or reference to a non-primitive type. A pointer to a pointer - of a non-primitive type is not included in this rule. + - Virtual functions should use `v` for `` rather than the actual + class name. + - The return type of a virtual class member function should mangle to + `void *` if it is a pointer or reference to a non-primitive type. A pointer + to a pointer of a non-primitive type is not included in this rule. - Class destructors should use the signature `void (*)(void*)` (`FvPvE`). - `` should be ignored. - - Static function is following the rule as normal function. + - Static function is following the rule as non-member function. 
- `wchar_t` should match the type of the target platform. For example, on Linux, it uses `int`, so it mangles to `i` rather than `w` for {Cpp}. - Function with an empty parameter list are treated as `void` (`v`). -NOTE: The special rule for the return type of class member functions is defined - to handle covariant return types. +NOTE: The special rule for the return type of virtual class member functions is + defined to handle covariant return types. NOTE: Class destructors generally should not be called via indirect call, but they may be registered as program destructors via `__cxa_atexit`. From 27f0ae2024d1f2bf8d1b2f4bbdc805d315381482 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Wed, 3 Jul 2024 20:28:11 +0800 Subject: [PATCH 14/22] Apply for all member function --- riscv-elf.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index f977ab3f..d2b76a2d 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1639,7 +1639,7 @@ Additionally, here are a few specific rules: - {Cpp} member functions should use the "Pointer-to-member types" mangling rule defined in the Itanium C++ ABI <> with the following additional rules: - - Virtual functions should use `v` for `` rather than the actual + - Member functions should use `v` for `` rather than the actual class name. - The return type of a virtual class member function should mangle to `void *` if it is a pointer or reference to a non-primitive type. 
A pointer From 65d46f111878f3e94ee9c8b67b68ea3e2ade1888 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Mon, 15 Jul 2024 17:28:53 +0800 Subject: [PATCH 15/22] Update according to mylai-mtk's comment --- riscv-elf.adoc | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index d2b76a2d..268f958c 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1636,20 +1636,22 @@ Additionally, here are a few specific rules: - `main` funciton is using signature of `(int, pointer to pointer to char) returning int` (`FiiPPcE`). - `_dl_runtime_resolve` uses zero for the landing pad. +- 'Y' component in the `` should be ignored. +- `` should be ignored. - {Cpp} member functions should use the "Pointer-to-member types" mangling rule defined in the Itanium C++ ABI <> with the following additional rules: - Member functions should use `v` for `` rather than the actual - class name. + class name, e.g. `M1v` rahter than `M3foo` for `class foo`. - The return type of a virtual class member function should mangle to `void *` if it is a pointer or reference to a non-primitive type. A pointer to a pointer of a non-primitive type is not included in this rule. - Class destructors should use the signature `void (*)(void*)` (`FvPvE`). - `` should be ignored. - Static function is following the rule as non-member function. - `wchar_t` should match the type of the target platform. For example, on Linux, it uses `int`, so it mangles to `i` rather than `w` for {Cpp}. -- Function with an empty parameter list are treated as `void` (`v`). +- Functions with an empty parameter list are treated as explicitly declaring + taking no parameters (having void as the parameter list) NOTE: The special rule for the return type of virtual class member functions is defined to handle covariant return types.
From 76e830319adea18d0dde26cfe6a21d356d6aec6c Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Mon, 15 Jul 2024 21:08:15 +0800 Subject: [PATCH 16/22] Update example and minor tweak for the rule --- riscv-elf.adoc | 48 ++++++++++++++++++++++++++++++++---------------- 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 268f958c..779dd5eb 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1644,8 +1644,9 @@ Additionally, here are a few specific rules: - Member functions should use `v` for `` rather than the actual class name, e.g. `M1v` rahter than `M3foo` for `class foo`. - The return type of a virtual class member function should mangle to - `void *` if it is a pointer or reference to a non-primitive type. A pointer - to a pointer of a non-primitive type is not included in this rule. + `void *`/`void &` if it is a pointer/reference to a non-primitive + type. A pointer to a pointer of a non-primitive type is not included in + this rule. - Class destructors should use the signature `void (*)(void*)` (`FvPvE`). - Static function is following the rule as non-member function. - `wchar_t` should match the type of the target platform. 
For example, on @@ -1676,27 +1677,37 @@ class Base public: virtual void memfunc1(); virtual void memfunc2(int); + virtual Base *memfunc3(int); }; class Derived : public Base { public: - virtual void memfunc1(); - virtual void memfunc3(double); - void memfunc4(); + virtual void memfunc1() override; + virtual Derived *memfunc3(int) override; + virtual void memfunc4(double); + void memfunc5(); }; class OtherBase { public: virtual void memfunc2(int); -} +}; + +class OtherClass; class DerivedDerived : public Derived, OtherBase { public: - virtual void memfunc2(int); - virtual void memfunc3(double); + virtual void memfunc2(int) override; + virtual DerivedDerived *memfunc3(int) override; + virtual void memfunc4(double) override; + DerivedDerived *memfunc6(); + OtherClass *memfunc7(float); + OtherClass &memfunc8(); + OtherClass memfunc9(float); + int *memfunc10(); }; ---- @@ -1704,14 +1715,19 @@ public: The function signatures for the above functions are described below: - `foo` is encoded as `FdiPfE`. -- `Base::memfunc1` and `Derived::memfunc1` are both encoded as `M4BaseFvvE`. -- `Base::memfunc2` is encoded as `M4BaseFviE`. -- `OtherBase::memfunc2` is encoded as `M9OtherBaseFviE`. -- `Derived::memfunc3` and `DerivedDerived::memfunc3` are both encoded as - `M7DerivedFvdE`. -- `Derived::memfunc4` is encoded as `M7DerivedFvvE`. -- `DerivedDerived::memfunc2` is encoded as `M4BaseFviE`, and the thunk function - for `OtherBase::memfunc2` will be `M9OtherBaseFviE`. +- `Base::memfunc1` and `Derived::memfunc1` are both encoded as `M1vFvvE`. +- `Base::memfunc2`, `OtherBase::memfunc2` `DerivedDerived::memfunc2` + is all encoded as `M1vFviE`. +- `Base::memfunc3`, `Derived::memfunc3`, `DerivedDerived::memfunc3` is encoded + as `M1vFPviE`. +- `Derived::memfunc4` and `DerivedDerived::memfunc4` are both encoded as + `M1vFvdE`. +- `Derived::memfunc5` is encoded as `M1vFvvE`. +- `DerivedDerived::memfunc6`, `DerivedDerived::memfunc7` are encoded as + `M1vFPviE`. 
+- `DerivedDerived::memfunc8` is encoded as `M1vFRvvE`. +- `DerivedDerived::memfunc9` is encoded as `M1vF10OtherClassvE`. +- `DerivedDerived::memfunc10` is encoded as `M1vFPivE`. == Linker Relaxation From 249d06c2f09e0c7d2c38f4afb9f9e38c88014d9a Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Tue, 16 Jul 2024 22:15:26 +0800 Subject: [PATCH 17/22] Minor word tweaking --- riscv-elf.adoc | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 779dd5eb..a89a37e1 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1633,26 +1633,28 @@ the MD5 hash result of the function signature string. Additionally, here are a few specific rules: -- `main` funciton is using signature of +- `main` function uses the signature of `(int, pointer to pointer to char) returning int` (`FiiPPcE`). - `_dl_runtime_resolve` uses zero for the landing pad. -- 'Y' component in the `` should be ignored. +- The 'Y' component in the `` should be ignored. - `` should be ignored. - {Cpp} member functions should use the "Pointer-to-member types" mangling rule defined in the Itanium C++ ABI <> with the following additional rules: - Member functions should use `v` for `` rather than the actual - class name, e.g. `M1v` rahter than `M3foo` for `class foo`. + class name. For example, use `1v` instead of `3foo` for the `` + in `class foo`. - The return type of a virtual class member function should mangle to - `void *`/`void &` if it is a pointer/reference to a non-primitive + `void *`/`void &`(`Pv`/`Rv`) if it is a pointer/reference to a non-primitive type. A pointer to a pointer of a non-primitive type is not included in this rule. - Class destructors should use the signature `void (*)(void*)` (`FvPvE`). - - Static function is following the rule as non-member function. + - Static functions should follow the rules of non-member functions. - `wchar_t` should match the type of the target platform. 
For example, on Linux, it uses `int`, so it mangles to `i` rather than `w` for {Cpp}. - Functions with an empty parameter list are treated as explicitly declaring - taking no parameters (having void as the parameter list) + that they take no parameters (having `void` as the parameter list). + NOTE: The special rule for the return type of virtual class member functions is defined to handle covariant return types. @@ -1716,14 +1718,14 @@ The function signatures for the above functions are described below: - `foo` is encoded as `FdiPfE`. - `Base::memfunc1` and `Derived::memfunc1` are both encoded as `M1vFvvE`. -- `Base::memfunc2`, `OtherBase::memfunc2` `DerivedDerived::memfunc2` +- `Base::memfunc2`, `OtherBase::memfunc2`, and `DerivedDerived::memfunc2` is all encoded as `M1vFviE`. -- `Base::memfunc3`, `Derived::memfunc3`, `DerivedDerived::memfunc3` is encoded - as `M1vFPviE`. +- `Base::memfunc3`, `Derived::memfunc3`, and `DerivedDerived::memfunc3` are + encoded as `M1vFPviE`. - `Derived::memfunc4` and `DerivedDerived::memfunc4` are both encoded as `M1vFvdE`. - `Derived::memfunc5` is encoded as `M1vFvvE`. -- `DerivedDerived::memfunc6`, `DerivedDerived::memfunc7` are encoded as +- `DerivedDerived::memfunc6` and `DerivedDerived::memfunc7` are encoded as `M1vFPviE`. - `DerivedDerived::memfunc8` is encoded as `M1vFRvvE`. - `DerivedDerived::memfunc9` is encoded as `M1vF10OtherClassvE`. From f08c7f6eca7b94ac31b99394b8b6fb890f6e7725 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Thu, 18 Jul 2024 16:25:42 +0800 Subject: [PATCH 18/22] Minor tweak for covariant return type --- riscv-elf.adoc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index a89a37e1..5d402f40 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1644,10 +1644,10 @@ Additionally, here are a few specific rules: - Member functions should use `v` for `` rather than the actual class name. For example, use `1v` instead of `3foo` for the `` in `class foo`. 
- - The return type of a virtual class member function should mangle to - `void *`/`void &`(`Pv`/`Rv`) if it is a pointer/reference to a non-primitive - type. A pointer to a pointer of a non-primitive type is not included in - this rule. + - The return type of a virtual class member function should mangle as + `class v` rather than the original type name if it is a pointer or reference + to a non-primitive type. A pointer to a pointer of a non-primitive type + is not included in this rule. - Class destructors should use the signature `void (*)(void*)` (`FvPvE`). - Static functions should follow the rules of non-member functions. - `wchar_t` should match the type of the target platform. For example, on From 25ca3d34f6e51ca26623a4326d27be81903aef1b Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Wed, 25 Sep 2024 17:40:42 +0800 Subject: [PATCH 19/22] Add rule if result is zero --- riscv-elf.adoc | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 5d402f40..4fff1860 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1624,12 +1624,16 @@ mapping symbols helps the disassembler to disassemble instructions correctly. == Label Value Compuatation for Function Signature based Scheme Landing Pad -The label value for the function signature based labeling scheme landing pad is computed from the -hash of the function signature string, which uses the same scheme as the -"Function types" mangling rule defined in the _Itanium {Cpp} ABI_ -<>, and the function signature will use the "Compression" rule -defined in _Itanium {Cpp} ABI_, the value is taken from the lower 20 bits of -the MD5 hash result of the function signature string. +The label value for the function signature-based labeling scheme landing pad is +computed from the hash of the function signature string, which follows the same +scheme as the "Function types" mangling rule defined in the _Itanium {Cpp} ABI_ +<>. 
The function signature will also use the "Compression" rule +defined in the _Itanium {Cpp} ABI_. + +The label value is derived from the lower 20 bits of the MD5 hash result of the +function signature string. If the lower 20 bits are all zeros, the higher 20 +bits are used. If all 32 bits are zeros, the lower 20 bits of the MD5 hash +result of the string "RISC-V" are used. Additionally, here are a few specific rules: From 9e430692e70f5790499cb5e39d18a822c1148845 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Thu, 26 Sep 2024 14:57:35 +0800 Subject: [PATCH 20/22] Tweak rule for member function and covariant type --- riscv-elf.adoc | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 4fff1860..3d50a621 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -781,16 +781,16 @@ And occupies three 16 byte entries for the unlabeled landing pad PLT style: nop ---- -The function signature based landing pad PLT style occupies two 32 byte entries: +The function signature based landing pad PLT style occupies 48 byte entries: [,asm] ---- 1: lpad 0 - sub t1, t1, t3 # shifted .got.plt offset + hdr size + 24 + sub t1, t1, t3 # shifted .got.plt offset + hdr size + 20 auipc t3, %pcrel_hi(.got.plt) addi t0, t3, %pcrel_lo(1b) # &.got.plt l[w|d] t3, %pcrel_lo(1b)(t3) # _dl_runtime_resolve - addi t1, t1, -(hdr size + 24) # shifted .got.plt offset + addi t1, t1, -(hdr size + 20) # shifted .got.plt offset srli t1, t1, log2(32/PTRSIZE) # .got.plt offset l[w|d] t0, PTRSIZE(t0) # link map jr t3 @@ -1647,11 +1647,14 @@ Additionally, here are a few specific rules: additional rules: - Member functions should use `v` for `` rather than the actual class name. For example, use `1v` instead of `3foo` for the `` - in `class foo`. - - The return type of a virtual class member function should mangle as - `class v` rather than the original type name if it is a pointer or reference - to a non-primitive type.
A pointer to a pointer of a non-primitive type - is not included in this rule. + in `class foo`. This rule only applies to the `` at the top level + of ``, and does not affect cases where an argument + contains a pointer to a member type. + - The return type of a virtual class member function, if it is a pointer or + reference to a class type, should have its class type mangled as `class v` + rather than the declared class type. Const and volatile type qualifiers + should be ignored if this rule applies. Multi-level pointers or references + are exempted from this rule. - Class destructors should use the signature `void (*)(void*)` (`FvPvE`). - Static functions should follow the rules of non-member functions. - `wchar_t` should match the type of the target platform. For example, on From b8579527b5181e1541934ce6c3a99497f8a2c143 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Thu, 26 Sep 2024 18:00:10 +0800 Subject: [PATCH 21/22] Update rule for the label value for handling zero value --- riscv-elf.adoc | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index 3d50a621..af7507ed 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1631,9 +1631,11 @@ scheme as the "Function types" mangling rule defined in the _Itanium {Cpp} ABI_ defined in the _Itanium {Cpp} ABI_. The label value is derived from the lower 20 bits of the MD5 hash result of the -function signature string. If the lower 20 bits are all zeros, the higher 20 -bits are used. If all 32 bits are zeros, the lower 20 bits of the MD5 hash -result of the string "RISC-V" are used. +function signature string. If the lower 20 bits are all zeros, use the next +20 bits, and continue using the next 20 bits until a non-zero value is obtained. +If less than 20 bits are available in the final segment, the highest 20 bits of +the MD5 hash result will be used. If all 128 bits are zeros, the lower 20 bits +of the MD5 hash result of the string "RISC-V" are used. 
Additionally, here are a few specific rules: From 1f96d3ef8dd541cbff7fad9004f5f1602c824343 Mon Sep 17 00:00:00 2001 From: Kito Cheng Date: Mon, 14 Oct 2024 21:25:36 +0800 Subject: [PATCH 22/22] Update rule for the label value for handling zero value Use a zero-filled value if fewer than 20 bits remain --- riscv-elf.adoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/riscv-elf.adoc b/riscv-elf.adoc index af7507ed..e46ce1b4 100644 --- a/riscv-elf.adoc +++ b/riscv-elf.adoc @@ -1633,9 +1633,9 @@ defined in the _Itanium {Cpp} ABI_. The label value is derived from the lower 20 bits of the MD5 hash result of the function signature string. If the lower 20 bits are all zeros, use the next 20 bits, and continue using the next 20 bits until a non-zero value is obtained. -If less than 20 bits are available in the final segment, the highest 20 bits of -the MD5 hash result will be used. If all 128 bits are zeros, the lower 20 bits -of the MD5 hash result of the string "RISC-V" are used. +If less than 20 bits are available in the final segment, the remaining bits +will be zero-filled to make up 20 bits. If all 128 bits are zeros, the lower +20 bits of the MD5 hash result of the string "RISC-V" are used. Additionally, here are a few specific rules:
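Review note: the label-value rule as it stands after PATCH 22/22 (take the lower 20 bits of the MD5 hash, move to the next 20 bits while the segment is zero, zero-fill the short final segment, fall back to the hash of "RISC-V") can be sketched as below. This is only an illustrative sketch, not the reference implementation. The helper names `label_from_digest` and `label_for_signature` are hypothetical. The patch text does not say how the 16 digest bytes map to a 128-bit integer or on which side the final segment is zero-filled; the sketch assumes a big-endian reading and zero-fill in the upper bits.

```python
import hashlib

LABEL_BITS = 20
LABEL_MASK = (1 << LABEL_BITS) - 1
DIGEST_BITS = 128


def label_from_digest(digest: bytes) -> int:
    """Pick the first non-zero 20-bit segment, scanning from the low end."""
    # ASSUMPTION: the 16 digest bytes are read as one big-endian integer,
    # so "lower 20 bits" means its least significant 20 bits.
    value = int.from_bytes(digest, "big")
    for shift in range(0, DIGEST_BITS, LABEL_BITS):
        # The last segment (shift 120) holds only 8 bits; the shift leaves
        # the upper 12 bits of the result zero (assumed zero-fill side).
        segment = (value >> shift) & LABEL_MASK
        if segment:
            return segment
    # All 128 bits are zero: use the lower 20 bits of MD5("RISC-V").
    fallback = int.from_bytes(hashlib.md5(b"RISC-V").digest(), "big")
    return fallback & LABEL_MASK


def label_for_signature(signature: str) -> int:
    """Hash a mangled function signature string such as "FiiPPcE"."""
    return label_from_digest(hashlib.md5(signature.encode("ascii")).digest())


if __name__ == "__main__":
    # Signature strings taken from the examples in this series.
    for sig in ("FiiPPcE", "M1vFvvE"):
        print(f"{sig}: {label_for_signature(sig):#07x}")
```

Splitting the segment scan from the hashing step keeps the zero-handling rule testable with hand-built digests, which is useful because real MD5 outputs with an all-zero low segment are rare.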