
Add %lpad_hash for Zicfilp#99

Open
kito-cheng wants to merge 1 commit into main from lpad_hash


@kito-cheng
Collaborator

NOTE: This PR will stay in draft state until the toolchain PoC and the psABI spec are ready.


Zicfilp provides two labeling schemes: simple and complex (also known as function signature-based). The simple scheme uses an lpad with a constant 0, which does not require any hashing mechanism. In contrast, the complex labeling scheme computes the label as an MD5 hash of the signature string.

Filling in an MD5 hash value is straightforward for compilers, but it is non-trivial for humans to maintain by hand. Therefore, we have added a new assembler modifier to compute this value.

See also: riscv/riscv-cfi#151

Push/pop current options to/from the options stack.

## Assembler Relocation Functions
## Assembler Modifiers

@mylai-mtk

Actually, I wonder if this would turn out to be useful, since the function signature string format is (for now) drafted to be the mangled string of the function type, which IMHO is not human-friendly enough to be written or read correctly by a human programmer. Also, AFAIK, there's no convenient tool to mangle function signatures into strings, despite the widespread use of c++filt for demangling. Given this, I doubt people would try to use this assembly modifier, due to the difficulty of producing those function signature strings.

To get a feel: lpad %lpad_hash("FiiPPcE") vs lpad 0xe088e. While the %lpad_hash() form may be easier to read for someone familiar with the mangling rule (this advantage may be useful in the rare scenario of reviewing assemblies by compiler experts), I don't think it's comprehensible for average assembly developers, not to mention to write it out without the help of tools.

Though I doubt the usefulness of this %lpad_hash() modifier, I do agree that we need a method to ease the pain of obtaining correct label values. Here I propose a possible, but not really ideal, method: in my own toolchain prototyping process, I emit symbols containing the label values for all C/C++ functions, so I can easily extract the resulting compiler-generated labels by looking into symbol tables. This emission was originally intended to facilitate function-signature-based PLT generation in linkers, but it turns out that I use it a lot when patching musl libc assembly files with lpad insns. This "compile-then-inspect-binary" approach is far from straightforward and beautiful, but at least I can trust the values obtained to be correct, if I don't make a mistake when copying them 😜
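To illustrate the "compile-then-inspect-binary" approach, here is a minimal sketch that extracts label values from nm-style symbol table text. It assumes each label is emitted as a symbol whose value holds the label and whose name carries a common prefix; the prefix `__lpad_label_`, the function name `parse_label_symbols`, and the sample symbol `foo` are all hypothetical, since the actual symbol naming used in the prototype is not stated here.

```python
def parse_label_symbols(nm_output: str, prefix: str = "__lpad_label_") -> dict:
    """Map function names to label values found in nm-style text.

    Assumes lines of the form "<hex value> <type> <name>" and a
    hypothetical symbol prefix; neither is confirmed by the prototype.
    """
    labels = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            # Undefined symbols have no value column; skip them.
            continue
        value, _kind, name = fields
        if name.startswith(prefix):
            labels[name[len(prefix):]] = int(value, 16)
    return labels


if __name__ == "__main__":
    # Hand-written sample text, not output from a real binary.
    sample = (
        "00000000000e088e A __lpad_label_foo\n"
        "                 U printf\n"
        "0000000000001000 T main\n"
    )
    for func, label in parse_label_symbols(sample).items():
        print(f"{func}: lpad {label:#x}")
```

Reading the values with a script rather than copying them by hand removes the one error source mentioned above.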

@ved-rivos

ved-rivos commented May 17, 2024

> While the %lpad_hash() form may be easier to read for someone familiar with the mangling rule (this advantage may be useful in the rare scenario of reviewing assemblies by compiler experts), I don't think it's comprehensible for average assembly developers, not to mention to write it out without the help of tools.

Should we add a modifier that takes the function prototype string instead of the mangled string as input, like %lpad_hash_proto("void (*f)(int, char)")?

@mylai-mtk

> Should we add a modifier that takes the function prototype string instead of mangled string as input - like %lpad_hash_proto("void (*f)(int, char)").

I guess this would be a huge effort for assemblers to implement, since they do not know anything about the C/C++ language. The corresponding C parser and C++ mangler would need to be pulled in, which is a big cost for a minor convenience feature like this.

Comment on lines +421 to +422
The hash result is obtained from the lower 20 bits of the MD5 result of the
string.


Technically speaking, this is not a complete description of the hashing algorithm, so I suggest you either explain more or remove this sentence.
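A minimal sketch of why "the lower 20 bits of the MD5 result" is underspecified, assuming the signature string is hashed as ASCII bytes: the 128-bit digest can be read as a big-endian or a little-endian integer, and the two readings generally give different 20-bit values. The function name `lpad_hash_candidates` is hypothetical, and neither reading is confirmed here as the one the spec intends.

```python
import hashlib

LABEL_BITS = 20  # width taken from "lower 20 bits" in the proposed text


def lpad_hash_candidates(signature: str) -> dict:
    """Return the low 20 bits of the MD5 digest under two byte orders.

    Assumption: the signature is encoded as ASCII before hashing.
    Which byte order (if either) the spec means is not stated.
    """
    digest = hashlib.md5(signature.encode("ascii")).digest()
    mask = (1 << LABEL_BITS) - 1
    return {
        "big_endian": int.from_bytes(digest, "big") & mask,
        "little_endian": int.from_bytes(digest, "little") & mask,
    }


if __name__ == "__main__":
    # "FiiPPcE" is the signature string quoted earlier in this thread.
    for order, value in lpad_hash_candidates("FiiPPcE").items():
        print(f"{order}: {value:#07x}")
```

A complete description would need to pin down at least the string encoding and how the digest bytes map to the 20-bit label.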

@kito-cheng kito-cheng marked this pull request as ready for review May 12, 2025 09:01

3 participants