Skip to content

RISCV: compressed insn treats the uncompressed version as an alias#2959

Draft
slate5 wants to merge 1 commit into
capstone-engine:nextfrom
slate5:fix/uncompressed-alias
Draft

RISCV: compressed insn treats the uncompressed version as an alias#2959
slate5 wants to merge 1 commit into
capstone-engine:nextfrom
slate5:fix/uncompressed-alias

Conversation

@slate5

@slate5 slate5 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

This draft is connected to #2923.
It's a very simple change. All compressed instructions that have an uncompressed counterpart use that instruction as an alias. A better solution would be to include a new uncompressed_id instead of an alias, but that would require more modifications (cs_insn)...

Test plan

...

Closing issues

...

@github-actions github-actions Bot added the RISCV Arch label Jun 9, 2026
@slate5 slate5 force-pushed the fix/uncompressed-alias branch from 646c2ac to bc9042a Compare June 9, 2026 03:41

@Rot127 Rot127 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some tests of course.
Otherwise I think we can start with that one.

@slate5

slate5 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Will do, but let's see what @moste00 says

@moste00

moste00 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Hello @Rot127 @slate5, thanks for notifying me of this.

So a couple of problems that I see:

1- We're still keeping the "(un)compressing is the same thing as aliasing" paradigm, and the flags of noalias and noaliascompressed with their old meanings. Didn't we talk about removing them because LLVM doesn't see compression as an aliasing ? will we remove them in a later PR or just give up and keep them for now?

2- The logic assigns a normal CS instruction ID (the ID of the uncompressed instruction) to the alias ID, but are the 2 ID spaces mutually exclusive? What if they intersect in some IDs ?

3- While I didn't run this yet as I'm not on my laptop now, the details logic will probably not solve my problem because when it prints the details for if(is_uncompressed) path, it prints for the MI struct, not McInstr. MI is the original compressed instruction, so this will print the original details of the compressed instruction. I want the details of the uncompressed instruction.

All in all: I think the main problem here is that is that we're just keeping most of the logic as-is and patching 1 problem ad-hoc instead of refactoring enough, but if both of you think we're not ready for the somewhat complicated approach in the other PR, and if (3) is addressed without breaking any existing tests, I'm okay with this PR.

Personally, I would like the other approach I described in the other PR, it just exhaustively lists all possibilities and lays out the exact thing to do for each. I might be biased though and if you think it's unclear or too verbose a description then so will capstone users. Maybe I just mis-explained it ?

@slate5

slate5 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Hi @moste00,

1- We're still keeping the "(un)compressing is the same thing as aliasing" paradigm...

Yes, that is what I also addressed. Maybe it would be better to have a separate cs_insn element for this specifically (eg, uncompressed_id). On the other hand, if we start complicating with noalias, nocompressed, keep this or that, we have to expect that the user is an expert in riscv. I will be the first to forget this "mess" with compressed insn, and its "aliases" in a week. From the perspective of an asm writer, it makes sense like this because in your source file you put addi t0, t0, -1 and the assembler translates that into c.addi (same reg is used and imm is in range of signed 2^6) or not if imm is bigger or there is no riscv C extension on system/assembler. What I'm pointing out with this is that addi is kinda an alias either way because until the assembler decides its encoding, it stands as a logical representation of some functionality and not a raw instruction.

I'm making this long... I lean towards fewer features (flags) that tune options without the explicit need of end users. It's clutter, and i'm not sure if there is demand for it. To me, this differentiation between +noaliascompressed and +noalias (as it is now) is irrelevant. Either i want aliases or not, i don't care if insn is compressed. As you said in the other PR, "Or you can do uncompression alone but stop at just the alias mapping, using +noaliascompressed" makes more sense to redefine the purpose of it (it would stop the logic of this PR, no uncompressed aliases, e.g., c.addi doesn't have an alias ID).

2- The logic assigns a normal CS instruction ID (the ID of the uncompressed instruction) to the alias ID, but are the 2 ID spaces mutually exclusive? What if they intersect in some IDs ?

They are in the same enum riscv_insn, separated by RISCV_INS_ENDING and RISCV_INS_ALIAS_BEGIN, if that is your question

3- While I didn't run this yet as I'm not on my laptop now...

But if you want real details, specify -r or what am i missing?

I don't think this PR solution is great, but i'm trying to stick to occam's razor :) And yes, I think if you explain to me a bit simpler in terms of what the end goal of your approach is, and what problem it solves (or what benefit it gives), it would help me a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

RISCV Arch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants