High P-code Normalization strategies for Cross-Arch Analysis #8874

YuCheng1122 · 2026-01-13T14:14:31Z

Hi all,

I'm researching cross-architecture malware classification and using High P-code for feature extraction. I've noticed a significant structural divergence between x86 and MIPS even for the exact same C source code, which is causing a major accuracy drop in my models (MIPS accuracy is ~20% lower than x86).

The Problem: Even in High P-code (obtained via DecompInterface), the MIPS output retains many more architectural artifacts than the x86 output.

Example: A simple function call appears as a clean CALL in x86, but manifests as multiple INDIRECT operations or complex pointer arithmetic in MIPS High P-code.

(See attached screenshots for the comparison of signal_handler / sheet_hash_1).
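For readers who want to reproduce the comparison, the extraction step can be sketched roughly as follows. This is a minimal sketch, assuming it runs inside a Ghidra Python environment (Jython or PyGhidra). `DecompInterface`, `decompileFunction`, `getHighFunction`, `getPcodeOps` and `getMnemonic` are Ghidra API calls; the helper names `collect_mnemonics` and `function_mnemonics` and the 60-second timeout are illustrative assumptions, not something taken from this post.

```python
def collect_mnemonics(high_func):
    """Return the High P-code opcode mnemonics of one HighFunction, in iterator order."""
    mnemonics = []
    ops = high_func.getPcodeOps()  # Java Iterator over PcodeOpAST objects
    while ops.hasNext():
        mnemonics.append(str(ops.next().getMnemonic()))
    return mnemonics


def function_mnemonics(program, function, timeout_secs=60):
    """Decompile one function and return its High P-code mnemonics (empty list on failure).

    Only usable inside Ghidra; the imports are local so this file still loads elsewhere.
    """
    from ghidra.app.decompiler import DecompInterface
    from ghidra.util.task import ConsoleTaskMonitor

    ifc = DecompInterface()
    ifc.openProgram(program)
    try:
        result = ifc.decompileFunction(function, timeout_secs, ConsoleTaskMonitor())
        if not result.decompileCompleted():
            return []
        return collect_mnemonics(result.getHighFunction())
    finally:
        ifc.dispose()  # shut down the decompiler process once the ops are copied out
```

Running `function_mnemonics` on the same function in the x86 and MIPS builds gives two mnemonic lists that can be compared side by side.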

What I'm looking for: Is there a specific decompiler optimization pass or a recommended normalization strategy within the Ghidra API to abstract these architectural differences further? Or is High P-code expected to retain this level of architectural dependency?

Any insights or pointers to relevant classes/analyzers would be greatly appreciated. Thanks!

YuCheng1122 (Author) · 2026-01-14T08:13:44Z

First of all, thanks for the insights!

I observed that INDIRECT and MULTIEQUAL account for ~70% of operations in my MIPS High P-code samples (vs. a much lower percentage in x86).
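A measurement like this can be made with a few lines of plain Python once the opcode mnemonics of a function have been collected into a list. The sketch below is illustrative only: the function name `bookkeeping_share` and the sample list are made up, and it assumes the mnemonics are the strings Ghidra prints for each op (for example `INDIRECT` and `MULTIEQUAL`).

```python
from collections import Counter


def bookkeeping_share(mnemonics, opcodes=("INDIRECT", "MULTIEQUAL")):
    """Fraction of ops in `mnemonics` whose opcode is one of `opcodes`."""
    counts = Counter(mnemonics)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty function body: nothing to measure
    return sum(counts[op] for op in opcodes) / total


# Made-up sample, not real decompiler output.
sample = ["CALL", "INDIRECT", "INDIRECT", "MULTIEQUAL", "COPY"]
share = bookkeeping_share(sample)  # 3 of the 5 ops are INDIRECT or MULTIEQUAL
```

Computing this per function and per architecture makes it easy to see whether the shift is uniform or concentrated in a few call-heavy functions.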

Given this massive distribution shift, I am considering strictly filtering these two opcodes out to improve cross-arch alignment. Is this a valid strategy to reduce noise? Or is there a recommended normalization approach (e.g., edge contraction or specific analyzer settings) that would better preserve the underlying data-flow logic while still reducing this architectural artifact?
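To make the difference between the two options concrete: simply deleting INDIRECT and MULTIEQUAL nodes cuts every def-use chain that passes through them, whereas edge contraction reconnects each consumer to the nearest remaining producers. The following is a minimal sketch of contraction on a toy graph, not a Ghidra API. The representation (a dict mapping an op id to its mnemonic and the ids of the ops that produce its inputs) and the name `contract` are assumptions for illustration, and the model deliberately ignores details such as the second input of a real INDIRECT op.

```python
DROPPED = frozenset({"INDIRECT", "MULTIEQUAL"})


def contract(ops, dropped=DROPPED):
    """Remove `dropped` ops and rewire their consumers to the surviving producers.

    ops: dict of op_id -> (mnemonic, [producer op_ids]).
    Returns a dict of the same shape that contains only the kept ops.
    """
    def resolve(op_id, seen, out):
        # Walk backwards through dropped ops until kept producers are reached.
        if op_id in seen or op_id not in ops:
            return  # already visited (loop through a MULTIEQUAL) or no defining op
        seen.add(op_id)
        mnemonic, inputs = ops[op_id]
        if mnemonic not in dropped:
            out.append(op_id)
            return
        for src in inputs:
            resolve(src, seen, out)

    result = {}
    for op_id, (mnemonic, inputs) in ops.items():
        if mnemonic in dropped:
            continue
        seen, kept_inputs = set(), []
        for src in inputs:
            resolve(src, seen, kept_inputs)
        result[op_id] = (mnemonic, kept_inputs)
    return result


# Toy graph: a value flows COPY -> INDIRECT -> MULTIEQUAL -> LOAD, with a loop via INT_ADD.
toy = {
    1: ("COPY", []),
    2: ("INDIRECT", [1]),
    3: ("MULTIEQUAL", [2, 5]),
    4: ("LOAD", [3]),
    5: ("INT_ADD", [4]),
}
contracted = contract(toy)
```

In the toy graph, plain filtering would leave `LOAD` with no inputs at all, while contraction keeps its dependence on both `COPY` and the loop-carried `INT_ADD`.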

I appreciate any ideas.

ABratovic · 2026-01-15T01:04:19Z

Hi, the Discussions page can be a graveyard at times ☹️
To draw more attention to your post, I'd suggest changing the title to:
"High P-Code Normalization for ARM, MIPS, x86, PowerPC Cross-Arch Analysis"

> Or is High P-code expected to retain this level of architectural dependency?

My guess is that you'll need to enable the undocumented Decompiler Rules and tailor them for each processor.
These links may help:
https://msm.lt/re/ghidra/rulecompile/
https://github.com/LukeSerne/DecompVis
#2991
#8758
