Description
Motivation
The link between core data types (such as Capstone
, Insn
, InsnDetail
, etc.) and architecture-specific data types (the data types under crate::arch
) is not that "explicit" on the typing. For example, to get the instruction ID on the x86 arch, we have to code:
let cs = Capstone::new().x86().build().unwrap();
let insns = cs.disasm_all(CODE, 0x1000).unwrap();
for i in insns.as_ref() {
let insn_id = unsafe {
// Neither typing nor documentation mention this!
std::mem::transmute::<_, X86Insn>(i.id().0)
};
// do something with insn_id
}
which is not easier for beginners to catch.
We can simply resolve the problem above by adding a method arch_insn_id
that returns the corresponding instruction ID enum variants just like the InsnDetail::arch_detail
method:
pub enum ArchInsnId {
X86InsnId(X86Insn),
// other architectures
}
impl<'a> Insn<'a> {
pub fn arch_insn_id(&self) -> ArchInsnId {
// ... code
}
}
This leads to another slightly disturbing problem. We have to match
against the return value of arch_insn_id
to extract the x86-specific instruction ID, given that we are already confident about the architecture. This problem also arises when we call the InsnDetail::arch_detail
method or other methods with a similarly-typed return value.
The Proposal
The proposal posted here is only an (too-)early draft and more details may be missing for further considerations and discussions.
First of all, we can add a new trait that abstracts a specific architecture:
pub trait Arch {
type InsnId;
type InsnDetail;
// ... other stuff
}
Then, we add a generic parameter to Capstone
, Insn
and InsnDetail
that represents the architecture:
pub struct Capstone<A: Arch> {
// ... fields
}
pub struct Insn<'a, A: Arch> {
// ... fields
}
pub struct InsnDetail<'a, A: Arch> {
// ... fields
}
Then, the methods mentioned in the motivation section can be typed in a more straight-forward way:
impl<A: Arch> Capstone<A> {
pub fn insn_detail<'s, 'i: 's>(
&'s self,
insn: &'i Insn<'_, A>
) -> CsResult<InsnDetail<'i, A>> {
// ... code
}
}
impl<'a, A: Arch> Insn<'a, A> {
pub fn id(&self) -> A::InsnId {
// ... code
}
}
impl<'a, A: Arch> InsnDetail<'a, A> {
pub fn arch_detail(&self) -> A::InsnDetail {
// ... code
}
}
No more match
es, as long as we're targeting a specific architecture. Also, beginners can find corresponding architecture-specific implementations just by looking at the typings. The instruction ID problem can be resolved accordingly.
Unresolved Problems
When the target architecture cannot be determined during compile-time (when the disassembler is created by the Capstone::new_raw
method), the generic parameters cannot be set to represent specific architecture. To resolve this problem, maybe we need to introduce a DynamicArch
that implements Arch
and represents the target architecture is determined during runtime.
This proposal may be pre-mature, but I do think that it reveals some (possibly minor) problems.