Proposal: add generic parameters to types to distinguish different data structures on different architectures #118

Open
@Lancern

Motivation

The link between the core data types (such as Capstone, Insn, InsnDetail, etc.) and the architecture-specific data types (those under crate::arch) is not expressed in the type system. For example, to get the instruction ID on the x86 arch, we have to write:

let cs = Capstone::new().x86().build().unwrap();
let insns = cs.disasm_all(CODE, 0x1000).unwrap();
for i in insns.as_ref() {
    let insn_id = unsafe {
        // Neither typing nor documentation mention this!
        std::mem::transmute::<_, X86Insn>(i.id().0)
    };
    // do something with insn_id
}

which is not easy for beginners to discover.

A simple fix is to add an arch_insn_id method that returns the architecture-specific instruction ID wrapped in an enum variant, just as the InsnDetail::arch_detail method does for instruction details:

pub enum ArchInsnId {
    X86InsnId(X86Insn),
    // other architectures
}

impl<'a> Insn<'a> {
    pub fn arch_insn_id(&self) -> ArchInsnId {
        // ... code
    }
}

This leads to another, slightly annoying problem: we have to match against the return value of arch_insn_id to extract the x86-specific instruction ID, even though we already know which architecture we are targeting. The same problem arises when we call InsnDetail::arch_detail or any other method with a similarly-typed return value.
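To make the extra match concrete, here is a minimal sketch using stand-in types (they are not the real capstone-rs definitions). The ArmInsnId variant, the enum members, and the expect_x86 helper are hypothetical and exist only for illustration:

```rust
// Stand-in instruction ID enums for illustration only; these are NOT the
// capstone-rs definitions, and the variants are made up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum X86Insn {
    Nop,
    Mov,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArmInsn {
    Add,
    Bx,
}

// The enum-based return type from the snippet above, with one extra
// (hypothetical) architecture added.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchInsnId {
    X86InsnId(X86Insn),
    ArmInsnId(ArmInsn),
}

/// Hypothetical helper written by a caller that only ever disassembles x86.
pub fn expect_x86(id: ArchInsnId) -> X86Insn {
    match id {
        ArchInsnId::X86InsnId(x86) => x86,
        // The caller knows this arm cannot be reached, but the type system
        // does not, so the match (and a panic path) is still required.
        other => unreachable!("expected an x86 instruction ID, got {:?}", other),
    }
}
```

Every call site that targets a single architecture would need a match like this one, or a helper that hides it.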

The Proposal

The proposal posted here is only a very early draft; many details are still missing and are left for further consideration and discussion.

First of all, we can add a new trait that abstracts a specific architecture:

pub trait Arch {
    type InsnId;
    type InsnDetail;
    // ... other stuff
}

Then, we add a generic parameter to Capstone, Insn and InsnDetail that represents the architecture:

pub struct Capstone<A: Arch> {
    // ... fields
}

pub struct Insn<'a, A: Arch> {
    // ... fields
}

pub struct InsnDetail<'a, A: Arch> {
    // ... fields
}

Then, the methods mentioned in the motivation section can be typed in a more straightforward way:

impl<A: Arch> Capstone<A> {
    pub fn insn_detail<'s, 'i: 's>(
        &'s self, 
        insn: &'i Insn<'_, A>
    ) -> CsResult<InsnDetail<'i, A>> {
        // ... code
    }
}

impl<'a, A: Arch> Insn<'a, A> {
    pub fn id(&self) -> A::InsnId {
        // ... code
    }
}

impl<'a, A: Arch> InsnDetail<'a, A> {
    pub fn arch_detail(&self) -> A::InsnDetail {
        // ... code
    }
}

No more matches are needed as long as we're targeting a specific architecture. Beginners can also find the corresponding architecture-specific types just by looking at the signatures. The instruction ID problem is resolved in the same way.
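A minimal, self-contained sketch of this idea, again with stand-in types rather than the real capstone-rs ones. The X86 marker type, the insn_id_from_raw helper, the from_raw_id constructor, and the numeric ID values are all assumptions made for the example; lifetimes are left out for brevity:

```rust
use std::marker::PhantomData;

// Stand-in instruction ID enum; the variants and the raw values mapped to
// them below are made up for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum X86Insn {
    Invalid,
    Nop,
    Mov,
}

pub trait Arch {
    type InsnId;

    /// Hypothetical helper: convert the raw numeric ID into the typed ID.
    fn insn_id_from_raw(raw: u32) -> Self::InsnId;
}

/// Hypothetical marker type that selects the x86 architecture at compile time.
pub struct X86;

impl Arch for X86 {
    type InsnId = X86Insn;

    fn insn_id_from_raw(raw: u32) -> X86Insn {
        match raw {
            1 => X86Insn::Nop,
            2 => X86Insn::Mov,
            _ => X86Insn::Invalid,
        }
    }
}

/// Simplified instruction type carrying the architecture as a type parameter.
pub struct Insn<A: Arch> {
    raw_id: u32,
    _arch: PhantomData<A>,
}

impl<A: Arch> Insn<A> {
    /// Hypothetical constructor, only here so the sketch can build an Insn.
    pub fn from_raw_id(raw_id: u32) -> Self {
        Insn { raw_id, _arch: PhantomData }
    }

    /// Returns the architecture-specific ID directly: no enum wrapper,
    /// no match, and no transmute at the call site.
    pub fn id(&self) -> A::InsnId {
        A::insn_id_from_raw(self.raw_id)
    }
}
```

With these definitions, calling id() on an Insn<X86> has the static type X86Insn, so the caller can compare it against X86Insn variants directly.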

Unresolved Problems

When the target architecture cannot be determined at compile time (for example, when the disassembler is created by the Capstone::new_raw method), the generic parameter cannot be set to a specific architecture. To resolve this problem, maybe we need to introduce a DynamicArch type that implements Arch and indicates that the target architecture is determined at runtime.
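One possible shape for such a DynamicArch, sketched with stand-in types. Everything here is an assumption for discussion: the ArchKind enum, the insn_id_from_raw method, the constructor, and the raw ID values are hypothetical. The trait method takes &self in this sketch because the architecture is only known as a runtime value:

```rust
// Stand-in instruction ID enums; variants and raw values are made up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum X86Insn {
    Invalid,
    Nop,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArmInsn {
    Invalid,
    Add,
}

/// Hypothetical runtime tag naming the architecture.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchKind {
    X86,
    Arm,
}

/// With a runtime-selected architecture, the ID has to stay an enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchInsnId {
    X86(X86Insn),
    Arm(ArmInsn),
}

pub trait Arch {
    type InsnId;

    /// Hypothetical helper: convert a raw numeric ID into the typed ID.
    fn insn_id_from_raw(&self, raw: u32) -> Self::InsnId;
}

/// Architecture chosen at runtime; falls back to the enum-based ID type.
pub struct DynamicArch {
    kind: ArchKind,
}

impl DynamicArch {
    pub fn new(kind: ArchKind) -> Self {
        DynamicArch { kind }
    }
}

impl Arch for DynamicArch {
    type InsnId = ArchInsnId;

    fn insn_id_from_raw(&self, raw: u32) -> ArchInsnId {
        match self.kind {
            ArchKind::X86 => ArchInsnId::X86(match raw {
                1 => X86Insn::Nop,
                _ => X86Insn::Invalid,
            }),
            ArchKind::Arm => ArchInsnId::Arm(match raw {
                1 => ArmInsn::Add,
                _ => ArmInsn::Invalid,
            }),
        }
    }
}
```

Under this design, users who pick the architecture at runtime keep today's match-based workflow, while users who fix it at compile time get the precise types.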


This proposal may be premature, but I do think that it reveals some (possibly minor) problems.
