Skip to content

Expose x86 instruction encoding #147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 7 additions & 2 deletions capstone-rs/src/arch/x86.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ use core::convert::From;
use core::convert::TryInto;
use core::{cmp, fmt, slice};

use capstone_sys::cs_x86_encoding;
use capstone_sys::{
cs_ac_type, cs_x86, cs_x86_op, cs_x86_op__bindgen_ty_1, x86_op_mem, x86_op_type,
};
Expand All @@ -24,8 +25,7 @@ use crate::instruction::{RegAccessType, RegId, RegIdInt};
/// Contains X86-specific details for an instruction
pub struct X86InsnDetail<'a>(pub(crate) &'a cs_x86);

// todo(tmfink): expose new types cs_x86__bindgen_ty_1, cs_x86_encoding, x86_xop_cc,
// cs_x86_op::access
// todo(tmfink): expose new types cs_x86__bindgen_ty_1, cs_x86_op::access

impl X86OperandType {
fn new(op_type: x86_op_type, value: cs_x86_op__bindgen_ty_1) -> X86OperandType {
Expand Down Expand Up @@ -105,6 +105,11 @@ impl<'a> X86InsnDetail<'a> {
&self.0.opcode
}

/// Instruction encoding information, e.g. displacement offset, size.
pub fn encoding(&self) -> cs_x86_encoding {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to expose the capstone-sys types with the C naming convention. Also, the actual C type is not very "Rusty" since just uses the convention "0 when irrelevant":

typedef struct cs_x86_encoding {
/// ModR/M offset, or 0 when irrelevant
uint8_t modrm_offset;
/// Displacement offset, or 0 when irrelevant.
uint8_t disp_offset;
uint8_t disp_size;
/// Immediate offset, or 0 when irrelevant.
uint8_t imm_offset;
uint8_t imm_size;
} cs_x86_encoding;

Instead, we want to provide a better interface taking advantage of the Rust type system.
Here's an example of how it could look. I'm guessing at the semantics of the fields--you are going to need to do some experiements. Feel free to use it as a starting point; apply with git apply:

diff --git a/capstone-rs/src/arch/mod.rs b/capstone-rs/src/arch/mod.rs
index a2e5834..9cc097a 100644
--- a/capstone-rs/src/arch/mod.rs
+++ b/capstone-rs/src/arch/mod.rs
@@ -12,6 +12,11 @@ use crate::capstone::Capstone;
 use crate::constants::Endian;
 use crate::error::CsResult;
 
+pub struct InsnOffsetSpan {
+    pub offset: u8,
+    pub size: u8,
+}
+
 macro_rules! define_subset_enum {
     ( [
         $subset_enum:ident = $base_enum:ident
diff --git a/capstone-rs/src/arch/x86.rs b/capstone-rs/src/arch/x86.rs
index 20866ef..21118ff 100644
--- a/capstone-rs/src/arch/x86.rs
+++ b/capstone-rs/src/arch/x86.rs
@@ -22,6 +22,8 @@ pub use crate::arch::arch_builder::x86::*;
 use crate::arch::DetailsArchInsn;
 use crate::instruction::{RegAccessType, RegId, RegIdInt};
 
+use super::InsnOffsetSpan;
+
 /// Contains X86-specific details for an instruction
 pub struct X86InsnDetail<'a>(pub(crate) &'a cs_x86);
 
@@ -81,6 +83,18 @@ pub enum X86OperandType {
 #[derive(Debug, Copy, Clone)]
 pub struct X86OpMem(pub(crate) x86_op_mem);
 
+pub struct X86Encoding {
+    pub modrm_offset: u8,
+    pub disp: Option<InsnOffsetSpan>,
+    pub imm: Option<InsnOffsetSpan>,
+}
+
+impl From<cs_x86_encoding> for X86Encoding {
+    fn from(value: cs_x86_encoding) -> Self {
+        todo!()
+    }
+}
+
 impl<'a> X86InsnDetail<'a> {
     /// Instruction prefix, which can be up to 4 bytes.
     /// A prefix byte gets value 0 when irrelevant.
@@ -106,8 +120,8 @@ impl<'a> X86InsnDetail<'a> {
     }
 
     /// Instruction encoding information, e.g. displacement offset, size.
-    pub fn encoding(&self) -> cs_x86_encoding {
-        self.0.encoding
+    pub fn encoding(&self) -> X86Encoding {
+        self.0.encoding.into()
     }
 
     /// REX prefix: only a non-zero value is relevant for x86_64

self.0.encoding
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a test for the new method below in the test module.

Please try to create examples where some of the offset values are zero and others are non-zero.

/// REX prefix: only a non-zero value is relevant for x86_64
pub fn rex(&self) -> u8 {
self.0.rex
Expand Down