refactor implementation into library by ailrst · Pull Request #31 · UQ-PAC/gtirb-semantics

ailrst · 2025-08-14T00:33:09Z

update API usage for https://github.com/UQ-PAC/aslp-rpc
move the main chunks of code into a library: collecting code blocks and folding the lifter over a block

katrinafyi

it's much improved!

generally, the comments are about API design. also, I'd lean towards putting more things into the library. for example, all the JSON formatting could be a library function as well, so we could (hypothetically) embed the same JSON elsewhere.

katrinafyi · 2025-08-14T01:20:14Z

lib/lib.ml

+type rectified_block = {
+  ruuid : bytes;
+  contents : bytes;
+  opcodes : bytes list;


this could store an Opcode.t instead, to avoid having to remember whether the endianness has been flipped
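A minimal sketch of the suggested record. It assumes a hypothetical stand-in Opcode module, not aslp-rpc's actual Opcode. Its type t is abstract and already decoded, so the byte order is settled once, when the block is built. Only the three fields visible in the hunk are shown.

```ocaml
(* Hypothetical stand-in for an opcode type: the representation is hidden
   behind the signature, so callers cannot mistake it for raw bytes. *)
module Opcode : sig
  type t
  val of_int32 : int32 -> t
  val to_int32 : t -> int32
end = struct
  type t = int32
  let of_int32 x = x
  let to_int32 x = x
end

(* The record from the hunk, holding decoded opcodes instead of a bytes list. *)
type rectified_block = {
  ruuid : bytes;
  contents : bytes;
  opcodes : Opcode.t list;
}

let () =
  let block =
    { ruuid = Bytes.of_string "uuid";
      contents = Bytes.empty;
      opcodes = [ Opcode.of_int32 0xd503201fl ] }
  in
  (* Print each opcode as an 8-digit hex word. *)
  List.iter
    (fun op -> Printf.printf "%08lx\n" (Opcode.to_int32 op))
    block.opcodes
```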

katrinafyi · 2025-08-14T01:21:17Z

lib/lib.ml

+
+let b64_of_uuid uuid = Base64.encode_exn (Bytes.to_string uuid)
+
+let fold_opcode_list_with_address address ops lift =


I feel like this function is over-specialised. it should just return an (int * Opcode.t) list of address/opcode pairs, and the caller can fold over it if they wish

same for the rectified block folding variant
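A sketch of the narrower shape suggested here. It assumes fixed 4-byte instructions and leaves the opcode type polymorphic. The name opcodes_with_addresses is hypothetical. The function only pairs opcodes with absolute addresses, and lifting (or any other fold) is left to the caller.

```ocaml
(* Assumption: fixed-width 4-byte instructions, as on AArch64. *)
let opcode_length = 4

(* Pair each opcode with its absolute address, starting at [address].
   No lifter is passed in; the caller decides what to do with the list. *)
let opcodes_with_addresses (address : int) (ops : 'op list) : (int * 'op) list =
  List.mapi (fun idx op -> (address + (idx * opcode_length), op)) ops

let () =
  let addressed = opcodes_with_addresses 0x1000 [ "insn0"; "insn1"; "insn2" ] in
  (* A hypothetical stand-in for lifting: format each address/opcode pair. *)
  let lift addr op = Printf.sprintf "0x%x: %s" addr op in
  List.iter (fun (addr, op) -> print_endline (lift addr op)) addressed
```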

katrinafyi · 2025-08-14T01:23:26Z

lib/lib.ml

+         ( i + opcode_length,
+           lift i
+             (* have already swapped the byte order to big endian just load opcode in machine's format*)
+             (Opcode.of_le_bytes (String.of_bytes op)) ))


this is outside the scope of this PR, but it looks like of_le_bytes is using a string as a byte sequence. it might be good to use different types to distinguish human-readable strings like "0x01939471" from strings holding raw, unprintable byte data

Yeah, it's a bit confusing, but using something like type be_bytes = string doesn't help much, as the tooling generally sees through non-opaque type aliases (e.g. the documentation shows the underlying type), which arguably makes sense. I don't think it can be an opaque type since it's part of the interface? I think string is the right representation, as it's just an immutable byte sequence.
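For what it's worth, OCaml does allow a type in a module's interface to be abstract or private. The sketch below uses hypothetical module names. A private type abbreviation keeps string as the visible representation, so the docs show it and values can be read back with a `:> string` coercion. However, values can only be created through the module's own constructor, so raw bytes and hex text cannot be mixed up by accident.

```ocaml
(* Raw, possibly unprintable byte data. [private string]: readable as a
   string via coercion, but only constructible through [of_string]. *)
module Raw_bytes : sig
  type t = private string
  val of_string : string -> t
end = struct
  type t = string
  let of_string s = s
end

(* Human-readable hex text such as "0xd503201f". *)
module Hex_text : sig
  type t = private string
  val of_int32 : int32 -> t
end = struct
  type t = string
  let of_int32 x = Printf.sprintf "0x%08lx" x
end

let () =
  let raw = Raw_bytes.of_string "\x1f\x20\x03\xd5" in
  let hex = Hex_text.of_int32 0xd503201fl in
  (* [:> string] recovers a plain string for read-only use; passing [raw]
     where a [Hex_text.t] is expected would be rejected by the type checker. *)
  Printf.printf "%d raw bytes; text %s\n" (String.length (raw :> string)) (hex :> string)
```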

katrinafyi · 2025-08-14T01:26:14Z

lib/lib.ml

+
+(** Extract the code blocks from a module and convert them to rectified_blocks
+    of absolute-addressed opcode sequences. *)
+let code_blocks_of_module (m : Module.t) =


code_blocks_of_module should not internally rectify the block for us. it could return a code block + content block pair instead, and the user can call rectify if they wish.

this is because, at the moment, exporting rectify_block from this lib is not very useful: where would a library user get a content_block from?

@ailrst thoughts? ^
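A rough sketch of the proposed split. It uses simplified, hypothetical types loosely modelled on GTIRB's byte intervals, which hold the contents and a base address, and on the code blocks inside them, which are an offset and a size. Extraction returns each code block paired with its interval. Rectification is a separate function that the caller can apply or skip.

```ocaml
(* Hypothetical, simplified stand-ins; the real GTIRB types carry more fields. *)
type code_block = { uuid : string; offset : int; size : int }
type byte_interval = { address : int; contents : string; blocks : code_block list }

(* Extraction only: pair every code block with the interval holding its bytes. *)
let code_blocks_of_intervals (ivs : byte_interval list) :
    (code_block * byte_interval) list =
  List.concat_map (fun iv -> List.map (fun b -> (b, iv)) iv.blocks) ivs

(* Rectification as its own step: absolute address plus the block's bytes. *)
let rectify ((b, iv) : code_block * byte_interval) : int * string =
  (iv.address + b.offset, String.sub iv.contents b.offset b.size)

let () =
  let iv =
    { address = 0x1000;
      contents = "abcdefgh";
      blocks = [ { uuid = "b0"; offset = 0; size = 4 };
                 { uuid = "b1"; offset = 4; size = 4 } ] }
  in
  code_blocks_of_intervals [ iv ]
  |> List.iter (fun ((b, _) as pair) ->
         let addr, bytes = rectify pair in
         Printf.printf "%s at 0x%x: %S\n" b.uuid addr bytes)
```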

test/regression.t

bin/dune

ailrst · 2025-08-18T07:24:55Z

Comments should be addressed now, thanks

katrinafyi

thanks :) just a couple more questions

katrinafyi · 2025-08-18T07:29:31Z

lib/lib.ml

+
+(** Extract the code blocks from a module and convert them to rectified_blocks
+    of absolute-addressed opcode sequences. *)
+let code_blocks_of_module (m : Module.t) =


@ailrst thoughts? ^

katrinafyi · 2025-08-18T07:30:34Z

lib/lib.ml

+  let fold_map_opcode_list_with_address address ops =
+    snd
+    @@ List.fold_left_map (fun i op -> (i + opcode_length, (i, op))) address ops
+  in


optionally, this doesn't need a let decl

and/or

optionally, use List.mapi instead of fold_left_map
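For fixed-width instructions, the two versions agree. The sketch below assumes opcode_length = 4, and its function names are hypothetical. The first version threads the running address through List.fold_left_map, as the hunk above does. The second derives each address from the element's index with List.mapi and needs no accumulator.

```ocaml
(* Assumption: fixed-width 4-byte instructions. *)
let opcode_length = 4

(* As in the hunk: thread the address through the fold, discard the final acc. *)
let with_address_fold address ops =
  snd (List.fold_left_map (fun i op -> (i + opcode_length, (i, op))) address ops)

(* Suggested alternative: compute each address from the element's index. *)
let with_address_mapi address ops =
  List.mapi (fun idx op -> (address + (idx * opcode_length), op)) ops

let () =
  let ops = [ "a"; "b"; "c" ] in
  assert (with_address_fold 0x400000 ops = with_address_mapi 0x400000 ops);
  print_endline "fold_left_map and mapi versions agree"
```

If instruction lengths ever varied, the fold_left_map form would be the one to keep. Each address would then depend on the lengths of all preceding opcodes, not just the index.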

katrinafyi · 2025-08-18T07:34:49Z

lib/lib.ml

+  (* have already swapped the byte order to ensure big endian just load opcode in host 
+     machine's endianness *)
+  let opcodes =
+    List.map (fun op -> Opcode.of_le_bytes (String.of_bytes op)) opcodes


wait I might be crazy but the comment says "swapped to big endian" and the code agrees with this, but we use Opcode.of_le_bytes?? why is this le and not be? and is the code host endian dependent?

Yes: because Opcode.t stores the opcode as an i32, this relies on of_le_bytes directly blitting the bytes into that i32, since they have already been endian-swapped.

I'm terribly confused. I don't think users of Opcode should ever have to think about its i32 representation or the endianness of Opcode.t, whatever that may be.

you should be able to use the functions as they are named, and from that perspective it doesn't make sense. if of_le_bytes is meant to be a direct byte copy then it should be named something different, we can't assume that given its current name. (but I would say that using a direct byte copy is suspicious in any case)

I think not supporting big-endian machines is fine; this unusual interpretation is only because the byte order has already been swapped if the module was big-endian. of_le_bytes is probably only a direct byte copy on little-endian machines. The solution is to just use of_be_bytes when the module's endianness is big, rather than swapping the byte order.

But I think converting from Opcode.t to the string asli expects is also going to be wrong on big-endian machines anyway. I think it stores the value in big-endian format so that printing an Opcode.t with printf "%x" gives the asli format. Maybe of_be_bytes and of_le_bytes need to be swapped too lol.

I'm sorry, this is deeply unsatisfying and I will block this PR for this reason (though ik it's an upstream problem - I guess we should've reviewed UQ-PAC/aslp-rpc#16 first).

to me, the Opcode abstraction is meant to abstract over these details so we never have endianness confusion again. if I have to think about my machine's endian when using Opcode, then it has failed to meet its design goals and should be changed.
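One way to take host endianness out of the picture is sketched below with the OCaml standard library rather than the aslp-rpc Opcode API. Bytes.get_int32_le and Bytes.get_int32_be (OCaml 4.08+) decode in the requested byte order on any host. The byte order can therefore follow the module's declared endianness instead of a prior swap. The byte_order type and the function names are hypothetical.

```ocaml
(* Hypothetical byte-order tag standing in for the module's declared
   endianness; this is not the gtirb or aslp-rpc type. *)
type byte_order = Little | Big

(* Decode one 4-byte instruction word from raw bytes in the given order.
   Bytes.get_int32_le / get_int32_be (OCaml >= 4.08) read the bytes in the
   requested order on any host, so no manual swapping is needed. *)
let opcode_of_raw_bytes (order : byte_order) (raw : string) : int32 =
  if String.length raw <> 4 then invalid_arg "opcode_of_raw_bytes: expected 4 bytes";
  let b = Bytes.of_string raw in
  match order with
  | Little -> Bytes.get_int32_le b 0
  | Big -> Bytes.get_int32_be b 0

(* Hex text depends only on the decoded value, not on its layout in memory. *)
let hex_of_opcode (op : int32) : string = Printf.sprintf "0x%08lx" op

let () =
  (* The same 32-bit word (0xd503201f) stored in each byte order. *)
  let le = "\x1f\x20\x03\xd5" and be = "\xd5\x03\x20\x1f" in
  print_endline (hex_of_opcode (opcode_of_raw_bytes Little le));
  print_endline (hex_of_opcode (opcode_of_raw_bytes Big be))
```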

ailrst added 4 commits August 13, 2025 18:31

6323323 update lifter api and separate into lib
76bc630 fix
05f78d5 cram test
800f0b8 fix

katrinafyi reviewed Aug 14, 2025


ailrst added 3 commits August 18, 2025 17:14

f95e3c5 fix comments
d5080ca pretty json
fc52334 flag to disable timer

katrinafyi approved these changes Aug 18, 2025



		let b64_of_uuid uuid = Base64.encode_exn (Bytes.to_string uuid)

		let fold_opcode_list_with_address address ops lift =
