Skip to content

Comments

refactor implementation into library#31

Draft
ailrst wants to merge 7 commits intomainfrom
modularised
Draft

refactor implementation into library#31
ailrst wants to merge 7 commits intomainfrom
modularised

Conversation

@ailrst
Copy link
Contributor

@ailrst ailrst commented Aug 14, 2025

Copy link
Member

@katrinafyi katrinafyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's much improved!

generally, the comments are about API design. also, I think that I'd have more tendency to put things into the library. like all the json formatting could be a library function as well, so we could embed the same json into other places (hypothetically).

lib/lib.ml Outdated
type rectified_block = {
ruuid : bytes;
contents : bytes;
opcodes : bytes list;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could store an Opcode.t instead, to avoid remembering whether endian has been flipped

lib/lib.ml Outdated

let b64_of_uuid uuid = Base64.encode_exn (Bytes.to_string uuid)

let fold_opcode_list_with_address address ops lift =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this function is over-specialised. it should just return a (int, Opcode.t) list of addresses+opcode, and the caller can fold it if they wish

same for the rectified block folding variant

lib/lib.ml Outdated
( i + opcode_length,
lift i
(* have already swapped the byte order to big endian just load opcode in machine's format*)
(Opcode.of_le_bytes (String.of_bytes op)) ))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is outside of this PR but it looks like the of_le_bytes is using a string as a byte sequence. it might be good to use different types to distinguish strings of "0x01939471" vs strings which have unreadable byte data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah its a bit confusing, but using like type be_bytes = string doesn't help much as the tooling seems to generally see through non-opaque types (e.g. showing the real type in the documentation) which might make sense. I don't think it can be an opaque type as its part of the interface? string I think is the right representation as its just an immutable byte sequence.


(** Extrect the code blocks from a module and convert them to rectified_blocks
of absolute-addressed opcode sequences. *)
let code_blocks_of_module (m : Module.t) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code_block_of_module should not internally rectify the block for us. it can return a codeblock + content_block instead, and the user can call rectify if they wish.

this is because atm, exporting rectify_block in this lib is not very useful - where would the library user get a content_block from?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ailrst thoughts? ^

@ailrst
Copy link
Contributor Author

ailrst commented Aug 18, 2025

Comments should be addressed now, thanks

Copy link
Member

@katrinafyi katrinafyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks :) just a couple more questions


(** Extrect the code blocks from a module and convert them to rectified_blocks
of absolute-addressed opcode sequences. *)
let code_blocks_of_module (m : Module.t) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ailrst thoughts? ^

let fold_map_opcode_list_with_address address ops =
snd
@@ List.fold_left_map (fun i op -> (i + opcode_length, (i, op))) address ops
in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optionally, this doesn't need a let decl

and/or

optionally, use List.mapi instead of fold_left_map

(* have already swapped the byte order to ensure big endian just load opcode in host
machine's endianness *)
let opcodes =
List.map (fun op -> Opcode.of_le_bytes (String.of_bytes op)) opcodes
Copy link
Member

@katrinafyi katrinafyi Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait I might be crazy but the comment says "swapped to big endian" and the code agrees with this, but we use Opcode.of_le_bytes?? why is this le and not be? and is the code host endian dependent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes because Opcode.t stores it as an i32 it assumes of_le_bytes directly blits the bytes into an i32, as it has already been endian-swapped.

Copy link
Member

@katrinafyi katrinafyi Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm terribly confused. I don't think users of Opcode should ever have to think about its i32 representation or the endianness of Opcode.t, whatever that may be.

you should be able to use the functions as they are named, and from that perspective it doesn't make sense. if of_le_bytes is meant to be a direct byte copy then it should be named something different, we can't assume that given its current name. (but I would say that using a direct byte copy is suspicious in any case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not supporting big-endian machines is fine, this unusual interpretation is just because the byte order has already been swapped if it was big-endian. of_le_byes is probably only a direct byte copy on little endian machines. The solution is to just use of_be_bytes if the module's endianness is big, rather than swap the byte order.

Copy link
Contributor Author

@ailrst ailrst Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think converting out of Opcode.t to the string asli expects is also going to be wrong on big endian machines anway. I think it stores it in big endian format so that printf "%x" Opcode.t is the asli format. Maybe of_be_bytes and of_le_bytes need to be swapped too lol.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, this is deeply unsatisfying and I will block this PR for this reason (though ik it's an upstream problem - I guess we should've reviewed UQ-PAC/aslp-rpc#16 first).

to me, the Opcode abstraction is meant to abstract over these details so we never have endianness confusion again. if I have to think about my machine's endian when using Opcode, then it has failed to meet its design goals and should be changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants