Conversation
ailrst
commented
Aug 14, 2025
- update api use for https://github.com/UQ-PAC/aslp-rpc
- move main chunks of code into library: collecting code blocks and folding the lifter over a block
katrinafyi
left a comment
There was a problem hiding this comment.
it's much improved!
generally, the comments are about API design. also, I think that I'd have more tendency to put things into the library. like all the json formatting could be a library function as well, so we could embed the same json into other places (hypothetically).
lib/lib.ml
Outdated
| type rectified_block = { | ||
| ruuid : bytes; | ||
| contents : bytes; | ||
| opcodes : bytes list; |
There was a problem hiding this comment.
this could store an Opcode.t instead, to avoid remembering whether endian has been flipped
lib/lib.ml
Outdated
|
|
||
| let b64_of_uuid uuid = Base64.encode_exn (Bytes.to_string uuid) | ||
|
|
||
| let fold_opcode_list_with_address address ops lift = |
There was a problem hiding this comment.
I feel like this function is over-specialised. it should just return a (int, Opcode.t) list of addresses+opcode, and the caller can fold it if they wish
same for the rectified block folding variant
lib/lib.ml
Outdated
| ( i + opcode_length, | ||
| lift i | ||
| (* have already swapped the byte order to big endian just load opcode in machine's format*) | ||
| (Opcode.of_le_bytes (String.of_bytes op)) )) |
There was a problem hiding this comment.
this is outside of this PR but it looks like the of_le_bytes is using a string as a byte sequence. it might be good to use different types to distinguish strings of "0x01939471" vs strings which have unreadable byte data
There was a problem hiding this comment.
Yeah its a bit confusing, but using like type be_bytes = string doesn't help much as the tooling seems to generally see through non-opaque types (e.g. showing the real type in the documentation) which might make sense. I don't think it can be an opaque type as its part of the interface? string I think is the right representation as its just an immutable byte sequence.
|
|
||
| (** Extrect the code blocks from a module and convert them to rectified_blocks | ||
| of absolute-addressed opcode sequences. *) | ||
| let code_blocks_of_module (m : Module.t) = |
There was a problem hiding this comment.
code_block_of_module should not internally rectify the block for us. it can return a codeblock + content_block instead, and the user can call rectify if they wish.
this is because atm, exporting rectify_block in this lib is not very useful - where would the library user get a content_block from?
|
Comments should be addressed now, thanks |
katrinafyi
left a comment
There was a problem hiding this comment.
thanks :) just a couple more questions
|
|
||
| (** Extrect the code blocks from a module and convert them to rectified_blocks | ||
| of absolute-addressed opcode sequences. *) | ||
| let code_blocks_of_module (m : Module.t) = |
| let fold_map_opcode_list_with_address address ops = | ||
| snd | ||
| @@ List.fold_left_map (fun i op -> (i + opcode_length, (i, op))) address ops | ||
| in |
There was a problem hiding this comment.
optionally, this doesn't need a let decl
and/or
optionally, use List.mapi instead of fold_left_map
| (* have already swapped the byte order to ensure big endian just load opcode in host | ||
| machine's endianness *) | ||
| let opcodes = | ||
| List.map (fun op -> Opcode.of_le_bytes (String.of_bytes op)) opcodes |
There was a problem hiding this comment.
wait I might be crazy but the comment says "swapped to big endian" and the code agrees with this, but we use Opcode.of_le_bytes?? why is this le and not be? and is the code host endian dependent?
There was a problem hiding this comment.
Yes because Opcode.t stores it as an i32 it assumes of_le_bytes directly blits the bytes into an i32, as it has already been endian-swapped.
There was a problem hiding this comment.
I'm terribly confused. I don't think users of Opcode should ever have to think about its i32 representation or the endianness of Opcode.t, whatever that may be.
you should be able to use the functions as they are named, and from that perspective it doesn't make sense. if of_le_bytes is meant to be a direct byte copy then it should be named something different, we can't assume that given its current name. (but I would say that using a direct byte copy is suspicious in any case)
There was a problem hiding this comment.
I think not supporting big-endian machines is fine, this unusual interpretation is just because the byte order has already been swapped if it was big-endian. of_le_byes is probably only a direct byte copy on little endian machines. The solution is to just use of_be_bytes if the module's endianness is big, rather than swap the byte order.
There was a problem hiding this comment.
But I think converting out of Opcode.t to the string asli expects is also going to be wrong on big endian machines anway. I think it stores it in big endian format so that printf "%x" Opcode.t is the asli format. Maybe of_be_bytes and of_le_bytes need to be swapped too lol.
There was a problem hiding this comment.
I'm sorry, this is deeply unsatisfying and I will block this PR for this reason (though ik it's an upstream problem - I guess we should've reviewed UQ-PAC/aslp-rpc#16 first).
to me, the Opcode abstraction is meant to abstract over these details so we never have endianness confusion again. if I have to think about my machine's endian when using Opcode, then it has failed to meet its design goals and should be changed.