Skip to content

Conversation

@letypequividelespoubelles
Copy link
Collaborator

@letypequividelespoubelles letypequividelespoubelles commented Dec 3, 2025

Note

Introduce a zkasm-based rlputils module (integer, bytestring, bytes32, data_pricing), wire it into RLP TXN, add needed constants/utilities, and remove the old Lisp implementation and related lookups/tables.

  • RLP Utils (zkasm):
    • Add rlputils/rlputils.zkasm implementing RLP_UTILS_INST_INTEGER, RLP_UTILS_INST_BYTE_STRING_PREFIX, RLP_UTILS_INST_BYTES32, RLP_UTILS_INST_DATA_PRICING.
    • Add helper libs: util/byte_counting.zkasm, util/byte_size.zkasm, util/padding.zkasm.
    • Minor typo fix in util/bit_shl.zkasm comments.
  • Wiring/Build:
    • Update Makefile to use rlputils/rlputils.zkasm (Cancun/Prague/Osaka builds).
    • Update lookup rlptxn/cancun/lookups/rlptxn_into_rlputils.lisp to new flat rlputils interface (INST, DATA_1..8).
  • Constants:
    • Extend constants/evm.zkasm with RLP opcodes/prefixes and TWO_FIVE_SIX_POWER_FIFTEEN.
  • Removal/Cleanup:
    • Delete legacy Cancun rlputils Lisp columns/constants/constraints/lookups and rlputils/cancun/maxlog.zkasm.
    • Remove reftables/*/power.lisp and related lookups into wcp/power.

Written by Cursor Bugbot for commit cc38b34. This will update automatically on new commits. Configure here.

Signed-off-by: F Bojarski <[email protected]>
@letypequividelespoubelles letypequividelespoubelles linked an issue Dec 3, 2025 that may be closed by this pull request
@letypequividelespoubelles letypequividelespoubelles marked this pull request as ready for review December 3, 2025 13:03
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
bytestring:
;; check for empty bytestring:
if DATA_4 !=0 goto bytestring_non_empty_case
if DATA_1 !=0 goto exit_fail
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Inverted condition for empty bytestring check

The condition if DATA_4 !=0 goto bytestring_non_empty_case appears inverted based on the documentation. Line 92 states DATA_4 = empty_byte_string = DATA_1 == 0, meaning DATA_4 = 1 when the string is empty. However, the code jumps to bytestring_non_empty_case when DATA_4 is non-zero (empty), and falls through to fail on line 102 when DATA_4 is zero (non-empty). This causes empty byte strings to be processed as non-empty and non-empty byte strings to trigger exit_fail.

Fix in Cursor Fix in Web

;; case 128 <= integer <= 2 ^128
if DATA_5 != 1 goto exit_fail ;; sanity check that rlp_prefix_required = true
DATA_8 = log256_u128(DATA_2)
DATA_6 = (RLP_PREFIX_INT_SHORT + DATA_8) * TWO_FIVE_SIX_POWER_SIXTEEN
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Off-by-one error in integer RLP prefix calculation

The log256_u128 function returns a 0-indexed byte position (byte_size - 1), but DATA_8 is documented as leading_limb_byte_size. For example, log256_u128(128) returns 0, but 128 requires 1 byte. This causes the RLP prefix at line 67 to compute (0x80 + 0) instead of (0x80 + 1), producing prefix 0x80 when it should be 0x81. The same issue affects line 81-82 for two-limb integers. The assignments need + 1 to convert from byte position to byte size.

Additional Locations (1)

Fix in Cursor Fix in Web

var size u4
size = log256_u128(DATA_1)
DATA_8 = size + 1
rlp_prefix_tmp = rpad_128(RLP_PREFIX_INT_LONG + (DATA_3 * RLP_FROM_INT_TO_LIST) + size)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Bytestring long prefix uses wrong byte size variable

Line 131 correctly computes DATA_8 = size + 1 to convert log256 output to byte count, but line 132 uses size instead of DATA_8 (or size + 1) in the RLP long prefix calculation. This causes the first prefix byte to be RLP_PREFIX_INT_LONG + size instead of RLP_PREFIX_INT_LONG + size + 1, resulting in an off-by-one error in the encoded prefix for long byte strings.

Fix in Cursor Fix in Web

Copy link
Collaborator

@DavePearce DavePearce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looking good. I would test the subfunctions count_nz_128 and rpad_128 etc in isolation. Also be good to have more detailed descriptions for "top-level" functions (e.g. for count_nz_128 but not needed for count_nz_64, etc).

@@ -0,0 +1,42 @@
;; Right or Left Pad an input to a given size.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have some more informative description of what it does. Since our current constraints are almost entirely uncommented, I think it would be good to start with a strong culture of commenting in zkasm.

if remaining != 0 goto exit_shift
goto exit_no_change
exit_shift:
res = rpad_128(256 * remaining)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, using recursion here seems good to me. It means you end up with upto 16 rows for each input (though some can be reused between different inputs of course). I was wondering whether there is a decomposition approach which could work, but didn't figure it out yet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it can be done using a recursive decomposition (i.e. rather than the iterative style you have here). You split it into two pieces: (1) count number of leading zeros; (2) shift using existing approach. It might reduce the overall number of rows. But anyway, perhaps that's for another day. What you have here works well!

;; The interface with RLP* modules is as follow
;; TODO, insert interface

;; The RLP_UTILS module is a helper module to compute some RLP-ifications:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, what you have here seems reasonable enough to me. It starts to feel like a long function, so we could maybe break it up. But, I don't think its actually necessary (my oob implementation is longer than this!)

The only other thing I noticed is that you declared var variables in certain specific places, like this:

bytestring_prefix_required:
  DATA_7 = 0
  var input_size_lt56 u1

This is totally fine. BUT, it is worth noting that go-corset does not "reuse" registers at all. For example, say this:

fn f(x u8) -> (r u8) {
  if x == 0 goto exit_0
  var c u1
  c, r = x + 1
  return
exit_0:
  var b u1
  b,r = x - 1
  return
}

Here, we end up with two columns b and c in the final constraints, when actually we could have just had one by doing this:

n f(x u8) -> (r u8) {
  var cb u1
  if x == 0 goto exit_0
  var c u1
  cb, r = x + 1
  return
exit_0:
  var b u1
  cb, r = x - 1
  return
}

I hope that, in the future, go-corset will do this kind of simplification for us (kinda like perspectives I suppose).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that, in the future, go-corset will do this kind of simplification for us (kinda like perspectives I suppose).

That would be nice yes :)

var low_useless u1
var size_next u4
high, low = limb
low_useless, size_next = size - 8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I normally use either c (for carry) when adding or b (for borrow) when subtracting. It comes from hardware "flags registers".

Copy link
Collaborator

@DavePearce DavePearce Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see also this line low_useless, size_next = size - 8 doesn't compile because it considers all cases for size, plus requires a sign bit for the subtraction. Hence, you need enough bits to encode 31 - 8 plus the sign bit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoding you want I think is this:

low_useless, tmp, size_next = size - 8

Where tmp is an unused u1 column (which in fact is always 0).

return_high:
var b u1
var size_trm u4
b, size_trm = size
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, its a lot annoying to do this kind of trimming. In the future, we could have a cast operator (since in this case you already know its < 16); and/or we could support a shorthand syntax like _, size_trm = size which just adds the needed column for us.

Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Signed-off-by: F Bojarski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

asm rlputils

3 participants