Implementation of feat109: add a trait to serialize instruction data #131


Open · wants to merge 2 commits into main

Conversation

Chazzzzzzz

@Chazzzzzzz Chazzzzzzz commented Apr 13, 2025

This implements #109 for the Token program.

If things look good, I will implement this for all other programs.

I am not using Deref because it would require implementing Deref for every struct, like ApproveChecked/Burn/FreezeAccount:

pub trait InstructionData: Deref<Target = [u8]> {}
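For context, the rejected Deref-based design would look something like the sketch below; the `TransferData` wrapper is hypothetical, only to illustrate the per-struct impl burden being avoided:

```rust
use core::ops::Deref;

// The trait itself would be empty, but every instruction type would have to
// own its serialized bytes and implement `Deref<Target = [u8]>` over them.
pub trait InstructionData: Deref<Target = [u8]> {}

// Example of the boilerplate such a design forces on every instruction.
pub struct TransferData([u8; 9]);

impl Deref for TransferData {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.0
    }
}

impl InstructionData for TransferData {}

fn main() {
    let data = TransferData([3, 0, 0, 0, 0, 0, 0, 0, 0]);
    // Deref coercion gives slice access for free.
    assert_eq!(data.len(), 9);
    assert_eq!(data[0], 3);
}
```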

@Chazzzzzzz Chazzzzzzz changed the title Chaz/feat109 Implementation of feat109 Apr 13, 2025
@Chazzzzzzz Chazzzzzzz changed the title Implementation of feat109 Implementation of feat109: add a trait to serialize instruction data Apr 13, 2025
// Instruction data layout:
// - [0]: instruction discriminator (1 byte, u8)
// - [1..9]: amount (8 bytes, u64)
let mut instruction_data = Box::new([UNINIT_BYTE; 9]);
Collaborator

We need to avoid using a Box here since it will be expensive. You can use core::mem::MaybeUninit<[u8; 9]> instead.
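A minimal sketch of the suggested approach (the function name and returning the array by value are my assumptions, not the PR's code):

```rust
use core::mem::MaybeUninit;

// Build the 9-byte instruction data in a stack-allocated
// `MaybeUninit<[u8; 9]>` and only read it back once every byte is written.
fn transfer_instruction_data(amount: u64) -> [u8; 9] {
    let mut data = MaybeUninit::<[u8; 9]>::uninit();
    let ptr = data.as_mut_ptr() as *mut u8;
    unsafe {
        // [0]: instruction discriminator (3 = Transfer in the Token program)
        ptr.write(3);
        // [1..9]: amount as little-endian u64
        core::ptr::copy_nonoverlapping(amount.to_le_bytes().as_ptr(), ptr.add(1), 8);
        // SAFETY: all 9 bytes were initialized above.
        data.assume_init()
    }
}

fn main() {
    assert_eq!(transfer_instruction_data(1), [3, 1, 0, 0, 0, 0, 0, 0, 0]);
}
```

Returning the array by value sidesteps the lifetime question entirely, since the caller owns the bytes.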

Contributor

There will be a lifetime issue using the stack alone. One way to do this is to mark the function #[inline(always)] so that the variable always lives on the caller's stack frame, but this is too hacky. wdyt?

Contributor

Assuming the bump allocator, allocating a fixed size may only increment the free pointer, around 2 eBPF instructions?

Collaborator

Based on some quick tests that I did, using a Box is a bit more expensive (#126) than just the pointer increment.

Contributor

@publicqi publicqi Apr 15, 2025


Assuming the bump allocator, allocating a fixed size may only increment the free pointer, around 2 eBPF instructions?

This is interesting. One overhead in the example would be zero initialization; I cannot think of any other operation a simple alloc would do. I can verify this later.

Update: confirmed. Besides zero initialization, alignment handling, and the pointer update, the alloc crate itself emits tons of Rust error-unwinding code like we had in #85. Using the alloc crate should be discouraged if we want maximum performance.
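To make the cost comparison concrete, here is a minimal, hypothetical bump allocator of the kind discussed above (power-of-two alignment assumed). The raw allocation really is just an align-and-bump of the free pointer; everything else `Box::new` does (value initialization, unwinding paths for allocation failure) is on top of that:

```rust
// Toy bump allocator: a fixed heap and an offset to the next free byte.
struct Bump {
    heap: [u8; 1024],
    free: usize,
}

impl Bump {
    fn new() -> Self {
        Bump { heap: [0; 1024], free: 0 }
    }

    // Returns the offset of an allocation of `size` bytes with `align`ment
    // (a power of two), or None if the heap is exhausted.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        // Round the free pointer up to the requested alignment...
        let start = (self.free + align - 1) & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.heap.len() {
            return None;
        }
        // ...and bump it past the new allocation: the "2 instruction" core.
        self.free = end;
        Some(start)
    }
}

fn main() {
    let mut bump = Bump::new();
    let a = bump.alloc(9, 1).unwrap();
    let b = bump.alloc(8, 8).unwrap();
    assert_eq!(a, 0);
    assert_eq!(b, 16); // 9 rounded up to the next multiple of 8
}
```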

Author

@Chazzzzzzz Chazzzzzzz Apr 16, 2025


We can use a similar approach to the one used for ReturnData: https://github.com/anza-xyz/pinocchio/blob/main/sdk/pinocchio/src/cpi.rs#L316-L325

In the example, it seems only get_return_data generates the data. Are you suggesting that we do not pass a pointer around, and instead just return a [u8] back?

If we want to pass a pointer around, the data has to live on the heap somewhere.

@febo

// Set amount as u64 at offset [1..9]
write_bytes(&mut instruction_data[1..], &self.amount.to_le_bytes());

unsafe { from_raw_parts(instruction_data.as_ptr() as _, 9) }
Collaborator

@febo febo Apr 15, 2025


Then here you can have unsafe { instruction_data.assume_init() }.

@@ -15,3 +15,7 @@ fn write_bytes(destination: &mut [MaybeUninit<u8>], source: &[u8]) {
d.write(*s);
}
}

pub trait InstructionData {
fn get_instruction_data(&self) -> &[u8];
Collaborator

Maybe we can name this get.


I think this could be a bit confusing with just get alone. A separate idea could be to use the AsRef trait instead?
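A sketch of what the AsRef variant could look like (the type and helper names are illustrative, not from the PR):

```rust
// Instead of a bespoke `get_instruction_data` method, reuse the standard
// `AsRef<[u8]>` trait so callers write `ix.as_ref()`.
pub struct TransferData([u8; 9]);

impl AsRef<[u8]> for TransferData {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

// Generic code can then accept any serialized instruction uniformly.
fn data_len(ix: &impl AsRef<[u8]>) -> usize {
    ix.as_ref().len()
}

fn main() {
    let ix = TransferData([3, 0, 0, 0, 0, 0, 0, 0, 0]);
    assert_eq!(data_len(&ix), 9);
    assert_eq!(ix.as_ref()[0], 3);
}
```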

Comment on lines +19 to +20
pub trait InstructionData {
fn get_instruction_data(&self) -> &[u8];


I might be missing some context.
But this API seems inconsistent between the definition and the implementation/usage.
And as a result, at the usage location, there are unsafe blocks that introduce undefined behavior, as they have "use after free" issues.

As written, this function says that the returned slice has the same lifetime as self.
Which normally[1] means that the returned reference points to data inside self.
And, the compiler will use this to enforce the lifetime of self.
In particular, it would be required for self to live at least as long as the returned reference.

Yet, in the current implementation, all the get_instruction_data() methods return reference to freed memory. For example:

impl InstructionData for Transfer<'_> {
    #[inline]
    fn get_instruction_data(&self) -> &[u8] {
        // Instruction data layout:
        // -  [0]: instruction discriminator (1 byte, u8)
        // -  [1..9]: amount (8 bytes, u64)
        let mut instruction_data = Box::new([UNINIT_BYTE; 9]);
        // Set discriminator as u8 at offset [0]
        write_bytes(&mut instruction_data.as_mut_slice(), &[3]);
        // Set amount as u64 at offset [1..9]
        write_bytes(&mut instruction_data[1..9], &self.amount.to_le_bytes());
        unsafe { from_raw_parts(instruction_data.as_ptr() as _, 9) }
    }
}

instruction_data points to a dynamically allocated block of memory, which will be freed at the end of the function execution.
&[u8] constructed in the last expression will point to freed memory.

This may work, as Solana programs currently use a bump allocator for the heap, and it never actually allocates new objects on top of the freed memory. But it is still undefined behavior.

In general, I would recommend against using unsafe if possible.
While in the existing code unsafe is used to avoid zero initialization, in the new code there is no explicit justification as to why a new unsafe block is necessary.
Ideally, there should be a good reason to disable provided safety mechanisms.
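For illustration, one safe pattern that keeps the borrowed-slice signature honest is to have the caller own the buffer and let the serializer fill it (the names below are hypothetical, not from the PR):

```rust
struct Transfer {
    amount: u64,
}

impl Transfer {
    // Writes the instruction data into `buf` and returns a slice whose
    // lifetime is tied to the caller-owned buffer, not to this call frame,
    // so no unsafe code is needed and no freed memory can be observed.
    fn write_instruction_data<'b>(&self, buf: &'b mut [u8; 9]) -> &'b [u8] {
        buf[0] = 3; // discriminator
        buf[1..9].copy_from_slice(&self.amount.to_le_bytes());
        &buf[..]
    }
}

fn main() {
    let transfer = Transfer { amount: 5 };
    let mut buf = [0u8; 9];
    let data = transfer.write_instruction_data(&mut buf);
    assert_eq!(data, &[3, 5, 0, 0, 0, 0, 0, 0, 0]);
}
```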

Note that the existing code is correct, because the reference in the data field of an Instruction instance outlives that Instruction instance:

        // Instruction data layout:
        // -  [0]: instruction discriminator (1 byte, u8)
        // -  [1..9]: amount (8 bytes, u64)
        let mut instruction_data = [UNINIT_BYTE; 9];

        // Set discriminator as u8 at offset [0]
        write_bytes(&mut instruction_data, &[3]);
        // Set amount as u64 at offset [1..9]
        write_bytes(&mut instruction_data[1..9], &self.amount.to_le_bytes());

        // ### Instruction instance is constructed here.
        let instruction = Instruction {
            program_id: &crate::ID,
            accounts: &account_metas,
            data: unsafe { from_raw_parts(instruction_data.as_ptr() as _, 9) },
        };

        invoke_signed(&instruction, &[self.from, self.to, self.authority], signers)
        // ### Instruction instance is destroyed here.
        // ### Array pointed to by `instruction_data` is freed here, when the function stack is unwound.
    }

Depending on what you are trying to do, one possible way to express your intent might be to return a Vec<u8> or a Box<[u8]>.
This would correctly transfer ownership of the allocated buffer to the caller.
It might be possible to optimize the memory allocation to avoid too much work.
But you would need a dynamic allocation, and I do not know if the compiler is smart enough to avoid it, as a dynamic allocation is a call to the allocator, which is a side effect.

Another option would be to actually store the buffer that holds the data inside the self object.
This way, this interface becomes valid.
And you can even do stack allocation in this case.

Right now, Transfer holds a "parsed" representation of the call. One approach is to introduce an intermediate step between the "parsed" representation and the invoke()/invoke_signed() calls.
You could call it TransferPacked or TransferSerialized, or perhaps rename the current Transfer to TransferBuilder and use Transfer as the name of the serialized representation.

Now, TransferSerialized holds an array of bytes that encode the instruction, and invoke()/invoke_signed() just use these bytes.

The advantage is that it is up to the caller to decide where TransferSerialized is allocated. And since it is a concrete type, the compiler knows its size as well, so you can easily allocate it on the stack.

Something like this:

pub struct Transfer<'a> {
    /// Sender account.
    pub from: &'a AccountInfo,
    /// Recipient account.
    pub to: &'a AccountInfo,
    /// Authority account.
    pub authority: &'a AccountInfo,
    /// Amount of microtokens to transfer.
    pub amount: u64,
}

impl<'a> Transfer<'a> {
    pub fn serialize(self) -> TransferSerialized<'a> {
        self.into()
    }
}

pub struct TransferSerialized<'a> {
    from: &'a AccountInfo,
    to: &'a AccountInfo,
    authority: &'a AccountInfo,
    data: [u8; 9],
}

impl<'a> TransferSerialized<'a> {
    pub fn data(&self) -> &[u8; 9] {
        &self.data
    }

    #[inline(always)]
    pub fn invoke(&self) -> ProgramResult {
        self.invoke_signed(&[])
    }

    pub fn invoke_signed(&self, signers: &[Signer]) -> ProgramResult {
        let Self {
            from,
            to,
            authority,
            data,
        } = self;

        let accounts: [AccountMeta; 3] = [
            AccountMeta::writable(from.key()),
            AccountMeta::writable(to.key()),
            AccountMeta::readonly_signer(authority.key()),
        ];

        let instruction = Instruction {
            program_id: &crate::ID,
            accounts: &accounts,
            data,
        };

        invoke_signed(&instruction, &[*from, *to, *authority], signers)
    }
}

impl<'a> From<Transfer<'a>> for TransferSerialized<'a> {
    fn from(
        Transfer {
            from,
            to,
            authority,
            amount,
    }: Transfer<'a>,
    ) -> Self {
        // Instruction data layout:
        // -  [0]: instruction discriminator (1 byte, u8)
        // -  [1..9]: amount (8 bytes, u64)
        let mut data = [UNINIT_BYTE; 9];

        // Set discriminator as u8 at offset [0]
        write_bytes(&mut data, &[3]);
        // Set amount as u64 at offset [1..9]
        write_bytes(&mut data[1..9], &amount.to_le_bytes());

        // TODO: Replace with `array_assume_init()` when it is stabilized:
        // https://github.com/rust-lang/rust/issues/96097
        //
        // SAFETY: We have initialized all the bytes above.
        let data = unsafe { mem::transmute::<_, [u8; 9]>(data) };

        Self {
            from,
            to,
            authority,
            data,
        }
    }
}

You can also compute the accounts value and store it in the TransferSerialized instance.
But it is unclear to me, to be honest, why you want to expose the data field in the first place.
Depending on that, it may or may not make sense to store the accounts value.

Footnotes

  [1]: I'm not sure if it can mean anything else at all. But I could be missing some cases.
