Commit 7d00568

transfer-lamports: Improve asm performance, update table (#12)
#### Problem

The assembly version of transfer-lamports does some redundant work on certain registers. Also, the tables could use more info.

#### Summary of changes

Optimize the assembly version a bit further, and add some information about the relative performance of different implementations.

1 parent cee1e4c commit 7d00568

2 files changed: 17 additions & 12 deletions


README.md

Lines changed: 13 additions & 8 deletions
@@ -165,20 +165,25 @@ the amount given by a little-endian u64 in instruction data.
 | Rust | 459 |
 | Zig | 44 |
 | C | 104 |
-| Assembly | 31 |
+| Assembly | 30 |
 | Rust (pinocchio) | 32 |
 
 This one starts to get interesting since it requires parsing the instruction
 input. Since the assembly version knows exactly where to find everything, it can
-be hyper-optimized.
+be hyper-optimized. The pinocchio version performs very closely to the assembly
+implementation!
 
 * CPI: allocates a PDA given by the seed "You pass butter" and a bump seed in
 the instruction data. This requires a call to `create_program_address` to check
 the address and `invoke_signed` to CPI to the system program.
 
-| Language | CU Usage |
-| --- | --- |
-| Rust | 3662 |
-| Zig | 2825 |
-| C | 3122 |
-| Rust (pinocchio) | 2816 |
+| Language | CU Usage | CU Usage (minus syscalls) |
+| --- | --- | --- |
+| Rust | 3662 | 1162 |
+| Zig | 2825 | 325 |
+| C | 3122 | 622 |
+| Rust (pinocchio) | 2816 | 316 |
+
+Note: `create_program_address` consumes 1500 CUs, and `invoke` consumes 1000, so
+we can subtract 2500 CUs from each program to see the actual cost of the program
+logic.
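
The new "CU Usage (minus syscalls)" column follows directly from the note in the README change: each total has the two quoted syscall costs subtracted. A minimal Rust sketch of that arithmetic, assuming the 1500 and 1000 CU figures from the note are fixed; the constant and function names are hypothetical and do not come from the repository:

```rust
/// Syscall costs as quoted in the README note (assumed constant for this sketch).
const CREATE_PROGRAM_ADDRESS_CU: u64 = 1500;
const INVOKE_CU: u64 = 1000;

/// Returns the compute units left after removing the two syscall costs,
/// or `None` if the total is smaller than the combined syscall cost.
fn cu_minus_syscalls(total_cu: u64) -> Option<u64> {
    total_cu.checked_sub(CREATE_PROGRAM_ADDRESS_CU + INVOKE_CU)
}
```

Applied to the four totals in the table, this reproduces the new column: 3662 gives 1162, 2825 gives 325, 3122 gives 622, and 2816 gives 316.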

transfer-lamports/asm/main.s

Lines changed: 4 additions & 4 deletions
@@ -12,10 +12,9 @@ entrypoint:
 add64 r4, 8 + 8 + 32 + 32 + 8 + 8 + 10240 + 8 # calculate end of account data
 add64 r4, r3
 mov64 r5, r4 # check how much padding we need to add
-and64 r5, -8 # clear low bits
+and64 r4, -8 # clear low bits
 jeq r5, r4, 1 # no low bits set, jump ahead
 add64 r4, 8 # add 8 for truncation if needed
-and64 r4, -8 # clear low bits
 
 ldxb r5, [r4 + 0] # get second account
 jne r5, 0xff, error # we don't allow duplicates
@@ -25,11 +24,12 @@ entrypoint:
 add64 r7, 8 + 32 + 32 + 8 + 8 + 10240 + 8 # calculate end of account data
 add64 r7, r6
 mov64 r8, r7 # check how much padding we need to add
-and64 r8, -8 # clear low bits
+and64 r7, -8 # clear low bits
 jeq r8, r7, 1 # no low bits set, jump ahead
 add64 r7, 8 # add 8 for truncation if low bits are set
+
 ldxdw r8, [r7 + 0] # get instruction data size
-jne r8, 0x08, error # need 8 bytes of instruction data
+jne r8, 8, error # need 8 bytes of instruction data
 ldxdw r8, [r7 + 8] # get instruction data as little-endian u64
 
 sub64 r2, r8 # subtract lamports
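
The first hunk of `main.s` rounds the end-of-account-data offset up to the next multiple of 8, and the change saves one instruction by masking `r4` itself instead of the copy in `r5`. A rough Rust model of the old and new sequences, assuming registers behave as plain `u64` values and that `and64 rX, -8` is equivalent to `& !7`; the function names are hypothetical and this is an illustration of the equivalence, not code from the repository:

```rust
/// Old sequence: mask the copy, compare, conditionally add 8, then mask again.
fn align_up_old(end: u64) -> u64 {
    let mut r4 = end;
    let mut r5 = r4; // mov64 r5, r4
    r5 &= !7; // and64 r5, -8
    if r5 != r4 {
        // jeq r5, r4, 1 skips this add when no low bits were set
        r4 = r4.wrapping_add(8); // add64 r4, 8
    }
    r4 &= !7; // and64 r4, -8 (the instruction removed by this commit)
    r4
}

/// New sequence: keep the original in r5 and mask r4 directly, so no second mask is needed.
fn align_up_new(end: u64) -> u64 {
    let mut r4 = end;
    let r5 = r4; // mov64 r5, r4
    r4 &= !7; // and64 r4, -8
    if r5 != r4 {
        // jeq r5, r4, 1 skips this add when no low bits were set
        r4 = r4.wrapping_add(8); // add64 r4, 8
    }
    r4
}
```

Both functions return the same value as `(end + 7) & !7` for inputs that do not overflow, which is why the trailing `and64` could be dropped once the mask was applied to `r4` before the comparison.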
