Skip to content

Add the transaction-level account index of erroring programs to TransactionError::InstructionError #6083

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

steveluscher
Copy link

@steveluscher steveluscher commented May 3, 2025

Problem

Consider a transaction error that originates from a cross-program invocation (ie. an inner instruction). Currently TransactionError returns you the index of the outer instruction, which in no way helps you to correlate the content of the error message with the actual program from whence it came.

Consider this failed transaction. The index of the failure points to instruction #4 (ie. the instruction at index 3).

{
    "value": [{
        "confirmationStatus": "finalized",
        "confirmations": null,
        "err": {
            "InstructionError": [ 3, { "Custom": 6038 } ]
        },
        "slot": 323282082,
        "status": {
            "Err": {
                "InstructionError": [ 3, { "Custom": 6038 } ]
            }
        }
    }]
}

This is a bit disingenuous because the error didn't actually emanate from instruction #4 (program JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4), it came from instruction #4.5 (program whirLbMiicVdio4qvUfM5KAg6Ct8VwpYzGff3uctyCc). If you tried to decode this as a program error of JUP6Lk...

if (isProgramError(
    error,
    transactionMessage,
    address('JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4'),
    TickArraySequenceInvalidIndex.code,
)) {
    // ...
}

…you will get the wrong result. That method will return false instead of true because of the wrong choice of programAddress.

Given the existing situation (ie. the server does not vend the address of the program that threw the error) developers have no choice but to parse logs to try to figure out the program from which the custom error came.

See also anza-xyz/kit#149

Summary of Changes

This PR adds the transaction-level index of the program from which the error came to the TransactionError::InstructionError. This is added to storage as an Option<u32> since it will not be possible to backfill old transactions' erroring program account indices.

The account index of the program is designed to be from the perspective of the transaction and not the perspective of the instruction to make this change safe from SIMD-163.

It also adds the index of the inner instruction, in cases where the error originated from a CPI.

Until v3.0.0 of solana-transaction-error is released, you can test this PR locally by checking anza-xyz/solana-sdk#74 out locally at the path ../solana-sdk/ relative to agave.

Despite CI results in GitHub, if you link in the not-yet-released version of solana-transaction-error as described above, the tests all pass.

Notes to reviewers

You will really want to go through the ‘Commits’ tab in GitHub, because I separated out the interesting change from the ‘now I have to massage all the tests’ change.

Questions

  • Does this whole thing require a feature gate before we start including the program account index and writing it into storage?

Depends on the release of anza-xyz/solana-sdk#74.
Implements #5152.
Addresses anza-xyz/kit#149.

Copy link

mergify bot commented May 3, 2025

If this PR represents a change to the public RPC API:

  1. Make sure it includes a complementary update to rpc-client/ (example)
  2. Open a follow-up PR to update the JavaScript client @solana/kit (example)

Thank you for keeping the RPC clients in sync with the server API @steveluscher.

@@ -645,6 +645,106 @@ zstd = "0.13.3"
opt-level = 3

[patch.crates-io]
# The following entries are auto-generated by /bin/bash
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore this. This will disappear when anza-xyz/solana-sdk#74 is released.

let invalid_instruction_data_error = TransactionError::InstructionError(
index,
InstructionError::InvalidInstructionData,
Some(instruction.program_id_index),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the SVMInstruction already knows the transaction-wide index of its program, but please double check me on this. cc/ @buffalojoec

https://github.com/steveluscher/agave/blob/f7a6f80ff632216ceb82bcbd95a9a8c1d9855987/svm-transaction/src/instruction.rs#L8-L9

Comment on lines 58 to 87
macro_rules! test {
($instructions:expr, $expected_result:expr $(,)?) => {
for feature_set in [FeatureSet::default(), FeatureSet::all_enabled()] {
test!($instructions, $expected_result, &feature_set);
}
};
($instructions:expr, $expected_result:expr, $feature_set:expr $(,)?) => {
__test_inner!($instructions, $feature_set, |result| {
assert_eq!(result, $expected_result);
});
};
($instructions:expr, $expected_result:pat $(,)?) => {
for feature_set in [FeatureSet::default(), FeatureSet::all_enabled()] {
test!($instructions, $expected_result, &feature_set);
}
};
($instructions:expr, $expected_result:pat if $guard:expr $(,)?) => {
for feature_set in [FeatureSet::default(), FeatureSet::all_enabled()] {
test!($instructions, $expected_result if $guard, &feature_set);
}
};
($instructions:expr, $expected_result:pat, $feature_set:expr $(,)?) => {
__test_inner!($instructions, $feature_set, |result| {
assert_matches!(result, $expected_result);
});
};
($instructions:expr, $expected_result:pat if $guard:expr, $feature_set:expr $(,)?) => {
__test_inner!($instructions, $feature_set, |result| {
assert_matches!(result, $expected_result if $guard);
});
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just me expanding these macros to support patterns.

In general, throughout these tests, you'll see me use assert_matches and a pattern for the program account index because:

  1. It ensures that the tests will not break if the index of the program changes
  2. The absolute index of the responsible program is tested in transaction_processor.rs so we don't need to keep testing it over and over everywhere else.

@@ -129,6 +129,7 @@ impl RpcSender for MockSender {
Err(TransactionError::InstructionError(
0,
InstructionError::UninitializedAccount,
Some(42), // Mock responsible program account index.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total missed opportunity to Sum 41.

image

Comment on lines 89 to 115
let transaction_context: &TransactionContext = invoke_context.transaction_context;
let responsible_program_account_index = transaction_context
// By definition the last instruction (outer or inner) in the trace before the trace
// stopped being appended to is the one that encountered an error.
.get_instruction_trace_length()
.checked_sub(1)
.and_then(|index_in_trace| {
transaction_context
.get_instruction_context_at_index_in_trace(index_in_trace)
.ok()
})
// The last program address in the instruction is that of the program being called.
.and_then(|ctx| ctx.get_last_program_key(transaction_context).ok())
// The order of program accounts in the `TransactionContext` has no relation to the
// order of the program accounts in the original message. It's the index in the
// message that we need.
.and_then(|errored_program_pubkey| {
message
.account_keys()
.iter()
.position(|message_pubkey| message_pubkey.eq(errored_program_pubkey))
});
TransactionError::InstructionError(
top_level_instruction_index as u8,
err,
responsible_program_account_index.map(|i| i as u8),
)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the code that actually captures the account index of the erroring program, and only if there appeared an error.

The key insight is here:

By definition the last instruction (outer or inner) in the trace before the trace stopped being appended to is the one that encountered an error.

Comment on lines +1316 to +1328
// This mock program takes in a list of programs and two bytes of data:
// (1) the index of the program that should throw an error
// (2) the index of the program being called
//
// The programs get executed - two at each CPI call depth - like this:
//
// Program 0
// -- Program 1
// -- Program 2
// ---- Program 3
// ---- Program 4
// ------ Program 5
// ------ Program 6
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the meat of the test. Essentially I created a program that calls itself over and over as diagrammed in the comment. You can pass it the index at which you want it to throw an error. The test then makes sure that the TransactionError::InstructionError indicates that the error was thrown from the program whose account index is that one exactly.

Comment on lines +1422 to +1424
accounts: (0..PROGRAM_ADDRESSES.len())
.map(|index| (PROGRAM_ADDRESSES[index], mock_program_account.clone()))
.collect_vec(),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how we know that the program accounts are in the order we expect them to be from the perspective of the transaction.

&[Instruction::new_with_bytes(
PROGRAM_ADDRESSES[base_program_index],
&[
index_of_program_that_should_throw_exception,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how you tell the program ‘I want you to throw an error when you get to the program with this index.’

@@ -70,6 +70,7 @@ message InstructionError {
uint32 index = 1;
InstructionErrorType error = 2;
CustomError custom = 3;
optional uint32 responsible_program_account_index = 4;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this require a feature gate before we start writing to it?

@steveluscher steveluscher requested a review from apfitzge May 3, 2025 01:07
@steveluscher
Copy link
Author

@Lichtso I made this the account index of the program from the perspective of the transaction and not the perspective of the instruction to make it safe from SIMD-163.

{index_of_program_that_should_throw_exception}."
);
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably write two additional tests:

  1. Test that the kind of instruction error that would blow up without a program at all (eg. NotEnoughAccountKeys or whatever) returns a None for the program account index.
  2. Test that it still gets the right index when the program is in an address lookup table

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds exactly right to me, along with what I suggested in another comment, about multiple top-level instructions

@jstarry
Copy link

jstarry commented May 5, 2025

Given the existing situation (ie. the server does not vend the address of the program that threw the error) developers have no choice but to parse logs to try to figure out the program from which the custom error came.

Can't they look at the inner ix metadata to figure this out?

@steveluscher
Copy link
Author

Can't they look at the inner ix metadata to figure this out?

Yes! But not without a round trip to getTransaction() to fetch that metadata, which is definitely not what we want to do if we want to keep apps performant and reliable.

@jstarry
Copy link

jstarry commented May 5, 2025

Yes! But not without a round trip to getTransaction() to fetch that metadata, which is definitely not what we want to do if we want to keep apps performant and reliable.

Well they had to get the status anyways, right?

@steveluscher
Copy link
Author

Well they had to get the status anyways, right?

Two cases where it's not correct to also fetch the entire transaction:

  1. You're fetching getSignatureStatuses and you don't know that the status is an error yet.
  2. You're running one of several default transaction confirmation routines.

Speculatively fetching getTransaction() in these cases would be wasteful and would increase global load on RPCs if all apps did it.

@t-nelson
Copy link

t-nelson commented May 5, 2025

is this not breaking several public interfaces?

@steveluscher
Copy link
Author

is this not breaking several public interfaces?

  • RPC API: Anything fetching a transaction status where that status contains an err property currently expects a tuple of (instructionIndex: number, err: InstructionError). This change would add a third element to the tuple. Existing clients would continue to expect 2, use 2, and ignore the third. Non breaking.
  • Code: After this change, any type position that uses the old type will generate a type error at that position (eg. TypeScript), but will not produce a runtime error. This is because the change is additive and does not modify the types of the values in position 0 and 1 of the tuple.

In any case, we'll go to 3.0.0 of solana-transaction-error to make this clear via semver.

@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch from f7a6f80 to eebeab3 Compare May 6, 2025 02:21
@steveluscher
Copy link
Author

Updated the PR to include the index of the inner instruction, if applicable.

…oring program is added to `TransactionError::InstructionError`, patch up all of the tests
@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch from eebeab3 to 0f963cf Compare May 6, 2025 16:55
@joncinque joncinque self-requested a review May 6, 2025 20:11
Comment on lines 70 to +74
uint32 index = 1;
InstructionErrorType error = 2;
CustomError custom = 3;
optional uint32 inner_instruction_index = 4;
optional uint32 responsible_program_account_index = 5;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steviez is there something fancy I should be doing here? Creating an option (1 byte) and a uint32 (4 bytes) just to store one byte of data sort of stinks. Could I instead hijack the remaining 24 bytes of uint32 index = 1 to do this?

xxxxxxxx abbbbbbb bcdddddd dd------

x - existing 8 bytes for `index`
a - option flag for `inner_instruction_index`
b - new 8 bytes for `inner_instruction_index`
c - option flag for `responsible_program_account_index`
d - new 8 bytes for `responsible_program_account_index`

…then just figure it all out in convert.rs?

Copy link

@joncinque joncinque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Mostly small questions and comments.

I'll defer to Joe about the SVM message bit and to Steve about the proto bit since they'll know better. For what it's worth, the current change seems reasonable to me in both cases.

@@ -29,6 +29,7 @@ crate-type = ["lib"]
name = "solana_compute_budget_instruction"

[dev-dependencies]
assert_matches = { workspace = true }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this change intended?

@@ -35,6 +35,7 @@ solana-instruction = { workspace = true }
solana-message = { workspace = true }
solana-pubkey = { workspace = true }
solana-rpc-client-api = { workspace = true }
solana-sdk-ids = { workspace = true }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this change intended?

@@ -12,6 +12,7 @@ edition = { workspace = true }
[dependencies]
ahash = { workspace = true }
itertools = { workspace = true }
lazy_static = { workspace = true }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there's work to remove this everywhere #6049 in favor of LazyLock https://doc.rust-lang.org/std/sync/struct.LazyLock.html, so we probably shouldn't add it

Comment on lines +92 to +100
// By definition the last instruction (outer or inner) in the trace before the trace
// stopped being appended to is the one that encountered an error.
.get_instruction_trace_length()
.checked_sub(1)
.and_then(|index_in_trace| {
transaction_context
.get_instruction_context_at_index_in_trace(index_in_trace)
.ok()
})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my lack of understanding, but would it be simpler to call get_current_instruction_context? On the flip side, it looks like it boils down to almost exactly the same code

Comment on lines +112 to +119
enum InnerInstructionIndexSearchState {
SearchingForTopLevelInstruction(
usize, // Index of top-level instruction being sought next
),
InTopLevelInstruction(
Option<u8>, // Inner instruction index
),
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand the point of the first variant of this enum. It's only ever used for 0, and never incremented, which makes me think there's either a bug, or it doesn't need an index, and we can just initialize to InTopLevelInstruction(None).

It might be simpler and more legible to just have an Option<u8> with the current inner instruction index, which is reset to None whenever we get to a top-level ix, and otherwise incremented. What do you think?

Would it be worth also adding a test with multiple top-level instructions to make sure this logic works?

{index_of_program_that_should_throw_exception}."
);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds exactly right to me, along with what I suggested in another comment, about multiple top-level instructions

@t-nelson
Copy link

t-nelson commented May 8, 2025

is this not breaking several public interfaces?

* **RPC API**: Anything fetching a transaction status where that status contains an `err` property currently expects a tuple of `(instructionIndex: number, err: InstructionError)`. This change would add a third element to the tuple. Existing clients would continue to expect 2, use 2, and ignore the third. Non breaking.

* **Code**: After this change, any type position that uses the old type will generate a _type_ error at that position (eg. [TypeScript](https://www.typescriptlang.org/play/?#code/C4TwDgpgBAkgdgZ2AJwK4GNgEsD2cCiyyOyUAvFAiALYBGOANgNwBQLokUAQhAGYnQKAbTio6EZABpYiFBmx5CxZAF1WHaAEFewCeSgixtCdPhI0mXASIlpo8atYteqOJbxR+OABQA3AIYMqBAAXNx8AgCUUADebAC+bF7eQgCM0gDKNPQM3tH+CDLm8lZKJCqRrAD0VVAAAsAIALQQAB6QmC02pADMUBAMENQQcI1QWLIQ-gAmUDi8UABMzjg+aZnZjHlQBUVy7tbK0osVTEA)), but will not produce a _runtime_ error. This is because the change is additive and does not modify the types of the values in position 0 and 1 of the tuple.

In any case, we'll go to 3.0.0 of solana-transaction-error to make this clear via semver.

tbh i care zero about either of those (they're leaving validator/monorepo soon enough). this breaks the rust sdk and all binary consumers. we can't "just bump major" because the validator is imposing the change on all consumers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants