Add the inner instruction index and address of erroring programs to TransactionError::InstructionError #6083

@steveluscher steveluscher commented May 3, 2025

Problem

Consider a transaction error that originates from a cross-program invocation (ie. an inner instruction). Currently TransactionError returns you the index of the outer instruction, which in no way helps you to correlate the content of the error message with the actual program from whence it came.

Consider this failed transaction. The index of the failure points to instruction #4 (ie. the instruction at index 3).

{
    "value": [{
        "confirmationStatus": "finalized",
        "confirmations": null,
        "err": {
            "InstructionError": [ 3, { "Custom": 6038 } ]
        },
        "slot": 323282082,
        "status": {
            "Err": {
                "InstructionError": [ 3, { "Custom": 6038 } ]
            }
        }
    }]
}

This is a bit disingenuous because the error didn't actually emanate from instruction #4 (program JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4), it came from instruction #4.5 (program whirLbMiicVdio4qvUfM5KAg6Ct8VwpYzGff3uctyCc). If you tried to decode this as a program error of JUP6Lk...

if (isProgramError(
    error,
    transactionMessage,
    address('JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4'),
    TickArraySequenceInvalidIndex.code,
)) {
    // ...
}

…you will get the wrong result. That method will return false instead of true because of the wrong choice of programAddress.

Given the existing situation (ie. the server does not vend the address of the program that threw the error) developers have no choice but to parse logs to try to figure out the program from which the custom error came.
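A minimal sketch of the log-parsing workaround mentioned above. It assumes that the runtime emits failure lines of the form `Program <address> failed: custom program error: 0x<hex>` and that the innermost failing program logs first; both are assumptions rather than something this PR specifies, and `failing_program_from_logs` is a hypothetical helper name.

```rust
/// Scans transaction logs for the first line of the assumed form
/// `Program <address> failed: custom program error: 0x<hex>` and returns
/// the program address together with the decoded custom error code.
fn failing_program_from_logs(logs: &[&str]) -> Option<(String, u32)> {
    logs.iter().find_map(|line| {
        let rest = line.strip_prefix("Program ")?;
        let (address, hex_code) = rest.split_once(" failed: custom program error: 0x")?;
        let code = u32::from_str_radix(hex_code.trim(), 16).ok()?;
        Some((address.to_string(), code))
    })
}
```

This kind of string matching is brittle, which is the motivation for returning the responsible program in the structured error instead.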

See also anza-xyz/kit#149

Summary of Changes

This PR adds:

  • the inner instruction index of the erroring program.
  • the transaction-level index of the program from which the error came to the TransactionError::InstructionError.

The account index of the program is designed to be from the perspective of the transaction and not the perspective of the instruction to make this change safe from SIMD-163.

Until v3.0.0 of solana-transaction-error is released, you can test this PR locally by checking anza-xyz/solana-sdk#74 out locally at the path ../solana-sdk/ relative to agave.

Despite CI results in GitHub, if you link in the not-yet-released version of solana-transaction-error as described above, the tests all pass.

Notes to reviewers

You will really want to go through the ‘Commits’ tab in GitHub, because I separated out the interesting change from the ‘now I have to massage all the tests’ change.

Questions

  • Does this whole thing require a feature gate before we start including the program account index and writing it into storage?

Depends on the release of anza-xyz/solana-sdk#74.
Implements #5152.
Addresses anza-xyz/kit#149.

mergify bot commented May 3, 2025

If this PR represents a change to the public RPC API:

  1. Make sure it includes a complementary update to rpc-client/ (example)
  2. Open a follow-up PR to update the JavaScript client @solana/kit (example)

Thank you for keeping the RPC clients in sync with the server API @steveluscher.

@@ -645,6 +645,106 @@ zstd = "0.13.3"
opt-level = 3

[patch.crates-io]
# The following entries are auto-generated by /bin/bash
Author

Ignore this. This will disappear when anza-xyz/solana-sdk#74 is released.

let invalid_instruction_data_error = TransactionError::InstructionError(
index,
InstructionError::InvalidInstructionData,
Some(instruction.program_id_index),
Author

It seems that the SVMInstruction already knows the transaction-wide index of its program, but please double check me on this. cc/ @buffalojoec

https://github.com/steveluscher/agave/blob/f7a6f80ff632216ceb82bcbd95a9a8c1d9855987/svm-transaction/src/instruction.rs#L8-L9

@buffalojoec buffalojoec May 17, 2025

Comment on lines 58 to 87
macro_rules! test {
    ($instructions:expr, $expected_result:expr $(,)?) => {
        for feature_set in [FeatureSet::default(), FeatureSet::all_enabled()] {
            test!($instructions, $expected_result, &feature_set);
        }
    };
    ($instructions:expr, $expected_result:expr, $feature_set:expr $(,)?) => {
        __test_inner!($instructions, $feature_set, |result| {
            assert_eq!(result, $expected_result);
        });
    };
    ($instructions:expr, $expected_result:pat $(,)?) => {
        for feature_set in [FeatureSet::default(), FeatureSet::all_enabled()] {
            test!($instructions, $expected_result, &feature_set);
        }
    };
    ($instructions:expr, $expected_result:pat if $guard:expr $(,)?) => {
        for feature_set in [FeatureSet::default(), FeatureSet::all_enabled()] {
            test!($instructions, $expected_result if $guard, &feature_set);
        }
    };
    ($instructions:expr, $expected_result:pat, $feature_set:expr $(,)?) => {
        __test_inner!($instructions, $feature_set, |result| {
            assert_matches!(result, $expected_result);
        });
    };
    ($instructions:expr, $expected_result:pat if $guard:expr, $feature_set:expr $(,)?) => {
        __test_inner!($instructions, $feature_set, |result| {
            assert_matches!(result, $expected_result if $guard);
        });
Author

This is just me expanding these macros to support patterns.

In general, throughout these tests, you'll see me use assert_matches and a pattern for the program account index because:

  1. It ensures that the tests will not break if the index of the program changes
  2. The absolute index of the responsible program is tested in transaction_processor.rs so we don't need to keep testing it over and over everywhere else.
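As a rough illustration of the pattern-based assertions described above, the sketch below uses the standard `matches!` macro with a wildcard in the program-index position. The two enums are simplified stand-ins for the real SDK types, and `is_custom_error_at` is a hypothetical helper, not part of this PR.

```rust
// Simplified stand-ins for the SDK types (assumptions for illustration only).
#[derive(Debug)]
enum InstructionError {
    Custom(u32),
}

#[derive(Debug)]
enum TransactionError {
    // (outer instruction index, error, responsible program account index)
    InstructionError(u8, InstructionError, Option<u8>),
}

/// Returns true when the error is the expected custom error at the expected
/// outer instruction index, regardless of which account index was blamed.
fn is_custom_error_at(error: &TransactionError, index: u8, code: u32) -> bool {
    matches!(
        error,
        // `Some(_)` accepts any program account index, so the check does not
        // break when the position of the program in the message changes.
        TransactionError::InstructionError(i, InstructionError::Custom(c), Some(_))
            if *i == index && *c == code
    )
}
```

The exact index is then asserted in one dedicated test, while everything else only checks that some index is present.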

@@ -129,6 +129,7 @@ impl RpcSender for MockSender {
Err(TransactionError::InstructionError(
0,
InstructionError::UninitializedAccount,
Some(42), // Mock responsible program account index.
Author

Total missed opportunity to Sum 41.

Comment on lines 89 to 106
let transaction_context: &TransactionContext = invoke_context.transaction_context;
let responsible_program_account_index = transaction_context
    // By definition the last instruction (outer or inner) in the trace before the trace
    // stopped being appended to is the one that encountered an error.
    .get_instruction_trace_length()
    .checked_sub(1)
    .and_then(|index_in_trace| {
        transaction_context
            .get_instruction_context_at_index_in_trace(index_in_trace)
            .ok()
    })
    // The last program address in the instruction is that of the program being called.
    .and_then(|ctx| ctx.get_last_program_key(transaction_context).ok())
    // The order of program accounts in the `TransactionContext` has no relation to the
    // order of the program accounts in the original message. It's the index in the
    // message that we need.
    .and_then(|errored_program_pubkey| {
        message
            .account_keys()
            .iter()
            .position(|message_pubkey| message_pubkey.eq(errored_program_pubkey))
    });
TransactionError::InstructionError(
    top_level_instruction_index as u8,
    err,
    responsible_program_account_index.map(|i| i as u8),
)
Author

This is the code that actually captures the account index of the erroring program; it runs only when an error has occurred.

The key insight is here:

By definition the last instruction (outer or inner) in the trace before the trace stopped being appended to is the one that encountered an error.
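The lookup can be sketched in isolation as follows. This is a simplified model, not the PR's code: `Pubkey` is a plain byte array, `TraceEntry` is a hypothetical stand-in for an instruction context, and the conversion uses `u8::try_from` where the PR uses an `as u8` cast.

```rust
use std::convert::TryFrom;

// Stand-in for a real public key type (assumption for illustration).
type Pubkey = [u8; 32];

// Hypothetical, simplified trace entry: only the invoked program's key.
struct TraceEntry {
    program_key: Pubkey,
}

/// Takes the last entry of the instruction trace (assumed to be the one that
/// errored) and returns the position of its program in the message's account
/// keys, or `None` if the trace is empty or the key is not in the message.
fn responsible_program_account_index(
    trace: &[TraceEntry],
    message_account_keys: &[Pubkey],
) -> Option<u8> {
    let errored = trace.last()?;
    let position = message_account_keys
        .iter()
        .position(|key| *key == errored.program_key)?;
    u8::try_from(position).ok()
}
```

Searching the message's account keys, rather than reusing an index from the execution context, is what makes the result transaction-relative.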

Comment on lines +1316 to +1392
// This mock program takes in a list of programs and two bytes of data:
// (1) the index of the program that should throw an error
// (2) the index of the program being called
//
// The programs get executed - two at each CPI call depth - like this:
//
// Program 0
// -- Program 1
// -- Program 2
// ---- Program 3
// ---- Program 4
// ------ Program 5
// ------ Program 6
Author

This is the meat of the test. Essentially I created a program that calls itself over and over as diagrammed in the comment. You can pass it the index at which you want it to throw an error. The test then makes sure that the TransactionError::InstructionError indicates that the error was thrown from the program whose account index is that one exactly.
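A toy model of that call tree, to show why the last entry in the trace identifies the throwing program. It is only a sketch under simplifying assumptions: programs are plain `u8` indexes, even-numbered programs call the next two, odd-numbered ones are leaves, and `invoke` is a hypothetical function rather than the mock program from the PR.

```rust
const PROGRAM_COUNT: u8 = 7;

/// Simulates the diagrammed tree (0 -> 1, 2; 2 -> 3, 4; 4 -> 5, 6), recording
/// every invocation in `trace` and stopping at the first error.
fn invoke(program: u8, throw_at: u8, trace: &mut Vec<u8>) -> Result<(), u8> {
    trace.push(program);
    if program == throw_at {
        return Err(program);
    }
    // Even-numbered programs CPI into the next two programs, if they exist.
    if program % 2 == 0 && program + 2 < PROGRAM_COUNT {
        invoke(program + 1, throw_at, trace)?;
        invoke(program + 2, throw_at, trace)?;
    }
    Ok(())
}
```

Because the `?` operator unwinds immediately, nothing is appended to the trace after the failing call, so `trace.last()` is always the program that threw.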

Comment on lines +1422 to +1531
accounts: (0..PROGRAM_ADDRESSES.len())
.map(|index| (PROGRAM_ADDRESSES[index], mock_program_account.clone()))
.collect_vec(),
Author

This is how we know that the program accounts are in the order we expect them to be from the perspective of the transaction.

&[Instruction::new_with_bytes(
PROGRAM_ADDRESSES[base_program_index],
&[
index_of_program_that_should_throw_exception,
Author

This is how you tell the program ‘I want you to throw an error when you get to the program with this index.’

@@ -70,6 +70,7 @@ message InstructionError {
uint32 index = 1;
InstructionErrorType error = 2;
CustomError custom = 3;
optional uint32 responsible_program_account_index = 4;
Author

Does this require a feature gate before we start writing to it?

@steveluscher steveluscher requested a review from apfitzge May 3, 2025 01:07
@steveluscher
Author

@Lichtso I made this the account index of the program from the perspective of the transaction and not the perspective of the instruction to make it safe from SIMD-163.

{index_of_program_that_should_throw_exception}."
);
}

Author

I should probably write two additional tests:

  1. Test that the kind of instruction error that would blow up without a program at all (eg. NotEnoughAccountKeys or whatever) returns a None for the program account index.
  2. Test that it still gets the right index when the program is in an address lookup table

That sounds exactly right to me, along with what I suggested in another comment, about multiple top-level instructions

@jstarry
jstarry commented May 5, 2025

Given the existing situation (ie. the server does not vend the address of the program that threw the error) developers have no choice but to parse logs to try to figure out the program from which the custom error came.

Can't they look at the inner ix metadata to figure this out?

@steveluscher
Author

Can't they look at the inner ix metadata to figure this out?

Yes! But not without a round trip to getTransaction() to fetch that metadata, which is definitely not what we want to do if we want to keep apps performant and reliable.

@jstarry
jstarry commented May 5, 2025

Yes! But not without a round trip to getTransaction() to fetch that metadata, which is definitely not what we want to do if we want to keep apps performant and reliable.

Well they had to get the status anyways, right?

@steveluscher
Author

Well they had to get the status anyways, right?

Two cases where it's not correct to also fetch the entire transaction:

  1. You're fetching getSignatureStatuses and you don't know that the status is an error yet.
  2. You're running one of several default transaction confirmation routines.

Speculatively fetching getTransaction() in these cases would be wasteful and would increase global load on RPCs if all apps did it.

@t-nelson
t-nelson commented May 5, 2025

is this not breaking several public interfaces?

@steveluscher
Author

is this not breaking several public interfaces?

  • RPC API: Anything fetching a transaction status where that status contains an err property currently expects a tuple of (instructionIndex: number, err: InstructionError). This change would add a third element to the tuple. Existing clients would continue to expect 2, use 2, and ignore the third. Non breaking.
  • Code: After this change, any type position that uses the old type will generate a type error at that position (eg. TypeScript), but will not produce a runtime error. This is because the change is additive and does not modify the types of the values in position 0 and 1 of the tuple.

In any case, we'll go to 3.0.0 of solana-transaction-error to make this clear via semver.

@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch from f7a6f80 to eebeab3 Compare May 6, 2025 02:21
@steveluscher
Author

Updated the PR to include the index of the inner instruction, if applicable.

@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch from eebeab3 to 0f963cf Compare May 6, 2025 16:55
@joncinque joncinque self-requested a review May 6, 2025 20:11
Comment on lines 70 to 74
uint32 index = 1;
InstructionErrorType error = 2;
CustomError custom = 3;
optional uint32 inner_instruction_index = 4;
optional uint32 responsible_program_account_index = 5;
Author

@steviez is there something fancy I should be doing here? Creating an option (1 byte) and a uint32 (4 bytes) just to store one byte of data sort of stinks. Could I instead hijack the remaining 24 bits of uint32 index = 1 to do this?

xxxxxxxx abbbbbbb bcdddddd dd------

x - existing 8 bits for `index`
a - option flag for `inner_instruction_index`
b - new 8 bits for `inner_instruction_index`
c - option flag for `responsible_program_account_index`
d - new 8 bits for `responsible_program_account_index`

…then just figure it all out in convert.rs?
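One way to read that layout, sketched under a loud assumption: the leftmost group in the diagram is taken to be the least significant byte, so values already written with only `index` populated decode unchanged. The diagram does not state its bit order, and `pack` and `unpack` are hypothetical names.

```rust
/// Packs the three values into one u32 (assumed bit order, see above):
/// bits 0-7 index, bit 8 flag + bits 9-16 inner instruction index,
/// bit 17 flag + bits 18-25 program account index, bits 26-31 unused.
fn pack(index: u8, inner: Option<u8>, program: Option<u8>) -> u32 {
    let mut word = index as u32;
    if let Some(b) = inner {
        word |= (1 << 8) | ((b as u32) << 9);
    }
    if let Some(d) = program {
        word |= (1 << 17) | ((d as u32) << 18);
    }
    word
}

/// Reverses `pack`. A word that only carries `index` yields `(index, None, None)`.
fn unpack(word: u32) -> (u8, Option<u8>, Option<u8>) {
    let index = word as u8;
    let inner = (((word >> 8) & 1) == 1).then(|| (word >> 9) as u8);
    let program = (((word >> 17) & 1) == 1).then(|| (word >> 18) as u8);
    (index, inner, program)
}
```

The explicit flag bits let `Some(0)` be told apart from `None` at the cost of two bits.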

@joncinque joncinque left a comment

Great work! Mostly small questions and comments.

I'll defer to Joe about the SVM message bit and to Steve about the proto bit since they'll know better. For what it's worth, the current change seems reasonable to me in both cases.

@@ -29,6 +29,7 @@ crate-type = ["lib"]
name = "solana_compute_budget_instruction"

[dev-dependencies]
assert_matches = { workspace = true }

Was this change intended?

@@ -35,6 +35,7 @@ solana-instruction = { workspace = true }
solana-message = { workspace = true }
solana-pubkey = { workspace = true }
solana-rpc-client-api = { workspace = true }
solana-sdk-ids = { workspace = true }

Was this change intended?

svm/Cargo.toml Outdated
@@ -12,6 +12,7 @@ edition = { workspace = true }
[dependencies]
ahash = { workspace = true }
itertools = { workspace = true }
lazy_static = { workspace = true }

nit: there's work to remove this everywhere #6049 in favor of LazyLock https://doc.rust-lang.org/std/sync/struct.LazyLock.html, so we probably shouldn't add it

Comment on lines 92 to 100
    // By definition the last instruction (outer or inner) in the trace before the trace
    // stopped being appended to is the one that encountered an error.
    .get_instruction_trace_length()
    .checked_sub(1)
    .and_then(|index_in_trace| {
        transaction_context
            .get_instruction_context_at_index_in_trace(index_in_trace)
            .ok()
    })

This is my lack of understanding, but would it be simpler to call get_current_instruction_context? On the flip side, it looks like it boils down to almost exactly the same code

Comment on lines 112 to 119
enum InnerInstructionIndexSearchState {
    SearchingForTopLevelInstruction(
        usize, // Index of top-level instruction being sought next
    ),
    InTopLevelInstruction(
        Option<u8>, // Inner instruction index
    ),
}

Sorry, I don't understand the point of the first variant of this enum. It's only ever used for 0, and never incremented, which makes me think there's either a bug, or it doesn't need an index, and we can just initialize to InTopLevelInstruction(None).

It might be simpler and more legible to just have an Option<u8> with the current inner instruction index, which is reset to None whenever we get to a top-level ix, and otherwise incremented. What do you think?

Would it be worth also adding a test with multiple top-level instructions to make sure this logic works?
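The simpler alternative suggested here might look roughly like the sketch below. It assumes each trace entry exposes a nesting level, with 0 meaning top-level; the slice of levels and the name `inner_index_of_last_entry` are hypothetical stand-ins for the real trace.

```rust
/// Walks the trace in order and returns the zero-based inner instruction
/// index of the last entry, or `None` if that entry is top-level.
fn inner_index_of_last_entry(nesting_levels: &[usize]) -> Option<u8> {
    let mut current: Option<u8> = None;
    for &level in nesting_levels {
        current = if level == 0 {
            // A new top-level instruction resets the counter.
            None
        } else {
            // The first inner instruction gets 0; later ones increment.
            Some(current.map_or(0, |i| i.saturating_add(1)))
        };
    }
    current
}
```

A trace with several top-level instructions, such as levels `[0, 1, 1, 0, 1]`, exercises the reset, which is the case the suggested extra test would cover.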

@t-nelson
t-nelson commented May 8, 2025

is this not breaking several public interfaces?

* **RPC API**: Anything fetching a transaction status where that status contains an `err` property currently expects a tuple of `(instructionIndex: number, err: InstructionError)`. This change would add a third element to the tuple. Existing clients would continue to expect 2, use 2, and ignore the third. Non breaking.

* **Code**: After this change, any type position that uses the old type will generate a _type_ error at that position (eg. [TypeScript](https://www.typescriptlang.org/play/?#code/C4TwDgpgBAkgdgZ2AJwK4GNgEsD2cCiyyOyUAvFAiALYBGOANgNwBQLokUAQhAGYnQKAbTio6EZABpYiFBmx5CxZAF1WHaAEFewCeSgixtCdPhI0mXASIlpo8atYteqOJbxR+OABQA3AIYMqBAAXNx8AgCUUADebAC+bF7eQgCM0gDKNPQM3tH+CDLm8lZKJCqRrAD0VVAAAsAIALQQAB6QmC02pADMUBAMENQQcI1QWLIQ-gAmUDi8UABMzjg+aZnZjHlQBUVy7tbK0osVTEA)), but will not produce a _runtime_ error. This is because the change is additive and does not modify the types of the values in position 0 and 1 of the tuple.

In any case, we'll go to 3.0.0 of solana-transaction-error to make this clear via semver.

tbh i care zero about either of those (they're leaving validator/monorepo soon enough). this breaks the rust sdk and all binary consumers. we can't "just bump major" because the validator is imposing the change on all consumers

@steveluscher steveluscher marked this pull request as draft May 10, 2025 00:44
@steveluscher
Author

Moving to draft status; please don't review at the moment.

@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch from 0f963cf to 071d83c Compare May 16, 2025 05:37
@steveluscher steveluscher changed the title Add the transaction-level account index of erroring programs to TransactionError::InstructionError Add the inner instruction index and transaction-level account index of erroring programs to TransactionError::InstructionError May 16, 2025
@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch from 071d83c to 7ca6c8b Compare May 16, 2025 17:56
@buffalojoec
Moving to draft status; please don't review at the moment.

Let me know whenever it's ready again. I combed through and for the most part I think it makes sense. I'll add my detailed review once you're ready. 🫡

@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch from 7ca6c8b to 68cf8e6 Compare May 28, 2025 20:13
@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch 8 times, most recently from 7c542ea to 338cf2b Compare May 29, 2025 22:51
@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch from 338cf2b to 72b3748 Compare May 30, 2025 17:38
… program account index) into the existing storage for `TransactionError`, in a way that's space-efficient and backward compatible
…arries the index of the inner instruction from which the error was thrown (if applicable) and the address of the program responsible
@steveluscher steveluscher force-pushed the responsible-program-of-transaction-error branch from 72b3748 to 152f001 Compare May 30, 2025 18:32
Author

@steveluscher steveluscher left a comment

Alright. Pending the release of this PR in solana-sdk, I think this is ready to go.

The approach

  1. Make a breaking change to the TransactionError::InstructionError type to add the address of the program responsible for the error and its inner instruction index (anza-xyz/solana-sdk#74).
  2. ‘Airgap’ this type from the stored representation in blockstore by creating StoredTransactionError. Implement a custom serializer on that type to make the serialization backward and forward compatible with the old one. (e207230)
  3. ‘Airgap’ this type from the RPC API by creating UiTransactionError and UiTransactionResult. Implement a custom serializer on that type to make the serialization backward and forward compatible with the old one. (0d1fd35)
  4. Teach TransactionContext to be able to pull the current inner instruction index, as would be displayed on an Explorer. (cb19cdf)
  5. Teach InvokeContext how to keep track of the program responsible for any first-thrown error, as well as the inner instruction index of that program in the transaction (431fc75)
  6. Teach SVM to use InvokeContext to blame programs for throwing errors (33966d7).

The outcome

The end result should be that we have a new, structured TransactionError::InstructionError with named fields that transaction processing code can use, that we maintain backward compatibility with old apps that expect instruction errors to appear in the form { "InstructionError": [0, { "Custom": 1 }] } through the RPC, and that new clients get to use additional data in that sequence as the next version of Agave starts to produce errors of the form { "InstructionError": [0, { "Custom": 1 }, "11111111111111111111111111111111", null] }.
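To make the two wire shapes concrete, here is a small sketch that renders both by hand. The struct and function names are hypothetical, only `Custom` errors are modelled, and the element order (address third, inner instruction index fourth) is inferred from the example above rather than confirmed by the PR.

```rust
// Hypothetical, simplified view of an attributed instruction error.
struct InstructionErrorInfo {
    outer_index: u8,
    custom_code: u32,
    responsible_program_address: Option<String>,
    inner_instruction_index: Option<u8>,
}

/// The two-element form that existing clients expect.
fn legacy_json(e: &InstructionErrorInfo) -> String {
    format!(
        "{{\"InstructionError\":[{},{{\"Custom\":{}}}]}}",
        e.outer_index, e.custom_code
    )
}

/// The extended form: the legacy elements followed by the attribution data,
/// with `null` standing in for a missing value.
fn extended_json(e: &InstructionErrorInfo) -> String {
    let address = match &e.responsible_program_address {
        Some(a) => format!("\"{}\"", a),
        None => "null".to_string(),
    };
    let inner = match e.inner_instruction_index {
        Some(i) => i.to_string(),
        None => "null".to_string(),
    };
    format!(
        "{{\"InstructionError\":[{},{{\"Custom\":{}}},{},{}]}}",
        e.outer_index, e.custom_code, address, inner
    )
}
```

Since the first two elements are identical in both forms, a client that reads only positions 0 and 1 sees the same values either way.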

How to review this PR

I recommend stepping through the ‘commits’ tab, ignoring the first commit where I patch in a local copy of solana-sdk with the change in anza-xyz/solana-sdk#74. CI will not be able to run on this PR until 74 is landed and released, so we're flying blind for now. The only way to run the tests is to check out this PR, and 74 of solana-sdk as a sibling of agave/ and compile the whole shebang with RUSTFLAGS="-Adeprecated"

Here's what I need from you folks

  • @steviez, can you review the changes in e207230 from the perspective of how the new TransactionError::InstructionError gets stored in blockstore? I believe these to be backward and forward compatible, and have several tests that attempt to prove that.
  • @Lichtso, can you review my changes to TransactionContext (cb19cdf), InvokeContext (431fc75), and SVM (33966d7).
    • I'm very unhappy with this code. It exists because I haven't found all of the places where errors are thrown from SVM, so I've had to accept that, sometimes, first_seen_error_attribution will be None, despite an error having been encountered. What I really want is to make that a true invariant that throws, because there should always be error attribution if you've entered that block.
  • @apfitzge, can you review this from the perspective of the fees reviewer?
  • @buffalojoec, can you review (0d1fd35) and (c964064) with the goal of making sure that I've ensured that RPC responses get serialized in the old tuple-variant style everywhere, to maintain compatibility with the client RPC API.
  • @joncinque, can you sort of skim the whole thing, and also double check my assumption in 152f001?

/// Observe how the index resets every time a new top-level instruction is called.
///
/// * #1 Program A (no index)
/// * #1.1 CPI to Program B (index 0)
Author

Explorers typically 1-index inner instruction indexes, but in the code we zero index them, None meaning ‘this is not an inner instruction.’

Comment on lines +742 to +743
// Index values are 32-bit integers of the form:
// TTTTTTTT IIIIIIII xxxxxxxx xxxxxxxx
Author

This is already a uint32, so we can fit more u8 data in here rather than creating a new column.

// * I - The inner index of the instruction that errored; 0 if None, 1-indexed otherwise
let [outer_instruction_index, maybe_inner_instruction_index, _unused1, _unused2] =
instruction_error.index.to_le_bytes();
let inner_instruction_index = maybe_inner_instruction_index.checked_sub(1);
Author

I've implemented this as a single-byte option, where 0 means None and 1 means Some(0). The implication is that this can only store an inner instruction index of at most 254. If we can CPI that many instructions, it probably means that Solana has undergone major changes, and all of this code is already gone.
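A round-trip sketch of this encoding, mirroring the `to_le_bytes` decoding shown in the diff. `encode_index` and `decode_index` are hypothetical names; the encoder is an assumption about how the write side could look, not code from the PR.

```rust
/// Stores the outer index in byte 0 and the inner index, shifted up by one,
/// in byte 1 of a little-endian u32. Returns `None` when the inner index is
/// 255, because 255 + 1 does not fit in a single byte.
fn encode_index(outer: u8, inner: Option<u8>) -> Option<u32> {
    let inner_byte = match inner {
        None => 0,
        Some(i) => i.checked_add(1)?,
    };
    Some(u32::from_le_bytes([outer, inner_byte, 0, 0]))
}

/// Reverses `encode_index`: a zero in byte 1 means "no inner instruction".
fn decode_index(index: u32) -> (u8, Option<u8>) {
    let [outer, maybe_inner, _unused1, _unused2] = index.to_le_bytes();
    (outer, maybe_inner.checked_sub(1))
}
```

A record written before this change has only byte 0 populated, so it decodes to an outer index with no inner index, which is what keeps old data readable.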

fn from(value: StoredTransactionError) -> Self {
let bytes = value.0;
match &bytes.as_slice() {
[8, 0, 0, 0, ..] => {
Author

When reading the bytes of a transaction error from storage, this header implies that it's a TransactionError::InstructionError (ie. the variant at index 8 of the TransactionError enum).
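A small sketch of that header check, assuming the stored bytes begin with the enum variant index as a little-endian u32 (which is how the `[8, 0, 0, 0, ..]` pattern reads). `variant_index` and `is_instruction_error` are hypothetical helpers.

```rust
use std::convert::TryInto;

/// Reads the assumed 4-byte little-endian variant index at the start of a
/// serialized error, or returns `None` if fewer than 4 bytes are present.
fn variant_index(bytes: &[u8]) -> Option<u32> {
    let header: [u8; 4] = bytes.get(..4)?.try_into().ok()?;
    Some(u32::from_le_bytes(header))
}

/// True when the header marks variant 8, the value matched in the diff above.
fn is_instruction_error(bytes: &[u8]) -> bool {
    matches!(bytes, [8, 0, 0, 0, ..])
}
```

Matching on the raw prefix lets the reader decide how to interpret the remaining bytes before committing to a full deserialization.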

attr.inner_instruction_index.map(|i| i as u8),
Some(attr.responsible_program_address),
),
None => {
Author

What I wanted to do is to make this an invariant; you should never be able to reach this code without something in the SVM having blamed a program for the error. I don't see a way to provably make this so, so instead I've opted for a ‘soft fail’ that sets the attribution data to None. cc/ @Lichtso.

@@ -226,12 +234,122 @@ impl From<&MessageAddressTableLookup> for UiAddressTableLookup {
}
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct UiTransactionError(pub TransactionError);
Author

Start here for the RPC UiTransactionError and UiTransactionResult airgap change. These are essentially shims that ensure that the RPC serialization for transaction errors won't change its structure despite TransactionError::InstructionError having changed its structure.

@@ -244,7 +244,7 @@ struct RentMetrics {
pub type BankStatusCache = StatusCache<Result<()>>;
#[cfg_attr(
feature = "frozen-abi",
frozen_abi(digest = "5dfDCRGWPV7thfoZtLpTJAV8cC93vQUXgTm6BnrfeUsN")
frozen_abi(digest = "5UmYzdMvTDkFBKsqddQ43mSikgEA6s2bTvRZUq78YPQ2")
Author

I believe that this is actually bad, because Result contains TransactionError, which has the updated version of TransactionError::InstructionError, which means that snapshots that are serialized with this version of BankSlotDelta will be incompatible with Agave <2.3.

This probably means that I have work here to do to make sure that the snapshots themselves are backward/forward compatible, but please let me know if I'm wrong about that. Maybe it's the case that we don't have an expectation that you can boot Agave 2.2 from a snapshot produced by 2.3? You tell me. cc/ @joncinque.

Author

Maybe @brooksprumo?

Hi 👋

Without looking at the PR/changes, the way to determine if the digests are safe/right to change is to generate the abi files locally and compare. You'll want to run the test-abi.sh script, or the cmd it itself runs, on both master and this PR. Then you can diff the output to see which fields changed. Sometimes it is adding fields, and sometimes it is just renaming a module.


This probably means that I have work here to do to make sure that the snapshots themselves are backward/forward compatible, but please let me know if I'm wrong about that. Maybe it's the case that we don't have an expectation that you can boot Agave 2.2 from a snapshot produced by 2.3? You tell me.

Snapshots must be compatible between adjacent versions. For example, a snapshot created by v2.2 must be loadable by v2.1 and v2.3. A snapshot created by v2.3 must be loadable by v2.2, but does not need to be loadable by v2.1. To change the snapshot serialization format, a SIMD is required, as it impacts the other validator clients (e.g. firedancer).

Author

Thanks for the guidance!

If the diff was this (ie. adding two fields to the end of the tuple variant InstructionError) would that be backward compatible? That is to say if v2.3 wrote those two extra elements would v2.2 ignore them and carry on, or would it panic?

I'm pretty sure that I can get BankSlotDeltas to serialize like that, if that would ensure backward/forward compat, but what I'm hearing is that adding any fields at all will require a SIMD no matter what.

If the diff was this (ie. adding two fields to the end of the tuple variant InstructionError) would that be backward compatible? That is to say if v2.3 wrote those two extra elements would v2.2 ignore them and carry on, or would it panic?

In this case I'm not sure. Luckily it's an easy thing to try out! Create a snapshot with this PR and then try to load that snapshot with v2.2.

I'm pretty sure that I can get BankSlotDeltas to serialize like that, if that would ensure backward/forward compat, but what I'm hearing is that adding any fields at all will require a SIMD no matter what.

I don't have personal experience with adding tuple variants on something that was marked for frozen abi, so I'm not actually sure what'll happen here. I think a SIMD will depend on the result of the experiment above.

Luckily, if a SIMD is required, they are pretty simple. At least as far as SIMDs are concerned :)

@steveluscher steveluscher marked this pull request as ready for review May 30, 2025 18:54
@steveluscher steveluscher changed the title Add the inner instruction index and transaction-level account index of erroring programs to TransactionError::InstructionError Add the inner instruction index and address of erroring programs to TransactionError::InstructionError May 30, 2025
@steveluscher
Author

Giving up on this; wrote a giant redux of everything I tried, split this PR into smaller landable PRs that will make the next person's job easier, and linked everything together in #6546.

6 participants