-
Notifications
You must be signed in to change notification settings - Fork 81
feat Add metadata size to AccountsFile deserialization
#128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat Add metadata size to AccountsFile deserialization
#128
Conversation
fernandodeluret
commented
Sep 24, 2025
- Deserialize snapshot metadata file
- Use AccountFile lenght contained in metadata for deserialization
| let (slot_str, id_str) = file_name | ||
| .split_once('.') | ||
| .unwrap_or_else(|| panic!("Invalid file name: {}", file_name)); | ||
| let slot = slot_str | ||
| .parse::<u64>() | ||
| .unwrap_or_else(|_| panic!("Invalid slot: {}", slot_str)); | ||
| let id = id_str | ||
| .parse::<u64>() | ||
| .unwrap_or_else(|_| panic!("Invalid id: {}", id_str)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can share details on this? Wonderful addition of write version. Please share any resources too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not much resources about any of the changes here. I basically had to read through agave code, in theory we should only have 1 write version of the account per slot in snapshots (so also means 1 file per slot). But it's there just for cover that edge case. Also because it's a good way to double check that metadata deserialization went correctly (because we look on metadata not only by slot but also by write version)
| let mut size = None; | ||
| for account in accounts_metadata { | ||
| if account.id as u64 == id { | ||
| size = Some(account.accounts_current_len); | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tell me a bit about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the basic idea of this(and all this change in general) is that we can teorically have files for which the accounts data does not match the file length (because of padding). So the safe way to decode accounts is to read the "accounts_current_len" from the metadata file, which is how agave is loading snapshots. In the tests I ran after the deserialization in practice they do match most of the time, but I left it there because it's the way that agave uses and also we cover the case there is difference between account data and file size
| ); | ||
|
|
||
| let solana_snapshot = SolanaSnapshot::unpack_compressed(config.path.clone())?; | ||
| let solana_snapshot = SolanaSnapshot::unpack_compressed(config.path.clone(), config.slot)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the slot is in the name of the file why pass it in? And also assuming folks don't change the filename not a big one but it is something that could break this. I don't think it should rely on the name of the tar but can rely on the naming and folder structure of the contents of the tar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true, actually for now this is draft, but I havethe intention of doing the slot optional, and if not passed use the metadata slot (the file name can also be checked). But this is a good double check (actually triple check) about the slot, because we can set as the slot in the config the data received as result from the endpoint we get the snapshot (and over which we do some calculations). So when dealing with several snapshots in the same process this is just an reduntant check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still want to test further all this though (seems is working good so far, and could be a good resource for people working with snapshots, because there is not much docs out there about all the length check, at least that I have found)