-
Notifications
You must be signed in to change notification settings - Fork 445
Allows scanning slot and getting stored account info without data #5840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allows scanning slot and getting stored account info without data #5840
Conversation
accounts-db/src/accounts_db.rs
Outdated
storage_scan_func: impl for<'a, 'b, 'storage> Fn( | ||
&'b B, | ||
&'a StoredAccountInfoWithoutData<'storage>, | ||
Option<&'storage [u8]>, // account data | ||
) + Sync, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main change. Now storage_scan_func
has a StoredAccountInfoWithoutData
parameter, which means the Option
data parameter is the only way to get account data (and thus cannot be incorrectly ignored).
ScanAccountStorageData::NoData => { | ||
storage.scan_accounts(|account| { | ||
let account_without_data = StoredAccountInfoWithoutData::new_from(&account); | ||
storage_scan_func(retval, &account_without_data, None); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the next PR we can add an AccountsFile::scan_accounts_without_data()
function, so we'll actually not load the account data here.
let mut loaded_hash = loaded_account.loaded_hash(); | ||
if loaded_hash == AccountHash(Hash::default()) { | ||
loaded_hash = Self::hash_account(loaded_account, loaded_account.pubkey()) | ||
loaded_hash = Self::hash_account(&loaded_account, loaded_account.pubkey()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're hashing the account, so we do need the account data.
// Storage may have duplicates so only keep the latest version for each key | ||
accum.insert(*loaded_account.pubkey(), loaded_account.take_account()); | ||
}, | ||
ScanAccountStorageData::NoData, | ||
ScanAccountStorageData::DataRefForStorage, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're returning the account, so we do need the account data.
accum.insert(key, (loaded_hash, account)); | ||
}, | ||
ScanAccountStorageData::NoData, | ||
ScanAccountStorageData::DataRefForStorage, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing, we're hashing the account here.
295b835
to
bfe6f1e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5840 +/- ##
=======================================
Coverage 83.0% 83.0%
=======================================
Files 828 828
Lines 375691 375750 +59
=======================================
+ Hits 311874 311931 +57
- Misses 63817 63819 +2 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. ty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
EditThis comment was cross-posted from #5838, which is importantly different in its implementation, and the lead-in to this comment is not quite valid. That being said, I'm still in favor of not overloading I'm curious if we can avoid overloading For example, if we relax the requirement on pub fn scan_cache_storage_fallback<R, B>(
&self,
slot: Slot,
cache_func: impl Fn(&Arc<CachedAccount>) -> Option<R> + Sync,
storage_fallback_func: impl Fn(&mut B, &AccountsFile) + Sync,
) -> ScanStorageResult<R, B>
where
R: Send,
B: Send + Default + Sync,
{
if let Some(slot_cache) = self.accounts_cache.slot_cache(slot) {
// If we see the slot in the cache, then all the account information
// is in this cached slot
ScanStorageResult::Cached(
slot_cache
.iter()
.filter_map(|account| cache_func(account.value()))
.collect(),
)
} else {
let mut retval = B::default();
// If the slot is not in the cache, then all the account information must have
// been flushed. This is guaranteed because we only remove the rooted slot from
// the cache *after* we've finished flushing in `flush_slot_cache`.
// Regarding `shrinking_in_progress_ok`:
// This fn could be running in the foreground, so shrinking could be running in the background, independently.
// Even if shrinking is running, there will be 0-1 active storages to scan here at any point.
// When a concurrent shrink completes, the active storage at this slot will
// be replaced with an equivalent storage with only alive accounts in it.
// A shrink on this slot could have completed anytime before the call here, a shrink could currently be in progress,
// or the shrink could complete immediately or anytime after this call. This has always been true.
// So, whether we get a never-shrunk, an about-to-be shrunk, or a will-be-shrunk-in-future storage here to scan,
// all are correct and possible in a normally running system.
if let Some(storage) = self
.storage
.get_slot_storage_entry_shrinking_in_progress_ok(slot)
{
storage_fallback_func(&mut retval, &storage.accounts);
}
ScanStorageResult::Stored(retval)
}
} This would enable something like the following, which closely resembles pub(crate) fn scan_account_storage_without_data<R, B>(
&self,
slot: Slot,
cache_map_func: impl Fn(&StoredAccountInfoWithoutData) -> Option<R> + Sync,
storage_scan_func: impl Fn(&mut B, &StoredAccountInfoWithoutData) + Sync,
) -> ScanStorageResult<R, B>
where
R: Send,
B: Send + Default + Sync,
{
self.scan_cache_storage_fallback(
slot,
|cached_account| {
cache_map_func(&StoredAccountInfoWithoutData {
pubkey: *cached_account.pubkey(),
lamports: cached_account.lamports(),
owner: cached_account.owner(),
data_len: cached_account.data_len(),
executable: cached_account.executable(),
rent_epoch: cached_account.rent_epoch(),
})
},
|retval, storage| {
storage.scan_accounts_without_data(|account| storage_scan_func(retval, &account))
},
)
} And pub(crate) fn scan_account_storage<R, B>(
&self,
slot: Slot,
cache_map_func: impl Fn(&LoadedAccount) -> Option<R> + Sync,
storage_scan_func: impl Fn(&mut B, &LoadedAccount) + Sync,
) -> ScanStorageResult<R, B>
where
R: Send,
B: Send + Default + Sync,
{
self.scan_cache_storage_fallback(
slot,
|cached_account| cache_map_func(&LoadedAccount::Cached(Cow::Borrowed(cached_account))),
|retval, storage| {
storage.scan_accounts(|account| {
let loaded_account = LoadedAccount::Stored(account);
storage_scan_func(retval, &loaded_account, data)
})
},
)
} With this scheme, we can remove the All this being said, I am not at all an expert in this part of the codebase, and there may be more to the picture with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My earlier comment was cross-posted from #5838, which is importantly different in its implementation, and the lead-in to this comment is not quite valid. I'm still in favor of not overloading scan_account_storage and removing the ScanAccountStorageData enum, as I think it adds extra complexity and room for error that could be simplified with separate scan methods (e.g., removing Option<&'storage [u8]> from the no-data scan alltogther).
That being said, this is certainly an improvement and does pave a route for avoiding loading account data in scans!
I know this comment was copied over from #5838 (comment), but want to respond here too. This PR doesn't overload LoadedAccount, so that shouldn't be an issue here.
Yep! I think this is the right direction to go. With this PR we make progress in that direction. Subsequent PRs can continue doing that improvement, but I didn't want to go all the way in this PR for the sake of clarity/review-ability, since it'll require a lot of refactoring (i.e. noise in the PR).
and
Yep! I'm in agreement. |
Problem
The
scan_account_storage()
function (reproduced below) has ascan_account_storage_data
parameter to indicate if the caller wants the account data or not. Unfortunately this parameter is not respected; we always load the account data. We should not do that.Summary of Changes
This PR allows the caller of
scan_account_storage()
to specify if they want account data or not, and have the choice be respected.This PR introduces a new type,
StoredAccountInfoWithoutData
, that is used in thestorage_scan_func
parameter, which replaces the currentLoadedAccount
. Now this "account" parameter is strictly without data, and the trailingOption<&[u8]>
must be properly requested in order to access account data.This PR does not change the back-end code used to scan the storage itself—that will be done in a follow up PR.