Skip to content

Allows scanning slot and getting stored account info without data #5838

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

brooksprumo
Copy link

@brooksprumo brooksprumo commented Apr 15, 2025

Problem

The scan_account_storage() function (reproduced below) has a scan_account_storage_data parameter to indicate if the caller wants the account data or not. Unfortunately this parameter is not respected; we always load the account data. We should not do that.

    /// Scan a specific slot through all the account storage
    pub(crate) fn scan_account_storage<R, B>(
        &self,
        slot: Slot,
        cache_map_func: impl Fn(&LoadedAccount) -> Option<R> + Sync,
        storage_scan_func: impl Fn(&B, &LoadedAccount, Option<&[u8]>) + Sync,
        scan_account_storage_data: ScanAccountStorageData,
    ) -> ScanStorageResult<R, B>

Summary of Changes

This PR introduces a new enum variant, LoadedAccount::StoredNoData, to indicate we have loaded an account from a storage file, but without its account data.

This means the LoadedAccount itself may not have data(), which revealed that the current code does not respect the ScanAccountStorageData::NoData value at all, since the non-test code all needs the account data.

This PR does not change the back-end code used to scan the storage itself—that will be done in a follow up PR. This PR allows the caller of scan_account_storage() to specify if they want account data or not, and have the choice be respected.

@brooksprumo brooksprumo self-assigned this Apr 15, 2025
@brooksprumo brooksprumo marked this pull request as ready for review April 15, 2025 20:53
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 62.29508% with 23 lines in your changes missing coverage. Please review.

Project coverage is 83.0%. Comparing base (5c52a62) to head (e055abf).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #5838     +/-   ##
=========================================
- Coverage    83.0%    83.0%   -0.1%     
=========================================
  Files         828      828             
  Lines      375695   375730     +35     
=========================================
- Hits       311893   311892      -1     
- Misses      63802    63838     +36     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@cpubot
Copy link

cpubot commented Apr 16, 2025

I'm curious if we can avoid overloading LoadedAccount by further generalizing scan_cache_storage_fallback and using more specific data structures for each type of scan. Ideally this would allow us to avoid further complexity and runtime panic cases being bundled into LoadedAccount in favor of more specific and type safe methods. By using an enum for ScanAccountStorageData, we force a burden on higher level callers to make sure they are aware of the variant of ScanAccountStorageData such that they don't call .take_account() and panic. If we introduce separate types for the different scan types (similar to what AppendVec is doing), we can avoid this altogether (and IMO make the code easier to reason about).

For example, if we relax the requirement on scan_cache_storage_fallback of producing a LoadedAccount, we can allow the more specialized scan functions to convert a CachedAccount into whatever structure is appropriate for the scan type.

pub fn scan_cache_storage_fallback<R, B>(
    &self,
    slot: Slot,
    cache_func: impl Fn(&Arc<CachedAccount>) -> Option<R> + Sync,
    storage_fallback_func: impl Fn(&mut B, &AccountsFile) + Sync,
) -> ScanStorageResult<R, B>
where
    R: Send,
    B: Send + Default + Sync,
{
    if let Some(slot_cache) = self.accounts_cache.slot_cache(slot) {
        // If we see the slot in the cache, then all the account information
        // is in this cached slot
        ScanStorageResult::Cached(
            slot_cache
                .iter()
                .filter_map(|account| cache_func(account.value()))
                .collect(),
        )
    } else {
        let mut retval = B::default();
        // If the slot is not in the cache, then all the account information must have
        // been flushed. This is guaranteed because we only remove the rooted slot from
        // the cache *after* we've finished flushing in `flush_slot_cache`.
        // Regarding `shrinking_in_progress_ok`:
        // This fn could be running in the foreground, so shrinking could be running in the background, independently.
        // Even if shrinking is running, there will be 0-1 active storages to scan here at any point.
        // When a concurrent shrink completes, the active storage at this slot will
        // be replaced with an equivalent storage with only alive accounts in it.
        // A shrink on this slot could have completed anytime before the call here, a shrink could currently be in progress,
        // or the shrink could complete immediately or anytime after this call. This has always been true.
        // So, whether we get a never-shrunk, an about-to-be shrunk, or a will-be-shrunk-in-future storage here to scan,
        // all are correct and possible in a normally running system.
        if let Some(storage) = self
            .storage
            .get_slot_storage_entry_shrinking_in_progress_ok(slot)
        {
            storage_fallback_func(&mut retval, &storage.accounts);
        }

        ScanStorageResult::Stored(retval)
    }
}

This would enable something like the following, which closely resembles scan_account_storage, but depends on StoredAccountInfoWithoutData rather than LoadedAccount.

pub(crate) fn scan_account_storage_without_data<R, B>(
    &self,
    slot: Slot,
    cache_map_func: impl Fn(&StoredAccountInfoWithoutData) -> Option<R> + Sync,
    storage_scan_func: impl Fn(&mut B, &StoredAccountInfoWithoutData) + Sync,
) -> ScanStorageResult<R, B>
where
    R: Send,
    B: Send + Default + Sync,
{
    self.scan_cache_storage_fallback(
        slot,
        |cached_account| {
            cache_map_func(&StoredAccountInfoWithoutData {
                pubkey: *cached_account.pubkey(),
                lamports: cached_account.lamports(),
                owner: cached_account.owner(),
                data_len: cached_account.data_len(),
                executable: cached_account.executable(),
                rent_epoch: cached_account.rent_epoch(),
            })
        },
        |retval, storage| {
            storage.scan_accounts_without_data(|account| storage_scan_func(retval, &account))
        },
    )
}

And scan_cache_storage_fallback is still general enough to implement scan_account_storage with slight modification.

pub(crate) fn scan_account_storage<R, B>(
    &self,
    slot: Slot,
    cache_map_func: impl Fn(&LoadedAccount) -> Option<R> + Sync,
    storage_scan_func: impl Fn(&mut B, &LoadedAccount) + Sync,
) -> ScanStorageResult<R, B>
where
    R: Send,
    B: Send + Default + Sync,
{
    self.scan_cache_storage_fallback(
        slot,
        |cached_account| cache_map_func(&LoadedAccount::Cached(Cow::Borrowed(cached_account))),
        |retval, storage| {
            storage.scan_accounts(|account| {
                let loaded_account = LoadedAccount::Stored(account);
                storage_scan_func(retval, &loaded_account, data)
            })
        },
    )
}

With this scheme, we can remove the ScanAccountStorageData enum altogether, and call sites can just use whatever scan function is appropriate for their needs.


All this being said, I am not at all an expert in this part of the codebase, and there may be more to the picture with LoadedAccount that I am not aware of, but thought I'd throw this out there as it was something I was thinking about when working on this area!

@brooksprumo
Copy link
Author

Closing, as #5840 was merged instead.

@brooksprumo brooksprumo deleted the accounts/stored-account-info-without-data branch April 16, 2025 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants