
Conversation

@satushh satushh commented Sep 15, 2025

What type of PR is this?

Bug fix

What does this PR do? Why is it needed?

earliest_available_slot needs to be updated (if required) when pruning occurs. Otherwise, peers will think we have data for slot n while, in reality, we have wiped data up to slot m, with n < m.
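The invariant being fixed can be sketched as follows (a minimal illustration with hypothetical names; Slot stands in for Prysm's primitives.Slot, and the actual PR wires the update through the p2p custody service):

```go
package main

import "fmt"

// Slot stands in for Prysm's primitives.Slot.
type Slot uint64

// earliestAfterPrune returns the earliest available slot to advertise after
// wiping everything up to and including pruneUpto. It never moves backwards:
// peers must never be told we hold data that has already been deleted.
func earliestAfterPrune(stored, pruneUpto Slot) Slot {
	candidate := pruneUpto + 1
	if candidate > stored {
		return candidate
	}
	return stored
}

func main() {
	// Before the fix, the advertised slot stayed at n (here 5) even though
	// data up to m (here 99) was wiped; after the fix it advances to m+1.
	fmt.Println(earliestAfterPrune(5, 99))
}
```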

Which issues(s) does this PR fix?

Part of #14129

Other notes for review

Steps to reproduce:

bazel build //cmd/beacon-chain:oci_image_tarball \
  --platforms=@io_bazel_rules_go//go/toolchain:linux_arm64_cgo \
  --config=release \
  --//proto:network=minimal \
  --@io_bazel_rules_go//go/config:tags=minimal

docker load -i bazel-bin/cmd/beacon-chain/oci_image_tarball/tarball.tar
docker tag gcr.io/offchainlabs/prysm/beacon-chain prysm-bn-custom-image:minimal-build

bazel build //cmd/validator:oci_image_tarball \
  --platforms=@io_bazel_rules_go//go/toolchain:linux_arm64_cgo \
  --config=release \
  --//proto:network=minimal \
  --@io_bazel_rules_go//go/config:tags=minimal

docker load -i bazel-bin/cmd/validator/oci_image_tarball/tarball.tar
docker tag gcr.io/offchainlabs/prysm/validator prysm-vc-custom-image:minimal-build

• Then use this YAML in ethereum-package to run Kurtosis:

participants:
  - el_type: geth
    el_image: ethpandaops/geth:fusaka-devnet-3
    cl_type: prysm
    cl_image: "prysm-bn-custom-image:minimal-build"
    vc_image: "prysm-vc-custom-image:minimal-build"
    count: 2
    supernode: true
    cl_extra_params:
      - --verbosity=debug
      - --pruner-retention-epochs=4
      - --beacon-db-pruning
      - --subscribe-all-subnets
network_params:
  preset: minimal
  electra_fork_epoch: 0
  fulu_fork_epoch: 1
  min_epochs_for_block_requests: 3
ethereum_genesis_generator_params:
  image: ethereum-genesis-generator:local
additional_services:
  - dora        
  - spamoor
spamoor_params:
  image: ethpandaops/spamoor:master
  max_mem: 4000
  spammers:
    - scenario: eoatx
      config:
        throughput: 200
    - scenario: blobs
      config:
        throughput: 20    

Acknowledgements

@satushh satushh requested a review from Copilot September 16, 2025 11:13
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Updates the beacon chain node to properly maintain the earliest available slot value when pruning operations occur. This ensures peers receive accurate information about data availability after pruning removes older blocks and data columns.

  • Integrates custody updater functionality with both pruner and data column storage services
  • Modifies pruning logic to update earliest available slot after data removal
  • Adds comprehensive test coverage for custody update scenarios

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
changelog/satushh-update-easlot-pruning.md Documents the feature in changelog
beacon-chain/node/node.go Configures P2P service as custody updater for pruner and data column storage
beacon-chain/db/pruner/pruner.go Adds custody updater interface and updates earliest available slot after pruning
beacon-chain/db/pruner/pruner_test.go Tests custody update functionality with mock implementations
beacon-chain/db/filesystem/data_column.go Adds custody updater support to data column storage pruning
beacon-chain/db/filesystem/data_column_test.go Tests data column storage custody updates
beacon-chain/db/pruner/BUILD.bazel Adds test dependency for assert library
beacon-chain/db/filesystem/BUILD.bazel Adds test dependency for assert library


custodyGroupCount, err := p.custody.CustodyGroupCount()
if err != nil {
log.WithError(err).Error("Failed to get custody group count, cannot update earliest available slot after pruning")
return nil
Copilot AI Sep 16, 2025

Returning nil here prevents the method from properly signaling an error state to callers. This should return the original error instead of nil since the custody update failure represents a genuine error condition that calling code should be aware of.

Suggested change
return nil
return errors.Wrap(err, "failed to get custody group count for earliest available slot update")


Collaborator Author

But the pruning itself is successful.

Comment on lines 619 to 626
return
}

// Get current custody group count to preserve it during update
custodyGroupCount, err := dcs.custody.CustodyGroupCount()
if err != nil {
log.WithError(err).Error("Failed to get custody group count, cannot update earliest available slot after data column pruning")
return
Copilot AI Sep 16, 2025

Early returns from this method prevent the cleanup of muChans map which happens after the custody update logic. The custody update errors should be logged but not cause early returns, or the muChans cleanup should be moved before the custody update logic.

Suggested change
return
}
// Get current custody group count to preserve it during update
custodyGroupCount, err := dcs.custody.CustodyGroupCount()
if err != nil {
log.WithError(err).Error("Failed to get custody group count, cannot update earliest available slot after data column pruning")
return
// Do not return early; continue to allow all cleanup logic to run
}
// Get current custody group count to preserve it during update
custodyGroupCount, err := dcs.custody.CustodyGroupCount()
if err != nil {
log.WithError(err).Error("Failed to get custody group count, cannot update earliest available slot after data column pruning")
// Do not return early; continue to allow all cleanup logic to run


Collaborator Author

Not sure what custodyGroupCount should be set to here if we can't fetch it.


@satushh satushh marked this pull request as ready for review September 16, 2025 11:23
@satushh satushh requested a review from nalepae September 16, 2025 11:23
@@ -0,0 +1,3 @@
### Added

- Update the earliest available slot while pruning.
Member

Could you add a bit more to this changelog? These are used to compile the release notes and this isn't enough context to understand why this change is important

Comment on lines 188 to 207
if p.custody != nil {
earliestAvailableSlot := pruneUpto + 1

// Get current custody group count to preserve it during update
custodyGroupCount, err := p.custody.CustodyGroupCount()
if err != nil {
log.WithError(err).Error("Failed to get custody group count, cannot update earliest available slot after pruning")
return nil
}

// Update the custody info with new earliest available slot
_, _, err = p.custody.UpdateCustodyInfo(earliestAvailableSlot, custodyGroupCount)
if err != nil {
log.WithError(err).WithField("earliestAvailableSlot", earliestAvailableSlot).
Error("Failed to update earliest available slot after pruning")
} else {
log.WithField("earliestAvailableSlot", earliestAvailableSlot).
Debug("Updated earliest available slot after pruning")
}
}
Member

All of this should probably go into its own method for a better abstraction/separation of concerns.

Suggested change
if p.custody != nil {
earliestAvailableSlot := pruneUpto + 1
// Get current custody group count to preserve it during update
custodyGroupCount, err := p.custody.CustodyGroupCount()
if err != nil {
log.WithError(err).Error("Failed to get custody group count, cannot update earliest available slot after pruning")
return nil
}
// Update the custody info with new earliest available slot
_, _, err = p.custody.UpdateCustodyInfo(earliestAvailableSlot, custodyGroupCount)
if err != nil {
log.WithError(err).WithField("earliestAvailableSlot", earliestAvailableSlot).
Error("Failed to update earliest available slot after pruning")
} else {
log.WithField("earliestAvailableSlot", earliestAvailableSlot).
Debug("Updated earliest available slot after pruning")
}
}
p.updateEarliestSlot(pruneUpto+1)
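A self-contained sketch of what such an extracted method could look like (hypothetical names and a mock custody updater stand in for the real p2p service; errors are only logged, not returned, since the prune itself already succeeded):

```go
package main

import (
	"errors"
	"fmt"
)

// Slot stands in for Prysm's primitives.Slot.
type Slot uint64

// custodyUpdater mirrors the small interface the pruner depends on.
type custodyUpdater interface {
	CustodyGroupCount() (uint64, error)
	UpdateCustodyInfo(earliestAvailableSlot Slot, custodyGroupCount uint64) error
}

type pruner struct{ custody custodyUpdater }

// updateEarliestSlot is the extracted helper: failures are only logged
// (printed here) because the prune itself already succeeded.
func (p *pruner) updateEarliestSlot(s Slot) {
	if p.custody == nil {
		return
	}
	count, err := p.custody.CustodyGroupCount()
	if err != nil {
		fmt.Println("cannot update earliest available slot:", err)
		return
	}
	if err := p.custody.UpdateCustodyInfo(s, count); err != nil {
		fmt.Println("failed to update earliest available slot:", err)
		return
	}
	fmt.Printf("updated earliest available slot to %d (custody groups: %d)\n", s, count)
}

// mockCustody lets the sketch run without the real p2p service.
type mockCustody struct{ fail bool }

func (m *mockCustody) CustodyGroupCount() (uint64, error) {
	if m.fail {
		return 0, errors.New("no custody info available")
	}
	return 8, nil
}

func (m *mockCustody) UpdateCustodyInfo(Slot, uint64) error { return nil }

func main() {
	(&pruner{custody: &mockCustody{}}).updateEarliestSlot(321)
	(&pruner{custody: &mockCustody{fail: true}}).updateEarliestSlot(321)
}
```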


// Update the earliest available slot via injected updater after pruning.
// The earliest available slot is the first slot after the pruned epochs.
if dcs.custody != nil {
Member

Same feedback here about moving the "update earliest slot" logic out of the prune method and into its own method.

@james-prysm james-prysm added peer-das fulu optimization PR making Prysm more effective labels Sep 18, 2025
require.NoError(t, err)

// Test the pruning calculation
currentSlot := primitives.Slot(20000) // Reasonable slot number for minimal config
Contributor

Don't assume this kind of "reasonable" number. If the ETH spec changes, then this test may break.
Instead, override the config.MinEpochsForBlockRequests at the start of the test.

Unfortunately, for some historical reasons, this value is recomputed in the MinEpochsForBlockRequests function.

This function was defined and used before the corresponding value was introduced in the spec.
So, IMO the cleanest way to do it is:

  1. In the Prysm codebase, remove the MinEpochsForBlockRequests function and directly use config.MinEpochsForBlockRequests where MinEpochsForBlockRequests was used.
  2. In this test, override config.MinEpochsForBlockRequests
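A runnable sketch of the override/restore pattern being suggested (the beaconConfig type and its default value here are stand-ins, not Prysm's actual params package):

```go
package main

import "fmt"

// beaconConfig is a stand-in for Prysm's params.BeaconChainConfig; only the
// field under discussion is modeled, with an arbitrary default.
type beaconConfig struct{ MinEpochsForBlockRequests uint64 }

var active = &beaconConfig{MinEpochsForBlockRequests: 33024}

// overrideBeaconConfig mimics params.OverrideBeaconConfig and hands back a
// restore func so a test cannot leak the override into other tests.
func overrideBeaconConfig(c *beaconConfig) (restore func()) {
	prev := active
	active = c
	return func() { active = prev }
}

func main() {
	restore := overrideBeaconConfig(&beaconConfig{MinEpochsForBlockRequests: 3})
	// The test body now derives pruning boundaries from the pinned value
	// instead of assuming a "reasonable" slot number.
	fmt.Println("overridden:", active.MinEpochsForBlockRequests)
	restore()
	fmt.Println("restored:", active.MinEpochsForBlockRequests)
}
```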

Collaborator Author

Addressed here: f324089

}

// P2P service is required for pruner to update earliest available slot after pruning
p2pService := b.fetchP2P()
Contributor

This check can be done directly in pruner.New, so responsibility is shifted from caller to callee.


// UpdateCustodyInfo atomically updates the custody group count only it is greater than the stored one.
// In this case, it also updates the earliest available slot with the provided value.
// UpdateCustodyInfo updates the custody group count if it is greater than the stored one,
Contributor

Why was "atomically" removed?

// Update earliest available slot if it advanced.
// This is for pruning to work correctly as blocks are pruned,
// the earliest available slot moves forward independently of custody group changes.
if earliestAvailableSlot > storedEarliestAvailableSlot {
Contributor

Increasing the earliestAvailableSlot without also increasing the custodyGroupCount should not be allowed if the new value of earliestAvailableSlot is higher than the minimum slot that must be stored.

Otherwise, a node could simply set earliestAvailableSlot = currentSlot and not serve any data to its peers.
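A sketch of the guard being proposed (the slots-per-epoch and window values are illustrative, matching the minimal preset and the min_epochs_for_block_requests: 3 used in the kurtosis config above):

```go
package main

import (
	"errors"
	"fmt"
)

// Slot stands in for Prysm's primitives.Slot.
type Slot uint64

const slotsPerEpoch = 8 // minimal preset

// minRequiredSlot is the oldest slot a node must still serve, derived from
// MIN_EPOCHS_FOR_BLOCK_REQUESTS (3 here, mirroring the kurtosis config).
func minRequiredSlot(current Slot) Slot {
	const minEpochsForBlockRequests = 3
	window := Slot(minEpochsForBlockRequests * slotsPerEpoch)
	if current <= window {
		return 0
	}
	return current - window
}

// validateAdvance rejects advancing earliestAvailableSlot past the mandatory
// serving window unless the custody group count grows at the same time.
func validateAdvance(newEAS, current Slot, storedGroups, newGroups uint64) error {
	if newGroups <= storedGroups && newEAS > minRequiredSlot(current) {
		return errors.New("cannot advance earliest available slot past mandatory serving window")
	}
	return nil
}

func main() {
	current := Slot(100)
	fmt.Println(validateAdvance(70, current, 8, 8))   // within window: allowed
	fmt.Println(validateAdvance(100, current, 8, 8))  // refusing to serve: rejected
	fmt.Println(validateAdvance(100, current, 8, 16)) // custody groups grew: allowed
}
```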

Contributor

Same comment as #15694 (comment) actually.

// custodyUpdater is a tiny interface that p2p service implements; kept here to avoid
// importing the p2p package and creating a cycle.
type custodyUpdater interface {
CustodyGroupCount(context.Context) (uint64, error)
Contributor

CustodyGroupCount can be renamed into GroupCount to avoid the stutter in

...custody.CustodyGroupCount


// updateEarliestSlot updates the earliest available slot via the injected custody updater
// and also persists it to the database.
func (p *Service) updateEarliestSlot(earliestAvailableSlot primitives.Slot) error {
Contributor

updateEarliestSlot ==> updateEarliestAvailableSlot to stick with the spec.

Contributor

This function should have the same checks/protections as UpdateEarliestAvailableSlot on the P2P side.

"duration": time.Since(tt),
"currentSlot": slot,
"batchSize": defaultPrunableBatchSize,
"numBatches": numBatches,
Contributor

This log should be displayed after updating the earliest available slot to reflect the real state of the node.

Comment on lines 148 to 153
var minRequiredEpoch primitives.Epoch
if currentEpoch > minEpochsForBlocks {
minRequiredEpoch = currentEpoch - minEpochsForBlocks
} else {
minRequiredEpoch = 0
}
Contributor

Suggested change
var minRequiredEpoch primitives.Epoch
if currentEpoch > minEpochsForBlocks {
minRequiredEpoch = currentEpoch - minEpochsForBlocks
} else {
minRequiredEpoch = 0
}
minRequiredEpoch := primitives.Epoch(0)
if currentEpoch > minEpochsForBlocks {
minRequiredEpoch = currentEpoch - minEpochsForBlocks
}



params.OverrideBeaconConfig(config)

t.Run("Valid update", func(t *testing.T) {
const initialSlot primitives.Slot = 50
Contributor

Better to use a grouped declaration:

const (
    initialSlot primitives.Slot = 50
    newSlot     primitives.Slot = 100
    ...
)

}

// Update the earliest available slot
if err := p.custody.UpdateEarliestAvailableSlot(earliestAvailableSlot); err != nil {
Contributor

p.custody.UpdateEarliestAvailableSlot could return the (possibly updated) custody group count to avoid calling p.custody.CustodyGroupCount later, before saving into the DB.

(As done in UpdateCustodyInfo.)
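A sketch of the suggested signature change (hypothetical types; the point is returning the count alongside the error so the caller can persist both values without a second call):

```go
package main

import "fmt"

// Slot stands in for Prysm's primitives.Slot.
type Slot uint64

// custodyInfo is a stand-in for the p2p service's stored custody state.
type custodyInfo struct {
	earliestAvailableSlot Slot
	custodyGroupCount     uint64
}

// UpdateEarliestAvailableSlot returns the (possibly updated) custody group
// count alongside the error, so callers can save both into the DB without a
// second round-trip, mirroring what UpdateCustodyInfo already does.
func (c *custodyInfo) UpdateEarliestAvailableSlot(s Slot) (uint64, error) {
	if s > c.earliestAvailableSlot {
		c.earliestAvailableSlot = s
	}
	return c.custodyGroupCount, nil
}

func main() {
	c := &custodyInfo{earliestAvailableSlot: 10, custodyGroupCount: 8}
	count, err := c.UpdateEarliestAvailableSlot(42)
	fmt.Println(count, err, c.earliestAvailableSlot)
}
```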

@nalepae nalepae removed the optimization PR making Prysm more effective label Oct 13, 2025
defer s.custodyInfoLock.Unlock()

if s.custodyInfo == nil {
return 0, 0, errors.New("no custody info available")
Contributor

Nit: Add a test case for that.

// required slot (based on MIN_EPOCHS_FOR_BLOCK_REQUESTS from current time).
// This prevents nodes from arbitrarily refusing to serve mandatory historical data.
if earliestAvailableSlot > storedEarliestAvailableSlot {
// If custody group count is NOT increasing, validate the increase is allowed
Contributor

It's better to invert the if condition (and return in the if) to reduce the indentation.

Contributor

What happens in case of backfill? Is the stored earliestAvailableSlot value decreased in the DB?
