internal/era: New EraE implementation#32157
Conversation
MariusVanDerWijden
left a comment
There was a problem hiding this comment.
lint is failing, you can check on your machine with make lint
| @@ -0,0 +1,429 @@ | |||
| package era2 | |||
lightclient
left a comment
There was a problem hiding this comment.
Left a bunch of comments here - we still need to figure out how to call this module since era2 isn't very elegant. In the meantime should rename builder2.go to just builder.go and era2.go to era.go.
| } | ||
| } | ||
|
|
||
| func (b *Builder) Add(header types.Header, body types.Body, receipts types.Receipts, blockhash common.Hash, blocknum uint64, td *big.Int, proof *Proof) error { |
There was a problem hiding this comment.
blockhash and blocknum are already available via header, so no need to duplicate them
| } | ||
| } | ||
|
|
||
| func (b *Builder) Add(header types.Header, body types.Body, receipts types.Receipts, blockhash common.Hash, blocknum uint64, td *big.Int, proof *Proof) error { |
There was a problem hiding this comment.
I would separate this into Add and AddRLP
| headersRLP [][]byte | ||
| bodiesRLP [][]byte | ||
| receiptsRLP [][]byte | ||
| proofsRLP [][]byte |
There was a problem hiding this comment.
would remove RLP suffix since it is pretty explanatory by the type [][]byte
| "github.com/klauspost/compress/snappy" | ||
| ) | ||
|
|
||
| type meta struct { |
There was a problem hiding this comment.
| type meta struct { | |
| type metadata struct { |
| type meta struct { | ||
| start uint64 // start block number | ||
| count uint64 // number of blocks in the era | ||
| compcount uint64 // number of properties |
There was a problem hiding this comment.
| compcount uint64 // number of properties | |
| components uint64 // number of properties |
| start uint64 // start block number | ||
| count uint64 // number of blocks in the era | ||
| compcount uint64 // number of properties | ||
| filelen int64 // length of the file in bytes |
There was a problem hiding this comment.
| filelen int64 // length of the file in bytes | |
| length int64 // length of the file in bytes |
| mu *sync.Mutex | ||
| headeroff, bodyoff, receiptsoff, tdoff, proofsoff []uint64 // offsets for each entry type | ||
| indstart int64 | ||
| rootheader uint64 // offset of the root header in the file if present |
There was a problem hiding this comment.
when reading each era file I load in the index table into cache to make lookup faster, indstart is where the byte where the index table starts, and the root header is where the accumulator root is present when reading so it can seek there quickly since it is its own e2store object. The mutex should be removed though, forgot to do so very early on when I didn't understand what the file was doing I thought I wouldn't want it to read and write at the same time so put a lock.
| m meta // metadata for the era2 file | ||
| mu *sync.Mutex | ||
| headeroff, bodyoff, receiptsoff, tdoff, proofsoff []uint64 // offsets for each entry type | ||
| indstart int64 |
|
Also, please write a description for your PRs and try to fix the CI errors so your code is all green. |
|
|
||
| func (*BlockProofHistoricalSummariesDeneb) Variant() proofvar { return proofDeneb } | ||
|
|
||
| func proofVariantOf(p Proof) proofvar { |
There was a problem hiding this comment.
| func proofVariantOf(p Proof) proofvar { | |
| func variantOf(p Proof) proofvar { |
| @@ -57,41 +57,33 @@ const ( | |||
|
|
|||
| type proofvar uint16 | |||
There was a problem hiding this comment.
| type proofvar uint16 | |
| type variant uint16 |
We know that the variant is for Proofs, should be at most a comment, not part of the variable name, otherwise you end up with the types like mev-boost :D
There was a problem hiding this comment.
gotcha will make the change :)
lightclient
left a comment
There was a problem hiding this comment.
Left a lot of style nits, but the main problems we'll need to work through are:
- utilize proof interface more, avoid handling variant values directly
- remove unneeded struct fields
- avoid building entire era file in memory, write incremental work to disk
| buff buffer | ||
| off offsets | ||
|
|
||
| prooftype variant |
There was a problem hiding this comment.
This should be removed. If you store the proofs as the interface object you will be able to determine the variant at anytime / alternatively if it makes better sense to store the bytes, then the bytes will already have included the variant type so it won't matter anymore.
| off offsets | ||
|
|
||
| prooftype variant | ||
| tdsint []*big.Int |
There was a problem hiding this comment.
This should be removed and tds in buffer should be big.Ints. Serialize them all at once during finalize.
| type Builder struct { | ||
| w *e2store.Writer | ||
| buf *bytes.Buffer | ||
| sn *snappy.Writer |
There was a problem hiding this comment.
| sn *snappy.Writer | |
| snappy *snappy.Writer |
| } | ||
|
|
||
| // retrieves the raw body frame in bytes of a specific block | ||
| func (e *Era) GetRawBodyFrameByNumber(blockNum uint64) ([]byte, error) { |
There was a problem hiding this comment.
what do you mean by "frame" here?
There was a problem hiding this comment.
It is the raw bytes that are written, without snappy decoding it and rlp decoding
There was a problem hiding this comment.
I think you should decode the snappy here (and other methods like this) and return the RLP bytes
| return io.ReadAll(r) | ||
| } | ||
|
|
||
| // retrieves the raw receipts frame in bytes of a specific block |
There was a problem hiding this comment.
| // retrieves the raw receipts frame in bytes of a specific block | |
| // GetRawReceiptsFrameByNumber retrieves the raw receipts frame in bytes of a specific block. |
| return io.ReadAll(r) | ||
| } | ||
|
|
||
| // retrieves the raw proof frame in bytes of a specific block proof |
There was a problem hiding this comment.
| // retrieves the raw proof frame in bytes of a specific block proof | |
| // GetRawProofFrameByNumber retrieves the raw proof frame in bytes of a specific block proof. |
| return io.ReadAll(r) | ||
| } | ||
|
|
||
| // loads in the index table containing all offsets and caches it |
There was a problem hiding this comment.
| // loads in the index table containing all offsets and caches it | |
| // loadIndex loads in the index table containing all offsets and caches it. |
| // Getter methods to calculate offset of a specific component in the file. | ||
| func (e *Era) headerOff(num uint64) (uint64, error) { return e.indexOffset(num, compHeader) } | ||
| func (e *Era) bodyOff(num uint64) (uint64, error) { return e.indexOffset(num, compBody) } | ||
| func (e *Era) rcptOff(num uint64) (uint64, error) { return e.indexOffset(num, compReceipts) } |
There was a problem hiding this comment.
| func (e *Era) rcptOff(num uint64) (uint64, error) { return e.indexOffset(num, compReceipts) } | |
| func (e *Era) receiptOff(num uint64) (uint64, error) { return e.indexOffset(num, compReceipts) } |
Really no reason to ever abbreviate by taking out vowels in golang.
| // following the Era format. | ||
| func ExportHistory(bc *core.BlockChain, dir string, first, last, step uint64) error { | ||
| log.Info("Exporting blockchain history", "dir", dir) | ||
| func ExportHistory(bc *core.BlockChain, dir string, first, last, step uint64, f ExportFormat) error { |
There was a problem hiding this comment.
I would not use an enum to dictate the output format, this should be done via the method naming, e.g.
ExportHistoryEra1(..)
ExportHistoryEraE(..)
| if f == Era1 { | ||
| filename = era.Filename | ||
| newBuilder = func(w io.Writer) any { return era.NewBuilder(w) } | ||
| add = func(b any, blk *types.Block, rcpt types.Receipts, td *big.Int) error { | ||
| return b.(*era.Builder).Add(blk, rcpt, td) | ||
| } | ||
| } else { | ||
| filename = era2.Filename | ||
| newBuilder = func(w io.Writer) any { return era2.NewBuilder(w) } | ||
| add = func(b any, blk *types.Block, rcpt types.Receipts, td *big.Int) error { | ||
| return b.(*era2.Builder).Add(*blk.Header(), *blk.Body(), rcpt, td, nil) | ||
| } | ||
| } |
There was a problem hiding this comment.
This is kind of impressive, but also is what an Interface is for :)
If you want different builders with the same methods (like Add) you can create the interface type and implement the method for both.
| receipts := bc.GetReceiptsByHash(block.Hash()) | ||
| if receipts == nil { | ||
| return fmt.Errorf("export failed on #%d: receipts not found", n) | ||
| rcpt := bc.GetReceiptsByHash(blk.Hash()) |
There was a problem hiding this comment.
| rcpt := bc.GetReceiptsByHash(blk.Hash()) | |
| receipts := bc.GetReceiptsByHash(blk.Hash()) |
|
|
||
| for j := uint64(0); j < step && batch+j <= last; j++ { | ||
| n := batch + j | ||
| blk := bc.GetBlockByNumber(n) |
There was a problem hiding this comment.
| blk := bc.GetBlockByNumber(n) | |
| block := bc.GetBlockByNumber(n) |
| ) | ||
| if block == nil { | ||
| return fmt.Errorf("export failed on #%d: not found", n) | ||
| bldr := newBuilder(f) |
There was a problem hiding this comment.
| bldr := newBuilder(f) | |
| builder := newBuilder(f) |
| tds []*big.Int | ||
| } | ||
|
|
||
| // The offsets holds the offsets of the different block components in the e2store file. Eventually these offsets will be used to write the index table at the end of the file. |
There was a problem hiding this comment.
| // The offsets holds the offsets of the different block components in the e2store file. Eventually these offsets will be used to write the index table at the end of the file. | |
| // offsets holds the offsets of the different block components in the e2store file. Eventually these offsets will be used to write the index table at the end of the file. |
| buf *bytes.Buffer | ||
|
|
||
| buff buffer |
There was a problem hiding this comment.
we still have buf and buff, I think we discussed removing buf since it isn't used until the end?
| // Add writes a block entry, its reciepts, and optionally its proofs as well into the e2store file. | ||
| func (b *Builder) Add(header types.Header, body types.Body, receipts types.Receipts, td *big.Int, proof Proof) error { | ||
| if len(b.buff.headers) == 0 { // first block determines wether proofs are expected | ||
| b.expectsProofs = proof != nil |
There was a problem hiding this comment.
i don't think you need to track this explicitly, just check if b.buff.proofs != nil and if proof == nil or vice versa. only special case is first block, which you allow for the b.buff.proofs to be nil even if proof is non-nil.
| if err != nil { | ||
| return common.Hash{}, fmt.Errorf("compute accumulator: %w", err) | ||
| } | ||
| if n, err := b.w.Write(TypeAccumulatorRoot, accRoot[:]); err != nil { |
There was a problem hiding this comment.
looks like this still needs to be addressed
| } | ||
|
|
||
| // retrieves the raw body frame in bytes of a specific block | ||
| func (e *Era) GetRawBodyFrameByNumber(blockNum uint64) ([]byte, error) { |
There was a problem hiding this comment.
I think you should decode the snappy here (and other methods like this) and return the RLP bytes
| type Iterator interface { | ||
| Next() bool | ||
| Number() uint64 | ||
| Block() (*types.Block, error) | ||
| Receipts() (types.Receipts, error) | ||
| Error() error | ||
| } |
There was a problem hiding this comment.
This interface looks right, but it's in the wrong place. We should define it in the era package. I will take a stab at this.
| func (era1Format) Filename(n string, e int, h common.Hash) string { return era.Filename(n, e, h) } | ||
| func (era1Format) NewBuilder(w io.Writer) Builder { return &era1Builder{era.NewBuilder(w)} } | ||
| func (era1Format) ReadDir(dir, net string) ([]string, error) { return era.ReadDir(dir, net) } | ||
| func (era1Format) NewIterator(f *os.File) (Iterator, error) { | ||
| e, err := era.From(f) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| return era.NewIterator(e) | ||
| } |
There was a problem hiding this comment.
you shouldn't need to redefine these methods to satisfy the interface - if the type implements the interface methods then it can be accepted anywhere the interface is accepted
| // starting from genesis. The assumption is held that the provided chain | ||
| // segment in Era1 file should all be canonical and verified. | ||
| func ImportHistory(chain *core.BlockChain, dir string, network string) error { | ||
| func ImportHistory(chain *core.BlockChain, dir string, network string, format Format) error { |
There was a problem hiding this comment.
So on this method and ExportHistory we can avoid using this Format type by just passing in the functions we need. For example, here we need a way to read all the entries and a way to create an iterator from an open file. If we just pass those two functions in, we can create an if statement in the cli handling so that we can reuse the function.
The issue with format is it is kind of a superfluous type.
828377d to
a85cbd5
Compare
1c247de to
b0dadc3
Compare
Fixed all issues including extra logic regarding proof types, modularizing some functions and refactoring code for correctness and readability.
| if err != nil { | ||
|
|
||
| var ( | ||
| format = ctx.String(utils.EraFormatFlag.Get(ctx)) |
There was a problem hiding this comment.
| format = ctx.String(utils.EraFormatFlag.Get(ctx)) | |
| format = ctx.String(utils.EraFormatFlag.Name) |
|
Also seen this panic: |
|
Export on mainnet completed in ~22 hours, import in ~62 hours. Definitely would be good to look into speeding up import some, but overall this is ready to come into geth. Thanks @shazam8253 for your hard work on this last summer - sorry it took so long to finally get in! Thank you @s1na for pushing it over the edge. I've been treading water here for too long. |
|
@lightclient @s1na Thank you guys for pushing this through, super happy to see this! Had a great time this summer, and hopefully when I have more bandwidth I can make a PR again soon. |
This PR allows users to prune their nodes up to the Prague fork. It indirectly depends on #32157 and can't really be merged before eraE files are widely available for download. The `--history.chain` flag becomes mandatory for `prune-history` command. Here I've listed all the edge cases that can happen and how we behave: ## prune-history Behavior | From | To | Result | |-------------|--------------|--------------------------| | full | postmerge | ✅ prunes | | full | postprague | ✅ prunes | | postmerge | postprague | ✅ prunes further | | postprague | postmerge | ❌ can't unprune | | any | all | ❌ use import-history | ## Node Startup Behavior | DB State | Flag | Result | |-------------|--------------|----------------------------------------------------------------| | fresh | postprague | ✅ syncs from Prague | | full | postprague | ❌ "run prune-history first" | | postmerge | postprague | ❌ "run prune-history first" | | postprague | postmerge | ❌ "can't unprune, use import-history or fix flag" | | pruned | all | ✅ accepts known prune points |
Here is a draft for the New EraE implementation. The code follows along with the spec listed at this link.