go/oasis-node/cmd/storage: Add create and import checkpoint cmd#6454
martintomazic wants to merge 3 commits into master
Works! :) The only impractical parts are finding the runtime rounds that correspond to a given consensus height, and the fact that bootstrap "eats" one height, as described. Finally, one should be very careful with the creation/import heights and rounds, so that all the relevant light history is available for the runtime checkpoints being imported.
if height != 0 { // TODO handle zero value vs not set correctly.
	if err := createConsensusCp(); err != nil {
		return fmt.Errorf("failed to create consensus checkpoint (height: %d): %w", height, err)
Maybe use the default undefined round (i.e., max uint64); an alternative is cmd.Flags().Changed("height").
Update: Another alternative is an explicit --consensus flag, or possibly consensus/runtime sub-commands. Omitting height/round could also mean the latest height. An -all flag combined with --height would also be interesting if it found the corresponding runtime rounds for the given height.
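A minimal sketch of the two options mentioned above (a max-uint64 sentinel default versus checking whether the flag was actually passed). It uses the standard library `flag` package so that it is self-contained; with cobra/pflag the second check would be `cmd.Flags().Changed("height")`. The names `undefinedHeight` and `parseHeight` are hypothetical and not taken from the PR.

```go
package main

import (
	"flag"
	"fmt"
	"math"
)

// undefinedHeight is a hypothetical sentinel meaning "the flag was not set".
const undefinedHeight uint64 = math.MaxUint64

// parseHeight parses args and returns the height together with a boolean
// that reports whether --height was passed explicitly.
func parseHeight(args []string) (uint64, bool, error) {
	fs := flag.NewFlagSet("create", flag.ContinueOnError)
	height := fs.Uint64("height", undefinedHeight, "consensus height to checkpoint")
	if err := fs.Parse(args); err != nil {
		return 0, false, err
	}
	// Visit only iterates over flags that were set on the command line,
	// so an explicit "--height 0" is distinguishable from an omitted flag.
	set := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "height" {
			set = true
		}
	})
	return *height, set, nil
}

func main() {
	h, set, _ := parseHeight([]string{"--height", "0"})
	fmt.Println(h, set)

	h, set, _ = parseHeight(nil)
	fmt.Println(h == undefinedHeight, set)
}
```

Either approach removes the need to treat `height != 0` as "flag was set", which is what the TODO in the hunk above is about.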
Currently things are not fine, as you can create consensus and runtime checkpoints at the same time. Furthermore, this might be confusing for users, e.g., do they need to set height, round, or both?
Yes, I left it this way intentionally for now. I can easily allow only one at a time. The question is whether using sub-commands would make things even clearer. Also, any preference for what omitting height/round should do?
Observe also that if you want to be able to create checkpoints for multiple heights/versions (same NodeDB), the import command also grows in complexity.
Finally, one should be very careful about which height and runtime round combinations checkpoints are created for, as you can easily end up missing runtime light blocks, and therefore with a stuck runtime state restore.
We could also make the import command use consensus/runtime/height/round and import the directories produced by create one by one, to make it symmetric to create.
Any preference?
Creating checkpoints from the penultimate snapshot is dominated by the Sapphire checkpoint creation. With 6 chunker threads, the current projection is 5-7 hours (will update). Import is a matter of minutes.
Added unit and e2e tests, fixed empty state corner case and improved code quality. Two minor things left to discuss:
Codecov Report ❌
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6454      +/-   ##
==========================================
- Coverage   64.73%   64.56%   -0.18%
==========================================
  Files         699      700       +1
  Lines       68246    68581     +335
==========================================
+ Hits        44179    44279     +100
- Misses      19060    19183     +123
- Partials     5007     5119     +112
peternose left a comment
When I import a consensus checkpoint, I get a few lines of the following error. Afterwards, blocks execute normally.
{"caller":"grpc.go:194","err":"failed to get consensus status: failed to fetch current block: cometbft: block query failed: height 28800866 must be less than or equal to the current blockchain height 0","level":"error","method":"/oasis-core.NodeController/GetStatus","module":"grpc/internal","msg":"request failed","req_seq":15,"ts":"2026-02-24T13:00:28.934344662Z"}
	return cmd
}

func createCheckpoints(ctx context.Context, ndb api.NodeDB, ns common.Namespace, version uint64, outputDir string) error {
Maybe creating a struct checkpointer would be better, as you could create multiple checkpoints with the same struct, e.g.
cp.Create(ctx, 1, "/checkpoints/1")
cp.Create(ctx, 2, "/checkpoints/2")
...
or even without outputDir if accepted in the constructor. The new struct might also be easier to test and could be decoupled from the commands.
update:
The new struct might also be easier to test and could be decoupled from the commands.
newCheckpointer + cp.create (proposed) = createCheckpoints (current) so this is only a matter of style.
As usual, I prefer an explicit function over abstractions until multiple methods share the same parameter set.
The question is whether we want to allow multiple checkpoint heights/rounds for the same NodeDB.
If you want to make this a generic helper, I think it would fit inside the checkpoint package.
See (#6467):
// Consider using functional optional arguments to shorten args list.
CreateCheckpoint(ctx context.Context, ndb db.NodeDB, store Store, root node.Root, chunkSize uint64, chunkerThreads uint16) (*Metadata, error)
// Maybe add CreateAllCheckpoints as well and avoid passing root there.
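To make the "functional optional arguments" idea concrete, here is a minimal sketch of the pattern applied to the two tuning parameters in the signature above. The type and function names (`createConfig`, `CreateOption`, `WithChunkSize`, `WithChunkerThreads`, `newCreateConfig`) and the default values are assumptions for illustration only; they are not the API of the checkpoint package.

```go
package main

import "fmt"

// createConfig collects the optional tuning parameters.
type createConfig struct {
	chunkSize      uint64
	chunkerThreads uint16
}

// CreateOption mutates a createConfig; callers pass zero or more of these.
type CreateOption func(*createConfig)

// WithChunkSize overrides the chunk size in bytes.
func WithChunkSize(n uint64) CreateOption {
	return func(c *createConfig) { c.chunkSize = n }
}

// WithChunkerThreads overrides the number of chunker threads.
func WithChunkerThreads(n uint16) CreateOption {
	return func(c *createConfig) { c.chunkerThreads = n }
}

// newCreateConfig starts from hypothetical defaults and applies the
// options in the order they were given.
func newCreateConfig(opts ...CreateOption) createConfig {
	cfg := createConfig{chunkSize: 8 << 20, chunkerThreads: 1}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", newCreateConfig())
	fmt.Printf("%+v\n", newCreateConfig(WithChunkerThreads(6)))
}
```

With this pattern, a `CreateCheckpoint`-style function would keep only its required arguments and take `opts ...CreateOption` last, so call sites name only the parameters they change.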
Yes, it is a matter of style, but also of the quality/complexity of the code. The more parameters you have, the harder the code is to follow. And as suggested, compare the code you get when creating multiple checkpoints for different versions with a function versus with a method call.
Nice catch. Yes, this is also how CometBFT checkpoint import works, but I found a fixup regardless :). The more annoying thing I find is that you technically cannot import a checkpoint for the latest height, so adding extra validation and documenting this in the command would probably be better than an unexpected error.
Ready for a second review. As you spotted I am "abusing" However, this command is technically not For this reason I have created my helpers (stateless), so that they can also be easily refactored, possibly moved to Let's align on the user facing API and sanity checking the inputs:
Will have a look. Merge after we release 26.0.
Ideally, the test should be hardened by also making sure that the target node syncs up to the tip of the runtime chain, not just consensus. Furthermore, given that e2e tests are expensive and meant to test complex scenarios, my suggestion would be to also run the prune, compact, and inspect commands on the source node prior to creating a checkpoint. This way we would "smoke test" the remaining storage commands, and the scenario could be called storaged_cmds instead.
Closes #6423