@@ -0,0 +1,10 @@
{
"changes": [
{
"packageName": "@subsquid/util-internal-archive-layout",
"comment": "fix raw writer when batch size is too small",
"type": "patch"
}
],
"packageName": "@subsquid/util-internal-archive-layout"
}
1 change: 1 addition & 0 deletions util/util-internal-archive-layout/src/layout.ts
@@ -444,4 +444,5 @@ function* pack<T>(items: T[], size: number): Iterable<T[]> {
}

items.splice(0, offset)
yield items
Collaborator

This breaks the whole idea of committing the data in fixed-size chunks, which is crucial for reproducibility.
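A minimal sketch of the packer under discussion (modeled on the `pack<T>()` generator shown in the diff, not the actual `layout.ts` implementation) illustrates the objection: once the trailing `items` are yielded, the final chunk can be shorter than `size`, so chunk boundaries depend on where the stream happens to end.

```typescript
// Hypothetical sketch of a fixed-size packer, assuming the shape of
// pack<T>() from the diff context; not the real layout.ts code.
function* pack<T>(items: T[], size: number): Iterable<T[]> {
    let offset = 0
    // Emit only complete, fixed-size chunks.
    while (items.length - offset >= size) {
        yield items.slice(offset, offset + size)
        offset += size
    }
    // Drop the consumed items; what remains is a partial chunk.
    items.splice(0, offset)
    // The disputed line: emitting the remainder produces a final chunk
    // of length < size, so chunk boundaries are no longer reproducible.
    yield items
}
```

With `size = 2` and five items, this yields `[1, 2]`, `[3, 4]`, and then a short final chunk `[5]`.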

Contributor Author

But if we keep this code as it is, some chunks will never be complete.
I recently discovered that the Solana raw dataset had a gap of ~5 blocks. Two dump instances were running: one starting from a specific block without `--last-block`, and another with `--last-block`. The one with `--last-block`, instead of writing a final chunk of 5 blocks, finished its process without writing anything at all.
This only happened because the dumper restarted before that final chunk was due; otherwise it would still have been able to write it.

Collaborator

The bug is not here. It's just that blocks left unwritten after the end of the ingest loop should be saved -
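The reviewer's suggestion can be sketched as follows (a hypothetical illustration, assuming names like `packStrict` and `ingest`; not the actual layout.ts code): keep the packer strict about fixed-size chunks, and flush whatever is left over explicitly once the ingest loop ends.

```typescript
// Hypothetical sketch: a strict packer plus an explicit final flush.
// packStrict yields only complete chunks and leaves the remainder
// in `items` for the caller to handle.
function* packStrict<T>(items: T[], size: number): Iterable<T[]> {
    let offset = 0
    while (items.length - offset >= size) {
        yield items.slice(offset, offset + size)
        offset += size
    }
    items.splice(0, offset) // leftovers stay in `items`
}

function ingest(
    blocks: number[],
    chunkSize: number,
    write: (chunk: number[]) => void
): void {
    const buffer: number[] = []
    for (const block of blocks) {
        buffer.push(block)
        for (const chunk of packStrict(buffer, chunkSize)) {
            write(chunk)
        }
    }
    // After the loop: save whatever is left, so a run bounded by
    // --last-block still persists its final, possibly short, chunk.
    if (buffer.length > 0) {
        write(buffer)
    }
}
```

This keeps chunk boundaries reproducible for every full chunk while still guaranteeing that a bounded run writes its tail.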

}