feat: file-system based block storage #1825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v0.38.x-celestia
Conversation
Force-pushed from f75a9ce to 7390d94
Force-pushed from 7390d94 to f31baa7
To move conversations from the sprint here: @adlerjohn this PR isn't ready for review, but is it okay for the performance to be 5-10x lower? If not, what overhead would be acceptable to move forward with the feature? ref #1815
If this PR is coupled with #1815, and recent blocks use the current blockstore while ancient blocks are migrated to the file-based store, then the performance degradation shouldn't matter much, since it only affects ancient blocks? Not sure though.
Benchmark comparison against the existing KV store implementation:
Oh wow, interesting stats. First, we definitely shouldn't merge this PR as-is. On to the results: it looks like saving and loading blocks takes longer, but deleting blocks is much faster. That second part is a good sign! I'd guess that with some optimizations, the time to save and load blocks can be brought down significantly. There's almost certainly some low-hanging, suboptimal stuff there.
Since we can assume SSD operation, we should probably make writes more parallel (each file in a separate goroutine?) - another bonus is parallelizing the serialization process for better CPU utilization. Depending on the number and size of files, this might result in better performance. A rough sketch is below.
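A minimal sketch of what that could look like, assuming golang.org/x/sync/errgroup and a hypothetical per-file layout (the function name, parameters, and map of files are illustrative, not the PR's actual API):

```go
package blockstore

import (
	"os"
	"path/filepath"

	"golang.org/x/sync/errgroup"
)

// writeBlockFiles writes each serialized piece of a block to its own file,
// one goroutine per file, so serialization and disk writes overlap across cores.
// The map-of-files layout and names here are hypothetical.
func writeBlockFiles(dir string, files map[string][]byte) error {
	var g errgroup.Group
	for name, data := range files {
		name, data := name, data // capture loop variables for the goroutine
		g.Go(func() error {
			return os.WriteFile(filepath.Join(dir, name), data, 0o644)
		})
	}
	// Wait blocks until all writes finish and returns the first error, if any.
	return g.Wait()
}
```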
Out of curiosity I ran the benchmark - results below:
I would add: NVMe, XFS. The differences are smaller (~3x vs ~10x for SaveBlock), but still significant.
store/bench_test.go
Outdated
// setupBlockStore creates a new block store for benchmarking
func setupBlockStore(b *testing.B, storeType string) (sm.State, interface{}, func()) {
	config := test.ResetTestRoot("block_store_bench")
	stateStore := sm.NewStore(dbm.NewMemDB(), sm.StoreOptions{
For this test a file-based DB should be used, otherwise we're comparing saving to files vs saving to memory.
We should not expect reads and writes to be faster with file-based storage. Opening an FD takes a while. This is one of the reasons DBs exist and prefer to open a single/limited set of files, like SQLite.
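For illustration, a minimal sketch of backing the benchmark's state store with an on-disk goleveldb database instead of MemDB, assuming the cometbft-db and state package APIs used in the snippet above (the helper name is hypothetical, and import paths may differ in the celestia-core fork):

```go
package store_test

import (
	"testing"

	dbm "github.com/cometbft/cometbft-db"
	sm "github.com/cometbft/cometbft/state"
)

// newOnDiskStateStore backs the benchmark's KV store with an on-disk
// goleveldb database instead of an in-memory one, so both implementations
// pay for real filesystem I/O.
func newOnDiskStateStore(b *testing.B) sm.Store {
	dir := b.TempDir()
	db, err := dbm.NewDB("state_bench", dbm.GoLevelDBBackend, dir)
	if err != nil {
		b.Fatal(err)
	}
	b.Cleanup(func() { db.Close() })
	return sm.NewStore(db, sm.StoreOptions{DiscardABCIResponses: false})
}
```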
I agree that we shouldn't expect files to be much faster than a DB - there are overheads from the OS and filesystem, and from the DB engine (which also needs to use the OS and filesystem underneath). Very good read from SQLite - it also reminds us to keep internal fragmentation in mind.
And one more concern - it seems that there is a lot of redundancy (this is especially important for #1815):
For reading speed (if we would like to replace the KV-based store with a file store) it might make sense; for archiving, redundancy should be minimized. From a performance perspective it would be interesting to see how it would work if only the raw block is stored.
Part of #1815 is the intuition that recent and non-recent blocks might have different performance needs.
Force-pushed from d001305 to 4824028
Force-pushed from 4824028 to 3b68255
3x is more of what I would expect, and seems doable for an "ancient blockstore". We could make an even larger tradeoff if we compressed the blocks. Nice digging @tzdybal
@tzdybal I'm still getting some crazy SaveBlock times. What would you suggest for improvement?
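For illustration, a minimal sketch of compressing an already-serialized block with the standard library's gzip before writing it to disk (the function name and file layout are hypothetical, not what this PR implements):

```go
package blockstore

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"os"
	"path/filepath"
)

// writeCompressedBlock gzips already-serialized block bytes before writing
// them to disk. For an ancient/archival store, slower writes in exchange for
// smaller files may be an acceptable tradeoff. Name and layout are hypothetical.
func writeCompressedBlock(dir string, height int64, raw []byte) error {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip footer
		return err
	}
	name := filepath.Join(dir, fmt.Sprintf("block_%d.gz", height))
	return os.WriteFile(name, buf.Bytes(), 0o644)
}
```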
I started experimenting by increasing the block size by setting the number of txs to 500, 1000, etc. This seemingly made results almost even for both implementations. But the test results are also skewed by the generation of test data - it's quite significant. It's worth adding b.StopTimer() before the createTestingBlock call and b.StartTimer() after.
Also, the transactions are very small, which makes the resulting blocks also very small. They probably won't even execute some of the fancy logic related to buffering. Another quick change - using Tx%01000d as the format makes blocks way bigger, and this again changes the results significantly. But this is also very specific data - its entropy is very low and its compression ratio is very high (~20x), which may impact the results.
In general, I think test blocks should be created in a way that simulates something real - we can take a look at the average block size on mainnet, and the data should probably be randomized to some extent (to introduce more entropy and make compression less effective).
To make sure that the time measurement is correct, you also need to ensure that closing the DB is included in the measurement - otherwise, data might still be in buffers, not flushed to disk, which impacts results.
I would also opt for testing multiple blocks per benchmark "op" - for example (see the sketch after this list):
- Start timer.
- Save 1000 blocks.
- Close the DB to flush/sync it to disk.
- Stop timer.
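A minimal sketch of that benchmark structure: newBenchStore and createTestingBlock are hypothetical helpers, and the SaveBlock call is simplified (the real method takes additional arguments); the point is where the timer runs and where the store is closed.

```go
package store_test

import (
	"testing"

	"github.com/cometbft/cometbft/types"
)

// BenchmarkSaveBlocks measures saving a batch of blocks per benchmark op.
func BenchmarkSaveBlocks(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		// Open a fresh store and pre-generate blocks outside the timed
		// section so data generation doesn't skew the measurement.
		store, closeStore := newBenchStore(b) // hypothetical helper
		blocks := make([]*types.Block, 1000)
		for j := range blocks {
			blocks[j] = createTestingBlock(b, j) // hypothetical signature
		}
		b.StartTimer()

		for _, blk := range blocks {
			store.SaveBlock(blk) // simplified; the real method takes more arguments
		}
		// Closing the store flushes buffered data to disk, so it belongs
		// inside the timed section.
		closeStore()
		b.StopTimer()
	}
}
```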
Force-pushed from 2720e86 to 9021048
WIP
Closes #1812