Conversation

@duyquang6
Contributor

Summary

  • Optimizes append_history_index

Profiling append_history_index:

[profiling flamegraph of append_history_index]
  • The Vec collect, delete_current, and IntegerList::new_presorted cursor operations dominate append_history_index

Changes

  • Replace seek_exact + delete + insert with seek_exact + upsert, removing the delete_current operation from the hot path
  • Fast-path optimization: when the appended indices fit within the current shard (≤2000 indices), upsert directly without collecting into a Vec, avoiding the allocation, clone, and chunking overhead (a toy sketch follows below)
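
For illustration, a self-contained toy sketch of the fast path (not the reth implementation; `SHARD_SIZE`, `Shard`, and the `BTreeMap` below stand in for `sharded_key::NUM_OF_INDICES_IN_SHARD`, `BlockNumberList`, and the MDBX table):

```rust
use std::collections::BTreeMap;

// Stand-in for sharded_key::NUM_OF_INDICES_IN_SHARD (2000 in reth).
const SHARD_SIZE: usize = 4;

type Shard = Vec<u64>;
// Key = highest block number covered by a shard; the open (last) shard is keyed by u64::MAX.
type Db = BTreeMap<u64, Shard>;

/// Returns true if the fast path applied.
fn append_fast_path(db: &mut Db, new_indices: &[u64]) -> bool {
    let last = db.entry(u64::MAX).or_default();
    if last.len() + new_indices.len() <= SHARD_SIZE {
        // Fast path: extend the open shard in place. In MDBX terms this is a single
        // `upsert` under the same key, with no intermediate Vec and no `delete_current`.
        last.extend_from_slice(new_indices);
        true
    } else {
        // Slow path: the merged indices would be re-chunked into full shards here.
        false
    }
}

fn main() {
    let mut db = Db::new();
    assert!(append_fast_path(&mut db, &[1, 2]));
    assert!(append_fast_path(&mut db, &[3, 4]));
    assert!(!append_fast_path(&mut db, &[5])); // would overflow the open shard
    assert_eq!(db[&u64::MAX], vec![1, 2, 3, 4]);
}
```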

Before

[flamegraph: before the change]

After

[flamegraph: after the change]

Profile result:
[profiling comparison]

@mediocregopher
Collaborator

This is actually a footgun with the MDBX implementation: DUPSORT tables do not support upsert as you would expect. When upsert is called on a DUPSORT table, it will actually only insert and leave any previous entry in place.

/// For a DUPSORT table, `upsert` will not actually update-or-insert. If the key already exists,
/// it will append the value to the subkey, even if the subkeys are the same. So if you want
/// to properly upsert, you'll need to `seek_exact` & `delete_current` if the key+subkey was
/// found, before calling `upsert`.

So unless this PR only affects upsert on non-DUPSORT tables we cannot merge it.
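
For reference, a minimal sketch of the workaround that doc comment describes, written against a hypothetical cursor trait rather than reth's actual cursor API:

```rust
/// Hypothetical stand-in for a DUPSORT cursor; reth's real trait and signatures differ.
trait DupCursor<K: Clone, V> {
    type Err;
    fn seek_exact(&mut self, key: K) -> Result<Option<(K, V)>, Self::Err>;
    fn delete_current(&mut self) -> Result<(), Self::Err>;
    fn upsert(&mut self, key: K, value: V) -> Result<(), Self::Err>;
}

/// A true update-or-insert on a DUPSORT table: delete the existing entry first,
/// otherwise `upsert` just appends another value under the same key.
/// (The real check also matches the subkey; omitted here for brevity.)
fn true_upsert<K: Clone, V, C: DupCursor<K, V>>(
    cursor: &mut C,
    key: K,
    value: V,
) -> Result<(), C::Err> {
    if cursor.seek_exact(key.clone())?.is_some() {
        cursor.delete_current()?;
    }
    cursor.upsert(key, value)
}
```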

@duyquang6
Contributor Author

> This is actually a footgun with the MDBX implementation: DUPSORT tables do not support upsert as you would expect. When upsert is called on a DUPSORT table, it will actually only insert and leave any previous entry in place.
>
> /// For a DUPSORT table, `upsert` will not actually update-or-insert. If the key already exists,
> /// it will append the value to the subkey, even if the subkeys are the same. So if you want
> /// to properly upsert, you'll need to `seek_exact` & `delete_current` if the key+subkey was
> /// found, before calling `upsert`.
>
> So unless this PR only affects upsert on non-DUPSORT tables we cannot merge it.

Hi sir, nice catch!

I've added a runtime assertion to prevent append_history_index from being used with DUPSORT tables. As for current usage, this function is only used for two tables, AccountsHistory and StorageHistory, and both are non-DUPSORT.
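
A minimal, self-contained sketch of the kind of guard described here (the `TableMeta` trait and `DUPSORT` flag are stand-ins; reth's real table metadata API differs):

```rust
/// Stand-in table descriptor; reth's real `Table` trait differs.
trait TableMeta {
    const NAME: &'static str;
    const DUPSORT: bool;
}

struct AccountsHistory; // the history tables are not DUPSORT
impl TableMeta for AccountsHistory {
    const NAME: &'static str = "AccountsHistory";
    const DUPSORT: bool = false;
}

/// Fail fast if the upsert-based fast path were ever used with a DUPSORT table,
/// where `upsert` appends instead of replacing.
fn assert_not_dupsort<T: TableMeta>() {
    assert!(!T::DUPSORT, "append_history_index is not supported for DUPSORT table {}", T::NAME);
}

fn main() {
    assert_not_dupsort::<AccountsHistory>(); // passes: AccountsHistory is non-DUPSORT
}
```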

@mediocregopher
Collaborator

> Hi sir, nice catch!
>
> I've added a runtime assertion to prevent append_history_index from being used with DUPSORT tables. As for current usage, this function is only used for two tables, AccountsHistory and StorageHistory, and both are non-DUPSORT.

Ah nice, I thought the history tables were dupsort but you're right, they are not, my bad. Thanks for adding the assert 👍

Collaborator

@mediocregopher left a comment


This LGTM but would definitely like a sanity check from @shekhirin or @joshieDo

@github-project-automation github-project-automation bot moved this from Backlog to In Progress in Reth Tracker Nov 20, 2025
This reverts commit 6a16d17.
@shekhirin
Collaborator

shekhirin commented Nov 21, 2025

^ committed the above to the wrong branch, sorry -.-

@shekhirin
Collaborator

I see, this makes sense! I think we can even improve it further, like this, to avoid collecting last_shard into a vector first.

diff --git a/crates/storage/provider/src/providers/database/provider.rs b/crates/storage/provider/src/providers/database/provider.rs
index ca0564bbfa..b0baef3cd4 100644
--- a/crates/storage/provider/src/providers/database/provider.rs
+++ b/crates/storage/provider/src/providers/database/provider.rs
@@ -833,21 +833,19 @@ impl<TX: DbTxMut + DbTx + 'static, N: NodeTypes> DatabaseProvider<TX, N> {
             }
 
             // slow path: rechunk into multiple shards
-            let all_indices: Vec<u64> = last_shard.iter().collect();
-            let mut chunks = all_indices.chunks(sharded_key::NUM_OF_INDICES_IN_SHARD).peekable();
+            let chunks = last_shard.iter().chunks(sharded_key::NUM_OF_INDICES_IN_SHARD);
+            let mut chunks_peekable = chunks.into_iter().peekable();
 
-            while let Some(list) = chunks.next() {
-                let highest_block_number = if chunks.peek().is_some() {
-                    *list.last().expect("`chunks` does not return empty list")
+            while let Some(chunk) = chunks_peekable.next() {
+                let shard = BlockNumberList::new_pre_sorted(chunk);
+                let highest_block_number = if chunks_peekable.peek().is_some() {
+                    shard.iter().next_back().expect("`chunks` does not return empty list")
                 } else {
                     // Insert last list with `u64::MAX`.
                     u64::MAX
                 };
 
-                cursor.upsert(
-                    sharded_key_factory(partial_key, highest_block_number),
-                    &BlockNumberList::new_pre_sorted(list.iter().copied()),
-                )?;
+                cursor.upsert(sharded_key_factory(partial_key, highest_block_number), &shard)?;
             }
         }

@duyquang6
Contributor Author

duyquang6 commented Nov 21, 2025

> I see, this makes sense! I think we can even improve it further, like this, to avoid collecting last_shard into a vector first.

Thanks! This should be more efficient since we avoid the intermediate allocation. Updated.

@emhane emhane added C-perf A change motivated by improving speed, memory usage or disk footprint A-db Related to the database labels Nov 24, 2025