Conversation
itegulov
left a comment
Mostly drive-by comments; I didn't look closely at the main logic.
Also, I'd elaborate on the rollout instructions a little. Right now it's not exactly clear what the operator needs to choose in the new config (i.e. either mention that there is no need to specify anything and the current default is reasonable enough, or put an example in the description).
Test results: 186 tests, 186 ✅, 19m 38s ⏱️. Results for commit ac6b459. ♻️ This comment has been updated with latest results.
let's update the PR title - it will end up in the changelog, so let's make it self-explanatory. Something like "Store FRI proofs locally, not in S3".
node/bin/src/config/mod.rs
Outdated
| /// The disk usage in bytes for batches with proofs,
| /// old entries are removed to keep usage capped
| #[config(default_t = 1073741824)]
| pub batch_with_proof_capacity: u64,
Can we use smarter units here? E.g. in yaml: `batch_with_proof_capacity=1gb`. Smart config supports this for some other units (e.g. time); not sure about sizes. If not, please add the unit suffix to the field name (i.e. `_bytes`).
Besides, I feel like the field names are not very clear. This is mitigated by the comment, but I'd still call it batch_with_proof_size_limit_bytes and batch_with_failed_proof_size_limit_bytes.
@slowli does smart config support gb/mb out of the box? If not - no prob, let's just use bytes.
also @kostia244 please add the default value disambiguation - so that one doesn't need to count digits to understand what the default value is (1gb)
There is ByteSize type, I will change the code to use it
| impl StoredObject for StoredBatch {
| const BUCKET: Bucket = Bucket("fri_batch_envelopes");
| type Key<'a> = u64;
| /// Persist a BatchWithProof. Overwrites any existing entry for the same batch.
Given that we store it for debug purposes, I'd not overwrite but keep both batches on disk. Maybe just add the timestamp as a suffix?
Do you want me to change the code to expose the duplicates through the API as well?
Good question - no, I don't think we should. We never used the said API in the real world. Let's not invest time in it. But it's good to have those files on disk.
| failed: Arc<Mutex<BoundedFileStorage>>,
| }
| impl ProofStorage {
| // This limit is present because we scan the files on every write to get the size
It's a little asymmetric that we limit both the number of files and their size, but the number is set as a constant in code whereas the size is set in config. I would probably also extract this value to the config.
Alternatively, maybe we can get away without this number at all? We could cache the current size in memory (as a field of the BoundedFileStorage struct) and delete old files when it's exceeded. We can iterate over files from oldest and stop the loop once enough space is cleared.
On startup we would of course need to iterate over all files to populate this field. We would also log a warning if on startup there is too much data - this way we'll see if there is a bug in this logic.
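For what it's worth, the eviction loop described above could look roughly like this (a sketch only with hypothetical names, not the PR's actual code):

```rust
use std::collections::VecDeque;

// Hypothetical sketch of the proposed scheme: cache the total size in
// memory and evict oldest entries until the incoming file fits.
struct BoundedStorageSketch {
    capacity_bytes: u64,
    current_size: u64,
    // (file name, size), ordered oldest -> newest
    entries: VecDeque<(String, u64)>,
}

impl BoundedStorageSketch {
    // Returns the names of files that would be deleted from disk.
    fn make_room(&mut self, incoming: u64) -> Vec<String> {
        let mut deleted = Vec::new();
        while self.current_size + incoming > self.capacity_bytes {
            let Some((name, size)) = self.entries.pop_front() else {
                break; // nothing left to evict
            };
            self.current_size -= size;
            deleted.push(name);
        }
        deleted
    }
}

fn main() {
    let mut storage = BoundedStorageSketch {
        capacity_bytes: 100,
        current_size: 90,
        entries: VecDeque::from([("a".to_string(), 50), ("b".to_string(), 40)]),
    };
    // 90 + 30 > 100, so the oldest entry ("a", 50 bytes) is evicted.
    assert_eq!(storage.make_room(30), vec!["a".to_string()]);
    assert_eq!(storage.current_size, 40);
}
```

The startup scan would just populate `current_size` and `entries` once, so each write stays O(evicted files) instead of rescanning the directory.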
AFAIK there isn't a way to get the oldest file from directory. To make it efficient for large number of files we need to cache the file names & timestamps. Also there will be extra logic to make this consistent with duplicate handling.
It's not hard to do, but I wouldn't want to complicate things unless we're actually going to have many files. Are we?
> AFAIK there isn't a way to get the oldest file from directory. To make it efficient for large number of files we need to cache the file names & timestamps. Also there will be extra logic to make this consistent with duplicate handling.

Ah, I thought you could iterate in the order of created_at, but apparently not.

> It's not hard to do, but I wouldn't want to complicate things unless we're actually going to have many files. Are we?

I can imagine... Ultimately, for very fast interop we may want to commit as often as there is an L1 block - every 12 seconds on L1 - which is like 7K batches per day.
RomanBrodetski
left a comment
please see comments
node/bin/src/config/mod.rs
Outdated
| pub failed_capacity: ByteSize,
| }
| impl Default for ProofStorageConfig {
This implementation can be derived by placing #[config(derive(Default))] on the config.
| .duration_since(UNIX_EPOCH)
| .unwrap()
| .as_secs();
| path_buf.set_extension(format!("overwritten_{now}"));
Bear in mind that set_extension will overwrite the previously set extension. I'd strictly append this suffix instead.
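To illustrate the pitfall (a standalone sketch, not the PR code): `set_extension` replaces the existing extension, so the `.json` part is lost, while pushing onto the underlying `OsString` preserves it:

```rust
use std::path::PathBuf;

// Appends ".<suffix>" to the full file name instead of replacing
// the current extension.
fn append_suffix(path: &PathBuf, suffix: &str) -> PathBuf {
    let mut name = path.clone().into_os_string();
    name.push(format!(".{suffix}"));
    PathBuf::from(name)
}

fn main() {
    let path = PathBuf::from("batch_5.json");

    // set_extension drops the original ".json":
    let mut replaced = path.clone();
    replaced.set_extension("overwritten_1700000000");
    assert_eq!(replaced, PathBuf::from("batch_5.overwritten_1700000000"));

    // Appending keeps it:
    assert_eq!(
        append_suffix(&path, "overwritten_1700000000"),
        PathBuf::from("batch_5.json.overwritten_1700000000")
    );
}
```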
| let old_data = fs::read(&path_buf).await?;
| fs::remove_file(&path_buf).await?;
| self.current_size -= old_data.len() as u64;
This looks inefficient. Can't the file be renamed instead, and its len obtained via metadata()?
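A sketch of the suggested alternative (hypothetical helper; using blocking `std::fs` here rather than the async `fs` in the PR): the size comes from `metadata()` and the bytes are never read or copied:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Renames the file in place and returns its size, obtained from
// metadata instead of reading the contents into memory.
fn archive_in_place(from: &Path, to: &Path) -> io::Result<u64> {
    let len = fs::metadata(from)?.len();
    fs::rename(from, to)?;
    Ok(len)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let from = dir.join("archive_in_place_demo.json");
    let to = dir.join("archive_in_place_demo.json.old");
    fs::write(&from, b"proof")?;

    assert_eq!(archive_in_place(&from, &to)?, 5);
    assert!(!from.exists() && to.exists());

    fs::remove_file(&to)
}
```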
| pub fn batch_number(&self) -> u64 {
| self.failed_proof.batch_number
| /// Delete old files to make space for the new file
| /// Returns disk usage
Outdated comment? The method doesn't return a value.
| if let Some(duplicates) = self.skip_cnt.get_mut(file)
| && *duplicates > 0
| {
| *duplicates -= 1;
| continue;
| }
I don't understand this piece of logic. IIUC, duplicates are not counted against current_size (deducted in handle_duplicate), so why are files with duplicates skipped? Also, shouldn't the skip_cnt entry be removed together with the file below?
I changed the code to make the logic clearer. In short: when we overwrite, the old entry in the queue is marked as outdated and we push both the new file and the old file to the back of the queue.
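If it helps review, my understanding of that flow as a standalone sketch (hypothetical names; field types simplified from the real `erase_queue`/`skip_cnt`):

```rust
use std::collections::{HashMap, VecDeque};

// On overwrite, the live file's stale queue entry is not removed
// eagerly; instead its skip counter is bumped and both the archived
// copy and the live file are pushed to the back of the queue.
struct EraseQueueSketch {
    queue: VecDeque<String>,
    skip_count: HashMap<String, u64>,
}

impl EraseQueueSketch {
    fn on_overwrite(&mut self, live: String, archived: String) {
        *self.skip_count.entry(live.clone()).or_insert(0) += 1;
        self.queue.push_back(archived);
        self.queue.push_back(live);
    }

    // Pops the next file that should actually be erased, skipping
    // stale entries and decrementing their counters.
    fn pop_erasable(&mut self) -> Option<String> {
        while let Some(name) = self.queue.pop_front() {
            if let Some(skips) = self.skip_count.get_mut(&name) {
                if *skips > 0 {
                    *skips -= 1;
                    continue; // stale entry for a re-pushed file
                }
            }
            return Some(name);
        }
        None
    }
}

fn main() {
    let mut q = EraseQueueSketch {
        queue: VecDeque::from(["batch_5".to_string()]),
        skip_count: HashMap::new(),
    };
    q.on_overwrite("batch_5".to_string(), "batch_5.old".to_string());
    // The archived copy is evicted first, then the live file.
    assert_eq!(q.pop_erasable(), Some("batch_5.old".to_string()));
    assert_eq!(q.pop_erasable(), Some("batch_5".to_string()));
    assert_eq!(q.pop_erasable(), None);
}
```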
| /// removes old files to enforce capacity constraints and
| /// returns disk usage
| async fn store<T: Serialize>(&mut self, key: &str, value: &T) -> anyhow::Result<u64> {
| fs::create_dir_all(&self.base_dir).await?;
Can key represent a path (e.g., contain /s?)? It's weird that base_dir is created if missing but the parent dir for the stored file is not.
I implicitly assume it cannot, especially in new, where I get the list of files. I'll make this more clear.
| async fn store<T: Serialize>(&mut self, key: &str, value: &T) -> anyhow::Result<u64> {
| fs::create_dir_all(&self.base_dir).await?;
| let file_path = self.base_dir.join(key);
IMO, switching to a path here for handle_duplicate and write_file is confusing; both methods logically deal with file keys.
| pub struct StoredFailedProof {
| pub failed_proof: FailedFriProof,
| /// Storage for data blobs that
| /// automatically removes old files to keep disk usage within capacity_bytes
I'm out of context, but I'd suggest to remove files older than (configurable?) time interval, e.g. 1 month instead. This has less counterintuitive pitfalls like not saving large files (or in general, removing files still in use), and can probably be implemented statelessly, by running cleanup on node start and on schedule.
The node can recover just fine if these files are missing. The biggest problem here is running out of space on the disk. Disk usage per batch depends on the chain, so using a time interval would make it hard to predict.
I also prefer a disk usage cap. Cleanup is not meant to make sure we don't accumulate junk, but to ensure the node won't run out of disk.
I also agree with the general feeling -- we won't be able to save big files, but the overall proof size is capped.
| .await?
| .is_some()
| );
| //This should remove all the old entries
Bikeshedding: Comments without a space afterwards visually look like commented out code; I'd suggest to always use a space after a comment.
| impl ProofStorage {
| pub async fn new(config: ProofStorageConfig) -> anyhow::Result<Self> {
| tracing::info!(
| path = config.path.to_str().unwrap(),
There's non-panicking Path::display().
EmilLuta
left a comment
Overall looks good. Left a few nits. Please check the other comments as well.
node/bin/src/lib.rs
Outdated
| .unwrap();
Can we use an expect here? Let's try to provide a useful/meaningful message.
| pub max_fris_per_snark: usize,
| /// Default: backed by files under `./db/shared` folder.
| #[config(nest, default)]
I think the comment is still relevant -- at least the default path.
| erase_queue: VecDeque<(PathBuf, Metadata)>,
| // `value` is the number of times the `key` has been moved back in `erase_queue`
| // So while this number is not zero we won't erase the file, but decrement this
| skip_cnt: HashMap<String, u64>,
Maybe it's just me, but this naming doesn't add a lot of info about what's going on. skip_cnt tells me to skip continuation? Is there something more descriptive?
P.S. The description is great, but doesn't provide input on how files move to and out of the queue.
agreed - I'd at least rename to skip_count
| type Key<'a> = u64;
| impl BoundedFileStorage {
| async fn new(base_dir: PathBuf, capacity_bytes: u64) -> anyhow::Result<Self> {
| // List all files sorted by timestamp (descending)
Dumb question, but given the directory is just created, shouldn't it be empty?
Logically it's "if dir doesn't exist, create it, otherwise get data about files". But this way it looks cleaner IMO.
| fn encode_key(key: Self::Key<'_>) -> String {
| format!("failed_fri_proof_{key}.json")
| let mut res = Self {
Bikeshedding: Interesting choice for variable name.
Force-pushed from dd1b0dd to a57304b.
@RomanBrodetski Please check the updated version
integration-tests/src/lib.rs
Outdated
| .map(|t| t.path())
| .unwrap_or(tempdir.path())
| .join("object_store");
| //ENs will not use this dir
Suggested change:
- //ENs will not use this dir
+ // ENs will not use this dir
| let meta = entry.metadata().await?;
| if meta.is_file() {
| let filename = entry.file_name().into_string();
| if let Ok(filename) = filename {
There is no need for a local variable; you can do `if let Ok(filename) = entry.file_name().into_string()`. Introducing a separate variable increases mental overhead when reading - like, whether we use this filename var later or not - but then we also have another variable (inside Ok(filename)) with the same name, which also doesn't help.
IMO a little cleaner:
match entry.file_name().into_string() {
Ok(name) => files.push((name, meta)),
Err(os_name) => tracing::warn!(
"Unrelated file detected in {} ({}): the name cannot be represented using a String",
base_dir.display(),
os_name.to_string_lossy()
),
}
| let mut current_size = 0_u64;
| for (_, meta) in &files {
| current_size += meta.len();
| }
IMO declarative style here is easier to grasp
let current_size: u64 = files.iter().map(|(_, meta)| meta.len()).sum();
RomanBrodetski
left a comment
Overall looks OK - please consider the comments and let's merge it!
Summary

We've been working on eliminating the node's dependency on the Proof Storage. Now we don't use it for anything but debug purposes. This PR drops S3 support and only stores proofs locally. Since we no longer use S3, the zksync_os_object_store crate is also removed.

Next Steps

We could use these proofs instead of recomputing them to improve latency when there are a lot of restarts.
Breaking Changes

- ProofStorage
- ProofStorage (can result in unexpected disk usage)

Rollout Instructions
The config parameters are disjoint, so the old config won't break the new one and vice versa.
I suggest you do things in this order:
New Config

1. Change `prover_api_proof_storage_path=./db/fri_proofs/` if the default value doesn't work.
2. Change `prover_api_proof_storage_batch_with_proof_capacity`, `prover_api_proof_storage_failed_capacity` (the storage caps for successful proofs (along with metadata) and failed proofs; set to 1GB by default) if you don't like the defaults.

Old Config
All the old flags are now unused and will not interfere with the new version. You can remove them after the upgrade to avoid config changes in case of a rollback.
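For example, a config snippet with the documented defaults spelled out (illustrative only; adjust the keys to your deployment's config format):

```yaml
# New proof-storage settings (values shown are the defaults)
prover_api_proof_storage_path: ./db/fri_proofs/
prover_api_proof_storage_batch_with_proof_capacity: 1gb
prover_api_proof_storage_failed_capacity: 1gb
```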