
Conversation

@findolor (Collaborator) commented Dec 24, 2025

Motivation

See issue #2371 (Improve the performance of local db sync process).

After syncing data to the local SQLite database, the query planner may be working from stale statistics, which can lead to suboptimal query plans. This becomes more pronounced as the database grows, with new events, tokens, and vault balances inserted during each sync cycle.

Solution

Added an ANALYZE statement as the final step in the build_batch method of ApplyPipeline. This ensures SQLite's internal statistics are updated after each batch of inserts, keeping query plans optimal for subsequent reads.
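
For reference, the change amounts to appending one statement after all batch work has been queued. A minimal sketch of the tail of build_batch (the surrounding method shape is inferred from this PR's description, not copied from the source):

    // End of ApplyPipeline::build_batch (shape assumed from the PR description).
    // ... event, token, vault balance, and watermark statements queued above ...

    // Ensure SQLite planner stats are up to date so reads don't suffer from
    // poor query plans. Running this last means the statistics reflect the
    // rows this batch just inserted.
    batch.add(SqlStatement::new("ANALYZE"));

Because ANALYZE is part of the same batch, the refreshed statistics are in place before any subsequent read runs.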

Changes:

  • Added ANALYZE statement at the end of each sync batch in apply.rs
  • Updated empty_work_window_only_watermark test to account for the new statement
  • Added dedicated analyze_emitted_exactly_once_and_is_last test (sketched after this list) to verify:
    • ANALYZE appears exactly once per batch
    • ANALYZE is always the last statement in the batch
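
A rough shape of that test, assuming build_batch yields a batch whose statements expose their SQL text (the helper and accessor names below are illustrative, not the actual source):

    #[test]
    fn analyze_emitted_exactly_once_and_is_last() {
        // Hypothetical helper that builds a batch the same way the pipeline does.
        let batch = build_sample_batch();
        let sql: Vec<&str> = batch.statements().iter().map(|s| s.sql()).collect();

        // ANALYZE appears exactly once per batch...
        assert_eq!(sql.iter().filter(|&&s| s == "ANALYZE").count(), 1);
        // ...and is the last statement, so stats cover every preceding insert.
        assert_eq!(sql.last().copied(), Some("ANALYZE"));
    }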

Checks

By submitting this for review, I'm confirming I've done the following:

  • made this PR as small as possible
  • unit-tested any new functionality
  • linked any relevant issues or PRs
  • included screenshots (if this involves a front-end change)

fix #2371

Summary by CodeRabbit

  • Bug Fixes
    • Improved database query performance by refreshing SQLite planner statistics during batch operations, keeping query plans efficient as the database grows.


@findolor findolor requested review from 0xgleb and hardyjosh December 24, 2025 10:26
@findolor findolor self-assigned this Dec 24, 2025
@coderabbitai bot (Contributor) commented Dec 24, 2025

Walkthrough

This PR modifies the ApplyPipeline's batch construction to automatically append an ANALYZE statement after each batch is prepared. This refreshes SQLite planner statistics, supporting the local database sync performance improvements tracked in issue #2371.

Changes

Cohort / File(s): Batch preparation and testing (crates/common/src/local_db/pipeline/adapters/apply.rs)
Summary: Added SqlStatement import; injected an ANALYZE statement into every built batch via batch.add(SqlStatement::new("ANALYZE")); updated tests to account for the new statement (expected count increased from 3 to 4 in the empty-work scenario); added a verification test for ANALYZE emission and its position as the final statement.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • 0xgleb
  • hardyjosh

Pre-merge checks

✅ Passed checks (5 passed)
  • Description Check: Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: Passed (the title accurately describes the main change: adding an ANALYZE statement to optimize SQLite query planning in the sync batch process)
  • Linked Issues Check: Passed (the changes directly address issue #2371 by adding ANALYZE to improve SQLite query planner performance, reducing suboptimal query plans and sync latency)
  • Out of Scope Changes Check: Passed (all changes focus on adding the ANALYZE statement to the sync batch; no unrelated modifications)
  • Docstring Coverage: Passed (coverage is 100.00%, above the 80.00% threshold)

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45d9b49 and f5066d4.

📒 Files selected for processing (1)
  • crates/common/src/local_db/pipeline/adapters/apply.rs
🧰 Additional context used
📓 Path-based instructions (3)
crates/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

crates/**/*.rs: For Rust crates in crates/*, run lints using nix develop -c cargo clippy --workspace --all-targets --all-features -D warnings
For Rust crates in crates/*, run tests using nix develop -c cargo test --workspace or --package <crate>

Files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
**/crates/**

📄 CodeRabbit inference engine (AGENTS.md)

Rust workspace organized as crates/* with subdirectories: cli, common, bindings, js_api, quote, subgraph, settings, math, integration_tests

Files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Rust: format code with nix develop -c cargo fmt --all
Rust: lint with nix develop -c rainix-rs-static (preconfigured flags included)
Rust: crates and modules use snake_case; types use PascalCase

Files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
🧠 Learnings (3)
📚 Learning: 2025-10-18T10:38:41.273Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2237
File: crates/common/src/raindex_client/local_db/sync.rs:79-89
Timestamp: 2025-10-18T10:38:41.273Z
Learning: In `crates/common/src/raindex_client/local_db/sync.rs`, the sync_database method currently only supports indexing a single orderbook per chain ID, which is why `.first()` is used to select the orderbook configuration. Multi-orderbook support per chain ID is planned for future PRs.

Applied to files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
📚 Learning: 2025-10-06T11:28:30.692Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2145
File: crates/common/src/raindex_client/local_db/query/fetch_orders/query.sql:6-7
Timestamp: 2025-10-06T11:28:30.692Z
Learning: In `crates/common/src/raindex_client/local_db/query/fetch_orders/query.sql`, the orderbook_address is currently hardcoded to '0x2f209e5b67A33B8fE96E28f24628dF6Da301c8eB' because the system only supports a single orderbook at the moment. Multiorderbook logic is not yet implemented and will be added in the future.

Applied to files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
📚 Learning: 2025-12-03T10:40:25.429Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2344
File: crates/common/src/local_db/pipeline/runner/mod.rs:18-31
Timestamp: 2025-12-03T10:40:25.429Z
Learning: In `crates/common/src/local_db/pipeline/runner/mod.rs`, the `TargetSuccess` struct does not need separate `ob_id` or `orderbook_key` fields because the contained `SyncOutcome` already includes orderbook identification information such as chain_id and orderbook_address. This avoids redundant data duplication.

Applied to files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: standard-tests (ubuntu-latest, test-js-bindings)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-artifacts)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-legal)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-rs-artifacts, true)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-test)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-rs-static)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-test)
  • GitHub Check: standard-tests (ubuntu-latest, ob-rs-test, true)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-artifacts)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-browser-test)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-static)
  • GitHub Check: git-clean
  • GitHub Check: build-tauri (ubuntu-22.04, true)
  • GitHub Check: Deploy-Docs-Preview
  • GitHub Check: Deploy-Preview-Push
🔇 Additional comments (3)
crates/common/src/local_db/pipeline/adapters/apply.rs (3)

11-11: LGTM!

The SqlStatement import is correctly added to support the new ANALYZE statement creation.


354-364: LGTM!

Test correctly updated to expect 4 statements (vault balance refresh, running balances, watermark, ANALYZE) and verify ANALYZE is present in the batch.


894-936: LGTM!

Well-structured test that validates two key invariants:

  1. ANALYZE appears exactly once per batch
  2. ANALYZE is the last statement before transaction wrapping (ensuring all inserts complete before statistics are gathered)

The test correctly verifies the pre-transaction batch structure, which is appropriate since ensure_transaction() wraps the batch during persist().

Comment on lines +99 to +102
// Ensure SQLite planner stats are up to date so reads don't suffer from
// poor query plans.
batch.add(SqlStatement::new("ANALYZE"));

@coderabbitai bot (Contributor) commented Dec 24, 2025

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🌐 Web query:

SQLite PRAGMA optimize documentation when introduced version

💡 Result:

PRAGMA optimize was first introduced (initial implementation) in SQLite 3.18.0, released 2017-03-30. It received significant enhancements in SQLite 3.46.0 (2024-05-23). [1][2]

Sources:

  • SQLite release notes: 3.18.0 (PRAGMA optimize initial implementation). [1]
  • SQLite release history / PRAGMA docs: enhancements in 3.46.0 (automatic analysis limit, new bitmask). [2]

🌐 Web query:

SQLite ANALYZE vs PRAGMA optimize performance comparison

💡 Result:

Short answer

  • ANALYZE: low‑level command that collects full query‑planner statistics (sqlite_stat1/sqlite_stat4). A full ANALYZE can scan every index and be slow on large DBs but gives the most accurate stats. [1][2]
  • PRAGMA optimize: high‑level, usually a no‑op; when helpful it runs a bounded/approximate ANALYZE and other small optimizations. It uses an analysis_limit and heuristics to run ANALYZE only for tables that appear to need it, so it is much faster in typical use. Recommended in current SQLite docs as the preferred way to keep stats up to date. [1][2][3]

When to use which

  • Use PRAGMA optimize as your routine/production maintenance: run it once on connection close for short‑lived clients, or run PRAGMA optimize=0x10002 at open plus periodic PRAGMA optimize for long‑lived connections (sketched after this list). It's fast and safe. [1][2]
  • Use full ANALYZE only when you need the most accurate statistics (complex queries where planner decisions are critical) or after major data/schema changes and you explicitly want a complete re‑scan; you can speed ANALYZE with PRAGMA analysis_limit if full scans are too expensive. [2][4]
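
For long‑lived connections, that recommended pattern looks roughly like the following rusqlite sketch (the driver choice and file name are assumptions for illustration; this PR's pipeline issues SQL through its own SqlStatement batching instead):

    use rusqlite::{Connection, Result};

    fn open_with_optimize() -> Result<Connection> {
        let conn = Connection::open("local.db")?;
        // At open: mask 0x10002 primes PRAGMA optimize so later calls can run a
        // bounded ANALYZE only where statistics look stale or missing.
        conn.execute_batch("PRAGMA optimize=0x10002;")?;
        Ok(conn)
    }

    fn periodic_maintenance(conn: &Connection) -> Result<()> {
        // Call every few hours on a long-lived connection; usually a no-op.
        conn.execute_batch("PRAGMA optimize;")?;
        Ok(())
    }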

Versions/compatibility note

  • PRAGMA optimize was added in SQLite 3.18.0 and substantially improved in 3.46.0 (May 23, 2024); recommendations in recent docs prefer PRAGMA optimize over direct ANALYZE. If you run an older SQLite build, check your version. [2][1]

Key refs

  • SQLite PRAGMA documentation (optimize). [1]
  • SQLite ANALYZE documentation. [2]
  • Source/implementation notes for PRAGMA optimize. [3]
  • PRAGMA analysis_limit / approximate ANALYZE details. [4]

Consider using PRAGMA optimize as a lighter-weight alternative.

Running bare ANALYZE after every batch scans all tables and indexes to rebuild statistics, which can be expensive as the database grows. SQLite's PRAGMA optimize (available since 3.18.0) is the recommended approach—it uses heuristics to run a bounded ANALYZE only on tables where statistics are stale or missing, making it much faster in typical use.

If you want to reduce the cost, consider:

-        // Ensure SQLite planner stats are up to date so reads don't suffer from
-        // poor query plans.
-        batch.add(SqlStatement::new("ANALYZE"));
+        // Ensure SQLite planner stats are up to date so reads don't suffer from
+        // poor query plans. PRAGMA optimize runs ANALYZE only when beneficial.
+        batch.add(SqlStatement::new("PRAGMA optimize"));
🤖 Prompt for AI Agents
crates/common/src/local_db/pipeline/adapters/apply.rs around lines 99 to 102: the
code unconditionally adds "ANALYZE" after every batch, which can be expensive;
switch to "PRAGMA optimize" so SQLite runs a bounded analysis only where
statistics are stale; implement by adding SqlStatement::new("PRAGMA optimize")
in place of the ANALYZE statement (or make the choice configurable if you need a
fallback for older SQLite versions).

@findolor (Collaborator, Author) commented:

Currently we don't have any issues with this approach. If we see performance issues, we can switch over to this query.

@coderabbitai bot (Contributor) commented:

@findolor, sounds good! You can revisit PRAGMA optimize if you notice any performance impact from running ANALYZE on every batch. The current approach will ensure statistics stay accurate.



Development

Successfully merging this pull request may close issue #2371: Improve the performance of local db sync process.
