
Conversation

@findolor (Collaborator) commented Dec 24, 2025

Motivation

See issue #2371 (Improve the performance of local db sync process).

After syncing data to the local SQLite database, the query planner may be working from stale statistics, which can lead to suboptimal query plans. This becomes more pronounced as the database grows, with new events, tokens, and vault balances inserted during each sync cycle.

Solution

Added an ANALYZE statement as the final step in the build_batch method of ApplyPipeline. This ensures SQLite's internal statistics are updated after each batch of inserts, keeping query plans optimal for subsequent reads.
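
For reference, the change amounts to appending one statement after all batch work has been queued. A minimal sketch of the tail of build_batch (the surrounding method shape is inferred from this PR's description, not copied from the source):

    // End of ApplyPipeline::build_batch (shape assumed from the PR description).
    // ... event, token, vault balance, and watermark statements queued above ...

    // Ensure SQLite planner stats are up to date so reads don't suffer from
    // poor query plans. Running this last means the statistics reflect the
    // rows this batch just inserted.
    batch.add(SqlStatement::new("ANALYZE"));

Because ANALYZE is part of the same batch, the refreshed statistics are in place before any subsequent read runs.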

Changes:

  • Added ANALYZE statement at the end of each sync batch in apply.rs
  • Updated empty_work_window_only_watermark test to account for the new statement
  • Added dedicated analyze_emitted_exactly_once_and_is_last test (sketched after this list) to verify:
    • ANALYZE appears exactly once per batch
    • ANALYZE is always the last statement in the batch
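
A rough shape of that test, assuming build_batch yields a batch whose statements expose their SQL text (the helper and accessor names below are illustrative, not the actual source):

    #[test]
    fn analyze_emitted_exactly_once_and_is_last() {
        // Hypothetical helper that builds a batch the same way the pipeline does.
        let batch = build_sample_batch();
        let sql: Vec<&str> = batch.statements().iter().map(|s| s.sql()).collect();

        // ANALYZE appears exactly once per batch...
        assert_eq!(sql.iter().filter(|&&s| s == "ANALYZE").count(), 1);
        // ...and is the last statement, so stats cover every preceding insert.
        assert_eq!(sql.last().copied(), Some("ANALYZE"));
    }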

Checks

By submitting this for review, I'm confirming I've done the following:

  • made this PR as small as possible
  • unit-tested any new functionality
  • linked any relevant issues or PRs
  • included screenshots (if this involves a front-end change)

fix #2371

Summary by CodeRabbit

  • Bug Fixes
    • Improved database query performance by refreshing SQLite planner statistics during batch operations, keeping query plans efficient as the database grows.


@findolor findolor requested review from 0xgleb and hardyjosh December 24, 2025 10:26
@findolor findolor self-assigned this Dec 24, 2025
@coderabbitai bot (Contributor) commented Dec 24, 2025

Walkthrough

This PR modifies the ApplyPipeline's batch construction to automatically append an ANALYZE statement after each batch is prepared. This refreshes SQLite planner statistics, supporting the local database sync performance improvements tracked in issue #2371.

Changes

Cohort / File(s): Batch preparation and testing (crates/common/src/local_db/pipeline/adapters/apply.rs)
Summary: Added SqlStatement import; injected an ANALYZE statement into every built batch via batch.add(SqlStatement::new("ANALYZE")); updated tests to account for the new statement (expected count increased from 3 to 4 in the empty-work scenario); added a verification test for ANALYZE emission and its position as the final statement.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • 0xgleb
  • hardyjosh

Pre-merge checks

✅ Passed checks (5 passed)
  • Description Check: Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: Passed (the title accurately describes the main change: adding an ANALYZE statement to optimize SQLite query planning in the sync batch process)
  • Linked Issues Check: Passed (the changes directly address issue #2371 by adding ANALYZE to improve SQLite query planner performance, reducing suboptimal query plans and sync latency)
  • Out of Scope Changes Check: Passed (all changes focus on adding the ANALYZE statement to the sync batch; no unrelated modifications)
  • Docstring Coverage: Passed (coverage is 100.00%, above the 80.00% threshold)

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45d9b49 and f5066d4.

📒 Files selected for processing (1)
  • crates/common/src/local_db/pipeline/adapters/apply.rs
🧰 Additional context used
📓 Path-based instructions (3)
crates/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

crates/**/*.rs: For Rust crates in crates/*, run lints using nix develop -c cargo clippy --workspace --all-targets --all-features -D warnings
For Rust crates in crates/*, run tests using nix develop -c cargo test --workspace or --package <crate>

Files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
**/crates/**

📄 CodeRabbit inference engine (AGENTS.md)

Rust workspace organized as crates/* with subdirectories: cli, common, bindings, js_api, quote, subgraph, settings, math, integration_tests

Files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Rust: format code with nix develop -c cargo fmt --all
Rust: lint with nix develop -c rainix-rs-static (preconfigured flags included)
Rust: crates and modules use snake_case; types use PascalCase

Files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
🧠 Learnings (3)
📚 Learning: 2025-10-18T10:38:41.273Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2237
File: crates/common/src/raindex_client/local_db/sync.rs:79-89
Timestamp: 2025-10-18T10:38:41.273Z
Learning: In `crates/common/src/raindex_client/local_db/sync.rs`, the sync_database method currently only supports indexing a single orderbook per chain ID, which is why `.first()` is used to select the orderbook configuration. Multi-orderbook support per chain ID is planned for future PRs.

Applied to files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
📚 Learning: 2025-10-06T11:28:30.692Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2145
File: crates/common/src/raindex_client/local_db/query/fetch_orders/query.sql:6-7
Timestamp: 2025-10-06T11:28:30.692Z
Learning: In `crates/common/src/raindex_client/local_db/query/fetch_orders/query.sql`, the orderbook_address is currently hardcoded to '0x2f209e5b67A33B8fE96E28f24628dF6Da301c8eB' because the system only supports a single orderbook at the moment. Multiorderbook logic is not yet implemented and will be added in the future.

Applied to files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
📚 Learning: 2025-12-03T10:40:25.429Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2344
File: crates/common/src/local_db/pipeline/runner/mod.rs:18-31
Timestamp: 2025-12-03T10:40:25.429Z
Learning: In `crates/common/src/local_db/pipeline/runner/mod.rs`, the `TargetSuccess` struct does not need separate `ob_id` or `orderbook_key` fields because the contained `SyncOutcome` already includes orderbook identification information such as chain_id and orderbook_address. This avoids redundant data duplication.

Applied to files:

  • crates/common/src/local_db/pipeline/adapters/apply.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: standard-tests (ubuntu-latest, test-js-bindings)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-artifacts)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-legal)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-rs-artifacts, true)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-test)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-rs-static)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-test)
  • GitHub Check: standard-tests (ubuntu-latest, ob-rs-test, true)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-artifacts)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-browser-test)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-static)
  • GitHub Check: git-clean
  • GitHub Check: build-tauri (ubuntu-22.04, true)
  • GitHub Check: Deploy-Docs-Preview
  • GitHub Check: Deploy-Preview-Push
🔇 Additional comments (3)
crates/common/src/local_db/pipeline/adapters/apply.rs (3)

11-11: LGTM!

The SqlStatement import is correctly added to support the new ANALYZE statement creation.


354-364: LGTM!

Test correctly updated to expect 4 statements (vault balance refresh, running balances, watermark, ANALYZE) and verify ANALYZE is present in the batch.


894-936: LGTM!

Well-structured test that validates two key invariants:

  1. ANALYZE appears exactly once per batch
  2. ANALYZE is the last statement before transaction wrapping (ensuring all inserts complete before statistics are gathered)

The test correctly verifies the pre-transaction batch structure, which is appropriate since ensure_transaction() wraps the batch during persist().

Comment on lines +99 to +102
// Ensure SQLite planner stats are up to date so reads don't suffer from
// poor query plans.
batch.add(SqlStatement::new("ANALYZE"));

@coderabbitai bot (Contributor) commented Dec 24, 2025

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🌐 Web query:

SQLite PRAGMA optimize documentation when introduced version

💡 Result:

PRAGMA optimize was first introduced (initial implementation) in SQLite 3.18.0, released 2017-03-30. It received significant enhancements in SQLite 3.46.0 (2024-05-23). [1][2]

Sources:

  • SQLite release notes: 3.18.0 (PRAGMA optimize initial implementation). [1]
  • SQLite release history / PRAGMA docs: enhancements in 3.46.0 (automatic analysis limit, new bitmask). [2]

🌐 Web query:

SQLite ANALYZE vs PRAGMA optimize performance comparison

💡 Result:

Short answer

  • ANALYZE: low‑level command that collects full query‑planner statistics (sqlite_stat1/sqlite_stat4). A full ANALYZE can scan every index and be slow on large DBs but gives the most accurate stats. [1][2]
  • PRAGMA optimize: high‑level, usually a no‑op; when helpful it runs a bounded/approximate ANALYZE and other small optimizations. It uses an analysis_limit and heuristics to run ANALYZE only for tables that appear to need it, so it is much faster in typical use. Recommended in current SQLite docs as the preferred way to keep stats up to date. [1][2][3]

When to use which

  • Use PRAGMA optimize as your routine/production maintenance: run it once on connection close for short‑lived clients, or run PRAGMA optimize=0x10002 at open plus periodic PRAGMA optimize for long‑lived connections (sketched after this list). It's fast and safe. [1][2]
  • Use full ANALYZE only when you need the most accurate statistics (complex queries where planner decisions are critical) or after major data/schema changes and you explicitly want a complete re‑scan; you can speed ANALYZE with PRAGMA analysis_limit if full scans are too expensive. [2][4]
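
For long‑lived connections, that recommended pattern looks roughly like the following rusqlite sketch (the driver choice and file name are assumptions for illustration; this PR's pipeline issues SQL through its own SqlStatement batching instead):

    use rusqlite::{Connection, Result};

    fn open_with_optimize() -> Result<Connection> {
        let conn = Connection::open("local.db")?;
        // At open: mask 0x10002 primes PRAGMA optimize so later calls can run a
        // bounded ANALYZE only where statistics look stale or missing.
        conn.execute_batch("PRAGMA optimize=0x10002;")?;
        Ok(conn)
    }

    fn periodic_maintenance(conn: &Connection) -> Result<()> {
        // Call every few hours on a long-lived connection; usually a no-op.
        conn.execute_batch("PRAGMA optimize;")?;
        Ok(())
    }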

Versions/compatibility note

  • PRAGMA optimize was added in SQLite 3.18.0 and substantially improved in 3.46.0 (May 23, 2024); recommendations in recent docs prefer PRAGMA optimize over direct ANALYZE. If you run an older SQLite build, check your version. [2][1]

Key refs

  • SQLite PRAGMA documentation (optimize). [1]
  • SQLite ANALYZE documentation. [2]
  • Source/implementation notes for PRAGMA optimize. [3]
  • PRAGMA analysis_limit / approximate ANALYZE details. [4]

Consider using PRAGMA optimize as a lighter-weight alternative.

Running bare ANALYZE after every batch scans all tables and indexes to rebuild statistics, which can be expensive as the database grows. SQLite's PRAGMA optimize (available since 3.18.0) is the recommended approach—it uses heuristics to run a bounded ANALYZE only on tables where statistics are stale or missing, making it much faster in typical use.

If you want to reduce the cost, consider:

-        // Ensure SQLite planner stats are up to date so reads don't suffer from
-        // poor query plans.
-        batch.add(SqlStatement::new("ANALYZE"));
+        // Ensure SQLite planner stats are up to date so reads don't suffer from
+        // poor query plans. PRAGMA optimize runs ANALYZE only when beneficial.
+        batch.add(SqlStatement::new("PRAGMA optimize"));
🤖 Prompt for AI Agents
crates/common/src/local_db/pipeline/adapters/apply.rs around lines 99 to 102: the
code unconditionally adds "ANALYZE" after every batch, which can be expensive;
switch to "PRAGMA optimize" so SQLite runs a bounded analysis only where
statistics are stale; implement by adding SqlStatement::new("PRAGMA optimize")
in place of the ANALYZE statement (or make the choice configurable if you need a
fallback for older SQLite versions).

@findolor (Collaborator, Author) commented:

Currently we don't have any issues with this approach. If we see performance issues, we can switch over to this query.

@coderabbitai bot (Contributor) commented:

@findolor, sounds good! You can revisit PRAGMA optimize if you notice any performance impact from running ANALYZE on every batch. The current approach will ensure statistics stay accurate.



Development

Successfully merging this pull request may close issue #2371: Improve the performance of local db sync process.
