Add ANALYZE statement to sync batch for SQLite query planner optimization #2372
base: main
Conversation
Walkthrough

This PR modifies the ApplyPipeline's batch construction to automatically append an ANALYZE statement after preparing each batch. This refreshes SQLite planner statistics, supporting performance optimization of the local database sync process (issue #2371).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
crates/common/src/local_db/pipeline/adapters/apply.rs
🧰 Additional context used
📓 Path-based instructions (3)
crates/**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
crates/**/*.rs: For Rust crates in crates/*, run lints using nix develop -c cargo clippy --workspace --all-targets --all-features -D warnings
For Rust crates in crates/*, run tests using nix develop -c cargo test --workspace or --package <crate>
Files:
crates/common/src/local_db/pipeline/adapters/apply.rs
**/crates/**
📄 CodeRabbit inference engine (AGENTS.md)
Rust workspace organized as crates/* with subdirectories: cli, common, bindings, js_api, quote, subgraph, settings, math, integration_tests
Files:
crates/common/src/local_db/pipeline/adapters/apply.rs
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Rust: format code with nix develop -c cargo fmt --all
Rust: lint with nix develop -c rainix-rs-static (preconfigured flags included)
Rust: crates and modules use snake_case; types use PascalCase
Files:
crates/common/src/local_db/pipeline/adapters/apply.rs
🧠 Learnings (3)
📚 Learning: 2025-10-18T10:38:41.273Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2237
File: crates/common/src/raindex_client/local_db/sync.rs:79-89
Timestamp: 2025-10-18T10:38:41.273Z
Learning: In `crates/common/src/raindex_client/local_db/sync.rs`, the sync_database method currently only supports indexing a single orderbook per chain ID, which is why `.first()` is used to select the orderbook configuration. Multi-orderbook support per chain ID is planned for future PRs.
Applied to files:
crates/common/src/local_db/pipeline/adapters/apply.rs
📚 Learning: 2025-10-06T11:28:30.692Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2145
File: crates/common/src/raindex_client/local_db/query/fetch_orders/query.sql:6-7
Timestamp: 2025-10-06T11:28:30.692Z
Learning: In `crates/common/src/raindex_client/local_db/query/fetch_orders/query.sql`, the orderbook_address is currently hardcoded to '0x2f209e5b67A33B8fE96E28f24628dF6Da301c8eB' because the system only supports a single orderbook at the moment. Multi-orderbook logic is not yet implemented and will be added in the future.
Applied to files:
crates/common/src/local_db/pipeline/adapters/apply.rs
📚 Learning: 2025-12-03T10:40:25.429Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2344
File: crates/common/src/local_db/pipeline/runner/mod.rs:18-31
Timestamp: 2025-12-03T10:40:25.429Z
Learning: In `crates/common/src/local_db/pipeline/runner/mod.rs`, the `TargetSuccess` struct does not need separate `ob_id` or `orderbook_key` fields because the contained `SyncOutcome` already includes orderbook identification information such as chain_id and orderbook_address. This avoids redundant data duplication.
Applied to files:
crates/common/src/local_db/pipeline/adapters/apply.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: standard-tests (ubuntu-latest, test-js-bindings)
- GitHub Check: standard-tests (ubuntu-latest, rainix-sol-artifacts)
- GitHub Check: standard-tests (ubuntu-latest, rainix-sol-legal)
- GitHub Check: standard-tests (ubuntu-latest, rainix-rs-artifacts, true)
- GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-test)
- GitHub Check: standard-tests (ubuntu-latest, rainix-rs-static)
- GitHub Check: standard-tests (ubuntu-latest, rainix-sol-test)
- GitHub Check: standard-tests (ubuntu-latest, ob-rs-test, true)
- GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-artifacts)
- GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-browser-test)
- GitHub Check: standard-tests (ubuntu-latest, rainix-sol-static)
- GitHub Check: git-clean
- GitHub Check: build-tauri (ubuntu-22.04, true)
- GitHub Check: Deploy-Docs-Preview
- GitHub Check: Deploy-Preview-Push
🔇 Additional comments (3)
crates/common/src/local_db/pipeline/adapters/apply.rs (3)
11-11: LGTM! The SqlStatement import is correctly added to support the new ANALYZE statement creation.
354-364: LGTM! Test correctly updated to expect 4 statements (vault balance refresh, running balances, watermark, ANALYZE) and to verify ANALYZE is present in the batch.
894-936: LGTM! Well-structured test that validates two key invariants (a sketch of both follows):

- ANALYZE appears exactly once per batch
- ANALYZE is the last statement before transaction wrapping (ensuring all inserts complete before statistics are gathered)

The test correctly verifies the pre-transaction batch structure, which is appropriate since ensure_transaction() wraps the batch during persist().
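For reference, the two invariants can be expressed as a self-contained sketch. The SqlBatch type and the non-ANALYZE statement strings below are stand-ins for illustration, not the crate's actual API:

```rust
/// Stand-in for the crate's batch type; illustration only.
struct SqlBatch {
    statements: Vec<String>,
}

impl SqlBatch {
    fn add(&mut self, sql: &str) {
        self.statements.push(sql.to_string());
    }
}

/// Builds a batch the way the PR describes: work statements first,
/// then ANALYZE appended as the final pre-transaction statement.
fn build_batch() -> SqlBatch {
    let mut batch = SqlBatch { statements: Vec::new() };
    batch.add("INSERT INTO vault_balances VALUES (1, 'A')"); // placeholder work
    batch.add("UPDATE watermark SET block = 42"); // placeholder work
    batch.add("ANALYZE");
    batch
}

#[test]
fn analyze_emitted_exactly_once_and_is_last() {
    let batch = build_batch();
    let count = batch
        .statements
        .iter()
        .filter(|s| s.as_str() == "ANALYZE")
        .count();
    assert_eq!(count, 1, "ANALYZE must appear exactly once per batch");
    assert_eq!(
        batch.statements.last().map(String::as_str),
        Some("ANALYZE"),
        "ANALYZE must be the last statement before transaction wrapping"
    );
}
```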
```rust
// Ensure SQLite planner stats are up to date so reads don't suffer from
// poor query plans.
batch.add(SqlStatement::new("ANALYZE"));
```
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🌐 Web query:
SQLite PRAGMA optimize documentation when introduced version
💡 Result:
PRAGMA optimize was first introduced (initial implementation) in SQLite 3.18.0, released 2017-03-30. It received significant enhancements in SQLite 3.46.0 (2024-05-23). [1][2]
Sources:
- SQLite release notes: 3.18.0 (PRAGMA optimize initial implementation). [1]
- SQLite release history / PRAGMA docs: enhancements in 3.46.0 (automatic analysis limit, new bitmask). [2]
🌐 Web query:
SQLite ANALYZE vs PRAGMA optimize performance comparison
💡 Result:
Short answer
- ANALYZE: low-level command that collects full query-planner statistics (sqlite_stat1/sqlite_stat4). A full ANALYZE can scan every index and be slow on large DBs but gives the most accurate stats. [1][2]
- PRAGMA optimize: high‑level, usually a no‑op; when helpful it runs a bounded/approximate ANALYZE and other small optimizations. It uses an analysis_limit and heuristics to run ANALYZE only for tables that appear to need it, so it is much faster in typical use. Recommended in current SQLite docs as the preferred way to keep stats up to date. [1][2][3]
When to use which
- Use PRAGMA optimize as your routine/production maintenance: run once on connection close for short-lived clients, or run PRAGMA optimize=0x10002 at open plus periodic PRAGMA optimize for long-lived connections (a sketch of this pattern follows this section). It's fast and safe. [1][2]
- Use full ANALYZE only when you need the most accurate statistics (complex queries where planner decisions are critical) or after major data/schema changes and you explicitly want a complete re‑scan; you can speed ANALYZE with PRAGMA analysis_limit if full scans are too expensive. [2][4]
Versions/compatibility note
- PRAGMA optimize was added in SQLite 3.18.0 and substantially improved in 3.46.0 (May 23, 2024); recommendations in recent docs prefer PRAGMA optimize over direct ANALYZE. If you run an older SQLite build, check your version. [2][1]
Key refs
- SQLite PRAGMA documentation (optimize). [1]
- SQLite ANALYZE documentation. [2]
- Source/implementation notes for PRAGMA optimize. [3]
- PRAGMA analysis_limit / approximate ANALYZE details. [4]
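To make the recommended long-lived-connection pattern concrete, here is a minimal sketch assuming the rusqlite crate (this repository's actual database layer may differ):

```rust
use rusqlite::Connection;

/// Open a connection and arm planner maintenance. The 0x10002 mask makes
/// later `PRAGMA optimize` calls consider all tables, not only those
/// queried on this connection (SQLite >= 3.46 semantics).
fn open_with_planner_maintenance(path: &str) -> rusqlite::Result<Connection> {
    let conn = Connection::open(path)?;
    conn.execute_batch("PRAGMA optimize = 0x10002;")?;
    Ok(conn)
}

/// Call periodically and/or just before closing the connection.
/// Usually a no-op; runs a bounded ANALYZE only where stats look stale.
fn planner_maintenance(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute_batch("PRAGMA optimize;")
}
```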
Consider using PRAGMA optimize as a lighter-weight alternative.
Running bare ANALYZE after every batch scans all tables and indexes to rebuild statistics, which can be expensive as the database grows. SQLite's PRAGMA optimize (available since 3.18.0) is the recommended approach—it uses heuristics to run a bounded ANALYZE only on tables where statistics are stale or missing, making it much faster in typical use.
If you want to reduce the cost, consider:
```diff
- // Ensure SQLite planner stats are up to date so reads don't suffer from
- // poor query plans.
- batch.add(SqlStatement::new("ANALYZE"));
+ // Ensure SQLite planner stats are up to date so reads don't suffer from
+ // poor query plans. PRAGMA optimize runs ANALYZE only when beneficial.
+ batch.add(SqlStatement::new("PRAGMA optimize"));
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
// Ensure SQLite planner stats are up to date so reads don't suffer from
// poor query plans. PRAGMA optimize runs ANALYZE only when beneficial.
batch.add(SqlStatement::new("PRAGMA optimize"));
```
🤖 Prompt for AI Agents
In crates/common/src/local_db/pipeline/adapters/apply.rs around lines 99 to 102: the code unconditionally adds "ANALYZE" after every batch, which can be expensive. Switch to "PRAGMA optimize" so SQLite runs a bounded optimize only on stale statistics: add SqlStatement::new("PRAGMA optimize") in place of the ANALYZE statement (or make the choice configurable if you need a fallback for older SQLite versions).
Currently we don't have any issues with this approach. If we see some performance issues we can switch over to this query if we need to.
@findolor, sounds good! You can revisit PRAGMA optimize if you notice any performance impact from running ANALYZE on every batch. The current approach will ensure statistics stay accurate.
Motivation

See issue #2371.

After syncing data to the local SQLite database, the query planner may have stale statistics, which can lead to suboptimal query plans. This becomes more pronounced as the database grows with new events, tokens, and vault balances inserted during each sync cycle.
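To illustrate the mechanism, here is a minimal sketch assuming the rusqlite crate (the table and index names are made up for the example): ANALYZE is what populates the sqlite_stat1 statistics the planner consults.

```rust
use rusqlite::Connection;

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch(
        "CREATE TABLE vault_balances (vault_id INTEGER, token TEXT);
         CREATE INDEX idx_vault_balances_token ON vault_balances (token);
         INSERT INTO vault_balances VALUES (1, 'A'), (2, 'B'), (3, 'A');",
    )?;
    // Before ANALYZE runs there are no sqlite_stat1 rows, so the planner
    // falls back to built-in selectivity defaults.
    conn.execute_batch("ANALYZE;")?;
    let stat_rows: i64 =
        conn.query_row("SELECT count(*) FROM sqlite_stat1", [], |row| row.get(0))?;
    assert!(stat_rows > 0); // fresh statistics now exist for the planner
    Ok(())
}
```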
Solution

Added an ANALYZE statement as the final step in the build_batch method of ApplyPipeline. This ensures SQLite's internal statistics are updated after each batch of inserts, keeping query plans optimal for subsequent reads.

Changes:
- Added an ANALYZE statement at the end of each sync batch in apply.rs
- Updated the empty_work_window_only_watermark test to account for the new statement
- Added an analyze_emitted_exactly_once_and_is_last test to verify:
  - ANALYZE appears exactly once per batch
  - ANALYZE is always the last statement in the batch

Checks
By submitting this for review, I'm confirming I've done the following:
fix #2371