Enable Vectorscan rule caching by default and add regression tests#404
Open
mickgmdb wants to merge 8 commits into
Open
Enable Vectorscan rule caching by default and add regression tests#404mickgmdb wants to merge 8 commits into
mickgmdb wants to merge 8 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Enables persistent caching of compiled Vectorscan rule databases (to speed up repeated scans) by default, adds a rules compile-cache command for pre-warming the cache, and updates CLI/config/docs/tests across Kingfisher to support and validate the behavior.
Changes:
- Implement compiled-rule cache keying, on-disk read/write, and cache directory resolution; wire cache usage into scan rule compilation.
- Add
kingfisher rules compile-cacheplus new CLI/config plumbing for cache toggles and cache directory overrides. - Add regression tests and update docs/changelog/versioning to reflect default-on caching.
Reviewed changes
Copilot reviewed 29 out of 32 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| vendor/vectorscan-rs/vectorscan-rs/src/wrapper.rs | Add byte-oriented serialize/deserialize helpers for database persistence. |
| vendor/vectorscan-rs/vectorscan-rs/src/native.rs | Expose Vectorscan runtime version + block DB serialize/deserialize API used by caching. |
| crates/kingfisher-rules/src/rules_database.rs | Implement cache header/keying, default cache dirs, cache load/store, and cache-focused tests. |
| crates/kingfisher-rules/src/lib.rs | Re-export RuleCacheConfig for downstream use. |
| src/rules_database.rs | Re-export RuleCacheConfig from kingfisher_rules. |
| src/scanner/runner.rs | Use cached compilation path when rule cache is enabled. |
| src/cli/commands/rules.rs | Add RuleCacheArgs, RuleCacheDirArgs, and rules compile-cache CLI surface. |
| src/cli/commands/scan.rs | Add RuleCacheArgs to scan CLI args via flattening. |
| src/main.rs | Apply config precedence for cache settings; add run_rules_compile_cache; default scan args include cache args; tests for precedence. |
| src/cli/config.rs | Extend config schema (rules.cache, rules.cache_dir) and update config parsing tests/examples. |
| src/cli/global.rs | Add --debug alias for --verbose. |
| src/direct_validate.rs | Update minimal scan args construction to include rule cache args. |
| src/reporter/json_format.rs | Update reporter tests/args wiring to include rule cache args. |
| src/reporter.rs | Update reporter tests/args wiring to include rule cache args. |
| tests/int_vulnerable_files.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_validation_cache.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_teams.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_slack.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_redact.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_postman.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_gitlab.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_github.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_dedup.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_bitbucket.rs | Update integration test scan args to include RuleCacheArgs. |
| tests/int_allowlist.rs | Update integration test scan args to include RuleCacheArgs. |
| README.md | Document cache pre-warming flow for faster pre-commit/CI. |
| docs/INSTALLATION.md | Document cache pre-warming in hook instructions. |
| docs/CONFIG.md | Document new config keys for rule cache enablement and directory. |
| docs/ADVANCED.md | Add “Compiled Rule Cache” section explaining behavior and locations. |
| CHANGELOG.md | Add v1.104.0 entry describing default-on caching and new command/logging. |
| Cargo.toml | Bump version to 1.104.0. |
| Cargo.lock | Update lockfile for version bump. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+468
to
+476
| file.write_all(CACHE_MAGIC)?; | ||
| file.write_all(&(header_bytes.len() as u32).to_le_bytes())?; | ||
| file.write_all(&header_bytes)?; | ||
| file.write_all(&db_bytes)?; | ||
| file.sync_all().ok(); | ||
| drop(file); | ||
| fs::rename(&tmp_path, path) | ||
| .with_context(|| format!("rename {} to {}", tmp_path.display(), path.display()))?; | ||
| Ok(()) |
Comment on lines
+418
to
+423
| let header_len = u32::from_le_bytes(len_bytes) as usize; | ||
| let header_start = 4; | ||
| let header_end = header_start + header_len; | ||
| if rest.len() < header_end { | ||
| bail!("truncated cache header"); | ||
| } |
| kingfisher rules compile-cache | ||
| ``` | ||
|
|
||
| Pass `--debug` or `-v` to see which cache directory and cache entry Kingfisher is using, whether it was a hit or miss, and when a new entry is written. |
Comment on lines
+52
to
+60
| /// Directory for the compiled rule cache | ||
| #[arg( | ||
| global = true, | ||
| long = "rule-cache-dir", | ||
| env = "KF_RULE_CACHE_DIR", | ||
| value_name = "PATH", | ||
| value_hint = ValueHint::DirPath | ||
| )] | ||
| pub rule_cache_dir: Option<PathBuf>, |
Comment on lines
+33
to
+41
| /// Cache the compiled Vectorscan rule database between runs (default) | ||
| #[arg( | ||
| global = true, | ||
| long = "rule-cache", | ||
| default_value_t = false, | ||
| conflicts_with = "no_rule_cache", | ||
| hide = true | ||
| )] | ||
| pub rule_cache: bool, |
Comment on lines
+1782
to
+1787
| let rules_db = | ||
| RulesDatabase::from_rules_with_cache(resolved.into_iter().cloned().collect(), &cache) | ||
| .context("Failed to compile rules with Vectorscan cache")?; | ||
|
|
||
| println!("Rule cache ready: {} rules in {}", rules_db.num_rules(), cache.cache_dir().display()); | ||
| Ok(()) |
Comment on lines
+63
to
+66
| impl RuleCacheArgs { | ||
| pub fn enabled(&self) -> bool { | ||
| !self.no_rule_cache | ||
| } |
Comment on lines
1475
to
+1477
| }; | ||
| init_progress.set_message("Recording rules..."); | ||
| datastore | ||
| .lock() | ||
| .unwrap() | ||
| .record_rules(rules_db.rules().iter().cloned().collect::<Vec<_>>().as_slice()); | ||
| datastore.lock().unwrap().record_rules(rules_db.rules().to_vec().as_slice()); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request introduces a major performance improvement to Kingfisher by enabling compiled Vectorscan rule caching by default. This change significantly speeds up repeated scans (e.g., pre-commit hooks, CI) by persisting the compiled rule database and reusing it when possible. The cache is automatically refreshed when rules change or custom rules are used, and several configuration options and documentation updates have been added to support this new feature.
Compiled Vectorscan Rule Caching:
--rule-cache-dirorKF_RULE_CACHE_DIR. There is an opt-out flag--no-rule-cache. [1] [2] [3] [4] [5] [6]kingfisher rules compile-cachefor explicit cache prebuilding, and added INFO logging to show cache usage and location. [1] [2]README.md,ADVANCED.md,INSTALLATION.md, andCONFIG.mdto document cache behavior, configuration, and usage examples. [1] [2] [3] [4] [5]RuleCacheConfigstruct, platform-specific cache directory logic, robust cache read/write with validation, and tests to ensure cache correctness and automatic refresh when rules change. [1] [2] [3] [4] [5]Other:
v1.104.0.These changes together make Kingfisher significantly faster and more user-friendly for repeated scans in development and CI environments.