Enable Vectorscan rule caching by default and add regression tests#404

Open
mickgmdb wants to merge 8 commits into main from development

Conversation

@mickgmdb

Collaborator

This pull request introduces a major performance improvement to Kingfisher by enabling compiled Vectorscan rule caching by default. This change significantly speeds up repeated scans (e.g., pre-commit hooks, CI) by persisting the compiled rule database and reusing it when possible. The cache is automatically refreshed when rules change or custom rules are used, and several configuration options and documentation updates have been added to support this new feature.

Compiled Vectorscan Rule Caching:

  • Rule cache enabled by default: Kingfisher now persists the compiled Vectorscan rule database by default, dramatically reducing scan startup time for repeated runs. The cache is stored in a platform-appropriate directory unless overridden by --rule-cache-dir or KF_RULE_CACHE_DIR. There is an opt-out flag --no-rule-cache. [1] [2] [3] [4] [5] [6]
  • Cache keying and refresh: Cache entries are keyed on rule content, platform, and Vectorscan version. Changing rules (including custom rules) automatically refreshes the cache entry.
  • New CLI command and logging: Added kingfisher rules compile-cache for explicit cache prebuilding, and added INFO logging to show cache usage and location. [1] [2]
  • Extensive documentation updates: Updated README.md, ADVANCED.md, INSTALLATION.md, and CONFIG.md to document cache behavior, configuration, and usage examples. [1] [2] [3] [4] [5]
  • Robust cache implementation and tests: Added a new RuleCacheConfig struct, platform-specific cache directory logic, robust cache read/write with validation, and tests to ensure cache correctness and automatic refresh when rules change. [1] [2] [3] [4] [5]
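
The keying rule described above (rule content, platform, and Vectorscan version) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `cache_key` function, the FNV-1a hash, and the example platform and version strings are hypothetical and are not taken from the Kingfisher source. A real implementation would more likely use a cryptographic digest.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a (64-bit), chosen here only because it is tiny and deterministic.
fn fnv1a(bytes: &[u8], mut state: u64) -> u64 {
    for &b in bytes {
        state ^= u64::from(b);
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

/// Hypothetical key derivation: mixes the platform, the engine version,
/// and every rule's text, so a change to any of them yields a new key.
fn cache_key(rules: &[&str], platform: &str, engine_version: &str) -> String {
    let mut h = FNV_OFFSET;
    for field in [platform, engine_version] {
        h = fnv1a(field.as_bytes(), h);
        h = fnv1a(&[0], h); // separator so adjacent fields cannot run together
    }
    for rule in rules {
        // Length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
        h = fnv1a(&(rule.len() as u64).to_le_bytes(), h);
        h = fnv1a(rule.as_bytes(), h);
    }
    format!("{h:016x}")
}

fn main() {
    // Illustrative inputs only; not real Kingfisher rules or version strings.
    let base = cache_key(&["aws_key: AKIA[0-9A-Z]{16}"], "linux-x86_64", "5.4.11");
    let edited = cache_key(&["aws_key: AKIA[0-9A-Z]{12}"], "linux-x86_64", "5.4.11");
    println!("base={base} edited={edited}");
}
```

Because the key changes whenever a rule, the platform, or the engine version changes, a stale entry is simply never looked up again, which is one way to get the "automatic refresh" behavior without explicit invalidation.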

Other:

  • Version bump: Bumped Kingfisher version to v1.104.0.
  • Minor codebase refactoring: Some imports and CLI code were updated to support the new caching feature.

These changes together make Kingfisher significantly faster and more user-friendly for repeated scans in development and CI environments.

Copilot AI review requested due to automatic review settings June 19, 2026 17:53

Copilot AI left a comment

Contributor

Pull request overview

Enables persistent caching of compiled Vectorscan rule databases (to speed up repeated scans) by default, adds a rules compile-cache command for pre-warming the cache, and updates CLI/config/docs/tests across Kingfisher to support and validate the behavior.

Changes:

  • Implement compiled-rule cache keying, on-disk read/write, and cache directory resolution; wire cache usage into scan rule compilation.
  • Add kingfisher rules compile-cache plus new CLI/config plumbing for cache toggles and cache directory overrides.
  • Add regression tests and update docs/changelog/versioning to reflect default-on caching.

Reviewed changes

Copilot reviewed 29 out of 32 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| `vendor/vectorscan-rs/vectorscan-rs/src/wrapper.rs` | Add byte-oriented serialize/deserialize helpers for database persistence. |
| `vendor/vectorscan-rs/vectorscan-rs/src/native.rs` | Expose Vectorscan runtime version + block DB serialize/deserialize API used by caching. |
| `crates/kingfisher-rules/src/rules_database.rs` | Implement cache header/keying, default cache dirs, cache load/store, and cache-focused tests. |
| `crates/kingfisher-rules/src/lib.rs` | Re-export RuleCacheConfig for downstream use. |
| `src/rules_database.rs` | Re-export RuleCacheConfig from kingfisher_rules. |
| `src/scanner/runner.rs` | Use cached compilation path when rule cache is enabled. |
| `src/cli/commands/rules.rs` | Add RuleCacheArgs, RuleCacheDirArgs, and rules compile-cache CLI surface. |
| `src/cli/commands/scan.rs` | Add RuleCacheArgs to scan CLI args via flattening. |
| `src/main.rs` | Apply config precedence for cache settings; add run_rules_compile_cache; default scan args include cache args; tests for precedence. |
| `src/cli/config.rs` | Extend config schema (rules.cache, rules.cache_dir) and update config parsing tests/examples. |
| `src/cli/global.rs` | Add --debug alias for --verbose. |
| `src/direct_validate.rs` | Update minimal scan args construction to include rule cache args. |
| `src/reporter/json_format.rs` | Update reporter tests/args wiring to include rule cache args. |
| `src/reporter.rs` | Update reporter tests/args wiring to include rule cache args. |
| `tests/int_vulnerable_files.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_validation_cache.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_teams.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_slack.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_redact.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_postman.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_gitlab.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_github.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_dedup.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_bitbucket.rs` | Update integration test scan args to include RuleCacheArgs. |
| `tests/int_allowlist.rs` | Update integration test scan args to include RuleCacheArgs. |
| `README.md` | Document cache pre-warming flow for faster pre-commit/CI. |
| `docs/INSTALLATION.md` | Document cache pre-warming in hook instructions. |
| `docs/CONFIG.md` | Document new config keys for rule cache enablement and directory. |
| `docs/ADVANCED.md` | Add “Compiled Rule Cache” section explaining behavior and locations. |
| `CHANGELOG.md` | Add v1.104.0 entry describing default-on caching and new command/logging. |
| `Cargo.toml` | Bump version to 1.104.0. |
| `Cargo.lock` | Update lockfile for version bump. |

Comment on lines +468 to +476
file.write_all(CACHE_MAGIC)?;
file.write_all(&(header_bytes.len() as u32).to_le_bytes())?;
file.write_all(&header_bytes)?;
file.write_all(&db_bytes)?;
file.sync_all().ok();
drop(file);
fs::rename(&tmp_path, path)
    .with_context(|| format!("rename {} to {}", tmp_path.display(), path.display()))?;
Ok(())
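
The excerpt above writes to a temporary file and then renames it into place. A self-contained sketch of that write-then-rename pattern follows, using only the standard library. The `MAGIC` value, the `.tmp` suffix, and the `write_cache_entry` name are assumptions for illustration (the real `CACHE_MAGIC` value is not shown in the excerpt). Unlike the excerpt, this sketch propagates a failed `sync_all` instead of discarding it, which is a design choice and not a claim about what the PR should do.

```rust
use std::convert::TryFrom;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

// Hypothetical magic bytes; the real CACHE_MAGIC is not shown in the PR excerpt.
const MAGIC: &[u8] = b"KFCACHE1";

/// Writes magic + little-endian header length + header + payload to a
/// temporary sibling file, then renames it over `path` so readers never
/// observe a partially written entry.
fn write_cache_entry(path: &Path, header: &[u8], db: &[u8]) -> io::Result<()> {
    // Reject headers that do not fit the 4-byte length prefix instead of truncating.
    let header_len = u32::try_from(header.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "header too large"))?;
    let tmp_path = path.with_extension("tmp");
    let mut file = File::create(&tmp_path)?;
    file.write_all(MAGIC)?;
    file.write_all(&header_len.to_le_bytes())?;
    file.write_all(header)?;
    file.write_all(db)?;
    // This sketch surfaces fsync failures rather than ignoring them.
    file.sync_all()?;
    drop(file);
    fs::rename(&tmp_path, path)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir()
        .join(format!("kf_cache_sketch_{}.bin", std::process::id()));
    write_cache_entry(&path, b"{\"v\":1}", b"compiled-db-bytes")?;
    println!("wrote {} bytes", fs::read(&path)?.len());
    fs::remove_file(&path)
}
```
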
Comment on lines +418 to +423
let header_len = u32::from_le_bytes(len_bytes) as usize;
let header_start = 4;
let header_end = header_start + header_len;
if rest.len() < header_end {
    bail!("truncated cache header");
}
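
The length-prefix check in the excerpt above can also be written with checked arithmetic and slice `get`, so that an untrusted length can neither overflow `usize` (possible on 32-bit targets) nor index out of bounds. This is a minimal sketch, not the PR's code: `split_header` and its error strings are hypothetical names, and `rest` is assumed to be the bytes that follow the magic.

```rust
use std::convert::TryInto;

/// Splits `rest` (the bytes after the magic) into `(header, payload)`.
/// Returns an error instead of panicking on short or inconsistent input.
fn split_header(rest: &[u8]) -> Result<(&[u8], &[u8]), String> {
    // Read the 4-byte little-endian length prefix, if present.
    let len_bytes: [u8; 4] = rest
        .get(..4)
        .and_then(|b| b.try_into().ok())
        .ok_or_else(|| "truncated length prefix".to_string())?;
    let header_len = u32::from_le_bytes(len_bytes) as usize;
    // checked_add guards against wrap-around when usize is 32 bits wide.
    let header_end = 4usize
        .checked_add(header_len)
        .ok_or_else(|| "header length overflow".to_string())?;
    let header = rest
        .get(4..header_end)
        .ok_or_else(|| "truncated cache header".to_string())?;
    Ok((header, &rest[header_end..]))
}

fn main() {
    let mut buf = 3u32.to_le_bytes().to_vec();
    buf.extend_from_slice(b"hdrpayload");
    match split_header(&buf) {
        Ok((header, payload)) => {
            println!("header={} bytes, payload={} bytes", header.len(), payload.len())
        }
        Err(e) => eprintln!("invalid cache entry: {e}"),
    }
}
```
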
Comment thread docs/ADVANCED.md Outdated
```
kingfisher rules compile-cache
```

Pass `--debug` or `-v` to see which cache directory and cache entry Kingfisher is using, whether it was a hit or miss, and when a new entry is written.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 33 out of 36 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings June 20, 2026 00:17

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 40 out of 44 changed files in this pull request and generated 3 comments.

Comment thread src/cli/commands/rules.rs
Comment on lines +52 to +60
/// Directory for the compiled rule cache
#[arg(
    global = true,
    long = "rule-cache-dir",
    env = "KF_RULE_CACHE_DIR",
    value_name = "PATH",
    value_hint = ValueHint::DirPath
)]
pub rule_cache_dir: Option<PathBuf>,
Comment thread src/cli/commands/rules.rs
Comment on lines +33 to +41
/// Cache the compiled Vectorscan rule database between runs (default)
#[arg(
    global = true,
    long = "rule-cache",
    default_value_t = false,
    conflicts_with = "no_rule_cache",
    hide = true
)]
pub rule_cache: bool,
Comment thread src/main.rs
Comment on lines +1782 to +1787
let rules_db =
    RulesDatabase::from_rules_with_cache(resolved.into_iter().cloned().collect(), &cache)
        .context("Failed to compile rules with Vectorscan cache")?;

println!("Rule cache ready: {} rules in {}", rules_db.num_rules(), cache.cache_dir().display());
Ok(())

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 124 out of 128 changed files in this pull request and generated 3 comments.

Comment thread src/matcher/mod.rs Outdated
Comment thread src/matcher/mod.rs Outdated
Comment thread src/cli/commands/rules.rs
Comment on lines +63 to +66
impl RuleCacheArgs {
    pub fn enabled(&self) -> bool {
        !self.no_rule_cache
    }
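
The excerpt shows `enabled()` returning `!self.no_rule_cache`, and the file summary mentions config precedence for cache settings in `src/main.rs`, but the actual precedence order is not shown on this page. The sketch below assumes one plausible order (explicit CLI flags first, then a `rules.cache` config value, then the default-on behavior). `RuleCacheFlags` and `cache_enabled` are hypothetical names, and the sketch omits clap so that it stays self-contained.

```rust
/// Stand-in for the two CLI flags shown in the excerpts
/// (`--rule-cache` and `--no-rule-cache`); not the real RuleCacheArgs.
#[derive(Debug, Clone, Copy, Default)]
struct RuleCacheFlags {
    rule_cache: bool,
    no_rule_cache: bool,
}

/// Assumed precedence: CLI opt-out, then CLI opt-in, then config, then default (on).
fn cache_enabled(flags: RuleCacheFlags, config_value: Option<bool>) -> bool {
    if flags.no_rule_cache {
        return false;
    }
    if flags.rule_cache {
        return true;
    }
    // No CLI preference given: fall back to config, defaulting to enabled.
    config_value.unwrap_or(true)
}

fn main() {
    let no_flags = RuleCacheFlags::default();
    let opt_out = RuleCacheFlags { no_rule_cache: true, ..no_flags };
    println!("default: {}", cache_enabled(no_flags, None));
    println!("config off: {}", cache_enabled(no_flags, Some(false)));
    println!("cli opt-out: {}", cache_enabled(opt_out, Some(true)));
}
```

Under this assumed order, the hidden `--rule-cache` flag is only meaningful as a way to override a config file that turns caching off.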

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 124 out of 128 changed files in this pull request and generated 1 comment.

Comment thread src/scanner/runner.rs
Comment on lines 1475 to +1477
  };
  init_progress.set_message("Recording rules...");
- datastore
-     .lock()
-     .unwrap()
-     .record_rules(rules_db.rules().iter().cloned().collect::<Vec<_>>().as_slice());
+ datastore.lock().unwrap().record_rules(rules_db.rules().to_vec().as_slice());
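
The change in this hunk swaps `iter().cloned().collect::<Vec<_>>()` for `to_vec()`. For a slice of `Clone` items the two produce the same vector, which the sketch below demonstrates with a stand-in `Rule` type. The `Rule` struct and the helper functions are hypothetical; the actual return type of `rules_db.rules()` is not shown on this page, so whether the copy could be avoided entirely by passing the slice directly is left open.

```rust
/// Stand-in for the real rule type, which is not shown in the excerpt.
#[derive(Debug, Clone, PartialEq)]
struct Rule {
    id: String,
}

/// The pre-change spelling: clone each element and collect into a Vec.
fn copy_with_iter(rules: &[Rule]) -> Vec<Rule> {
    rules.iter().cloned().collect::<Vec<_>>()
}

/// The post-change spelling: `to_vec` clones the whole slice in one call.
fn copy_with_to_vec(rules: &[Rule]) -> Vec<Rule> {
    rules.to_vec()
}

/// Hypothetical consumer that only needs a borrowed slice.
fn record_rules(rules: &[Rule]) -> usize {
    rules.len()
}

fn main() {
    let rules = vec![
        Rule { id: "aws".to_string() },
        Rule { id: "slack".to_string() },
    ];
    // Both spellings yield equal vectors.
    assert_eq!(copy_with_iter(&rules), copy_with_to_vec(&rules));
    let recorded = record_rules(copy_with_to_vec(&rules).as_slice());
    println!("recorded {recorded} rules");
}
```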

2 participants