feat: add configurable ParquetMergePolicyConfig to index settings#6362
Merged
Base automatically changed from `gtt/parquet-merge-policy` to `matthew.kim/metrics-partitioning` on April 29, 2026, 21:04
Adds `parquet_merge_policy` section to `IndexingSettings`, making the Parquet merge policy configurable per-index via YAML.

Parameters:
- `merge_factor` (default 10): minimum number of splits to trigger a merge
- `max_merge_factor` (default 12): maximum number of splits per merge
- `max_merge_ops` (default 4): bounds write amplification
- `target_split_size_bytes` (default 256 MiB): target output size
- `maturation_period` (default 48h): split maturity timeout
- `max_finalize_merge_operations` (default 3): cold-window shutdown limit

Mirrors the existing `merge_policy` config pattern for logs/traces. Updates the `index-config.md` documentation with the new section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
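As a rough illustration of the parameters listed above, the sketch below models the config as a plain Rust struct with a `Default` impl. The field names and default values come from the commit message; the field types, the std-only layout (the real struct uses serde), and the `validate` helper are assumptions made for this example and are not taken from the PR.

```rust
use std::time::Duration;

/// Hypothetical stand-in for `ParquetMergePolicyConfig`.
/// Field names and defaults follow the commit message; the Rust types are assumed.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ParquetMergePolicyConfig {
    /// Minimum number of splits needed to trigger a merge.
    pub merge_factor: usize,
    /// Maximum number of splits merged in one operation.
    pub max_merge_factor: usize,
    /// Upper bound on merge operations per split (bounds write amplification).
    pub max_merge_ops: usize,
    /// Target size of a merged split, in bytes.
    pub target_split_size_bytes: u64,
    /// Time after which a split is considered mature.
    pub maturation_period: Duration,
    /// Limit on finalize merge operations at cold-window shutdown.
    pub max_finalize_merge_operations: usize,
}

impl Default for ParquetMergePolicyConfig {
    fn default() -> Self {
        Self {
            merge_factor: 10,
            max_merge_factor: 12,
            max_merge_ops: 4,
            target_split_size_bytes: 256 * 1024 * 1024, // 256 MiB
            maturation_period: Duration::from_secs(48 * 60 * 60), // 48h
            max_finalize_merge_operations: 3,
        }
    }
}

/// Hypothetical sanity check (an assumption, not part of the PR):
/// the trigger threshold should not exceed the per-merge maximum.
pub fn validate(config: &ParquetMergePolicyConfig) -> Result<(), String> {
    if config.merge_factor > config.max_merge_factor {
        return Err(format!(
            "merge_factor ({}) must not exceed max_merge_factor ({})",
            config.merge_factor, config.max_merge_factor
        ));
    }
    Ok(())
}
```

Calling `ParquetMergePolicyConfig::default()` yields the values listed in the commit message, which is how an index with no `parquet_merge_policy` section would keep the previously hardcoded behavior.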
…secs

Adds `parquet_indexing` section to `IndexingSettings` for per-index Parquet pipeline configuration:
- `sort_fields`: sort schema override (Husky-style pipe-delimited syntax with `/V2` suffix). Controls row ordering, query pruning, compression locality, and compaction scope. When omitted, uses the product-type default.
- `window_duration_secs`: time window for split partitioning (default 900s / 15 min). Must divide 3600.

Updates `docs/configuration/index-config.md` with:
- "Parquet indexing settings" section explaining both parameters
- Full sort schema syntax reference (column types, direction overrides, & LSM cutoff marker)
- Examples showing minimal, custom, and advanced configurations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
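The "must divide 3600" constraint on `window_duration_secs` can be sketched as a small check. The function names below are hypothetical, and the epoch-aligned `window_start` helper is an assumption used only to show why the constraint is useful: when the window length divides 3600 and windows are aligned to the Unix epoch, every hour boundary is also a window boundary.

```rust
/// Default window length from the commit message: 900 s (15 minutes).
pub const DEFAULT_WINDOW_DURATION_SECS: u64 = 900;

/// Hypothetical validation helper: accepts only non-zero window lengths
/// that divide one hour (3600 s) evenly.
pub fn validate_window_duration_secs(secs: u64) -> Result<u64, String> {
    if secs == 0 {
        return Err("window_duration_secs must be greater than 0".to_string());
    }
    if 3600 % secs != 0 {
        return Err(format!("window_duration_secs ({secs}) must divide 3600"));
    }
    Ok(secs)
}

/// Assumption for illustration: windows are aligned to the Unix epoch,
/// so a timestamp's window starts at the previous multiple of the window length.
pub fn window_start(timestamp_secs: u64, window_duration_secs: u64) -> u64 {
    timestamp_secs - timestamp_secs % window_duration_secs
}
```

Under these assumptions, a timestamp of 3700 s with the default 900 s window falls in the window starting at 3600 s, which coincides with an hour boundary.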
4fae305 to ebab487
mattmkim approved these changes on Apr 30, 2026
ebab487 to 8656c44
guilload reviewed on Apr 30, 2026
c7e3eff to d1e5980
74b1ce1 to fc03d3d
Adding `ParquetMergePolicyConfig` and `ParquetIndexingConfig` to `IndexingSettings` changes the `Hash` output, which changes the pipeline params fingerprints. Updated the hardcoded test constants. Added a comment explaining how to recompute them when `IndexingSettings` fields change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
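The effect described above can be reproduced in miniature: adding a field to a struct that derives `Hash` changes the value any hasher produces for it. In the sketch below, the two structs, their field names, and the `fingerprint` helper are all hypothetical, and `DefaultHasher` is only a stand-in, since the hashing scheme behind the real pipeline params fingerprints is not shown in this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative settings struct before the change (hypothetical fields).
#[derive(Hash)]
pub struct SettingsBefore {
    pub commit_timeout_secs: u64,
}

/// Same struct after a new field is added; the derived `Hash` impl
/// now feeds one more value into the hasher.
#[derive(Hash)]
pub struct SettingsAfter {
    pub commit_timeout_secs: u64,
    pub merge_factor: usize,
}

/// Hypothetical fingerprint helper: hashes any `Hash` value to a `u64`.
pub fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Because the fingerprint depends on every hashed field, a test that pins a fingerprint constant has to be updated whenever the struct gains a field, even if the new field only carries its default value.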
fc03d3d to 58d07d0
Summary
Stacked on #6351 (Phase 2 merge policy). Addresses review feedback to make the Parquet merge policy configurable per-index.
Adds `parquet_merge_policy` section to `IndexingSettings`, exposing all merge policy parameters via YAML. Mirrors the existing `merge_policy` config pattern for logs/traces (StableLog/ConstWriteAmplification).

Changes
- `quickwit-config`: New `ParquetMergePolicyConfig` struct with serde + human-readable durations
- `IndexingSettings`: New `parquet_merge_policy` field (defaults match current hardcoded values)
- `docs/configuration/index-config.md`: New "Parquet merge policy" documentation section

Follow-up
The downstream wiring (converting this config to `Arc<dyn ParquetMergePolicy>` in `quickwit-indexing`) is done in the Phase 3 pipeline PRs.

Test plan
- `cargo clippy` clean

🤖 Generated with Claude Code