
feat: add configurable ParquetMergePolicyConfig to index settings#6362

Merged
g-talbot merged 3 commits into main from gtt/parquet-merge-policy-config
Apr 30, 2026

Conversation

@g-talbot
Contributor

Summary

Stacked on #6351 (Phase 2 merge policy). Addresses review feedback to make the Parquet merge policy configurable per-index.

Adds a parquet_merge_policy section to IndexingSettings, exposing all merge policy parameters via YAML:

```yaml
indexing_settings:
  parquet_merge_policy:
    merge_factor: 10
    max_merge_factor: 12
    max_merge_ops: 4
    target_split_size_bytes: 268435456  # 256 MiB
    maturation_period: 48h
    max_finalize_merge_operations: 3
```

Mirrors the existing merge_policy config pattern for logs/traces (StableLog/ConstWriteAmplification).

Changes

  • quickwit-config: New ParquetMergePolicyConfig struct with serde support and human-readable durations
  • IndexingSettings: New parquet_merge_policy field (defaults match current hardcoded values)
  • docs/configuration/index-config.md: New "Parquet merge policy" documentation section
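For illustration, a stdlib-only Rust sketch of a config struct carrying these defaults. The field names come from the YAML keys above; the struct layout, the `parse_duration` helper, and the set of units it accepts are assumptions for this sketch, not the quickwit-config code (which uses serde and its own duration handling).

```rust
use std::time::Duration;

/// Hypothetical mirror of ParquetMergePolicyConfig; the real struct may differ.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ParquetMergePolicyConfig {
    pub merge_factor: usize,
    pub max_merge_factor: usize,
    pub max_merge_ops: usize,
    pub target_split_size_bytes: u64,
    pub maturation_period: Duration,
    pub max_finalize_merge_operations: usize,
}

impl Default for ParquetMergePolicyConfig {
    // Defaults as listed in the PR description.
    fn default() -> Self {
        Self {
            merge_factor: 10,
            max_merge_factor: 12,
            max_merge_ops: 4,
            target_split_size_bytes: 256 * 1024 * 1024, // 256 MiB
            maturation_period: Duration::from_secs(48 * 3600),
            max_finalize_merge_operations: 3,
        }
    }
}

/// Hypothetical parser for a small subset of human-readable durations
/// ("90s", "15m", "48h", "2d").
pub fn parse_duration(s: &str) -> Result<Duration, String> {
    let s = s.trim();
    let unit_start = s
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("missing unit in `{s}`"))?;
    let (digits, unit) = s.split_at(unit_start);
    let value: u64 = digits.parse().map_err(|_| format!("invalid number in `{s}`"))?;
    let secs_per_unit = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3600,
        "d" => 86_400,
        other => return Err(format!("unsupported unit `{other}`")),
    };
    let secs = value
        .checked_mul(secs_per_unit)
        .ok_or_else(|| format!("duration `{s}` overflows"))?;
    Ok(Duration::from_secs(secs))
}

fn main() {
    let config = ParquetMergePolicyConfig::default();
    assert_eq!(config.target_split_size_bytes, 268_435_456);
    assert_eq!(parse_duration("48h").unwrap(), config.maturation_period);
    println!("{config:?}");
}
```

Keeping the defaults in a `Default` impl is one way to let an omitted YAML section reproduce the previously hardcoded values.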

Follow-up

The downstream wiring (converting this config to Arc<dyn ParquetMergePolicy> in quickwit-indexing) is done in the Phase 3 pipeline PRs.
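A rough sketch of what converting the config into an `Arc<dyn ParquetMergePolicy>` could look like. The trait's single `is_mature` method, the `ConfiguredParquetMergePolicy` type, and the `build_parquet_merge_policy` function are hypothetical; the maturity rule (merge-op budget spent, target size reached, or maturation period elapsed) is an assumption modeled on the logs/traces merge policies this config mirrors.

```rust
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical subset of the config: only the fields this sketch reads.
#[derive(Debug, Clone)]
pub struct ParquetMergePolicyConfig {
    pub max_merge_ops: usize,
    pub target_split_size_bytes: u64,
    pub maturation_period: Duration,
}

/// Hypothetical trait shape; the real ParquetMergePolicy has its own methods.
pub trait ParquetMergePolicy: Send + Sync {
    /// Whether a split should stop being considered for further merges.
    fn is_mature(&self, size_bytes: u64, num_merge_ops: usize, age: Duration) -> bool;
}

/// Policy object built from the per-index config.
struct ConfiguredParquetMergePolicy {
    config: ParquetMergePolicyConfig,
}

impl ParquetMergePolicy for ConfiguredParquetMergePolicy {
    fn is_mature(&self, size_bytes: u64, num_merge_ops: usize, age: Duration) -> bool {
        num_merge_ops >= self.config.max_merge_ops
            || size_bytes >= self.config.target_split_size_bytes
            || age >= self.config.maturation_period
    }
}

/// Wraps the config in a shareable trait object for pipeline components.
pub fn build_parquet_merge_policy(config: &ParquetMergePolicyConfig) -> Arc<dyn ParquetMergePolicy> {
    Arc::new(ConfiguredParquetMergePolicy { config: config.clone() })
}

fn main() {
    let config = ParquetMergePolicyConfig {
        max_merge_ops: 4,
        target_split_size_bytes: 256 * 1024 * 1024,
        maturation_period: Duration::from_secs(48 * 3600),
    };
    let policy = build_parquet_merge_policy(&config);
    // A small, young split merged once is still a merge candidate.
    assert!(!policy.is_mature(1 << 20, 1, Duration::from_secs(60)));
    // Reaching max_merge_ops makes it mature regardless of size or age.
    assert!(policy.is_mature(1 << 20, 4, Duration::from_secs(60)));
}
```

Returning a trait object keeps the pipeline decoupled from the concrete policy, the same way the logs/traces merge policies are selected from their config.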

Test plan

  • 95 existing config tests pass
  • cargo clippy clean
  • Documentation updated with parameter table and YAML example

🤖 Generated with Claude Code

Base automatically changed from gtt/parquet-merge-policy to matthew.kim/metrics-partitioning April 29, 2026 21:04
g-talbot requested review from guilload and mattmkim and removed request for guilload and mattmkim April 30, 2026 02:32
Base automatically changed from matthew.kim/metrics-partitioning to main April 30, 2026 13:34
g-talbot and others added 2 commits April 30, 2026 09:40
Adds `parquet_merge_policy` section to `IndexingSettings`, making the
Parquet merge policy configurable per-index via YAML. Parameters:

- merge_factor (default 10): min splits to trigger a merge
- max_merge_factor (default 12): max splits per merge
- max_merge_ops (default 4): bounds write amplification
- target_split_size_bytes (default 256 MiB): target output size
- maturation_period (default 48h): split maturity timeout
- max_finalize_merge_operations (default 3): cold-window shutdown limit

Mirrors the existing merge_policy config pattern for logs/traces.
Updates index-config.md documentation with the new section.
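To make the first three parameters concrete, here is a minimal merge-planning sketch. It assumes splits are grouped into levels by how many merges produced them (as in the ConstWriteAmplification policy for logs/traces); `SplitInfo` and `plan_merges` are hypothetical names, and the real Parquet policy may select candidates differently (for example, it also has to respect time windows and max_finalize_merge_operations, which this sketch ignores).

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal view of a split for merge planning.
#[derive(Debug, Clone)]
pub struct SplitInfo {
    pub split_id: String,
    pub num_merge_ops: usize,
    pub size_bytes: u64,
}

/// Groups immature splits by num_merge_ops and emits one merge per batch of
/// at least `merge_factor` and at most `max_merge_factor` splits.
pub fn plan_merges(
    splits: &[SplitInfo],
    merge_factor: usize,
    max_merge_factor: usize,
    max_merge_ops: usize,
    target_split_size_bytes: u64,
) -> Vec<Vec<String>> {
    assert!(merge_factor > 0 && max_merge_factor >= merge_factor, "invalid merge factors");
    let mut by_level: BTreeMap<usize, Vec<&SplitInfo>> = BTreeMap::new();
    for split in splits {
        // Mature splits (merge-op budget spent or already large enough) are skipped;
        // this caps how many times any row is rewritten.
        if split.num_merge_ops >= max_merge_ops || split.size_bytes >= target_split_size_bytes {
            continue;
        }
        by_level.entry(split.num_merge_ops).or_default().push(split);
    }
    let mut operations: Vec<Vec<String>> = Vec::new();
    for mut candidates in by_level.into_values() {
        // Merge the smallest splits of a level first.
        candidates.sort_by_key(|split| split.size_bytes);
        for batch in candidates.chunks(max_merge_factor) {
            // Only the last chunk can be short; below merge_factor it waits for more splits.
            if batch.len() < merge_factor {
                break;
            }
            operations.push(batch.iter().map(|split| split.split_id.clone()).collect());
        }
    }
    operations
}

fn main() {
    let mut splits: Vec<SplitInfo> = (0..25)
        .map(|i| SplitInfo { split_id: format!("split-{i}"), num_merge_ops: 0, size_bytes: 1_000 + i as u64 })
        .collect();
    splits.push(SplitInfo { split_id: "mature".to_string(), num_merge_ops: 4, size_bytes: 10 });
    let ops = plan_merges(&splits, 10, 12, 4, 268_435_456);
    // 25 level-0 splits yield two merges of 12; the leftover split waits.
    assert_eq!(ops.len(), 2);
    println!("{ops:?}");
}
```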

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…secs

Adds `parquet_indexing` section to `IndexingSettings` for per-index
Parquet pipeline configuration:

- `sort_fields`: sort schema override (Husky-style pipe-delimited
  syntax with /V2 suffix). Controls row ordering, query pruning,
  compression locality, and compaction scope. When omitted, uses
  the product-type default.
- `window_duration_secs`: time window for split partitioning
  (default 900s / 15 min). Must divide 3600.
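The "must divide 3600" rule can be checked and applied as sketched below. `validate_window_duration_secs` and `window_start_secs` are hypothetical helpers, not functions from the codebase. One plausible motivation for the rule: when the width divides 3600, every hour boundary is also a window boundary, so no window straddles two hours.

```rust
/// Enforces the constraint stated above: a positive divisor of 3600.
pub fn validate_window_duration_secs(window_duration_secs: u64) -> Result<(), String> {
    if window_duration_secs == 0 || 3600 % window_duration_secs != 0 {
        return Err(format!(
            "window_duration_secs must be a positive divisor of 3600, got {window_duration_secs}"
        ));
    }
    Ok(())
}

/// Hypothetical helper: start (epoch seconds) of the window containing `timestamp_secs`.
/// `div_euclid` also places pre-1970 (negative) timestamps in the correct window.
pub fn window_start_secs(timestamp_secs: i64, window_duration_secs: u64) -> i64 {
    let width = window_duration_secs as i64;
    timestamp_secs.div_euclid(width) * width
}

fn main() {
    assert!(validate_window_duration_secs(700).is_err());
    for width in [60u64, 300, 900, 1800, 3600] {
        validate_window_duration_secs(width).expect("divisor of 3600");
        // Every hour boundary is also a window boundary.
        assert_eq!(window_start_secs(7_200, width), 7_200);
    }
    println!("15-min window for t=3599 starts at {}", window_start_secs(3_599, 900));
}
```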

Updates docs/configuration/index-config.md with:
- "Parquet indexing settings" section explaining both parameters
- Full sort schema syntax reference (column types, direction
  overrides, and the LSM cutoff marker)
- Examples showing minimal, custom, and advanced configurations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
g-talbot force-pushed the gtt/parquet-merge-policy-config branch 3 times, most recently from 4fae305 to ebab487 April 30, 2026 14:17
g-talbot force-pushed the gtt/parquet-merge-policy-config branch from ebab487 to 8656c44 April 30, 2026 14:32
Comment thread on quickwit/quickwit-config/src/index_config/mod.rs (outdated)
g-talbot force-pushed the gtt/parquet-merge-policy-config branch 2 times, most recently from c7e3eff to d1e5980 April 30, 2026 19:16
g-talbot requested a review from guilload April 30, 2026 19:19
g-talbot force-pushed the gtt/parquet-merge-policy-config branch 2 times, most recently from 74b1ce1 to fc03d3d April 30, 2026 19:58
Adding ParquetMergePolicyConfig and ParquetIndexingConfig to
IndexingSettings changes the Hash output, which changes the pipeline
params fingerprints. Updated the hardcoded test constants.

Added a comment explaining how to recompute them when IndexingSettings
fields change.
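Why new fields shift the fingerprints: with `#[derive(Hash)]`, every field is fed to the hasher in declaration order, so adding a field changes the hash even when all existing values are unchanged. The structs below are simplified stand-ins, not the real IndexingSettings, and std's `DefaultHasher` is used only for illustration (its algorithm is not guaranteed stable across Rust releases, so the real fingerprinting may use a different hasher).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the settings before the change.
#[derive(Hash)]
struct SettingsBefore {
    commit_timeout_secs: u64,
    split_num_docs_target: u64,
}

/// Stand-in for the settings after adding one field.
#[derive(Hash)]
struct SettingsAfter {
    commit_timeout_secs: u64,
    split_num_docs_target: u64,
    // New field, analogous to adding parquet_merge_policy.
    parquet_merge_factor: usize,
}

/// Derives a fingerprint from a value's Hash impl.
fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let before = fingerprint(&SettingsBefore { commit_timeout_secs: 30, split_num_docs_target: 10_000_000 });
    let after = fingerprint(&SettingsAfter {
        commit_timeout_secs: 30,
        split_num_docs_target: 10_000_000,
        parquet_merge_factor: 10,
    });
    // Same existing values plus one extra field: the fingerprint changes.
    assert_ne!(before, after);
    // Printing the new value is one way to refresh a hardcoded test constant.
    println!("before = {before}, after = {after}");
}
```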

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
g-talbot force-pushed the gtt/parquet-merge-policy-config branch from fc03d3d to 58d07d0 April 30, 2026 20:04
g-talbot merged commit a36f4fe into main Apr 30, 2026
9 checks passed
g-talbot deleted the gtt/parquet-merge-policy-config branch April 30, 2026 20:27