feat(tag_cardinality_limit): add per-tag cache_size_per_key override in probabilistic mode#25650

Open
ArunPiduguDD wants to merge 1 commit into master from feature/tag-cardinality-per-tag-cache-size

Conversation

@ArunPiduguDD

@ArunPiduguDD ArunPiduguDD commented Jun 18, 2026

Contributor

Summary

In probabilistic mode, the Bloom filter cache size (cache_size_per_key) was a single global or per-metric setting that every tag inherited. That inheritance makes little sense for a tag with a per-tag override, especially when the override sets a much higher limit than the global or per-metric limit, because the filter stays sized for the smaller limit.
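To make the sizing concern concrete, here is a rough sketch of the standard Bloom filter false-positive approximation, (1 - e^(-kn/m))^k. It assumes cache_size_per_key is measured in bytes and treats the number of hash functions as a free parameter; the function name and the num_hashes parameter are hypothetical and are not taken from the transform's code or from the filter implementation it uses.

```rust
/// Approximate false-positive rate of a Bloom filter holding
/// `distinct_values` items in `size_bytes` bytes of bit storage.
///
/// Assumptions (not from the PR): the size is in bytes, and
/// `num_hashes` is whatever number of hash functions the filter uses.
fn false_positive_rate(size_bytes: usize, distinct_values: usize, num_hashes: u32) -> f64 {
    let m = (size_bytes * 8) as f64; // total bits in the filter
    let n = distinct_values as f64; // distinct tag values inserted
    let k = f64::from(num_hashes); // hash functions per insert

    // Standard approximation: (1 - e^(-k * n / m))^k
    (1.0 - (-k * n / m).exp()).powf(k)
}
```

Under these assumptions, calling false_positive_rate(5120, 50_000, 4) and false_positive_rate(32_768, 50_000, 4) gives a noticeably lower rate for the larger filter, which is the effect a per-tag cache size is meant to allow.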

This adds an optional cache_size_per_key field to per-tag limit_override entries. When set, it overrides the bloom filter size for that tag only. When omitted, it inherits from the enclosing per-metric or global config as before. Setting it in exact mode is silently ignored.

Example:

mode: probabilistic
cache_size_per_key: 5120
per_tag_limits:
  trace_id:
    mode: limit_override
    value_limit: 1000
    cache_size_per_key: 32768  # bigger filter for this tag only
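The resolution rule described in the summary can be sketched roughly as follows. The name apply_cache_size_override comes from the test notes in this PR, but the signature and the Mode enum here are simplified, hypothetical stand-ins, not the transform's actual types.

```rust
// Hypothetical, simplified stand-in for the transform's mode config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Exact,
    Probabilistic { cache_size_per_key: usize },
}

/// Resolves the effective mode for one tag.
///
/// `inherited` is the mode from the enclosing per-metric or global
/// config; `override_size` is the optional per-tag cache_size_per_key.
fn apply_cache_size_override(inherited: Mode, override_size: Option<usize>) -> Mode {
    match (inherited, override_size) {
        // Probabilistic mode with an override: use the per-tag size.
        (Mode::Probabilistic { .. }, Some(size)) => Mode::Probabilistic {
            cache_size_per_key: size,
        },
        // No override, or exact mode: keep the inherited mode unchanged.
        (mode, _) => mode,
    }
}
```

With the example config above, the trace_id tag would resolve to a 32768 cache size, any other tag would keep 5120, and an override supplied in exact mode would leave the mode untouched.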

Vector configuration

See the example above. The new field works both in top-level per_tag_limits and inside per_metric_limits.

How did you test this PR?

  • Added unit tests for apply_cache_size_override covering probabilistic+Some, probabilistic+None, and exact+Some (no-op).
  • Added YAML deserialization tests for both per_metric_limits and global per_tag_limits scopes.
  • All 41 existing tag cardinality tests continue to pass.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

@github-actions github-actions Bot added the docs review on hold, domain: transforms, and domain: external docs labels and removed the docs review on hold label Jun 18, 2026
@ArunPiduguDD ArunPiduguDD marked this pull request as ready for review June 18, 2026 02:36
@ArunPiduguDD ArunPiduguDD requested review from a team as code owners June 18, 2026 02:36
@ArunPiduguDD ArunPiduguDD force-pushed the feature/tag-cardinality-per-tag-cache-size branch from 3bc46fc to b845b1c Compare June 18, 2026 02:47
@github-actions github-actions Bot added the docs review on hold The documentation team reviews PRs only after a PR is approved by the COSE team. label Jun 18, 2026
@ArunPiduguDD ArunPiduguDD force-pushed the feature/tag-cardinality-per-tag-cache-size branch from b845b1c to 9850e50 Compare June 18, 2026 02:48
@ArunPiduguDD ArunPiduguDD force-pushed the feature/tag-cardinality-per-tag-cache-size branch from 9850e50 to ce0e240 Compare June 18, 2026 02:49

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce0e240502

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 191 to 194
 type: {
     required: true
-    type: string: enum: {
-        file: "Exposes data from a static file as an enrichment table."
-        memory: """
-            Exposes data from a memory cache as an enrichment table. The cache can be written to using
-            a sink.
-            """
-        geoip: """
-            Exposes data from a [MaxMind][maxmind] [GeoIP2][geoip2] database as an enrichment table.
-
-            [maxmind]: https://www.maxmind.com/
-            [geoip2]: https://www.maxmind.com/en/geoip2-databases
-            """
-        mmdb: """
-            Exposes data from a [MaxMind][maxmind] database as an enrichment table.
-
-            [maxmind]: https://www.maxmind.com/
-            """
-    }
+    type: string: enum: file: "Exposes data from a static file as an enrichment table."
     description: "enrichment table type"

P2: Restore generated metadata for supported config types

This regenerated reference metadata narrows enrichment_tables.*.type to only file, even though the code still defines memory, geoip, and mmdb enrichment table variants in src/enrichment_tables/mod.rs; the same hunk also drops unrelated API and aws_secrets_manager metadata. As committed, the generated configuration reference will stop documenting valid production configuration options, so this file should be regenerated with the full feature set or reverted except for the tag-cardinality changes.

Useful? React with 👍 / 👎.

Labels

  • docs review on hold: The documentation team reviews PRs only after a PR is approved by the COSE team.
  • domain: external docs: Anything related to Vector's external, public documentation
  • domain: transforms: Anything related to Vector's transform components
