feat(source sender): make chunk size configurable #25637

Open
sakateka wants to merge 1 commit into vectordotdev:master from sakateka:configurable-chunk-size

@sakateka

Summary

Hi! First, thank you for building and maintaining such a great project.

I would like to propose a feature that helps us reduce memory usage for Vector DaemonSet pods in production.

We use vector-operator and aim to configure each pipeline through its own VectorPipeline custom resource as a fully independent source -> transforms... -> sink chain. Users configure these chains in their own Kubernetes namespaces, and because the pipelines are isolated, a configuration issue in one user's pipeline does not affect other users.

The tradeoff is that the logging agent running on Kubernetes nodes can become quite memory-heavy when many independent pipelines are configured. I tried tuning VECTOR_THREADS, which helped somewhat, but not enough. After looking through the codebase, I found that CHUNK_SIZE is a useful control point for reducing memory usage because it affects source sender batching and source output buffer sizing.
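One way to turn a compile-time constant into a startup-time setting is a process-wide value that is written once, before the topology is built, and read everywhere the constant used to be. The sketch below is a minimal, std-only illustration under assumptions, not the code in this PR: only the `chunk_size()` name comes from the diff quoted in the review thread, while `set_chunk_size`, `CHUNK_SIZE_OVERRIDE`, and the default of 1000 are hypothetical (check the real `CHUNK_SIZE` constant in the tree).

```rust
use std::sync::OnceLock;

/// Hypothetical fallback; assumed to match the existing compile-time
/// `CHUNK_SIZE` constant. Verify against the Vector source.
const DEFAULT_CHUNK_SIZE: usize = 1000;

/// Hypothetical process-wide override, written at most once during startup.
static CHUNK_SIZE_OVERRIDE: OnceLock<usize> = OnceLock::new();

/// Records the operator-supplied chunk size. Returns `Err(value)` if a chunk
/// size was already set, so the value cannot change after startup.
pub fn set_chunk_size(value: usize) -> Result<(), usize> {
    CHUNK_SIZE_OVERRIDE.set(value)
}

/// Returns the configured chunk size, or the default if none was set.
pub fn chunk_size() -> usize {
    *CHUNK_SIZE_OVERRIDE.get().unwrap_or(&DEFAULT_CHUNK_SIZE)
}

fn main() {
    // Before any override, callers see the default.
    assert_eq!(chunk_size(), DEFAULT_CHUNK_SIZE);

    // Startup code would call this once, e.g. after parsing CLI/env.
    set_chunk_size(200).unwrap();
    assert_eq!(chunk_size(), 200);

    // A second write is rejected and the first value is kept.
    assert_eq!(set_chunk_size(500), Err(500));
    assert_eq!(chunk_size(), 200);
}
```

Because `SOURCE_SENDER_BUFFER_SIZE` in the quoted diff is a `LazyLock`, it captures whatever `chunk_size()` returns on its first access, so under this design the override would have to be set before any source is built.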

I added this functionality in a fork and tested it in our test environment. For example, with VECTOR_CHUNK_SIZE=200, 12 independent pipelines fit into about 2 GB of memory with when_full: block configured at every stage, even while several pipelines were blocked because their sinks could not send logs to an unavailable downstream collector. Vector also handled spikes in log writes without issues. Interestingly, Vector's own operational metrics in our Grafana dashboards also appeared more stable and predictable.

This PR adds --chunk-size / VECTOR_CHUNK_SIZE so operators can tune this value without rebuilding Vector.

If this is not the right direction, I would be happy to hear your suggestions.

Vector configuration

Example CLI usage:

vector --config ./vector.yaml --chunk-size 200

The same value can be configured with the environment variable:

VECTOR_CHUNK_SIZE=200 vector --config ./vector.yaml
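When both the flag and the environment variable are available, they need a precedence rule. A common convention, and the one assumed in this sketch, is that an explicit --chunk-size overrides VECTOR_CHUNK_SIZE, which in turn overrides the built-in default. `resolve_chunk_size` is a hypothetical helper; parsing into `NonZeroUsize` rejects 0 and non-numeric input up front. The argument scan in `main` is deliberately minimal (it ignores the `--chunk-size=200` form); a real flag would go through Vector's existing CLI argument parsing.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper: picks the CLI value if present, else the env value,
/// else the default. Zero and non-numeric input are rejected with a message.
fn resolve_chunk_size(
    cli: Option<&str>,
    env: Option<&str>,
    default: NonZeroUsize,
) -> Result<NonZeroUsize, String> {
    match cli.or(env) {
        None => Ok(default),
        Some(raw) => raw
            .trim()
            .parse::<NonZeroUsize>()
            .map_err(|e| format!("invalid chunk size {raw:?}: {e}")),
    }
}

fn main() {
    // Assumed default; see the lead-in.
    let default = NonZeroUsize::new(1000).unwrap();
    let env = std::env::var("VECTOR_CHUNK_SIZE").ok();
    // Minimal scan for `--chunk-size <N>`; not how Vector parses its CLI.
    let cli = std::env::args().skip_while(|a| a != "--chunk-size").nth(1);

    match resolve_chunk_size(cli.as_deref(), env.as_deref(), default) {
        Ok(n) => println!("chunk size: {n}"),
        Err(e) => eprintln!("{e}"),
    }
}
```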

Minimal example configuration:

# any valid Vector configuration can be used
sources:
  in:
    type: stdin

sinks:
  out:
    type: console
    inputs:
      - in
    encoding:
      codec: json

How did you test this PR?

Ran:

cargo check -p vector-core --no-default-features
cargo check -p vector-core --no-default-features --features test --tests
cargo clippy -p vector --lib

cargo vdev run ./test-cfg.yml -- --chunk-size 20 -vvv
VECTOR_CHUNK_SIZE=200 cargo vdev run ./test-cfg.yml -- -vvv

Also tested this change in our fork in a Kubernetes environment with multiple independent VectorPipeline pipelines and VECTOR_CHUNK_SIZE=200.

Change Type

  • New feature

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Changelog fragment is added based on the guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

None.

Notes

  • Added a changelog fragment.
  • Added CLI reference documentation for --chunk-size and VECTOR_CHUNK_SIZE.
  • This PR does not change Vector dependencies.

@sakateka requested review from a team as code owners June 15, 2026 22:06
@github-actions

github-actions Bot commented Jun 15, 2026

Contributor

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

github-actions Bot added the domain: topology, domain: external docs, and domain: core labels, and added then removed the docs review on hold label (the documentation team reviews PRs only after a PR is approved by the COSE team), Jun 15, 2026
@sakateka

Author

I have read the CLA Document and I hereby sign the CLA

chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba2d828047

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/builder.rs

 pub(crate) static SOURCE_SENDER_BUFFER_SIZE: LazyLock<usize> =
-    LazyLock::new(|| *TRANSFORM_CONCURRENCY_LIMIT * CHUNK_SIZE);
+    LazyLock::new(|| *TRANSFORM_CONCURRENCY_LIMIT * chunk_size());


P2: Reject chunk sizes that overflow buffer capacity

When a user supplies a very large but parseable --chunk-size/VECTOR_CHUNK_SIZE, this unchecked multiplication can wrap in release builds. For example, on a 64-bit build --threads 2 --chunk-size 9223372036854775808 makes the source sender buffer size wrap to 0, and the first source build then panics in Output::new_with_buffer when it unwraps NonZeroUsize::new(n); other overflowing values silently produce a much smaller buffer than requested. Please validate the configured chunk size or use checked arithmetic and return a config error instead of allowing startup panics/wraparound.
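The reviewer's suggestion can be sketched with `usize::checked_mul` and `NonZeroUsize::new`, so that both the wraparound case and a zero product surface as errors the caller can report as a config error instead of panicking later in `Output::new_with_buffer`. This is an illustration under assumptions, not the PR's fix: `source_sender_buffer_size` is a hypothetical helper, and its `concurrency` parameter stands in for `TRANSFORM_CONCURRENCY_LIMIT`.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper: concurrency * chunk size, as in the diff, but with
/// overflow and zero checks instead of silent wraparound or a later unwrap.
fn source_sender_buffer_size(
    concurrency: usize,
    chunk_size: usize,
) -> Result<NonZeroUsize, String> {
    let product = concurrency.checked_mul(chunk_size).ok_or_else(|| {
        format!("chunk size {chunk_size} * concurrency {concurrency} overflows usize")
    })?;
    NonZeroUsize::new(product)
        .ok_or_else(|| "source sender buffer size must be greater than zero".to_string())
}

fn main() {
    // A normal configuration fits comfortably.
    assert_eq!(source_sender_buffer_size(2, 200).unwrap().get(), 400);

    // usize::MAX / 2 + 1 is 9223372036854775808 on a 64-bit target, the value
    // from the review; 2 * that wraps to 0 unchecked, and is rejected here.
    assert!(source_sender_buffer_size(2, usize::MAX / 2 + 1).is_err());

    // A zero chunk size is rejected rather than reaching a NonZeroUsize unwrap.
    assert!(source_sender_buffer_size(2, 0).is_err());
}
```

Validating at startup like this also covers the quieter failure mode the reviewer mentions, where an overflowing value would otherwise produce a much smaller buffer than requested.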


@domalessi self-assigned this Jun 15, 2026

Labels

  • domain: core (anything related to core crates, i.e. vector-core, core-common, etc.)
  • domain: external docs (anything related to Vector's external, public documentation)
  • domain: topology (anything related to Vector's topology code)
