
Rate & Memory limiter framework #919

@jmacd

Description


Part of open-telemetry/opentelemetry-collector#9591
Part of open-telemetry/opentelemetry-collector#12603

This issue covers a series of draft Collector changes; presently the most recent is:
open-telemetry/opentelemetry-collector#13265

Selected fragments from a conversation in #otel-arrow-dev on this topic, specific to Rust/Arrow:

@jmacd wrote:

Being familiar with Go and gRPC-Go, I've thought about how to manage the contextual information that accompanies requests in the Go collector. This includes client.Info state covering key:value pairs from the arriving gRPC or HTTP headers and from the Auth extension, timeout and cancellation, and e.g. tracing state (SpanContext).

Looking to @lquerel especially for design thoughts. How does the rate limiter processor briefly pause a request if the token bucket says to wait for an interval? If a rate limiter forwards a request (with or without a delay), does it need to track active requests to match downstream Nack controls to upstream Nack controls? Will every component that maps incoming message IDs to outgoing message IDs ...

After studying the code, my first guess would be to extend the Message::PData enum like

    /// A pipeline data message traversing the pipeline with context
    PData { data: PData, ctx: Context },

The question is how to manage the metadata used for limiter requests as part of the Context. See Envoy's rate limit documentation for the model that inspires this question.
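A minimal Rust sketch of what such an extended enum and context might look like. The field names (headers, deadline) and the Shutdown placeholder variant are illustrative assumptions, not the actual otel-arrow types:

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Stand-in for the pipeline's pdata payload (hypothetical, for illustration).
#[derive(Debug)]
pub struct PData(pub Vec<u8>);

/// Per-request context, analogous to client.Info in the Go collector:
/// header key:value pairs, an optional deadline for timeout/cancellation,
/// and (in a fuller version) tracing state such as a SpanContext.
#[derive(Debug, Default)]
pub struct Context {
    pub headers: HashMap<String, String>,
    pub deadline: Option<Instant>,
}

/// The extended message enum sketched above. `Shutdown` is an invented
/// placeholder standing in for the other control variants.
pub enum Message {
    /// A pipeline data message traversing the pipeline with context.
    PData { data: PData, ctx: Context },
    /// Placeholder control variant.
    Shutdown,
}

fn main() {
    let mut ctx = Context::default();
    ctx.headers.insert("tenant".to_string(), "team-a".to_string());
    let msg = Message::PData { data: PData(vec![1, 2, 3]), ctx };
    match msg {
        Message::PData { data, ctx } => {
            println!("{} bytes, tenant={:?}", data.0.len(), ctx.headers.get("tenant"));
        }
        Message::Shutdown => println!("shutdown"),
    }
}
```

A limiter processor could then read rate-limit descriptors (tenant, signal type, etc.) out of `ctx.headers` without changing the `PData` payload itself.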

@lquerel responded with helpful ideas:

This might be a bit of an oversimplification, but for me the rate limiter is a processor that regulates the speed of incoming pdata messages based on a configuration. As long as the maximum rate hasn't been reached, the rate limiter acts as a pass-through. As soon as the rate approaches or exceeds the limit, the rate limiter pauses and stops reading incoming messages for a duration defined by the rate limiter. During this pause, messages from upstream components will accumulate in the pdata channel, which will have several consequences:

- the output rate of the pdata will drop,
- the backpressure mechanism (since all channels are bounded) will slow down the upstream nodes (e.g. a receiver).
Unlike the batch processor, this processor doesn’t accumulate pdata in its state; it accumulates them in its pdata channel. The rate limiter’s state is only used to measure, in real time, an estimate of the output message rate.
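The behavior described above maps naturally onto a token bucket. The following is a minimal, self-contained sketch, not the otel-arrow implementation; the names `TokenBucket` and `acquire` are assumptions. The idea is that the processor calls `acquire` before forwarding each pdata message, and a non-zero return means "stop reading the pdata channel for this long", which is exactly the pause that lets backpressure build in the bounded channels:

```rust
use std::time::{Duration, Instant};

/// Minimal token bucket: at most `capacity` tokens, refilled at `rate`
/// tokens per second. One token is consumed per forwarded message.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate: f64,
    last: Instant,
}

impl TokenBucket {
    pub fn new(capacity: f64, rate: f64) -> Self {
        Self { capacity, tokens: capacity, rate, last: Instant::now() }
    }

    /// Credit tokens for the time elapsed since the last call, capped at
    /// `capacity`.
    fn refill(&mut self, now: Instant) {
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        self.last = now;
    }

    /// Returns how long the caller should pause before forwarding the next
    /// message. Zero means pass-through. Tokens may go negative, so a burst
    /// accumulates "debt" and successive calls return growing pauses.
    pub fn acquire(&mut self, now: Instant) -> Duration {
        self.refill(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Duration::ZERO
        } else {
            let deficit = 1.0 - self.tokens;
            self.tokens -= 1.0;
            Duration::from_secs_f64(deficit / self.rate)
        }
    }
}

fn main() {
    // Capacity 2, refill 1 token/sec: the first two messages pass through,
    // the third is told to pause.
    let mut bucket = TokenBucket::new(2.0, 1.0);
    let now = Instant::now();
    for i in 0..3 {
        println!("message {}: pause {:?}", i, bucket.acquire(now));
    }
}
```

Note that the bucket only returns a pause duration; the decision of *how* to wait (sleep the task, or simply stop polling the pdata channel) stays with the processor, which keeps this sketch consistent with the "normal processor" framing above.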

The advantage of implementing the rate limiter as a "normal" processor is that it can be placed anywhere in the DAG, and even in several branches of the DAG to handle complex cases. We could even imagine leveraging the out-port mechanism for the rate limiter to express overflow policies.

Metadata

Assignees: No one assigned
Labels: flow-control (covers pipeline-/flow-level control mechanisms such as rate limiting, backpressure, etc.), resource-control
Project status: Priority 2