Internal logging safety: protocol buffer size limits#2792

Draft
jmacd wants to merge 26 commits into open-telemetry:main from jmacd:jmacd/logs_sdk

Conversation

@jmacd
Contributor

@jmacd commented Apr 30, 2026

Change Summary

Limits the size of protocol buffer generation. Logged attributes now use a fixed-size stack-allocated array (256 bytes).

Replaces try_into for OtapPayloads with try_into_with_options and try_into_with_default to apply default settings. The default size is large to avoid surprises (256MiB, the hard max implied by 4-byte placeholders). In a future PR we can tighten this and add configurable limits that apply by default anywhere OTLP protos are generated in a pipeline.

Implements dropped_attributes accounting for records in the internal logging pipeline.

Implements string truncation for string-valued fields, inserting [...] when possible. Truncated fields are counted in dropped_attributes. This applies a measure of fairness across the fields of a log record: when a string value is inserted, it may use up to 50% of the remaining bytes; beyond that it is truncated.
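The 50%-of-remaining rule can be sketched as follows. This is a minimal illustration, not the PR's actual API: `fair_truncate` is a hypothetical name, and the real encoder works on protobuf fields rather than bare strings.

```rust
/// Suffix appended to truncated string values, per the PR description.
const SUFFIX: &str = "[...]";

/// Hypothetical sketch: a string value may use at most half of the
/// remaining buffer bytes; if it does not fit, clip it at a UTF-8
/// character boundary and append "[...]".
/// Returns the (possibly truncated) value and whether truncation occurred.
fn fair_truncate(value: &str, remaining: usize) -> (String, bool) {
    let budget = remaining / 2; // at most 50% of the remaining bytes
    if value.len() <= budget {
        return (value.to_string(), false);
    }
    if budget <= SUFFIX.len() {
        return (String::new(), true); // no room even for the suffix: drop
    }
    let mut cut = budget - SUFFIX.len();
    while !value.is_char_boundary(cut) {
        cut -= 1; // never split a multi-byte UTF-8 sequence
    }
    (format!("{}{}", &value[..cut], SUFFIX), true)
}
```

A 1000-byte value inserted with 256 bytes remaining is clipped to the 128-byte half-budget, suffix included.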

What issue does this PR close?

Fixes #1746
Part of #2725

How are these changes tested?

A new benchmark was added: a heavy self-tracing workload with 10 string attributes and a body, deliberately overflowing the 256-byte inline budget. The old code re-allocated a Vec to log this event; the new code does not.

| Benchmark | upstream/main | jmacd/logs_sdk | Δ |
| --- | --- | --- | --- |
| new_record | 170.4 µs | 183.2 µs | +7.5% |
| encode_proto | 195.3 µs | 195.2 µs | ~0% |
| encode_proto_with_scope | 206.0 µs | 225.6 µs | +9.5% |
| encode_and_format | 862.1 µs | 559.6 µs | −35.1% |
| format_with_entity | 934.3 µs | 627.5 µs | −32.8% |

There is some cost for protocol writers; this is the price of safety. For console writers this is a win, because we now use a stack-allocated buffer for attribute encoding. Note that raw_error! is now allocation-free.

Are there any user-facing changes?

Yes. We begin to observe dropped log attributes instead of permitting very large log records in the internal logs pipeline.

jmacd and others added 25 commits April 17, 2026 17:26
Two features for issue open-telemetry#1746:

1. ProtoBuffer<const INLINE: usize = 0> is now generic over inline
   storage size. INLINE=0 (default) behaves like Vec<u8>; INLINE>0
   uses SmallVec to keep data on the stack during encoding, avoiding
   heap allocation in the safety-critical encoding phase.

2. DirectFieldVisitor now tracks dropped_count—the number of attribute
   fields dropped due to truncation. LogRecord exposes this as
   dropped_attributes_count. The new LogRecord::new_bounded()
   constructor reserves 3 bytes (tag + 2-byte varint) by reducing the
   buffer limit, guaranteeing space for the dropped_attributes_count
   protobuf field even when the buffer is full.

raw_error! is now fully zero-allocation: it encodes body/attributes
into a stack-allocated ProtoBuffer<256> and formats directly from
the raw bytes via ConsoleWriter::print_raw_log(), bypassing LogRecord
and Bytes entirely.

LogRecord::body_attrs_bytes remains Bytes for cheap reference-counted
cloning through channels and retention. The encoding phase uses
ProtoBuffer<256> on the stack; after encoding completes the result is
converted to Bytes with a correctly-sized allocation.
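The 3-byte reservation described above can be sketched like this. The field number 7 matches OTLP's `LogRecord.dropped_attributes_count`; the helper names are illustrative, and note that a 2-byte varint only covers counts up to 16383 (larger counts would need a wider reserve).

```rust
/// Reserve for the dropped_attributes_count field: 1 tag byte plus a
/// 2-byte varint (enough for counts up to 16383).
const DROPPED_FIELD_RESERVE: usize = 3;

/// Hypothetical sketch: lower the effective buffer limit so the dropped
/// count field can always be appended, even when attributes filled it.
fn bounded_limit(buffer_size: usize) -> usize {
    buffer_size - DROPPED_FIELD_RESERVE
}

/// Encode field 7 with varint wire type 0: tag = (7 << 3) | 0 = 0x38.
fn encode_dropped_count(out: &mut Vec<u8>, mut count: u32) {
    out.push(0x38);
    while count >= 0x80 {
        out.push((count as u8 & 0x7f) | 0x80); // low 7 bits, continuation set
        count >>= 7;
    }
    out.push(count as u8);
}
```

With a 256-byte buffer the attribute section sees a 253-byte limit, and a count like 300 still encodes in exactly the 3 reserved bytes.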

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three features for issue open-telemetry#1746:

1. ProtoBufferInline<const INLINE: usize> is the new generic protobuf
   encoding buffer backed by SmallVec. ProtoBuffer is a type alias
   for ProtoBufferInline<0> (heap-only, matching the old behavior).

2. Replaced truncated:bool with dropped:u32 — counts how many fields
   were dropped due to the size limit. try_encode(f) provides atomic
   field-level writes: either the entire field lands or the buffer is
   unchanged and the dropped counter increments by one.

3. DirectFieldVisitor uses try_encode() for concise, correct attribute
   encoding. No more manual checkpoint/rollback/count boilerplate.
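The try_encode contract, either the whole field lands or the buffer is untouched and the dropped counter increments, can be sketched as follows. Type and method names here are illustrative, not the crate's actual definitions.

```rust
/// Hypothetical sketch of a bounded buffer with transactional field writes.
struct BoundedBuffer {
    bytes: Vec<u8>,
    limit: usize,
    dropped: u32,
}

/// Error returned when a write would exceed the limit.
struct Overflow;

impl BoundedBuffer {
    fn push(&mut self, b: u8) -> Result<(), Overflow> {
        if self.bytes.len() >= self.limit {
            return Err(Overflow);
        }
        self.bytes.push(b);
        Ok(())
    }

    /// Atomic field-level write: on any error, roll back to the
    /// checkpoint and count the field as dropped.
    fn try_encode(
        &mut self,
        f: impl FnOnce(&mut Self) -> Result<(), Overflow>,
    ) -> bool {
        let checkpoint = self.bytes.len();
        match f(self) {
            Ok(()) => true,
            Err(Overflow) => {
                self.bytes.truncate(checkpoint); // discard partial bytes
                self.dropped += 1;
                false
            }
        }
    }
}
```

A field that overflows mid-write leaves no partial bytes behind, which is what keeps later appends well-formed.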

raw_error! is fully zero-allocation via __encode_event_impl! shared
with __log_record_impl!.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BorrowedLogRecord<'a> is a lightweight borrowed view of a log record
that carries &[u8] body/attrs bytes, a SavedCallsite, and a dropped
count. Both raw_error! (zero-alloc stack path) and LogRecord (owned
Bytes path) now format through the same ConsoleWriter::print_log()
and StyledBufWriter::format_log() methods via this shared type.

- LogRecord::as_view() -> BorrowedLogRecord<'_> for zero-copy access
- raw_error! constructs BorrowedLogRecord directly from stack buffer
- Removed print_raw_log() and format_raw_log() (replaced by print_log/format_log)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move the zero-alloc encode+format logic into ConsoleWriter::print_event(),
eliminating __encode_event_impl! and the _private::ProtoBufferInline
re-export. raw_error! now just defines its callsite and calls
print_event(&event) — same pattern as __log_record_impl! but without
the Bytes allocation.

Also removes LogRecord::from_encoded() which is no longer needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
__log_record_impl! now returns StackLogRecord — an encoded log record
that lives on the stack. Callers choose how to consume it:
- .into_record(context) for owned LogRecord with Bytes

This makes raw_error! a one-line change from the original: just
print_log(now, &record.as_view(), ...) instead of print_log_record().

Removes ConsoleWriter::print_event() and LogRecord::from_encoded()
which are no longer needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move LOG_RECORD_DROPPED_ATTRIBUTES_COUNT encoding from LogRecord
construction to encode_log_record(), where timestamp/severity are
encoded. The dropped count is retained as a LogRecord field until
the final OTLP encoding.

Also make new_bounded() generic on INLINE, with new() delegating
as new_bounded::<ENCODE_INLINE>().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace all proto_encode_len_delimited_unknown_size! macro invocations
with ProtoBufferInline::encode_len_delimited() method calls, which
auto-select the narrowest placeholder width based on remaining buffer
space.

Remove the _unknown_size, _small, and _large macro variants. Retain
proto_encode_len_delimited_of_size! only for tests needing explicit
width control.

Add test_encode_len_delimited_auto_width to verify width selection
at boundary values (127 vs 128 byte buffers).
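The width-selection idea can be sketched as follows: an n-byte protobuf varint placeholder can describe payloads up to 2^(7n) − 1 bytes, so when the remaining buffer space cannot hold a larger payload anyway, a narrower placeholder suffices. This is an illustration of the principle only; the exact thresholds and the 127-vs-128 boundary behavior belong to the real implementation.

```rust
/// Hypothetical sketch: pick the smallest placeholder width (in bytes)
/// whose varint range covers the largest payload that could still fit
/// in the remaining buffer space after the placeholder itself.
fn placeholder_width(remaining: usize) -> usize {
    for n in 1..=5usize {
        // Payload after an n-byte placeholder is at most remaining - n.
        let max_payload = remaining.saturating_sub(n);
        let varint_max = (1usize << (7 * n)) - 1; // n-byte varint maximum
        if max_payload <= varint_max {
            return n;
        }
    }
    5 // 5 bytes covers any u32-sized length
}
```

Small buffers get a 1-byte placeholder; only multi-kilobyte remainders pay for wider ones.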

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When encoding multiple attributes from a tracing event into a bounded
inline buffer, a single large value used to be able to consume all
remaining space, dropping every subsequent attribute entirely. Now each
attribute is allocated at most half of the currently-remaining buffer
space, so a long sequence of large values truncates progressively rather
than blowing out the buffer on the first one.

String attributes additionally use a new truncating encoder
(`encode_string_truncating`, `encode_len_delimited_partial`) so that
oversize values are clipped with the existing `[...]` suffix and still
appear (in truncated form) in the OTLP output, while still being counted
in `dropped_attributes_count`.

For a 256-byte inline buffer with 4 attributes whose values are 1000
bytes each, all four values are now preserved in truncated form with
lengths roughly [118, 55, 23, 7] bytes and `dropped_attributes_count` is
4.
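The progressive-halving budget above can be sketched as a simple loop. Per-field overheads (tags, key bytes, the [...] suffix) are simplified away here, which is why the exact lengths differ from the commit's [118, 55, 23, 7] figures; `budgets` is a hypothetical helper.

```rust
/// Hypothetical sketch: each attribute value may claim at most half of
/// the currently-remaining space, so no single large value can exhaust
/// the buffer and drop every subsequent attribute.
fn budgets(mut remaining: usize, value_lens: &[usize]) -> Vec<usize> {
    value_lens
        .iter()
        .map(|&len| {
            let grant = (remaining / 2).min(len); // half-remaining cap
            remaining -= grant;
            grant
        })
        .collect()
}
```

Four 1000-byte values against a 256-byte budget yield successively halved grants rather than one value taking everything, and a small value takes only what it needs.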

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…corruption

Two related fixes for visibility into truncated/dropped attributes when
log encoding overflows the inline ENCODE_INLINE buffer:

1. encoder.rs: wrap encode_body_string and encode_body_debug in
   try_encode. The previous code used encode_len_delimited directly,
   which on overflow leaves an unpatched length placeholder followed
   by partial body content bytes in the inline buffer. That malformed
   tail caused the parser to misread subsequent fields appended by
   encode_log_record (notably field 7 dropped_attributes_count), so
   downstream views always reported 0 drops. With try_encode, partial
   body bytes are rolled back cleanly on overflow, leaving the buffer
   well-formed for later appends.

2. formatter.rs: render dropped_attributes_count as a ' (N dropped)'
   suffix in write_body_and_attrs when it is non-zero, so operators
   can see when fair-budget halving caused truncation/drops.

Adds a tracing_init regression test that drives many oversized
attributes through the full ITS encode path (encode_export_logs_request)
and parses them back via RawLogsData, confirming the dropped count
round-trips through encode/parse.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds focused unit tests in pdata/otlp/common.rs for the transactional
and bounded encoding mechanisms introduced for self-tracing:

- try_encode_rolls_back_buffer_on_error: bytes are byte-identical
  before and after a failing closure.
- encode_len_delimited_does_not_patch_on_inner_err: documents the
  trap (unpatched placeholder + partial bytes) that the body-encoder
  fix in the previous commit had to work around, and shows the safe
  pattern of wrapping in try_encode.
- encode_len_delimited_partial_patches_length_on_err: partial-mode
  produces a valid LEN-prefixed wire field even when the inner
  closure returns Err, so subsequent fields parse correctly.
- encode_string_truncating_three_outcomes: Ok(false) full fit,
  Ok(true) truncated with [...] suffix, Err(Dropped) hard fail
  leaves buffer unchanged.
- encode_string_truncating_round_trips_via_prost: truncated bytes
  decode as a valid AnyValue ending in the suffix.
- truncate_at_utf8_boundary_does_not_panic: encoding 4-byte UTF-8
  characters near the truncation boundary is safe.
- with_max_remaining_{restores_outer_limit,nests_correctly,
  saturates_and_honors_outer}: scoped limit lifecycle, nested
  composition, and saturating math against the outer limit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the per-line stack buffer fills up mid-format, the underlying
Cursor write_all returns Err(WriteZero) and the terminating '\n' is
silently dropped (`let _ = self.write_all(b"\\n")`). The next log
line is appended directly after, producing garbled output like
'…RESOURCE[02026-04-27T15:48:52…│ SCOPE[2026-04-27T15:48:52…'.

Add StyledBufWriter::finish_line which always terminates the line:
if the buffer is already full, back up one byte and overwrite with
'\n' rather than losing the newline. format_line_impl now calls
finish_line for both log lines and RESOURCE/SCOPE header lines.

Includes a unit test covering the overflow, normal, and exactly-full
cases.
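The finish_line idea can be sketched on a tiny fixed buffer. `LineBuf` is illustrative, not the real StyledBufWriter: the point is only that an exactly-full buffer overwrites its last byte with '\n' instead of silently dropping the terminator.

```rust
/// Hypothetical fixed-size line buffer (8 bytes for illustration).
struct LineBuf {
    buf: [u8; 8],
    len: usize,
}

impl LineBuf {
    /// Write as much as fits; excess bytes are dropped (the real code
    /// surfaces this as a WriteZero error from Cursor::write_all).
    fn write(&mut self, data: &[u8]) {
        let n = data.len().min(self.buf.len() - self.len);
        self.buf[self.len..self.len + n].copy_from_slice(&data[..n]);
        self.len += n;
    }

    /// Always terminate the line, even when the buffer is full.
    fn finish_line(&mut self) {
        if self.len < self.buf.len() {
            self.buf[self.len] = b'\n';
            self.len += 1;
        } else {
            // Full buffer: back up one byte and overwrite with '\n'
            // rather than losing the newline entirely.
            self.buf[self.len - 1] = b'\n';
        }
    }
}
```

This is what keeps consecutive truncated lines from running together in the console output.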

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ckProtoBuffer

The self_tracing log encoding path used SmallVec<[u8; 256]> for inline
storage, which incurs a per-call inline-vs-spilled branch on every push
and caused a ~4.5x regression on the encode hot path.

Refactor splits storage into two concrete types sharing encoding logic
via a BoundedBuf trait with default methods:

- ProtoBuffer: Vec-backed, used for unbounded (or large-capacity) encodes.
  New with_capacity_and_limit() pre-allocates to the bound so the bounded
  log path performs at most one terminal alloc.
- StackProtoBuffer<const N: usize>: array-backed, fully stack-resident,
  zero-alloc; preserves the raw_error! zero-alloc guarantee.

All pdata encoders, the self_tracing encoder, and the OTLP exporters
are migrated from <const INLINE: usize> + ProtoBufferInline<INLINE> to
<B: BoundedBuf> generics. ProtoBufferInline is removed.

Bench results vs upstream/main (self_tracing realistic):
  new_record              163 -> 170 us  (+4%)
  encode_proto            177 -> 191 us  (+8%)
  encode_proto_with_scope 194 -> 230 us  (+19%)
  encode_and_format       799 -> 540 us  (-32%)
  format_with_entity      851 -> 548 us  (-36%)

Formatter benches now beat upstream because the bounded encode path
avoids upstream's allocate-then-truncate work at format time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Buf imports

After the BoundedBuf trait refactor, callers holding a concrete
ProtoBuffer needed to import the trait just to call len()/clear()/
is_empty(). Restore parity with the upstream public API by exposing
these as inherent methods that forward to the underlying Vec.

This also lets the self_tracing benchmark file compile unchanged on
both upstream/main and this branch (cross-branch perf comparison).

Removes now-unused 'use crate::otlp::common::BoundedBuf' from
decoder.rs, encode/mod.rs, two common.rs tests, and both OTLP
exporters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions bot added the labels rust (Pull requests that update Rust code), query-engine (Query Engine / Transform related tasks), and query-engine-columnar (Columnar query engine which uses DataFusion to process OTAP Batches) on Apr 30, 2026
@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.27%. Comparing base (524b78d) to head (c99f8e3).

❌ Your project check has failed because the head coverage (82.27%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2792       +/-   ##
===========================================
- Coverage   86.09%   82.27%    -3.82%     
===========================================
  Files         694      181      -513     
  Lines      263421    52878   -210543     
===========================================
- Hits       226789    43505   -183284     
+ Misses      36108     8849    -27259     
  Partials      524      524               
| Components | Coverage Δ |
| --- | --- |
| otap-dataflow | ∅ <ø> (∅) |
| query_abstraction | 80.61% <ø> (ø) |
| query_engine | 90.76% <ø> (ø) |
| otel-arrow-go | 52.45% <ø> (ø) |
| quiver | ∅ <ø> (∅) |

Development

Successfully merging this pull request may close these issues.

Improvements in OTLP encoding logic required for internal logging safety
