Internal logging safety: protocol buffer size limits #2792
Draft
jmacd wants to merge 26 commits into open-telemetry:main from
Conversation
Two features for issue open-telemetry#1746:

1. ProtoBuffer<const INLINE: usize = 0> is now generic over inline storage size. INLINE=0 (the default) behaves like Vec<u8>; INLINE>0 uses SmallVec to keep data on the stack during encoding, avoiding heap allocation in the safety-critical encoding phase.

2. DirectFieldVisitor now tracks dropped_count, the number of attribute fields dropped due to truncation. LogRecord exposes this as dropped_attributes_count. The new LogRecord::new_bounded() constructor reserves 3 bytes (tag + 2-byte varint) by reducing the buffer limit, guaranteeing space for the dropped_attributes_count protobuf field even when the buffer is full.

raw_error! is now fully zero-allocation: it encodes body/attributes into a stack-allocated ProtoBuffer<256> and formats directly from the raw bytes via ConsoleWriter::print_raw_log(), bypassing LogRecord and Bytes entirely. LogRecord::body_attrs_bytes remains Bytes for cheap reference-counted cloning through channels and retention. The encoding phase uses ProtoBuffer<256> on the stack; after encoding completes, the result is converted to Bytes with a correctly sized allocation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
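The 3-byte reservation in new_bounded() can be sketched as follows. This is an illustrative reconstruction, not the PR's exact API: the constant name and helper function are assumptions. One tag byte plus a 2-byte varint covers dropped counts up to 16383.

```rust
// Hypothetical sketch of the new_bounded() reservation; names assumed.
// 1 tag byte + up to a 2-byte varint for dropped_attributes_count.
const DROPPED_FIELD_RESERVE: usize = 3;

/// Effective encoding limit for a bounded record buffer: the trailing
/// dropped_attributes_count field can always be appended afterwards.
fn bounded_limit(requested: usize) -> usize {
    requested.saturating_sub(DROPPED_FIELD_RESERVE)
}

fn main() {
    // A 256-byte buffer leaves 253 bytes for body/attributes.
    assert_eq!(bounded_limit(256), 253);
    // Degenerate sizes saturate to zero instead of underflowing.
    assert_eq!(bounded_limit(2), 0);
    println!("bounded_limit(256) = {}", bounded_limit(256));
}
```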
Three features for issue open-telemetry#1746:

1. ProtoBufferInline<const INLINE: usize> is the new generic protobuf encoding buffer backed by SmallVec. ProtoBuffer is a type alias for ProtoBufferInline<0> (heap-only, matching the old behavior).

2. Replaced truncated: bool with dropped: u32, which counts how many fields were dropped due to the size limit. try_encode(f) provides atomic field-level writes: either the entire field lands, or the buffer is unchanged and the dropped counter increments by one.

3. DirectFieldVisitor uses try_encode() for concise, correct attribute encoding. No more manual checkpoint/rollback/count boilerplate.

raw_error! is fully zero-allocation via __encode_event_impl!, shared with __log_record_impl!.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
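The checkpoint/rollback semantics of try_encode can be sketched in a few lines. This is a minimal reconstruction under assumed names (the real buffer has encoding methods, not a raw push), showing the invariant: a failed field leaves the buffer byte-identical and bumps the dropped counter by exactly one.

```rust
// Minimal sketch of the transactional try_encode; names assumed.
#[derive(Debug, PartialEq)]
pub struct Dropped;

pub struct ProtoBuffer {
    data: Vec<u8>,
    limit: usize,
    pub dropped: u32,
}

impl ProtoBuffer {
    pub fn new(limit: usize) -> Self {
        Self { data: Vec::new(), limit, dropped: 0 }
    }

    /// Append bytes, failing if the size limit would be exceeded.
    pub fn push(&mut self, bytes: &[u8]) -> Result<(), Dropped> {
        if self.data.len() + bytes.len() > self.limit {
            return Err(Dropped);
        }
        self.data.extend_from_slice(bytes);
        Ok(())
    }

    /// Atomic field write: either the closure's whole output lands,
    /// or the buffer rolls back and `dropped` increments by one.
    pub fn try_encode(
        &mut self,
        f: impl FnOnce(&mut Self) -> Result<(), Dropped>,
    ) -> Result<(), Dropped> {
        let checkpoint = self.data.len();
        match f(self) {
            Ok(()) => Ok(()),
            Err(e) => {
                self.data.truncate(checkpoint); // discard partial bytes
                self.dropped += 1;
                Err(e)
            }
        }
    }

    pub fn len(&self) -> usize { self.data.len() }
}

fn main() {
    let mut buf = ProtoBuffer::new(4);
    // First field fits entirely.
    buf.try_encode(|b| b.push(&[1, 2])).unwrap();
    // Second field overflows mid-way: the whole field rolls back.
    let _ = buf.try_encode(|b| {
        b.push(&[3])?;      // lands provisionally...
        b.push(&[4, 5, 6])  // ...then overflows
    });
    assert_eq!(buf.len(), 2);
    assert_eq!(buf.dropped, 1);
    println!("len={} dropped={}", buf.len(), buf.dropped);
}
```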
BorrowedLogRecord<'a> is a lightweight borrowed view of a log record that carries &[u8] body/attrs bytes, a SavedCallsite, and a dropped count. Both raw_error! (zero-alloc stack path) and LogRecord (owned Bytes path) now format through the same ConsoleWriter::print_log() and StyledBufWriter::format_log() methods via this shared type.

- LogRecord::as_view() -> BorrowedLogRecord<'_> for zero-copy access
- raw_error! constructs BorrowedLogRecord directly from the stack buffer
- Removed print_raw_log() and format_raw_log() (replaced by print_log/format_log)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
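The shape of the shared view type might look like the sketch below. Field names beyond those stated in the commit message (and the SavedCallsite contents) are assumptions for illustration.

```rust
// Illustrative sketch of the borrowed view; details assumed.
struct SavedCallsite {
    target: &'static str,
}

/// Lightweight borrowed view over encoded log bytes: no ownership and
/// no Bytes refcount traffic, so it works equally well over a stack
/// buffer (raw_error!) or an owned LogRecord (as_view()).
struct BorrowedLogRecord<'a> {
    body_attrs_bytes: &'a [u8],
    callsite: &'a SavedCallsite,
    dropped_attributes_count: u32,
}

impl<'a> BorrowedLogRecord<'a> {
    fn len(&self) -> usize {
        self.body_attrs_bytes.len()
    }
}

fn main() {
    let callsite = SavedCallsite { target: "my_crate::module" };
    let stack_buf = [0x0A, 0x02, 0x68, 0x69]; // some encoded bytes
    let view = BorrowedLogRecord {
        body_attrs_bytes: &stack_buf,
        callsite: &callsite,
        dropped_attributes_count: 0,
    };
    assert_eq!(view.len(), 4);
    println!("view over {} bytes from {}", view.len(), view.callsite.target);
}
```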
Move the zero-alloc encode+format logic into ConsoleWriter::print_event(), eliminating __encode_event_impl! and the _private::ProtoBufferInline re-export. raw_error! now just defines its callsite and calls print_event(&event) — same pattern as __log_record_impl! but without the Bytes allocation. Also removes LogRecord::from_encoded() which is no longer needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
__log_record_impl! now returns StackLogRecord, an encoded log record that lives on the stack. Callers choose how to consume it:

- .into_record(context) for an owned LogRecord with Bytes

This makes raw_error! a one-line change from the original: just print_log(now, &record.as_view(), ...) instead of print_log_record(). Removes ConsoleWriter::print_event() and LogRecord::from_encoded(), which are no longer needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move LOG_RECORD_DROPPED_ATTRIBUTES_COUNT encoding from LogRecord construction to encode_log_record(), where timestamp/severity are encoded. The dropped count is retained as a LogRecord field until the final OTLP encoding. Also make new_bounded() generic on INLINE, with new() delegating as new_bounded::<ENCODE_INLINE>(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace all proto_encode_len_delimited_unknown_size! macro invocations with ProtoBufferInline::encode_len_delimited() method calls, which auto-select the narrowest placeholder width based on remaining buffer space. Remove the _unknown_size, _small, and _large macro variants. Retain proto_encode_len_delimited_of_size! only for tests needing explicit width control. Add test_encode_len_delimited_auto_width to verify width selection at boundary values (127 vs 128 byte buffers). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
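The width-selection rule can be sketched as below. The exact rule in the PR may differ; this reconstruction picks the narrowest varint wide enough to encode any length up to the remaining space, which reproduces the 127-vs-128 boundary the new test exercises.

```rust
// Sketch of auto-width placeholder selection; the rule is assumed.
fn placeholder_width(remaining: usize) -> usize {
    let mut width = 1;
    let mut max_encodable = 0x7F; // each varint byte carries 7 bits
    while remaining > max_encodable {
        width += 1;
        max_encodable = (max_encodable << 7) | 0x7F;
    }
    width
}

fn main() {
    assert_eq!(placeholder_width(127), 1); // fits a 1-byte varint
    assert_eq!(placeholder_width(128), 2); // needs a 2-byte varint
    assert_eq!(placeholder_width(16_383), 2);
    assert_eq!(placeholder_width(16_384), 3);
    println!(
        "width(127)={} width(128)={}",
        placeholder_width(127),
        placeholder_width(128)
    );
}
```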
When encoding multiple attributes from a tracing event into a bounded inline buffer, a single large value used to be able to consume all remaining space, dropping every subsequent attribute entirely. Now each attribute is allocated at most half of the currently remaining buffer space, so a long sequence of large values truncates progressively rather than blowing out the buffer on the first one.

String attributes additionally use a new truncating encoder (`encode_string_truncating`, `encode_len_delimited_partial`) so that oversize values are clipped with the existing `[...]` suffix and still appear (in truncated form) in the OTLP output, while still being counted in `dropped_attributes_count`.

For a 256-byte inline buffer with 4 attributes whose values are 1000 bytes each, all four values are now preserved in truncated form, with lengths of roughly [118, 55, 23, 7] bytes, and `dropped_attributes_count` is 4.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
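The fair-budget rule can be sketched with an idealized model. Per-attribute key/tag/length overhead is ignored here, which is why the PR's measured numbers ([118, 55, 23, 7]) come out somewhat below this pure halving sequence; the function name is an illustration, not the real API.

```rust
// Idealized sketch of the half-of-remaining budget rule.
fn truncated_lengths(mut remaining: usize, value_lens: &[usize]) -> Vec<usize> {
    value_lens
        .iter()
        .map(|&len| {
            let budget = remaining / 2; // at most half of what is left
            let used = len.min(budget);
            remaining -= used;
            used
        })
        .collect()
}

fn main() {
    // Four 1000-byte values into a 256-byte budget: progressive truncation
    // instead of the first value consuming everything.
    let lens = truncated_lengths(256, &[1000, 1000, 1000, 1000]);
    assert_eq!(lens, vec![128, 64, 32, 16]);
    // Small values take only what they need, leaving more for later ones.
    assert_eq!(truncated_lengths(256, &[10, 1000]), vec![10, 123]);
    println!("{lens:?}");
}
```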
…corruption

Two related fixes for visibility into truncated/dropped attributes when log encoding overflows the inline ENCODE_INLINE buffer:

1. encoder.rs: wrap encode_body_string and encode_body_debug in try_encode. The previous code used encode_len_delimited directly, which on overflow leaves an unpatched length placeholder followed by partial body content bytes in the inline buffer. That malformed tail caused the parser to misread subsequent fields appended by encode_log_record (notably field 7, dropped_attributes_count), so downstream views always reported 0 drops. With try_encode, partial body bytes are rolled back cleanly on overflow, leaving the buffer well-formed for later appends.

2. formatter.rs: render dropped_attributes_count as a ' (N dropped)' suffix in write_body_and_attrs when it is non-zero, so operators can see when fair-budget halving caused truncation/drops.

Adds a tracing_init regression test that drives many oversized attributes through the full ITS encode path (encode_export_logs_request) and parses them back via RawLogsData, confirming the dropped count round-trips through encode/parse.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds focused unit tests in pdata/otlp/common.rs for the transactional
and bounded encoding mechanisms introduced for self-tracing:
- try_encode_rolls_back_buffer_on_error: bytes are byte-identical
before and after a failing closure.
- encode_len_delimited_does_not_patch_on_inner_err: documents the
trap (unpatched placeholder + partial bytes) that the body-encoder
fix in the previous commit had to work around, and shows the safe
pattern of wrapping in try_encode.
- encode_len_delimited_partial_patches_length_on_err: partial-mode
produces a valid LEN-prefixed wire field even when the inner
closure returns Err, so subsequent fields parse correctly.
- encode_string_truncating_three_outcomes: Ok(false) full fit,
Ok(true) truncated with [...] suffix, Err(Dropped) hard fail
leaves buffer unchanged.
- encode_string_truncating_round_trips_via_prost: truncated bytes
decode as a valid AnyValue ending in the suffix.
- truncate_at_utf8_boundary_does_not_panic: encoding 4-byte UTF-8
characters near the truncation boundary is safe.
- with_max_remaining_{restores_outer_limit,nests_correctly,
saturates_and_honors_outer}: scoped limit lifecycle, nested
composition, and saturating math against the outer limit.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the per-line stack buffer fills up mid-format, the underlying Cursor write_all returns Err(WriteZero) and the terminating '\n' is silently dropped (`let _ = self.write_all(b"\n")`). The next log line is appended directly after, producing garbled output like '…RESOURCE[02026-04-27T15:48:52…│ SCOPE[2026-04-27T15:48:52…'.

Add StyledBufWriter::finish_line, which always terminates the line: if the buffer is already full, back up one byte and overwrite it with '\n' rather than losing the newline. format_line_impl now calls finish_line for both log lines and RESOURCE/SCOPE header lines.

Includes a unit test covering the overflow, normal, and exactly-full cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
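The always-terminate rule can be sketched as a free function over a fixed buffer. The real method lives on StyledBufWriter; this standalone shape is an assumption for illustration.

```rust
// Sketch of finish_line: the newline always survives, even when the
// fixed line buffer is exactly full.
fn finish_line(buf: &mut [u8], len: &mut usize) {
    if *len < buf.len() {
        // Normal case: room remains, append the newline.
        buf[*len] = b'\n';
        *len += 1;
    } else {
        // Buffer full: sacrifice the final content byte for the newline
        // so the next line cannot run into this one.
        *len = buf.len();
        buf[buf.len() - 1] = b'\n';
    }
}

fn main() {
    // Normal case: newline appended after existing content.
    let mut buf = *b"ab__";
    let mut len = 2;
    finish_line(&mut buf, &mut len);
    assert_eq!(&buf[..len], b"ab\n");

    // Exactly-full case: last byte overwritten, length unchanged.
    let mut buf = *b"abcd";
    let mut len = 4;
    finish_line(&mut buf, &mut len);
    assert_eq!(&buf[..len], b"abc\n");
    println!("ok");
}
```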
…ckProtoBuffer

The self_tracing log encoding path used SmallVec<[u8; 256]> for inline storage, which incurs a per-call inline-vs-spilled branch on every push and caused a ~4.5x regression on the encode hot path.

The refactor splits storage into two concrete types sharing encoding logic via a BoundedBuf trait with default methods:

- ProtoBuffer: Vec-backed, used for unbounded (or large-capacity) encodes. The new with_capacity_and_limit() pre-allocates to the bound, so the bounded log path performs at most one terminal alloc.
- StackProtoBuffer<const N: usize>: array-backed, fully stack-resident, zero-alloc; preserves the raw_error! zero-alloc guarantee.

All pdata encoders, the self_tracing encoder, and the OTLP exporters are migrated from <const INLINE: usize> + ProtoBufferInline<INLINE> to <B: BoundedBuf> generics. ProtoBufferInline is removed.

Bench results vs upstream/main (self_tracing realistic):

  new_record              163 -> 170 us  (+4%)
  encode_proto            177 -> 191 us  (+8%)
  encode_proto_with_scope 194 -> 230 us  (+19%)
  encode_and_format       799 -> 540 us  (-32%)
  format_with_entity      851 -> 548 us  (-36%)

Formatter benches now beat upstream because the bounded encode path avoids upstream's allocate-then-truncate work at format time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
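The two-storage split can be sketched as a trait with default encoding methods over a minimal push interface. The method names and the varint example are assumptions; the real trait carries the full protobuf encoding surface.

```rust
// Sketch of the BoundedBuf split; interface details assumed.
trait BoundedBuf {
    /// Append one byte; returns false on overflow.
    fn push_byte(&mut self, b: u8) -> bool;
    fn len(&self) -> usize;

    // Default method shared by both storages: encode a u64 varint.
    // Note: a failed write can leave partial bytes behind, which is
    // exactly why the transactional try_encode wrapper exists.
    fn encode_varint(&mut self, mut v: u64) -> bool {
        loop {
            let byte = (v & 0x7F) as u8;
            v >>= 7;
            let last = v == 0;
            if !self.push_byte(if last { byte } else { byte | 0x80 }) {
                return false;
            }
            if last {
                return true;
            }
        }
    }
}

/// Vec-backed buffer for unbounded or large-capacity encodes.
struct ProtoBuffer {
    data: Vec<u8>,
    limit: usize,
}

impl BoundedBuf for ProtoBuffer {
    fn push_byte(&mut self, b: u8) -> bool {
        if self.data.len() >= self.limit {
            return false;
        }
        self.data.push(b);
        true
    }
    fn len(&self) -> usize { self.data.len() }
}

/// Array-backed, fully stack-resident, zero-alloc buffer.
struct StackProtoBuffer<const N: usize> {
    data: [u8; N],
    len: usize,
}

impl<const N: usize> BoundedBuf for StackProtoBuffer<N> {
    fn push_byte(&mut self, b: u8) -> bool {
        if self.len >= N {
            return false;
        }
        self.data[self.len] = b;
        self.len += 1;
        true
    }
    fn len(&self) -> usize { self.len }
}

fn main() {
    let mut stack = StackProtoBuffer::<16> { data: [0; 16], len: 0 };
    assert!(stack.encode_varint(300));
    assert_eq!(&stack.data[..stack.len], &[0xAC, 0x02]);

    let mut heap = ProtoBuffer { data: Vec::new(), limit: 1 };
    // A bounded Vec-backed buffer rejects a varint needing two bytes.
    assert!(!heap.encode_varint(300));
    println!("stack holds {} bytes", stack.len());
}
```

Because both implementors pay only a plain length check per push, neither carries SmallVec's inline-vs-spilled branch on the hot path.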
…Buf imports

After the BoundedBuf trait refactor, callers holding a concrete ProtoBuffer needed to import the trait just to call len()/clear()/is_empty(). Restore parity with the upstream public API by exposing these as inherent methods that forward to the underlying Vec. This also lets the self_tracing benchmark file compile unchanged on both upstream/main and this branch (for cross-branch perf comparison).

Removes the now-unused 'use crate::otlp::common::BoundedBuf' from decoder.rs, encode/mod.rs, two common.rs tests, and both OTLP exporters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.
❌ The project check has failed because the head coverage (82.27%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files:

@@ Coverage Diff @@
## main #2792 +/- ##
===========================================
- Coverage 86.09% 82.27% -3.82%
===========================================
Files 694 181 -513
Lines 263421 52878 -210543
===========================================
- Hits 226789 43505 -183284
+ Misses 36108 8849 -27259
Partials 524 524
Change Summary

Makes protocol buffer generation limited in size. Uses a stack-allocated array (256 bytes, a fixed constant) for logged attributes.

Replaces try_into for OtapPayloads with try_into_with_options and try_into_with_default to apply default settings. The default size is large to avoid surprises (256MiB, the hard max implied by 4-byte placeholders). In a future PR we can tighten this and add configurable limits that apply by default anywhere OTLP protos are generated in a pipeline.

Implements dropped_attributes accounting for records in the internal logging pipeline.

Implements string truncation for string-valued fields, with [...] inserted if possible. Truncated fields are counted in dropped_attributes. This applies a measure of fairness to the fields of a log record: when a string value is inserted, it is allowed to use up to 50% of the remaining bytes or be truncated.

What issue does this PR close?
Fixes #1746
Part of #2725
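The 256MiB default ceiling mentioned in the summary follows directly from varint arithmetic; a quick check:

```rust
// A 4-byte protobuf varint placeholder carries 4 x 7 = 28 payload
// bits, so the largest length it can encode is 2^28 - 1 bytes,
// one byte short of 256 MiB.
fn main() {
    let max_len: u64 = (1u64 << 28) - 1;
    assert_eq!(max_len, 268_435_455);
    assert_eq!(max_len, 256 * 1024 * 1024 - 1);
    println!("max 4-byte-placeholder length = {max_len} bytes");
}
```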
How are these changes tested?
A new benchmark was added: a heavy self-tracing workload with 10 string attributes and a body, deliberately overflowing the 256B inline budget. The old code would re-allocate a Vec to log this event; the new code does not. The affected benchmarks are new_record, encode_proto, encode_proto_with_scope, encode_and_format, and format_with_entity.

There is some cost for protocol writers; this is the cost of safety. For console writers, this is a win because we use a stack-allocated buffer for the attributes encoding. Note that raw_error! is now allocation-free.

Are there any user-facing changes?
Yes. We begin to observe dropped log attributes instead of permitting very large log records in the internal logs pipeline.