feat: Adopt encoding extension streaming support for AWS Logs encoding extension #45804
Kavindu-Dodan wants to merge 2 commits into open-telemetry:main
Conversation
### Description
This PR introduces interfaces for end-to-end streaming support for
encoding extensions. This PR contains:
- New streaming contracts for encoding extensions
- Experimental streaming helpers abstracted into a dedicated module,
`pkg/xstreamencoding`
- A proof-of-concept adoption of streaming support in
`textencodingextension`
The streaming interface contract is as follows:
```go
// LogsDecoder unmarshals logs from a stream, returning one batch per DecodeLogs call.
type LogsDecoder interface {
// DecodeLogs is expected to be called iteratively to read all derived plog.Logs batches from the stream.
// The last batch of logs should be returned with a nil error. io.EOF error should follow on the subsequent call.
DecodeLogs() (plog.Logs, error)
// Offset returns the current offset read from the stream.
// The exact meaning of the offset may vary by decoder (e.g. bytes, lines, records).
// You may use this value with the WithOffset option to resume reading from the same offset when retrying after a failure.
Offset() int64
}
// LogsDecoderExtension is an extension that unmarshals logs from a stream.
type LogsDecoderExtension interface {
extension.Extension
NewLogsDecoder(reader io.Reader, options ...DecoderOption) (LogsDecoder, error)
}
// MetricsDecoder unmarshals metrics from a stream, returning one batch per DecodeMetrics call.
type MetricsDecoder interface {
// DecodeMetrics is expected to be called iteratively to read all derived pmetric.Metrics batches from the stream.
// The last batch of metrics should be returned with a nil error. io.EOF error should follow on the subsequent call.
DecodeMetrics() (pmetric.Metrics, error)
// Offset returns the current offset read from the stream.
// The exact meaning of the offset may vary by decoder (e.g. bytes, lines, records).
// You may use this value with the WithOffset option to resume reading from the same offset when retrying after a failure.
Offset() int64
}
// MetricsDecoderExtension is an extension that unmarshals metrics from a stream.
type MetricsDecoderExtension interface {
extension.Extension
NewMetricsDecoder(reader io.Reader, options ...DecoderOption) (MetricsDecoder, error)
}
```
### Why streaming?
The need for streaming arises when dealing with blobs. For example,
consider S3 objects as a signal source. These blobs can be large.
Without end-to-end streaming, an OTel Collector sourcing and
decoding signals from blobs requires high memory allocation.
With end-to-end streaming, the high memory requirement goes away,
and the Collector can process signals in batches and emit them to
downstream consumers.
```mermaid
flowchart TB
subgraph WITHOUT_STREAMING["Without Streaming "]
S3A["S3 Blob"]
OTelA["OTel Collector - Load entire blob into memory"]
ConsumerA["Downstream Consumer"]
S3A --> OTelA --> ConsumerA
end
subgraph WITH_STREAMING["With Streaming"]
S3B["S3 Blob"]
OTelB["OTel Collector - Stream & process batches"]
Batch1["Batch 1"]
Batch2["Batch 2"]
Batch3["Batch 3"]
ConsumerB["Downstream Consumer"]
S3B --> OTelB
OTelB --> Batch1 --> ConsumerB
OTelB --> Batch2 --> ConsumerB
OTelB --> Batch3 --> ConsumerB
end
```
### Proof of concept: OTel collector as a Lambda
Setup using combined changes of this PR, AWS logs adoption -
#45804
and AWS metrics adoption -
#45805
- **Setup**: A collector deployment in AWS Lambda with the [AWS Lambda
Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/awslambdareceiver)
- **Challenge**: AWS CloudTrail logs are stored as JSON. Internally
there is a record array that fits a streaming approach. Further, AWS
Lambda has defined memory limits; in the POC this is set to 512MB
- **Test input**: Predictable S3 blobs with CloudTrail logs from
[Data-gen](https://github.com/Kavindu-Dodan/data-gen)
I ran the test rounds with CloudTrail log files of 30MB each. This
translates to around ~30K trail records.
#### Without streaming
Without end-to-end streaming, Lambda's memory consumption peaks at
**360MB/512MB**.
<details>
<summary> Non streaming memory consumption</summary>
<img width="401" height="547" alt="image"
src="https://github.com/user-attachments/assets/9d647057-928f-4be2-b7b1-1d8bbd164541"
/>
</details>
#### With streaming - using components from this PR
With end-to-end streaming, maximum memory consumption was around
**189MB/512MB**. This leaves headroom for spikes, which can go up to
50MB per known CloudTrail S3 file limits.
<details>
<summary> Streaming memory consumption</summary>
<img width="401" height="547" alt="image"
src="https://github.com/user-attachments/assets/a6f3103b-0375-40c4-85bc-dbdf2f32ba19"
/>
</details>
#### POC conclusions
- Streaming consumed 48% less memory
- This comes at the cost of 14% more CPU runtime (the tradeoff)
The above shows why streaming is important and can be a good tradeoff
to save costs.
### Link to tracking issue
Fixes
#38780
#### Testing
Updated unit tests
### Documentation
Code level documentation is added with example usage pattern for
`textencodingextension`
___
If reviewing, I recommend checking out the branch to see the details
(`gh pr checkout 45567`)
---------
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>
Thanks a lot @Kavindu-Dodan for this. Your code is very consistent across all formats!
return xstreamencoding.NewLogsDecoderAdapter(decodeF, offsetF), nil
}

func (v *vpcFlowLogUnmarshaler) unmarshalPlainTextLogs(reader io.Reader) (plog.Logs, error) {
Is this dead code? I don't see it being used anymore
Thank you. Yes, this is no longer used by the interface API contract. However, it was used by the benchmark test. I migrated the benchmark test to the new implementation and removed this with commit e564f38
@Kavindu-Dodan it's going to take me a very long time to get through a PR this big. Can you please break it up into a PR per format?
I feel this can be overkill because there are 7 different formats.
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>
Conflicts:
- extension/encoding/awslogsencodingextension/go.mod
- extension/encoding/awslogsencodingextension/internal/unmarshaler/subscription-filter/unmarshaler.go
I am of the same opinion that having 7 PRs is tiresome. Also, we have two signals with metrics (PR #45805), which means a split by signal brings the total PR count to 9. Anyway, the core changes are only at … Also, if we have to go with split PRs, this will delay the AWS Lambda adoption (PR #46188)
The reality is I will not be able to review a ~3500 line diff any time soon, and definitely less soon than reviewing 7 targeted PRs.
### Description
Adopts the encoding extension stream decoding added through #45567.
All sub log types under the AWS Logs encoding extension now support streaming.
The table below summarizes streaming support details for each log type, along with the offset tracking mechanism.
#### Testing
Dedicated unit tests to validate streaming behaviour.
### Documentation
Code docs and usage patterns through tests