Description
The purpose and use-cases of the new component
AWS has various services and formats for logs, including CloudWatch, CloudTrail, CloudFront, VPC Flow Logs, and more. Some of these logs can be stored/delivered in multiple ways, e.g. through Amazon Data Firehose, Amazon Kinesis Data Streams, Lambda, and S3.
Currently, the awsfirehose receiver has built-in support for receiving CloudWatch Logs delivered through a log group subscription filter; see https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#FirehoseExample
The log subscription format is not specific to Firehose, so I would like to extract it into an encoding extension that can be developed independently of the Firehose receiver. For example, this would allow the same format to be decoded by the S3 receiver when a subscription filter sends to a Firehose stream that has an S3 destination.
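To make the format concrete, here is a minimal, language-neutral sketch in Python of decoding one subscription filter message. It is not the receiver's actual code. It assumes the payload is a single gzip-compressed JSON document with the fields AWS documents for subscription filters (`messageType`, `owner`, `logGroup`, `logStream`, `logEvents`). The function name, the `LogEvent` type, and the attribute mapping are hypothetical and chosen only for illustration.

```python
import gzip
import json
from dataclasses import dataclass


@dataclass
class LogEvent:
    id: str
    timestamp_ms: int  # milliseconds since the Unix epoch
    message: str


def decode_subscription_payload(payload: bytes) -> tuple[dict, list[LogEvent]]:
    """Decode one gzip-compressed CloudWatch Logs subscription filter message.

    Hypothetical helper: returns (resource attributes, log events).
    """
    doc = json.loads(gzip.decompress(payload))

    # CONTROL_MESSAGE records only check destination reachability
    # and carry no log data, so they are skipped here.
    if doc.get("messageType") != "DATA_MESSAGE":
        return {}, []

    # Assumed attribute mapping, for illustration only.
    resource = {
        "cloud.provider": "aws",
        "cloud.account.id": doc["owner"],
        "aws.log.group.names": [doc["logGroup"]],
        "aws.log.stream.names": [doc["logStream"]],
    }
    events = [
        LogEvent(e["id"], e["timestamp"], e["message"])
        for e in doc["logEvents"]
    ]
    return resource, events
```

Nothing in this logic depends on how the bytes arrived, which is the reason for moving it out of the Firehose receiver. A real decoder would also need to handle several messages batched into one record, which this sketch does not.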
Rather than creating an encoding extension dedicated to just the CloudWatch log group subscription filter format, I propose we have an extension for decoding AWS logs in general. It would be possible to configure the extension with the specific format, such as CloudWatch log group subscription filter, CloudTrail, etc. By having them co-located, we can avoid proliferation of extensions, and better ensure consistency across the formats (e.g. set cloud.* SemConv fields consistently).
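As a rough illustration of why co-location helps, the sketch below keeps one decoder per format in a single registry and routes every decoder through one shared helper for the cloud.* attributes. It is written in Python for brevity and is not a proposed implementation. The registry, the `cloudtrail` format name, and the record shape are assumptions, and the inputs are assumed to be already-decompressed JSON.

```python
import json
from typing import Callable

Record = dict
Decoder = Callable[[bytes], list[Record]]

# Hypothetical registry mapping the configured `format` value to a decoder.
_DECODERS: dict[str, Decoder] = {}


def register(fmt: str):
    def wrap(fn: Decoder) -> Decoder:
        _DECODERS[fmt] = fn
        return fn
    return wrap


def cloud_attributes(account_id: str, region: str = "") -> dict:
    """Single place where cloud.* attributes are set for every format."""
    attrs = {"cloud.provider": "aws", "cloud.account.id": account_id}
    if region:
        attrs["cloud.region"] = region
    return attrs


@register("cloudwatch_logs_subscription_filter")
def _cloudwatch(payload: bytes) -> list[Record]:
    doc = json.loads(payload)
    attrs = cloud_attributes(doc["owner"])
    return [{"body": e["message"], "attributes": attrs}
            for e in doc.get("logEvents", [])]


@register("cloudtrail")  # assumed format name
def _cloudtrail(payload: bytes) -> list[Record]:
    doc = json.loads(payload)
    return [{"body": r,
             "attributes": cloud_attributes(r["recipientAccountId"], r["awsRegion"])}
            for r in doc.get("Records", [])]


def decode(fmt: str, payload: bytes) -> list[Record]:
    """Dispatch on the configured format; unknown formats are rejected."""
    try:
        decoder = _DECODERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return decoder(payload)
```

Because both decoders call `cloud_attributes`, a change to how cloud.* fields are populated applies to every format at once.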
We would start by extracting the code at https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/awsfirehosereceiver/internal/unmarshaler/cwlog and then add more formats over time.
Example configuration for the component
extensions:
  awslogs_encoding/cloudwatch:
    format: "cloudwatch_logs_subscription_filter"
receivers:
  awsfirehose:
    endpoint: :1234
    encoding: awslogs_encoding/cloudwatch
Telemetry data types supported
logs
Code Owner(s)
Sponsor (optional)
Additional context
No response