Partition based on signal timestamp, not ingestion time #45691

@bencehornak

Description

Component(s)

exporter/awss3

Is your feature request related to a problem? Please describe.

Current behavior

Currently the awss3exporter uses the collector's clock to determine the S3 object's key, based on the s3uploader.s3_partition_format config (see now on line 88):

uploadInput := &s3.PutObjectInput{
    Bucket:       aws.String(overrideBucket),
    Key:          aws.String(sw.builder.Build(now, overridePrefix)), // now is the collector's wall-clock time
    Body:         content,
    StorageClass: sw.storageClass,
    ACL:          sw.acl,
}
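
For reference, the partition format itself comes from the collector configuration. A minimal illustrative config (bucket name, region, prefix, and format are placeholders; s3_partition_format takes strftime-style placeholders):

exporters:
  awss3:
    s3uploader:
      region: us-east-1
      s3_bucket: my-telemetry-bucket
      s3_prefix: logs
      s3_partition_format: '%Y/%m/%d'

Whatever the format, the time substituted into it is always the collector's wall clock at upload time.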

Why it's problematic for my use case

I have a use case where the difference between the ingestion time and the timestamps of the ingested signals can be huge, because the producer is a mobile app with disk buffering enabled. Devices sometimes go offline or are turned off for longer periods of time and then transmit their telemetry only a couple of days later.

Because there are no guarantees about ingestion time, my partitions are effectively scrambled: I have no way of knowing which objects contain the telemetry for a chosen time period, since it could sit in any partition stamped after the investigated interval. As a result, I cannot query my logs efficiently.

Describe the solution you'd like

I would expect the partition 2026/01/28/ to contain the data that was generated (not ingested) on 28 January.

Describe alternatives you've considered

Proposal 1: take the first time-stamp in the batch

This would be the least intrusive change, because the logs could keep being batched with batchperresourceattr the same way they are today:

Logs: batchperresourceattr.NewBatchPerResourceLogs(cfg.ResourceAttrsToS3.S3Prefix, logsExporter),

This approach would not be perfect: if I partition by day, for example, some batches might span midnight, and the records falling after midnight would end up in the wrong partition. However, the assumption is that timestamps within a batch are close to each other, so despite this imperfection the partitioning would work much better for my use case than the current behavior.
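
A minimal sketch of Proposal 1, assuming the collector's pdata plog API; firstTimestamp is a hypothetical helper, and falling back to the collector clock for batches without timestamps is only one possible choice:

package awss3exporter

import (
    "time"

    "go.opentelemetry.io/collector/pdata/plog"
)

// firstTimestamp returns the timestamp of the first log record in the batch
// that carries one, falling back to the collector's clock (the current
// behavior) when no record does.
func firstTimestamp(ld plog.Logs) time.Time {
    rls := ld.ResourceLogs()
    for i := 0; i < rls.Len(); i++ {
        sls := rls.At(i).ScopeLogs()
        for j := 0; j < sls.Len(); j++ {
            lrs := sls.At(j).LogRecords()
            for k := 0; k < lrs.Len(); k++ {
                if ts := lrs.At(k).Timestamp(); ts != 0 {
                    return ts.AsTime()
                }
            }
        }
    }
    return time.Now() // no timestamps in the batch: keep today's behavior
}

The exporter would then call sw.builder.Build(firstTimestamp(ld), overridePrefix) instead of passing now.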

Proposal 2: batch based on partitions

This would mean that, instead of relying on the batchperresourceattr package, the exporter would split each incoming batch by the partition derived from every record's own timestamp, so that each uploaded object contains only records belonging to its partition. This avoids the midnight-spanning imperfection of Proposal 1, at the cost of a more invasive change.
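
To illustrate, a rough sketch of such a partition-based split, assuming daily partitions; splitLogsByDay is a hypothetical helper and the key format is illustrative:

package awss3exporter

import (
    "go.opentelemetry.io/collector/pdata/plog"
)

// splitLogsByDay regroups a batch so that each resulting plog.Logs only
// contains records whose timestamps fall on the same UTC day. The map is
// keyed by the partition, e.g. "2026/01/28".
func splitLogsByDay(ld plog.Logs) map[string]plog.Logs {
    out := map[string]plog.Logs{}
    rls := ld.ResourceLogs()
    for i := 0; i < rls.Len(); i++ {
        rl := rls.At(i)
        sls := rl.ScopeLogs()
        for j := 0; j < sls.Len(); j++ {
            sl := sls.At(j)
            lrs := sl.LogRecords()
            for k := 0; k < lrs.Len(); k++ {
                lr := lrs.At(k)
                day := lr.Timestamp().AsTime().UTC().Format("2006/01/02")
                part, ok := out[day]
                if !ok {
                    part = plog.NewLogs()
                    out[day] = part
                }
                // Copy the record along with its resource and scope so each
                // per-partition batch stays self-describing. A real
                // implementation would deduplicate resources and scopes.
                newRL := part.ResourceLogs().AppendEmpty()
                rl.Resource().CopyTo(newRL.Resource())
                newSL := newRL.ScopeLogs().AppendEmpty()
                sl.Scope().CopyTo(newSL.Scope())
                lr.CopyTo(newSL.LogRecords().AppendEmpty())
            }
        }
    }
    return out
}

Each entry of the returned map would then be uploaded under its own partition key.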

Backward compatibility in both cases

Regardless of the approach chosen, I'd introduce a new config parameter to make the new behavior transparently configurable (e.g. s3uploader.partition_by with the values ingestion and batch_timestamp). IMO the new behavior (batch_timestamp) is the better default, but I'm interested in hearing different opinions.
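
With the proposed setting, a configuration could look like this (partition_by is hypothetical, as proposed above):

exporters:
  awss3:
    s3uploader:
      s3_bucket: my-telemetry-bucket
      s3_partition_format: '%Y/%m/%d'
      partition_by: batch_timestamp # or: ingestion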

Additional context

I am trying to access the logs with Amazon Athena, but this issue prevents me from writing efficient queries scoped to a chosen time interval.
