Description
Currently, the aggregation service only allows each 'shared ID' to be present in one query. A set of reports with the same shared ID cannot be split for separate queries, even if the resulting batches are disjoint.
One option to add more flexibility is to support an optional, custom field (a ‘label’) that is factored into the shared ID generation. We could consider a few different options:
- Putting the field in the shared_info: The reporting origin would be able to easily split reports into separate batches based on the label. However, this approach would require the label to be set outside the isolated (Shared Storage or Protected Audience) context. It would also require the report to be deterministic, as with the context ID, i.e. a null report would be sent even if no contributions are made. This approach is therefore unlikely to work for Protected Audience bidders (see related discussion) and could increase the number of reports sent.
- Putting the field in the payload: This avoids the deterministic report requirement and would allow the label to be based on cross-site data, i.e. set from inside the isolated contexts. However, because the payload is encrypted, this would also prevent the reporting origin from directly determining the label embedded in the report. The reporting origin may therefore have to send a larger number of reports to the aggregation service and ask it to filter based on a given set of labels. For certain use cases, the reporting origin may be able to maintain a mapping from context ID to label that would avoid this increased scale, albeit less ergonomically than with the shared_info option.
- Allowing bucket range filtering: Instead of using an explicit label, we could allow filtering based on a range of buckets, with budget only used for that range. This could be more flexible but would also increase the complexity of the Aggregation Service’s privacy budgeting implementation.
- A combination of the above: We could implement several of the above options and allow them to be used together or in different situations.
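To make the first option concrete, here is a minimal sketch of how an optional label could be factored into a shared ID so that otherwise identical reports fall into disjoint batches. It is illustrative only: the field names (`api`, `reporting_origin`, `scheduled_hour`, `label`), the SHA-256 hashing scheme, and both helper functions are assumptions made for this example, not the Aggregation Service's actual shared ID computation.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional


def shared_id(api: str, reporting_origin: str, scheduled_hour: int,
              label: Optional[int] = None) -> str:
    """Hypothetical shared ID: a hash over shared_info-like fields.

    The label is only mixed in when present, so unlabeled reports keep
    the ID they would have had without this feature.
    """
    parts = [api, reporting_origin, str(scheduled_hour)]
    if label is not None:
        parts.append(f"label={label}")
    # Join with a separator that cannot appear in the fields above.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def batch_by_shared_id(reports: List[dict]) -> Dict[str, List[dict]]:
    """Group reports by their (hypothetical) shared ID."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for report in reports:
        sid = shared_id(report["api"], report["reporting_origin"],
                        report["scheduled_hour"], report.get("label"))
        batches[sid].append(report)
    return dict(batches)
```

With a scheme like this, two reports that differ only in their label get different shared IDs, so each batch could be queried separately without the budget for one affecting the other.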
For all of the above approaches, we’ll also need a mechanism to limit the scale impact on the Privacy Budget Service. For example, we want to prevent developers from specifying a unique ‘label’ per report. There are a few options we could consider, including:
- The Aggregation Service could limit the number of labels/bucket ranges or shared IDs per query.
- We could limit the space of allowed labels/bucket ranges directly, e.g. only allowing integer labels up to a maximum value.
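As a rough illustration of how the two limits above might combine, the sketch below validates the set of labels requested in a single query. The constants `MAX_LABEL_VALUE` and `MAX_LABELS_PER_QUERY` and the function `validate_query_labels` are hypothetical names with placeholder values chosen for this example; no such limits have been decided.

```python
from typing import Iterable, List

MAX_LABEL_VALUE = 255        # hypothetical cap on the label space
MAX_LABELS_PER_QUERY = 16    # hypothetical cap on labels per query


def validate_query_labels(labels: Iterable[int]) -> List[int]:
    """Return the deduplicated, sorted labels or raise ValueError."""
    unique = set()
    for label in labels:
        # Reject non-integers (bool is a subclass of int, so exclude it).
        if isinstance(label, bool) or not isinstance(label, int):
            raise ValueError(f"label must be an integer: {label!r}")
        if not 0 <= label <= MAX_LABEL_VALUE:
            raise ValueError(f"label out of range: {label}")
        unique.add(label)
    if len(unique) > MAX_LABELS_PER_QUERY:
        raise ValueError("too many distinct labels in one query")
    return sorted(unique)
```

Bounding the label space caps the number of distinct shared IDs a developer can create, while the per-query cap bounds the work the Privacy Budget Service does for any single request.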
This functionality would also be useful for the Attribution Reporting API, so we may want to align on an approach. (For example, bucket range filtering has been proposed earlier.) Note that Attribution Reporting does not currently support making deterministic reports.