Skip to content

Conversation

@yuzawa-san
Copy link
Contributor

@yuzawa-san yuzawa-san commented Mar 21, 2025

fixes #44

add some reasonable limits to the ranges. i am open to picking different limits. typically these ranges are used for vendor lists which i believe to have fewer entries than the limits i chose.

4202 https://vendor-list.consensu.org/v2/vendor-list.json
1358 https://vendor-list.consensu.org/v2/ca/vendor-list.json

i chose a limit of 8192 which is roughly 2x the number of current extant vendor id's per https://tools.iabtechlab.com/transparencycenter/explorer/business/gpp

i also added a reasonable limit to the fibonacci decoder to block very long strings, which may generate large numbers or possibly overflow int32.

@jdwieland8282
Copy link

I have a couple clarifying questions before providing feedback.

  1. The assumption we are making is that the gvl will never exceed 8192 entries, and if it does this max needs to be updated?
  2. Is it possible for < 8192 vendors to overflow the int32 type?
  3. Is this a limit (pagination) on vendor-list.json and ca/vendor-list.json, or a limit on the size of the GPP string itself?

If we're going to cap something we should be certain it can't break in some other way.

@yuzawa-san
Copy link
Contributor Author

@jdwieland8282

  1. if significantly more vendors/entities are added then yes this value would need to be increased. while the original GH issue was about detecting malformed inputs, it is possible that organically the list could grow and due to inaction/neglect/poor timing this does not get patched in time. how about a compromise in which instead of throwing exceptions like i currently do, we simply bail from the encoder, print a warning, and leave the caller with whatever vendors have accumulated already. so if the consent string had 8190,8191,8192,8193 and the decoder would return 8190,8191 along with a log warning. this would at least mitigate the existing of a day in which all of the old vendors fail to be parsed.
  2. the risk of overflow was not related to the number of vendors and that limit, rather it is related to the fibonacci decoding. a very long string could overflow an int32.
  3. to the best of my knowledge those json files are not paginated

@jdwieland8282
Copy link

Got it thanks @yuzawa-san. I like your compromise.

In the future when vendor 8192 is added to the gvl, we'll log a warning and someone will have to open an issue to have the the limit config (assuming that exists) updated?

@yuzawa-san
Copy link
Contributor Author

that is correct. i have pushed the compromise solution.

@bmayd
Copy link

bmayd commented Apr 25, 2025

In the future when vendor 8192 is added to the gvl, we'll log a warning and someone will have to open an issue to have the the limit config (assuming that exists) updated?

Is there something we could do to give the folks maintaining the libraries a heads up when the gvl is getting near the limit, maybe add something that checked the gvl database periodically and started warning when we were getting near the limit? Logging warnings is a rather subtle way of communicating what could be mighty difficult for the owner of entry 8192 to track down.

@fanwu
Copy link

fanwu commented Apr 30, 2025

Confirmed in Slack that 8192 will give us 5+ years before IAB TCF GVL out grows from current 4000. Then it looks good to use 8192. Rowena Lam asked me to give my input so please consider this as approved. (Don't have permission to approve PRs). Thanks.

@yuzawa-san
Copy link
Contributor Author

@bmayd maybe a github action? looks like that MSPA list needs a token, so i dont have permission to add that

@fanwu please feel free to give your approval even though you are not a repo owner

Copy link

@scothren scothren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for your work on this!

I agree with the comment from @bmayd about having something to notify us if that external list is reaching our limit so we can be proactive about increasing it.

@aitnitishshelage aitnitishshelage merged commit 1439300 into IABTechLab:master Jun 9, 2025
iabmayank pushed a commit that referenced this pull request Jun 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

java.lang.OutOfMemoryError: Java heap space when decoding non GPP string

7 participants