fix: Correctly limit pushed dataset items in PPE-aware mode#570
fix: Correctly limit pushed dataset items in PPE-aware mode#570
Conversation
| { | ||
| pricingModel: 'PAY_PER_EVENT', | ||
| pricingPerEvent: { | ||
| actorChargeEvents: { |
There was a problem hiding this comment.
Let's do one more test where apify-default-dataset-item is in the pricing. The question is what SDK should do if user has that and does pushData(data, 'my-custom-event') and has both events defined. I think the correct is to charge both apify-default-dataset-item and my-custom-event
| unknown | ||
| >, | ||
| > extends DatasetClient<Data> { | ||
| private normalizeItems( |
There was a problem hiding this comment.
I will try to look for a way we could skip this extra JSON.parse but I don't have anything on my mind. I'm bit worried about potential perf implications here. But we can do it with a TODO later (and test first).
There was a problem hiding this comment.
Well worry no more, here's an actual performance problem 🙂 https://github.com/apify/apify-client-js/blob/master/src/resource_clients/dataset.ts#L285-L285 - if we pass an array of objects to the dataset client, it will run it through a much slower validation code path. I wonder if that's the reason we bother with the stringification.
I guess we can just re-stringify the items before calling super.pushItems though, right?
There was a problem hiding this comment.
Uh so that's another thing but it was there probably since inception.
I think the stringification in Crawlee storage is simply that we already have to do it anyway to check the item size to decide if/how we need to chunk it so it fits the 9MB API limit. And since we already have it stringified, we send it along because otherwise the HTTP client does that.
In ideal case, there is just a single JSON.stringify. With your PR, we now have stringify -> parse -> stringify. But maybe it would need deeper refactor.
There was a problem hiding this comment.
One ugly perf workaround would be to only do this parse/stringify in case of apify-default-dataset-item is present, otherwise we don't need to use the intercepted client. Then it will hit smaller portion of Actors. Just an option, not convinced it is worth it.
There was a problem hiding this comment.
I actually like that idea, let's go with that.
7f4e892 to
84c84db
Compare
Uh oh!
There was an error while loading. Please reload this page.