fix: Correctly limit pushed dataset items in PPE-aware mode #570
Changes from all commits
New file (`@@ -0,0 +1,14 @@`):

```js
import { Actor, log } from 'apify';

const actor = new Actor();

await actor.init();

const chargeResult = await actor.pushData(
    new Array(100).fill({ hello: 'world' }),
    'result',
);

log.info(`Charged: ${JSON.stringify(chargeResult)}`);

await actor.exit();
```
New file (`@@ -0,0 +1,70 @@`):

```js
import assert from 'node:assert/strict';
import test from 'node:test';

import { ApifyClient } from 'apify';
import { sleep } from 'crawlee';

const client = new ApifyClient({
    token: process.env.APIFY_TOKEN,
});

const actor = client.actor(process.argv[2]);
await actor.update({
    pricingInfos: [
        {
            pricingModel: 'PAY_PER_EVENT',
            pricingPerEvent: {
                actorChargeEvents: {
                    'apify-actor-start': {
                        eventTitle: 'Actor start',
                        eventPriceUsd: 1,
                        eventDescription:
                            'Charged automatically at the start of the run',
                    },
                    result: {
                        eventTitle: 'Result',
                        eventPriceUsd: 0.01,
                        eventDescription: 'Single dataset item',
                    },
                },
            },
        },
    ],
});

const runActor = async (input, options) => {
    const { id: runId } = await actor.call(input, options);
    await client.run(runId).waitForFinish();
    await sleep(6000); // wait for updates to propagate to MongoDB
    return await client.run(runId).get();
};

test('unlimited push', async () => {
    const run = await runActor({}, { maxTotalChargeUsd: 10 });

    assert.strictEqual(run.status, 'SUCCEEDED');
    assert.deepEqual(run.chargedEventCounts, {
        'apify-actor-start': 1,
        result: 100,
    });
});

test('charge limit works as intended', async () => {
    const run = await runActor({}, { maxTotalChargeUsd: 1.5 });

    assert.strictEqual(run.status, 'SUCCEEDED');
    assert.deepEqual(run.chargedEventCounts, {
        'apify-actor-start': 1,
        result: 50,
    });

    const { items } = await client
        .dataset(run.defaultDatasetId)
        .listItems({ limit: 100 });

    assert.strictEqual(
        items.length,
        50,
        `default items (${items.length}) != 50`,
    );
});
```

> **Member** (review comment on `actorChargeEvents`): Let's do one more test where
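The expected count of 50 in the charge-limit test follows from the pricing configured above. A quick standalone check (plain Node, working in tenths of a cent so the arithmetic stays exact; the constants mirror the test setup):

```javascript
// Budget math for the 'charge limit works as intended' test, expressed
// in tenths of a cent so floating-point rounding cannot skew the floor().
const budgetTenths = 1500;  // maxTotalChargeUsd: 1.5 -> $1.50
const startTenths = 1000;   // apify-actor-start: $1, charged once
const perItemTenths = 10;   // 'result' event: $0.01 per pushed item

// After the start event, $0.50 remains; each item costs $0.01.
const maxResults = Math.floor((budgetTenths - startTenths) / perItemTenths);
console.log(maxResults); // 50
```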
New file (`@@ -0,0 +1,14 @@`):

```js
import { Actor, log } from 'apify';

const actor = new Actor();

await actor.init();

const chargeResult = await actor.pushData(
    new Array(100).fill({ hello: 'world' }),
    'result',
);

log.info(`Charged: ${JSON.stringify(chargeResult)}`);

await actor.exit();
```
New file (`@@ -0,0 +1,82 @@`):

```js
import assert from 'node:assert/strict';
import test from 'node:test';

import { ApifyClient } from 'apify';
import { sleep } from 'crawlee';

const client = new ApifyClient({
    token: process.env.APIFY_TOKEN,
});

const actor = client.actor(process.argv[2]);
await actor.update({
    pricingInfos: [
        {
            pricingModel: 'PAY_PER_EVENT',
            pricingPerEvent: {
                actorChargeEvents: {
                    'apify-actor-start': {
                        eventTitle: 'Actor start',
                        eventPriceUsd: 1,
                        eventDescription:
                            'Charged automatically at the start of the run',
                    },
                    'apify-default-dataset-item': {
                        eventTitle: 'Dataset item',
                        eventPriceUsd: 0.005,
                        eventDescription: 'Default charge per dataset item',
                    },
                    result: {
                        eventTitle: 'Result',
                        eventPriceUsd: 0.01,
                        eventDescription: 'Custom charge per result',
                    },
                },
            },
        },
    ],
});

const runActor = async (input, options) => {
    const { id: runId } = await actor.call(input, options);
    await client.run(runId).waitForFinish();
    await sleep(9000); // wait for updates to propagate to MongoDB
    return await client.run(runId).get();
};

test('pushData charges both apify-default-dataset-item and custom event', async () => {
    const run = await runActor({}, { maxTotalChargeUsd: 10 });

    assert.strictEqual(run.status, 'SUCCEEDED');
    assert.deepEqual(run.chargedEventCounts, {
        'apify-actor-start': 1,
        'apify-default-dataset-item': 100,
        result: 100,
    });
});

test('charge limit applies to combined cost of both events', async () => {
    // Budget: $1.50 total
    // apify-actor-start: 1 × $1 = $1
    // Remaining: $0.50
    // Each pushData item costs $0.005 (default) + $0.01 (result) = $0.015
    // Items within budget: floor($0.50 / $0.015) = 33
    const run = await runActor({}, { maxTotalChargeUsd: 1.5 });

    assert.strictEqual(run.status, 'SUCCEEDED');
    assert.deepEqual(run.chargedEventCounts, {
        'apify-actor-start': 1,
        'apify-default-dataset-item': 33,
        result: 33,
    });

    const { items } = await client
        .dataset(run.defaultDatasetId)
        .listItems({ limit: 100 });

    assert.strictEqual(
        items.length,
        33,
        `default items (${items.length}) != 33`,
    );
});
```
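The same budget check for the combined-event test, where each pushed item triggers both the default and the custom event (again in tenths of a cent; the constants mirror the test's pricing):

```javascript
// Each item charges apify-default-dataset-item ($0.005) plus result ($0.01).
const budgetTenths = 1500;   // maxTotalChargeUsd: 1.5
const startTenths = 1000;    // apify-actor-start: $1, charged once
const defaultItemTenths = 5; // $0.005 per dataset item
const resultTenths = 10;     // $0.01 per result
const perItemTenths = defaultItemTenths + resultTenths; // $0.015 per item

// floor($0.50 / $0.015) = 33 items fit within the remaining budget
const maxItems = Math.floor((budgetTenths - startTenths) / perItemTenths);
console.log(maxItems); // 33
```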
---

I will try to look for a way we could skip this extra `JSON.parse`, but I don't have anything in mind. I'm a bit worried about the potential perf implications here. But we can do it with a TODO later (and test first).
---

Well, worry no more, here's an actual performance problem 🙂 https://github.com/apify/apify-client-js/blob/master/src/resource_clients/dataset.ts#L285-L285 - if we pass an array of objects to the dataset client, it will run it through a much slower validation code path. I wonder if that's the reason we bother with the stringification.

I guess we can just re-stringify the items before calling `super.pushItems`, though, right?
---

Uh, so that's another thing, but it has probably been there since inception.

I think the stringification in Crawlee storage exists simply because we already have to stringify the item anyway to check its size and decide if/how we need to chunk it so it fits the 9 MB API limit. And since we already have it stringified, we send it along, because otherwise the HTTP client would do that.

In the ideal case, there is just a single `JSON.stringify`. With your PR, we now have stringify -> parse -> stringify. But maybe that would need a deeper refactor.
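The "re-stringify before `super.pushItems`" idea can be demonstrated with a self-contained sketch. The class names and the hard-coded limit below are stand-ins, not the SDK's real types:

```javascript
// BaseDatasetClient stands in for the real apify-client DatasetClient,
// whose fast code path accepts an already-serialized JSON string.
class BaseDatasetClient {
    async pushItems(serialized) {
        // the real client would POST this string to the Apify API;
        // here we just report how many items were pushed
        return JSON.parse(serialized).length;
    }
}

// Intercepting client: Crawlee hands over a pre-stringified payload
// (it needs the string anyway to enforce the 9 MB API size limit), so
// trimming to the charge limit costs one extra parse + stringify.
class ChargingDatasetClient extends BaseDatasetClient {
    constructor(maxChargeableItems) {
        super();
        this.maxChargeableItems = maxChargeableItems;
    }

    async pushItems(serialized) {
        const items = JSON.parse(serialized);
        const limited = items.slice(0, this.maxChargeableItems);
        // re-stringify so the base client keeps its fast string path
        return super.pushItems(JSON.stringify(limited));
    }
}

const client = new ChargingDatasetClient(50);
client
    .pushItems(JSON.stringify(new Array(100).fill({ hello: 'world' })))
    .then((pushed) => console.log(pushed)); // prints 50
```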
---

One ugly perf workaround would be to only do this parse/stringify when `apify-default-dataset-item` is present; otherwise we don't need to use the intercepted client. Then it would hit a smaller portion of Actors. Just an option, not convinced it is worth it.
---

I actually like that idea, let's go with that.
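A sketch of the workaround settled on above: only swap in the intercepted client when the Actor's pricing actually defines the per-item event. The function and parameter names here are illustrative, not the PR's actual code:

```javascript
const DEFAULT_ITEM_EVENT = 'apify-default-dataset-item';

// Pick the dataset client: most Actors never price the default per-item
// event, so they keep the fast path with no extra JSON round trip.
function selectDatasetClient(pricingPerEvent, fastClient, interceptedClient) {
    const events = pricingPerEvent?.actorChargeEvents ?? {};
    return DEFAULT_ITEM_EVENT in events ? interceptedClient : fastClient;
}

const ppePricing = {
    actorChargeEvents: {
        'apify-actor-start': { eventPriceUsd: 1 },
        [DEFAULT_ITEM_EVENT]: { eventPriceUsd: 0.005 },
    },
};

console.log(selectDatasetClient(ppePricing, 'fast', 'intercepted'));
// -> 'intercepted'
console.log(selectDatasetClient({ actorChargeEvents: {} }, 'fast', 'intercepted'));
// -> 'fast'
```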