-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix iterate
and greedyIterate
methods
#16
base: dev/0.6.0
Are you sure you want to change the base?
Conversation
while (currentPage.items.length > 0) { | ||
totalItems += currentPage.items.length; | ||
for (const item of currentPage.items) { | ||
yield item; | ||
} | ||
|
||
offset += pageSize; | ||
currentPage = await this.superClient.listItems({ offset, limit: pageSize }); | ||
currentOffset += pageSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense to update currentOffset
with how many things has actually been returned to us, instead of how many items we asked for - if this is called while the run is still in progress we may see the following situation: pageSize === 10
, currentPage.items.length === 2
, but before the next iteration 10 more items has been added to the dataset, so we output another 2 items, not knowing we missed 8.
offset += pageSize; | ||
currentPage = await this.superClient.listItems({ offset, limit: pageSize }); | ||
currentOffset += pageSize; | ||
currentPage = await this.superClient.listItems({ offset: currentOffset, limit: pageSize }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it on purpose we don't forward listItemOptions
here?
let currentOffset = listItemOptions.offset ?? 0; | ||
let currentPage = await this.superClient.listItems({ | ||
...listItemOptions, | ||
offset: currentOffset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue for creating a const offset = () => totalItems + (listItemOptions.offset ?? 0)
such we have an offset which always reflects how many elements we have already seen.
while (runStatus && ['READY', 'RUNNING'].includes(runStatus)) { | ||
const datasetIterator = this.iterate({ ...iterateOptions, offset: currentOffset }); | ||
for await (const item of datasetIterator) { | ||
currentOffset++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this, and postponed adding my review till I had let it settle a bit. I ended up having to do pretty much the same when using the library to do "reasonable segmentation". I think this shows that our abstraction is leaky - whether there will actually be an await before we get to the next element. I think it would be better if we just returned AsyncGenerator<T[], void, void>
, only yielding if there actually were any elements. I think iterate
should have the same signature, even though it may often be used without a page size such it only actually yields once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would simplify both this code, and most (if not all) usages
I would argue that it doesn't really matter to us whether the run status is up to date - as long as it is not ahead of time (and as long as we can assume it at least has gone to the |
BREAKING CHANGES:
itemsThreshold
was removed.FIXES:
offset
is actually respected;greedyIterate
does not check anymore for the items count, but tries to download the remaining items at each iteration, removing the racing condition, due to the fact that the item count may be outdated.TODO:
greedyIterate
tests