Long intervals during resource iteration can lead to issues #141

Open
@hermit-crab

Description

Hello.

Recently there was issue #121, for which a batch-read workaround was implemented. I am now experiencing what I believe to be the same or a similar issue, but while using JSON instead of msgpack. Basically, when I do for item in job.items.iter(..., count=X, ...): and there are long intervals during iteration, the count can end up being ignored. I was able to reproduce it with the following snippet:

import time

from scrapinghub import ScrapinghubClient

sh_client = ScrapinghubClient(APIKEY, use_msgpack=False)
take = 10_000
job_id = '168012/276/1'
for i, item in enumerate(sh_client.get_job(job_id).items.iter(count=take, meta='_key')):
    print(f'\r{i} ({item["_key"]})', end='')

    if i == 3000:
        print('\nsleeping')
        time.sleep(60 * 3)

    if i >= take:  # count=take should have ended iteration at index take - 1
        print('\nWTF')
        break

With the sleep part removed, the WTF section does not fire and the iterator stops at item 168012/276/1/9999.

This seems to be more of a ScrapyCloud API platform problem, but I am reporting it here for tracking nonetheless.

For now I am assuming that resource/collection iteration is not robust if any client-side delays are possible during retrieval (I haven't tested any other potential issues), so as a habit I will either preload everything at once (.list()) or use .list_iter() where it makes sense, as sketched below.
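
For reference, a minimal sketch of both habits. process() is a hypothetical stand-in for the per-item work; the rest uses the public python-scrapinghub API, with list_iter() being the batch-read workaround from #121:

from itertools import islice

from scrapinghub import ScrapinghubClient

sh_client = ScrapinghubClient(APIKEY, use_msgpack=False)
job = sh_client.get_job('168012/276/1')

# Habit 1: list_iter() downloads each chunk in full before yielding it,
# so slow per-item processing pauses between buffered chunks instead of
# in the middle of an open server response.
for chunk in job.items.list_iter(chunksize=1000):
    for item in chunk:
        process(item)

# Habit 2 (defensive): cap the iterator client-side with islice() so a
# response that ignores count can never overshoot the intended limit.
take = 10_000
for item in islice(job.items.iter(count=take, meta='_key'), take):
    process(item)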
