Fix broken companies/scroll pagination #69151
The companies/scroll API actually provides the same token in every page, so the previous mechanism of stopping pagination when the same token was observed was always limiting results to 2 pages (200 records). Instead, rely on receiving an empty `data` array, which is what occurs when the previous page was the last one. Doing that actually exposed a new issue. Every page now is requested with exactly the same URL. It appears that the HttpRequester instance is created with use_cache=True, so starting on page 3, every response is returned from cache, with always the same records from page 2, which creates an infinite loop due to never receiving an empty array. To fix this, this commit creates a NoCacheRequester that forces use_cache=False for this particular endpoint.
Hi team Airbyte,
Please have a look at this fix for the currently broken companies/scroll pagination. The fix is tested end to end on a local installation, and also by running the source in isolation with Docker.
My guess is that the infinite-pagination behavior addressed by turning off the HttpRequester cache is related to #58638.
I'm not sure whether other Intercom APIs behave like companies/scroll. This is the one we noticed as obviously broken in our Cloud production usage of Airbyte.
Happy to help get this across the line as soon as possible, since it's blocking one of our product features.
What
The companies sync currently returns at most 200 results because pagination is broken. The previous code assumed that each page would return a different token, but the API actually returns the same token on every page.
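To make the failure mode concrete, here is a minimal sketch of a stop condition that ends pagination when a token repeats. It is not the connector's actual code: `fake_scroll`, `sync_with_token_check`, the `scroll_param` field name, and the page size of 100 (inferred from "2 pages, 200 records") are illustrative assumptions.

```python
from typing import Dict, List, Optional

PAGE_SIZE = 100  # assumed page size, inferred from "2 pages (200 records)"


def fake_scroll(records: List[int], cursor: int) -> Dict:
    """Hypothetical stand-in for companies/scroll: the token never changes."""
    return {
        "data": records[cursor:cursor + PAGE_SIZE],
        "scroll_param": "fixed-token",
    }


def sync_with_token_check(records: List[int]) -> List[int]:
    """Stop as soon as the token matches the previous one (the broken assumption)."""
    collected: List[int] = []
    cursor = 0
    last_token: Optional[str] = None
    while True:
        page = fake_scroll(records, cursor)
        collected.extend(page["data"])
        cursor += len(page["data"])
        if page["scroll_param"] == last_token:
            break  # triggers on page 2, because the token is always the same
        last_token = page["scroll_param"]
    return collected


if __name__ == "__main__":
    synced = sync_with_token_check(list(range(450)))
    print(len(synced))  # 200, regardless of how many records exist beyond that
```

Under these assumptions, any dataset larger than two pages is silently truncated, which matches the 200-result limit described above.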
How
The companies/scroll API actually provides the same token in every page, so the previous mechanism of stopping pagination when the same token was observed was always limiting results to 2 pages (200 records). Instead, rely on receiving an empty `data` array, which is what occurs when the previous page was the last one.
Doing that actually exposed a new issue. Every page now is requested with exactly the same URL. It appears that the HttpRequester instance is created with use_cache=True, so starting on page 3, every response is returned from cache, with always the same records from page 2, which creates an infinite loop due to never receiving an empty array. To fix this, this commit creates a NoCacheRequester that forces use_cache=False for this particular endpoint.
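The interaction between the empty-`data` stop condition and response caching can be illustrated with a small simulation. This is only a sketch under stated assumptions: `FakeScrollServer`, `sync_until_empty`, the URL strings, and the dictionary-based cache are hypothetical and do not reflect how Airbyte's HttpRequester or the NoCacheRequester in this PR are implemented.

```python
from typing import Dict, List, Tuple


class FakeScrollServer:
    """Hypothetical server: keeps scroll position itself and always returns the same token."""

    def __init__(self, total: int, page_size: int = 100) -> None:
        self.records = list(range(total))
        self.page_size = page_size
        self.cursor = 0

    def get(self, url: str) -> Dict:
        page = self.records[self.cursor:self.cursor + self.page_size]
        self.cursor += len(page)
        return {"data": page, "scroll_param": "fixed-token"}


def sync_until_empty(
    server: FakeScrollServer, use_cache: bool, max_requests: int = 50
) -> Tuple[List[int], bool]:
    """Paginate until an empty `data` array arrives; return (records, finished)."""
    cache: Dict[str, Dict] = {}  # naive URL-keyed response cache (assumption)
    collected: List[int] = []
    url = "/companies/scroll"
    for _ in range(max_requests):  # safety bound so the sketch always terminates
        if use_cache and url in cache:
            page = cache[url]  # from page 3 on, this replays page 2
        else:
            page = server.get(url)
            if use_cache:
                cache[url] = page
        if not page["data"]:
            return collected, True  # empty page: the previous page was the last
        collected.extend(page["data"])
        # Same token on every page, so the URL is identical from page 2 onward.
        url = "/companies/scroll?scroll_param=" + page["scroll_param"]
    return collected, False  # never saw an empty page


if __name__ == "__main__":
    records, finished = sync_until_empty(FakeScrollServer(450), use_cache=False)
    print(len(records), finished)  # 450 True

    records, finished = sync_until_empty(FakeScrollServer(450), use_cache=True)
    print(len(set(records)), finished)  # 200 False
```

With caching enabled, the loop in this sketch only ends because of the `max_requests` bound, and it never sees records beyond page 2. Disabling the cache for this one endpoint lets the empty-array condition be reached.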
I tested these changes running Airbyte locally (using `abctl`), plus executing the source in isolation via a Docker command.
Review guide
The description above seems sufficient to understand the problem and the solution. The Intercom docs describe the behavior of the scroll parameter well too.
User Impact
This fixes the companies sync, which is otherwise broken and limited to 200 results.
Can this PR be safely reverted and rolled back?