
@iacobus

@iacobus iacobus commented Nov 3, 2025

What

The companies sync currently returns at most 200 records because pagination is broken. The existing code assumes that each page returns a different scroll token, but the API does the opposite: it returns the same token on every page.

How

The companies/scroll API returns the same scroll token on every page, so the previous stop condition (stop as soon as a repeated token is seen) always ended pagination after 2 pages (200 records). This change instead stops on an empty `data` array, which is what the API returns once the previous page was the last one.
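A minimal, self-contained sketch of the two stop conditions, run against a hypothetical in-memory stand-in for companies/scroll. `FakeCompaniesScrollAPI` is not a real client; it only mimics the behavior described above (the same `scroll_param` on every page, an empty `data` array at the end). With 450 records and 100-record pages, the token-based check stops after 200 records, while the empty-`data` check collects all 450.

```python
from typing import Any, Dict, List, Optional


class FakeCompaniesScrollAPI:
    """Hypothetical stand-in for the companies/scroll endpoint (not a real client)."""

    def __init__(self, total_records: int, page_size: int = 100) -> None:
        self.records = [{"id": i} for i in range(total_records)]
        self.page_size = page_size
        self.offset = 0  # server-side position within the scroll session

    def get_page(self, scroll_param: Optional[str]) -> Dict[str, Any]:
        if scroll_param is None:
            self.offset = 0  # a request without a token starts a new scroll
        batch = self.records[self.offset : self.offset + self.page_size]
        self.offset += len(batch)
        # The token is identical on every page, as described in this PR.
        return {"data": batch, "scroll_param": "same-token"}


def sync_stop_on_repeated_token(api: FakeCompaniesScrollAPI) -> List[Dict[str, Any]]:
    """Old behavior: stop when the token matches the previous page's token."""
    records: List[Dict[str, Any]] = []
    previous_token: Optional[str] = None
    token: Optional[str] = None
    while True:
        page = api.get_page(token)
        records.extend(page["data"])
        token = page["scroll_param"]
        if token == previous_token:  # true already on page 2
            break
        previous_token = token
    return records


def sync_stop_on_empty_data(api: FakeCompaniesScrollAPI) -> List[Dict[str, Any]]:
    """New behavior: keep scrolling until a page comes back with no data."""
    records: List[Dict[str, Any]] = []
    token: Optional[str] = None
    while True:
        page = api.get_page(token)
        if not page["data"]:  # empty page: the previous page was the last one
            break
        records.extend(page["data"])
        token = page["scroll_param"]
    return records


print(len(sync_stop_on_repeated_token(FakeCompaniesScrollAPI(450))))  # 200
print(len(sync_stop_on_empty_data(FakeCompaniesScrollAPI(450))))  # 450
```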

This exposed a second issue: every page after the first is now requested with exactly the same URL. The HttpRequester instance appears to be created with `use_cache=True`, so from page 3 onward every response is served from the cache and repeats the records from page 2. Because the empty array is never received, pagination loops forever. To fix this, this commit adds a NoCacheRequester that forces `use_cache=False` for this endpoint only.
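The caching problem can be reproduced in a few lines. This is a toy model, not the Airbyte CDK: `Requester`, `FakeScrollServer`, and the placeholder URL are hypothetical, and the cache is assumed to be keyed on the full request URL. Because every page after the first shares one URL, a cached requester keeps returning page 2 and never sees the empty `data` array; the `max_pages` guard exists only so the demo terminates.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.example.test/companies/scroll"  # placeholder, not Intercom's URL


class FakeScrollServer:
    """Hypothetical server: same scroll_param on every page, empty data at the end."""

    def __init__(self, total_records: int, page_size: int = 100) -> None:
        self.records = [{"id": i} for i in range(total_records)]
        self.page_size = page_size
        self.offset = 0

    def handle(self, url: str) -> Dict[str, Any]:
        if "scroll_param=" not in url:
            self.offset = 0  # no token: start a new scroll
        batch = self.records[self.offset : self.offset + self.page_size]
        self.offset += len(batch)
        return {"data": batch, "scroll_param": "same-token"}


class Requester:
    """Toy requester with an optional URL-keyed response cache (illustrative only)."""

    def __init__(self, server: FakeScrollServer, use_cache: bool) -> None:
        self.server = server
        self.use_cache = use_cache
        self.cache: Dict[str, Dict[str, Any]] = {}

    def get(self, url: str) -> Dict[str, Any]:
        if self.use_cache and url in self.cache:
            return self.cache[url]  # served without contacting the server
        response = self.server.handle(url)
        if self.use_cache:
            self.cache[url] = response
        return response


def sync(requester: Requester, max_pages: int = 50) -> Optional[List[Dict[str, Any]]]:
    """Paginate until an empty `data` array; return None if max_pages is reached."""
    records: List[Dict[str, Any]] = []
    token: Optional[str] = None
    for _ in range(max_pages):
        if token is None:
            url = BASE_URL
        else:
            url = f"{BASE_URL}?{urlencode({'scroll_param': token})}"
        page = requester.get(url)
        if not page["data"]:
            return records
        records.extend(page["data"])
        token = page["scroll_param"]
    return None  # never saw an empty page: without the guard this loops forever


print(sync(Requester(FakeScrollServer(450), use_cache=True)))  # None
print(len(sync(Requester(FakeScrollServer(450), use_cache=False))))  # 450
```

With `use_cache=True` the sync hits the guard and returns `None`; with `use_cache=False`, which is what NoCacheRequester forces for this endpoint, it returns all 450 records.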

I tested these changes by running Airbyte locally (using abctl) and by running the source in isolation with a Docker command.

Review guide

The description above should be enough to understand the problem and the solution. The Intercom API docs also describe the behavior of the scroll parameter well.

User Impact

This fixes the companies sync, which is currently broken and limited to 200 results.

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@github-actions
Contributor

github-actions bot commented Nov 3, 2025

👋 Welcome to Airbyte!

Thank you for your contribution from iacobus/airbyte! We're excited to have you in the Airbyte community.

Helpful Resources

PR Slash Commands

As needed or by request, Airbyte Maintainers can execute the following slash commands on your PR:

  • /format-fix - Fixes most formatting issues.
  • /bump-version - Bumps connector versions.
  • /run-connector-tests - Runs connector tests.
  • /run-cat-tests - Runs CAT tests.
  • /build-connector-images - Builds and publishes a pre-release docker image for the modified connector(s).

If you have any questions, feel free to ask in the PR comments or join our Slack community.

Tips for Working with CI

  1. Pre-Release Checks. Please pay attention to these, as they contain standard checks on the metadata.yaml file, docs requirements, etc. If you need help resolving a pre-release check, please ask a maintainer.
    • Note: If you are creating a new connector, please be sure to replace the default logo.svg file with a suitable icon.
  2. Connector CI Tests. Some failures here may be expected if your tests require credentials. Please review these results to ensure (1) unit tests are passing, if applicable, and (2) integration tests pass to the degree possible and expected.
  3. (Optional.) BYO Connector Credentials for tests in your fork. You can optionally set up your fork with BYO credentials for your connector. This can significantly speed up your review, ensuring your changes are fully tested before the maintainers begin their review.


@iacobus
Author

iacobus commented Nov 3, 2025

Hi team Airbyte,

Please take a look at this fix for the currently broken companies sync in source-intercom. I'm honestly not sure what's going on with the integration tests, nor do I feel confident tweaking versions.

The fix has been tested end to end on a local installation, and by running the source in isolation with Docker.

My guess is that the infinite-pagination behavior fixed here by turning off the HttpRequester cache is related to #58638.

I'm not sure whether other Intercom APIs behave like companies/scroll. This is the one we noticed was obviously broken in our production use of Airbyte Cloud.

I'm happy to help get this across the line as soon as possible, since it's blocking one of our product features.
