Improve pagination handling in Confluence API client #3321

WildDogOne · 2025-03-22T20:32:46Z

Closes #3320

This change improves pagination handling when the /rest/content/search API is used, since that API does not return a next link.
I would be happy if someone could check whether my logic is acceptable.
There is also a trade-off right now: the script itself does not appear to track which data has already been processed, so every incremental sync pulls all data again, although it then only processes the new files.
This means that if someone sets an extremely general filter, this could result in a huge number of API calls.
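
A minimal sketch of the offset-based approach described above, for a search response that carries no next link. It assumes the response exposes a results list and that a page shorter than the requested limit is the last one; fetch_page is a hypothetical stand-in for the HTTP call, not the connector's actual client.

```python
from typing import Callable, Dict, Iterator, List


def paginate_by_offset(
    fetch_page: Callable[[int, int], Dict],  # hypothetical: (start, limit) -> parsed JSON
    limit: int = 50,
) -> Iterator[Dict]:
    """Yield results page by page until a page comes back short or empty."""
    start = 0
    while True:
        page = fetch_page(start, limit)
        results: List[Dict] = page.get("results", [])
        yield from results
        # Assumption: fewer than `limit` results means there is nothing left.
        if len(results) < limit:
            break
        start += len(results)


# Fake backend with 5 documents, used only for illustration.
DOCS = [{"id": i} for i in range(5)]


def fake_fetch(start: int, limit: int) -> Dict:
    return {"results": DOCS[start : start + limit]}


ids = [doc["id"] for doc in paginate_by_offset(fake_fetch, limit=2)]
```

With a very general filter, the number of calls grows as the total result count divided by limit, which is the cost mentioned above.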

Checklists

Pre-Review Checklist

seanstory · 2025-03-28T19:24:46Z

buildkite test this

seanstory

Thanks for submitting these! Would you be willing to add some unit tests as well, to exercise your changes and demonstrate the situations where you hit them?

seanstory · 2025-03-28T19:29:48Z

connectors/sources/confluence.py

            except Exception as exception:
                self._logger.warning(
-                    f"Skipping data for type {url_name} from {url}. Exception: {exception}."
+                    f"Skipping data for type {url_name} from {base_url}. Exception: {exception}."


I think we'd actually still want to log out url, since it would let us know which "page" of results had an issue.

The only reason I switched this to base_url is that url may not exist if the try step fails (or at least I think that could happen).

Ah! Ok, hadn't caught this, but I think url needs to be defined outside of the loop to start. That way we can reference it here, and we also can be sure that it's not re-set on each iteration (see my most recent comment)
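
The point about defining url before the loop can be sketched as follows. This is a simplified illustration, not the connector's code: fetch_all, fetch, and warn are hypothetical names, and the response shape (a results list plus an optional absolute next URL) is an assumption.

```python
from typing import Callable, Dict, List


def fetch_all(
    base_url: str,
    fetch: Callable[[str], Dict],   # hypothetical HTTP call returning parsed JSON
    warn: Callable[[str], None],    # hypothetical logger.warning stand-in
) -> List[Dict]:
    # Bound before the loop, so the except handler can always reference it.
    url = f"{base_url}&start=0"
    items: List[Dict] = []
    while True:
        try:
            page = fetch(url)
        except Exception as exception:
            # Logging url (not base_url) shows which page of results failed.
            warn(f"Skipping data from {url}. Exception: {exception}.")
            break
        items.extend(page["results"])
        next_url = page.get("next")
        if not next_url:
            break
        url = next_url
    return items


warnings: List[str] = []


def flaky_fetch(url: str) -> Dict:
    # Fails on the second page to show what ends up in the warning.
    if url.endswith("start=2"):
        raise RuntimeError("boom")
    return {
        "results": [{"id": 0}, {"id": 1}],
        "next": "https://example.invalid/search?cql=x&start=2",
    }


items = fetch_all("https://example.invalid/search?cql=x", flaky_fetch, warnings.append)
```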

seanstory · 2025-03-28T19:39:32Z

Also, the linter is failing. You can autoformat your code with make autoformat

artem-shelkovnikov · 2025-03-31T08:44:57Z

@WildDogOne when is the /rest/content/search API used? What's the best way to test your change against a real Confluence?

WildDogOne · 2025-03-31T14:13:38Z

@artem-shelkovnikov It is used in the connector when you use an advanced filter in sync rules. For example:

[
  {
    "query": "type in ('page', 'attachment') AND Space=BLABLA AND (created >= now('-1y') OR lastModified >= now('-1y'))"
  }
]

This then needs to use /search because a CQL query has to be run on ingest, which makes more sense than using the post filter.
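
As a rough illustration of how such a rule could turn into a search request, the sketch below URL-encodes the CQL string into a query parameter. The host, the build_search_url helper, and the cql and limit parameter names are assumptions for illustration; the path is the one named in this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(host: str, cql: str, limit: int = 50) -> str:
    """Build a base search URL; a start offset can be appended per page."""
    # urlencode escapes spaces, quotes, and parentheses in the CQL string.
    query = urlencode({"cql": cql, "limit": limit})
    return f"{host}/rest/content/search?{query}"


rule = {"query": "type in ('page', 'attachment') AND Space=BLABLA"}
base_url = build_search_url("https://confluence.example.invalid", rule["query"])

# Decoding the URL again should give back the original CQL string.
round_trip = parse_qs(urlsplit(base_url).query)["cql"][0]
```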

seanstory · 2025-03-31T14:31:01Z

buildkite test this

seanstory · 2025-03-31T20:16:23Z

connectors/sources/confluence.py

        while True:
            try:
+                url = f"{base_url}&start={start}"


This line probably needs to be outside of the while True, otherwise it'll overwrite the last iteration's attempts to set the URL to the "next" link, if that strategy was used.
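
A minimal sketch of the loop shape suggested here, with url set once before while True and only updated at the end of each iteration. The crawl function, the fetch callable, and the page layout are hypothetical, next links are treated as absolute URLs for simplicity, and an endpoint is assumed to use one strategy consistently.

```python
from typing import Callable, Dict, List


def crawl(base_url: str, fetch: Callable[[str], Dict], limit: int) -> List[str]:
    """Return the URLs visited, following a next link or falling back to offsets."""
    start = 0
    url = f"{base_url}&start={start}"  # set once, outside the loop
    visited: List[str] = []
    while True:
        visited.append(url)
        page = fetch(url)
        next_link = page.get("_links", {}).get("next")
        if next_link:
            # This assignment survives into the next iteration because
            # nothing at the top of the loop rebuilds url from start.
            url = next_link
        elif len(page.get("results", [])) == limit:
            # No next link but a full page: advance the offset instead.
            start += limit
            url = f"{base_url}&start={start}"
        else:
            break
    return visited


PAGES = {
    "https://example.invalid/api?x=1&start=0": {
        "results": [1, 2],
        "_links": {"next": "https://example.invalid/api?cursor=abc"},
    },
    "https://example.invalid/api?cursor=abc": {"results": [3], "_links": {}},
}

visited = crawl("https://example.invalid/api?x=1", PAGES.__getitem__, limit=2)
```

If the url assignment sat at the top of the loop body instead, the second iteration would request start=0 again and the next link would never be followed.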

Improve pagination handling in Confluence API client and enhance error logging

1f37bb0

WildDogOne requested a review from a team as a code owner March 22, 2025 20:32

github-actions bot added auto-backport v9.1.0 labels Mar 22, 2025

artem-shelkovnikov added the community-driven label Mar 24, 2025

Merge branch 'main' into main

983a68d

seanstory reviewed Mar 28, 2025

Update connectors/sources/confluence.py

db10dda

Co-authored-by: Sean Story <[email protected]>

seanstory reviewed Mar 31, 2025
