Skip to content

Improve pagination handling in Confluence API client #3321

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

WildDogOne
Copy link

@WildDogOne WildDogOne commented Mar 22, 2025

Closes #3320

The code is here to enhance the pagination handling in case the /rest/content/search API is used since there is no next link in that api.
I would be happy if someone could check if my logic is somewhat acceptable.
There is also a trade off right now since there seems to be no direct tracking of which data has already been processed in the script itself, so every incremental sync will pull all data again, but of course then only process the new files.
This means, if someone set's a filter which is extremely general, this could result in a huge amount of API calls

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

@seanstory
Copy link
Member

buildkite test this

Copy link
Member

@seanstory seanstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting these! Would you be willing to add some unit tests as well, to exercise your changes and demonstrate the situations where you hit them?

except Exception as exception:
self._logger.warning(
f"Skipping data for type {url_name} from {url}. Exception: {exception}."
f"Skipping data for type {url_name} from {base_url}. Exception: {exception}."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd actually still want to log out url, since it would let us know which "page" of results had an issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only reason I switched this to base_url is because it's possible that url does not exist if try step fails (or at least so I think it could happen)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Ok, hadn't caught this, but I think url needs to be defined outside of the loop to start. That way we can reference it here, and we also can be sure that it's not re-set on each iteration (see my most recent comment)

@seanstory
Copy link
Member

Also, the linter is failing. You can autoformat your code with make autoformat

@artem-shelkovnikov
Copy link
Member

@WildDogOne when is the /rest/content/search API used? What's the best way to test your change against a real Confluence?

@WildDogOne
Copy link
Author

@artem-shelkovnikov In the connector when you use an advanced filter in sync rules. For Example:

[
  {
    "query": "type in ('page', 'attachment') AND Space=BLABLA AND (created >= now('-1y') OR lastModified >= now('-1y'))"
  }
]

This will then need to use /search because there is a CQL Query to be run on ingest, which makes more sense compared to using the post filter

@seanstory
Copy link
Member

buildkite test this

while True:
try:
url = f"{base_url}&start={start}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line probably needs to be outside of the while True, otherwise it'll overwrite the last iteration's attempts to set the URL to the "next" link, if that strategy was used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confluence Connector Pagination
3 participants