

aaronsteers
Contributor

This PR targets the following PR:


feat(cloud): Add log reading capabilities to SyncAttempt

Summary

Adds two new methods to the SyncAttempt class to enable log retrieval from Cloud sync attempts:

  • get_log_text_tail(num_lines: int = 1000) -> str: Returns the last N lines of log text
  • get_full_log_text() -> str: Returns complete log text for the attempt
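The following is a minimal sketch of the tail behavior described above. It assumes the full log text is already available as a single string. `tail_lines` is a hypothetical helper used only for illustration, not the PR's implementation:

```python
def tail_lines(text: str, num_lines: int = 1000) -> str:
    """Return at most the last `num_lines` lines of `text`, newline-joined.

    Hypothetical helper: mirrors the described get_log_text_tail() semantics,
    assuming the full log text has already been fetched as one string.
    """
    if num_lines <= 0:
        # Guard: lines[-0:] would return *every* line, not zero lines.
        return ""
    lines = text.splitlines()
    # Slicing past the start is safe: short logs are returned whole.
    return "\n".join(lines[-num_lines:])
```

The `num_lines <= 0` guard matters because Python's `lines[-0:]` slice returns the whole list rather than an empty one.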

Both methods use the existing Config API integration (_fetch_attempt_info()) and handle both structured (LogEvents) and unstructured (LogRead) log formats returned by the /v1/attempts/get_for_job endpoint.
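The two formats could be normalized into plain lines roughly as follows. This sketch rests on several assumptions. The payload is a dict carrying either an `events` list (LogEvents) or a `logLines` list (LogRead). Each event exposes its text under a `message` key. `extract_log_lines` and `join_log_text` are hypothetical names.

```python
from __future__ import annotations

from typing import Any


def extract_log_lines(logs: dict[str, Any] | None) -> list[str]:
    """Flatten an attempt's logs payload into plain text lines.

    Assumed shapes (verify against the Config API schema):
      - structured (LogEvents):  {"events": [{"message": "...", ...}, ...]}
      - unstructured (LogRead):  {"logLines": ["...", ...]}
    """
    if not logs:
        # Attempts with no logs: return an empty list rather than raising.
        return []
    if "events" in logs:
        # Structured events: keep only each event's text.
        # The "message" key is an assumption about the LogEvent shape.
        return [str(event.get("message", "")) for event in logs["events"] or []]
    if "logLines" in logs:
        return [str(line) for line in logs["logLines"] or []]
    # Unknown format: fail loudly instead of silently returning nothing.
    raise ValueError(f"Unrecognized log format; keys: {sorted(logs)}")


def join_log_text(logs: dict[str, Any] | None) -> str:
    """Join the extracted lines into one newline-separated string."""
    return "\n".join(extract_log_lines(logs))
```

Keeping only `message` drops timestamps and levels. A real implementation might prefer to format those fields into each line.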

Review & Testing Checklist for Human

  • Test with real log data: Verify the methods work with actual Cloud sync attempts and return properly formatted logs
  • Verify API response format: Confirm the Config API actually returns logs in "events" and "logLines" fields as expected from the schema analysis
  • Test edge cases: Try with failed attempts, attempts with no logs, very large log files, and attempts with different log formats
  • Performance testing: Check behavior with large log files to ensure get_full_log_text() doesn't cause memory issues

Notes

  • Implementation follows existing PyAirbyte patterns using lazy loading via _fetch_attempt_info()
  • Field names (events vs logLines) were verified against the Config API OpenAPI schema
  • Could not test locally due to environment dependency issues (missing 'six' module in the pandas dependency chain)
  • Uses the established _make_config_api_request() pattern with proper noqa comments
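The lazy-loading pattern referenced in the notes can be sketched as follows. The attempt info is fetched on first access and cached afterward, so repeated log reads cost only one API call. `AttemptLogReader` and the injected `fetch_attempt_info` callable are hypothetical stand-ins for `SyncAttempt` and `_fetch_attempt_info()`. The top-level `logs` key is an assumption about the response shape.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any


class AttemptLogReader:
    """Hypothetical stand-in for SyncAttempt's lazy-loading behavior."""

    def __init__(self, fetch_attempt_info: Callable[[], dict[str, Any]]) -> None:
        # Stand-in for _fetch_attempt_info(): any zero-argument callable that
        # returns the attempt-info response as a dict.
        self._fetch_attempt_info = fetch_attempt_info
        self._attempt_info: dict[str, Any] | None = None

    def _get_attempt_info(self) -> dict[str, Any]:
        # Fetch on first use only; later calls reuse the cached dict.
        if self._attempt_info is None:
            self._attempt_info = self._fetch_attempt_info()
        return self._attempt_info

    def get_logs_payload(self) -> dict[str, Any]:
        # Assumption: logs live under a top-level "logs" key in the response.
        return self._get_attempt_info().get("logs") or {}
```

Caching the whole response means a long-lived object keeps large logs in memory. This is worth keeping in mind for the performance item in the checklist.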

Link to Devin run: https://app.devin.ai/sessions/946366cdd71140c88397678ef6fb16fd
Requested by: @aaronsteers

- Add get_log_text_tail() method for retrieving last N lines of logs
- Add get_full_log_text() method for complete log retrieval
- Use Config API /v1/attempts/get_for_job endpoint for log access
- Handle both structured (LogEvents) and unstructured (LogRead) formats
- Fix field names to match Config API schema (events vs logEvents)
- Follow existing PyAirbyte patterns for lazy loading and caching

Co-Authored-By: AJ Steers <[email protected]>
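For reference, a request to the /v1/attempts/get_for_job endpoint might be assembled as below. Several details are assumptions to check against the Config API schema: the JSON body fields (`jobId`, `attemptNumber`), the bearer-token header, and the caller-supplied API root. `build_get_for_job_request` is a hypothetical helper, not PyAirbyte's `_make_config_api_request()`. The sketch only builds the request and never sends it.

```python
import json
import urllib.request


def build_get_for_job_request(
    config_api_root: str, job_id: int, attempt_number: int, bearer_token: str
) -> urllib.request.Request:
    """Build (but do not send) a Config API request for one attempt's info.

    Assumptions: POST <root>/v1/attempts/get_for_job with a JSON body holding
    "jobId" and "attemptNumber", authenticated with a bearer token.
    """
    url = config_api_root.rstrip("/") + "/v1/attempts/get_for_job"
    body = json.dumps({"jobId": job_id, "attemptNumber": attempt_number}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {bearer_token}",
        },
    )
```

Sending it would be a separate `urllib.request.urlopen(req)` call. Keeping construction separate makes the payload easy to inspect in tests.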
Contributor

Original prompt from AJ Steers
@Devin - In PyAirbyte, we have a method to check sync status for a Cloud sync. However, I don't think we have a way to pull logs. Can you confirm, and then, can you check if there are any API options for pulling back logs using either the public Airbyte REST API or the unofficial Config API (internal, but still reachable).
Thread URL: https://airbytehq-team.slack.com/archives/D089P0UPVT4/p1757542709231429?thread_ts=1757542709.231429

Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.



👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1757984248-sync-attempt-log-operations' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1757984248-sync-attempt-log-operations'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.


Comment on lines 185 to 194
    def get_log_text_tail(self, num_lines: int = 1000) -> str:
        """Return the last N lines of log text for this attempt.

        Args:
            num_lines: Maximum number of lines to return from the end of the logs.
                Defaults to 1000 lines.

        Returns:
            String containing the last N lines of log text, with lines separated by newlines.
        """
Contributor Author


This doesn't add anything meaningful. Let's remove this method.

- Address GitHub comment from @aaronsteers
- Keep only get_full_log_text() method for complete log retrieval
- Simplify SyncAttempt log interface to single method

Co-Authored-By: AJ Steers <[email protected]>

PyTest Results (Fast Tests Only, No Creds)

301 tests ±0 · 300 ✅ passed ±0 · 4m 32s ⏱️ +3s
1 suite ±0 · 1 💤 skipped ±0
1 file ±0 · 0 ❌ failed ±0

Results for commit 4d07482. ± Comparison against base commit ddd22e6.

aaronsteers merged commit 9cf622f into devin/1757543890-sync-attempt-abstraction on Sep 16, 2025
15 of 16 checks passed
aaronsteers deleted the devin/1757984248-sync-attempt-log-operations branch on September 16, 2025 at 01:21