Error handling missing from all GH API calls #529

@tonisyvanen

Describe the bug

TF-via-PR is not resilient to failures of the GH API. For example, I have witnessed multiple timeouts when TF-via-PR tries to download artifacts or post results to PRs.

The lack of retry logic inside TF-via-PR is problematic when using plan-parity checks during the Terraform apply workflow. By the time the workflow could be rerun, the plan will be stale, so you can't simply retry the whole workflow.

To Reproduce

Reproducing this deterministically is challenging. A timeout can happen anywhere gh api is called.

Expected behavior

All gh api calls, and calls to any other external dependency, should be wrapped in retry-with-backoff logic, giving TF-via-PR better resiliency.
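
To illustrate what I mean, here is a minimal sketch of a retry wrapper with exponential backoff in bash. It is not taken from the TF-via-PR code base: the function name `retry_with_backoff` and the `RETRY_MAX_ATTEMPTS` and `RETRY_BASE_DELAY` variables are hypothetical names I made up for this example, and the defaults are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TF-via-PR): run a command and retry it
# with exponential backoff if it exits non-zero.
# RETRY_MAX_ATTEMPTS and RETRY_BASE_DELAY are assumed names for illustration.
retry_with_backoff() {
  local max_attempts="${RETRY_MAX_ATTEMPTS:-5}"  # total tries, including the first
  local delay="${RETRY_BASE_DELAY:-2}"           # seconds before the first retry
  local attempt=1
  local status=0

  while true; do
    if "$@"; then
      return 0
    else
      status=$?  # exit code of the failed command
    fi

    if (( attempt >= max_attempts )); then
      echo "Giving up after ${attempt} attempts (exit code ${status}): $*" >&2
      return "$status"
    fi

    echo "Attempt ${attempt} failed (exit code ${status}); retrying in ${delay}s..." >&2
    sleep "$delay"
    delay=$(( delay * 2 ))      # double the wait before each further retry
    attempt=$(( attempt + 1 ))
  done
}

# Example usage (assumes GITHUB_REPOSITORY is set, as it is in GitHub Actions):
# retry_with_backoff gh api "repos/${GITHUB_REPOSITORY}/pulls?per_page=100"
```

A wrapper like this retries on any non-zero exit code, so it would also retry errors that are not transient (for example a 404). It is also only safe as-is for calls that can be repeated without side effects. A call that creates a comment might need extra care to avoid duplicates.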

Additional context

So far, I have seen the timeout happen in multiple places where TF-via-PR calls gh api. Here are some example logs:

Run op5dev/tf-via-pr@v13
Run # Check for required tools.
Run # Populate variables.
Run # Unique identifier.
Get "https://api.github.com/repos/XXXXXX/XXXXXX/pulls?per_page=100": dial tcp 140.82.116.5:443: i/o timeout
Error: Process completed with exit code 1.

and

Run # Post output.
Patch "https://api.github.com/repos/XXXX/XXXXXXXX/check-runs/XXXXXXXX": dial tcp 140.82.116.6:443: i/o timeout
Error: Process completed with exit code 1.
