
fix(github): Fix Integration Tests #5485

Merged
justin-tahara merged 1 commit into main from jtahara/blacksmith-updates-for-backend-image
Sep 25, 2025


justin-tahara (Contributor) commented Sep 25, 2025

Description

The integration tests have recently been failing because of caching issues in the Blacksmith registry. This PR temporarily disables caching to unblock CI.

Context for the issue can be found here: https://onyx-company.slack.com/archives/C09DHFK3220/p1758734368330299

How Has This Been Tested?

Tested by running CI on a different PR with the exact same changes.

Backporting (check the box to trigger the backport action)

Note: You must check that the action passes; otherwise, resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

justin-tahara requested a review from Weves September 25, 2025 02:28
justin-tahara requested a review from a team as a code owner September 25, 2025 02:28
vercel bot commented Sep 25, 2025

The latest updates on your projects.

Project          Deployment   Preview   Comments   Updated (UTC)
internal-search  Ready        Preview   Comment    Sep 25, 2025 2:29am

justin-tahara merged commit 7580178 into main on Sep 25, 2025
26 of 29 checks passed
justin-tahara deleted the jtahara/blacksmith-updates-for-backend-image branch September 25, 2025 02:30

greptile-apps bot left a comment


Greptile Overview

Summary

This PR addresses integration test reliability issues by disabling Docker BuildKit caching for the image builds that push to the Blacksmith registry. It adds no-cache: true to all Docker build steps and an explicit platform (--platform linux/arm64) to the image pulls, to avoid the stale-cache issues that were causing test failures.

Key changes:

  • Added no-cache: true to backend, model server, and integration test image builds
  • Added outputs: type=registry for consistent registry output format
  • Added explicit --platform linux/arm64 to all Docker pull commands
  • Applied identical changes to both regular and MIT integration test workflows

The changes are minimal, targeted, and address the specific caching problems mentioned in the PR description. This is a reasonable temporary workaround while the underlying registry caching issues are resolved.
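The key changes listed above are GitHub Actions inputs. As a rough sketch, assuming the build steps use docker/build-push-action (the workflow files themselves are not shown on this page), they correspond to the Docker CLI flags below. The image name and the helper functions are hypothetical and only illustrate the mapping.

```python
import shlex

# Hypothetical image reference; the real registry paths and tags used in the
# workflows are not shown in this PR page.
IMAGE = "registry.example.com/onyx-backend:test"


def build_command(image: str, context: str = ".") -> list[str]:
    """docker buildx build invocation with layer caching disabled
    (no-cache: true) and the result pushed to the registry
    (outputs: type=registry)."""
    return [
        "docker", "buildx", "build",
        "--no-cache",                 # rebuild every layer, ignore any cache
        "--output", "type=registry",  # push the image named by --tag
        "--tag", image,
        context,
    ]


def pull_command(image: str, platform: str = "linux/arm64") -> list[str]:
    """docker pull pinned to an explicit platform instead of the host default."""
    return ["docker", "pull", "--platform", platform, image]


if __name__ == "__main__":
    print(shlex.join(build_command(IMAGE)))
    print(shlex.join(pull_command(IMAGE)))
```

Disabling the cache trades longer build times for not reusing layers that may be stale, which fits the temporary nature of the fix.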

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - it only disables caching as a temporary fix
  • Score reflects a low-risk infrastructure fix for specific CI/CD issues: no application logic changes, an approach already tested on another PR, and a clearly temporary scope
  • No files require special attention - both workflow files contain identical, straightforward configuration changes

Important Files Changed

File Analysis

Filename        Score        Overview
.github/workflows/pr-integration-tests.yml 5/5 Disabled Docker BuildKit caching and added explicit platform specification to fix integration test reliability issues
.github/workflows/pr-mit-integration-tests.yml 5/5 Applied identical caching disable changes to MIT edition workflow for consistency

Sequence Diagram

sequenceDiagram
    participant Developer
    participant GitHub as GitHub Actions
    participant Registry as Blacksmith Registry
    participant BuildKit as Docker BuildKit
    participant Runner as Test Runner

    Developer->>GitHub: Open PR against main branch
    GitHub->>Registry: Login to private registry
    
    Note over GitHub,BuildKit: Build Phase (with no-cache: true)
    GitHub->>BuildKit: Build backend image with no-cache
    BuildKit-->>Registry: Push backend image (no cache reuse)
    GitHub->>BuildKit: Build model server image with no-cache
    BuildKit-->>Registry: Push model server image (no cache reuse)  
    GitHub->>BuildKit: Build integration test image with no-cache
    BuildKit-->>Registry: Push integration test image (no cache reuse)
    
    Note over GitHub,Runner: Test Execution Phase
    GitHub->>Registry: Pull images with explicit --platform linux/arm64
    Registry-->>GitHub: Return fresh images (no stale cache)
    GitHub->>Runner: Start Docker containers with fresh images
    Runner->>Runner: Execute integration tests
    Runner-->>GitHub: Return test results
    
    Note over GitHub: Clean up
    GitHub->>GitHub: Stop containers and clean up
    GitHub->>Developer: Report test results

2 files reviewed, no comments



cubic-dev-ai bot left a comment


No issues found across 2 files


blacksmith-sh bot commented Sep 25, 2025

3 Jobs Failed:

Run Integration Tests v2 / integration-tests (tests/api_key, tests-api_key) failed on "Wait for service to be ready"
[...]
  File "/app/onyx/setup.py", line 169, in setup_onyx
    warm_up_bi_encoder(
  File "/app/onyx/natural_language_processing/search_nlp_models.py", line 1107, in warm_up_bi_encoder
    retry_encode(texts=[warm_up_str], text_type=EmbedTextType.QUERY)
  File "/app/onyx/natural_language_processing/search_nlp_models.py", line 1069, in wrapper
    raise Exception(f"All retries failed: {exceptions}")
Exception: All retries failed: [HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327b7e850>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327ba2f50>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327bb1210>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327bb3610>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327bb9f90>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327bc8a50>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327bcb450>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327bd5e50>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327be4890>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327be7250>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327bedc90>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327bf86d0>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327bfb0d0>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327a09b10>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327a1c550>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327a1ef50>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327a29990>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327a343d0>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327a36410>: Failed to establish a new connection: [Errno 111] Connection refused'))"),
HTTPError("Request failed: HTTPConnectionPool(host='inference_model_server', port=9000): Max retries exceeded with url: /encoder/bi-encoder-embed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffe327a38e50>: Failed to establish a new connection: [Errno 111] Connection refused'))")]

ERROR:    Application startup failed. Exiting.
Service not ready yet (HTTP status 000curl_error). Retrying in 5 seconds...
Service not ready yet (HTTP status 000curl_error). Retrying in 5 seconds...
Service not ready yet (HTTP status 000curl_error). Retrying in 5 seconds...
Service not ready yet (HTTP status 000curl_error). Retrying in 5 seconds...
Service not ready yet (HTTP status 000curl_error). Retrying in 5 seconds...
Service not ready yet (HTTP status 000curl_error). Retrying in 5 seconds...
Service not ready yet (HTTP status 000curl_error). Retrying in 5 seconds...
Service not ready yet (HTTP status 000curl_error). Retrying in 5 seconds...
Service not ready yet (HTTP status 000curl_error). Retrying in 5 seconds...
Timeout reached. Service did not become ready in 5 minutes.
Error: Process completed with exit code 1.
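The failure above follows a common pattern: the API server's warm-up call to the model server is retried until every attempt fails with "Connection refused", startup aborts, and the CI readiness step then polls until its 5-minute timeout. Below is a minimal Python sketch of those two loops, assuming a 5-second poll interval as the log suggests. `retry_collecting_errors` and `wait_until_ready` are hypothetical helpers, not Onyx's actual `retry_encode` wrapper or the workflow's real readiness script.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_collecting_errors(
    fn: Callable[[], T],
    attempts: int,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times; if every call raises, raise a single
    Exception whose message lists every collected error."""
    errors: list[Exception] = []
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # remember the failure and try again
            errors.append(exc)
            if attempt < attempts - 1:
                sleep(delay_s)
    raise Exception(f"All retries failed: {errors}")


def wait_until_ready(
    probe: Callable[[], int],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` (an HTTP status code, 0 when no connection could be made)
    until it returns 200 or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = probe()
        if status == 200:
            return True
        print(f"Service not ready yet (HTTP status {status:03d}). "
              f"Retrying in {interval_s:g} seconds...")
        sleep(interval_s)
    print(f"Timeout reached. Service did not become ready in {timeout_s / 60:g} minutes.")
    return False
```

Injecting `sleep` and `clock` keeps both loops testable without real waiting. In this failure the readiness poll could never succeed, because the API server had already exited after its warm-up retries were exhausted.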
Run Integration Tests v2 / required failed on "Run actions/github-script@v7"
[...]
  retry-exempt-status-codes: 400,401,403,404,422
env:
  PRIVATE_REGISTRY: experimental-registry.blacksmith.sh:5000
  PRIVATE_REGISTRY_USERNAME: ***
  PRIVATE_REGISTRY_PASSWORD: ***
  OPENAI_API_KEY: ***
  SLACK_BOT_TOKEN: ***
  CONFLUENCE_TEST_SPACE_URL: ***
  CONFLUENCE_USER_NAME: ***
  CONFLUENCE_ACCESS_TOKEN: ***
  JIRA_BASE_URL: ***
  JIRA_USER_EMAIL: ***
  JIRA_API_TOKEN: ***
  PERM_SYNC_SHAREPOINT_CLIENT_ID: ***
  PERM_SYNC_SHAREPOINT_PRIVATE_KEY: ***
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: ***/onyx
Error: One or more upstream jobs failed or were cancelled.

1 job failed running on non-Blacksmith runners.


Summary: 8 successful workflows, 2 failed workflows

Last updated: 2025-09-25 03:02:22 UTC

razvanMiu pushed a commit to eea/danswer that referenced this pull request Oct 16, 2025

Labels

None yet

Projects

None yet


2 participants