Gaudi: add CI #3160

Draft
wants to merge 14 commits into
base: main
@baptistecolle baptistecolle commented Apr 10, 2025

What does this PR do?

This PR adds CI support for the Gaudi backend. It includes an integration test that starts the model "meta-llama/Llama-3.1-8B-Instruct", performs a few requests, and verifies that the outputs match the expected results.
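The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `find_mismatches` and `fake_generate` are hypothetical names, and the stand-in generator replaces the real requests to a running server.

```python
from typing import Callable, Dict, List

# Model ID taken from the PR description; the constant name is illustrative.
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"


def find_mismatches(
    generate: Callable[[str], str],
    expected: Dict[str, str],
) -> List[str]:
    """Return the prompts whose generated text differs from the expected text."""
    mismatches = []
    for prompt, want in expected.items():
        got = generate(prompt)
        if got != want:
            mismatches.append(prompt)
    return mismatches


def fake_generate(prompt: str) -> str:
    """Stand-in for a request to the running model server (assumption)."""
    return prompt.upper()
```

In an actual test, `generate` would wrap a request to the started server, and the test would assert that the returned list is empty.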

Additional models are also supported, but running tests for all of them is quite slow, so they are not included in the CI by default. However, instructions on how to run the integration tests for all supported models have been added to the Gaudi backend README.

@baptistecolle baptistecolle requested review from Narsil and regisss April 22, 2025 09:56
baptistecolle commented Apr 22, 2025

I’ll wait for the Gaudi integration test CI to pass before merging anything:
https://github.com/huggingface/text-generation-inference/actions/runs/14591230970/job/40927197928?pr=3160

The previous run was green, which gives me confidence in the current changes:
https://github.com/huggingface/text-generation-inference/actions/runs/14384130453/job/40336095297

Unfortunately, it can take days to get assigned a Gaudi1 runner 😭, so I figured I could start iterating on your reviews in the meantime rather than wait for the CI to finish before requesting feedback. In any case, I’ll only merge once the Gaudi integration test also passes in CI.

@baptistecolle baptistecolle marked this pull request as ready for review April 22, 2025 10:01

@regisss regisss left a comment

LGTM!

We should soon have access to Gaudi2 and Gaudi3 ephemeral runners on demand, which will make things much easier than waiting for a DL1 instance. I suggest we wait for this to be available to update and merge this PR.

@baptistecolle

OK, I will wait for the new runners before adding Gaudi to the CI, since the DL1 runners are indeed super unreliable.

@baptistecolle baptistecolle marked this pull request as draft April 23, 2025 07:42

@Narsil Narsil left a comment

LGTM
