@ddynwzh1992 ddynwzh1992 commented May 26, 2025

What does this PR do?

🛑 Please open an issue first to discuss any significant work and flesh out details/direction. When we triage the issues, we will add labels to the issue like "Enhancement", "Bug" which should indicate to you that this issue can be worked on and we are looking forward to your PR. We would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.

Add a new Ray service for Graviton-based model inference with llama.cpp.

Motivation

Provide a new cost-effective solution for CPU-based model inference on EKS.

More

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

Contributor

@vara-bonthu vara-bonthu left a comment

Thanks for raising the PR! Could you please create an issue first before we move forward with this?
This will need the following changes:

1/ Use the JARK stack as a prerequisite for deploying this model
2/ Ensure a Karpenter NodePool is available for Graviton instances, and use its labels in the Ray deployment. See this x86 example:

x86-cpu-karpenter = {

3/ Create a folder called "Graviton" here: https://github.com/awslabs/ai-on-eks/tree/main/website/docs/blueprints/inference
4/ Move the README.md to this folder and make sure the documentation follows the same format as other examples like: https://awslabs.github.io/ai-on-eks/docs/blueprints/inference/GPUs/vLLM-rayserve

Let me know once you've got the issue created and we can track the progress there!

@omrishiv
Contributor

@vara-bonthu This was missed in the split from DoEKS; it's already in there. That said, I agree with the comments.

@ddynwzh1992
Author

@vara-bonthu I have created the issue and moved the doc to a new folder named Graviton.

Contributor

@vara-bonthu vara-bonthu left a comment

Thanks for updating the PR with comments. I would need your help to fix a few more things before approving the PR.

num-cpus: "29"
template:
spec:
nodeSelector:
Contributor

I don't think we have a Karpenter NodePool that provisions this node for you. You may need to create a new NodePool for ARM, modeled on this x86 example: https://github.com/awslabs/ai-on-eks/blob/e45a5a71702c5a04c27ac9cca72a510001580a7b/infra/base/terraform/addons.tf#L483C1-L537C6
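
For illustration only, a rough sketch of what an ARM NodePool and the matching Ray worker nodeSelector could look like. It assumes Karpenter's v1 API and a plain Kubernetes manifest rather than the Terraform map used in this repository. The names `graviton-cpu-karpenter` and `graviton-workers`, the `NodeGroupType` label, and the `default` EC2NodeClass are hypothetical and not taken from the repo; only `num-cpus: "29"` and the `template.spec.nodeSelector` path come from the diff above.

```yaml
# Hypothetical Karpenter NodePool that provisions arm64 (Graviton) nodes.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-cpu-karpenter        # assumed name
spec:
  template:
    metadata:
      labels:
        NodeGroupType: graviton-cpu-karpenter   # assumed label, reused below
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                 # assumed EC2NodeClass
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]           # restrict to ARM instances
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
---
# Fragment of a RayService worker group that targets the pool above.
workerGroupSpecs:
  - groupName: graviton-workers       # assumed name
    rayStartParams:
      num-cpus: "29"
    template:
      spec:
        nodeSelector:
          kubernetes.io/arch: arm64
          NodeGroupType: graviton-cpu-karpenter
```

The point of the pairing is that the `nodeSelector` only schedules if some NodePool can create nodes carrying the same labels; otherwise the Ray worker pods stay pending.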

@@ -0,0 +1,325 @@
import (
Contributor

Just a question: does this blueprint require custom benchmarking code, or have we explored whether open-source benchmarking tools could be used instead? I am only thinking about the long-term maintainability of this Go code.

@@ -0,0 +1,74 @@
# Cost effective and Scalable Model Inference on AWS Graviton with Ray on EKS
Contributor

Use this doc as a template for writing yours. You need to add front matter such as sidebar_label.

You can deploy the website locally in a few simple steps to test your changes to the doc. Check out the instructions: https://github.com/awslabs/ai-on-eks/blob/main/website/README.md

@ddynwzh1992
Author

Removed the sidebar_position

Contributor

@omrishiv omrishiv left a comment

@vara-bonthu This looks good to me. Can you re-review, please? Thank you.

@omrishiv
Contributor

omrishiv commented Aug 6, 2025

I know this was started way before we had the inference charts, but ultimately, that's where this deployment will move to. I will +1 the comment #88 (comment) that when we have benchmarking available for the inference charts, this will probably go away.

@omrishiv omrishiv mentioned this pull request Oct 2, 2025
5 tasks
3 participants