add a new ray service for Graviton based model inference with llamacpp #88
Thanks for raising the PR! Could you please create an issue first before we move forward with this?
This will need the following changes:
1/ Use the JARK stack as a prerequisite for deploying this model
2/ Ensure a Karpenter NodePool is available for Graviton instances and use its labels in the Ray deployment. See this x86 example:
ai-on-eks/infra/base/terraform/addons.tf
Line 483 in e77964b
x86-cpu-karpenter = {
3/ Create a folder called "Graviton" here: https://github.com/awslabs/ai-on-eks/tree/main/website/docs/blueprints/inference
4/ Move the README.md to this folder and make sure the documentation follows the same format as other examples like: https://awslabs.github.io/ai-on-eks/docs/blueprints/inference/GPUs/vLLM-rayserve
Let me know once you've got the issue created and we can track the progress there!
@vara-bonthu this was missed in the split from DoEKS; it's already in there, but I agree with the comments.
@vara-bonthu I have created the issue and moved the doc to a new folder named Graviton.
Thanks for updating the PR with comments. I would need your help to fix a few more things before approving the PR.
    num-cpus: "29"
  template:
    spec:
      nodeSelector:
I don't think we have a Karpenter NodePool that creates this node for you. You may need to create a new NodePool for ARM, using this x86 example as a reference: https://github.com/awslabs/ai-on-eks/blob/e45a5a71702c5a04c27ac9cca72a510001580a7b/infra/base/terraform/addons.tf#L483C1-L537C6
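For illustration only, here is a minimal sketch of what an ARM NodePool and the matching selector could look like, written as raw Karpenter YAML rather than the Terraform used in addons.tf. It assumes the Karpenter v1 API. The name `graviton-cpu-karpenter` and the `NodeGroupType` label are hypothetical and not taken from this repository. `kubernetes.io/arch: arm64` is the standard Kubernetes architecture label.

```yaml
# Sketch of a Karpenter NodePool restricted to arm64 (Graviton) nodes.
# Assumes the karpenter.sh/v1 API; all names below are hypothetical.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-cpu-karpenter            # hypothetical name
spec:
  template:
    metadata:
      labels:
        NodeGroupType: graviton-cpu-karpenter   # hypothetical label for pod selection
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: graviton-cpu-karpenter      # hypothetical EC2NodeClass, defined elsewhere
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]               # only provision ARM nodes
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
---
# Fragment of a Ray worker group pod template (not a complete resource):
# the nodeSelector must match labels that the NodePool above puts on its nodes.
template:
  spec:
    nodeSelector:
      kubernetes.io/arch: arm64
      NodeGroupType: graviton-cpu-karpenter
```

The selector only takes effect if a NodePool actually produces nodes carrying those labels, which is the gap this comment points out.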
@@ -0,0 +1,325 @@
import (
Just a question: does this blueprint require custom benchmarking code, or have we explored whether we can use open source benchmarking tools? I am only thinking about the longer-term maintainability of this Go code.
@@ -0,0 +1,74 @@
# Cost effective and Scalable Model Inference on AWS Graviton with Ray on EKS
Use this doc as an example when writing this one. You need to add sidebar_label, etc.
You can deploy the website locally in a few simple steps to test your changes to the doc. Check out the instructions: https://github.com/awslabs/ai-on-eks/blob/main/website/README.md
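As a rough sketch of the kind of front matter being asked for, assuming the website uses Docusaurus-style Markdown front matter at the top of the doc. The `title` and `sidebar_label` values are hypothetical placeholders, not text from the PR.

```yaml
---
# Front matter block at the very top of the Markdown doc.
# Both values below are hypothetical placeholders.
title: Llama.cpp on Graviton with Ray Serve
sidebar_label: Llama.cpp on Graviton
---
```

Running the site locally, as the linked README describes, is the quickest way to confirm the label renders in the sidebar as intended.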
Removed the sidebar_position
@vara-bonthu this looks good to me, can you re-review please? thank you
I know this was started well before we had the inference charts, but ultimately that's where this deployment will move to. I will +1 the comment #88 (comment): once benchmarking is available for the inference charts, this will probably go away.
What does this PR do?
🛑 Please open an issue first to discuss any significant work and flesh out details/direction. When we triage the issues, we will add labels to the issue like "Enhancement", "Bug" which should indicate to you that this issue can be worked on and we are looking forward to your PR. We would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.
Adds a new Ray service for Graviton-based model inference with llama.cpp.
Motivation
Provides a new cost-effective solution for CPU-based model inference on EKS.
More
website/docs or website/blog section for this feature
pre-commit run -a with this PR. Link for installing pre-commit locally
For Moderators
Additional Notes