add a new ray service for Graviton based model inference with llamacpp #88

ddynwzh1992 · 2025-05-26T00:25:44Z

What does this PR do?

🛑 Please open an issue first to discuss any significant work and flesh out details/direction. When we triage the issues, we will add labels to the issue like "Enhancement", "Bug" which should indicate to you that this issue can be worked on and we are looking forward to your PR. We would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.

Adds a new Ray Service for Graviton-based model inference with llama.cpp.

Motivation

Provides a new cost-effective solution for CPU-based model inference on EKS.

More

Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

E2E Test successfully complete before merge?

Additional Notes

vara-bonthu

Thanks for raising the PR! Could you please create an issue first before we move forward with this?
This will need the following changes:

1/ Use the JARK stack as a prerequisite for deploying this model
2/ Ensure a Karpenter NodePool is available for Graviton instances and use its labels in the Ray deployment. See this x86 example:

ai-on-eks/infra/base/terraform/addons.tf

Line 483 in e77964b

x86-cpu-karpenter = {

3/ Create a folder called "Graviton" here: https://github.com/awslabs/ai-on-eks/tree/main/website/docs/blueprints/inference
4/ Move the README.md to this folder and make sure the documentation follows the same format as other examples like: https://awslabs.github.io/ai-on-eks/docs/blueprints/inference/GPUs/vLLM-rayserve

Let me know once you've got the issue created and we can track the progress there!

omrishiv · 2025-05-27T19:18:24Z

@vara-bonthu this was missing from the split from DoEKS; it's already in there, but I agree with the comments

ddynwzh1992 · 2025-05-29T00:14:24Z

@vara-bonthu I have created the issue and moved the doc to a new folder named Graviton.

vara-bonthu

Thanks for updating the PR to address the comments. I need your help fixing a few more things before I can approve it.

vara-bonthu · 2025-05-29T01:44:24Z

blueprints/inference/llamacpp-rayserve-graviton/ray-service-llamacpp.yaml

+        num-cpus: "29"
+      template:
+        spec:
+          nodeSelector:


I don't think we have any Karpenter NodePool that creates this node for you. You may need to create a new NodePool for ARM, modeled on this x86 example: https://github.com/awslabs/ai-on-eks/blob/e45a5a71702c5a04c27ac9cca72a510001580a7b/infra/base/terraform/addons.tf#L483C1-L537C6
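For illustration only, here is a minimal sketch of what an ARM NodePool and the matching `nodeSelector` might look like as plain Karpenter v1 YAML. It is not the repo's Terraform definition in `addons.tf`. The NodePool name `graviton-cpu-karpenter`, the `NodeGroupType` label, the EC2NodeClass name, and the instance families are all assumptions, so align them with whatever the x86 example actually uses.

```yaml
# Sketch of a Karpenter NodePool that provisions arm64 (Graviton) nodes.
# All names and label values here are hypothetical placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-cpu-karpenter          # hypothetical name
spec:
  template:
    metadata:
      labels:
        NodeGroupType: graviton-cpu-karpenter   # hypothetical label for the Ray pods to select on
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: graviton-cpu-karpenter    # hypothetical; must reference an existing EC2NodeClass
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]             # restricts the pool to ARM instances
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c7g", "m7g"]        # assumed Graviton families; adjust as needed
---
# Fragment of the Ray pod template from ray-service-llamacpp.yaml,
# selecting the nodes created by the NodePool above.
rayStartParams:
  num-cpus: "29"
template:
  spec:
    nodeSelector:
      kubernetes.io/arch: arm64
      NodeGroupType: graviton-cpu-karpenter     # must match the NodePool label (hypothetical)
```

Selecting on both the well-known `kubernetes.io/arch` label and a pool-specific label keeps the Ray pods off any other arm64 nodes that may exist in the cluster.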

vara-bonthu · 2025-05-29T01:48:45Z

blueprints/inference/llamacpp-rayserve-graviton/perf_benchmark.go

@@ -0,0 +1,325 @@
+import (


Just a question: does this blueprint require custom benchmarking code, or have we explored whether we can use open-source benchmarking tools? I am only thinking about the long-term maintainability of this Go code.

vara-bonthu · 2025-05-29T01:53:08Z

website/docs/blueprints/inference/Graviton/llamacpp-rayserve.md

@@ -0,0 +1,74 @@
+# Cost effective and Scalable Model Inference on AWS Graviton with Ray on EKS


Use this doc as a template when writing yours. You need to add sidebar_label etc.

You can deploy the website locally in a few simple steps to test your changes to the doc. Check out the instructions at https://github.com/awslabs/ai-on-eks/blob/main/website/README.md

ddynwzh1992 · 2025-08-06T04:40:20Z

Removed the sidebar_position

omrishiv

@vara-bonthu this looks good to me, can you re-review please? thank you

omrishiv · 2025-08-06T16:02:17Z

I know this was started way before we had the inference charts, but ultimately, that's where this deployment will move to. I will +1 the comment #88 (comment) that when we have benchmarking available for the inference charts, this will probably go away.

add a new ray service for Graviton based model inference with llamacpp

e580508

vara-bonthu reviewed May 27, 2025


ddynwzh1992 added 3 commits May 28, 2025 23:54

Merge branch 'awslabs:main' into main

c74ebef

add a new ray service for Graviton based model inference with llamacpp

29f6588

Merge remote changes while preserving local modifications

09dbf36

vara-bonthu reviewed May 29, 2025


ddynwzh1992 added 3 commits July 7, 2025 09:01

Merge branch 'awslabs:main' into main

d344247

update doc and graviton nodepool

c7ea77f

remove the sidebar_position

647a51e

omrishiv reviewed Aug 6, 2025


omrishiv mentioned this pull request Oct 2, 2025

add llama.cpp to inference chart #188

Open

5 tasks
