
Playbook for 2 node RPC Clustering #79

Merged
danielholanda merged 8 commits into main from abdmalik/rpcclusterplaybook
Mar 13, 2026

Conversation

@abdmalik-amd
Collaborator

Add a playbook covering llama.cpp RPC-based distributed inference across two STX Halo systems, including VRAM configuration, build instructions, and model deployment.

Update page.tsx to support h3 headers and in-page section navigation.
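For context, the build step the playbook describes can be sketched roughly as follows. This is a sketch, not the playbook's exact commands: the `GGML_RPC` CMake option is llama.cpp's documented switch for enabling the RPC backend, but paths and the release configuration are illustrative.

```shell
# Clone llama.cpp and build with the RPC backend enabled, so that
# rpc-server and the --rpc client option are compiled in.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release
```

On Windows this would typically run from a Visual Studio developer prompt (which provides Ninja and the MSVC toolchain); on Linux it assumes a working compiler toolchain such as `build-essential`.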

Collaborator

@eddierichter-amd left a comment


This looks great @abdmalik-amd! I just had a couple small comments.

Comment thread playbooks/supplemental/clustering-rpc-server/playbook.json Outdated
Comment thread playbooks/supplemental/clustering-rpc-server/README.md
Comment thread playbooks/supplemental/clustering-rpc-server/README.md
Comment thread playbooks/supplemental/clustering-rpc-server/README.md
@danielholanda danielholanda requested a review from bog601 February 26, 2026 19:16
@danielholanda
Collaborator

@eddierichter-amd were you able to actually reproduce the results here?

@eddierichter-amd
Collaborator

@eddierichter-amd were you able to actually reproduce the results here?

You mean performance results? I don't have a 4-Strix Halo machine, but I do in fact have a 2-Strix Halo setup, and following the same steps works functionally.

@adamlam2-amd
Collaborator

Nice, I'm glad the functionality works. Some more minor UX considerations:

  1. Maybe specify what exact memory values are needed to run the model we are working with. You can mention setting the variable graphics memory to ~75%, and also the TTM values, to load larger models.
  2. I think RPC does not improve compute, but rather just allows for model offloading onto the combined VRAM.
  3. I'm not too familiar with building llama.cpp from source, but ensure the instructions work. What is Ninja for Windows? Is that installed? Do we need build-essential for Linux?
  4. Some Windows vs. Linux instructions: .\rpc-server.exe vs ./rpc-server
  5. Any images/screenshots/gifs you can add would be helpful!

Other than that, pretty good and should be able to pass onto QA
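The Windows/Linux distinction in point 4 can be illustrated with a minimal launch sketch. This assumes llama.cpp's standard `rpc-server` options (`-H`/`--host`, `-p`/`--port`) and the `--rpc` client flag; the IP addresses, port, and model path are placeholders.

```shell
# On each worker node, start the RPC server.
# Linux:
./rpc-server -H 0.0.0.0 -p 50052
# Windows (note the backslash and .exe suffix):
# .\rpc-server.exe -H 0.0.0.0 -p 50052

# On the head node, point the client at the workers so layers are
# offloaded across the combined VRAM of both machines.
./llama-cli -m model.gguf --rpc 192.168.1.10:50052,192.168.1.11:50052 -ngl 99
```

As point 2 notes, this pools VRAM across nodes rather than speeding up compute: the model's layers are split across the listed RPC servers.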

@abdmalik-amd
Collaborator Author

abdmalik-amd commented Mar 5, 2026


  1. Sure, I will update the memory text; the current text uses the memory macro.
  2. Correct.
  3. Ninja is included in the Visual Studio Build Tools.
  4. Will fix this typo in the Windows section.
  5. Sure, I will look into adding some pictures/gifs of usage.

@abdmalik-amd
Collaborator Author

@eddierichter-amd @adamlam2-amd

Images for llama-cli & llama-server interface have been added to the playbook

@danielholanda
Collaborator

@adamlam2-amd @eddierichter-amd Any additional comments or requirements before we merge this?

Collaborator

@adamlam2-amd left a comment

LGTM. One thing we should keep note of is to differentiate between RPC clustering and RCCL clustering, so that users know exactly which one to use.

@danielholanda danielholanda merged commit d5cddfd into main Mar 13, 2026
5 checks passed


5 participants