22 changes: 15 additions & 7 deletions playbooks/README.md
@@ -84,23 +84,31 @@ Linux-only content

Content outside `@os` tags is always shown. Keep blocks focused—only tag the parts that differ.

### Pre-installed Software Dropdowns
### Shared Content Tags

For software that comes pre-installed on the AMD Halo Developer Platform, use the `@require` tag to reference installation instructions from the central `dependencies/` folder:
Use these tags to pull in shared content from `playbooks/dependencies/`. Both reference items defined in `registry.json`.

| Tag | Purpose | Display |
|-----|---------|---------|
| `@require` | Pre-installed software | Collapsible dropdown (optional info) |
| `@setup` | System configuration steps | Displayed directly (required steps) |

**Pre-installed software** — Use `@require` for software that comes pre-installed on the AMD Halo Developer Platform:

```markdown
<!-- @require:comfyui -->
<!-- @require:comfyui,pytorch --> <!-- multiple dependencies in one dropdown -->
```

For multiple dependencies, use comma-separated IDs to combine them into a **single dropdown**:
Displays a green checkmark with "Already pre-installed on your AMD Halo Developer Platform!" that expands to show manual installation instructions.

**System setup** — Use `@setup` for configuration steps users need to perform:

```markdown
<!-- @require:comfyui,pytorch -->
<!-- @setup:memory_config -->
```

Available dependencies are defined in `playbooks/dependencies/registry.json`. Each dependency has its own markdown file with OS-specific installation instructions.

The dropdown displays with a green checkmark and the text "Already pre-installed on your AMD Halo Developer Platform!" When expanded, it shows a notice explaining the software is pre-configured, followed by manual installation instructions.
Content displays directly since these are required steps, not optional reference info.
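
For intuition, here is a hypothetical sketch (not the actual site renderer) of how a build step might resolve these tags against `registry.json`; the function name, regex, and inlining behavior are illustrative assumptions only:

```python
import json
import re
from pathlib import Path

DEPS_DIR = Path("playbooks/dependencies")
TAG_RE = re.compile(r"<!--\s*@(require|setup):([\w,-]+)\s*-->")

def resolve_tags(markdown: str) -> str:
    """Replace @require/@setup comments with the referenced shared content."""
    registry = json.loads((DEPS_DIR / "registry.json").read_text())

    def replace(match: re.Match) -> str:
        kind, ids = match.group(1), match.group(2).split(",")
        section = registry["dependencies"] if kind == "require" else registry["setup"]
        # Comma-separated @require ids are combined into a single dropdown
        bodies = [(DEPS_DIR / section[dep_id]["file"]).read_text() for dep_id in ids]
        return "\n\n".join(bodies)

    return TAG_RE.sub(replace, markdown)
```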

### Writing Tips

110 changes: 108 additions & 2 deletions playbooks/core/lmstudio-rocm-llms/README.md
@@ -1,3 +1,109 @@
# Hands-On LLMs with ROCm and LM Studio: Chat, Test, and Serve Models
## Overview

<!-- Playbook content goes here -->
LM Studio is a powerful GUI-based wrapper for [llama.cpp](https://github.com/ggml-org/llama.cpp) that also provides an [OpenAI-compliant endpoint](https://lmstudio.ai/docs/developer/openai-compat) for serving models locally. It offers a simple but powerful interface for quickly downloading and deploying models, and ships both Vulkan- and ROCm-based backends (called runtimes) for AMD users.


## What You'll Learn
- Configure and use LM Studio to leverage STX Halo hardware
- Test and manage LLMs in a completely offline environment
- Serve models via an OpenAI-compatible API to power custom workflows and apps


## Installing Dependencies

<!-- @require:lmstudio -->

## System Setup

<!-- @setup:memory-config -->


## Downloading Models

<!-- @require:lmstudio-models-gpt-oss-120b -->

## Chatting with an LLM
Learn how to start chatting with a ChatGPT-grade LLM completely locally.

1. Press "Ctrl" + "1" or click on the 👾 button on the top left of the screen to open the Chat window.
2. Press "Ctrl" + "M" to open the `Model Loader`, select "manually chose model load parameters", and click on "OpenAI GPT-OSS 120B"
3. Make sure "show advanced settings" is checked.
4. Change context size to "128,000". Make sure "Flash Attention" is On and "GPU offload layers" is set to maximum.
5. Check "Remember settings" and click on `Load Model`.
6. Send a message and start interacting with the model!

<p align="center">
<img src="assets/chat.png" alt="Chatting with gpt-oss-120b on LM Studio" width="600"/>
</p>

> Context size is the model's short-term memory limit. With STX Halo, we can use 128,000 tokens, enough to handle extensive workflows that typically require cloud servers.

## Serve LLMs through an OpenAI-compatible endpoint

LM Studio also offers an OpenAI-compliant endpoint in the form of LM Studio Server. This has already been demonstrated in an agentic coding workflow with Cline [here](../vscode-qwen3-coder). Another common use case is connecting LM Studio Server to any web application (React, Node.js, Python) by sending standard HTTP requests to the inference endpoint, as sketched below.
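
Because the endpoint speaks plain HTTP, no particular SDK is required. As a rough sketch, a request could be sent with Python's `requests` library (the prompt and `local-model` placeholder id are illustrative; this assumes the server is running on the default port, as configured in the steps that follow):

```python
import requests

# The request body follows the OpenAI chat-completions schema;
# LM Studio does not require authentication by default.
response = requests.post(
    "http://127.0.0.1:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder id; LM Studio serves the loaded model
        "messages": [{"role": "user", "content": "Say hello from STX Halo."}],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```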

To set up LM Studio Server, use the following instructions:

1. On the left-hand side, click on the "Developer" tab (command line icon) and then click on Server Settings.
2. If you want to serve the model over your LAN, check "Serve on Local Network"; if you want to use it with a website or for extensive calling within VS Code, enable "CORS"; otherwise, leave these at their defaults.
3. Click on the toggle in front of "Status: Stopped" or press "Ctrl" + "R".
4. An OpenAI-compliant endpoint will now be running. The address is typically http://127.0.0.1:1234.
5. Staying on the same "Developer" tab with "Status: Running", you can deploy an LLM by following the steps in "Chatting with an LLM".


This model will now be accessible through the LM Studio Server endpoint and will support the following OpenAI endpoints:

| Endpoint | Method | Docs |
|------------|----------|----------|
| /v1/models | GET | [Models](https://lmstudio.ai/docs/developer/openai-compat/models) |
| /v1/responses | POST | [Responses](https://lmstudio.ai/docs/developer/openai-compat/responses) |
| /v1/chat/completions | POST | [Chat Completions](https://lmstudio.ai/docs/developer/openai-compat/chat-completions) |
| /v1/embeddings | POST | [Embeddings](https://lmstudio.ai/docs/developer/openai-compat/embeddings) |
| /v1/completions | POST | [Completions](https://lmstudio.ai/docs/developer/openai-compat/completions) |
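
As a quick sanity check that the server is up, the `/v1/models` route can be queried with the same `openai` client used in the example below; a minimal sketch, assuming the default port:

```python
from openai import OpenAI

# The API key is required by the library but ignored by LM Studio
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# GET /v1/models lists every model LM Studio can currently serve
for model in client.models.list():
    print(model.id)
```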

#### Example: Pinging your Endpoint
Having just created the OpenAI-compatible endpoint, let's look at how to integrate it into a Python development environment and use your system as a local API provider.

```python
from openai import OpenAI  # if not installed, run `pip install openai` in your selected environment

# Initialize the client specifically for your local server
# The API key is required by the library but ignored by LM Studio
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"
)

print("Attempting to connect to local STX Halo server...")

try:
    # Create a simple chat completion request
    completion = client.chat.completions.create(
        model="local-model",  # the model identifier is optional in local mode
        messages=[
            {"role": "system", "content": "You are a helpful coding assistant."},
            {"role": "user", "content": "Write a one-line Python joke."}
        ],
        temperature=0.7,
    )

    # Print the response
    print("\nConnection Successful! Server Response:\n")
    print(completion.choices[0].message.content)

except Exception as e:
    print(f"\nConnection Failed: {e}. Ensure the LM Studio server is running on port 1234.")
```
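
The same endpoint also supports streaming, which is useful for chat-style UIs that render tokens as they arrive. A minimal sketch, reusing the client setup from the example above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# stream=True yields chunks as tokens are generated instead of
# waiting for the full completion
stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Explain Flash Attention in one sentence."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```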


#### Swapping between ROCm and Vulkan backends (Optional)

1. Press "Ctrl" + "Shift" + "R" on your keyboard. Alternatively click on the Discover tab (Magnifying Glass) on the left-hand side and then click on "Runtime" in the pop up.
2. In the bottom right quadrant of the pop-up, you should see the "Selections" drawer with the "Engines" sub-header.
3. The GGUF drop-down menu will show your currently selected backend. You can change this to ROCm or Vulkan llama.cpp depending on what you are trying to do.
> Warning: selecting CPU llama.cpp here will disable GPU usage.


## Next Steps
- **Custom App Integration**: Integrate your own Python scripts or applications using the local OpenAI-compatible API.
- **Advanced Frontends**: Connect powerful interfaces like Open WebUI to your server for chat history and persona management.

For more documentation, please visit: https://lmstudio.ai/docs/developer
Binary file added playbooks/core/lmstudio-rocm-llms/assets/chat.png
34 changes: 34 additions & 0 deletions playbooks/core/lmstudio-rocm-llms/platform.md
@@ -0,0 +1,34 @@
# Platform Configuration

This document describes the expected platform configurations for running this playbook.

## Windows

### LM Studio Installation

LM Studio should be pre-installed:

| Component | Version | Location |
|-----------|---------|----------|
| **LM Studio (Models + Misc)** | v0.4.0 | `C:\Users\...\.lmstudio` |
| **LM Studio (Program)** | v0.4.0 | `C:\Program Files\LM Studio` |
| **LM Studio (Cache)** | v0.4.0 | `C:\Users\...\AppData\Roaming\LM Studio` |

### Model Download

The following models should already be present in the LM Studio models directory (`C:\Users\...\.lmstudio\models`):

| Model Type | Quantization | Size | Location |
|------------|--------------|------|----------|
| OpenAI GPT-OSS 120B | `MXFP4` | 59 GB | `models\ggml-org` |

---

## Linux

### LM Studio Installation

See `lmstudio.md` (inside the `dependencies/` folder) for more details.

### Model Download

Same as on Windows.
6 changes: 3 additions & 3 deletions playbooks/core/lmstudio-rocm-llms/playbook.json
@@ -1,12 +1,12 @@
{
"id": "lmstudio-rocm-llms",
"title": "Running LLMs with LM Studio and ROCm",
"description": "Set up LM Studio with ROCm acceleration to run large language models on STX Halo™",
"title": "Running and serving LLMs with LM Studio",
"description": "Set up LM Studio and LM Studio Server to run and serve large language models on STX Halo™",
"time": 30,
"platforms": ["windows", "linux"],
"difficulty": "beginner",
"isNew": false,
"isFeatured": false,
"published": true,
"tags": ["lm-studio", "rocm", "llm", "inference"]
"tags": ["lm-studio", "rocm", "vulkan", "llm", "inference"]
}
6 changes: 3 additions & 3 deletions playbooks/core/vscode-qwen3-coder/platform.md
@@ -10,9 +10,9 @@ LM Studio should be pre-installed:

| Component | Version | Location |
|-----------|---------|----------|
| **LM Studio (Models + Misc)** | v0.3.39 | `C:\Users\...\.lmstudio` |
| **LM Studio (Program)** | v0.3.39 | `C:\Users\...\AppData\Local\Programs\LM Studio` |
| **LM Studio (Cache)** | v0.3.39 | `C:\Users\...\AppData\Roaming\LM Studio` |
| **LM Studio (Models + Misc)** | v0.4.0 | `C:\Users\...\.lmstudio` |
| **LM Studio (Program)** | v0.4.0 | `C:\Program Files\LM Studio` |
| **LM Studio (Cache)** | v0.4.0 | `C:\Users\...\AppData\Roaming\LM Studio` |

### Model Download

15 changes: 14 additions & 1 deletion playbooks/dependencies/lmstudio.md
@@ -1,3 +1,16 @@
### LM Studio

<!-- TODO: Add installation instructions -->
<!-- @os:windows -->

1. Download the installer from here: [https://lmstudio.ai/download](https://lmstudio.ai/download)
2. Run the installer and follow the prompts.

<!-- @os:end -->

<!-- @os:linux -->

1. Download the AppImage from here: [https://lmstudio.ai/download?os=linux](https://lmstudio.ai/download?os=linux)
2. Run `sudo apt install libfuse2`
3. Run `cd ~/Downloads`
4. Run `chmod +x LM-Studio-*.AppImage`
5. Run `./LM-Studio-*.AppImage`
<!-- @os:end -->
14 changes: 14 additions & 0 deletions playbooks/dependencies/lmstudio_models_gpt_oss_120b.md
@@ -0,0 +1,14 @@
### Downloading GPT-OSS 120B on LM Studio

To download the GPT-OSS 120B model:

1. Press "Ctrl" + "Shift" + "M" on your keyboard or click on the "Discover" tab (Magnifying Glass icon) on the left sidebar
2. Search for `ggml-org/gpt-oss-120b-GGUF`
3. Select `mxfp4` and click Download

<p align="center">
<img src="/api/dependencies/assets/lmstudio_download.png" alt="LM Studio Download Models" width="600"/>

LM Studio will automatically download and place the model in the correct directory.

Should you wish to download additional models, you can search for them in the Discover tab and LM Studio will handle the rest.
74 changes: 23 additions & 51 deletions playbooks/dependencies/memoryconfig.md
@@ -1,72 +1,44 @@
### STX Halo Memory Configurations
### Memory configuration for running large models

<!-- @os:windows -->

On Windows, to run larger models that require higher memory, we need to use the AMD Variable Graphics Memory (iGPU VRAM) allocation.
On Windows, to run larger models that require higher memory, we need to use the AMD Variable Graphics Memory (iGPU VRAM) allocation. Although 64 GB is adequate for most workloads, running the largest models with high context may require 96 GB.

> 64 GB is adequate for most workloads but if you want to run the largest models with high context, you will need to set it to 96 GB.

This can be done by opening AMD Software: Adrenalin™ Edition control panel and navigating to: Performance > Tuning > AMD Variable Graphics Memory. Please reboot the system for the changes to take effect.
This can be done by opening AMD Software: Adrenalin™ Edition control panel and navigating to: `Performance > Tuning > AMD Variable Graphics Memory`. Please reboot the system for the changes to take effect.

<!-- @os:end -->

<!-- @os:linux -->

On Linux, ROCm utilizes a shared system memory pool, which is configured by default to half the system memory.

This amount can be increased by changing the kernel’s Translation Table Manager (TTM) page setting, with the following instructions.

1. If possible, AMD recommends setting the minimum dedicated VRAM in the BIOS (0.5GB)

2. Install the pipx utility and add the path for pipx installed wheels into the system search path.

```bash
sudo apt install pipx
pipx ensurepath
```

3. Install the amd-debug-tools wheel from PyPi.
```bash
pipx install amd-debug-tools
```
This amount can be increased by changing the kernel’s Translation Table Manager (TTM) page setting, with the following instructions. AMD recommends setting the minimum dedicated VRAM in the BIOS (0.5 GB).

4. Run the amd-ttm tool to query the current settings for shared memory.
```bash
amd-ttm
```
* Install the pipx utility and add the path for pipx installed wheels into the system search path.

5. Reconfigure shared memory settings by using the --set argument (units in GB).
```bash
amd-ttm --set <NUM>
```
```bash
sudo apt install pipx
pipx ensurepath
```

6. Reboot the system for changes to take effect.
* Install the amd-debug-tools wheel from PyPI.
```bash
pipx install amd-debug-tools
```

* Run the amd-ttm tool to query the current settings for shared memory.
```bash
amd-ttm
```

#### amd-ttm Usage Examples
* Reconfigure shared memory settings by using the --set argument (units in GB).
```bash
amd-ttm --set <NUM>
```

##### Query effective memory settings in the current kernel
```bash
amd-ttm
💻 Current TTM pages limit: 16469033 pages (62.82 GB)
💻 Total system memory: 125.65 GB
```
* Reboot the system for changes to take effect.

##### Set usable shared memory
```bash
❯ amd-ttm --set 100
🐧 Successfully set TTM pages limit to 26214400 pages (100.00 GB)
🐧 Configuration written to /etc/modprobe.d/ttm.conf
○ NOTE: You need to reboot for changes to take effect.
Would you like to reboot the system now? (y/n): y
```

##### Clear TTM setting and revert to kernel defaults
```bash
❯ amd-ttm --clear
🐧 Configuration /etc/modprobe.d/ttm.conf removed
Would you like to reboot the system now? (y/n): y
```
For `amd-ttm` usage examples, see the [ROCm documentation](https://rocm.docs.amd.com/projects/radeon-ryzen/en/docs-7.0.2/docs/install/installryz/native_linux/install-ryzen.html#amd-ttm-usage-examples).

<!-- @os:end -->
20 changes: 14 additions & 6 deletions playbooks/dependencies/registry.json
@@ -1,6 +1,6 @@
{
"$schema": "./registry.schema.json",
"description": "Central registry of pre-installed software dependencies for AMD Halo Developer Platform",
"description": "Central registry of pre-installed software dependencies and setup steps for AMD Halo Developer Platform",
"dependencies": {
"comfyui": {
"name": "ComfyUI",
@@ -51,17 +51,25 @@
"platforms": ["windows", "linux"],
"file": "comfyui_models.md"
},
"lmstudio-models-gpt-oss-120b": {
"name": "GPT-OSS 120B Model",
"description": "OpenAI GPT-OSS 120B model for LM Studio",
"category": "model",
"platforms": ["windows", "linux"],
"file": "lmstudio_models_gpt_oss_120b.md"
},
"driver": {
"name": "AMD GPU Driver",
"description": "Latest AMD GPU driver for optimal performance",
"category": "driver",
"platforms": ["windows", "linux"],
"file": "driver.md"
},
"memory_config": {
"name": "AMD Memory Configuration",
"description": "Suggested memory configurations for Strix Halo Systems",
"category": "framework",
}
},
"setup": {
"memory-config": {
"name": "Memory Configuration",
"description": "Configure GPU memory allocation for running large models",
"platforms": ["windows", "linux"],
"file": "memoryconfig.md"
}