Add LMStudio Playbook #48
Merged
Commits (20)
751fab1
Added vulkan and rocm instructions
4d0972c
Add playbook instructions
4eac1d8
Merge branch 'main' into usman/lmstudio
9fc4143
suggested changes
4f2e307
Update LMStudio version
danielholanda e06d732
updates as discussed: formatting, example, removing content.
adamlam2-amd f666dd1
Merge branch 'main' into usman/lmstudio
danielholanda 9383744
suggested changes on LMStudio path
danielholanda 193a6f8
Add syntax highlighting on code
danielholanda 598e2c1
Suggested changes
danielholanda 99b8819
Add memory configuration step
danielholanda d34bb3d
Add to playbok guide
danielholanda a91395f
More concise content
danielholanda 87b2488
Cleaner model download instructions
danielholanda 2f3d200
Update instructions to v0.4.0
danielholanda 4af2ccb
Enable images to be clicked on setuip
danielholanda c681884
Add image
danielholanda 9fb4717
Add link to cline playbook
danielholanda b07e518
Merge branch 'main' into usman/lmstudio
danielholanda 039bbb8
Fix page size
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,109 @@ | ||
| # Hands-On LLMs with ROCm and LM Studio: Chat, Test, and Serve Models | ||
| ## Overview | ||
|
|
||
| <!-- Playbook content goes here --> | ||
|
|
||
| LM Studio is a powerful GUI-based wrapper for [llama.cpp](https://github.com/ggml-org/llama.cpp) that also provides an [OpenAI-compatible endpoint](https://lmstudio.ai/docs/developer/openai-compat) for serving models locally. It offers a simple but powerful interface for quickly downloading and deploying models, with both Vulkan and ROCm based backends (called runtimes) for AMD users. | ||
|
|
||
|
|
||
| ## What You'll Learn | ||
| - Configure and use LM Studio to leverage STX Halo hardware | ||
| - Test and manage LLMs in a completely offline environment | ||
| - Serve models via OpenAI Compatible API to power custom workflows and apps | ||
|
|
||
|
|
||
| ## Installing Dependencies | ||
|
|
||
| <!-- @require:lmstudio --> | ||
|
|
||
| ## System Setup | ||
|
|
||
|
danielholanda marked this conversation as resolved.
|
||
| <!-- @setup:memory-config --> | ||
|
|
||
|
|
||
| ## Downloading Models | ||
|
|
||
| <!-- @require:lmstudio-models-gpt-oss-120b --> | ||
|
|
||
| ## Chatting with an LLM | ||
| Learn how to start chatting with a ChatGPT-grade LLM completely locally. | ||
|
|
||
| 1. Press "Ctrl" + "1" or click on the 👾 button on the top left of the screen to open the Chat window. | ||
| 2. Press "Ctrl" + "M" to open the `Model Loader`, select "manually chose model load parameters", and click on "OpenAI GPT-OSS 120B" | ||
| 3. Make sure "show advanced settings" is checked. | ||
| 4. Change context size to "128,000". Make sure "Flash Attention" is On and "GPU offload layers" is set to maximum. | ||
| 5. Check "Remember settings" and click on `Load Model`. | ||
| 6. Send a message and start interacting with the model! | ||
|
|
||
| <p align="center"> | ||
| <img src="assets/chat.png" alt="Chatting with gpt-oss-120b on LM Studio" width="600"/> | ||
| </p> | ||
|
|
||
| > Context size refers to the model's short-term memory limit. With STX Halo, we can use 128,000 tokens, enough to handle extensive workflows that typically require cloud servers. | ||
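|
|
||
| If you prefer scripting over the GUI, LM Studio's companion `lms` CLI can load a model with similar parameters (a hedged sketch: the CLI must be bootstrapped once from the app, the exact model key comes from `lms ls`, and the flag spellings are worth confirming with `lms load --help`): | ||
| ```bash | ||
| # List downloaded models to find the exact model key (the key below is an assumed example) | ||
| lms ls | ||
| # Load with maximum GPU offload and a 128k context, mirroring the GUI steps above | ||
| lms load openai/gpt-oss-120b --gpu max --context-length 128000 | ||
| ``` | ||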
|
|
||
| ## Serve LLMs through an OpenAI-compatible endpoint | ||
|
|
||
| LM Studio also offers an OpenAI-compatible endpoint in the form of LM Studio Server. This has already been demonstrated in an agentic coding workflow with Cline [here](../playbooks/vscode-qwen3-coder). Another common use case is connecting LM Studio Server to any web application (React, Node.js, Python) by sending standard HTTP requests to the inference endpoint. | ||
|
|
||
| To set up LM Studio Server, use the following instructions: | ||
|
|
||
| 1. On the left-hand side, click on the "Developer" tab (command line icon) and then click on Server Settings. | ||
| 2. If you want to serve the model over your LAN, check "Serve on Local Network". If you want to use it with a website or make extensive calls from within VS Code, enable "CORS". Otherwise, leave these as defaults. | ||
| 3. Click on the toggle in front of "Status: Stopped" or press "Ctrl" + "R". | ||
| 4. An OpenAI-compatible endpoint will now be running. The address is typically http://127.0.0.1:1234. | ||
| 5. Staying on the same "Developer" tab with the status showing "Running", you can deploy an LLM by going through the steps mentioned in "Chatting with an LLM". | ||
|
|
||
|
|
||
| This model will now be accessible through the LM Studio Server endpoint, which supports OpenAI-compatible endpoints including: | ||
|
|
||
| | Endpoint | Method | Docs | | ||
| |------------|----------|----------| | ||
| | /v1/models | GET | [Models](https://lmstudio.ai/docs/developer/openai-compat/models) | | ||
| | /v1/responses | POST | [Responses](https://lmstudio.ai/docs/developer/openai-compat/responses) | | ||
| | /v1/chat/completions | POST | [Chat Completions](https://lmstudio.ai/docs/developer/openai-compat/chat-completions) | | ||
| | /v1/embeddings | POST | [Embeddings](https://lmstudio.ai/docs/developer/openai-compat/embeddings) | | ||
| | /v1/completions | POST | [Completions](https://lmstudio.ai/docs/developer/openai-compat/completions) | | ||
|
|
||
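| As a quick sanity check from the command line, you can ping these endpoints with `curl` (a minimal sketch, assuming the server is running on the default http://127.0.0.1:1234 with a model loaded; "local-model" is a placeholder identifier, as in the Python example below): | ||
| ```bash | ||
| # List the models the server currently exposes | ||
| curl http://127.0.0.1:1234/v1/models | ||
| # Send a minimal chat completion request | ||
| curl http://127.0.0.1:1234/v1/chat/completions \ | ||
|   -H "Content-Type: application/json" \ | ||
|   -d '{"model": "local-model", "messages": [{"role": "user", "content": "Say hello in five words."}]}' | ||
| ``` | ||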
|
|
||
| #### Example: Pinging your Endpoint | ||
| Having just created the OpenAI-compatible endpoint, let's look at how to integrate it into a Python development environment and use your system as a local API provider. | ||
|
|
||
| ```python | ||
| from openai import OpenAI # if not installed, run pip install openai in your selected environment | ||
|
|
||
| # Initialize the client specifically for your local server | ||
| # The API key is required by the library but ignored by LM Studio | ||
| client = OpenAI( | ||
| base_url="http://localhost:1234/v1", | ||
| api_key="lm-studio" | ||
| ) | ||
| print("Attempting to connect to local STX Halo server...") | ||
|
|
||
| try: | ||
| # Create a simple chat completion request | ||
| completion = client.chat.completions.create( | ||
| model="local-model", # The model identifier is optional in local mode | ||
| messages=[ | ||
| {"role": "system", "content": "You are a helpful coding assistant."}, | ||
| {"role": "user", "content": "Write a one-line Python joke."} | ||
| ], | ||
| temperature=0.7, | ||
| ) | ||
| # Print the response | ||
| print("\nConnection Successful! Server Response:\n") | ||
| print(completion.choices[0].message.content) | ||
|
|
||
| except Exception as e: | ||
| print(f"\nConnection Failed: {e}. Ensure LM Studio server is running on port 1234.") | ||
| ``` | ||
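|
|
||
| For longer generations, you may prefer to receive tokens as they are produced rather than waiting for the full reply. A minimal streaming variant of the same request, reusing the `client` defined above (standard OpenAI SDK streaming, which LM Studio's endpoint supports; "local-model" remains a placeholder): | ||
| ```python | ||
| # Stream the response token-by-token instead of waiting for the full completion | ||
| stream = client.chat.completions.create( | ||
|     model="local-model",  # placeholder identifier, as above | ||
|     messages=[{"role": "user", "content": "Explain flash attention in two sentences."}], | ||
|     temperature=0.7, | ||
|     stream=True, | ||
| ) | ||
| for chunk in stream: | ||
|     # Each chunk carries a small delta of generated text | ||
|     delta = chunk.choices[0].delta.content | ||
|     if delta: | ||
|         print(delta, end="", flush=True) | ||
| print() | ||
| ``` | ||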
|
|
||
|
|
||
| #### Swapping between ROCm and Vulkan backends (Optional) | ||
|
|
||
| 1. Press "Ctrl" + "Shift" + "R" on your keyboard. Alternatively click on the Discover tab (Magnifying Glass) on the left-hand side and then click on "Runtime" in the pop up. | ||
| 2. In the bottom right quadrant of the pop-up, you should see the "Selections" drawer with the "Engines" sub-header. | ||
| 3. The GGUF drop-down menu will show your currently selected backend. You can change this to ROCm or Vulkan llama.cpp depending on what you are trying to do. | ||
| > Warning: selecting CPU llama.cpp here will disable GPU usage. | ||
|
|
||
|
|
||
| ## Next Steps | ||
| - **Custom App Integration**: Integrate your own Python scripts or applications using the local OpenAI-compatible API. | ||
| - **Advanced Frontends**: Connect powerful interfaces like Open WebUI to your server for chat history and persona management. | ||
|
|
||
| For more documentation, please visit: https://lmstudio.ai/docs/developer | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| # Platform Configuration | ||
|
|
||
|
|
||
| This document describes the expected platform configurations for running this playbook. | ||
|
|
||
| ## Windows | ||
|
|
||
| ### LM Studio Installation | ||
|
|
||
| LM Studio should be pre-installed: | ||
|
|
||
| | Component | Version | Location | | ||
| |-----------|---------|----------| | ||
| **LM Studio (Models + Misc)** | v0.4.0 | `C:\Users\...\.lmstudio` | | ||
| | **LM Studio (Program)** | v0.4.0 | `C:\Program Files\LM Studio` | | ||
| | **LM Studio (Cache)** | v0.4.0 | `C:\Users\...\AppData\Roaming\LM Studio` | | ||
|
|
||
| ### Model Download | ||
|
|
||
| The following models should already be present in the LM Studio models directory (`C:\Users\...\.lmstudio\models`): | ||
|
|
||
| | Model Type | Quantization | Size | Location | | ||
| |------------|--------------|------|----------| | ||
| | OpenAI GPT-OSS 120B | `MXFP4` | 59 GB | `models\ggml-org` | | ||
|
|
||
|
|
||
| --- | ||
|
|
||
| ## Linux | ||
|
|
||
| ### LM Studio Installation | ||
|
|
||
| See `lmstudio.md` (inside the dependencies folder) for more details. | ||
| ### Model Download | ||
|
|
||
| Same as on Windows. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,12 +1,12 @@ | ||
| { | ||
| "id": "lmstudio-rocm-llms", | ||
| "title": "Running LLMs with LM Studio and ROCm", | ||
| "description": "Set up LM Studio with ROCm acceleration to run large language models on STX Halo™", | ||
| "title": "Running and serving LLMs with LM Studio", | ||
| "description": "Set up LM Studio and LM Studio Server to run and serve large language models on STX Halo™", | ||
| "time": 30, | ||
| "platforms": ["windows", "linux"], | ||
| "difficulty": "beginner", | ||
| "isNew": false, | ||
| "isFeatured": false, | ||
| "published": true, | ||
| "tags": ["lm-studio", "rocm", "llm", "inference"] | ||
| "tags": ["lm-studio", "rocm", "vulkan", "llm", "inference"] | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,16 @@ | ||
| ### LM Studio | ||
|
|
||
| <!-- TODO: Add installation instructions --> | ||
| <!-- @os:windows --> | ||
|
|
||
| 1. Download the installer from here: [https://lmstudio.ai/download](https://lmstudio.ai/download) | ||
| 2. Install. | ||
|
|
||
| <!-- @os:end --> | ||
|
|
||
| <!-- @os:linux --> | ||
| 1. Download the AppImage from here: [https://lmstudio.ai/download?os=linux](https://lmstudio.ai/download?os=linux) | ||
| 2. Run `sudo apt install libfuse2` | ||
| 3. Run `cd ~/Downloads` | ||
| 4. Run `chmod +x LM-Studio-*.AppImage` | ||
| 5. Run `./LM-Studio-*.AppImage` | ||
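|
|
||
| The same sequence as a single copy-pasteable sketch (assuming the AppImage was saved to `~/Downloads`; the wildcard matches whatever version you downloaded): | ||
| ```bash | ||
| # AppImages need FUSE 2 to run | ||
| sudo apt install libfuse2 | ||
| cd ~/Downloads | ||
| # Make the AppImage executable, then launch it | ||
| chmod +x LM-Studio-*.AppImage | ||
| ./LM-Studio-*.AppImage | ||
| ``` | ||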
| <!-- @os:end --> |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| ### Downloading GPT-OSS 120B on LM Studio | ||
|
|
||
| To download the GPT-OSS 120B model: | ||
|
|
||
| 1. Press "Ctrl" + "Shift" + "M" on your keyboard or click on the "Discover" tab (Magnifying Glass icon) on the left sidebar | ||
| 2. Search for `ggml-org/gpt-oss-120b-GGUF` | ||
| 3. Select `mxfp4` and click Download | ||
|
|
||
| <p align="center"> | ||
| <img src="/api/dependencies/assets/lmstudio_download.png" alt="LM Studio Download Models" width="600"/> | ||
|
|
||
| LM Studio will automatically download and place the model in the correct directory. | ||
|
|
||
| Should you wish to download additional models, you can search for them in the Discover tab and LM Studio will handle the rest. |
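|
|
||
| If you prefer the terminal, LM Studio also ships a companion CLI, `lms` (a hedged sketch: the CLI must be bootstrapped once from the app, and the exact command names are worth confirming with `lms --help`): | ||
| ```bash | ||
| # Download the same model from the command line | ||
| lms get ggml-org/gpt-oss-120b-GGUF | ||
| # List models already on disk | ||
| lms ls | ||
| ``` | ||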
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,72 +1,44 @@ | ||
| ### STX Halo Memory Configurations | ||
| ### Memory configuration for running large models | ||
|
|
||
| <!-- @os:windows --> | ||
|
|
||
| On Windows, to run larger models that require higher memory, we need to use the AMD Variable Graphics Memory (iGPU VRAM) allocation. | ||
| On Windows, to run larger models that require higher memory, we need to use the AMD Variable Graphics Memory (iGPU VRAM) allocation. Although 64 GB is adequate for most workloads, running the largest models with high context may require 96 GB. | ||
|
|
||
| > 64 GB is adequate for most workloads but if you want to run the largest models with high context, you will need to set it to 96 GB. | ||
|
|
||
| This can be done by opening AMD Software: Adrenalin™ Edition control panel and navigating to: Performance > Tuning > AMD Variable Graphics Memory. Please reboot the system for the changes to take effect. | ||
| This can be done by opening AMD Software: Adrenalin™ Edition control panel and navigating to: `Performance > Tuning > AMD Variable Graphics Memory`. Please reboot the system for the changes to take effect. | ||
|
|
||
| <!-- @os:end --> | ||
|
|
||
| <!-- @os:linux --> | ||
|
|
||
| On Linux, ROCm utilizes a shared system memory pool, and this pool is configured by default to half the system memory. | ||
|
|
||
| This amount can be increased by changing the kernel’s Translation Table Manager (TTM) page setting, with the following instructions. | ||
|
|
||
| 1. If possible, AMD recommends setting the minimum dedicated VRAM in the BIOS (0.5GB) | ||
|
|
||
| 2. Install the pipx utility and add the path for pipx installed wheels into the system search path. | ||
|
|
||
| ```bash | ||
| sudo apt install pipx | ||
| pipx ensurepath | ||
| ``` | ||
|
|
||
| 3. Install the amd-debug-tools wheel from PyPi. | ||
| ```bash | ||
| pipx install amd-debug-tools | ||
| ``` | ||
| This amount can be increased by changing the kernel’s Translation Table Manager (TTM) page setting, with the following instructions. If possible, AMD recommends setting the minimum dedicated VRAM in the BIOS (0.5 GB). | ||
|
|
||
| 4. Run the amd-ttm tool to query the current settings for shared memory. | ||
| ```bash | ||
| amd-ttm | ||
| ``` | ||
| * Install the pipx utility and add the path for pipx installed wheels into the system search path. | ||
|
|
||
| 5. Reconfigure shared memory settings by using the --set argument (units in GB). | ||
| ```bash | ||
| amd-ttm --set <NUM> | ||
| ``` | ||
| ```bash | ||
| sudo apt install pipx | ||
| pipx ensurepath | ||
| ``` | ||
|
|
||
| 6. Reboot the system for changes to take effect. | ||
| * Install the amd-debug-tools wheel from PyPi. | ||
| ```bash | ||
| pipx install amd-debug-tools | ||
| ``` | ||
|
|
||
| * Run the amd-ttm tool to query the current settings for shared memory. | ||
| ```bash | ||
| amd-ttm | ||
| ``` | ||
|
|
||
| #### amd-ttm Usage Examples | ||
| * Reconfigure shared memory settings by using the --set argument (units in GB). | ||
| ```bash | ||
| amd-ttm --set <NUM> | ||
| ``` | ||
|
|
||
| ##### Query effective memory settings in the current kernel | ||
| ```bash | ||
| amd-ttm | ||
| 💻 Current TTM pages limit: 16469033 pages (62.82 GB) | ||
| 💻 Total system memory: 125.65 GB | ||
| ``` | ||
| * Reboot the system for changes to take effect. | ||
|
|
||
| ##### Set usable shared memory | ||
| ```bash | ||
| ❯ amd-ttm --set 100 | ||
| 🐧 Successfully set TTM pages limit to 26214400 pages (100.00 GB) | ||
| 🐧 Configuration written to /etc/modprobe.d/ttm.conf | ||
| ○ NOTE: You need to reboot for changes to take effect. | ||
| Would you like to reboot the system now? (y/n): y | ||
| ``` | ||
|
|
||
| ##### Clear TTM setting and revert to kernel defaults | ||
| ```bash | ||
| ❯ amd-ttm --clear | ||
| 🐧 Configuration /etc/modprobe.d/ttm.conf removed | ||
| Would you like to reboot the system now? (y/n): y | ||
| ``` | ||
| For `amd-ttm` usage examples, see the [ROCm documentation](https://rocm.docs.amd.com/projects/radeon-ryzen/en/docs-7.0.2/docs/install/installryz/native_linux/install-ryzen.html#amd-ttm-usage-examples). | ||
|
|
||
| <!-- @os:end --> |