Commit 50d12de

Merge branch 'main' into n8n-playbook
2 parents 272e061 + 2c99c08 commit 50d12de

10 files changed
Lines changed: 694 additions & 15 deletions

Lines changed: 132 additions & 2 deletions

## Overview

Want to run powerful AI language models on your own STX Halo™? This guide shows you how.
This tutorial uses PyTorch with AMD's ROCm software to run models that can summarize documents, answer questions, generate text, and more, all running locally.

## What You'll Learn

- Run LLMs like gpt-oss-20b and Mistral-7B-Instruct locally using PyTorch and ROCm
- Create a document summarization tool using LLMs

## Setting Up Your Environment

### Create a Virtual Environment

<!-- @os:windows -->
On Windows, open Command Prompt and run:
```cmd
python -m venv llm-env
llm-env\Scripts\activate.bat
```
<!-- @os:end -->

<!-- @os:linux -->
```bash
sudo apt update
sudo apt install -y python3-venv
python3 -m venv llm-env
source llm-env/bin/activate
```
<!-- @os:end -->

### Installing Basic Dependencies
<!-- @require:pytorch -->

### Additional Dependencies

```bash
pip install transformers accelerate sentencepiece protobuf
```
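
Before moving on, it's worth confirming that PyTorch can actually see your GPU. On ROCm builds, PyTorch reports the AMD GPU through the familiar `torch.cuda` API, so a quick sanity check looks like this:

```python
import torch

# ROCm builds of PyTorch expose the AMD GPU through the torch.cuda API
print(f"GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device: {torch.cuda.get_device_name(0)}")
```

If this prints `False`, revisit the PyTorch/ROCm installation step before continuing.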

## Quick Start with Example Scripts

This playbook includes ready-to-use scripts in the `assets/` folder (click to preview):

| Script | Description | Usage |
|--------|-------------|-------|
| [run_llm.py](assets/run_llm.py) | Basic LLM text generation | `python run_llm.py` |
| [summarizer.py](assets/summarizer.py) | Document summarizer with Harmony support | `python summarizer.py --file document.txt` |

Both scripts support:
- Model selection: `--model gptoss` (default) or `--model mistral`
- Chat template formatting for proper model prompting, which is especially useful for document summarization

## Loading and Running Your First LLM

The included [run_llm.py](assets/run_llm.py) script shows how to load and generate text with LLMs using PyTorch and AMD ROCm. On the first run, model weights are automatically downloaded.

Take a look at how prompts are tokenized and sent to the model. Understanding this process lets you adapt LLMs for any text generation or summarization task. Here's a minimal example from the script:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half-precision weights to fit in GPU memory
    device_map="auto"            # place the model on the ROCm GPU automatically
)
```
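
Generation follows the same pattern used in the full script: tokenize the prompt, call `model.generate`, and decode the result. Condensed from run_llm.py:

```python
prompt = "Explain what a large language model is in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,  # cap the length of the generated text
    temperature=0.7,     # moderate randomness
    do_sample=True,
    top_p=0.9,           # nucleus sampling
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```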

To try it out:

```bash
python run_llm.py
```

## Building a Document Summarizer

Build on your LLM setup by turning it into a practical document summarizer. In this section, you will use the [summarizer.py](assets/summarizer.py) script to feed in a .txt file and automatically generate a concise summary, all running locally on your GPU.

The script is designed to work out of the box: point it at a text file, pick a model, and it returns a clear 2–3 sentence overview. As you explore the code, you can customize prompts, tweak parameters like length and temperature, and see how different models behave.
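
The summarizer's internals aren't reproduced in this commit, but the core idea is the chat-template formatting mentioned above. Here is a minimal sketch of how such a summarizer might build its prompt with Hugging Face's `apply_chat_template`, reusing the tokenizer and model loaded earlier; the helper name and prompt wording are illustrative, not the script's exact code:

```python
def summarize(text: str, tokenizer, model, max_new_tokens: int = 150) -> str:
    # Wrap the document in a chat-style instruction so instruction-tuned
    # models (gpt-oss, Mistral-Instruct) receive a properly formatted prompt
    messages = [
        {"role": "user",
         "content": f"Summarize the following document in 2-3 sentences:\n\n{text}"},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=0.3,  # low temperature keeps summaries focused
        do_sample=True,
    )
    # Decode only the newly generated tokens, skipping the echoed prompt
    return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
```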

### Usage Examples

```bash
# Summarize the default document
python summarizer.py

# Summarize a text file
python summarizer.py --file example_document.txt

# Adjust creativity with temperature
python summarizer.py --file document.txt --temperature 0.5

# Try a different model
python summarizer.py --file document.txt --model mistral

# Longer summaries with more tokens
python summarizer.py --file document.txt --max-length 200
```
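
Flags like these typically map to a small argparse interface. A hypothetical sketch follows; the flag names match the examples above, but the defaults and help strings are assumptions, not read from summarizer.py:

```python
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Summarize a text file with a local LLM")
    parser.add_argument("--file", default=None, help="Path to a .txt file to summarize")
    parser.add_argument("--model", choices=["gptoss", "mistral"], default="gptoss",
                        help="Which model to load")
    parser.add_argument("--temperature", type=float, default=0.3,
                        help="Sampling temperature (lower = more focused)")
    parser.add_argument("--max-length", type=int, default=150, dest="max_length",
                        help="Maximum number of new tokens in the summary")
    return parser.parse_args()
```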

## Generation Parameters

| Parameter | What It Controls | Typical Values |
|-----------|------------------|----------------|
| `max_new_tokens` | Length of output | 50–500 for summaries |
| `temperature` | Randomness/creativity | 0.2–0.3 for summaries, 0.7–0.9 for creative tasks |
| `top_p` | Nucleus sampling | 0.9 (standard) |

**Temperature Guide**:
- 0.1–0.3: Focused, deterministic (good for summaries)
- 0.5–0.7: Balanced (general use)
- 0.8–1.0: Creative, varied (brainstorming)

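All of these knobs flow into `model.generate`. For example, the same loaded model can act as a focused summarizer or a looser brainstorming aid just by changing the sampling settings (reusing `model` and `inputs` from earlier):

```python
# Focused, near-deterministic output (summaries, extraction)
summary_ids = model.generate(**inputs, max_new_tokens=150,
                             temperature=0.2, top_p=0.9, do_sample=True)

# Looser, more varied output (brainstorming, creative tasks)
ideas_ids = model.generate(**inputs, max_new_tokens=300,
                           temperature=0.9, top_p=0.9, do_sample=True)
```
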
## Real-World Applications

- **Research Paper Analysis**: Extract key findings from complex publications for quick review
- **News Aggregation**: Summarize news articles into brief daily digests or highlights
- **Meeting Notes**: Condense transcripts into actionable items and concise summaries
- **Legal Document Review**: Extract relevant clauses or obligations from long legal texts quickly
- **Code Documentation**: Generate concise repository overviews and function explanations

## Next Steps

- **Fine-tuning**: Adapt models to your specific field or jargon for better accuracy (see the PyTorch Fine-tuning Playbook)
- **RAG Systems**: Combine LLMs with document retrieval for context-aware answers and search
- **Model Exploration**: Experiment with newer models like Llama 3, Phi-3, or Qwen for better results
- **Production Deployment**: Use tools like vLLM or TGI for scalable LLM serving in organizations

Your STX Halo gives you the power to run sophisticated language models locally. Experiment with different models, prompts, and parameters to discover what works best for your applications.

Lines changed: 11 additions & 0 deletions

Large language models (LLMs) are neural networks with billions of parameters trained on massive text datasets. They learn to predict the next word in a sequence, developing an understanding of language patterns, facts, and reasoning. Modern LLMs like GPT-4, Claude, and Llama can perform diverse tasks including translation, question answering, code generation, and creative writing. The key breakthrough was the transformer architecture, which uses attention mechanisms to process sequences in parallel. Training these models requires enormous computational resources, but once trained, they can run on consumer hardware for inference tasks. Recent advances include instruction tuning, where models are fine-tuned to follow user instructions more accurately, and reinforcement learning from human feedback (RLHF), which aligns model outputs with human preferences. The field continues to evolve rapidly with new architectures, training techniques, and applications emerging regularly.
Lines changed: 85 additions & 0 deletions

```python
"""
Basic LLM Loading and Text Generation
======================================

This script demonstrates how to:
- Load a language model with ROCm acceleration
- Generate text from a prompt
- Use different generation parameters

Usage:
    python run_llm.py
"""

import logging
import os
import warnings

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Silence noisy but harmless warnings from transformers and tokenizers
logging.getLogger("transformers").setLevel(logging.ERROR)
warnings.filterwarnings("ignore", category=UserWarning)
os.environ["TOKENIZERS_PARALLELISM"] = "false"


def main():
    # Verify ROCm is available (PyTorch exposes the AMD GPU via the CUDA API)
    print("=" * 10 + " ROCm Configuration " + "=" * 10)
    print(f"ROCm available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"GPU: {torch.cuda.get_device_name(0)}")
        print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print()

    # Load model and tokenizer
    model_name = "openai/gpt-oss-20b"
    # To use Mistral-7B instead of GPT-OSS-20B, uncomment the following line
    # model_name = "mistralai/Mistral-7B-Instruct-v0.3"

    print(f"Loading {model_name}...")
    print("First run will download ~14GB, this may take a few minutes")
    print("For AMD Halo Developer Platforms, the model will be pre-installed.")

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )

    print("✓ Model loaded successfully!\n")

    # Create a simple prompt
    prompt = "Explain what a large language model is in simple terms:"
    print(f"Prompt: {prompt}\n")

    # Tokenize input and move it to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate response
    print("Generating... (this may take 10-30 seconds)")
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        do_sample=True,
        top_p=0.9
    )

    # Decode, then strip the echoed prompt if the model repeated it
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    print()
    print("Model Output:\n")
    response_text = response[len(prompt):].strip() if response.startswith(prompt) else response.strip()
    print(response_text)
    print("\nDone. Try changing the prompt or generation settings for different explanations.")

    # Cleanup GPU memory and exit cleanly
    del model
    del tokenizer
    torch.cuda.empty_cache()


if __name__ == "__main__":
    main()
```
