
Commit 65ef5f5

Pushing tested version
1 parent cb78229 commit 65ef5f5


163 files changed: +201601 -0 lines changed

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# Downloaded questions (can be large)
solved_questions/

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
.venv

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Models and Outputs
lora_model/
ollama_model/
outputs/
unsloth_compiled_cache/
*.gguf
Modelfile
llama.cpp/
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
FROM nvcr.io/nvidia/pytorch:25.10-py3

# Set CUDA environment variables
ENV CUDA_HOME=/usr/local/cuda-13.0
ENV CUDA_PATH=$CUDA_HOME
ENV PATH=$CUDA_HOME/bin:$PATH
ENV LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
ENV C_INCLUDE_PATH=$CUDA_HOME/include:$C_INCLUDE_PATH
ENV CPLUS_INCLUDE_PATH=$CUDA_HOME/include:$CPLUS_INCLUDE_PATH

# Install triton from source for latest blackwell support
RUN git clone https://github.com/triton-lang/triton.git && \
    cd triton && \
    git checkout c5d671f91d90f40900027382f98b17a3e04045f6 && \
    pip install -r python/requirements.txt && \
    pip install . && \
    cd ..

# Install xformers from source for blackwell support
RUN git clone --depth=1 https://github.com/facebookresearch/xformers --recursive && \
    cd xformers && \
    export TORCH_CUDA_ARCH_LIST="12.1" && \
    python setup.py install && \
    cd ..

# Install unsloth and other dependencies
RUN pip install --no-deps bitsandbytes==0.48.0 transformers==4.56.2 trl==0.22.2
RUN pip install unsloth unsloth_zoo

# Launch the shell
CMD ["/bin/bash"]
Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
# NVIDIA Forum Scraper & Fine-tuning Pipeline

Tools to scrape NVIDIA Developer Forum questions, enrich them using a local LLM, and fine-tune GPT-OSS-20B on NVIDIA DGX Spark hardware.

## Features

- **Scraper**: Downloads all questions with complete thread data from the NVIDIA Developer Forum.
- **Dataset Creation**: Enriches raw forum threads using a local LLM to create high-quality Q&A pairs.
- **Fine-tuning**: Scripts and Docker configuration to fine-tune GPT-OSS-20B using Unsloth on DGX Spark.
- **Analysis**: Tools to analyze the downloaded forum data.

## Requirements

- Python 3.8+
- Docker (for fine-tuning)
- Access to a local LLM server (e.g., via LM Studio) for dataset enrichment

## Installation

1. **Create a virtual environment:**

```bash
python3 -m venv .venv
source .venv/bin/activate
```

2. **Install dependencies:**

```bash
pip install -r requirements.txt
```

## Usage

### 1. Scrape Forum Questions

Download questions from the NVIDIA Developer Forum (DGX Spark GB10 category).

```bash
# Basic usage (downloads to all_questions/)
python download_nvidia_forum.py

# Custom output directory
python download_nvidia_forum.py -o my_questions

# Adjust rate limiting (delay in seconds)
python download_nvidia_forum.py -d 2.0
```

### 2. Create & Enrich Dataset

Convert downloaded questions into a ShareGPT-style JSON dataset. This step uses a local LLM to clean and summarize the threads.

1. **Configure LLM Servers:**
Ensure `llm_config.json` is configured with your local LLM endpoints (e.g., LM Studio).

```json
{
  "servers": [
    {
      "url": "http://localhost:1234/v1/chat/completions",
      "model": "gpt-oss-20b",
      "timeout": 300,
      "max_tokens": 3000,
      "temperature": 0.5
    }
  ]
}
```

2. **Run Dataset Creation:**

```bash
python create_dataset.py
```

This will process the JSON files in `all_questions/` and save the enriched dataset to `dataset/nvidia_solved_questions_enriched_llm.json`.
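
For orientation, a ShareGPT-style record is a list of alternating human/assistant turns. The example below is purely illustrative; the question, answer, and exact field contents are not taken from the generated file:

```json
{
  "conversations": [
    {
      "from": "human",
      "value": "How do I check which CUDA version is available on my DGX Spark?"
    },
    {
      "from": "gpt",
      "value": "Run `nvcc --version` inside the container, or check `nvidia-smi` on the host for the driver's supported CUDA version."
    }
  ]
}
```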

### 3. Analyze Data (Optional)

Analyze the downloaded questions using `analyze_questions.py`.

```bash
# Show statistics
python analyze_questions.py -s

# Search questions
python analyze_questions.py -q "GPU"
```

## Fine-tuning on DGX Spark

This section explains how to fine-tune the GPT-OSS-20B model with Unsloth on NVIDIA DGX Spark hardware using the generated dataset.

### 1. Build the Docker Image

Use the provided `Dockerfile.dgx_spark` to build the image:

```bash
docker build -f Dockerfile.dgx_spark -t unsloth-dgx-spark .
```

### 2. Launch the Container

Run the container with GPU access and volume mounts:

```bash
docker run -it \
    --gpus=all \
    --net=host \
    --ipc=host \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    -v $(pwd):$(pwd) \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    -w $(pwd) \
    unsloth-dgx-spark
```

### 3. Run Fine-tuning

Inside the container, run the fine-tuning script:

```bash
python3 finetune_gpt_oss_spark.py
```

This script will:
1. Load the `unsloth/gpt-oss-20b` model.
2. Load the `dataset/nvidia_solved_questions_enriched_llm.json` dataset.
3. Fine-tune the model using LoRA.
4. Save the fine-tuned adapters to `lora_model/`.

## Export to Ollama

After fine-tuning, you can export the model to GGUF format and run it locally using Ollama.

### 1. Export to GGUF

Inside the Docker container (where you ran the fine-tuning), run the export script:

```bash
python3 export_to_ollama.py
```

This will:
1. Merge the LoRA adapters with the base model.
2. Convert the model to GGUF format (quantized to q4_k_m).
3. Generate a `Modelfile`.
4. Save the output (e.g., `gpt-oss-20b.MXFP4.gguf`) in the current directory.
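
As a rough sketch, Unsloth's GGUF export helper can cover the merge-and-quantize step in a couple of calls; the real `export_to_ollama.py` may be structured differently, and the paths and quantization choice below are assumptions:

```python
from unsloth import FastLanguageModel

# Reload the base model together with the LoRA adapters saved in lora_model/.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Merge the adapters and write a q4_k_m-quantized GGUF file
# (Unsloth builds llama.cpp under the hood for the conversion).
model.save_pretrained_gguf("ollama_model", tokenizer, quantization_method="q4_k_m")
```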

### 2. Import to LM Studio

You can also import the GGUF model directly into LM Studio:

```bash
lms import gpt-oss-20b.MXFP4.gguf
```

### 3. Create Ollama Model

Once the GGUF file and Modelfile are generated, you can create the Ollama model either inside the container (if Ollama is installed) or on your host machine.

```bash
./create_ollama_model.sh
```

This script will:
1. Detect the generated GGUF file and Modelfile.
2. Run `ollama create gpt-oss-spark -f Modelfile`.
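
A minimal sketch of that flow, assuming the GGUF file and `Modelfile` sit in the current directory (the real script may add more checks):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Locate the exported GGUF file and make sure a Modelfile is present.
GGUF_FILE=$(find . -maxdepth 1 -name "*.gguf" | head -n 1)
if [[ -z "${GGUF_FILE}" || ! -f Modelfile ]]; then
    echo "GGUF file or Modelfile not found; run export_to_ollama.py first." >&2
    exit 1
fi

# Register the model with Ollama under the name used in the next step.
ollama create gpt-oss-spark -f Modelfile
```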

### 4. Run the Model

You can now chat with your fine-tuned model:

```bash
ollama run gpt-oss-spark
```

## Cleanup

To remove build artifacts, generated models, and temporary files (preserving datasets), run:

```bash
./cleanup.sh
```
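
As a rough guide to what "build artifacts and generated models" covers, the paths below mirror the entries in `.gitignore`; the actual `cleanup.sh` may differ:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Remove model outputs and build artifacts; dataset/ and all_questions/ are kept.
rm -rf lora_model/ ollama_model/ outputs/ unsloth_compiled_cache/ llama.cpp/
rm -f ./*.gguf Modelfile
```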

## License

This project is provided for educational and research purposes. Please respect NVIDIA's terms of service.
