xTuring/README.md at cdd5de924ef9d7d0555e42c51b7b49dc1cee0463 · stochasticai/xTuring

Fine‑tune, evaluate, and run private, personalized LLMs

xTuring makes it simple, fast, and cost‑efficient to fine‑tune open‑source LLMs (e.g., GPT‑OSS, LLaMA/LLaMA 2, Falcon, Qwen3, GPT‑J, GPT‑2, OPT, Bloom, Cerebras, Galactica) on your own data — locally or in your private cloud.

Why xTuring:

Simple API for data prep, training, and inference
Private by default: run locally or in your VPC
Efficient: LoRA and low‑precision (INT8/INT4) to cut costs
Scales from CPU/laptop to multi‑GPU easily
Evaluate models with built‑in metrics (e.g., perplexity)

⚙️ Installation

pip install xturing

Development Installation

If you want to contribute to xTuring or run from source:

# Clone the repository
git clone https://github.com/stochasticai/xturing.git
cd xturing

# Install in editable mode with development dependencies
pip install -e .
pip install -r requirements-dev.txt

# Set up pre-commit hooks (required before contributing)
pre-commit install
pre-commit install --hook-type commit-msg

🚀 Quickstart

Run a small, CPU‑friendly example first:

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load a toy instruction dataset (Alpaca format)
dataset = InstructionDataset("./examples/models/llama/alpaca_data")

# Start small for quick iterations (works on CPU)
model = BaseModel.create("distilgpt2_lora")

# Fine‑tune and then generate
model.finetune(dataset=dataset)
output = model.generate(texts=["Explain quantum computing for beginners."])
print(f"Model output: {output}")
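The quickstart loads an Alpaca-format dataset from disk. If your data instead lives in memory as Alpaca-style records (`instruction`, `input`, `output`), the sketch below shows one way to reshape it into parallel columns. The column names `text` and `target`, and the idea that `InstructionDataset` accepts such a dict, are assumptions to verify against the xTuring documentation; the `records` and `to_columns` names are hypothetical.

```python
from typing import Dict, List

# Hypothetical Alpaca-style records; the field names follow the public Alpaca dataset.
records = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name a prime number.", "input": "", "output": "7"},
]


def to_columns(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Turn a list of Alpaca-style rows into a dict of parallel columns."""
    return {
        "instruction": [row["instruction"] for row in rows],
        "text": [row.get("input", "") for row in rows],  # assumed column name
        "target": [row["output"] for row in rows],       # assumed column name
    }


columns = to_columns(records)
print(columns["instruction"])

# Assumption: with xTuring installed, InstructionDataset(columns) would wrap this dict.
```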

Want bigger models and reasoning controls? Try GPT‑OSS variants (requires significant resources):

from xturing.models import BaseModel

# 120B or 20B variants; also support LoRA/INT8/INT4 configs
model = BaseModel.create("gpt_oss_20b_lora")

You can find the data folder here.

🌟 What's new?

Highlights from recent updates:

GPT‑OSS integration – Use and fine‑tune gpt_oss_120b and gpt_oss_20b with off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 options. Includes configurable reasoning levels and harmony response format support.

from xturing.models import BaseModel

# Use the production-ready 120B model
model = BaseModel.create('gpt_oss_120b_lora')

# Or use the efficient 20B model for faster inference
model = BaseModel.create('gpt_oss_20b_lora')

# Both models support reasoning levels via system prompts

LLaMA 2 integration – Off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 via GenericModel or Llama2.

from xturing.models import Llama2
model = Llama2()

# or
from xturing.models import BaseModel
model = BaseModel.create('llama2')

Evaluation – Evaluate any causal LM on any dataset. Currently supports perplexity.

# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model (try GPT-OSS for advanced reasoning)
model = BaseModel.create('gpt_oss_20b')

# Run the Evaluation of the model on the dataset
result = model.evaluate(dataset)

# Print the result
print(f"Perplexity of the evaluation: {result}")
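For readers unfamiliar with the metric, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below computes it from a list of per-token log-probabilities. It illustrates the standard definition only and is not taken from xTuring's `evaluate()` implementation; the `perplexity` helper and the sample values are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical natural-log probabilities a model assigned to four target tokens.
# A model that gives every token probability 0.25 is as uncertain as a
# uniform choice among four options, so its perplexity is about 4.
uniform_guess = [math.log(0.25)] * 4
print(perplexity(uniform_guess))
```

Lower values mean the model assigns higher probability to the reference text; a perplexity of 1 would mean every token was predicted with certainty.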

INT4 precision – Fine‑tune many LLMs with INT4 using GenericLoraKbitModel.

# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model for INT4 bit fine-tuning
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Run the fine-tuning
model.finetune(dataset)

CPU inference – Run inference on CPUs (including laptops) via Intel® Extension for Transformers, using weight‑only quantization and optimized kernels on Intel platforms.

# Make the necessary imports
from xturing.models import BaseModel

# Initialize the model: quantize it with weight-only algorithms
# and replace the linear layers with ITREX's qbits_linear kernel
model = BaseModel.create("llama2_int8")

# Once the model has been quantized, run inference directly
output = model.generate(texts=["Why are LLMs becoming so important?"])
print(output)
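To give a feel for what weight-only quantization means, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is a generic textbook scheme chosen for illustration, not the algorithm or kernel used by Intel Extension for Transformers; `quantize_int8`, `dequantize`, and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w is approximated by q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [value * scale for value in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Only the weights are stored as integers; activations stay in floating point, which is why the approach is called weight-only.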

Batching – Set batch_size in .generate() and .evaluate() to speed up processing.

# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model for INT4 bit fine-tuning
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Generate outputs on desired prompts
outputs = model.generate(dataset=dataset, batch_size=10)
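Conceptually, `batch_size` controls how many inputs are processed together in one step. The sketch below shows the chunking idea in isolation, assuming a plain list of prompts; it does not reflect how xTuring batches internally, and `batched` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


prompts = [f"prompt {i}" for i in range(25)]
batch_lengths = [len(batch) for batch in batched(prompts, batch_size=10)]
print(batch_lengths)  # 25 prompts in batches of 10 -> [10, 10, 5]
```

Larger batches usually improve throughput until memory runs out, so the best value depends on the model size and the available hardware.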

Qwen3 0.6B supervised fine-tuning – The lightweight Qwen3 0.6B checkpoint now has first-class support (registry, configs, docs, and examples) so you can launch SFT/LoRA jobs immediately.

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

dataset = InstructionDataset("./examples/models/llama/alpaca_data")
model = BaseModel.create("qwen3_0_6b_lora")
model.finetune(dataset=dataset)

See examples/models/qwen3/qwen3_lora_finetune.py for a runnable script.

Qwen3-Omni dataset generation – Run the multimodal checkpoint locally (download from Hugging Face) to bootstrap instruction corpora without leaving your machine.

from xturing.datasets import InstructionDataset
from xturing.model_apis.qwen import Qwen3OmniTextGenerationAPI

# Download `Qwen/Qwen3-Omni-30B-A3B-Instruct` (or another HF variant) ahead of time
engine = Qwen3OmniTextGenerationAPI(model_name_or_path="Qwen/Qwen3-Omni-30B-A3B-Instruct")
dataset = InstructionDataset.generate_dataset("./tasks.jsonl", engine=engine)

To see INT4 fine-tuning applied end to end, see the Llama LoRA INT4 working example.

For more detail, see the GenericModel working example in the repository.

CLI playground

The xturing CLI provides interactive tools for working with fine-tuned models:

# Chat with a fine-tuned model
xturing chat -m "<path-to-model-folder>"

# Launch the UI playground (alternative to programmatic Playground)
xturing ui

# Get help and see all available commands
xturing --help

UI playground

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
from xturing.ui import Playground

dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("<model_name>")

model.finetune(dataset=dataset)

model.save("llama_lora_finetuned")

Playground().launch() ## launches localhost UI

📚 Tutorials

📊 Performance

Here is a comparison of the performance of different fine-tuning techniques on the LLaMA 7B model. We use the Alpaca dataset, which contains 52K instructions, for fine-tuning.

Hardware:

4xA100 40GB GPU, 335GB CPU RAM

Fine-tuning parameters:

{
  'maximum sequence length': 512,
  'batch size': 1,
}

LLaMA-7B	DeepSpeed + CPU Offloading	LoRA + DeepSpeed	LoRA + DeepSpeed + CPU Offloading
GPU	33.5 GB	23.7 GB	21.9 GB
CPU	190 GB	10.2 GB	14.9 GB
Time/epoch	21 hours	20 mins	20 mins

To contribute performance results for other GPUs, create an issue with your hardware specifications, memory consumption, and time per epoch.
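One reason the LoRA columns in the comparison are so much lighter is that LoRA trains two small low-rank matrices instead of the full weight matrix. The arithmetic below is a rough illustration for a single 4096 x 4096 projection (4096 is the LLaMA 7B hidden size); the rank of 8 is an assumed value, not necessarily the setting used for the measurements, and the snippet does not attempt to reproduce the measured numbers.

```python
def full_trainable_params(d_out: int, d_in: int) -> int:
    """Parameters updated when fine-tuning the whole weight matrix."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA adds B (d_out x rank) and A (rank x d_in) and trains only those."""
    return rank * (d_out + d_in)


hidden = 4096  # LLaMA 7B hidden size
rank = 8       # assumed LoRA rank, for illustration only

full = full_trainable_params(hidden, hidden)
lora = lora_trainable_params(hidden, hidden, rank)
print(full, lora, full // lora)  # 16777216 65536 256
```

Fewer trainable parameters means fewer gradients and optimizer states to keep in memory, which is where most of the savings come from.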

📎 Fine‑tuned model checkpoints

We have already fine-tuned some models that you can use as your base or start playing with.

Loading Models

Load from xTuring hub:

from xturing.models import BaseModel
model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")

Load from local directory:

model = BaseModel.load("/path/to/saved/model")

Create a new model for fine-tuning:

model = BaseModel.create("llama_lora")

Available Pre-trained Models

model	dataset	Path
DistilGPT-2 LoRA	alpaca	`x/distilgpt2_lora_finetuned_alpaca`
LLaMA LoRA	alpaca	`x/llama_lora_finetuned_alpaca`

Supported Models

The models supported through xTuring's BaseModel class are listed below, along with the keys used to load them.

Model	Key
Bloom	bloom
Cerebras	cerebras
DistilGPT-2	distilgpt2
Falcon-7B	falcon
Galactica	galactica
GPT-OSS (20B/120B)	gpt_oss_20b, gpt_oss_120b
GPT-J	gptj
GPT-2	gpt2
LLaMA	llama
LLaMA2	llama2
MiniMaxM2	minimax_m2
OPT-1.3B	opt
Qwen3-0.6B	qwen3_0_6b

The above are the base variants. Use these templates for LoRA, INT8, and INT8 + LoRA versions:

Version	Template
LoRA	<model_key>_lora
INT8	<model_key>_int8
INT8 + LoRA	<model_key>_lora_int8
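The templates above can be expressed as a small string helper. This is only a sketch of the naming convention; `variant_key` is a hypothetical function, not part of xTuring, and it does not check whether a given variant is actually registered for a model.

```python
def variant_key(model_key: str, lora: bool = False, int8: bool = False) -> str:
    """Build a key following the <model_key>[_lora][_int8] templates."""
    parts = [model_key]
    if lora:
        parts.append("lora")
    if int8:
        parts.append("int8")
    return "_".join(parts)


for lora, int8 in [(False, False), (True, False), (False, True), (True, True)]:
    print(variant_key("llama2", lora=lora, int8=int8))
# prints: llama2, llama2_lora, llama2_int8, llama2_lora_int8 (one per line)
```

The resulting string is what you would pass to `BaseModel.create()`, for example `BaseModel.create(variant_key("llama2", lora=True))`.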

To load a model’s INT4 + LoRA version, use the GenericLoraKbitModel class:

from xturing.models import GenericLoraKbitModel
model = GenericLoraKbitModel('<model_path>')

Replace <model_path> with a local directory or a Hugging Face model like facebook/opt-1.3b.

📈 Roadmap

🧪 Running Tests

The project uses pytest for testing. Test files are located in the tests/ directory.

Run all tests:

pytest

Run a specific test file:

pytest tests/xturing/models/test_qwen_model.py

Skip slow tests:

pytest -m "not slow"

Skip GPU tests (for CPU-only environments):

pytest -m "not gpu"

Test markers used in this project:

@pytest.mark.slow - Tests that take significant time to run
@pytest.mark.gpu - Tests requiring GPU hardware

🤝 Help and Support

If you have any questions, you can create an issue on this repository.

You can also join our Discord server and start a discussion in the #xturing channel.

🏗️ Project Structure

Understanding the codebase organization:

src/xturing/
├── models/          # Model classes and registry (BaseModel, LLaMA, GPT-2, etc.)
├── engines/         # Low-level model loading, tokenization, and operations
├── datasets/        # Dataset loaders (InstructionDataset, TextDataset)
├── trainers/        # Training loops (LightningTrainer with DeepSpeed support)
├── preprocessors/   # Data preprocessing and tokenization
├── config/          # YAML configurations for finetuning and generation
├── cli/             # CLI commands (chat, ui, api)
├── ui/              # Gradio UI playground
├── self_instruct/   # Dataset generation utilities
└── utils/           # Shared utilities

tests/xturing/       # Test suite mirroring src structure
examples/            # Example scripts organized by model and feature

Key architectural patterns:

Registry Pattern: Models and engines use a registry-based factory pattern via BaseModel.create() and BaseEngine.create()
Model Variants: Each model family has multiple variants following the naming template <base>_[lora]_[int8|kbit]
- Example: llama, llama_lora, llama_int8, llama_lora_int8
Configuration: Training and generation parameters are defined in YAML files per model in src/xturing/config/
Engines: Handle the low-level operations (loading weights, tokenization, DeepSpeed integration)
Models: Provide high-level API (finetune(), generate(), evaluate(), save(), load())
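The registry-based factory pattern mentioned above can be sketched in a few lines. The code below is a generic illustration of the pattern, not xTuring's actual implementation; `ToyRegistry`, `ToyModel`, and `ToyLoraModel` are hypothetical names.

```python
from typing import Callable, Dict, List, Type


class ToyRegistry:
    """Minimal name -> class registry with a create() factory."""

    _registry: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type], Type]:
        """Return a class decorator that records the class under the given key."""
        def decorator(model_cls: Type) -> Type:
            cls._registry[name] = model_cls
            return model_cls
        return decorator

    @classmethod
    def create(cls, name: str, **kwargs):
        """Look up a key and instantiate the class registered under it."""
        if name not in cls._registry:
            known = ", ".join(sorted(cls._registry))
            raise KeyError(f"unknown model key {name!r}; known keys: {known}")
        return cls._registry[name](**kwargs)


@ToyRegistry.register("toy")
class ToyModel:
    def generate(self, texts: List[str]) -> List[str]:
        # Stand-in for real generation: just upper-case the prompts.
        return [text.upper() for text in texts]


@ToyRegistry.register("toy_lora")
class ToyLoraModel(ToyModel):
    """A variant registered under its own key, mirroring the <base>_lora naming."""


model = ToyRegistry.create("toy_lora")
print(type(model).__name__)
```

The benefit of this design is that callers only need a string key, so new model variants can be added without changing the code that creates them.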

📝 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🌎 Contributing

As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our contributing guide to learn how you can get involved.

Quick Contribution Guidelines

Important: All pull requests should target the dev branch, not main.

The project uses pre-commit hooks to enforce code quality:

black - Code formatting
isort - Import sorting (black profile)
autoflake - Remove unused imports
absolufy-imports - Convert relative to absolute imports
gitlint - Commit message linting

You can manually format code:

black src/ tests/
isort src/ tests/

Pre-commit hooks will automatically run these checks when you commit. Make sure to install them:

pre-commit install
pre-commit install --hook-type commit-msg
