OptSkills: Learning Generalizable Optimization Skills from Problem Archetypes via Cluster-Based Distillation
Haochen Yang1,*
·
Ke Zhao1,*
·
Mengyuan Ma1
·
Xingyu Lu3
Xiangfeng Wang1
·
Hong Qian1,2,†
1East China Normal University | 2Shanghai Innovation Institute | 3AntGroup
OptSkills is a skill enhanced agent system for natural-language optimization tasks. It uses a function-calling LLM agent to formulate and solve problems with Python solver backends, while maintaining reusable optimization skills.
Gurobi requires a valid license. See Gurobi Academic Licensing.
- Python 3.12.12
- Conda
#clone this repository
cd OptSkills
#conda create
conda env create -f environment.yml
conda activate optskills
# Linux/macOS
cp .env.example .env
# Windows PowerShell
# Copy-Item .env.example .env| Variable | Meaning | Example |
|---|---|---|
OPTSKILL_BASE_URL |
OpenAI-compatible chat-completions API base URL. | https://api.deepseek.com/v1 |
OPTSKILL_API_KEY |
API key for the chat model. | sk-... |
OPTSKILL_MODEL |
Chat model identifier exposed by the provider. | deepseek-chat |
OPTSKILL_EMBED_BASE_URL |
OpenAI-compatible embeddings API base URL. | https://api.openai.com/v1 |
OPTSKILL_EMBED_API_KEY |
API key for the embedding endpoint. | sk-... |
OPTSKILL_EMBED_MODEL |
Embedding model identifier. | text-embedding-3-large |
OptSkills/
|-- main.py # Command-line entry point
|-- environment.yml # Reproducible Conda environment and solver dependencies
|-- agents/ # Function-calling agent and rollout orchestration
|-- llm/ # Chat and embedding API clients
|-- pipeline/ # Stage runners, dataset loading, outputs, and resume logic
|-- prompts/ # Extractor, solver-agent, and skill prompts
|-- skill_core/ # Skill extraction, clustering, selection, and refinement
|-- tools/ # Tool registration and run_code exposure
|-- utils/ # Logging, parsing, code execution, and skill syntax utilities
|-- lists/
| `-- solvers/ # Solver catalog exposed to the rollout agent
| |-- ortools/
| `-- pyomo/
|-- skill_library/ # Released skill libraries
| |-- skill_library_cluster/
| |-- skill_library_learned/
| `-- skill_library_nanoco_learned/
`-- datasets/
|-- train_set/ # Training datasets, including Nano-CO
`-- benchmark/ # Evaluation benchmarks, including MIPLIB-NL
The following command takes the first 150 instances of optmath-train-300.jsonl as the
clustering subset and saves the split definition into the run state.
python main.py --phase cluster \
--data datasets/train_set/optmath-train-300.jsonl \
--run-dir outputs/optmath_train/cluster \
--cluster-size 150 \
--cluster-eps 0.05 \
--cluster-min-samples 1 \
--cluster-workers 8 \
--analysis-workers 8 \
--builder-workers 4 \
--archetype-fusion-alpha 0.55 \
--top-k 3 \
--timeout 120 \
--agent-max-turns 12 \
--resumeRelevant controls:
| Argument | Meaning |
|---|---|
--cluster-size |
Number of training instances assigned to initial clustering. |
--cluster-eps, --cluster-min-samples |
DBSCAN parameters over archetype embeddings. |
--archetype-fusion-alpha |
Fusion weight used when constructing archetype embeddings. |
--cluster-workers |
Concurrent initial trajectory collection workers. |
--analysis-workers |
Concurrent trajectory analyses within skill distillation. |
--builder-workers |
Concurrent cluster-to-skill draft builders; library writes remain deterministic and serialized. |
When --parent-run-dir is used with the same training JSONL, learning
automatically processes only the items not assigned to the cluster subset.
For the command above, that is the remaining 150 OptMath instances.
python main.py --phase learning \
--data datasets/train_set/optmath-train-300.jsonl \
--run-dir outputs/optmath_learning \
--parent-run-dir outputs/optmath_train/cluster \
--top-k 3 \
--timeout 120 \
--agent-max-turns 12 \
--resumeTo start learning from an arbitrary released library rather than a cluster run,
provide --source-skill-library instead:
python main.py --phase learning \
--data datasets/train_set/optmath-train-300.jsonl \
--run-dir outputs/optmath_learning \
--source-skill-library skill_library/skill_library_cluster \
--resumepython main.py --phase eval \
--data datasets/benchmark/optmath_bench.jsonl \
--run-dir outputs/eval/optmath \
--source-skill-library outputs/optmath_learning/skill_library \
--eval-workers 8 \
--top-k 3 \
--timeout 120 \
--agent-max-turns 12 \
--resumeIf --source-skill-library is omitted, evaluation uses
skill_library/skill_library_learned.
Each --run-dir stores trajectories.jsonl, resume.json, and runtime_logs/.
The cluster and learning phases additionally store their current
skill_library/, while learning saves committed skill states under
checkpoints/. With --resume, learning restores the latest committed
checkpoint before processing remaining samples.
Evaluate MIPLIB-NL using a released library:
python main.py --phase eval \
--miplib-nl-bench \
--data datasets/benchmark/MIPLIB-NL/dataset \
--run-dir outputs/eval/miplib_nl \
--source-skill-library skill_library/skill_library_learned \
--eval-workers 8 \
--resume| Library | Description |
|---|---|
skill_library/skill_library_cluster |
Initial skills distilled from clustered solved trajectories. |
skill_library/skill_library_learned |
Default learned library for standard evaluation. |
skill_library/skill_library_nanoco_learned |
Learned library extended with Nano-CO trajectories. |
OptSkills performs remarkably across 5 benchmarks.
Comparison of the SA metric (Pass@1) across five benchmarks. OptSkills-DeepSeek and OptSkills-Qwen are based on DeepSeek-V3.2 and Qwen3-235B-A22b-instruct-2507, respectively. Bold indicates 1st, italic indicates 2nd, and underline indicates 3rd.
| Category | Models / Methods | Macro-Avg. | Micro-Avg. | Rank | OptiBench | Mamo.Complex | OptMATH | IndustryOR | ComplexOR |
|---|---|---|---|---|---|---|---|---|---|
| General Models | GPT-5.4 | 51.71 | 57.82 | 6 | 69.09 | 47.39 | 46.39 | 29.00 | 66.67 |
| General Models | Gemini-3.1-Pro | 53.58 | 57.88 | 5 | 66.56 | 46.45 | 54.22 | 34.00 | 66.67 |
| General Models | Qwen3-235B | 42.63 | 52.36 | 7 | 63.80 | 41.71 | 39.76 | 29.00 | 38.89 |
| General Models | DeepSeek-V3.2 | 43.21 | 49.45 | 9 | 62.64 | 27.49 | 40.36 | 30.00 | 55.56 |
| Agent-based Methods | Chain-of-Experts | 38.84 | 52.36 | 8 | 62.31 | 51.66 | 35.54 | 28.00 | 16.67 |
| Agent-based Methods | OptiMUS | 34.14 | 45.18 | 13 | 60.99 | 27.49 | 22.89 | 26.00 | 33.33 |
| Agent-based Methods | ORMind | 36.69 | 47.37 | 11 | 61.82 | 34.60 | 21.69 | 32.00 | 33.33 |
| Agent-based Methods | ORThought | 41.92 | 46.91 | 12 | 57.69 | 38.86 | 26.51 | 31.00 | 55.56 |
| Agent-based Methods | LEAN-LLM-OPT | 41.23 | 49.27 | 10 | 62.15 | 43.13 | 22.89 | 28.00 | 50.00 |
| Agent-based Methods | AlphaOPT | 50.92 | 58.55 | 4 | 70.25 | 54.50 | 36.75 | 32.00 | 61.11 |
| Skill-based Methods | Trace2Skill | 56.97 | 63.46 | 2 | 75.21 | 54.03 | 49.40 | 34.00 | 72.22 |
| Skill-based Methods | OptSkills-Qwen | 54.64 | 61.46 | 3 | 71.74 | 53.55 | 54.22 | 27.00 | 66.67 |
| Skill-based Methods | OptSkills-DeepSeek | 62.04 | 68.27 | 1 | 77.02 | 63.51 | 61.45 | 36.00 | 72.22 |
=======
If you find this repository useful in your research, please cite:
@misc{yang2026optskillslearninggeneralizableoptimization,
title={OptSkills: Learning Generalizable Optimization Skills from Problem Archetypes via Cluster-Based Distillation},
author={Haochen Yang and Ke Zhao and Mengyuan Ma and Xingyu Lu and Xiangfeng Wang and Hong Qian},
year={2026},
eprint={2605.29829},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2605.29829},
}