A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving - Experiment Resources 🤖 🔄 ⚙️
This repository contains the companion material for the following publication:
Timo Pierre Schrader, Lukas Lange, Tobias Kaminski, Simon Razniewski, Annemarie Friedrich. A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving. AAAI 2026.
Please cite this paper if using the dataset or the code, and direct any questions regarding the dataset or code to
timo [DOT] schrader [AT] de [DOT] bosch [DOT] com.
This software is a research prototype, solely developed for and published as part of the publication cited above. It will neither be maintained nor monitored in any way.
In our paper, we propose a novel solver-in-the-loop method for automatically generating silver standard training data for answer set programming (ASP) code.
For that, we process a combinatorial problem, formulated in natural language, with an LLM by prompting it to create candidate ASP encodings; the solver in the loop then checks these candidates.
Models trained on the resulting silver standard training data exhibit superior performance on the test split of LogicPuzzles as well as on the harder GridPuzzles dataset, with only minimal prompt engineering effort required.
We propose to also use the solver-in-the-loop setup during inference.
For that, we introduce a problem-specific reward function.
Using this reward function, we obtain a search tree during inference, which we can also use for backtracking if all alternatives were judged negatively according to the reward.
We encourage the interested reader to introduce novel reward functions for other types of problems that can be solved using ASP. 😊
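To make the idea concrete, here is a minimal, self-contained Python sketch of a solver-in-the-loop search, assuming the clingo Python package is installed. It is *not* the implementation shipped in this repository: generate_candidates stands in for the actual LLM call, the reward shown is only a toy (an encoding scores positively iff it yields exactly one answer set), and the depth-bounded backtracking is a simplification of the search tree described in the paper.

```python
# Conceptual sketch of a solver-in-the-loop search (illustrative stand-ins only;
# the candidate generator, toy reward, and search strategy are NOT this repo's code).
import clingo


def generate_candidates(puzzle_text: str) -> list[str]:
    """Stand-in for the LLM call that proposes ASP encodings for a puzzle."""
    return [
        "pick(1) :-",           # syntactically broken encoding
        "1 { pick(1..3) } 1.",  # three answer sets
        "pick(1).",             # exactly one answer set
    ]


def solve_asp(program: str) -> list[str] | None:
    """Ground and solve a candidate encoding; return its answer sets, or None on errors."""
    try:
        ctl = clingo.Control(["0"])  # "0" = enumerate all answer sets
        ctl.add("base", [], program)
        ctl.ground([("base", [])])
        models: list[str] = []
        ctl.solve(on_model=lambda m: models.append(str(m)))
        return models
    except RuntimeError:             # parsing/grounding failed -> broken encoding
        return None


def reward(models: list[str] | None) -> float:
    """Toy problem-specific reward: a logic puzzle should have exactly one solution."""
    if models is None:
        return -1.0                  # broken encoding
    return 1.0 if len(models) == 1 else 0.0


def search(puzzle_text: str, depth: int = 3) -> str | None:
    """Greedily expand the best-scoring candidate; backtrack if all score <= 0."""
    if depth == 0:
        return None
    candidates = generate_candidates(puzzle_text)
    scored = sorted(
        ((reward(solve_asp(c)), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    for score, candidate in scored:
        if score > 0:
            return candidate         # accept this branch of the search tree
    # all alternatives judged negatively -> backtrack and try a fresh expansion
    return search(puzzle_text, depth - 1)


if __name__ == "__main__":
    print(search("Three friends each pick a different number ..."))
```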
Please download the following repositories at the listed commits and place them into 3rd_party_repos (a checkout sketch follows the list):
- gpt-asp-rules: commit 4d3357830175cbf4d70baccf2ff5a9587d74b519
- SAT-LM: commit 3e4eead883f17e6e30d1254fc7bb8ba571fe973b
- GridPuzzle: commit cc1631eff00629680937bd8bfc419eeb53de8652
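A minimal sketch of how to pin the repositories to these commits, assuming each is cloned from its upstream URL (replace the <URL-...> placeholders accordingly) and the directory names match the list above:

```bash
mkdir -p 3rd_party_repos
cd 3rd_party_repos

# Replace each <URL-...> with the corresponding upstream repository URL.
git clone <URL-of-gpt-asp-rules> gpt-asp-rules
git -C gpt-asp-rules checkout 4d3357830175cbf4d70baccf2ff5a9587d74b519

git clone <URL-of-SAT-LM> SAT-LM
git -C SAT-LM checkout 3e4eead883f17e6e30d1254fc7bb8ba571fe973b

git clone <URL-of-GridPuzzle> GridPuzzle
git -C GridPuzzle checkout cc1631eff00629680937bd8bfc419eeb53de8652
```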
We used the following two models for automatically obtaining ASP training data:
- meta-llama/Llama-3.3-70B-Instruct (snapshot: 6f6073b423013f6a7d4d9f39144961bfbfbc386b)
- Qwen/Qwen3-32B (snapshot: 9216db5781bf21249d130ec9da846c4624c16137)
We provide two distinct conda environments: one for all training and inference scripts, and one for the vllm model inference server that our scripts send their requests to.
Set up the Python virtual environment using conda by executing conda env create -f <FILE_NAME>.yml. Additionally, to be able to use flash attention, please execute the following command after creating and activating the conda environment: pip install flash-attn==2.6.3
You might additionally add the root folder of this project to the $PYTHONPATH environment variable; this enables all scripts to automatically find their imports (a combined setup sketch follows below).
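For convenience, the steps above might be combined as follows; <FILE_NAME>.yml, <ENV_NAME>, and the repository path are placeholders to be replaced with your local values:

```bash
# Create and activate the conda environment (repeat for both provided environment files as needed).
conda env create -f <FILE_NAME>.yml
conda activate <ENV_NAME>

# Optional: flash attention support.
pip install flash-attn==2.6.3

# Make the repository root importable from anywhere.
export PYTHONPATH="${PYTHONPATH}:/path/to/this/repo"
```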
IMPORTANT: Especially on GridPuzzles, LLMs can generate broken ASP encodings, which cause the solver to crash and thereby destroy the entire experimental run.
To fix that, we provide a small one-line patch in src/clingo_patch/README.md that keeps the solver from panicking on a bad encoding.
Our code then treats the error reported by clingo as a regular error instead of a crash.
Please refer to the README in order to patch the clingo executable.
As our experiments work in a client-server fashion, it is necessary to configure the (local) IP addresses of the LLM-hosting servers. A template .env file is provided in .env_template.
Please copy this file, rename it to .env and enter all necessary values. In our experiments, we used local vllm deployments as well as Azure-based GPT deployments. The authentication with Azure was done using Kerberos tokens.
We provide a vllm starter script in scripts/start_vllm_server.sh. Please adapt the indicated settings as needed. The resulting URL can then be inserted into the created .env file so that our experimental scripts can find this server.
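As an illustration only (the authoritative settings are in scripts/start_vllm_server.sh), launching an OpenAI-compatible vllm server by hand could look like this; the model, host, port, and tensor parallelism below are example values, not the ones used in our experiments:

```bash
# Example vllm launch; adapt model, port, and GPU parallelism to your hardware.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.3-70B-Instruct \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 4
```

The server then exposes OpenAI-style endpoints under http://<SERVER_IP>:8000/v1; a URL of this form is what goes into the .env file (using whichever variable name .env_template defines).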
We provide code and experiments for multiple fine-tuning and prompt-based approaches.
Runner scripts are available in scripts/ which should work out of the box.
The scripts add the path of the root folder of this repo to the $PYTHONPATH env variable.
If that somehow fails, please manually add this repo to your $PYTHONPATH.
Besides the API-based GPT-4.1 mini (version 2025-04-14), we use the following open-weight LLMs in our experiments:
In order to reproduce the numbers in our paper, we provide the list of all hyperparameters below. We used Nvidia H200 GPUs for LoRA-based fine-tuning and Nvidia GPUs as well as Intel Gaudi 2 accelerator cards to perform inference with vllm.
| | SFT | DPO |
|---|---|---|
| Learning Rate | | |
| Batch Size | | |
| #GPUs (Zero3) | | |
| LoRA Rank | | |
| LoRA | | |
| Epochs | $\leq 5$ | |
The code for evaluating the fine-tuned models as well as for evaluating the out-of-the-box LLMs is located in src/experiments. Every script defines a set of command line arguments that should be provided. Alternatively, you can also use the runner scripts provided in scripts/experiments. Again, if something isn't clear yet, please feel free to ask at any time!
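As a generic illustration (the concrete script and runner names are defined by the files in src/experiments and scripts/experiments; the placeholders below are not actual file names):

```bash
# List the command line arguments that a given experiment script expects ...
python src/experiments/<experiment_script>.py --help

# ... or launch the corresponding runner script instead.
bash scripts/experiments/<runner_script>.sh
```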
This software is open-sourced under the AGPL-3.0 license. See the LICENSE file for details. The ASP preference dataset is released under the CC BY-SA 4.0 license. See the LICENSE file for details. For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.
If you use our software or dataset in your scientific work, please cite our paper:
@misc{schrader2025solverintheloopframeworkimprovingllms,
title={A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving},
author={Timo Pierre Schrader and Lukas Lange and Tobias Kaminski and Simon Razniewski and Annemarie Friedrich},
year={2025},
eprint={2512.17093},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2512.17093},
}

