
A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving - Experiment Resources 🤖 🔄 ⚙️

This repository contains the companion material for the following publication:

Timo Pierre Schrader, Lukas Lange, Tobias Kaminski, Simon Razniewski, Annemarie Friedrich. A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving. AAAI 2026.

Please cite this paper if using the dataset or the code, and direct any questions regarding the dataset or code to timo [DOT] schrader [AT] de [DOT] bosch [DOT] com.

Purpose of this Software

This software is a research prototype, solely developed for and published as part of the publication cited above. It will neither be maintained nor monitored in any way.

Methodological Overview 💡

In our paper, we propose a novel solver-in-the-loop method for automatically generating silver-standard training data for answer set programming (ASP) code. Starting from a combinatorial problem formulated in natural language, we prompt an LLM to create $n$ alternative ASP encodings; we do this once each for the problem description, the choice rule, and the constraints/rules. For each of the $n$ alternatives, we then obtain feedback from an external ASP solver, in our case clingo, based on whether the partial ASP encoding produces errors, is unsatisfiable, or produces the actual correct answer. This lets us categorize each generated statement as either chosen or rejected. Furthermore, since ASP is declarative, we can combine partial encodings into longer ones, thereby obtaining more diverse training data.
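As a rough illustration, the following sketch shows how such solver feedback could be obtained with the clingo Python API; the function names and the gold_answer_check helper are our own placeholders, not the repository's exact implementation:

```python
import clingo

def solver_feedback(partial_encoding: str) -> str:
    """Classify a candidate ASP encoding via clingo feedback."""
    ctl = clingo.Control(["0"])  # "0": enumerate all answer sets
    try:
        ctl.add("base", [], partial_encoding)
        ctl.ground([("base", [])])
    except RuntimeError:
        return "error"  # syntax or grounding error in the encoding
    result = ctl.solve()
    return "sat" if result.satisfiable else "unsat"

def categorize(candidates, gold_answer_check):
    """Split generated alternatives into chosen and rejected ones."""
    chosen, rejected = [], []
    for encoding in candidates:
        # A candidate is kept only if it solves without errors and
        # (for complete encodings) reproduces the correct answer.
        if solver_feedback(encoding) == "sat" and gold_answer_check(encoding):
            chosen.append(encoding)
        else:
            rejected.append(encoding)
    return chosen, rejected
```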

Models trained on this silver-standard data exhibit superior performance on the test split of LogicPuzzles as well as on the harder GridPuzzles dataset, requiring only minimal prompt engineering effort.

Test-Time Additions ➕

We propose to also use the solver-in-the-loop setup during inference. For that, we introduce a problem-specific reward function $f_{reward}$ that judges how "promising" a generated alternative is. In our problem setting, we prefer partial encodings that produce as few answer sets as possible, since we want to arrive at a single answer once the complete ASP encoding has been generated, i.e., once the problem description, the choice rule, and all constraints/rules have been processed.
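As a minimal sketch of such a reward, assuming the clingo Python API: count the answer sets of a partial encoding (up to an illustrative cap) and score encodings with fewer answer sets higher. The cap and the exact scoring below are our own assumptions, not the paper's definition:

```python
import clingo

def f_reward(partial_encoding: str, cap: int = 100) -> float:
    """Higher reward for partial encodings with fewer answer sets."""
    ctl = clingo.Control([str(cap)])  # enumerate at most `cap` answer sets
    try:
        ctl.add("base", [], partial_encoding)
        ctl.ground([("base", [])])
    except RuntimeError:
        return float("-inf")  # broken encodings receive the worst reward
    count = 0
    with ctl.solve(yield_=True) as handle:
        for _ in handle:  # iterate over the found answer sets
            count += 1
    if count == 0:
        return float("-inf")  # unsatisfiable: no puzzle solution remains
    return -float(count)  # fewer answer sets -> higher reward
```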

Using this reward function, we obtain a search tree during inference, which also allows backtracking if all alternatives are judged negatively according to $f_{reward}$, even after re-generating another $2 \cdot n$ alternatives.
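The following sketch illustrates the overall search loop under these assumptions; generate_alternatives() stands in for the LLM call, and the threshold-based accept/backtrack logic is our simplified reading of the procedure, not the repository's exact implementation:

```python
def solve_with_backtracking(statements, n, f_reward, generate_alternatives,
                            threshold):
    """Greedy search over partial encodings with one-step backtracking."""
    encoding, stack, i = "", [], 0
    while i < len(statements):
        # First try n alternatives, then re-generate 2*n more if needed.
        candidates = generate_alternatives(statements[i], n)
        best = max(candidates, key=lambda c: f_reward(encoding + c))
        if f_reward(encoding + best) < threshold:
            candidates = generate_alternatives(statements[i], 2 * n)
            best = max(candidates, key=lambda c: f_reward(encoding + c))
        if f_reward(encoding + best) >= threshold:
            stack.append(encoding)  # remember snapshot for backtracking
            encoding += best + "\n"
            i += 1
        elif stack:
            encoding = stack.pop()  # backtrack one node in the search tree
            i -= 1
        else:
            break  # root reached, nothing left to backtrack to
    return encoding
```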

We encourage the interested reader to introduce novel reward functions for other types of problems that can be solved using ASP. 😊

3rd Party Repos and Data

Please clone the following repositories, check out the listed commits, and place them into 3rd_party_repos:

  • gpt-asp-rules: 4d3357830175cbf4d70baccf2ff5a9587d74b519
  • SAT-LM: 3e4eead883f17e6e30d1254fc7bb8ba571fe973b
  • GridPuzzle: cc1631eff00629680937bd8bfc419eeb53de8652

Data Generation Info 🗂️

We used the following two models for automatically obtaining ASP training data:

Running the Code </> 💻

Setting up the Python Environments 🐍

We provide two distinct environments that need to be created separately: one for all training and inference scripts, and one for the vllm model inference server to which our scripts send their calls.

Set up the Python virtual environment using conda by executing conda env create -f <FILE_NAME>.yml. Additionally, to be able to use flash attention, please execute the following command after having created and activated the conda environment: pip install flash-attn==2.6.3

You might additionally want to add the root folder of this project to the $PYTHONPATH environment variable. This enables all scripts to automatically find their imports.
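If modifying $PYTHONPATH is not an option in your setup, the same effect can be approximated at the top of a script; the path below is a placeholder for your local checkout:

```python
import sys

# Placeholder path: point this at the root folder of this repository.
sys.path.insert(0, "/path/to/solver-in-the-loop-llm-asp")
```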

IMPORTANT: Especially on GridPuzzles, it can happen that LLMs generate broken ASP encodings, which can crash the solver entirely and thereby destroy the experimental run. To fix that, we provide a small one-line patch in src/clingo_patch/README.md that prevents the solver from panicking on a bad encoding; our code then processes the error thrown by clingo as a regular error. Please refer to that README in order to patch the clingo executable.

Configuring the Environment Variables

As our experiments work in a client-server fashion, it is necessary to configure the (local) IP addresses of the LLM-hosting servers. A template .env file is provided in .env_template. Please copy this file, rename it to .env, and enter all necessary values. In our experiments, we used local vllm deployments as well as Azure-based GPT deployments; authentication with Azure was done using Kerberos tokens.
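As a minimal sketch of how such a configuration can be consumed, assuming the python-dotenv package and a placeholder variable name (see .env_template for the actual keys):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads KEY=VALUE pairs from .env into the process environment
vllm_base_url = os.environ["VLLM_BASE_URL"]  # placeholder key name
```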

Starting a vllm server

We provide a vllm starter script in scripts/start_vllm_server.sh. Please adapt the indicated settings as needed. The resulting URL can then be inserted into the created .env file so that our experimental scripts can easily find this server.
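Since vllm exposes an OpenAI-compatible endpoint, a started server can be queried, for example, with the openai client; the URL and model name below are placeholders for the values configured in your .env file:

```python
from openai import OpenAI

# Placeholder URL: use the address of your vllm server from .env.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<MODEL_NAME>",  # the model served by start_vllm_server.sh
    messages=[{"role": "user", "content": "Encode this puzzle in ASP."}],
)
print(response.choices[0].message.content)
```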

Experiments

We provide code and experiments for multiple fine-tuning and prompt-based approaches. Runner scripts, which should work out of the box, are available in scripts/. The scripts add the path of the root folder of this repo to the $PYTHONPATH env variable; if that somehow fails, please manually add this repo to your $PYTHONPATH.

Large Language Models

Besides the API-based GPT-4.1 mini (version 2025-04-14), we use the following open-weight LLMs in our experiments:

Reproducibility for Fine-Tuning Approaches

In order to reproduce the numbers reported in our paper, we provide the list of all hyperparameters below. We used Nvidia H200 GPUs for LoRA-based fine-tuning, and Nvidia GPUs as well as Intel Gaudi 2 accelerator cards to perform inference with vllm.

| Hyperparameter | SFT | DPO |
| --- | --- | --- |
| Learning Rate | $5e-05$ | $5e-06$ |
| Batch Size | $4$ / GPU | $1$ / GPU |
| #GPUs (Zero3) | $2 / 4$ | $2 / 4$ |
| LoRA Rank | $128$ | $128$ |
| LoRA $\alpha$ | $128$ | $128$ |
| Epochs | $\leq 10$ | $\leq 5$ |

Evaluation

The code for evaluating the fine-tuned models as well as for evaluating the out-of-the-box LLMs is located in src/experiments. Every script defines a set of command line arguments that should be provided. Alternatively, you can also use the runner scripts provided in scripts/experiments. Again, if something isn't clear yet, please feel free to ask at any time!

License

This software is open-sourced under the AGPL-3.0 license, and the ASP preference dataset is released under the CC BY-SA 4.0 license; see the LICENSE file for details. For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.

Citation

If you use our software or dataset in your scientific work, please cite our paper:

@misc{schrader2025solverintheloopframeworkimprovingllms,
      title={A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving},
      author={Timo Pierre Schrader and Lukas Lange and Tobias Kaminski and Simon Razniewski and Annemarie Friedrich},
      year={2025},
      eprint={2512.17093},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2512.17093},
}
