# SpecFix

SpecFix is a Python toolkit for repairing ambiguous programming problems by combining large language models with differential testing. It detects inconsistencies in task specifications, clusters candidate solutions by behaviour, and iteratively proposes refined requirements that better match the intended solution space.

This repository provides the official implementation and artifact for the ASE 2025 paper *Automated Repair of Ambiguous Problem Descriptions for LLM-Based Code Generation*.
## Table of Contents

- [Features](#features)
- [Repository Layout](#repository-layout)
- [Requirements](#requirements)
- [Installation](#installation)
- [Running SpecFix](#running-specfix)
- [Outputs](#outputs)
- [Datasets](#datasets)
- [Prompt Templates](#prompt-templates)
- [Contributing](#contributing)
## Features

- Detect specification ambiguity by clustering LLM-generated candidate programs by behaviour (a simplified sketch follows this list).
- Repair requirements with automated prompt sequencing and requirement refinement.
- Evaluate repaired specifications using pass@k, majority voting, and behavioural metrics.
- Parallel processing pipeline that scales generation and testing across CPU cores.
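The ambiguity signal comes from behavioural disagreement among sampled candidates. Below is a minimal sketch of the idea, not SpecFix's actual implementation: candidates are grouped by the outputs they produce on a shared set of inputs, and more than one cluster suggests the requirement admits multiple interpretations.

```python
from collections import defaultdict

def cluster_by_behaviour(candidates, test_inputs):
    """Group candidate programs (callables) by their output signature.

    Programs that produce identical outputs on every shared input fall
    into the same behavioural cluster; crashes count as behaviour too.
    """
    clusters = defaultdict(list)
    for program in candidates:
        signature = []
        for x in test_inputs:
            try:
                signature.append(repr(program(x)))
            except Exception as exc:
                signature.append(f"error:{type(exc).__name__}")
        clusters[tuple(signature)].append(program)
    return list(clusters.values())

# More than one cluster means syntactically different candidates disagree
# on observable behaviour, i.e. the requirement is plausibly ambiguous.
```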
## Repository Layout

- `specfix/`: Core library (generation, clustering, evaluation, utilities, prompts).
- `datasets/`: JSONL datasets used in the paper and experiments.
- `requirements.txt`: Python dependencies needed to run SpecFix.
## Requirements

- Python 3.10 or newer.
- Access to one or more supported chat-completion APIs (OpenAI-compatible endpoints).
- System packages required by `evalplus` for executing Python reference solutions.
## Installation

Install Python dependencies with:

```bash
pip install -r requirements.txt
```

Alternatively, clone the repository and install into a fresh virtual environment:

```bash
git clone https://github.com/msv-lab/SpecFix.git
cd SpecFix
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install -r requirements.txt
```

## Running SpecFix

The entry point is `specfix/main.py`, which orchestrates generation, clustering, detection, and repair for every problem in a dataset. Use the module form to ensure the package resources resolve correctly:
```bash
python -m specfix.main \
    -d humaneval \
    -p datasets/humaneval.jsonl \
    -m qwen2.5-coder-7b-instruct
```

| Flag | Required | Description |
|---|---|---|
| `-d, --dataset` | ✓ | Dataset identifier used for bookkeeping (e.g., `humaneval`, `mbpp`, `livecodebench`). |
| `-p, --path` | ✓ | Path to the JSONL file containing the problems to process. The format matches the files in `datasets/`. |
| `-m, --model` | ✓ | Model name understood by the configured OpenAI-compatible endpoint (e.g., `gpt-4o`, `qwen2.5-coder-7b-instruct`). |
| `-t, --temperature` | | Sampling temperature used when generating code (defaults to provider behaviour). |
| `-c, --cluster_sample_size` | | Number of candidate programs generated per requirement during ambiguity detection (default 20). |
| `-e, --evaluation_sample_size` | | Number of test cases generated when evaluating repaired requirements (default 10). |
| `-k, --passk` | | Pass@k threshold used by the evaluator when computing metrics (default 1). See the estimator sketch below the table. |
| `-w, --workers` | | Number of worker processes for parallel execution. Defaults to sequential execution when omitted. |
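For reference, pass@k is conventionally computed with the unbiased estimator from Chen et al. (2021). The snippet below implements that standard formula as an illustration; it is not necessarily the exact code used by SpecFix's evaluator.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Standard unbiased pass@k: probability that at least one of k
    programs drawn without replacement from n samples (c of them
    correct) passes, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k draw must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: with 20 samples of which 5 pass,
# pass@1 = 0.25 and pass@5 ≈ 0.81.
```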
A full invocation with all options might look like:

```bash
python -m specfix.main \
    -d livecodebench \
    -p datasets/livecodebench.jsonl \
    -m deepseek-v3 \
    -t 0.2 \
    -c 24 \
    -e 12 \
    -k 5 \
    -w 8
```

## Outputs

- Results are written to `specfix/Results/<model>/<timestamp>/<dataset>.jsonl`.
- Each entry includes the original requirement, detected clusters, the repaired requirement (if produced), and summary metrics (`passk`, `avg_pass_rate`, majority voting outcome); see the loading sketch after this list.
- Intermediate logs include generated code snippets, test cases, and failure diagnostics to support manual analysis.
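Results files are plain JSONL, so a run can be analysed with a few lines of Python. In this sketch the path pattern and the `avg_pass_rate` metric come from the list above, while `repaired_requirement` is a hypothetical key; inspect an actual results file for the exact schema.

```python
import json
from pathlib import Path

# Substitute a real model/timestamp directory for the placeholders.
results = Path("specfix/Results/<model>/<timestamp>/humaneval.jsonl")
entries = [json.loads(line) for line in results.read_text().splitlines()]

mean_rate = sum(e["avg_pass_rate"] for e in entries) / len(entries)
repaired = sum(1 for e in entries if e.get("repaired_requirement"))  # hypothetical key
print(f"{len(entries)} problems, mean pass rate {mean_rate:.2f}, {repaired} repaired")
```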
## Datasets

- Bundled datasets (`datasets/*.jsonl`) follow the structure documented in `datasets/README.md` and include fields such as `requirement`, `entry_point`, `input_output_examples`, `inputs`, and `outputs`; a quick inspection snippet follows.
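For example, assuming one JSON-encoded problem per line (field names as listed above):

```python
import json

# Peek at the first problem in a bundled dataset.
with open("datasets/humaneval.jsonl") as f:
    problem = json.loads(next(f))

print(problem["entry_point"])
print(problem["requirement"][:200])           # truncated for readability
print(len(problem["inputs"]), "test inputs")
```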
## Prompt Templates

- Prompt templates for generation, testing, and requirement refinement live in `specfix/prompting.py`.
- Modify these templates to experiment with alternative prompting strategies or to port SpecFix to new instruction-tuned models; a hypothetical example follows.
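For orientation, a generation template might look roughly like the sketch below; the variable name, placeholders, and wording here are hypothetical, so consult `specfix/prompting.py` for the real templates before editing.

```python
# Hypothetical shape of a code-generation template; the actual templates
# in specfix/prompting.py may differ in names, placeholders, and phrasing.
GENERATION_PROMPT = """\
You are an expert Python programmer.
Implement the function `{entry_point}` so that it satisfies this requirement:

{requirement}

Respond with only the complete function definition.
"""

prompt = GENERATION_PROMPT.format(
    entry_point="has_close_elements",  # illustrative values only
    requirement="Check whether any two numbers in a list are closer than a given threshold.",
)
```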
## Contributing

Contributions, bug reports, and feature requests are welcome! Please open an issue or submit a pull request.
## Citation

If you use SpecFix or build upon this work, please cite:

> Automated Repair of Ambiguous Problem Descriptions for LLM-Based Code Generation. In *Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025)*.