A hybrid evaluation pipeline for NL2SQL tasks that leverages both execution accuracy and large language model (LLM) reasoning. Includes ablation studies across encoder-only and decoder-only models and supports fine-grained subcategory analysis.

nl2sql-semantic-eval

A unified repository for evaluating the semantic correctness of SQL queries generated from natural‑language questions.

Combines:

  • Execution Accuracy (EA) — compares predicted vs. gold SQL results
  • CodeBERT — fine‑tuned classifier for SQL correctness
  • Qwen2.5 — LLM‑based reasoning evaluation
  • Ensemble Mode — merges EA & Qwen2.5 decisions
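The exact comparison and merge logic lives in the scripts above; as a rough sketch of the two non-LLM pieces (the function names, the SQLite backend, and the disagreement policy are assumptions for illustration, not the repo's actual implementation):

```python
import sqlite3
from collections import Counter

def execution_accuracy(pred_sql, gold_sql, db_path):
    """Order-insensitive comparison of the predicted and gold result sets."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = Counter(conn.execute(pred_sql).fetchall())
        gold_rows = Counter(conn.execute(gold_sql).fetchall())
    except sqlite3.Error:
        return False  # a query that fails to execute cannot match the gold result
    finally:
        conn.close()
    return pred_rows == gold_rows

def ensemble_verdict(ea_correct, llm_correct):
    """Merge EA with an LLM judgment (e.g. Qwen2.5's correct/incorrect label).

    Hypothetical policy: when the two signals agree, accept the shared label;
    on disagreement, defer to execution accuracy and flag the case for review.
    """
    if ea_correct == llm_correct:
        return ea_correct, "agree"
    return ea_correct, "disagree"
```

Comparing result multisets (rather than ordered lists) makes EA robust to queries that differ only in row order.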

⚙ Installation

  1. Clone the repo

    git clone https://github.com/Harrumnoor/nl2sql-semantic-eval
    cd nl2sql-semantic-eval
  2. Create & activate a virtualenv

    python3 -m venv venv
    source venv/bin/activate      # macOS/Linux
    venv\Scripts\activate       # Windows
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure paths (export as env vars or substitute directly)

    export DATA_DIR=./data
    export TRAIN_PATH=$DATA_DIR/train.json
    export TEST_PATH=$DATA_DIR/test.json

Usage

1. Train CodeBERT (with Optuna)

python src/train_codebert.py \
  --train_path $TRAIN_PATH \
  --model_name microsoft/codebert-base \
  --max_len 512 \
  --batch_size 8 \
  --epochs 10 \
  --trials 10

2. Evaluate with CodeBERT

python src/inference_codebert.py \
  --model_path $CODEBERT_MODEL \
  --data_path $TEST_PATH \
  --batch_size 8

3. Hybrid Evaluation (Qwen2.5 + EA Ensemble)

python src/inference_qwen.py \
  --model_path $QWEN_MODEL \
  --data_path $TEST_PATH
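The project description mentions fine-grained subcategory analysis; one plausible way to aggregate per-record verdicts by the dataset's `sub-category` field looks like this (a sketch only; the function name and inputs are assumptions, not the script's actual interface):

```python
from collections import defaultdict

def subcategory_accuracy(records, verdicts):
    """Accuracy per sub-category.

    records  -- dataset entries, each carrying a "sub-category" field
    verdicts -- parallel list of booleans from the evaluator
    """
    totals = defaultdict(lambda: [0, 0])  # sub-category -> [correct, total]
    for rec, ok in zip(records, verdicts):
        bucket = totals[rec["sub-category"]]
        bucket[0] += int(ok)
        bucket[1] += 1
    return {k: correct / total for k, (correct, total) in totals.items()}
```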

Dataset Format

Each JSON record must include:

{
  "question":   "Natural-language question",
  "query":      "Predicted SQL query",
  "gold_parse": "Gold SQL query",
  "correctness": "0 | 1",
  "db_id":      "Database identifier",
  "category":   "Category code",
  "sub-category": "Sub-category code"
}
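Records missing any of these fields will break downstream evaluation, so it can be worth validating on load. A minimal checker (the loader below assumes train.json/test.json hold a JSON array of such records; function names are illustrative):

```python
import json

REQUIRED_KEYS = {"question", "query", "gold_parse", "correctness",
                 "db_id", "category", "sub-category"}

def missing_keys(record):
    # Return the set of required keys absent from one dataset record.
    return REQUIRED_KEYS - record.keys()

def load_dataset(path):
    # Load a JSON array of records, failing fast on the first malformed one.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for i, rec in enumerate(data):
        absent = missing_keys(rec)
        if absent:
            raise ValueError(f"record {i} is missing keys: {sorted(absent)}")
    return data
```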

Repository Structure

nl2sql-semantic-eval/
├── data/
│   ├── code-bert/
│   │   ├── train.json       # Training and validation set for CodeBERT
│   │   └── test.json        # Test set for CodeBERT inference
│   └── qwen2.5/
│       └── train.jsonl      # Dataset for Qwen2.5 inference and evaluation
│
├── src/
│   ├── train_codebert.py       # Train CodeBERT with Optuna hyperparameter search
│   ├── inference_codebert.py   # Inference and metrics for CodeBERT classifier
│   ├── inference_hybrid.py     # Qwen2.5 + EA ensemble evaluation script
│
├── .gitignore                  # Files and folders to ignore in Git
├── requirements.txt            # Python dependencies
├── LICENSE                     # License information
└── README.md                   # This file

📄 License

MIT License — see LICENSE.
