A unified repository for evaluating the semantic correctness of SQL queries generated from natural‑language questions.
Combines:
- Execution Accuracy (EA) — compares predicted vs. gold SQL results
- CodeBERT — fine‑tuned classifier for SQL correctness
- Qwen2.5 — LLM‑based reasoning evaluation
- Ensemble Mode — merges EA & Qwen2.5 decisions
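Execution Accuracy is the standard execution-based check: run both queries against the same database and compare result sets. The repo's exact implementation isn't shown here, so the following is only a minimal sketch of the idea using an in-memory SQLite toy database (the `singer` table and `execution_accuracy` helper are illustrative, not from the repo):

```python
import sqlite3

def execution_accuracy(pred_sql, gold_sql, conn):
    """Return 1 if both queries execute and yield the same result set, else 0."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
        gold_rows = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return 0  # a query that fails to execute counts as incorrect
    # Compare as sorted multisets so row order does not affect the verdict
    return int(sorted(map(repr, pred_rows)) == sorted(map(repr, gold_rows)))

# Toy database to demonstrate the check
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ava", 30), ("Ben", 25)])

# Different SQL text, same result set -> counted as correct
print(execution_accuracy(
    "SELECT name FROM singer WHERE age > 26",
    "SELECT name FROM singer WHERE age >= 30",
    conn))  # 1
```

Note that comparing result sets can over-credit queries that coincidentally match on one database instance; that is precisely the gap the CodeBERT and Qwen2.5 evaluators are meant to cover.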
- Clone the repo

  git clone https://github.com/Harrumnoor/nl2sql-semantic-eval
  cd nl2sql-semantic-eval
- Create & activate a virtualenv

  python3 -m venv venv
  source venv/bin/activate    # macOS/Linux
  venv\Scripts\activate       # Windows
- Install dependencies

  pip install -r requirements.txt
- Configure paths (export as env vars or substitute directly)

  export DATA_DIR=./data
  export TRAIN_PATH=$DATA_DIR/train.json
  export TEST_PATH=$DATA_DIR/test.json
python src/train_codebert.py \
--train_path $TRAIN_PATH \
--model_name microsoft/codebert-base \
--max_len 512 \
--batch_size 8 \
--epochs 10 \
--trials 10
python src/inference_codebert.py \
--model_path $CODEBERT_MODEL \
--data_path $TEST_PATH \
--batch_size 8
python src/inference_qwen.py \
--model_path $QWEN_MODEL \
--data_path $TEST_PATH
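Ensemble Mode merges the EA and Qwen2.5 verdicts into a single label. The merge rule lives in the repo's ensemble script and is not spelled out here, so below is only a hedged sketch of two common strategies (the function name and `mode` parameter are illustrative assumptions):

```python
def ensemble_decision(ea_label: int, qwen_label: int, mode: str = "and") -> int:
    """Merge an Execution Accuracy verdict and a Qwen2.5 verdict (each 0 or 1).

    mode="and": correct only if both evaluators agree  (favors precision)
    mode="or":  correct if either evaluator says so    (favors recall)
    """
    if mode == "and":
        return int(ea_label == 1 and qwen_label == 1)
    return int(ea_label == 1 or qwen_label == 1)
```

The conservative "and" rule is a natural default when EA can be fooled by coincidental result matches, since the LLM verdict then acts as a second gate.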
Each JSON record must include:
{
"question": "Natural-language question",
"query": "Predicted SQL query",
"gold_parse": "Gold SQL query",
"correctness": "0 | 1",
"db_id": "Database identifier",
"category": "Category code",
"sub-category":"Sub-category code"
}
nl2sql-semantic-eval/
├── data/
│ ├── code-bert/
│ │ ├── train.json # Training and validation set for CodeBERT
│ │ └── test.json # Test set for CodeBERT inference
│ └── qwen2.5/
│ └── train.jsonl # Dataset for Qwen2.5 inference and evaluation
│
├── src/
│ ├── train_codebert.py # Train CodeBERT with Optuna hyperparameter search
│ ├── inference_codebert.py # Inference and metrics for CodeBERT classifier
│ └── inference_hybrid.py # Qwen2.5 + EA ensemble evaluation script
│
├── .gitignore # Files and folders to ignore in Git
├── requirements.txt # Python dependencies
├── LICENSE # License information
└── README.md # This file
MIT License — see LICENSE.