This project converts natural language questions into syntactically correct MySQL queries using Groq LLaMA-3. It ensures schema-aware, validated SQL generation for business analysts, developers, and data teams.
- 🗣️ Natural language to SQL via Groq API (LLaMA 3)
- 📚 Uses real database schema for accurate generation
- ✅ Validates SQL before execution (SELECT check, UNION mismatch, etc.)
- 🧪 Includes test cases to validate prompt accuracy
- 🔌 Modular structure for extensibility and future UI/API integration
mysql_nl_chatbot/
├── app.py # Main CLI chatbot interface
├── sql_generator.py # Core logic to generate and validate SQL from prompts
├── prompt_builder.py # Builds structured prompts for LLM using schema
├── schema_reader.py # Loads schema info from DB or source
├── schema_formatter.py # Formats schema as readable input for LLM
├── result_formatter.py # Formats and prints SQL output
├── query_runner.py # (Optional) Executes SQL on MySQL DB
├── test_question.py # Automated prompt testing for robustness
├── test_mysql_connection.py # Verifies DB connection setup
├── .env # Stores GROQ_API_KEY
├── .gitignore # Excludes pycache, .env, venv, etc.
├── requirements.txt # All Python dependencies
├── README.md # This file
└── __pycache__/ # Auto-generated Python cache files
git clone https://github.com/darunnatarajan/Chat_with_DB.git
cd mysql_nl_chatbot# Windows
python -m venv venv
venv\Scripts\activate
# macOS/Linux
python3 -m venv venv
source venv/bin/activatepip install -r requirements.txtCreate a .env file:
GROQ_API_KEY=your_actual_groq_api_key_hereMake sure your MySQL database includes the following tables and structure:
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
signup_date DATE
);
CREATE TABLE invoices (
invoice_id INT PRIMARY KEY,
subscription_id INT,
invoice_date DATE,
total_amount DECIMAL(10,2),
tax_amount DECIMAL(10,2)
);
CREATE TABLE payments (
payment_id INT PRIMARY KEY,
invoice_id INT,
payment_date DATE,
amount_paid DECIMAL(10,2),
payment_method VARCHAR(50)
);
CREATE TABLE subscriptions (
subscription_id INT PRIMARY KEY,
customer_id INT,
plan_name VARCHAR(100),
amount DECIMAL(10,2),
start_date DATE,
end_date DATE
);python app.pyYou’ll be prompted:
Ask a question in natural language (or type 'exit' to quit):
Example questions:
- "List all customers who joined after 2023"
- "Show total payments made for each invoice"
- "Get subscriptions that ended this year"
Run tests to validate common prompts and catch Groq output errors:
python test_question.pyThis checks:
- Prompt-to-SQL generation
- Valid SELECT presence
- UNION column count mismatch
- Empty response detection
sql_generator.py: Sends prompt to Groq, cleans + validates SQLprompt_builder.py: Constructs LLM prompt using schematest_question.py: Runs a suite of prompt inputs and checks SQL outputschema_reader.py: (Extensible) Can later pull schema from MySQL DBquery_runner.py: (Optional) Executes SQL usingmysql-connector-pythonresult_formatter.py: Pretty prints or formats query output
- Refine system prompt in
sql_generator.pyfor stricter schema rules - Add more edge case prompts in
test_question.py - Enable JOIN validation logic
- Implement fuzzy table/column matching warning
- ✅ Streamlined schema loading from MySQL
- 🌐 Web interface using Streamlit or FastAPI
- 📈 SQL visualization / preview pane
- 🧩 Plugin architecture for Postgres or SQLite
Built by [Darun Natarajan] – [LinkedIn]
MIT License