Skip to content

Latest commit

 

History

History
181 lines (115 loc) · 3.99 KB

File metadata and controls

181 lines (115 loc) · 3.99 KB

AI SQL Data Analyst AgentScreenshot (390)

An AI-powered SQL agent that converts natural language questions into SQL queries and analyzes a relational database.

This project uses LangChain and Google Gemini to build an intelligent data analyst capable of querying the Chinook music database.

Users can ask questions like:

  • "Which artist has the most albums?"
  • "Top 5 customers by spending"
  • "Which genre has the most tracks?"

The AI agent automatically generates SQL, executes the query, and returns the results.


Features

  • Natural Language → SQL query generation
  • Automatic SQL execution
  • Multi-table SQL reasoning
  • AI-powered data insights
  • Interactive command line interface
  • Works with SQLite databases

Tech Stack

  • Python
  • LangChain
  • Google Gemini (gemini-2.5-flash)
  • SQLite
  • Chinook Sample Database

Project Structure

ai-sql-data-analyst-agent
│
├── chinook.db
├── sql_agent.py
├── requirements.txt
└── README.md

Setup

1 Clone Repository

git clone https://github.com/yourusername/ai-sql-data-analyst-agent.git
cd ai-sql-data-analyst-agent

2 Create Virtual Environment

python -m venv venv

Activate environment

Windows

venv\Scripts\activate

Mac/Linux

source venv/bin/activate

3 Install Dependencies

pip install -r requirements.txt

4 Set Gemini API Key

Get an API key from:

https://makersuite.google.com/app/apikey

Windows

set GOOGLE_API_KEY=your_api_key

Mac/Linux

export GOOGLE_API_KEY=your_api_key

Run the SQL Agent

python sql_agent.py

You will see a prompt where you can ask questions about the database.


Example Questions

Try asking:

List all tables in the database
How many albums are there?
How many artists exist?
Which artist has the most albums?
Which genre has the most tracks?
Top 5 customers by spending
Which artist generated the most revenue?
Show the first 10 tracks
Which country has the most customers?
Which album has the most songs?
What is the longest track in the database?

Example Workflow

User Question

Which artist has the most albums?

Generated SQL

SELECT Artist.Name, COUNT(Album.AlbumId)
FROM Artist
JOIN Album
ON Artist.ArtistId = Album.ArtistId
GROUP BY Artist.Name
ORDER BY COUNT(*) DESC
LIMIT 1;

Example Output

Iron Maiden — 21 albums

Dataset

This project uses the Chinook Database, a sample database representing a digital music store.

Tables include:

  • Artist
  • Album
  • Track
  • Customer
  • Invoice
  • Playlist
  • Genre

Repository: https://github.com/lerocha/chinook-database


Future Improvements

  • Streamlit dashboard
  • Data visualization
  • CSV + SQL hybrid AI agent
  • Query optimization tools
  • Multi-database support

Author

AI Agent project demonstrating Natural Language → SQL querying using LLMs (LangChain + Gemini).