This repository contains an application that serves as a Data Analytics Assistant. You ask a question in natural language; the assistant translates it into SQL queries through chain-of-thought reasoning, runs them against the database, and answers your question in natural language.
It lets users query and analyze BigQuery and MySQL databases. The application is built with LangChain (SQL Agents), Streamlit, SQLAlchemy and Google Cloud BigQuery.
It is aimed at users who need general, high-level insights from their data.
Some of those questions might be:
- What are the top-performing products in the last quarter?
- Which marketing channels have the highest customer acquisition cost?
- How do sales promotions impact revenue?
Questions that involve several joins and complex reasoning, such as churn, retention, lifetime value or seasonality by region, may cause the model to hallucinate and return imprecise results.
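To make the first example question concrete, here is a minimal sketch of the kind of SQL the assistant might produce for "What are the top-performing products in the last quarter?". Everything specific in it is an assumption made for illustration: the `sales` table, its columns, the sample rows and the quarter boundaries are hypothetical, and an in-memory SQLite database stands in for BigQuery or MySQL so the snippet runs without credentials.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, sold_on TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("widget", "2023-07-15", 120.0),
        ("widget", "2023-08-02", 80.0),
        ("gadget", "2023-09-10", 150.0),
        ("gizmo", "2023-05-01", 999.0),  # outside the quarter, so it is filtered out
    ],
)

# The kind of query an agent might generate for the question above.
TOP_PRODUCTS_SQL = """
    SELECT product, SUM(revenue) AS total_revenue
    FROM sales
    WHERE sold_on >= ? AND sold_on < ?
    GROUP BY product
    ORDER BY total_revenue DESC
    LIMIT ?
"""


def top_products(connection, quarter_start, quarter_end, limit=5):
    """Return (product, total_revenue) rows for the half-open date range."""
    params = (quarter_start, quarter_end, limit)
    return connection.execute(TOP_PRODUCTS_SQL, params).fetchall()


rows = top_products(conn, "2023-07-01", "2023-10-01")
for product, total in rows:
    print(f"{product}: {total}")
```

A single-table aggregation like this is the kind of question the assistant is meant for. The multi-join cases mentioned above are where generated SQL is more likely to go wrong.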
- Data Analytics Assistant : A user-friendly interface for asking questions about your data in natural language.
- Database Support : Compatible with BigQuery and MySQL databases.
- Interactive UI : Built with Streamlit for an interactive user experience.
- LangChain: Open-source framework for building context-aware reasoning applications powered by large language models.
- Streamlit : Free and open-source front-end framework to rapidly build and share beautiful machine learning and data science web apps.
- SQLAlchemy : Python SQL toolkit that helps developers work with databases more easily. It lets you run SQL queries and manage relationships between data from Python code, combining Python's simplicity with SQL's capabilities.
git clone https://github.com/cremerf/ds_llm_sql.git .
DB_USER=""
DB_PASSWORD=""
DB_HOST=""
DB_NAME=""
PROJECT_ID_GCP=""
DATASET_BQ=""
OPENAI_API_KEY=""
Store this file in the root directory of the app
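
As a rough illustration of how these variables could be turned into SQLAlchemy connection URLs, here is a minimal sketch. It is not taken from the app's code. The helper names (`parse_env`, `mysql_url`, `bigquery_url`), the sample values and the choice of the PyMySQL driver are all assumptions. The `bigquery://<project>/<dataset>` form is the URL style used by the `sqlalchemy-bigquery` dialect.

```python
from urllib.parse import quote


def parse_env(text):
    """Parse KEY="value" lines into a dict, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def mysql_url(env, driver="pymysql"):
    # Assumption: the PyMySQL driver; use whichever MySQL driver is installed.
    # Credentials are percent-encoded so characters like "@" or "/" are safe.
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    return f"mysql+{driver}://{user}:{password}@{env['DB_HOST']}/{env['DB_NAME']}"


def bigquery_url(env):
    # URL form of the sqlalchemy-bigquery dialect: bigquery://<project>/<dataset>
    return f"bigquery://{env['PROJECT_ID_GCP']}/{env['DATASET_BQ']}"


# Hypothetical values, for illustration only.
sample = '''
DB_USER="analyst"
DB_PASSWORD="p@ss/word"
DB_HOST="localhost"
DB_NAME="shop"
PROJECT_ID_GCP="my-project"
DATASET_BQ="sales"
'''
env = parse_env(sample)
print(mysql_url(env))
print(bigquery_url(env))
```

In a real setup, a library such as `python-dotenv` would typically load the file, and the resulting URL would be passed to `sqlalchemy.create_engine`.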
- Navigate to the Google Cloud Console.
- Choose the project for which you want to create the service account.
- In the left sidebar, click on "IAM & Admin" and then select "Service accounts."
- Click on "Create Service Account."
- Fill in the required details, such as name and description, and click "Create."
- Assign necessary roles to the service account and click "Continue."
- Add users who can access this service account, if needed.
- Click "Done."
- In the Service accounts list, find the newly created service account.
- Click on the three-dot menu (actions) and select "Manage keys."
- Click on "Add Key" and choose "JSON."
Download the JSON key and store it in the credentials/ folder
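
Before starting the app, it can help to confirm that the downloaded file really is a service account key. The sketch below is a hedged sanity check, not part of the app: the function names and the file name in the usage note are hypothetical. The fields it checks (`type`, `project_id`, `private_key`, `client_email`) are ones that Google Cloud service account JSON keys contain.

```python
import json
from pathlib import Path

# Fields present in a Google Cloud service account JSON key.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key):
    """Return a list of problems in a parsed key dict (empty list means it looks fine)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def check_key_file(path):
    """Load a JSON key file from disk and run the same checks on it."""
    return check_service_account_key(json.loads(Path(path).read_text()))
```

For example, `check_key_file("credentials/my-key.json")` (a hypothetical file name) returns an empty list when the key has the expected shape.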
- Download the Anaconda installer:
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
Run the installer and install it under your user's home directory.
- Create a new environment with Python 3.11:
conda create --name ds_llm_analytics python=3.11
conda activate ds_llm_analytics
pip install -r requirements.txt
By default, the installed dependencies only support BigQuery databases. To work with a MySQL database, create a conda environment from this YAML file: ds_llm_sql_92023.yml
conda env create --file docs/ds_llm_sql_92023.yml
streamlit run app.py
This will launch the Streamlit app in your default web browser. Follow the on-screen instructions to interact with BigQuery or MySQL databases.
Feel free to fork this repository and adapt it to your own use case.