Database Awareness - Chat #1679


Merged · 16 commits into dev · May 13, 2025

Conversation

ngafar
Collaborator

@ngafar ngafar commented Apr 25, 2025

Description

The Mito AI is now aware of database connections, and can write SQL queries.

For now this is limited to chat, and the connections must be hardcoded.

Testing

  1. In the .mito/db dir, add the new connections.json and schemas.json files.
  2. Then start a Jupyter server and ask a question that requires a db connection.

Be sure to also ask about data that sounds relevant to a table but does not actually exist.
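For anyone setting this up: the PR doesn't show the contents of the two config files, so here is a minimal sketch of what hardcoded connections.json and schemas.json entries might look like. The key names and nesting are assumptions (only the warehouse/database/schema values appear in the generated code below); treat this as illustrative, not the actual file format.

```python
import json

# Hypothetical connections.json shape: one named Snowflake connection.
connections = {
    "telco_snowflake": {
        "type": "snowflake",
        "user": "XXX",
        "password": "XXX",
        "account": "XXX",
        "warehouse": "COMPUTE_WH",
        "database": "TELCO_CHRUN",
        "schema": "PUBLIC",
    }
}

# Hypothetical schemas.json shape: table -> column list, per connection.
schemas = {
    "telco_snowflake": {
        "LOCATION_DATA": ["CUSTOMER_ID", "COUNTRY", "STATE", "CITY"],
        "STATUS_ANALYSIS": ["CUSTOMER_ID", "CHURN_LABEL"],
    }
}

# These strings are what would be written to .mito/db/connections.json
# and .mito/db/schemas.json respectively.
connections_json = json.dumps(connections, indent=2)
schemas_json = json.dumps(schemas, indent=2)
```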

Documentation

N/A - We should add documentation, but after we add the taskpane to add new db connections.

vercel bot commented Apr 25, 2025

monorepo preview: ✅ Ready, updated May 13, 2025 5:04pm

@ngafar ngafar requested a review from aarondr77 April 25, 2025 15:40
@ngafar ngafar changed the title from [WIP] Database Awareness - Chat to Database Awareness - Chat Apr 25, 2025
@aarondr77
Member

The schema has column headers in uppercase, so if I ask it to write code for something like: How does the location of a customer affect their likelihood to churn?

It ends up referencing the columns in pandas with all uppercase, even though the resulting dataframe has lowercase headers. If the Snowflake SQLAlchemy dialect always produces lowercase dataframe headers (idk if this is the case), then we should probably update the schema.

import pandas as pd
from sqlalchemy import create_engine

# Database connection setup
user = "XXX"
password = "XXX"
account = "XXX"
warehouse = "COMPUTE_WH"
database = "TELCO_CHRUN"
schema = "PUBLIC"

conn_str = (
    f"snowflake://{user}:{password}@{account}/"
    f"{database}/{schema}?warehouse={warehouse}"
)
engine = create_engine(conn_str)

# Query for churn status and location data
query = """
SELECT 
    l.COUNTRY,
    l.STATE,
    l.CITY,
    s.CHURN_LABEL
FROM LOCATION_DATA l
JOIN STATUS_ANALYSIS s
ON l.CUSTOMER_ID = s.CUSTOMER_ID
"""

df = pd.read_sql(query, engine)

# Analyze churn rate by state (you can modify to COUNTRY or CITY)
churn_by_state = (
    df.groupby('STATE')['CHURN_LABEL']
    .mean()
    .sort_values(ascending=False)
    .reset_index()
    .rename(columns={'CHURN_LABEL': 'churn_rate'})
)
churn_by_state
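One low-touch workaround on the client side (rather than rewriting the schema) would be to normalize the dataframe's headers right after the query, so the generated pandas code and the returned dataframe agree on casing. A sketch, assuming pandas; the example dataframe just stands in for the query result above:

```python
import pandas as pd

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase all column names so generated code that uses
    lowercase identifiers matches what the driver returns."""
    return df.rename(columns=str.lower)

# Stand-in for df = pd.read_sql(query, engine) with uppercase headers:
df = normalize_columns(
    pd.DataFrame({"STATE": ["NY", "NY", "CA"], "CHURN_LABEL": [1, 0, 0]})
)

# Downstream code can now use lowercase names consistently.
churn_by_state = df.groupby("state")["churn_label"].mean()
```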

@aarondr77
Member

I made a bit bigger of a schema (still, I think, a lot smaller than the real schemas our users will have) and the performance is definitely getting worse. I'll send you the schema so you can check it out.

I asked Claude to generate 12 questions that I could ask about the data and here is how it did:

Prompts it responded with something like 'no data available'

  1. Which employee has the highest commission percentage?
  2. How many people are we paying more than 100k per year?
  3. Compare the sales performance of our top 5 products across different regions for the past 12 months.
  4. Which marketing campaigns delivered the highest ROI last year, and what channels were most effective for each?

Prompts it asked a targeted question about:

  1. Which departments have exceeded their annual budget allocations, and by how much? => "Is your budget/spending data stored in the database? If so, should I query the table FINANCE_DB.PUBLIC.tbl_budget_2024? Or is this data in one of your files (e.g., Prospects_16042025_500linesdataset.xlsx/csv), and if so, which one?"
  2. How has our S&P 500 stock portfolio performed compared to the sector averages over the past 3 years? => "What constitutes "your portfolio"? (e.g., do you have a list of stock symbols and weights, or is it an equally weighted portfolio of all S&P 500 stocks?) Should sector averages be equally weighted or weighted by market cap or sector weights?"

Prompts it identified the correct tables for:

  1. Find all of the employees who have worked at the company for more than 1 year
  2. Find the customer in each state who has made the single biggest purchase from our enterprise data
  3. What's the average monthly revenue per customer in our telecom business, broken down by service type?
  4. What's our customer churn rate for the telecom business, and how does it correlate with service plan types and monthly charges?
  5. Show me a breakdown of IoT device performance by location, highlighting any devices with abnormal temperature readings.
  6. Which employees have changed departments in the past year, and what was the impact on their salary?

Member

@aarondr77 aarondr77 left a comment


I'm noticing that the tool is struggling to know when to query the database more often than I anticipated based on the evals. For example:

Unless I am very specific, it often tries to import data from a file (even if that file doesn't exist). For example: whenever I say "Import the SP500 dataset", it tries to run "sp500_df = pd.read_csv('SP500.csv')". But if I say "Import the SP500 stock dataset", then it will choose to query the database.

Can we turn this into an eval so we can iterate on how we handle this? Maybe we should give it clearer instructions about how to check whether it should query the database or not.
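The eval suggested above could start very small: a list of (prompt, expected-route) pairs scored against whatever routing the agent chooses. The sketch below uses a deliberately naive stand-in heuristic (match prompt words against known table names) just to make the eval cases concrete; the real check would call the model, and should_query_database, KNOWN_TABLES, and EVAL_CASES are all hypothetical names.

```python
# Known table names would come from schemas.json in practice;
# this set is illustrative only.
KNOWN_TABLES = {"SP500", "LOCATION_DATA", "STATUS_ANALYSIS"}

def should_query_database(prompt: str) -> bool:
    """Naive stand-in for the agent's routing decision: route to the
    database iff the prompt mentions a known table name."""
    words = {w.strip(".,?!'\"").upper() for w in prompt.split()}
    return bool(words & KNOWN_TABLES)

# (prompt, expected route) pairs; True means "should query the DB".
EVAL_CASES = [
    ("Import the SP500 dataset", True),
    ("Import the SP500 stock dataset", True),
    ("Load my local results.csv file", False),
]

# Score the heuristic: a list of (prompt, passed) results.
results = [(p, should_query_database(p) == want) for p, want in EVAL_CASES]
passed = sum(ok for _, ok in results)
```

Note that both SP500 phrasings route correctly under this heuristic, which is exactly the failure mode the comment describes in the current prompt-driven behavior; an eval like this would catch regressions on it.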

@ngafar ngafar merged commit ab2e1a9 into dev May 13, 2025
6 of 10 checks passed