Database Awareness - Chat #1679
The schema has column headers in uppercase, so if I ask it to write code for a question like this: How does the location of a customer affect their likelihood to churn? It ends up referencing the columns in pandas in all uppercase, even though the resulting dataframe has lowercase headers. If Snowflake SQLAlchemy always ends up with lowercase dataframe headers (I don't know if this is the case), then we should probably update the schema.
import pandas as pd
from sqlalchemy import create_engine
# Database connection setup
user = "XXX"
password = "XXX"
account = "XXX"
warehouse = "COMPUTE_WH"
database = "TELCO_CHRUN"
schema = "PUBLIC"
conn_str = (
f"snowflake://{user}:{password}@{account}/"
f"{database}/{schema}?warehouse={warehouse}"
)
engine = create_engine(conn_str)
# Query for churn status and location data
query = """
SELECT
l.COUNTRY,
l.STATE,
l.CITY,
s.CHURN_LABEL
FROM LOCATION_DATA l
JOIN STATUS_ANALYSIS s
ON l.CUSTOMER_ID = s.CUSTOMER_ID
"""
df = pd.read_sql(query, engine)
# Analyze churn rate by state (you can modify to COUNTRY or CITY)
churn_by_state = (
df.groupby('STATE')['CHURN_LABEL']
.mean()
.sort_values(ascending=False)
.reset_index()
.rename(columns={'CHURN_LABEL': 'churn_rate'})
)
churn_by_state
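One possible workaround for the casing mismatch described above, as a minimal sketch: normalize the dataframe headers to uppercase right after loading, so code written against the uppercase schema keeps working. The sample dataframe below is a hypothetical stand-in for the result of pd.read_sql, and it assumes CHURN_LABEL holds numeric 0/1 values, which the real table may not.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_sql(query, engine). The lowercase headers
# mimic the behavior reported above; the values are made up for illustration.
df = pd.DataFrame({
    "country": ["US", "US", "US", "US"],
    "state": ["CA", "CA", "NY", "NY"],
    "city": ["LA", "SF", "NYC", "Albany"],
    "churn_label": [1, 0, 1, 1],  # assumption: numeric 0/1 labels
})

# Normalize headers so they match the uppercase names in the schema.
df.columns = df.columns.str.upper()

# Same analysis as in the generated code; it now finds the columns.
churn_by_state = (
    df.groupby("STATE")["CHURN_LABEL"]
    .mean()
    .sort_values(ascending=False)
    .reset_index()
    .rename(columns={"CHURN_LABEL": "churn_rate"})
)
```

The alternative raised in the comment, storing lowercase names in the schema, would avoid the extra step but only holds if the connector lowercases headers consistently.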
I made a somewhat bigger schema (still, I think, a lot smaller than the real schemas our users will have) and the performance is definitely getting worse. I'll send you the schema so you can check it out. I asked Claude to generate 12 questions that I could ask about the data, and here is how it did:
Prompts it responded to with something like 'no data available':
Prompts it asked a targeted question about:
Prompts it identified the correct tables for:
…to db-awareness-chat-update
I'm noticing that the tool struggles to know when to query the database more often than I anticipated based on the evals.
Unless I am very specific, it often tries to import data from a file (even if that file doesn't exist). For example: whenever I say "Import the SP500 dataset", it tries "sp500_df = pd.read_csv('SP500.csv')". But if I say "Import the SP500 stock dataset", then it will choose to import.
Can we turn this into an eval so we can iterate on how we handle this? Maybe we should give it clearer instructions about how to check whether it should query the database or not.
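As a rough starting point for such an eval, here is a minimal sketch. Everything in it is hypothetical: the EvalCase structure, the regex heuristics, and the generate_code callable are illustrations and not part of the existing eval suite. It checks whether the generated code reads through SQL or falls back to a file read.

```python
import re
from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    should_query_db: bool  # expected behavior for this prompt


# Hypothetical heuristics for classifying the generated code.
FILE_READ = re.compile(r"pd\.read_(csv|excel|json|parquet)\s*\(")
DB_READ = re.compile(r"pd\.read_sql(_query|_table)?\s*\(")


def uses_database(generated_code: str) -> bool:
    """True if the code reads via SQL and does not read from a file."""
    has_db = bool(DB_READ.search(generated_code))
    has_file = bool(FILE_READ.search(generated_code))
    return has_db and not has_file


def run_eval(cases, generate_code):
    """Return the prompts whose generated code did not match expectations."""
    failures = []
    for case in cases:
        code = generate_code(case.prompt)
        if uses_database(code) != case.should_query_db:
            failures.append(case.prompt)
    return failures


def fake_generate(prompt: str) -> str:
    # Stub that mimics the behavior reported above. It assumes the more
    # specific prompt leads to a database query; the table name is made up.
    if "stock" in prompt:
        return 'sp500_df = pd.read_sql("SELECT * FROM SP500", engine)'
    return "sp500_df = pd.read_csv('SP500.csv')"


cases = [
    EvalCase("Import the SP500 dataset", should_query_db=True),
    EvalCase("Import the SP500 stock dataset", should_query_db=True),
]
failures = run_eval(cases, fake_generate)
```

With the stub, only the less specific prompt lands in failures. In a real eval the stub would be replaced by the actual code generation call, and string matching could give way to parsing the code if the heuristics prove too brittle.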
Description
The Mito AI is now aware of database connections, and can write SQL queries.
For now this is limited to chat, and the connections must be hardcoded.
Testing
In the .mito/db dir, add the new connections.json and schemas.json files.
Be sure to ask about data that is relevant to a table, but does not exist.
Documentation
N/A - We should add documentation, but after we add the taskpane to add new db connections.