Database Awareness - Chat #1679
base: dev
The schema has column headers in uppercase, so if I ask it to write code for a prompt like this: "How does the location of a customer effect their likelihood to churn?" it ends up referencing the columns in pandas in all uppercase, even though the resulting dataframe has lowercase headers. If Snowflake SQLAlchemy always ends up with lowercase dataframe headers (idk if this is the case), then we should probably update the schema.
import pandas as pd
from sqlalchemy import create_engine
# Database connection setup
user = "XXX"
password = "XXX"
account = "XXX"
warehouse = "COMPUTE_WH"
database = "TELCO_CHRUN"
schema = "PUBLIC"
conn_str = (
    f"snowflake://{user}:{password}@{account}/"
    f"{database}/{schema}?warehouse={warehouse}"
)
engine = create_engine(conn_str)
# Query for churn status and location data
query = """
SELECT
    l.COUNTRY,
    l.STATE,
    l.CITY,
    s.CHURN_LABEL
FROM LOCATION_DATA l
JOIN STATUS_ANALYSIS s
    ON l.CUSTOMER_ID = s.CUSTOMER_ID
"""
df = pd.read_sql(query, engine)
# Analyze churn rate by state (you can modify to COUNTRY or CITY)
churn_by_state = (
    df.groupby('STATE')['CHURN_LABEL']
    .mean()
    .sort_values(ascending=False)
    .reset_index()
    .rename(columns={'CHURN_LABEL': 'churn_rate'})
)
churn_by_state
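One way to sidestep the case mismatch described above is to lowercase the column names in the schema that gets shown to the model. The sketch below assumes a hypothetical schema shape (table name mapped to a list of column names) and a hypothetical helper name, `lowercase_schema`; neither is taken from this PR.

```python
from typing import Dict, List


def lowercase_schema(schema: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of the schema with every column name lowercased.

    Assumes a hypothetical shape: {table_name: [column_name, ...]}.
    Table names are left untouched.
    """
    return {table: [col.lower() for col in cols] for table, cols in schema.items()}


# Hypothetical schema built from the tables and columns in the query above.
schema = {
    "LOCATION_DATA": ["CUSTOMER_ID", "COUNTRY", "STATE", "CITY"],
    "STATUS_ANALYSIS": ["CUSTOMER_ID", "CHURN_LABEL"],
}
normalized = lowercase_schema(schema)
```

On the DataFrame side, running `df.columns = df.columns.str.lower()` right after `pd.read_sql` would make the headers lowercase regardless of what the driver returns.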
APP_DIR_PATH: Final[str] = os.path.join(MITO_FOLDER)

with open(os.path.join(APP_DIR_PATH, 'db', 'connections.json'), 'r') as f:
For this PoC I think this is okay, but instead of hardcoding the username and password in the notebook, we should import them from the config file. Hardcoding credentials in a notebook is obviously bad practice ...
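A minimal sketch of reading the credentials from the config file instead of writing them into the notebook. The layout of `connections.json` (a single flat object with `username`, `password`, `account`, `warehouse`, `database` and `schema` keys) and both helper names are assumptions for illustration; the real file may be structured differently.

```python
import json
from typing import Dict
from urllib.parse import quote_plus


def load_connection(path: str) -> Dict[str, str]:
    """Read connection details from a JSON file (assumed flat key/value layout)."""
    with open(path, "r") as f:
        return json.load(f)


def build_snowflake_url(conn: Dict[str, str]) -> str:
    """Build a Snowflake connection URL, URL-encoding the credentials."""
    return (
        f"snowflake://{quote_plus(conn['username'])}:{quote_plus(conn['password'])}"
        f"@{conn['account']}/{conn['database']}/{conn['schema']}"
        f"?warehouse={conn['warehouse']}"
    )
```

The generated notebook code could then pass `build_snowflake_url(load_connection(path))` to `create_engine`, so the username and password never appear in a cell.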
- Do not use a with statement when creating the SQLAlchemy engine. Instead, initialize it once so it can be reused for multiple queries.
- Always return the results of the query in a pandas DataFrame, unless instructed otherwise.
- Column names in query results may be returned in lowercase. Always refer to columns using their lowercase names in the resulting DataFrame (e.g., df['date'] instead of df['DATE']).
- If you think the requested data is stored in the database, but you are unsure, then ask the user for clarification.
Nice! I've been having it respond with a question pretty often
@@ -37,6 +44,9 @@ def create_chat_system_message_prompt() -> str:

Notice in the example above that the citation uses line number 2 because citation line numbers are 0-indexed.

===
{get_database_rules()}
Putting this in the system prompt so we don't send it over and over again makes sense.
Eventually, it might even make sense to move this to a tool use that the agent and chat are allowed to use, called check_snowflake_schema or something like that, so it can decide when to use it...
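To make the tool idea a bit more concrete, here is a rough sketch of what a `check_snowflake_schema` lookup might do. The signature, the schema shape (table name mapped to column names) and the return format are all hypothetical; nothing like this exists in the PR.

```python
from typing import Dict, List


def check_snowflake_schema(
    schemas: Dict[str, List[str]], keyword: str
) -> Dict[str, List[str]]:
    """Return only the tables whose name or columns mention the keyword.

    Hypothetical helper: lets the model pull the relevant slice of a large
    schema on demand instead of carrying the whole schema in the system prompt.
    """
    needle = keyword.lower()
    return {
        table: cols
        for table, cols in schemas.items()
        if needle in table.lower() or any(needle in col.lower() for col in cols)
    }


# Hypothetical schema for illustration only.
schemas = {
    "LOCATION_DATA": ["CUSTOMER_ID", "COUNTRY", "STATE", "CITY"],
    "STATUS_ANALYSIS": ["CUSTOMER_ID", "CHURN_LABEL"],
}
matches = check_snowflake_schema(schemas, "churn")
```

An empty result would be a natural signal for the model to ask the user for clarification rather than guess at a table.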
I made a somewhat bigger schema (still, I think, a lot smaller than the real schemas our users will have) and the performance is definitely getting worse. I'll send you the schema so you can check it out. I asked Claude to generate 12 questions that I could ask about the data, and here is how it did:
Prompts it responded to with something like 'no data available':
Prompts it asked a targeted question about:
Prompts it identified the correct tables for:
Description
The Mito AI is now aware of database connections, and can write SQL queries.
For now this is limited to chat, and the connections must be hardcoded.
Testing
In the .mito/db dir, add the new connections.json and schemas.json files.
Be sure to ask about data that is relevant to a table, but does not exist.
Documentation
N/A - We should add documentation, but after we add the taskpane to add new db connections.