Guides how to answer data questions against the connected Postgres database. Writes SQL when asked, chooses when a chart helps, and renders results so the user can see what the numbers mean.
- The user asks a question that can only be answered by looking at the database (counts, trends, breakdowns, specific records, distributions).
- The user asks to "show", "graph", "chart", "plot", or "visualize" data.
- The user asks what's in the database or what tables exist.
Do not invoke this skill for questions the user is asking about the codebase, workflow, or anything that doesn't require a database lookup.
| Tool | Purpose |
|---|---|
get_schema |
Returns tables, columns, types, and foreign keys. Call once per session to learn what exists. |
run_query |
Executes a read-only SELECT and returns rows to you only — the user does not see anything from run_query. Use it for probes, schema sniffing, and fetching the rows that will become the final answer. |
present_result |
The only tool that updates the user's view. Pass the SQL, columns, rows, and (when the result has shape) a Vega-Lite spec. Call exactly once per user question, with the final answer. |
list_queries / search_queries / get_query |
Browse or recall previously presented results (auto-generated by Upjack). Useful when the user references an earlier chart. |
set_custom_instructions |
Saves user conventions (default time windows, schema scoping, soft-delete rules, naming quirks) so they persist across turns. Use only when the user explicitly asks; empty clears. Capped at 8 KiB. |
Call get_schema() the first time the user asks a data question in a session. Keep the returned structure in mind for the rest of the conversation — the schema does not change between turns.
If the schema turns out to be too large to reason about in one pass, focus on the tables most likely to be relevant based on the user's question and ask them to clarify if ambiguity remains.
Write the smallest SELECT that answers the question. Prefer named CTEs (WITH x AS (...)) when the query has multiple steps — they are easier for the user to read in the response than nested subqueries.
Always add a LIMIT when the result set could be large and the user is exploring rather than asking for a full export. The server caps results at 10,000 rows regardless — going over that just truncates without warning.
Date values return as ISO strings ("2025-04-20"). Vega-Lite parses these natively as temporal, so no casting is needed on the client.
Quote identifiers with double quotes when they contain mixed case or reserved words ("User", "createdAt"). Postgres folds unquoted identifiers to lowercase.
Any result with more than one row and at least one numeric or temporal column should be accompanied by a chart. Default to visualizing. The user can see the table too — the chart is what makes the shape immediately legible, which is the entire reason they asked.
Only skip the chart when:
- The result is a single row or single scalar (just answer in prose).
- Every column is free-text (names, descriptions, IDs) with no numeric or date dimension.
- The user explicitly asked for a raw list ("list the records", "show me the rows").
Match the result shape to the chart type:
| Shape of result | Chart type | Mark |
|---|---|---|
| One time column + one or more numeric columns | Line chart | {"type": "line", "point": true} |
| One time column + one categorical + one numeric | Multi-series line (color by category) | {"type": "line", "point": true} |
| One categorical (< ~30 unique) + one numeric | Bar chart | "bar" |
| Two categoricals + one numeric | Heatmap | "rect" |
| Two numeric columns (no temporal) | Scatter plot | "point" |
| One numeric column only | Histogram | "bar" with "bin": true on x |
| Stacked/grouped categorical over time | Area or stacked bar | "area" or "bar" |
These are two separate tools. run_query returns rows to you privately — the user sees nothing from it. present_result is the only tool that updates the UI.
- Call
run_query(sql=...)to fetch the rows that answer the question. You can call it multiple times (probes, samples, refinements) without affecting the user's view. - When you have the final rows, call
present_result(sql=..., columns=..., rows=..., vega_spec=..., question=..., summary=...)exactly once. Pass the rows fromrun_queryverbatim. Pass thevega_specwhenever the result has shape (see Step 3). Pass the user's original question inquestion(their words, not your paraphrase).
Never include a data field in the spec — the UI injects the returned rows automatically. Field names in the spec must exactly match column aliases in your SQL.
If you skip present_result, the user sees nothing. run_query alone is invisible.
Time series (line):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": {"type": "line", "point": true},
"encoding": {
"x": {"field": "month", "type": "temporal", "title": "Month"},
"y": {"field": "n", "type": "quantitative", "title": "Count"}
}
}Multi-series time series (line, color by category):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": {"type": "line", "point": true},
"encoding": {
"x": {"field": "month", "type": "temporal"},
"y": {"field": "total", "type": "quantitative"},
"color": {"field": "region", "type": "nominal"}
}
}Categorical breakdown (bar):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": "bar",
"encoding": {
"x": {"field": "department", "type": "nominal", "sort": "-y"},
"y": {"field": "total", "type": "quantitative"}
}
}Two-dim heatmap:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": "rect",
"encoding": {
"x": {"field": "hour", "type": "ordinal"},
"y": {"field": "day_of_week", "type": "ordinal"},
"color": {"field": "count", "type": "quantitative"}
}
}Scatter (correlation):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": "point",
"encoding": {
"x": {"field": "exposure_hours", "type": "quantitative"},
"y": {"field": "incident_count", "type": "quantitative"}
}
}Histogram (distribution of one numeric):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": "bar",
"encoding": {
"x": {"field": "duration_min", "type": "quantitative", "bin": true},
"y": {"aggregate": "count", "type": "quantitative"}
}
}Adapt the field names to match your SQL aliases. Keep the shape and swap the specifics.
After present_result lands, tell the user what you found in one or two sentences. Point out the notable thing in the data — the peak, the outlier, the trend direction — not just "here are the results." The chart shows the shape; your job is to say what matters.
- SELECT-only. The server rejects INSERT, UPDATE, DELETE, DDL, and multi-statement queries. If the user asks to modify data, explain that this tool is read-only.
- 10-second timeout. Queries that exceed it are killed. If a query might be slow (large aggregate over millions of rows), narrow the time window first or ask for a sampling strategy.
- 10,000 row cap. Larger results are silently truncated. For anything intended as a full export, aggregate server-side instead of pulling raw rows.
- Read-only role. The DB connection has
SELECTgrants only. Any attempt to modify fails at the database level even if it slipped through validation.
Never fabricate a query result. If a query fails or returns unexpected data, report what the database actually said rather than guessing.
Never invent column or table names. If the schema doesn't contain what the user is asking about, say so — don't write SQL that references columns you hope exist.
Prefer aggregation over raw row dumps. Users almost never want 10,000 rows in the UI; they want the pattern. Aggregate in SQL (GROUP BY, COUNT, SUM, AVG) rather than returning raw and expecting the chart to summarize.
When the user's question is ambiguous (which date column? which status values count as "active"?), ask before querying. A wrong answer is worse than a clarifying question.