20 changes: 10 additions & 10 deletions manifest.json
@@ -1,12 +1,12 @@
{
"version": "2",
- "updated_at": "2026-04-22T15:52:03Z",
+ "updated_at": "2026-04-24T09:37:12Z",
"skills": {
"databricks-apps": {
"version": "0.1.1",
"description": "Databricks Apps development and deployment",
"experimental": false,
- "updated_at": "2026-04-14T11:07:19Z",
+ "updated_at": "2026-04-24T09:30:35Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -32,7 +32,7 @@
"version": "0.1.0",
"description": "Core Databricks skill for CLI, auth, and data exploration",
"experimental": false,
- "updated_at": "2026-04-14T11:07:19Z",
+ "updated_at": "2026-04-23T13:47:44Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -47,7 +47,7 @@
"version": "0.0.0",
"description": "Declarative Automation Bundles (DABs) for deploying and managing Databricks resources",
"experimental": false,
- "updated_at": "2026-04-14T11:07:19Z",
+ "updated_at": "2026-04-23T13:47:44Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -65,7 +65,7 @@
"version": "0.1.0",
"description": "Databricks Jobs orchestration and scheduling",
"experimental": false,
- "updated_at": "2026-04-14T11:07:19Z",
+ "updated_at": "2026-04-23T13:47:44Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -75,24 +75,24 @@
},
"databricks-lakebase": {
"version": "0.1.0",
- "description": "Databricks Lakebase database development",
+ "description": "Databricks Lakebase Postgres: projects, scaling, connectivity, synced tables, and Data API",
"experimental": false,
- "updated_at": "2026-04-15T18:00:00Z",
+ "updated_at": "2026-04-24T09:15:50Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
"assets/databricks.png",
"assets/databricks.svg",
"references/computes-and-scaling.md",
"references/connectivity.md",
- "references/reverse-etl.md"
+ "references/synced-tables.md"
]
},
"databricks-model-serving": {
"version": "0.1.0",
"description": "Databricks Model Serving endpoint management",
"experimental": false,
- "updated_at": "2026-04-14T11:07:19Z",
+ "updated_at": "2026-04-23T13:47:44Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -104,7 +104,7 @@
"version": "0.1.0",
"description": "Databricks Pipelines (DLT) for ETL and streaming",
"experimental": false,
- "updated_at": "2026-04-14T11:07:19Z",
+ "updated_at": "2026-04-23T13:47:44Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
2 changes: 1 addition & 1 deletion scripts/skills.py
@@ -29,7 +29,7 @@
"experimental": False,
},
"databricks-lakebase": {
- "description": "Databricks Lakebase database development",
+ "description": "Databricks Lakebase Postgres: projects, scaling, connectivity, synced tables, and Data API",
"experimental": False,
},
"databricks-dabs": {
9 changes: 9 additions & 0 deletions skills/databricks-apps/SKILL.md
@@ -71,6 +71,13 @@ Before writing any SQL, use the parent `databricks-core` skill for data explorat
**Lakebase apps** (`--features lakebase`): No SQL files or typegen. See [Lakebase Guide](references/appkit/lakebase.md) for the tRPC pattern: initialize schema at startup, write procedures in `server/server.ts`, then build the React frontend.

## When to Use What

> **If the user asks for fast, instant, or low-latency reads of lakehouse data — or mentions quick lookups, search by key/ID, feature serving, product catalog, real-time, or operational dashboards:** present two options before proceeding:
> - **(A) Analytics** — precompute aggregates in a SQL query, load once via `useAnalyticsQuery`, and filter client-side. Simpler setup, but it requires a running SQL warehouse and the initial query takes seconds.
> - **(B) Synced tables** — sync a gold table from Delta into Lakebase Postgres for OLTP-speed point lookups. Requires a Lakebase project but gives true low-latency reads without a SQL warehouse. See [Lakebase Guide](references/appkit/lakebase.md).
>
> Let the user choose. If they don't have a strong preference, briefly explain the trade-off.

- **Read analytics data → display in chart/table**: Use visualization components with `queryKey` prop
- **Read analytics data → custom display (KPIs, cards)**: Use `useAnalyticsQuery` hook
- **Read analytics data → need computation before display**: Still use `useAnalyticsQuery`, transform client-side
@@ -80,6 +87,8 @@ Before writing any SQL, use the parent `databricks-core` skill for data explorat
- **⚠️ NEVER use tRPC to run SELECT queries against the warehouse** — always use SQL files in `config/queries/`
- **⚠️ NEVER use `useAnalyticsQuery` for Lakebase data** — it queries the SQL warehouse only

> **Choosing between Analytics and Lakebase for reads:** If the user doesn't mention latency or real-time needs, default to the analytics pattern (simpler setup). If they mention "fast", "instant", "low latency", "quick", "quickly", "search by ID/key", "don't want to wait", "real-time", "point queries", "feature serving", "product catalog", or "no warehouse" — always present both options from the decision gate above before committing to an approach.
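The trigger-phrase check above can be sketched as a small predicate (illustrative only; the phrase list and function name are hypothetical, not part of AppKit):

```typescript
// Illustrative sketch: encodes the low-latency trigger phrases from the
// guidance above. Nothing here is an AppKit API.
const LOW_LATENCY_TRIGGERS = [
  "fast", "instant", "low latency", "quick",
  "search by id", "search by key", "don't want to wait",
  "real-time", "point queries", "feature serving",
  "product catalog", "no warehouse",
];

// True when the user's request mentions any low-latency trigger phrase,
// i.e. when both options from the decision gate should be presented.
function shouldPresentBothOptions(request: string): boolean {
  const text = request.toLowerCase();
  return LOW_LATENCY_TRIGGERS.some((phrase) => text.includes(phrase));
}
```

An agent (or a review checklist) can run this over the user's request to decide when the decision gate applies.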

## Frameworks

### AppKit (Recommended)
73 changes: 72 additions & 1 deletion skills/databricks-apps/references/appkit/lakebase.md
@@ -7,9 +7,11 @@ Use Lakebase when your app needs **persistent read/write storage** — forms, CR
| Pattern | Use Case | Data Source |
|---------|----------|-------------|
| Analytics | Read-only dashboards, charts, KPIs | Databricks SQL Warehouse |
- | Lakebase | CRUD operations, persistent state, forms | PostgreSQL (Lakebase Autoscaling) |
+ | Lakebase | CRUD operations, persistent state, forms, low-latency reads of synced lakehouse data | PostgreSQL (Lakebase Autoscaling) |
| Both | Dashboard with user preferences/saved state | Warehouse + Lakebase |

> **Serving lakehouse data to apps?** If your app needs low-latency reads of Delta/UC tables (entity lookups, product catalogs, feature serving), use **synced tables** to materialize them into Lakebase instead of querying a SQL warehouse (which takes seconds to minutes). See *Reading from Synced Tables* below.

## Scaffolding

**ALWAYS scaffold with the correct feature flags** — do not add Lakebase manually to an analytics-only scaffold.
@@ -165,6 +167,75 @@
const prisma = new PrismaClient({ adapter });
```

## Reading from Synced Tables

Synced tables materialize Delta/UC tables into Lakebase Postgres for low-latency app reads. The lakehouse remains the source of truth; Lakebase serves as a read-optimized index.

**Architecture:**
```
Delta gold tables → Synced tables (read-only) → App reads via pool.query()
App writes → Lakebase OLTP tables → optional Lakehouse Sync → Delta
```

**Use synced tables when** data is curated in Delta, changes relatively slowly, and must be served at OLTP latency:

- **Operational consoles over gold tables** — support portals, sales ops, supply-chain cockpits that need row-level drill-down with fast filters and point lookups on curated Delta tables (tickets, orders, assets, SLAs)
- **User-facing apps on analytical data** — serve product catalogs, personalization attributes, experiment assignments, and pricing from Lakebase instead of hitting the warehouse directly (which takes seconds to minutes)
- **Online feature serving / ML** — sync features or predictions (churn scores, recommendations, risk scores) from lakehouse into Lakebase for real-time inference; app writes feedback/overrides to separate OLTP tables
- **Hybrid read/write patterns** — join app-owned mutable state (tasks, approvals, comments) with read-only synced reference data (customers, products, policies, ML scores) for rich views
- **Postgres-specific capabilities on lakehouse data** — when the app benefits from B-tree/GiST/GIN indexes, JSONB, pgvector, or PostGIS on Delta-derived tables

**Do NOT use synced tables when:**
- OLAP-heavy workload (large scans, aggregations, heavy joins) — use DBSQL + materialized views (seconds-to-minutes latency is acceptable for dashboards)
- Your app aggregates across large synced tables (GROUP BY, JOINs on millions of rows) — pre-aggregate in Delta first, then sync the small result table
- You need to write back to the synced data — writes corrupt sync; use separate Lakebase OLTP tables
- The table is huge with high churn (>1 TB) — sync only small serving/gold tables and keep raw data in Delta
- UC FGAC (row filters, column masks) is critical — synced tables don't propagate UC policies; use DBSQL with user authorization

### How It Works

Synced tables (created via `databricks postgres create-synced-table`) appear as regular Postgres tables. From the app's perspective, use the same `pool.query()` pattern but **read-only**.

**Key differences from CRUD tables:**

| | CRUD Tables | Synced Tables |
|--|-------------|---------------|
| Created by | App SP (via `CREATE TABLE`) | Sync pipeline (DLT) |
| Owned by | SP role | System role (`databricks_writer_*`) |
| Operations | Read + Write | **Read-only** (writes corrupt sync) |
| Schema init | App must `CREATE SCHEMA/TABLE` | Already exists after sync |
| Deploy-first | Required (SP must own schema) | Not required |

**Permission grant required:** The app's SP has `CAN_CONNECT_AND_CREATE` but does **not** have `pg_read_all_data`. To read synced tables, the project owner must grant access:

```sql
-- Run as project owner (databricks_superuser), not as the SP
GRANT USAGE ON SCHEMA public TO "<SP_CLIENT_ID>";
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "<SP_CLIENT_ID>";
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO "<SP_CLIENT_ID>";
```
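If you script this grant step, the three statements can be generated from the SP's client ID; a minimal sketch (the helper name is hypothetical, and the schema defaults to `public`):

```typescript
// Hypothetical helper, not a Databricks API: renders the GRANT statements
// above for a given service principal client ID and schema.
function syncedTableGrants(spClientId: string, schema: string = "public"): string[] {
  const role = `"${spClientId}"`; // role names containing dashes must be quoted in Postgres
  return [
    `GRANT USAGE ON SCHEMA ${schema} TO ${role};`,
    `GRANT SELECT ON ALL TABLES IN SCHEMA ${schema} TO ${role};`,
    `ALTER DEFAULT PRIVILEGES IN SCHEMA ${schema} GRANT SELECT ON TABLES TO ${role};`,
  ];
}
```

Run the output as the project owner (for example via psql), never as the SP itself.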

**Example tRPC route reading synced taxi data (point lookup):**

```typescript
// Assumes `z` is imported from "zod", as in a typical tRPC router
tripsForZip: publicProcedure
  .input(z.object({ zip: z.string() }))
  .query(async ({ input }) => {
    // Indexed point lookup on the read-only synced table. For aggregations,
    // pre-aggregate in Delta and sync the small result table instead.
    const { rows } = await pool.query(
      `SELECT pickup_zip, fare_amount
       FROM public.nyc_trips
       WHERE pickup_zip = $1
       LIMIT 50`,
      [input.zip]
    );
    return rows;
  }),
```

> **Do not write to synced tables.** The sync pipeline manages the data — direct writes corrupt the sync state. For mixed read/write patterns, read from synced tables and write to separate app-owned tables.
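That read/write split can be sketched with the database access injected, so the shape is easy to unit-test (the `Queryable` type, table names, and columns below are hypothetical):

```typescript
// Sketch of the hybrid pattern: reads hit the synced table, writes go to a
// separate app-owned table. `Queryable` stands in for a pg Pool.
type Queryable = {
  query: (text: string, params?: unknown[]) => Promise<{ rows: any[] }>;
};

// Read-only lookup against a synced reference table (hypothetical name)
async function getProduct(db: Queryable, sku: string) {
  const { rows } = await db.query(
    "SELECT sku, name, price FROM public.products_synced WHERE sku = $1",
    [sku],
  );
  return rows[0] ?? null;
}

// Writes land in an app-owned table, never in the synced one
async function saveReview(db: Queryable, sku: string, rating: number) {
  await db.query(
    "INSERT INTO app.product_reviews (sku, rating) VALUES ($1, $2)",
    [sku, rating],
  );
}
```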

To create synced tables, use `databricks postgres create-synced-table`; see the **`databricks-lakebase`** skill's [synced-tables.md](../../../databricks-lakebase/references/synced-tables.md). After the sync completes and the app is deployed, grant the app's SP read access by running the GRANT SQL above as the project owner via psql (connection steps are in the lakebase skill's SKILL.md under Other Workflows).

## Key Differences from Analytics Pattern

| | Analytics | Lakebase |