
Improve Lakebase skill: synced tables, connectivity, and best-practice alignment #56

Draft
pkosiec wants to merge 10 commits into main from pkosiec/lakebase-synced-tables

pkosiec (Member) commented Apr 23, 2026

Summary

  • Synced tables reference: Replace reverse-etl.md with comprehensive synced-tables.md — sync modes, verified CLI syntax, prerequisites, data type mapping, capacity planning, DABs caveat, and a worked example
  • Connectivity: Add Python app connection pattern (Pattern 4), DNS resolution workaround, and JS/TS cross-reference to AppKit
  • Lakebase SKILL.md: Add psql workflow (step-by-step + scriptable), synced table GRANT commands, expanded troubleshooting (CDF, null bytes, DABs, storage_catalog)
  • Apps integration: Add synced tables decision gate in "When to Use What", and "Reading from Synced Tables" section in the apps lakebase reference
  • Best practices from agent testing: Fix psql sslmode syntax, broaden latency trigger keywords, add "sync gold tables not raw" guidance, align GRANT SQL across files

pkosiec force-pushed the pkosiec/lakebase-synced-tables branch 2 times, most recently from 04635d9 to 67bc5f1 (April 24, 2026 09:44)
pkosiec added 10 commits April 24, 2026 17:10
- Add Pattern 4 (Python Databricks App with platform-injected env vars) to connectivity.md
- Clarify connection patterns are Python-specific; JS/AppKit apps get full auto-injection
- Remove duplicate Data API / Reverse ETL pointers already covered in reference docs block
- Fix missing blank lines before section headers in SKILL.md

Co-authored-by: Isaac
- Rename reverse-etl.md to synced-tables.md (official Databricks terminology)
- Replace all `databricks database` commands with `databricks postgres`
  (old Provisioned API → new Autoscaling API)
- Add `new_pipeline_spec` as required field with warning: storage_catalog
  must be a regular UC catalog, not the Lakebase catalog (DLT can't create
  event logs in Postgres-backed schemas)
- Add Prerequisites section for `create-catalog`
- Add complete field reference table
- Add NYC taxi example (end-to-end lifecycle)
- Add synced tables section to apps lakebase.md (read-only pattern,
  permission grants, differences from CRUD tables)
- All commands verified against live Lakebase project

Co-authored-by: Isaac
- Fix broken relative path in appkit/lakebase.md (../../ → ../../../)
- Trim duplicate autoscaling section in SKILL.md (defer to computes-and-scaling.md)
- Restore Data API and Synced Tables links in Other Workflows section
- Add "Previously known as Reverse ETL" to synced-tables.md for discoverability

Co-authored-by: Isaac
…sync troubleshooting

- Expand SKILL.md description with trigger words (synced tables, reverse ETL,
  Data API, scale-to-zero, OAuth, connection pooling) for better skill activation
- Switch to third-person voice per create-skill best practices
- Replace feature status table with bulleted Capabilities section
- Add 3 synced-table troubleshooting entries (storage_catalog, CDF, permissions)
- Update manifest description to match

Co-authored-by: Isaac
- Add DNS resolution (macOS) workaround to connectivity.md (was referenced from SKILL.md but missing)
- Rewrite synced table read example as tRPC route in appkit/lakebase.md (was Express-style, contradicting tRPC guidance)
- Deduplicate create-catalog command in synced-tables.md NYC taxi example
- Remove redundant storage_catalog warning callout (already in field table)
- Clarify that delete-synced-table leaves the Postgres table behind

Co-authored-by: Isaac
Add use-case guidance to help AI assistants discover synced tables for
the right patterns: operational consoles, user-facing apps on analytical
data, online feature serving, hybrid read/write, and Postgres-specific
capabilities. Include anti-patterns (OLAP, writes, huge tables, FGAC).

- Broaden Lakebase row in decision table to mention synced data
- Add callout below table pointing to synced tables section
- Add architecture pattern, use cases, and anti-patterns

Co-authored-by: Isaac
Add a bullet for "Read lakehouse data with low latency" that points
agents to the Lakebase guide when apps need fast point lookups. Add
guidance note for agents to ask about latency requirements rather than
defaulting to DBSQL for all read-from-UC use cases.

Co-authored-by: Isaac
Agent testing showed the "When to Use What" section routes to analytics
by default, even when the user explicitly asks for "fast" or "instant"
reads. The synced tables bullet was buried after three analytics bullets
that matched first.

- Add decision gate at top of section: if user mentions speed/latency,
  present both analytics and synced tables options before proceeding
- Rewrite guidance note to reference the gate instead of defaulting
- Remove redundant standalone synced tables bullet (covered by gate)

Co-authored-by: Isaac
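
The decision gate described in this commit could be sketched roughly as follows. This is only an illustration of the routing idea, not the skill's actual text: the trigger list is an assumption apart from "fast" and "instant" (which the commit message names), and `needs_latency_gate` and `route` are hypothetical helper names.

```python
import re

# "fast" and "instant" come from the commit message above; the remaining
# trigger words are illustrative assumptions, not the skill's real list.
LATENCY_TRIGGERS = (
    "fast",
    "instant",
    "low latency",
    "low-latency",
    "real-time",
    "millisecond",
)


def needs_latency_gate(request: str) -> bool:
    """Return True if the request mentions speed or latency."""
    text = request.lower()
    # \b keeps "fast" from matching inside words such as "breakfast".
    return any(re.search(rf"\b{re.escape(t)}", text) for t in LATENCY_TRIGGERS)


def route(request: str) -> list[str]:
    """List the options to present before proceeding (hypothetical labels)."""
    if needs_latency_gate(request):
        # Gate fires: present both options instead of defaulting to analytics.
        return ["analytics", "synced tables"]
    return ["analytics"]
```

Running the gate first, before any analytics-specific matching, is what keeps the synced tables option from being shadowed by earlier bullets.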
…idance

From agent testing:
- Add psql connection workflow (generate-credential + psql) for running
  GRANT, CREATE INDEX, etc. against Lakebase
- Add post-deploy GRANT workflow for app SP access to synced tables
- Add source table guidance: don't run ad-hoc CREATE TABLE AS SELECT
- Add DABs caveat: `synced_database_tables` maps to the Provisioned API, not
  Autoscaling. DABs support for `postgres_synced_tables` is not yet available.
- Add creation + GRANT note to apps lakebase.md

Co-authored-by: Isaac
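
As a rough sketch of what a post-deploy GRANT workflow for an app service principal might generate, the helper below builds read-only grants for one schema. The statements are standard Postgres SQL; the schema name, the role name, and the function names are hypothetical placeholders, and the exact SQL in SKILL.md may differ.

```python
def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def synced_table_grants(schema: str, role: str) -> list[str]:
    """Build read-only grants on a schema for an app's Postgres role.

    `schema` and `role` are placeholders supplied by the caller; the role
    is assumed to be the Postgres role of the app's service principal.
    """
    s, r = quote_ident(schema), quote_ident(role)
    return [
        f"GRANT USAGE ON SCHEMA {s} TO {r};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {r};",
        # Covers tables created later by the role that runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} GRANT SELECT ON TABLES TO {r};",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print("\n".join(synced_table_grants("public", "my-app-sp")))
```

Note that `ALTER DEFAULT PRIVILEGES` without a `FOR ROLE` clause only affects objects later created by the role executing it, so tables created by a different role would still need an explicit grant.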
- Fix psql sslmode syntax (connection string instead of broken positional arg)
- Add scriptable psql workflow for agent automation
- Add endpoint path note for generate-database-credential
- Align GRANT SQL between SKILL.md and apps/lakebase.md (add ALTER DEFAULT PRIVILEGES)
- Surface DABs caveat in SKILL.md troubleshooting table
- Broaden latency decision gate triggers in apps SKILL.md
- Add "sync gold tables, not raw" as top best practice
- Clarify 8 TB storage limit scope (per branch)

Co-authored-by: Isaac
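
To illustrate the connection-string form mentioned in the sslmode fix, here is a minimal sketch that builds a libpq-style URI with `sslmode` as a query parameter. The host, database, and user values are hypothetical, `psql_uri` is an illustrative helper rather than part of the skill, and the credential is assumed to be passed separately through the standard `PGPASSWORD` environment variable instead of being embedded in the URI.

```python
from urllib.parse import quote


def psql_uri(host: str, database: str, user: str,
             port: int = 5432, sslmode: str = "require") -> str:
    """Build a postgresql:// URI that psql accepts as a single argument."""
    # Percent-encode the user and database: identity-style usernames
    # often contain "@", which would otherwise break URI parsing.
    return (
        f"postgresql://{quote(user, safe='')}@{host}:{port}"
        f"/{quote(database, safe='')}?sslmode={sslmode}"
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(psql_uri("db.example.com", "appdb", "user@example.com"))
```

The resulting string can be handed to psql as one quoted argument, which keeps `sslmode` inside the connection string rather than in a separate positional argument.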
pkosiec force-pushed the pkosiec/lakebase-synced-tables branch from cfe90bc to 63dee0b (April 24, 2026 15:13)
