[v2-rebuild] databricks-streaming-guardian: Delta / Liquid Clustering / Structured Streaming / DLT operations skill

## What this skill does

`databricks-streaming-guardian` is the data-operations skill of the pack, covering Delta Lake, Liquid Clustering, Structured Streaming, and DLT. It is the largest skill in the rebuild because each of these four surfaces has its own sharp edges, and they show up most visibly once production data flows through at scale. Nine of the twelve failure modes below are not bugs; they are documented platform behaviors that surprise engineers. The response pattern is therefore friction at trigger time (PreToolUse hooks on destructive operations), not bug reports.
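
To make "friction at trigger time" concrete, here is a minimal sketch of the classification step such a hook might run before a SQL statement executes. Everything in it is an assumption for illustration: the function names, the regex table, and the idea that the set of tables with active streaming consumers is passed in by the caller. The statement list mirrors the one proposed under the design questions. The wiring into the actual PreToolUse hook is deliberately left out.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern table: operation label -> regex capturing the target table.
# The table-name pattern is deliberately loose (word chars, dots, backticks).
_TABLE = r"([`\w.]+)"
DESTRUCTIVE_PATTERNS = [
    ("CREATE OR REPLACE", re.compile(r"\bCREATE\s+OR\s+REPLACE\s+TABLE\s+" + _TABLE, re.I)),
    ("DROP TABLE", re.compile(r"\bDROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?" + _TABLE, re.I)),
    ("OPTIMIZE", re.compile(r"\bOPTIMIZE\s+" + _TABLE, re.I)),
    ("VACUUM", re.compile(r"\bVACUUM\s+" + _TABLE, re.I)),
]


def find_destructive_statements(sql: str) -> List[Tuple[str, str]]:
    """Return (operation, table) pairs for each destructive statement found.

    Naive on purpose: splits on ';' and does not understand comments or
    string literals, so treat it as a first-pass filter only.
    """
    found = []
    for statement in sql.split(";"):
        for label, pattern in DESTRUCTIVE_PATTERNS:
            match = pattern.search(statement)
            if match:
                found.append((label, match.group(1).replace("`", "")))
                break  # one label per statement is enough for a hook decision
    return found


def hook_decision(sql: str, tables_with_active_consumers: Iterable[str]) -> str:
    """Return 'block' if any destructive statement targets a table that the
    caller reports as having an active streaming consumer, else 'allow'."""
    active = {t.lower() for t in tables_with_active_consumers}
    hits = [t for _, t in find_destructive_statements(sql) if t.lower() in active]
    return "block" if hits else "allow"
```

A caller would pass the SQL text from the tool invocation plus whatever active-consumer set it has resolved, and translate `"block"` into the hook's refusal mechanism.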

## What it catches

- **ConcurrentDeleteDeleteException on OPTIMIZE collisions** — manual OPTIMIZE colliding with AUTO OPTIMIZE, which is silently enabled on every table touched by MERGE/UPDATE/DELETE. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D01)
- **ConcurrentAppendException on Liquid Clustering** — LC eliminates folder-based partition pruning but not writer conflicts; fan-out MERGEs break unless the MERGE predicate is narrowed to clustering keys. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D02)
- **DELTA_FILE_NOT_FOUND_DETAILED after VACUUM** — streaming checkpoint pins file paths; OPTIMIZE rewrites them; VACUUM 7 days later deletes the originals. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D03)
- **Silent checkpoint corruption** — months of healthy streaming then silent reset to batch 0, no documented root cause. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D04)
- **RocksDB off-heap memory pinning** — multi-GB of native memory beyond the reach of JVM GC gets the process hosting the state store (typically an executor) OOM-killed while heap looks fine. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D05)
- **Liquid Clustering migration costs and downstream breakage** — hidden full-rewrite cost plus consumer code that expected partition predicates that no longer exist. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D06)
- **Time travel breaking after VACUUM crosses retention boundary** — engineers learning during audit that time travel is not backup. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D07)
- **DLT `@dlt.table` thread race** — `ThreadPoolExecutor` registrations completing out of order, intermittent "table missing" failures. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D08)
- **DLT full refresh dropping data silently** — when source is non-replayable (Kafka past retention, truncate-and-load Delta source). ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D09)
- **Autoloader `UnknownFieldException` on schema evolution** — the default mode (`addNewColumns` when no schema is supplied) stops the stream on every new column; `rescue` mode keeps the stream running but silently diverts unexpected fields into `_rescued_data` instead of evolving the schema. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D10)
- **DLT predictive optimization cost on idle pipelines** — the maintenance cluster running 24x7 on Advanced tier. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D11 — primary mention in cost-leak-hunter, secondary here)
- **DIFFERENT_DELTA_TABLE_READ_BY_STREAMING_SOURCE** — checkpoints pin to source table UUID; CREATE OR REPLACE generates a new UUID; every active streaming consumer dies the moment the producer team runs the migration script. ([`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) D12)
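
The OPTIMIZE-then-VACUUM interaction in D03 comes down to timestamp arithmetic, which a rough sketch can illustrate. The model below is a simplification and an assumption, not the platform's exact rule: it treats a consumer as at risk when it has not yet read past the rewrite and the rewritten-away files have aged beyond the retention window by the time VACUUM runs. The function name and the `consumer_positions` mapping are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def consumers_at_risk(
    rewrite_time: datetime,
    vacuum_time: datetime,
    consumer_positions: Dict[str, datetime],
    retention: timedelta = timedelta(days=7),
) -> List[str]:
    """Names of consumers that may hit a missing-file error after VACUUM.

    rewrite_time:       when OPTIMIZE replaced the original files
    vacuum_time:        when VACUUM is planned to run
    consumer_positions: consumer name -> timestamp of the newest source
                        commit that consumer has already processed
    retention:          VACUUM retention window (7 days, as in D03)
    """
    # VACUUM only deletes files removed from the table before this cutoff.
    cutoff = vacuum_time - retention
    if rewrite_time > cutoff:
        return []  # the originals are still inside the retention window

    # Consumers still behind the rewrite may need the original files.
    return sorted(
        name for name, position in consumer_positions.items()
        if position < rewrite_time
    )


positions = {
    "orders_stream": datetime(2023, 12, 31),  # lagging behind the rewrite
    "audit_stream": datetime(2024, 1, 3),     # already past the rewrite
}
at_risk = consumers_at_risk(
    rewrite_time=datetime(2024, 1, 1),
    vacuum_time=datetime(2024, 1, 9),
    consumer_positions=positions,
)
```

In this example only `orders_stream` is flagged; moving the VACUUM to 5 January keeps the originals inside the window and flags nobody.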

## Design questions I want pushback on

1. **PreToolUse hook scope.** Plan is to block: OPTIMIZE / VACUUM / CREATE OR REPLACE / DROP TABLE when `system.streaming.query_progress` shows any active consumer on the target table. Is this the right list, or should it be wider (TRUNCATE, ALTER TABLE schema changes) or narrower?
2. **False-positive tolerance for hooks.** A consumer that has been idle for 2 hours but has not been formally stopped — block or warn? My current plan is warn + offer a `--force` escape hatch. Right call?
3. **Liquid Clustering migration triage.** D06 includes a full-rewrite cost that can be very large. Should the skill estimate the rewrite cost up front (requires DESCRIBE DETAIL + table-size math) before recommending migration, or only after the user opts in?
4. **DLT `@dlt.table` thread race detection.** D08 fires intermittently. Should the skill detect-and-warn at code-review time (static analysis of `ThreadPoolExecutor.submit(register_table, ...)` patterns), or only diagnose after a failure has occurred?
5. **Autoloader schema-evolution recommendation.** D10 turns on the choice of `cloudFiles.schemaEvolutionMode` (`addNewColumns`, `rescue`, `failOnNewColumns`, `none`). My instinct is "recommend `rescue` by default, and document loudly that new fields land silently in `_rescued_data`." Is that right, or do production teams hate `rescue` for reasons I am missing?

## What I am not asking about right now

- Whether to split this into multiple skills — merging the delta-conflict-resolver into this one is decided per [`007-AT-ADEC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/007-AT-ADEC-databricks-v2-cto-decision.md) § Decision 2.
- Whether to add Kafka-side or Kinesis-side diagnostics — out of scope; those are source-system concerns.
- Whether to use Delta Live Tables features that have not GA'd yet — DLT Direct Publishing Mode is on the watch list but not in scope until it stabilizes.

## How to respond

Comment below with any thoughts, leave thumbs-up / thumbs-down on individual bullets in the design questions, or send a voice memo on WhatsApp and I will transcribe it into the issue with attribution. English is not required for voice memos — Portuguese is fine.

Source bead: `claude-vjaw` in the local beads workspace.

---

## Reference material

Most relevant for this skill:

| Doc | What it covers |
|---|---|
| [`003-RL-RSRC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/003-RL-RSRC-databricks-delta-streaming-research.md) | Delta Lake / Liquid Clustering / Structured Streaming / DLT pain catalog |
| [`007-AT-ADEC`](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/main/plugins/saas-packs/databricks-pack/000-docs/007-AT-ADEC-databricks-v2-cto-decision.md) | CTO decision — Databricks Pack v2 rebuild |

Full reference set + cross-skill context: see umbrella issue [#795](https://github.com/jeremylongshore/claude-code-plugins-plus-skills/issues/795) § Reference material.

- Jeremy Longshore
intentsolutions.io


