This repository contains AI Agent skills for ClickZetta Lakehouse. The skills are designed for Codex, Claude Code, czcode, and other assistants that can load a SKILL.md-based instruction set.
The repository turns ClickZetta operational knowledge into reusable routing rules, workflows, SQL patterns, and reference documents. An assistant can use these skills to select the correct domain workflow, read the minimum required reference material, and generate or execute the right ClickZetta steps for the user's task.
The repository currently contains 26 top-level clickzetta-* skills and one official documentation knowledge base:
- lakehouse-doc-en: English ClickZetta Lakehouse official documentation index and reference corpus.
- clickzetta-*: task-oriented skills for ingestion, Studio tasks, dbt, modeling, Dynamic Tables, connectors, external integrations, governance, and operations.
| Category | Skill | Scope |
|---|---|---|
| Official docs | lakehouse-doc-en | Official ClickZetta Lakehouse documentation covering SQL, functions, permissions, VClusters, data sharing, SDKs, BI tools, AI functions, and general platform usage. |
| Foundation and connectivity | clickzetta-overview | Product overview, object model, architecture, Studio modules, brand naming, and service endpoints. |
| Foundation and connectivity | clickzetta-sql-migration | SQL migration guidance from Snowflake, Databricks, and Spark SQL to ClickZetta SQL. |
| Data ingestion and pipelines | clickzetta-data-ingest-pipeline | Router for choosing ingestion methods based on source type, latency, sync scope, and whether ingestion is one-time or continuous. |
| Data ingestion and pipelines | clickzetta-file-import-pipeline | Import data from URLs, local files, or Volume paths using format inference, table creation, and COPY INTO. |
| Data ingestion and pipelines | clickzetta-oss-ingest-pipeline | Batch or continuous ingestion from OSS, S3, or COS through Storage Connections, External Volumes, Pipes, and COPY INTO. |
| Data ingestion and pipelines | clickzetta-kafka-ingest-pipeline | Kafka ingestion through READ_KAFKA Pipes or Kafka External Table plus Table Stream pipelines. |
| Data ingestion and pipelines | clickzetta-batch-sync-pipeline | Studio offline batch sync tasks for single-table sync, multi-table mirrors, and sharded-table merge scenarios. |
| Data ingestion and pipelines | clickzetta-realtime-sync-pipeline | Studio single-table real-time sync tasks for Kafka, MySQL, PostgreSQL, and related sources. |
| Data ingestion and pipelines | clickzetta-cdc-sync-pipeline | Studio multi-table CDC sync from MySQL or PostgreSQL into Lakehouse, including full database mirror and sharded-table merge modes. |
| Data ingestion and pipelines | clickzetta-sql-pipeline-manager | SQL-native management for Dynamic Tables, Materialized Views, Table Streams, Pipes, and layered SQL pipelines. |
| Data ingestion and pipelines | clickzetta-table-stream-pipeline | Table Stream change data capture workflows, offset handling, preview, consumption, and idempotent downstream writes. |
| Data ingestion and pipelines | clickzetta-studio-task-manager | Studio task creation, folder organization, scheduling, dependencies, deployment, task operations, and engineering conventions. |
| Data ingestion and pipelines | clickzetta-pipeline-review | Pipeline review and diagnostics across Studio tasks, Lakehouse objects, pipeline SQL, and run histories. |
| Data ingestion and pipelines | clickzetta-dbt-studio-pipeline | Publish dbt models into Studio assets and configure scheduled execution from dbt artifacts. |
| Modeling and analytics | clickzetta-dw-modeling | Data warehouse modeling for ODS/DWD/DWS/ADS, Medallion architecture, schema design, and pipeline-aware modeling. |
| Modeling and analytics | clickzetta-dbt-project-setup | dbt-clickzetta project initialization, profiles.yml, dbt_project.yml, and layered project standards. |
| Modeling and analytics | clickzetta-dbt-modeling | dbt source discovery, model design, incremental materialization, tests, and model generation. |
| Modeling and analytics | clickzetta-dynamic-table | Dynamic Table creation, refresh configuration, incremental computation, ALTER workflows, refresh history, and best practices. |
| Modeling and analytics | clickzetta-data-science | Data science workflows using SQL, ZettaPark, notebooks, EDA, feature engineering, inference, and vector retrieval. |
| Modeling and analytics | clickzetta-semantic-view | Semantic View modeling with logical tables, dimensions, metrics, filters, and semantic layer queries. |
| SDK and integrations | clickzetta-zettapark | ZettaPark DataFrame API, Session setup, reads, transformations, writes, file operations, and SQL execution. |
| SDK and integrations | clickzetta-spark-flink-connector | Spark Connector reads/writes and Flink Write Connector CDC or append-only writes. |
| SDK and integrations | clickzetta-external-function | External Functions, Python/Java UDF packaging, and cloud function integration. |
| SDK and integrations | clickzetta-ai-function | Built-in AI functions: AI_COMPLETE (call LLMs) and AI_EMBEDDING (text vectors) with API CONNECTION setup. |
| Operations and governance | clickzetta-volume-manager | External Volume, User Volume, Table Volume, object storage mounting, file operations, import, and export. |
| Operations and governance | clickzetta-table-lineage | Table lineage and cost visualization based on information_schema.job_history and generated HTML artifacts. |
Use the table below when deciding which skill should handle a user request.
| User intent | Recommended entry point |
|---|---|
| Understand ClickZetta concepts, Workspace, Schema, VCluster, object hierarchy, or Studio modules. | clickzetta-overview |
| Configure Python, JDBC, SQLAlchemy, ZettaPark, or general Lakehouse connections. | clickzetta-zettapark / lakehouse-doc-en |
| Choose an ingestion method without knowing whether to use files, object storage, Kafka, batch sync, real-time sync, or CDC. | clickzetta-data-ingest-pipeline |
| Import data from a local file, URL, Volume, object storage, or a Kafka topic. | clickzetta-file-import-pipeline / clickzetta-oss-ingest-pipeline / clickzetta-kafka-ingest-pipeline |
| Build Studio offline sync, single-table real-time sync, or multi-table CDC tasks. | clickzetta-batch-sync-pipeline / clickzetta-realtime-sync-pipeline / clickzetta-cdc-sync-pipeline |
| Manage Studio tasks, folders, schedules, dependencies, deployments, and task state. | clickzetta-studio-task-manager |
| Initialize a dbt project, build dbt models, or publish dbt models to Studio. | clickzetta-dbt-project-setup / clickzetta-dbt-modeling / clickzetta-dbt-studio-pipeline |
| Design SQL pipelines, Dynamic Tables, Materialized Views, Pipes, or Table Streams. | clickzetta-sql-pipeline-manager / clickzetta-dynamic-table / clickzetta-table-stream-pipeline |
| Review an existing pipeline, diagnose task failures, inspect dependencies, or identify data quality issues. | clickzetta-pipeline-review |
| Write native ClickZetta SQL, look up syntax, functions, permissions, VClusters, or official product behavior. | lakehouse-doc-en |
| Migrate SQL from Snowflake, Databricks, or Spark SQL. | clickzetta-sql-migration |
| Query metadata, table structures, job history, cost attribution, permissions, Time Travel, recovery, or platform operations. | lakehouse-doc-en |
| Investigate query performance, EXPLAIN, Result Cache, OPTIMIZE, small files, or execution plans. | lakehouse-doc-en |
| Manage users, roles, grants, masking policy, network policy, lifecycle, data sharing, SDK, BI, or Java/Python application docs. | lakehouse-doc-en |
| Manage Volumes, object storage mounts, file upload/download, import, or export. | clickzetta-volume-manager |
| Work with Spark, Flink, ZettaPark, External Functions, or External Catalogs. | clickzetta-spark-flink-connector / clickzetta-zettapark / clickzetta-external-function / lakehouse-doc-en |
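In practice, the assistant makes the routing decision. It typically matches the request against the trigger description in each SKILL.md front matter. As a rough illustration only, a few rows of the table above can be approximated by a deterministic keyword router. The keyword lists, the `ROUTES` and `route` names, and the first-match-wins ordering are all hypothetical. The fallback to lakehouse-doc-en mirrors the fallback rule this README gives for native ClickZetta behavior.

```python
from __future__ import annotations

import re

# Hypothetical keyword rules approximating a few rows of the routing table.
# Order matters: the first rule with a keyword present in the request wins.
ROUTES: list[tuple[tuple[str, ...], str]] = [
    (("cdc", "binlog"), "clickzetta-cdc-sync-pipeline"),
    (("kafka",), "clickzetta-kafka-ingest-pipeline"),
    (("oss", "s3", "cos"), "clickzetta-oss-ingest-pipeline"),
    (("dynamic table",), "clickzetta-dynamic-table"),
    (("table stream",), "clickzetta-table-stream-pipeline"),
    (("snowflake", "databricks", "spark sql"), "clickzetta-sql-migration"),
    (("volume",), "clickzetta-volume-manager"),
]

# Official docs are the fallback when no task-specific skill clearly applies.
FALLBACK = "lakehouse-doc-en"


def route(request: str) -> str:
    """Return the first skill whose keyword appears as a whole word or phrase."""
    text = request.lower()
    for keywords, skill in ROUTES:
        # Word boundaries stop short keywords such as "cos" matching "costs".
        if any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords):
            return skill
    return FALLBACK


if __name__ == "__main__":
    print(route("Build a CDC pipeline from MySQL to Lakehouse."))
```

A real router still needs the ambiguity handling that clickzetta-data-ingest-pipeline provides. For example, a request that names no source type should not be forced onto a single ingestion skill.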
Each top-level skill is stored in one directory:
```text
clickzetta-<domain>/
├── SKILL.md
└── references/
    └── *.md
```
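The following is a minimal sketch for checking that every clickzetta-* directory follows this layout. It takes a list of repository-relative file paths, which `repo_files` produces by walking a local checkout. The helper names are hypothetical. lakehouse-doc-en and nested sub-skill directories are deliberately not checked, because their layout may differ from the one shown here.

```python
from __future__ import annotations

from collections.abc import Iterable
from pathlib import Path, PurePosixPath


def repo_files(root: Path) -> list[str]:
    """List files under root as POSIX-style paths relative to root."""
    return [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]


def layout_problems(files: Iterable[str], prefix: str = "clickzetta-") -> dict[str, list[str]]:
    """Map each top-level skill directory to the layout rules it violates."""
    by_skill: dict[str, set[str]] = {}
    for f in files:
        parts = PurePosixPath(f).parts
        # Only files inside a top-level clickzetta-* directory are considered.
        if len(parts) >= 2 and parts[0].startswith(prefix):
            by_skill.setdefault(parts[0], set()).add("/".join(parts[1:]))

    problems: dict[str, list[str]] = {}
    for skill, rel in sorted(by_skill.items()):
        issues = []
        if "SKILL.md" not in rel:
            issues.append("missing SKILL.md")
        # At least one Markdown file somewhere under references/.
        if not any(r.startswith("references/") and r.endswith(".md") for r in rel):
            issues.append("no references/*.md")
        if issues:
            problems[skill] = issues
    return problems


if __name__ == "__main__":
    for skill, issues in layout_problems(repo_files(Path("."))).items():
        print(f"{skill}: {', '.join(issues)}")
```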
SKILL.md is the routing and workflow entry point. It should define:
- the skill name and trigger description in front matter;
- when the skill should and should not be used;
- the minimal workflow the agent should follow;
- which reference files to read for detailed syntax, examples, or troubleshooting.
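The front-matter requirement can be checked mechanically. This sketch makes two assumptions. First, the front matter is a `---`-delimited block of simple `key: value` lines. Second, the keys are named `name` and `description`. It does not parse full YAML, such as multi-line or quoted values, so a production check should use a real YAML parser.

```python
from __future__ import annotations

from pathlib import Path

# Assumed key names for the skill name and trigger description.
REQUIRED_KEYS = ("name", "description")


def parse_front_matter(text: str) -> dict[str, str]:
    """Return simple key/value pairs from a leading '---' block, or {} if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # No closing delimiter: treat the front matter as missing.


def missing_keys(text: str) -> list[str]:
    """List required keys that are absent or empty."""
    fields = parse_front_matter(text)
    return [k for k in REQUIRED_KEYS if not fields.get(k)]


def check_skill_file(path: Path) -> list[str]:
    return missing_keys(path.read_text(encoding="utf-8"))
```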
references/ contains detailed technical material, SQL snippets, operational playbooks, API notes, and examples. Agents should load only the specific reference files needed for the current task.
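To load only what a task needs, an agent can start from the reference paths that SKILL.md itself names. The sketch below assumes SKILL.md mentions reference files by relative path, such as `references/<topic>.md`. It only extracts candidates. Deciding which of them are relevant to the current task remains the agent's judgment. The helper names are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed convention: reference files are cited as "references/<path>.md".
# Globs such as "references/*.md" are not matched.
REF_PATH = re.compile(r"references/[\w./-]+?\.md\b")


def referenced_files(skill_md: str) -> list[str]:
    """Reference paths mentioned in SKILL.md, deduplicated in first-mention order."""
    seen: dict[str, None] = {}
    for m in REF_PATH.finditer(skill_md):
        seen.setdefault(m.group(0), None)
    return list(seen)


def load_references(skill_dir: Path, wanted: list[str]) -> dict[str, str]:
    """Read only the requested reference files, skipping any that do not exist."""
    out: dict[str, str] = {}
    for rel in wanted:
        path = skill_dir / rel
        if path.is_file():
            out[rel] = path.read_text(encoding="utf-8")
    return out
```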
Some skills contain additional sub-skill or best-practice directories. For example:
```text
clickzetta-dynamic-table/
├── dt-creator/
├── sql-to-dt/
└── best-practices/
```
lakehouse-doc-en is the authoritative fallback for native ClickZetta behavior. Use it when:
- a topic has been consolidated into official documentation rather than a dedicated operational skill;
- the user asks for SQL syntax, functions, permissions, VCluster behavior, data sharing, Time Travel, SDK, BI, AI functions, or other official product capabilities;
- a workflow skill needs product-level confirmation from the docs.
When adding, renaming, or deleting a skill:
- Update the top-level skill directory.
- Update .well-known/skills/index.json.
- Update this README.md catalog and routing table.
- Search the repository for stale skill references.
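The last step, searching for stale references, can be approximated with a small script. This sketch assumes skill names always appear in the lowercase `clickzetta-<domain>` form. It treats any such token without a matching top-level directory as stale. It does not parse index.json, because its schema is not described here. It may also flag unrelated identifiers that share the prefix, such as client package names, so review its output manually. The helper names are hypothetical.

```python
from __future__ import annotations

import re
from collections.abc import Iterable, Mapping
from pathlib import Path

# Matches tokens like "clickzetta-dynamic-table" but not "clickzetta-*".
SKILL_REF = re.compile(r"\bclickzetta-[a-z0-9]+(?:-[a-z0-9]+)*")


def existing_skills(root: Path) -> set[str]:
    """Names of top-level clickzetta-* directories."""
    return {p.name for p in root.iterdir() if p.is_dir() and p.name.startswith("clickzetta-")}


def collect_texts(root: Path) -> dict[str, str]:
    """Read Markdown and JSON files, including .well-known/skills/index.json."""
    texts: dict[str, str] = {}
    for p in root.rglob("*"):
        if p.is_file() and p.suffix in {".md", ".json"}:
            texts[p.relative_to(root).as_posix()] = p.read_text(encoding="utf-8")
    return texts


def stale_references(texts: Mapping[str, str], skills: Iterable[str]) -> dict[str, list[str]]:
    """Map each file to the skill-like names it mentions that have no directory."""
    known = set(skills)
    found: dict[str, list[str]] = {}
    for path, text in texts.items():
        stale = sorted({m.group(0) for m in SKILL_REF.finditer(text)} - known)
        if stale:
            found[path] = stale
    return found


if __name__ == "__main__":
    root = Path(".")
    for path, names in stale_references(collect_texts(root), existing_skills(root)).items():
        print(f"{path}: {', '.join(names)}")
```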
For deleted or consolidated topics, route users to lakehouse-doc-en unless there is another active task-specific skill that clearly owns the workflow.
Install or expose this repository's skill directories to an AI coding assistant that supports SKILL.md-based skills. Users can then describe ClickZetta tasks directly, for example:
```text
Import CSV files from OSS into public.orders on a schedule.
Build a CDC pipeline from MySQL to Lakehouse and publish it as a Studio task.
Create a Dynamic Table for the DWS order summary layer and configure refresh behavior.
```
The assistant should route the request through the appropriate SKILL.md, read only the required references, and generate the concrete SQL, cz-cli commands, Studio workflow, or troubleshooting steps for the task.