[Part 2 of 3]: Add in-memory artifact store and extraction layer #780
Closed
jairus-m wants to merge 4 commits into
Pull request overview
Introduces an in-memory DuckDB-backed ArtifactStore and an extraction layer to load parsed dbt artifacts into normalized tables, enabling SQL querying and DuckDB FTS (BM25) search as groundwork for replacing get_job_run_artifacts.
Changes:
- Added DuckDB dependency and lockfile updates.
- Implemented `ArtifactStore` (load/reset/query/search/indexing) plus table schemas for artifact-derived tables.
- Added artifact extractors (manifest/catalog/run_results/sources) and new artifact-store error types; refactored tool-call error unions into `errors/classification.py`.
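The query guard is described in the PR summary as a keyword-level mutation guard with a 500-row cap. A minimal sketch of that idea follows. The names `check_read_only`, `cap_rows`, `QueryRejectedError`, and the keyword list are illustrative assumptions, not the PR's actual code in `store.py`.

```python
import re

# All names below are hypothetical; the PR's guard in store.py may differ.
MAX_ROWS = 500  # row cap stated in the PR summary

# Assumed keyword list; the real guard may block a different set.
_MUTATING_KEYWORDS = (
    "INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER",
    "TRUNCATE", "COPY", "ATTACH", "DETACH", "INSTALL", "LOAD", "PRAGMA",
)
_MUTATION_RE = re.compile(
    r"\b(" + "|".join(_MUTATING_KEYWORDS) + r")\b", re.IGNORECASE
)


class QueryRejectedError(ValueError):
    """Raised when a statement contains a mutating keyword (illustrative)."""


def check_read_only(sql: str) -> None:
    """Reject SQL that contains any mutating keyword as a whole word."""
    match = _MUTATION_RE.search(sql)
    if match:
        raise QueryRejectedError(
            f"mutating keyword not allowed: {match.group(1).upper()}"
        )


def cap_rows(rows: list[tuple], limit: int = MAX_ROWS) -> tuple[list[tuple], bool]:
    """Return at most `limit` rows and a flag telling whether rows were dropped."""
    return rows[:limit], len(rows) > limit
```

A keyword-level check like this is deliberately conservative: whole-word matching lets column names such as `updated_at` through, but a string literal containing `drop` would still be rejected.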
Reviewed changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| uv.lock | Locks the added DuckDB dependency. |
| pyproject.toml | Adds duckdb>=1.5.2 runtime dependency. |
| .changes/unreleased/Under the Hood-20260514-151517.yaml | Changelog entry for the new artifact store/extraction layer. |
| src/dbt_mcp/dbt_admin/run_artifacts/tables.py | Defines DuckDB table DDL + FTS/index configuration for artifact tables. |
| src/dbt_mcp/dbt_admin/run_artifacts/extractors.py | Extracts DuckDB-ready row tuples from parsed artifact dicts. |
| src/dbt_mcp/dbt_admin/run_artifacts/store.py | Implements in-memory DuckDB store with loading, query guard, FTS search, and indexing. |
| src/dbt_mcp/errors/artifact_search.py | Adds artifact-store specific error hierarchy. |
| src/dbt_mcp/errors/classification.py | Centralizes client/server tool-call error union types. |
| src/dbt_mcp/errors/__init__.py | Re-exports new errors and the new classification unions. |
| tests/unit/dbt_admin/run_artifacts/test_store.py | Adds unit/integration-style tests for store behavior (query/search/load/merge). |
| tests/unit/dbt_admin/run_artifacts/__init__.py | Adds test package marker. |
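To make the `tables.py` entry more concrete, here is a rough sketch of what a table configuration carrying DDL, FTS columns, and index columns could look like. The field names, the choice of FTS and index columns, and the `index_statements` helper are assumptions for illustration; only the `nodes` column list is taken from the ERD in the PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableConfig:
    """Illustrative shape only; the PR's real TableConfig fields may differ."""

    name: str
    ddl: str
    fts_columns: tuple[str, ...] = ()    # columns fed to full-text search
    index_columns: tuple[str, ...] = ()  # columns that get a regular index


# Column list follows the NODES entity in the PR's ERD.
NODES = TableConfig(
    name="nodes",
    ddl=(
        "CREATE TABLE nodes ("
        "id INTEGER PRIMARY KEY, run_id INTEGER, unique_id VARCHAR, "
        "name VARCHAR, resource_type VARCHAR, description TEXT, "
        "raw_code TEXT, compiled_code TEXT)"
    ),
    fts_columns=("name", "description", "raw_code", "compiled_code"),  # assumed
    index_columns=("run_id", "unique_id"),  # assumed
)


def index_statements(cfg: TableConfig) -> list[str]:
    """Build one CREATE INDEX statement per configured index column."""
    return [
        f"CREATE INDEX idx_{cfg.name}_{col} ON {cfg.name} ({col})"
        for col in cfg.index_columns
    ]
```

Keeping DDL and index metadata in one declarative object would let a store loop over all configured tables when creating schemas or rebuilding indexes after a batch load.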
Introduces DuckDB-backed store, row extractors for all 4 artifact types (manifest, catalog, run_results, sources), table DDL definitions, and error hierarchy for the ARTIFACT_SEARCH toolset (PR 2 of 3).
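As an illustration of the row-extractor idea, the sketch below turns a parsed manifest dict into node rows and edge rows. It assumes the standard dbt manifest layout, where `nodes` is keyed by `unique_id` and each node lists its parents under `depends_on.nodes`. The function name, the tuple layouts, and the `"depends_on"` edge type are hypothetical and are not taken from the PR's `extractors.py`.

```python
from typing import Any

# Hypothetical row shapes; the PR's extractors may emit different columns.
NodeRow = tuple[int, str, Any, Any, str]
EdgeRow = tuple[int, str, str, str]


def extract_nodes_and_edges(
    manifest: dict[str, Any], run_id: int
) -> tuple[list[NodeRow], list[EdgeRow]]:
    """Flatten manifest nodes into (node rows, parent -> child edge rows)."""
    node_rows: list[NodeRow] = []
    edge_rows: list[EdgeRow] = []
    for unique_id, node in manifest.get("nodes", {}).items():
        node_rows.append(
            (
                run_id,
                unique_id,
                node.get("name"),
                node.get("resource_type"),
                node.get("description", ""),
            )
        )
        # Each parent listed under depends_on.nodes becomes one edge row.
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edge_rows.append((run_id, parent_id, unique_id, "depends_on"))
    return node_rows, edge_rows


# Tiny made-up manifest used only to exercise the sketch.
example_manifest = {
    "nodes": {
        "model.shop.orders": {
            "name": "orders",
            "resource_type": "model",
            "description": "Order facts",
            "depends_on": {"nodes": ["model.shop.stg_orders"], "macros": []},
        },
        "model.shop.stg_orders": {
            "name": "stg_orders",
            "resource_type": "model",
            "depends_on": {"nodes": []},
        },
    }
}
node_rows, edge_rows = extract_nodes_and_edges(example_manifest, run_id=7)
```

Plain tuples like these can be passed directly to a bulk insert, which is presumably why the PR describes the extractor output as "row tuples".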
… fix) and improve error checking
Summary
Part 2 of 3 to replace `get_job_run_artifacts` with an in-memory DuckDB store that lets LLMs query and run full-text search over dbt job run artifacts.
PR sequence:
- Artifact parsing infrastructure ([Part 1 of 3]: Replace hand-rolled Pydantic schemas with dbt-artifacts-parser #745)
What Changed
- `duckdb>=1.5.2` dependency for the in-memory analytical store
- `ArtifactStore` class (`store.py`): manages an in-memory DuckDB database with:
  - `load_artifact()`: parse, extract, and bulk-insert a single artifact
  - `query()`: read-only SQL with keyword-level mutation guard and 500-row cap
  - `search()`: BM25 full-text search via DuckDB's FTS extension
  - `reset()` / `close()` lifecycle methods
  - `reindex=False` + `build_all_indexes()` for batch loads
- `extractors.py`: converts parsed artifact dicts into DuckDB row tuples:
  - `extract_from_manifest` → nodes, node_columns, edges, test_metadata, exposures, metrics, groups, macros
  - `extract_from_catalog` → catalog_tables, catalog_stats, + column merge into node_columns
  - `extract_from_run_results` → invocations, run_results
  - `extract_from_sources` → source_freshness
- `tables.py`: `TableConfig` dataclass with DDL, FTS columns, and index columns for all 13 tables
- `artifact_search.py`: `ArtifactSearchError` (server) and `ArtifactNotLoadedError` (client)
- `ClientToolCallError` / `ServerToolCallError` type unions refactored into `classification.py`
Related Issues
Related to #413
Checklist
Mermaid ERD
erDiagram
    INVOCATIONS {
        int id PK
        int run_id
        varchar invocation_id
        varchar command
        varchar dbt_version
        float elapsed_time
    }
    RUN_RESULTS {
        int id PK
        int run_id
        varchar unique_id FK
        varchar invocation_id FK
        varchar status
        float execution_time
        text message
    }
    SOURCE_FRESHNESS {
        int id PK
        int run_id
        varchar unique_id FK
        varchar invocation_id FK
        varchar status
        varchar max_loaded_at
    }
    NODES {
        int id PK
        int run_id
        varchar unique_id
        varchar name
        varchar resource_type
        text description
        text raw_code
        text compiled_code
    }
    NODE_COLUMNS {
        int id PK
        int run_id
        varchar unique_id FK
        varchar column_name
        varchar declared_type
        varchar catalog_type
        varchar data_type
    }
    EDGES {
        int id PK
        int run_id
        varchar parent_unique_id FK
        varchar child_unique_id FK
        varchar edge_type
    }
    TEST_METADATA {
        int id PK
        int run_id
        varchar unique_id FK
        varchar test_name
        varchar attached_node FK
    }
    CATALOG_TABLES {
        int id PK
        int run_id
        varchar unique_id FK
        varchar table_type
        varchar database_name
        varchar schema_name
    }
    CATALOG_STATS {
        int id PK
        int run_id
        varchar unique_id FK
        varchar stat_id
        varchar stat_value
    }
    EXPOSURES {
        int id PK
        int run_id
        varchar unique_id
        varchar name
        varchar exposure_type
    }
    METRICS {
        int id PK
        int run_id
        varchar unique_id
        varchar name
        varchar metric_type
    }
    GROUPS {
        int id PK
        int run_id
        varchar unique_id
        varchar name
    }
    MACROS {
        int id PK
        int run_id
        varchar unique_id
        varchar name
        text macro_sql
    }
    INVOCATIONS ||--o{ RUN_RESULTS : "invocation_id"
    INVOCATIONS ||--o{ SOURCE_FRESHNESS : "invocation_id"
    NODES ||--o{ NODE_COLUMNS : "unique_id"
    NODES ||--o{ EDGES : "parent / child"
    NODES ||--o{ TEST_METADATA : "attached_node"
    NODES ||--o| CATALOG_TABLES : "unique_id"
    NODES ||--o{ CATALOG_STATS : "unique_id"
    NODES ||--o{ RUN_RESULTS : "unique_id"
    NODES ||--o{ SOURCE_FRESHNESS : "unique_id"
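
To show how the relationships in the ERD might be used, the sketch below joins `run_results` to `nodes` on `unique_id` to list failed nodes by name. It uses the standard-library `sqlite3` module purely as a stand-in so the example runs without extra dependencies; the PR itself targets DuckDB. Only a subset of the ERD columns is created, and the sample rows are made up.

```python
import sqlite3

# sqlite3 is a stand-in here; the PR's store is backed by in-memory DuckDB.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE nodes (
        id INTEGER PRIMARY KEY, run_id INTEGER, unique_id TEXT,
        name TEXT, resource_type TEXT
    );
    CREATE TABLE run_results (
        id INTEGER PRIMARY KEY, run_id INTEGER, unique_id TEXT,
        status TEXT, execution_time REAL, message TEXT
    );
    """
)

# Made-up sample rows for a single run.
con.executemany(
    "INSERT INTO nodes (run_id, unique_id, name, resource_type) VALUES (?, ?, ?, ?)",
    [
        (1, "model.shop.orders", "orders", "model"),
        (1, "model.shop.customers", "customers", "model"),
    ],
)
con.executemany(
    "INSERT INTO run_results (run_id, unique_id, status, execution_time, message) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "model.shop.orders", "error", 2.5, "Database Error: relation not found"),
        (1, "model.shop.customers", "success", 1.2, None),
    ],
)

# NODES ||--o{ RUN_RESULTS : "unique_id" (scoped to the same run_id)
failed = con.execute(
    """
    SELECT n.name, r.status, r.message
    FROM run_results AS r
    JOIN nodes AS n
      ON n.unique_id = r.unique_id AND n.run_id = r.run_id
    WHERE r.status = 'error'
    ORDER BY r.execution_time DESC
    """
).fetchall()
```

Joining on both `unique_id` and `run_id` matters once artifacts from more than one run are loaded, since the same `unique_id` can then appear once per run.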