v0.5.3

@alexsku alexsku released this 17 Mar 05:17
54d8886

Highlights — New SQL-like Filter Syntax

The search and get_lineage tools now accept a human-readable, SQL-like filter string in place of the previous nested JSON dict. A single flat string is easier for LLM agents (and humans) to write correctly than nested dicts and lists.

Since MCP tools are discovered dynamically by LLM agents at the start of every session, this is not a breaking change — agents will automatically pick up the new filter parameter and its syntax documentation.

Before (0.5.2):

search(query="*", filters={"entity_type": ["DATASET"]})
search(query="*", filters={"and": [{"platform": ["snowflake"]}, {"env": ["PROD"]}]})

After (0.5.3):

search(query="*", filter="entity_type = dataset")
search(query="*", filter="platform = snowflake AND env = PROD")

The new syntax supports simple equality, IN lists, boolean logic (AND, OR, NOT, parentheses), comparisons (>, >=, <, <=), and existence checks (IS NULL, IS NOT NULL).
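
As a rough illustration of the constructs listed above, the strings below show one plausible spelling of each. Only entity_type, platform and env appear in these notes; the other field names, the IN-list punctuation and the literal forms are assumptions, and the FILTER_DOCS text in the tool description remains the reference for the real syntax.

```python
# Illustrative filter strings, one per construct named in the release notes.
# Field names other than entity_type, platform and env are hypothetical,
# and the exact IN-list and literal syntax is assumed, not confirmed.
EXAMPLE_FILTERS = {
    "equality": "entity_type = dataset",
    "in_list": "platform IN (snowflake, bigquery)",
    "boolean": "(platform = snowflake OR platform = bigquery) AND NOT env = DEV",
    "comparison": "column_count >= 10",   # hypothetical field
    "existence": "owner IS NOT NULL",     # hypothetical field
}

# Show how each string would be passed to the search tool.
for name, text in EXAMPLE_FILTERS.items():
    print(f'{name:<10} search(query="*", filter={text!r})')
```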

Added

  • search_filter_parser: Full SQL-like filter parser with tokenizer and recursive-descent parser. Compiles human-readable filter strings into DataHub SDK Filter objects. Includes comprehensive FILTER_DOCS injected into tool descriptions so LLM agents always have the syntax reference.
  • Modular tool architecture: Tools are now organized into dedicated modules under tools/ (search.py, entities.py, lineage.py, dataset_queries.py, assertions.py) instead of being defined inline in mcp_server.py.
  • graphql_helpers: Extracted shared GraphQL execution logic, token budgeting, and response processing into a dedicated module.
  • tool_context: New module for tool-level context management.
  • view_preference: Configurable view preference system (UseDefaultView, NoView, CustomView) for controlling which DataHub view is applied during search.
  • tools/assertions.py: New tool module for data quality assertion checks.
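
To make the "tokenizer and recursive-descent parser" idea concrete, here is a minimal sketch of the technique. It is not the shipped search_filter_parser: it handles only equality, AND, OR, NOT and parentheses, and it emits a nested dict shaped like the old filters argument rather than DataHub SDK Filter objects. The names tokenize, Parser and parse_filter are hypothetical.

```python
import re

# Toy grammar (an assumption, not the shipped one):
#   expr   := term ("OR" term)*
#   term   := factor ("AND" factor)*
#   factor := "NOT" factor | "(" expr ")" | FIELD "=" VALUE
TOKEN_RE = re.compile(r"\s*(\(|\)|=|[A-Za-z_][\w.\-]*)")
KEYWORDS = {"AND", "OR", "NOT"}
PUNCTUATION = {"(", ")", "="}


def tokenize(text):
    """Split a filter string into a flat list of tokens."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize input near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def is_keyword(self, word):
        token = self.peek()
        return token is not None and token.upper() == word

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token.upper() != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def take_word(self):
        token = self.take()
        if token in PUNCTUATION or token.upper() in KEYWORDS:
            raise ValueError(f"expected a field or value, got {token!r}")
        return token

    def parse_expr(self):
        # Lowest precedence: OR.
        parts = [self.parse_term()]
        while self.is_keyword("OR"):
            self.take()
            parts.append(self.parse_term())
        return parts[0] if len(parts) == 1 else {"or": parts}

    def parse_term(self):
        # AND binds tighter than OR.
        parts = [self.parse_factor()]
        while self.is_keyword("AND"):
            self.take()
            parts.append(self.parse_factor())
        return parts[0] if len(parts) == 1 else {"and": parts}

    def parse_factor(self):
        if self.is_keyword("NOT"):
            self.take()
            return {"not": self.parse_factor()}
        if self.peek() == "(":
            self.take()
            inner = self.parse_expr()
            self.take(")")
            return inner
        field = self.take_word()
        self.take("=")
        return {field: [self.take_word()]}


def parse_filter(text):
    """Parse a filter string into a nested dict; reject trailing tokens."""
    parser = Parser(tokenize(text))
    result = parser.parse_expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return result


print(parse_filter("platform = snowflake AND env = PROD"))
```

Each grammar rule maps to one method, so operator precedence falls out of the call structure: parse_expr calls parse_term, which calls parse_factor. Extending the sketch to IN lists, comparisons or IS NULL would mean adding token patterns and further branches in parse_factor.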

Changed

  • search tool: The filters parameter (JSON dict) is replaced by filter (string). See the Highlights section above.
  • get_lineage tool: Also uses the new string-based filter parameter for filtering lineage results.
  • mcp_server.py: Significantly slimmed down — tool implementations moved to dedicated modules, GraphQL helpers extracted, filter parsing extracted.
  • Smoke check safety: smoke_check.py now refuses to run against non-localhost DataHub instances to prevent accidental mutation of production data.
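
A localhost guard of the kind described for smoke_check.py could look roughly like the sketch below. This is an assumption about the approach, not the actual code: the function names, the set of accepted hosts and the error message are all hypothetical.

```python
from urllib.parse import urlparse

# Hosts treated as local in this sketch (an assumption, not the real list).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_instance(server_url):
    """Return True only when the URL's host is a loopback name or address."""
    # A URL without a scheme (e.g. "localhost:8080") yields no hostname
    # and is therefore rejected rather than guessed at.
    host = urlparse(server_url).hostname
    return host is not None and host in LOCAL_HOSTS


def require_local_instance(server_url):
    """Abort before any mutating check can run against a remote instance."""
    if not is_local_instance(server_url):
        raise SystemExit(
            f"Refusing to run smoke check against non-local server: {server_url}"
        )


require_local_instance("http://localhost:8080")  # passes silently
```

Failing closed on anything that does not parse to a known loopback host keeps the check conservative: a malformed URL stops the run instead of slipping through.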

Removed

  • test_custom_filter_conversion.py: Removed obsolete test for the old dict-based filter format, replaced by search_filter_parser.

Full Changelog: v0.5.2...v0.5.3