Releases · acryldata/mcp-server-datahub

17 Mar 05:17

alexsku

v0.5.3

54d8886

v0.5.3 Latest

Highlights — New SQL-like Filter Syntax

The search and get_lineage tools now accept a human-readable, SQL-like filter string in place of the previous nested JSON dict, which makes filters easier for LLM agents (and humans) to write correctly.

Since MCP tools are discovered dynamically by LLM agents at the start of every session, this is not a breaking change — agents will automatically pick up the new filter parameter and its syntax documentation.

Before (0.5.0 to 0.5.2):

search(query="*", filters={"entity_type": ["DATASET"]})
search(query="*", filters={"and": [{"platform": ["snowflake"]}, {"env": ["PROD"]}]})

After (0.5.3):

search(query="*", filter="entity_type = dataset")
search(query="*", filter="platform = snowflake AND env = PROD")

The new syntax supports simple equality, IN lists, boolean logic (AND, OR, NOT, parentheses), comparisons (>, >=, <, <=), and existence checks (IS NULL, IS NOT NULL).
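
To make the mapping between the two styles concrete, here is a toy recursive-descent parser for a small subset of the syntax (equality, AND, OR, NOT, parentheses). It is only a sketch: `ToyFilterParser` and `tokenize` are hypothetical names, the output is a nested dict in the old style rather than a DataHub SDK Filter object, the "not" key is an assumption, and values are kept exactly as written (no case normalization).

```python
import re

# Toy grammar (illustration only, not the real search_filter_parser):
#   expr   := term (OR term)*
#   term   := factor (AND factor)*
#   factor := NOT factor | "(" expr ")" | field "=" value
TOKEN_RE = re.compile(r"\s*(\(|\)|=|[A-Za-z0-9_.:-]+)")


def tokenize(text):
    """Split a filter string into parentheses, '=', and bare words."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class ToyFilterParser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of filter")
        self.pos += 1
        return token

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return node

    def expr(self):
        # OR binds loosest, so it sits at the top of the grammar.
        parts = [self.term()]
        while (self.peek() or "").upper() == "OR":
            self.take()
            parts.append(self.term())
        return parts[0] if len(parts) == 1 else {"or": parts}

    def term(self):
        parts = [self.factor()]
        while (self.peek() or "").upper() == "AND":
            self.take()
            parts.append(self.factor())
        return parts[0] if len(parts) == 1 else {"and": parts}

    def factor(self):
        token = self.take()
        if token.upper() == "NOT":
            return {"not": self.factor()}
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        # Otherwise the token is a field name followed by "= value".
        if self.take() != "=":
            raise ValueError(f"expected '=' after {token!r}")
        return {token: [self.take()]}


print(ToyFilterParser("platform = snowflake AND env = PROD").parse())
```

IN lists, comparisons, and IS NULL checks are left out to keep the sketch short; each would be another branch in `factor`.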

Added

search_filter_parser: Full SQL-like filter parser with tokenizer and recursive-descent parser. Compiles human-readable filter strings into DataHub SDK Filter objects. Includes comprehensive FILTER_DOCS injected into tool descriptions so LLM agents always have the syntax reference.
Modular tool architecture: Tools are now organized into dedicated modules under tools/ (search.py, entities.py, lineage.py, dataset_queries.py, assertions.py) instead of being defined inline in mcp_server.py.
graphql_helpers: Extracted shared GraphQL execution logic, token budgeting, and response processing into a dedicated module.
tool_context: New module for tool-level context management.
view_preference: Configurable view preference system (UseDefaultView, NoView, CustomView) for controlling which DataHub view is applied during search.
tools/assertions.py: New tool module for data quality assertion checks.

Changed

search tool: The filters parameter (JSON dict) is replaced by filter (string). See highlights section above.
get_lineage tool: Also uses the new string-based filter parameter for filtering lineage results.
mcp_server.py: Significantly slimmed down — tool implementations moved to dedicated modules, GraphQL helpers extracted, filter parsing extracted.
Smoke check safety: smoke_check.py now refuses to run against non-localhost DataHub instances to prevent accidental mutation of production data.
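
A localhost guard of the kind described for smoke_check.py could look roughly like the following. This is a sketch only: `is_local_url`, `ensure_local`, and the set of accepted host names are assumptions for illustration, not the script's actual code.

```python
from urllib.parse import urlparse

# Assumption: these are the host names treated as "local".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_url(url: str) -> bool:
    """Return True only when the URL's host is a known loopback name."""
    host = urlparse(url).hostname
    return host is not None and host.lower() in LOCAL_HOSTS


def ensure_local(url: str) -> None:
    """Abort before any mutation test can touch a remote instance."""
    if not is_local_url(url):
        raise SystemExit(f"refusing to run against non-localhost instance: {url}")


ensure_local("http://localhost:8080")  # passes silently
```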

Removed

test_custom_filter_conversion.py: Removed obsolete test for the old dict-based filter format, replaced by search_filter_parser.

Full Changelog: v0.5.2...v0.5.3

25 Feb 04:51

alexsku

v0.5.2

f918eb8

v0.5.2

Fixed

HTTP transport ContextVar propagation: Fixed LookupError for _mcp_dh_client ContextVar when running with HTTP transport (stateless_http=True). Each HTTP request runs in a separate async context that doesn't inherit ContextVars from the main thread, causing DocumentToolsMiddleware and VersionFilterMiddleware to fail. Added _DataHubClientMiddleware that sets the ContextVar at the start of every MCP message.
create_app() initialization safety: The _app_initialized flag is now set only after all middleware is successfully added, so a failed setup can be retried.
--debug middleware ordering: LoggingMiddleware is now added before the other middleware, so it wraps the full request/response lifecycle.
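
The ContextVar failure mode and the per-message fix can be reproduced with the standard library alone. The sketch below is not the server's code: `_client_var`, `with_client`, and `run_in_fresh_context` are hypothetical stand-ins, and an empty `contextvars.Context()` is used to simulate a request that inherited nothing from startup.

```python
import contextvars

# Hypothetical stand-in for the server's client ContextVar (no default value).
_client_var = contextvars.ContextVar("client")


def current_client():
    """Return the client for this context; raises LookupError if none was set."""
    return _client_var.get()


def with_client(client, handler):
    """Middleware-style wrapper: bind the client, run the handler, then unbind."""
    token = _client_var.set(client)
    try:
        return handler()
    finally:
        _client_var.reset(token)


def run_in_fresh_context(func, *args):
    """Simulate a request whose context did not inherit any ContextVars."""
    return contextvars.Context().run(func, *args)


# Set once at startup, in the main context only.
_client_var.set("startup-client")

try:
    run_in_fresh_context(current_client)
    outcome = "found"
except LookupError:
    outcome = "LookupError"

# Setting the variable at the start of each message avoids the error.
per_request = run_in_fresh_context(with_client, "request-client", current_client)
print(outcome, per_request)
```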

Added

create_app() factory function: Extracted server setup into a factory function so that fastmcp dev / fastmcp run work correctly (they import the module but never call main()).
Multi-mode smoke testing: smoke_check.py now supports --url and --stdio-cmd options to test against running HTTP/SSE servers or stdio subprocesses, in addition to the default in-process mode.
test_all_modes.sh orchestrator: Runs smoke checks across all 5 transport modes (in-process, HTTP, SSE, stdio, fastmcp run), with per-mode log capture to scripts/logs/.
SMOKE_CHECK.md: Documentation with step-by-step reproduction instructions for all transport modes.
Core tool validation: Smoke check now verifies that all 8 core read-only tools are present, catching silent regressions in tool registration or middleware filtering.
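
For orientation, the mode selection implied by the --url and --stdio-cmd options might be wired up as below. Only the two flag names come from these notes; `build_parser`, `pick_mode`, the mode labels, and the choice to make the flags mutually exclusive are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the two documented options (hypothetical wiring)."""
    parser = argparse.ArgumentParser(prog="smoke_check.py")
    group = parser.add_mutually_exclusive_group()  # assumption: one mode per run
    group.add_argument("--url", help="URL of a running HTTP/SSE server")
    group.add_argument("--stdio-cmd", help="command that starts a stdio server")
    return parser


def pick_mode(args):
    """Fall back to the in-process mode when neither option is given."""
    if args.url:
        return "url"
    if args.stdio_cmd:
        return "stdio"
    return "in-process"


args = build_parser().parse_args(["--url", "http://localhost:8000/mcp"])
print(pick_mode(args))
```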

Full Changelog: v0.5.1...v0.5.2

12 Feb 02:39

alexsku

v0.5.1

31b9e56

v0.5.1

Fixed

list_schema_fields — Fixed crash when a dataset has no schema metadata. Now gracefully returns an empty fields list.
save_document — Errors (e.g., authorization failures) are now raised as exceptions instead of being silently swallowed. LLM agents now see the actual error message.
update_description — Hidden from OSS instances where entity-level description updates are not supported. Available on Cloud only.

Added

scripts/smoke_check.py — Comprehensive smoke check script that exercises all available MCP tools against a live DataHub instance. Discovers URNs dynamically, respects version filtering middleware, and tests mutation tools with add-then-remove pairs. Usage: uv run python scripts/smoke_check.py --all

Changed

Version-aware tool filtering: update_description now requires Cloud (@min_version(cloud="0.3.16")); previously it was also allowed on OSS >= 1.4.0.
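
As a rough illustration of how a decorator like this can drive tool filtering, the sketch below defines its own stand-in `min_version`. The `oss=` keyword, the `min_versions` attribute, `is_tool_available`, and the tuple-based version comparison are all assumptions, not the project's implementation.

```python
def parse_version(text):
    """Turn '0.3.16' into (0, 3, 16) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def min_version(cloud=None, oss=None):
    """Stand-in decorator: record the minimum version per deployment flavor."""
    def decorator(func):
        func.min_versions = {"cloud": cloud, "oss": oss}
        return func
    return decorator


def is_tool_available(func, flavor, server_version):
    """A flavor with no recorded minimum is treated as unsupported."""
    required = getattr(func, "min_versions", None)
    if required is None:
        return True  # undecorated tools are always available
    minimum = required.get(flavor)
    if minimum is None:
        return False
    return parse_version(server_version) >= parse_version(minimum)


@min_version(cloud="0.3.16")
def update_description():
    """Placeholder body; only the decorator metadata matters here."""


print(is_tool_available(update_description, "oss", "1.4.0"))
print(is_tool_available(update_description, "cloud", "0.3.16"))
```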

Full Changelog: v0.5.0...v0.5.1

10 Feb 19:16

alexsku

v0.5.0

af77e70

v0.5.0

New Tools

Mutation Tools

New tools for modifying metadata in DataHub. Enabled via TOOLS_IS_MUTATION_ENABLED=true.

add_tags / remove_tags — Add or remove tags from entities or schema fields. Supports bulk operations.
add_terms / remove_terms — Add or remove glossary terms from entities or schema fields.
add_owners / remove_owners — Add or remove ownership assignments. Supports different ownership types.
set_domains / remove_domains — Assign or remove domain membership for entities.
update_description — Update, append to, or remove descriptions for entities or schema fields.
add_structured_properties / remove_structured_properties — Manage typed metadata fields on entities.

Document Tools

New tools for working with documents (knowledge articles, runbooks, FAQs) stored in DataHub. Automatically hidden when no documents exist in the catalog.

search_documents — Search for documents using keyword search with filters.
grep_documents — Search within document content using regex patterns.
save_document — Save standalone documents to DataHub's knowledge base.

Enhancements

Semantic search support — Enable AI-powered semantic search for documents via SEMANTIC_SEARCH_ENABLED=true.
Document tools middleware — Automatically hides document tools when no documents exist, with a cached existence check (1-minute TTL).
Upgraded FastMCP to 2.14.5 (from 2.10.5) with MCP SDK 1.26.0 compatibility.
Relaxed pydantic pin to >=2.0,<3 (was <2.12).
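
The cached existence check behind the document tools middleware can be pictured as a small time-to-live cache. The sketch below uses only the standard library; `CachedCheck` and its injectable `clock` parameter are hypothetical, and the real middleware may be built quite differently.

```python
import time


class CachedCheck:
    """Cache the result of an expensive boolean check for ttl_seconds."""

    def __init__(self, check, ttl_seconds=60.0, clock=time.monotonic):
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._value = None
        self._expires_at = None

    def __call__(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: run the real check and restart the timer.
            self._value = self._check()
            self._expires_at = now + self._ttl
        return self._value


def documents_exist():
    """Placeholder for a query that asks the catalog whether any documents exist."""
    return True


has_documents = CachedCheck(documents_exist, ttl_seconds=60.0)
print(has_documents())
```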

Breaking Changes

Python 3.11+ is now required (previously 3.10+).
acryl-datahub >= 1.3.1.7 is now required.
MCP Inspector: Use fastmcp dev instead of mcp dev for development.

Security

Added SECURITY.md with vulnerability reporting guidelines.
Bumped authlib (1.6.0 → 1.6.6), urllib3 (2.4.0 → 2.6.3), aiohttp (3.12.7 → 3.13.3), python-multipart (0.0.20 → 0.0.22).

New Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `TOOLS_IS_MUTATION_ENABLED` | `false` | Enable mutation tools |
| `TOOLS_IS_USER_ENABLED` | `false` | Enable user tools |
| `DATAHUB_MCP_DOCUMENT_TOOLS_DISABLED` | `false` | Completely disable document tools |
| `SAVE_DOCUMENT_TOOL_ENABLED` | `true` | Enable/disable save_document |
| `SAVE_DOCUMENT_PARENT_TITLE` | `Shared` | Parent folder title for saved documents |
| `SAVE_DOCUMENT_ORGANIZE_BY_USER` | `false` | Organize saved documents by user |
| `SAVE_DOCUMENT_RESTRICT_UPDATES` | `true` | Only allow updating documents in shared folder |
| `SEMANTIC_SEARCH_ENABLED` | `false` | Enable semantic (AI-powered) search |
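
Boolean variables like the ones above are typically read with a small helper. The sketch below is an assumption about how that could be done: `env_flag` is a hypothetical name, and the set of strings accepted as true is a guess, so check the server's own parsing before relying on anything other than `true` and `false`.

```python
import os

# Assumption: strings treated as "true"; the server may accept a different set.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default, environ=None):
    """Read a boolean flag, falling back to the documented default when unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


mutation_enabled = env_flag("TOOLS_IS_MUTATION_ENABLED", default=False)
save_document_enabled = env_flag("SAVE_DOCUMENT_TOOL_ENABLED", default=True)
```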

19 Nov 02:27

alexsku

v0.4.0

8761474

New MCP tools and other improvements

Response Token Budget Management

New TokenCountEstimator class for fast token counting using character-based heuristics
Automatic result truncation via _select_results_within_budget() to prevent context window issues
Configurable token limits:
- TOOL_RESPONSE_TOKEN_LIMIT environment variable (default: 80,000 tokens)
- ENTITY_SCHEMA_TOKEN_BUDGET environment variable (default: 16,000 tokens per entity)
90% safety buffer to account for token estimation inaccuracies
Ensures at least one result is always returned
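
The budgeting rules above (character-based estimate, 90% safety buffer, at least one result) can be sketched as a generator. `estimate_tokens` and `select_within_budget` are hypothetical stand-ins for TokenCountEstimator and _select_results_within_budget(), and the ratio of 4 characters per token is an assumption rather than the server's actual heuristic.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Cheap estimate: assume roughly chars_per_token characters per token."""
    return int(len(text) / chars_per_token) + 1


def select_within_budget(results, token_limit, safety=0.9):
    """Yield results until the buffered budget is used up.

    The first result is always yielded, even if it alone exceeds the budget.
    """
    budget = token_limit * safety
    used = 0
    for index, item in enumerate(results):
        cost = estimate_tokens(item)
        if index > 0 and used + cost > budget:
            break
        used += cost
        yield item


results = ["a" * 40, "b" * 40, "c" * 40]  # about 11 estimated tokens each
kept = list(select_within_budget(results, token_limit=25))
print(len(kept))
```

Because the function is a generator, results past the cutoff are never materialized, which matches the memory-efficiency point made under Performance.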

Enhanced Search Capabilities

Enhanced Keyword Search:
- Supports pagination with start parameter
- Added viewUrn for view-based filtering
- Added sortInput for custom sorting

Query Entity Support

Native QueryEntity type support (SQL queries as first-class entities)
New query_entity.gql GraphQL query
Optimized entity retrieval with specialized query for QueryEntity types
Includes query statement, subjects (datasets/fields), and platform information

GraphQL Compatibility

Adaptive field detection for newer GMS versions
Caching mechanism for GMS version detection
Graceful fallback when newer fields aren't available
Support for #[CLOUD] and #[NEWER_GMS] conditional field markers
DISABLE_NEWER_GMS_FIELD_DETECTION environment variable override
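
One plausible reading of the conditional field markers is that a marked line in a GraphQL query is dropped when the connected server does not support it. The sketch below implements that reading only; the marker placement (trailing comment on the field's line), the function name, and the field names in the sample query are all made up for illustration.

```python
KNOWN_MARKERS = ("#[CLOUD]", "#[NEWER_GMS]")


def strip_conditional_fields(query, enabled):
    """Drop lines whose marker is not enabled; remove markers from kept lines."""
    kept = []
    for line in query.splitlines():
        present = [marker for marker in KNOWN_MARKERS if marker in line]
        if any(marker not in enabled for marker in present):
            continue  # the server cannot serve this field, so omit it
        for marker in present:
            line = line.replace(marker, "")
        kept.append(line.rstrip())
    return "\n".join(kept)


# Hypothetical query: someNewerField and someCloudField are placeholder names.
SAMPLE_QUERY = """query getDataset($urn: String!) {
  dataset(urn: $urn) {
    urn
    someNewerField  #[NEWER_GMS]
    someCloudField  #[CLOUD]
  }
}"""

print(strip_conditional_fields(SAMPLE_QUERY, enabled={"#[CLOUD]"}))
```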

Schema Field Optimization

Smart field prioritization to stay within token budgets:
1. Primary key fields (isPartOfKey=true)
2. Partitioning key fields (isPartitioningKey=true)
3. Fields with descriptions
4. Fields with tags or glossary terms
5. Alphabetically by field path
Generator-based approach for memory efficiency
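
The priority order above maps naturally onto a sort key. In the sketch below, only isPartOfKey and isPartitioningKey are names taken from these notes; the dict shape and the `description`, `tags`, `glossaryTerms`, and `fieldPath` keys are assumptions, and a plain `sorted` call stands in for the generator-based selection.

```python
def priority_key(field):
    """Lower rank sorts first; ties are broken alphabetically by field path."""
    if field.get("isPartOfKey"):
        rank = 0
    elif field.get("isPartitioningKey"):
        rank = 1
    elif field.get("description"):
        rank = 2
    elif field.get("tags") or field.get("glossaryTerms"):
        rank = 3
    else:
        rank = 4
    return (rank, field["fieldPath"])


fields = [
    {"fieldPath": "zeta"},
    {"fieldPath": "notes", "description": "free text"},
    {"fieldPath": "region", "tags": ["pii"]},
    {"fieldPath": "ds", "isPartitioningKey": True},
    {"fieldPath": "id", "isPartOfKey": True},
    {"fieldPath": "alpha"},
]

ordered = [field["fieldPath"] for field in sorted(fields, key=priority_key)]
print(ordered)
```

When the token budget runs out, fields at the end of this ordering are the first to be cut.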

Error Handling & Security

Enhanced error logging with full stack traces in async_background wrapper
Logs function name, args, and kwargs on failures
ReDoS protection in HTML sanitization with bounded regex patterns
Query truncation function (configurable via QUERY_LENGTH_HARD_LIMIT, default: 5,000 chars)

Default Views Support

Automatic default view application for all search operations
Fetches organization's default global view from DataHub
5-minute caching (configurable via VIEW_CACHE_TTL_SECONDS)
Can be disabled via DATAHUB_MCP_DISABLE_DEFAULT_VIEW environment variable
Ensures search results respect organization's data governance policies

Dependencies

Added cachetools>=5.0.0: For GMS field detection caching
Added types-cachetools (dev): Type stubs for mypy

Performance

Memory efficiency: Generator-based result selection avoids loading all results into memory
Caching: GMS version detection cached per graph instance
Fast token estimation: Character-based heuristic (no tokenizer overhead)
Smart truncation: Truncates less important schema fields first

10 Oct 04:00

mayurinehate

v0.3.10

8622b3d

v0.3.10

fix: workaround for https://github.com/jlowin/fastmcp/issues/1377 (#50)

05 Aug 16:48

hsheth2

v0.3.9

f7d528a

v0.3.9

What's Changed

feat: support get_dataset_queries, get_lineage for specific column by @mayurinehate in #37
feat: add view definition in get_entity by @mayurinehate in #38

New Contributors

@mayurinehate made their first contribution in #37

Full Changelog: v0.3.8...v0.3.9

Contributors

mayurinehate

26 Jul 00:20

hsheth2

v0.3.8

b89d749

v0.3.8

What's Changed

fix: improve filter compat by @hsheth2 in #36

Full Changelog: v0.3.7...v0.3.8

Contributors

hsheth2

25 Jul 22:00

hsheth2

v0.3.7

8a93f22

v0.3.7

What's Changed

bump acryl-datahub by @hsheth2 in #35

Full Changelog: v0.3.6...v0.3.7

Contributors

hsheth2

22 Jul 21:30

hsheth2

v0.3.6

ea18c45

v0.3.6

What's Changed

feat: make the search tool more robust by @hsheth2 in #32
bump acryl-datahub to use discriminated unions for filters by @hsheth2 in #33
perf: make all tools async by @hsheth2 in #34

Full Changelog: v0.3.5...v0.3.6

Contributors

hsheth2
