Changes from 9 commits
19 changes: 19 additions & 0 deletions .github/workflows/ci.yml
@@ -26,6 +26,25 @@ jobs:
      - name: Checks
        run: make lint-check

  unit-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v6

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: make setup

      - name: Run unit tests
        run: uv run pytest tests/test_mcp/ -v

  integration-test:
    runs-on: ubuntu-latest
    strategy:
50 changes: 50 additions & 0 deletions .github/workflows/docker.yml
@@ -0,0 +1,50 @@
name: Docker

on:
  release:
    types: [published]

jobs:
  build_and_publish:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4

      - name: Extract version from release tag
        id: version
        run: |
          echo "version=${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          build-args: |
            VERSION=${{ steps.version.outputs.version }}
          tags: |
            acryldata/mcp-server-datahub:${{ steps.version.outputs.version }}
            acryldata/mcp-server-datahub:latest
            ghcr.io/${{ github.repository }}:${{ steps.version.outputs.version }}
            ghcr.io/${{ github.repository }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
32 changes: 32 additions & 0 deletions Dockerfile
@@ -0,0 +1,32 @@
FROM python:3.11-slim

WORKDIR /app

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
Review comment: Unpinned uv:latest tag risks breaking Docker builds (Medium Severity)

The COPY --from=ghcr.io/astral-sh/uv:latest uses an unpinned :latest tag, making the Docker build non-reproducible. If uv releases a breaking change (e.g., moving the binary path from /uv, or changing CLI behavior), builds will silently break. The existing wheels.yml workflow pins astral-sh/setup-uv@v6, but this Dockerfile has no version constraint at all. Pinning to a specific version or major version tag (e.g., uv:0.6) would prevent unexpected build failures.
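A pinned variant would look like this (the tag below is only an example; use whichever release or major.minor tag you have tested):

```dockerfile
# Pin uv to a known tag so the build stays reproducible.
COPY --from=ghcr.io/astral-sh/uv:0.6 /uv /usr/local/bin/uv
```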

Author: ignoring this as other use cases use uv:latest

Contributor: why not pin? Where are the other use cases that use uv:latest?


# Copy dependency files
COPY pyproject.toml uv.lock ./

# Install dependencies (no dev deps, no editable install yet)
RUN uv sync --frozen --no-dev --no-install-project

# Copy source
COPY src/ ./src/

# Inject version at build time so setuptools-scm fallback (0.0.0) is not used.
# The .git directory is not available during Docker builds, so we write
# _version.py directly from the VERSION build arg.
ARG VERSION=0.0.0
RUN printf '__version__ = version = "%s"\n__version_tuple__ = version_tuple = tuple(int(x) if x.isdigit() else x for x in "%s".lstrip("v").split("."))\n__commit_id__ = commit_id = None\n' \
    "$VERSION" "$VERSION" \
    > src/mcp_server_datahub/_version.py

# Install the project itself
RUN uv sync --frozen --no-dev
Review comment: Docker image always reports version 0.0.0 (Medium Severity)

The project uses setuptools-scm for versioning, which derives the version from git tags and writes _version.py. Since _version.py is in .gitignore (not tracked in git) and the Dockerfile never copies the .git directory, setuptools-scm cannot determine the version when uv sync runs during the build. It falls back to fallback_version = "0.0.0" from pyproject.toml. This means every Docker image — even those tagged with a real release version by the CI workflow — will report __version__ as "0.0.0", affecting the --version CLI output and the telemetry datahub_component string.
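The Dockerfile's fix above writes `_version.py` via printf; a minimal sketch of the expression it embeds, using a made-up release tag `v1.2.3`:

```python
# Mirrors the version_tuple expression that the Dockerfile's printf
# writes into _version.py; "v1.2.3" stands in for the VERSION build arg.
version = "v1.2.3"
version_tuple = tuple(
    int(x) if x.isdigit() else x
    for x in version.lstrip("v").split(".")
)
print(version_tuple)  # → (1, 2, 3)
```

Non-numeric segments (e.g. an `rc1` suffix) are kept as strings rather than coerced to ints.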

Author: fixed this


ENV PATH="/app/.venv/bin:$PATH"

EXPOSE 8000

CMD ["mcp-server-datahub", "--transport", "http"]
44 changes: 44 additions & 0 deletions README.md
@@ -270,6 +270,50 @@ The agent may either:
| `get_lineage_paths_between` | Understand deeper relationships between datasets. |


## Docker (HTTP Deployment)

The server can be run as a standalone HTTP service using Docker. In this mode, authentication tokens are supplied **per request** rather than baked into the server — making it suitable for multi-user deployments where each client has its own DataHub token.

### Authentication

Each request must supply a DataHub token via the `Authorization` header:

```
Authorization: Bearer <your-datahub-token>
```
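As a concrete sketch, initializing a session with curl might look like the following (illustrative only; `$DATAHUB_TOKEN` is a placeholder and the exact initialize payload depends on your MCP client):

```bash
curl -sS http://localhost:8000/mcp \
  -H "Authorization: Bearer $DATAHUB_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "initialize",
       "params": {"protocolVersion": "2025-03-26", "capabilities": {},
                  "clientInfo": {"name": "curl", "version": "0.0.0"}}}'
```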

### Docker Compose (recommended)

Create a `.env` file:

```env
DATAHUB_GMS_URL=https://your-datahub-instance
```

Then run:

```bash
docker compose up
```

### Docker (manual)

```bash
docker build -t mcp-server-datahub .
docker run -p 8000:8000 \
  -e DATAHUB_GMS_URL=https://your-datahub-instance \
  mcp-server-datahub
```
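The CI release workflow injects the image version through the `VERSION` build arg; a local build can do the same (the tag value here is an example):

```bash
docker build --build-arg VERSION=v1.2.3 -t mcp-server-datahub .
```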

The server exposes two endpoints:

- `http://localhost:8000/mcp` — MCP endpoint (stateless HTTP transport)
- `http://localhost:8000/health` — Health check

### Optional environment variables

Pass any [configuration variables](#environment-variables) via `.env` or `-e` flags. For example, to enable mutation tools set `TOOLS_IS_MUTATION_ENABLED=true`.
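For instance, a `.env` that also sets the optional flags wired up in `docker-compose.yml`:

```env
DATAHUB_GMS_URL=https://your-datahub-instance
# Optional (both default to false)
TOOLS_IS_MUTATION_ENABLED=true
TOOLS_IS_USER_ENABLED=false
# Optional host port mapping
MCP_SERVER_PORT=8000
```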

## Developing

See [DEVELOPING.md](DEVELOPING.md).
9 changes: 9 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,9 @@
services:
  mcp-server-datahub:
    build: .
    ports:
      - "${MCP_SERVER_PORT:-8000}:8000"
    environment:
      DATAHUB_GMS_URL: ${DATAHUB_GMS_URL}
      TOOLS_IS_MUTATION_ENABLED: ${TOOLS_IS_MUTATION_ENABLED:-false}
      TOOLS_IS_USER_ENABLED: ${TOOLS_IS_USER_ENABLED:-false}
39 changes: 34 additions & 5 deletions scripts/smoke_check.py
@@ -14,6 +14,9 @@
    # Against a running HTTP/SSE server:
    uv run python scripts/smoke_check.py --url http://localhost:8000/mcp

    # Against a running HTTP server with Bearer token auth:
    uv run python scripts/smoke_check.py --url http://localhost:8000/mcp --token mytoken

    # Via stdio subprocess (launches server as child process):
    uv run python scripts/smoke_check.py --stdio-cmd "uv run mcp-server-datahub"

@@ -661,6 +664,7 @@ async def run_smoke_check(
    test_urn: Optional[str] = None,
    url: Optional[str] = None,
    stdio_cmd: Optional[str] = None,
    token: Optional[str] = None,
) -> SmokeCheckReport:
    """Run smoke checks against an MCP server.

@@ -678,12 +682,29 @@
    transport_target: Any  # str (URL), StdioTransport, or FastMCP instance
    if url:
        # Remote HTTP/SSE — server is already running and configured
        transport_target = url
        if token:
            from fastmcp.client.transports import StreamableHttpTransport

            transport_target = StreamableHttpTransport(
                url, headers={"Authorization": f"Bearer {token}"}
            )
        else:
            transport_target = url
        mode_label = f"HTTP/SSE → {url}"
    elif stdio_cmd:
        # Stdio subprocess — launch server as child process
        # Stdio subprocess — launch server as child process.
        # StdioTransport uses mcp's get_default_environment() which only
        # passes a minimal env (HOME, PATH, etc.), so DATAHUB_GMS_URL and
        # DATAHUB_GMS_TOKEN would be stripped. Pass them explicitly.
        parts = shlex.split(stdio_cmd)
        transport_target = StdioTransport(command=parts[0], args=parts[1:])
        _datahub_env = {
            k: v
            for k in ("DATAHUB_GMS_URL", "DATAHUB_GMS_TOKEN")
            if (v := os.environ.get(k))
        }
        transport_target = StdioTransport(
            command=parts[0], args=parts[1:], env=_datahub_env or None
        )
        mode_label = f"stdio → {stdio_cmd}"
    else:
        # In-process (original behaviour)
@@ -752,15 +773,16 @@
        # 1b. Verify core tools are present — these should never be
        # missing regardless of mode or middleware filtering.
        # Note: search_documents and grep_documents are intentionally
        # excluded — DocumentToolsMiddleware hides them when the instance
        # has no Document entities, so their absence is expected and valid.
        core_tools = {
            "search",
            "get_entities",
            "get_lineage",
            "get_dataset_queries",
            "list_schema_fields",
            "get_lineage_paths_between",
            "search_documents",
            "grep_documents",
        }
        missing_core = core_tools - available
        if missing_core:
@@ -922,13 +944,19 @@ def _parse_pypi_args() -> Optional[tuple[Optional[str], list[str]]]:
    default=None,
    help='Launch server as stdio subprocess (e.g. "uv run mcp-server-datahub")',
)
@click.option(
    "--token",
    default=None,
    help="Bearer token to send as Authorization header (only used with --url)",
)
def main(
    mutations: bool,
    user: bool,
    test_all: bool,
    urn: Optional[str],
    url: Optional[str],
    stdio_cmd: Optional[str],
    token: Optional[str],
) -> None:
    """Smoke check all MCP server tools against a live DataHub instance."""
    if test_all:
@@ -942,6 +970,7 @@ def main(
            test_urn=urn,
            url=url,
            stdio_cmd=stdio_cmd,
            token=token,
        )
    )
    report.print_report()
66 changes: 62 additions & 4 deletions scripts/test_all_modes.sh
@@ -22,6 +22,40 @@ LOG_DIR="$SCRIPT_DIR/logs"
# Extra arguments forwarded to every smoke_check invocation
EXTRA_ARGS=("$@")

# ---------------------------------------------------------------------------
# Bootstrap DATAHUB_GMS_URL / DATAHUB_GMS_TOKEN from ~/.datahubenv when they
# are not already present as environment variables. This is needed so that
# `env -u DATAHUB_GMS_TOKEN` in start_server() actually removes the token
# (if it only lives in the file, unsetting it is a no-op and the server would
# still load it via DataHubClient.from_env()).
# ---------------------------------------------------------------------------
if [[ -z "${DATAHUB_GMS_URL:-}" || -z "${DATAHUB_GMS_TOKEN:-}" ]]; then
  DATAHUBENV="${HOME}/.datahubenv"
  if [[ -f "$DATAHUBENV" ]]; then
    if [[ -z "${DATAHUB_GMS_URL:-}" ]]; then
      _url=$(python3 -c "
import yaml, sys
d = yaml.safe_load(open('$DATAHUBENV'))
print(d.get('gms', {}).get('server', '') or '')
" 2>/dev/null || true)
      [[ -n "$_url" ]] && export DATAHUB_GMS_URL="$_url"
    fi
    if [[ -z "${DATAHUB_GMS_TOKEN:-}" ]]; then
      _tok=$(python3 -c "
import yaml, sys
d = yaml.safe_load(open('$DATAHUBENV'))
print(d.get('gms', {}).get('token', '') or '')
" 2>/dev/null || true)
      [[ -n "$_tok" ]] && export DATAHUB_GMS_TOKEN="$_tok"
    fi
  fi
fi

if [[ -z "${DATAHUB_GMS_URL:-}" ]]; then
  echo "ERROR: DATAHUB_GMS_URL is not set and could not be read from ~/.datahubenv" >&2
  exit 1
fi

# Server settings (FastMCP defaults: host=127.0.0.1, port=8000)
HOST="127.0.0.1"
PORT=8000
@@ -80,9 +114,20 @@ wait_for_server() {
start_server() {
  local transport="$1"
  local log_slug="$2"
  # Optional third argument: space-separated list of env var names to unset
  # for this server instance (e.g. "DATAHUB_GMS_TOKEN").
  local unset_vars="${3:-}"

  echo " Starting server (transport=$transport, port=$PORT)..."
  cd "$PROJECT_DIR"
  uv run mcp-server-datahub --transport "$transport" \

  # Build an `env -u VAR ...` prefix for any vars that should be unset
  local env_cmd=(env)
  for var in $unset_vars; do
    env_cmd+=(-u "$var")
  done

  "${env_cmd[@]}" uv run mcp-server-datahub --transport "$transport" \
    >"$LOG_DIR/${log_slug}_server.stdout" \
    2>"$LOG_DIR/${log_slug}_server.stderr" &
  SERVER_PID=$!
@@ -144,19 +189,32 @@ run_smoke_check "HTTP (streamable-http)" --url "$HTTP_URL"
stop_server

# ---------------------------------------------------------------------------
# Mode 3: SSE
# Mode 3: HTTP with token passed as Authorization header
#
# Starts the server without DATAHUB_GMS_TOKEN so every request must carry a
# Bearer token, then passes the token via --token so smoke_check sends it as
# an Authorization header on every MCP request.
# ---------------------------------------------------------------------------
start_server "http" "http-token-auth" "DATAHUB_GMS_TOKEN"
run_smoke_check "HTTP (token as auth header)" \
  --url "$HTTP_URL" \
  --token "${DATAHUB_GMS_TOKEN:-}"
stop_server

# ---------------------------------------------------------------------------
# Mode 4: SSE
# ---------------------------------------------------------------------------
start_server "sse" "sse"
run_smoke_check "SSE" --url "$SSE_URL"
stop_server

# ---------------------------------------------------------------------------
# Mode 4: Stdio subprocess
# Mode 5: Stdio subprocess
# ---------------------------------------------------------------------------
run_smoke_check "Stdio (subprocess)" --stdio-cmd "uv run mcp-server-datahub"

# ---------------------------------------------------------------------------
# Mode 5: fastmcp run (create_app factory)
# Mode 6: fastmcp run (create_app factory)
#
# This exercises the create_app() entry point that `fastmcp dev` uses.
# Under the hood, `fastmcp dev` runs: