8 changes: 7 additions & 1 deletion .gitignore
@@ -19,7 +19,9 @@ dist/
downloads/
eggs/
.eggs/
lib/
!ui/lib/*
!ui/lib
./lib/
lib64/
parts/
sdist/
@@ -168,3 +170,7 @@ dev/cleanup.py
# docgen
docs/dqx/docs/reference/api
!docs/dqx/docs/reference/api/index.mdx

# generated UI files
src/databricks/labs/dqx/app/static/*
!src/databricks/labs/dqx/app/static/.gitkeep
56 changes: 56 additions & 0 deletions docs/dqx/docs/dev/contributing.mdx
@@ -366,3 +366,59 @@ If you need a code example, use triple backticks, e.g.:
print("Hello, world!")
```
</Admonition>

## Contributing to the DQX App

The DQX App is a web application that provides a user interface for DQX. It is built with React (frontend) and FastAPI (backend).
Review comment (Contributor):
should we mention Lakehouse app as deployment mechanism to give a bit more context?


### Prerequisites

To run the DQX App locally, you need to have the following dependencies installed:

- Node.js 20.x or later
- `yarn`
- Python 3.10+

The `ui` folder contains the frontend code for the DQX App.
The `src/databricks/labs/dqx/app` folder contains the backend code for the DQX App.

Create a `.env` file in the root of the project with the following content (the backend loads it at startup, see `src/databricks/labs/dqx/app/config.py`):
```
DQX_DEV_TOKEN=<your-databricks-token>
Review comment by mwojtyczka (Contributor), Sep 1, 2025:
we should recommend using OAuth. Earlier in the docs we have a note on this.
Is this DQX_DEV_TOKEN correct? It's not a standard env var and it is not used anywhere in this PR.

Review comment (Contributor), suggested change:
DQX_DEV_TOKEN=<your-databricks-token>
export DQX_DEV_TOKEN=<your-databricks-token>

DATABRICKS_CONFIG_PROFILE=<your-databricks-profile>
Review comment (Contributor), suggested change:
DATABRICKS_CONFIG_PROFILE=<your-databricks-profile>
export DATABRICKS_CONFIG_PROFILE=<your-databricks-profile>

```
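How these variables reach the app can be read from `config.py`: `AppConfig` is declared with `env_prefix="DQX_"`, so `DQX_DEV_TOKEN` populates its `dev_token` field, while `DATABRICKS_CONFIG_PROFILE` is a standard variable read by the Databricks SDK. The stdlib sketch below only approximates how the `.env` file is resolved (python-dotenv and pydantic-settings do the real work, including the optional `export` prefix from the suggested changes); `parse_env` and `dqx_fields` are hypothetical helpers, not DQX code.

```python
# Simplified illustration only: the app itself relies on python-dotenv and
# pydantic-settings. parse_env and dqx_fields are hypothetical helpers.


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments and
    accepting an optional leading `export ` as in the suggested changes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def dqx_fields(env: dict[str, str], prefix: str = "DQX_") -> dict[str, str]:
    """Map DQX_-prefixed variables to lower-case field names, mirroring
    env_prefix="DQX_" on AppConfig (matched case-insensitively)."""
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.upper().startswith(prefix)
    }


sample = """
export DQX_DEV_TOKEN=dapi-example
export DATABRICKS_CONFIG_PROFILE=dev
"""
print(dqx_fields(parse_env(sample)))  # {'dev_token': 'dapi-example'}
```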

Note: `DQX_DEV_TOKEN` is a Databricks personal access token, which you can generate in the Databricks UI or with the Databricks CLI:
```bash
databricks tokens create --lifetime-seconds 3600 --comment "DQX App Development Token"
Review comment (Contributor), suggested change:
databricks token create --lifetime-seconds 3600 --comment "DQX App Development Token"
databricks tokens create --lifetime-seconds 3600 --comment "DQX App Development Token"

```

### Running the DQX App locally - frontend

First, build the frontend code:
```bash
yarn --cwd ui install
yarn --cwd ui build
```

Then, run the frontend in development mode:
```bash
yarn --cwd ui dev
```

Leave this terminal running and open a new one to start the backend in development mode.

### Running the DQX App locally - backend

1. Install the app dependencies:
```bash
hatch run pip install -e ".[app]"
```

2. Run the backend in development mode:
```bash
uvicorn src.databricks.labs.dqx.app.app:app --reload
Review comment by mwojtyczka (Contributor), Sep 1, 2025:
The only way I could make it work on my machine was to enter the shell (hatch shell) or run hatch run python -m uvicorn src.databricks.labs.dqx.app.app:app --reload; otherwise I was getting zsh: command not found: uvicorn.

```

The UI should now be running at `http://localhost:5173`.
The backend should now be available at `http://localhost:8000`.
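A quick way to confirm that the backend is up is to request the version endpoint. In `app.py` the API is mounted under `/api`, so the `/version` route from `api.py` should answer at `http://localhost:8000/api/version`. The sketch below uses only the standard library; `fetch_version` is a hypothetical helper, and it assumes the default uvicorn port and that the JSON body has a `version` field, as `VersionView.version` in `api.py` suggests.

```python
import json
from urllib.request import urlopen

# Default uvicorn host/port (assumption: the backend was started without --port).
BACKEND_URL = "http://localhost:8000"


def fetch_version(base_url: str = BACKEND_URL, timeout: float = 5.0) -> str:
    """Hypothetical helper: return the version reported by GET {base_url}/api/version."""
    with urlopen(f"{base_url.rstrip('/')}/api/version", timeout=timeout) as resp:
        payload = json.load(resp)  # json.load accepts the binary response stream
    return payload["version"]
```

With the backend running, `print(fetch_version())` should print the app version; a `URLError` means nothing is listening on port 8000.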
10 changes: 10 additions & 0 deletions pyproject.toml
@@ -47,6 +47,14 @@ pii = [
# This may be required for the larger models due to Databricks Connect memory limitations.
# The models cannot be declared as a dependency here because PyPI does not support URL-based dependencies, which would prevent releases.
]
app = [
"fastapi",
Review comment (Contributor):
we should pin the versions

"uvicorn",
"pydantic-settings",
"python-dotenv",
"sqlmodel",
"sqlalchemy",
]

[project.entry-points.databricks]
runtime = "databricks.labs.dqx.workflows_runner:main"
@@ -78,6 +86,7 @@ dependencies = [
"pylint~=3.3.1",
"pylint-per-file-ignores~=1.3",
"pylint-pytest==2.0.0a0",
"pylint_pydantic",
Review comment by mwojtyczka (Contributor), Aug 29, 2025:
pin the version

"pytest~=8.3.3",
"pytest-cov~=4.1.0",
"pytest-mock~=3.14.0",
@@ -269,6 +278,7 @@ load-plugins = [
"pylint.extensions.set_membership",
"pylint.extensions.typing",
"pylint_per_file_ignores",
"pylint_pydantic"
]

# Pickle collected data for later comparisons.
33 changes: 33 additions & 0 deletions src/databricks/labs/dqx/app/api.py
@@ -0,0 +1,33 @@
from functools import partial
from fastapi import FastAPI, Request

from databricks.labs.dqx.app.models import VersionView, ProfileView
from databricks.labs.dqx.app.dependencies import get_user_workspace_client
from databricks.labs.dqx.app.config import rt
from databricks.labs.dqx.app.utils import custom_openapi


version_view = VersionView.current()

app = FastAPI(
    title="DQX | UI",
    version=version_view.version,
)


@app.get("/version", response_model=VersionView, operation_id="Version")
async def version():
    return version_view


@app.get("/profile", response_model=ProfileView, operation_id="Profile")
async def profile(request: Request):
    try:
        ws = get_user_workspace_client(request)
        return ProfileView.from_ws(ws)
    except Exception as e:
        rt.logger.error(f"Error getting user workspace client: {e}")
        return ProfileView.from_request(request)


app.openapi = partial(custom_openapi, app) # type: ignore
45 changes: 45 additions & 0 deletions src/databricks/labs/dqx/app/app.py
@@ -0,0 +1,45 @@
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi.responses import FileResponse
from fastapi.middleware.cors import CORSMiddleware
from fastapi.requests import Request
from databricks.labs.dqx.app.config import conf, rt
from databricks.labs.dqx.app.api import app as api_app


@asynccontextmanager
async def lifespan(app_instance: FastAPI):
    rt.logger.info(f"Starting DQX App with instance {app_instance}")
    yield


app = FastAPI(title="DQX App", lifespan=lifespan)
ui_app = StaticFiles(directory=conf.static_assets_path, html=True)

if conf.dev_token:
    rt.logger.info("Adding CORS middleware for development")
    origins = [
        "http://localhost:8000",
        "http://localhost:5173",
        "http://0.0.0.0:5173",
    ]

    api_app.add_middleware(
        CORSMiddleware,
        allow_origins=origins,
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

# Note the order of mounts: routes are matched in order and "/" matches every path,
# so "/api" must be mounted before the static UI.

app.mount("/api", api_app)
app.mount("/", ui_app)


@app.exception_handler(404)
async def client_side_routing(request: Request, exc: Exception):
    rt.logger.error(f"Not found: {exc} while handling request {request}, returning index.html")
    return FileResponse(conf.static_assets_path / "index.html")
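Together, the mount order and the 404 handler in `app.py` implement the usual single-page-app routing: API paths go to `api_app`, files that exist in `conf.static_assets_path` are served directly, and any other path falls back to `index.html` so that the React router can resolve it in the browser. The sketch below models only that decision; `resolve` is a hypothetical function and deliberately simplifies Starlette's matching (it ignores HTTP methods, redirects, and error responses produced inside the API sub-app).

```python
from pathlib import PurePosixPath

# Hypothetical model of the routing in app.py; not the Starlette implementation.
API_PREFIX = "/api"  # corresponds to app.mount("/api", api_app)


def resolve(path: str, static_files: set[str]) -> str:
    """Return "api" for the API sub-app, the relative file name for an existing
    static asset, or "index.html" as the client-side-routing fallback."""
    # 1. "/api" is mounted first, so it wins over the catch-all "/" mount.
    if path == API_PREFIX or path.startswith(API_PREFIX + "/"):
        return "api"
    # 2. The "/" mount serves files that exist in the static directory;
    #    with html=True the root path "/" maps to index.html.
    relative = str(PurePosixPath(path.lstrip("/") or "index.html"))
    if relative in static_files:
        return relative
    # 3. Anything else is a 404, which the exception handler turns into index.html.
    return "index.html"


files = {"index.html", "assets/app.js"}
print(resolve("/api/version", files))   # api
print(resolve("/assets/app.js", files))  # assets/app.js
print(resolve("/checks/123", files))     # index.html
```

The fallback is what lets a browser reload on a deep link such as `/checks/123` still land in the React app instead of on a 404 page.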
153 changes: 153 additions & 0 deletions src/databricks/labs/dqx/app/config.py
@@ -0,0 +1,153 @@
from __future__ import annotations

from functools import cached_property
import logging
from logging import Logger
from pathlib import Path
import uuid

from dotenv import load_dotenv
from pydantic import BaseModel, ConfigDict, Field, SecretStr, model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict
from sqlalchemy import Engine
from sqlmodel import Session, create_engine

from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.app.utils import TimedCachedProperty, configure_logging

configure_logging()

logger = logging.getLogger("dqx.app")
logger.setLevel(logging.DEBUG)


# Load environment variables from .env file in the root directory of the project
project_root = Path(__file__).parent.parent.parent.parent.parent.parent
env_file = project_root / ".env"

if env_file.exists():
    logger.info(f"Loading environment variables from {env_file}")
    load_dotenv(dotenv_path=env_file)
else:
    logger.info(f"Env file {env_file} not found, continuing with current environment variables")


class DatabaseConfig(BaseModel):
    instance_name: str = Field(default="dqx")
    port: int = Field(default=5432)
    database: str = Field(default="databricks_postgres")


class AppConfig(BaseSettings):
    model_config = SettingsConfigDict(env_file=env_file, env_prefix="DQX_", extra="allow")

    static_assets_path: Path = Field(
        default=Path(__file__).parent / "static" / "dist",
        description="Path to the static assets directory",
    )

    dev_token: SecretStr | None = Field(
        default=None,
        description="Token for local development",
    )

    db: DatabaseConfig = Field(default_factory=DatabaseConfig)


class ConnectionInfo(BaseModel):
    host: str
    port: int
    user: str
    password: str
    database: str

    def to_url(self) -> str:
        return f"postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.database}?sslmode=require"

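`ConnectionInfo.to_url` interpolates the user name and the generated credential into the URL as-is. If either value ever contains reserved characters such as `@`, `:` or `/` (an e-mail-style user name, for example), SQLAlchemy's documentation recommends percent-encoding them. A hardened variant might look like the following; `build_postgres_url` is a hypothetical helper and the sample values are made up.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Hypothetical helper: build a PostgreSQL URL with percent-encoded credentials.

    quote(..., safe="") also encodes "@", ":" and "/", so these characters
    cannot be mistaken for URL delimiters.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode=require"
    )


print(build_postgres_url("jane.doe@example.com", "p@ss/word", "db.example.com", 5432, "databricks_postgres"))
# postgresql://jane.doe%40example.com:p%40ss%2Fword@db.example.com:5432/databricks_postgres?sslmode=require
```

Another option is `sqlalchemy.engine.URL.create(...)`, which takes the components separately so no manual escaping is needed; `create_engine` accepts the resulting `URL` object.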

class DatabaseManager(BaseModel):
Review comment (Contributor):
should not belong to config

    rt: Runtime

    model_config = ConfigDict(ignored_types=(TimedCachedProperty,))

    def get_connection_info(self) -> ConnectionInfo:
        """
        Returns the connection details (host, port, user, password, database)
        for the configured database instance.
        A fresh database credential is generated on every call; the `engine`
        property caches the resulting engine for 30 minutes.
        """
        instance = self.rt.ws.database.get_database_instance(name=self.rt.conf.db.instance_name)
        cred = self.rt.ws.database.generate_database_credential(
            request_id=str(uuid.uuid4()), instance_names=[self.rt.conf.db.instance_name]
        )
        user = self.rt.ws.current_user.me().user_name
        pwd = cred.token
        host = instance.read_write_dns
        assert host is not None, "Host is not found"
        assert user is not None, "User is not found"
        assert pwd is not None, "Password is not found"

        return ConnectionInfo(
            host=host,
            port=self.rt.conf.db.port,
            user=user,
            password=pwd,
            database=self.rt.conf.db.database,
        )

    @TimedCachedProperty[Engine](ttl_seconds=30 * 60)  # 30 minutes
    def engine(self) -> Engine:
        """
        Returns the SQLAlchemy engine used for database operations.
        This engine is initialized with the database URL and can be used to create sessions.
        The engine is cached for 30 minutes to improve performance while ensuring
        credentials are refreshed periodically.
        """
        self.rt.logger.info("Creating new SQLAlchemy engine (cache expired or first time)")
        return create_engine(
            self.get_connection_info().to_url(),
            pool_size=2,
            max_overflow=0,
        )

    def session(self) -> Session:
        """
        Returns a new SQLModel session bound to the cached engine.
        Each call creates a new session; only the underlying engine is cached
        (for 30 minutes, so that credentials are refreshed periodically).
        """
        return Session(self.engine)


class Runtime(BaseModel):
    conf: AppConfig

    model_config = ConfigDict(ignored_types=(TimedCachedProperty,))

    @cached_property
    def logger(self) -> Logger:
        return logger

    @cached_property
    def ws(self) -> WorkspaceClient:
        """
        Returns the service principal client.
        """
        return WorkspaceClient()

    @model_validator(mode="after")
    def validate_conf(self) -> Runtime:
        try:
            self.ws.current_user.me()
        except Exception as e:
            self.logger.error("Cannot connect to Databricks API using service principal")
            raise e

        logger.info(f"App initialized with config: {self.conf}")
        return self


conf = AppConfig()
rt = Runtime(conf=conf)
22 changes: 22 additions & 0 deletions src/databricks/labs/dqx/app/dependencies.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
from fastapi import Request

from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.app.config import conf, rt


def get_user_workspace_client(
    request: Request,
) -> WorkspaceClient:
    """
    Returns a Databricks WorkspaceClient that authenticates on behalf of the user.
    If the request contains an X-Forwarded-Access-Token header, on-behalf-of-user authentication is used.
    Otherwise, the token from the DQX_DEV_TOKEN environment variable is used (e.g. during local development).
    """
    # Parenthesize the fallback: without the parentheses the conditional expression
    # applies to the whole `or`, and the header token is ignored when no dev token is set.
    token = request.headers.get("X-Forwarded-Access-Token") or (
        conf.dev_token.get_secret_value() if conf.dev_token else None
    )
    if not token:
        raise ValueError("No token for authentication provided in request headers or environment variables")

    rt.logger.info("Initializing user workspace client with the provided token")
    return WorkspaceClient(token=token, auth_type="pat")  # set pat explicitly to avoid issues with SP client