✨ Add application skeleton #551
base: main
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -366,3 +366,59 @@ If you need a code example, use triple backticks, e.g.: | |||||
| print("Hello, world!") | ||||||
| ``` | ||||||
| </Admonition> | ||||||
|
|
||||||
| ## Contributing to the DQX App | ||||||
|
|
||||||
| DQX App is a web application that provides a user interface for DQX. It is built with React and FastAPI. | ||||||
|
|
||||||
| ### Prerequisites | ||||||
|
|
||||||
| To run the DQX App locally, you need to have the following dependencies installed: | ||||||
|
|
||||||
| - Node.js 20.X or higher | ||||||
| - `yarn` | ||||||
| - Python 3.10+ | ||||||
|
|
||||||
| The `ui` folder contains the frontend code for the DQX App. | ||||||
| The `src/databricks/labs/dqx/app` folder contains the backend code for the DQX App. | ||||||
|
|
||||||
| Add a new `.env` file in the root of the project with the following content: | ||||||
| ``` | ||||||
| DQX_DEV_TOKEN=<your-databricks-token> | ||||||
|
We should recommend using OAuth. Earlier in the docs we have a note on this.
Suggested change
|
||||||
| DATABRICKS_CONFIG_PROFILE=<your-databricks-profile> | ||||||
|
Suggested change
|
||||||
| ``` | ||||||
|
|
||||||
| Note: `DQX_DEV_TOKEN` is a Databricks personal access token that you can generate in the Databricks UI or via the Databricks CLI: | ||||||
| ```bash | ||||||
| databricks token create --lifetime-seconds 3600 --comment "DQX App Development Token" | ||||||
|
Suggested change
|
||||||
| ``` | ||||||
|
|
||||||
| ### Running the DQX App locally - frontend | ||||||
|
|
||||||
| First, build the frontend code: | ||||||
| ```bash | ||||||
| yarn --cwd ui install | ||||||
| yarn --cwd ui build | ||||||
| ``` | ||||||
|
|
||||||
| Then, run the frontend in development mode: | ||||||
| ```bash | ||||||
| yarn --cwd ui dev | ||||||
| ``` | ||||||
|
|
||||||
| Leave this console open and open a new terminal to run the backend in development mode. | ||||||
|
|
||||||
| ### Running the DQX App locally - backend | ||||||
|
|
||||||
| 1. Sync app dependencies: | ||||||
| ```bash | ||||||
| hatch run pip install -e ".[app]" | ||||||
| ``` | ||||||
|
|
||||||
| 2. Run the backend in development mode: | ||||||
| ```bash | ||||||
| uvicorn src.databricks.labs.dqx.app.app:app --reload | ||||||
|
The only way I could make it work on my machine was to enter the shell
||||||
| ``` | ||||||
|
|
||||||
| The UI should now be running at `http://localhost:5173`. | ||||||
| The backend should now be available at `http://localhost:8000`. | ||||||
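Once both processes are running, a quick smoke test of the backend can confirm that the API answers. The sketch below assumes the API is reachable under the `/api` prefix (as the `app.mount("/api", api_app)` call in this PR suggests) and that `/version` returns JSON; the helper names `api_url` and `fetch_version` are hypothetical and not part of the PR.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def api_url(base_url: str, endpoint: str) -> str:
    """Build a URL under the assumed /api mount point of the backend."""
    return urljoin(base_url.rstrip("/") + "/", "api/" + endpoint.lstrip("/"))


def fetch_version(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """Call the version endpoint and return the decoded JSON body."""
    with urlopen(api_url(base_url, "version"), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_version()` from a Python shell while the backend is running should return the decoded response; a connection error usually means the backend is not listening on port 8000.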
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -47,6 +47,14 @@ pii = [ | |
| # This may be required for the larger models due to Databricks connect memory limitations. | ||
| # The models cannot be declared as a dependency here because PyPI does not support URL-based dependencies, which would prevent releases. | ||
| ] | ||
| app = [ | ||
| "fastapi", | ||
|
We should pin the versions.
||
| "uvicorn", | ||
| "pydantic-settings", | ||
| "python-dotenv", | ||
| "sqlmodel", | ||
| "sqlalchemy", | ||
| ] | ||
|
|
||
| [project.entry-points.databricks] | ||
| runtime = "databricks.labs.dqx.workflows_runner:main" | ||
|
|
@@ -78,6 +86,7 @@ dependencies = [ | |
| "pylint~=3.3.1", | ||
| "pylint-per-file-ignores~=1.3", | ||
| "pylint-pytest==2.0.0a0", | ||
| "pylint_pydantic", | ||
|
Pin the version.
||
| "pytest~=8.3.3", | ||
| "pytest-cov~=4.1.0", | ||
| "pytest-mock~=3.14.0", | ||
|
|
@@ -269,6 +278,7 @@ load-plugins = [ | |
| "pylint.extensions.set_membership", | ||
| "pylint.extensions.typing", | ||
| "pylint_per_file_ignores", | ||
| "pylint_pydantic" | ||
| ] | ||
|
|
||
| # Pickle collected data for later comparisons. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| from functools import partial | ||
| from fastapi import FastAPI, Request | ||
|
|
||
| from databricks.labs.dqx.app.models import VersionView, ProfileView | ||
| from databricks.labs.dqx.app.dependencies import get_user_workspace_client | ||
| from databricks.labs.dqx.app.config import rt | ||
| from databricks.labs.dqx.app.utils import custom_openapi | ||
|
|
||
|
|
||
| version_view = VersionView.current() | ||
|
|
||
| app = FastAPI( | ||
| title="DQX | UI", | ||
| version=version_view.version, | ||
| ) | ||
|
|
||
|
|
||
| @app.get("/version", response_model=VersionView, operation_id="Version") | ||
| async def version(): | ||
| return version_view | ||
|
|
||
|
|
||
| @app.get("/profile", response_model=ProfileView, operation_id="Profile") | ||
| async def profile(request: Request): | ||
| try: | ||
| ws = get_user_workspace_client(request) | ||
| return ProfileView.from_ws(ws) | ||
| except Exception as e: | ||
| rt.logger.error(f"Error getting user workspace client: {e}") | ||
| return ProfileView.from_request(request) | ||
|
|
||
|
|
||
| app.openapi = partial(custom_openapi, app) # type: ignore |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| from contextlib import asynccontextmanager | ||
| from fastapi import FastAPI | ||
| from fastapi.staticfiles import StaticFiles | ||
| from fastapi.responses import FileResponse | ||
| from fastapi.middleware.cors import CORSMiddleware | ||
| from fastapi.requests import Request | ||
| from databricks.labs.dqx.app.config import conf, rt | ||
| from databricks.labs.dqx.app.api import app as api_app | ||
|
|
||
|
|
||
| @asynccontextmanager | ||
| async def lifespan(app_instance: FastAPI): | ||
| rt.logger.info(f"Starting DQX App with instance {app_instance}") | ||
| yield | ||
|
|
||
|
|
||
| app = FastAPI(title="DQX App", lifespan=lifespan) | ||
| ui_app = StaticFiles(directory=conf.static_assets_path, html=True) | ||
|
|
||
| if conf.dev_token: | ||
| rt.logger.info("Adding CORS middleware for development") | ||
| origins = [ | ||
| "http://localhost:8000", | ||
| "http://localhost:5173", | ||
| "http://0.0.0.0:5173", | ||
| ] | ||
|
|
||
| api_app.add_middleware( | ||
| CORSMiddleware, | ||
| allow_origins=origins, | ||
| allow_credentials=True, | ||
| allow_methods=["*"], | ||
| allow_headers=["*"], | ||
| ) | ||
|
|
||
| # note the order of mounts! | ||
| app.mount("/api", api_app) | ||
| app.mount("/", ui_app) | ||
|
|
||
|
|
||
| @app.exception_handler(404) | ||
| async def client_side_routing(request: Request, exc: Exception): | ||
| rt.logger.error(f"Not found: {exc} while handling request {request}, returning index.html") | ||
| return FileResponse(conf.static_assets_path / "index.html") |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,153 @@ | ||
| from __future__ import annotations | ||
|
|
||
| from functools import cached_property | ||
| import logging | ||
| from logging import Logger | ||
| from pathlib import Path | ||
| import uuid | ||
|
|
||
| from dotenv import load_dotenv | ||
| from pydantic import BaseModel, ConfigDict, Field, SecretStr, model_validator | ||
| from pydantic_settings import BaseSettings, SettingsConfigDict | ||
| from sqlalchemy import Engine | ||
| from sqlmodel import Session, create_engine | ||
|
|
||
| from databricks.sdk import WorkspaceClient | ||
| from databricks.labs.dqx.app.utils import TimedCachedProperty, configure_logging | ||
|
|
||
| configure_logging() | ||
|
|
||
| logger = logging.getLogger("dqx.app") | ||
| logger.setLevel(logging.DEBUG) | ||
|
|
||
|
|
||
| # Load environment variables from .env file in the root directory of the project | ||
| project_root = Path(__file__).parent.parent.parent.parent.parent.parent | ||
| env_file = project_root / ".env" | ||
|
|
||
| if env_file.exists(): | ||
| logger.info(f"Loading environment variables from {env_file}") | ||
| load_dotenv(dotenv_path=env_file) | ||
| else: | ||
| logger.info(f"Env file {env_file} not found, continuing with current environment variables") | ||
|
|
||
|
|
||
| class DatabaseConfig(BaseModel): | ||
| instance_name: str = Field(default="dqx") | ||
| port: int = Field(default=5432) | ||
| database: str = Field(default="databricks_postgres") | ||
|
|
||
|
|
||
| class AppConfig(BaseSettings): | ||
| model_config = SettingsConfigDict(env_file=env_file, env_prefix="DQX_", extra="allow") | ||
|
|
||
| static_assets_path: Path = Field( | ||
| default=Path(__file__).parent / "static" / "dist", | ||
| description="Path to the static assets directory", | ||
| ) | ||
|
|
||
| dev_token: SecretStr | None = Field( | ||
| default=None, | ||
| description="Token for local development", | ||
| ) | ||
|
|
||
| db: DatabaseConfig = Field(default_factory=DatabaseConfig) | ||
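To make the `env_prefix="DQX_"` setting concrete: with this prefix, a variable such as `DQX_DEV_TOKEN` populates the `dev_token` field. The sketch below imitates that name mapping with the standard library only; it is an illustration of the idea, not how `pydantic-settings` is implemented, and `settings_from_env` is a hypothetical helper.

```python
def settings_from_env(environ: dict[str, str], prefix: str = "DQX_") -> dict[str, str]:
    """Collect prefixed variables, strip the prefix, and lower-case the field name."""
    result = {}
    for key, value in environ.items():
        # Match the prefix case-insensitively and ignore everything else.
        if key.upper().startswith(prefix.upper()):
            result[key[len(prefix):].lower()] = value
    return result
```

Variables without the prefix, such as `DATABRICKS_CONFIG_PROFILE`, are skipped by this mapping; they are meant for the Databricks SDK rather than for `AppConfig`.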
|
|
||
|
|
||
| class ConnectionInfo(BaseModel): | ||
| host: str | ||
| port: int | ||
| user: str | ||
| password: str | ||
| database: str | ||
|
|
||
| def to_url(self) -> str: | ||
| return f"postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.database}?sslmode=require" | ||
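One thing to watch in `to_url`: the user name and password are interpolated into the URL as-is. A user name that is an email address contains `@`, and a token may contain characters such as `/` or `:`, which can confuse URL parsing. A minimal sketch of percent-encoding both parts with the standard library follows; `build_postgres_url` is a hypothetical stand-in for `to_url`, not the PR's code.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL URL with the user and password percent-encoded."""
    # safe="" makes quote() escape every reserved character, including "/" and "@".
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{database}?sslmode=require"
```

An alternative would be to let SQLAlchemy assemble the URL from its components instead of formatting a string by hand.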
|
|
||
|
|
||
| class DatabaseManager(BaseModel): | ||
|
Should not belong to config.
||
| rt: Runtime | ||
|
|
||
| model_config = ConfigDict(ignored_types=(TimedCachedProperty,)) | ||
|
|
||
| def get_connection_info(self) -> ConnectionInfo: | ||
| """ | ||
| Returns the connection details (host, port, user, password, database) for the database instance. | ||
| Each call requests a fresh database credential from the workspace; the result is not cached here. | ||
| Caching happens in `engine`, which is rebuilt (with new credentials) after 30 minutes. | ||
| """ | ||
| instance = self.rt.ws.database.get_database_instance(name=self.rt.conf.db.instance_name) | ||
| cred = self.rt.ws.database.generate_database_credential( | ||
| request_id=str(uuid.uuid4()), instance_names=[self.rt.conf.db.instance_name] | ||
| ) | ||
| user = self.rt.ws.current_user.me().user_name | ||
| pwd = cred.token | ||
| host = instance.read_write_dns | ||
| assert host is not None, "Host is not found" | ||
| assert user is not None, "User is not found" | ||
| assert pwd is not None, "Password is not found" | ||
|
|
||
| return ConnectionInfo( | ||
| host=host, | ||
| port=self.rt.conf.db.port, | ||
| user=user, | ||
| password=pwd, | ||
| database=self.rt.conf.db.database, | ||
| ) | ||
|
|
||
| @TimedCachedProperty[Engine](ttl_seconds=30 * 60) # 30 minutes | ||
| def engine(self) -> Engine: | ||
| """ | ||
| Returns the SQLAlchemy engine used for database operations. | ||
| This engine is initialized with the database URL and can be used to create sessions. | ||
| The engine is cached for 30 minutes to improve performance while ensuring | ||
| credentials are refreshed periodically. | ||
| """ | ||
| self.rt.logger.info("Creating new SQLAlchemy engine (cache expired or first time)") | ||
| return create_engine( | ||
| self.get_connection_info().to_url(), | ||
| pool_size=2, | ||
| max_overflow=0, | ||
| ) | ||
|
|
||
| def session(self) -> Session: | ||
| """ | ||
| Returns a new SQLModel session bound to the cached engine. | ||
| A session is created on every call; only the underlying engine is cached (for 30 minutes), | ||
| so credentials are refreshed when the engine is rebuilt. | ||
| """ | ||
| return Session(self.engine) | ||
|
|
||
|
|
||
| class Runtime(BaseModel): | ||
| conf: AppConfig | ||
|
|
||
| model_config = ConfigDict(ignored_types=(TimedCachedProperty,)) | ||
|
|
||
| @cached_property | ||
| def logger(self) -> Logger: | ||
| return logger | ||
|
|
||
| @cached_property | ||
| def ws(self) -> WorkspaceClient: | ||
| """ | ||
| Returns the service principal client. | ||
| """ | ||
| return WorkspaceClient() | ||
|
|
||
| @model_validator(mode="after") | ||
| def validate_conf(self) -> Runtime: | ||
| try: | ||
| self.ws.current_user.me() | ||
| except Exception as e: | ||
| self.logger.error("Cannot connect to Databricks API using service principal") | ||
| raise e | ||
|
|
||
| logger.info(f"App initialized with config: {self.conf}") | ||
| return self | ||
|
|
||
|
|
||
| conf = AppConfig() | ||
| rt = Runtime(conf=conf) | ||
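The `TimedCachedProperty` helper used for `engine` is imported from `utils` and is not shown in this part of the diff. For readers unfamiliar with the pattern, here is a minimal sketch of a time-limited cached property for plain Python classes. It is an assumption about the general technique, not the PR's implementation; the name `TtlCachedProperty` and the injectable `clock` parameter are hypothetical.

```python
import time
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TtlCachedProperty(Generic[T]):
    """Descriptor that caches a computed value per instance for ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the expiry can be tested without sleeping
        self.func: Optional[Callable[[Any], T]] = None
        self.attr_name = ""

    def __call__(self, func: Callable[[Any], T]) -> "TtlCachedProperty[T]":
        # Used as @TtlCachedProperty[...](ttl_seconds=...): receive the wrapped method.
        self.func = func
        self.attr_name = f"_ttl_cache_{func.__name__}"
        return self

    def __get__(self, instance: Any, owner: Any = None) -> Any:
        if instance is None:
            return self
        now = self.clock()
        cached = instance.__dict__.get(self.attr_name)
        # Reuse the stored value while it is younger than the TTL.
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0]
        value = self.func(instance)
        instance.__dict__[self.attr_name] = (value, now)
        return value
```

For the engine, this pattern bounds how long an engine built with an old credential stays in use: after the TTL expires, the next access rebuilds the value. The sketch stores its cache in the instance `__dict__`, which is fine for plain classes; a Pydantic model may need different storage.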
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| from fastapi import Request | ||
|
|
||
| from databricks.sdk import WorkspaceClient | ||
| from databricks.labs.dqx.app.config import conf, rt | ||
|
|
||
|
|
||
| def get_user_workspace_client( | ||
| request: Request, | ||
| ) -> WorkspaceClient: | ||
| """ | ||
| Returns a Databricks Workspace client that authenticates on behalf of the user. | ||
| If the request contains an X-Forwarded-Access-Token header, on-behalf-of-user authentication is used. | ||
| Otherwise, the client is created using the `DQX_DEV_TOKEN` environment variable (e.g. during local development). | ||
|
||
| """ | ||
| token = ( | ||
| request.headers.get("X-Forwarded-Access-Token") or (conf.dev_token.get_secret_value() if conf.dev_token else None) | ||
| ) | ||
| if not token: | ||
| raise ValueError("No token for authentication provided in request headers or environment variables") | ||
|
|
||
| rt.logger.info("Received OBO token, initializing client with it") | ||
| return WorkspaceClient(token=token, auth_type="pat") # set pat explicitly to avoid issues with SP client | ||
Should we mention Lakehouse app as the deployment mechanism to give a bit more context?
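A note on the token lookup in `get_user_workspace_client`: when `or` is combined with a conditional expression, grouping matters. Python parses `a or b if c else None` as `(a or b) if c else None`, so without inner parentheses a forwarded header token would be discarded whenever no dev token is configured. The sketch below isolates the two groupings with plain strings; the function names are hypothetical.

```python
from typing import Optional


def resolve_without_parentheses(header_token: Optional[str], dev_token: Optional[str]) -> Optional[str]:
    # The conditional expression binds last, so this reads as
    # `(header_token or dev_token) if dev_token else None`.
    return header_token or dev_token if dev_token else None


def resolve_with_parentheses(header_token: Optional[str], dev_token: Optional[str]) -> Optional[str]:
    # Explicit grouping: prefer the header token, then fall back to the dev token.
    return header_token or (dev_token if dev_token else None)
```

The two functions differ only when a header token is present and no dev token is set, which is the situation of a deployed app that receives the forwarded user token.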