| 1 | +# Querybook |
| 2 | + |
| 3 | +Querybook is Pinterest's open-source Big Data IDE for discovering, creating, and sharing data analyses. It combines a rich-text editor, SQL query engine, charting, scheduling, and table documentation in a single web app. |
| 4 | + |
| 5 | +## Tech Stack |
| 6 | + |
| 7 | +- **Backend:** Python 3.10, Flask, SQLAlchemy (MySQL), Celery (Redis broker), Elasticsearch/OpenSearch, gevent + Flask-SocketIO (WebSockets), uWSGI (production) |
| 8 | +- **Frontend:** React 17, TypeScript, Redux, Webpack 5, CodeMirror (SQL editor), Draft.js (rich text), Chart.js/D3/ReactFlow |
| 9 | + |
| 10 | +## Directory Layout |
| 11 | + |
| 12 | +- `querybook/server/` — Flask backend |
| 13 | + - `app/` — app setup |
| 14 | + - `datasources/` — REST API endpoints |
| 15 | + - `logic/` — business logic |
| 16 | + - `models/` — SQLAlchemy models |
| 17 | + - `tasks/` — Celery tasks |
| 18 | + - `lib/` — utilities, executors, metastores |
| 19 | + - `env.py` — `QuerybookSettings` configuration |
| 20 | +- `querybook/webapp/` — React/TypeScript frontend |
| 21 | + - `components/` — React components |
| 22 | + - `hooks/` — custom React hooks |
| 23 | + - `redux/` — Redux store, actions, reducers |
| 24 | + - `lib/` — frontend utilities |
| 25 | + - `ui/` — reusable UI primitives |
| 26 | + - `resource/` — API client layer |
| 27 | +- `querybook/config/` — YAML config files |
| 28 | +- `plugins/` — plugin stubs (extension point for custom behavior) |
| 29 | +- `requirements/` — pip requirements (`base.txt`, `prod.txt`, `engine/*.txt`, `auth/*.txt`) |
| 30 | +- `containers/` — Docker Compose files (dev, prod, test) |
| 31 | +- `docs_website/` — Docusaurus documentation site |
| 32 | +- `helm/` / `k8s/` — Kubernetes deployment manifests |
| 33 | + |
| 34 | +## Plugin System |
| 35 | + |
| 36 | +Querybook is extended via plugins without forking. The env var `QUERYBOOK_PLUGIN` (default `./plugins`) points to a directory where plugin modules are discovered by `lib.utils.import_helper.import_module_with_default()`. |
| 37 | + |
| 38 | +Each plugin module exports a well-known variable (e.g. `ALL_PLUGIN_EXECUTORS`) that the server merges with built-in defaults. |
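The merge described above can be sketched roughly as follows. This is a minimal illustration, not Querybook's actual code: `load_plugin_attr` and `BUILTIN_EXECUTORS` are hypothetical names, and the real `import_module_with_default()` may have a different signature and behavior.

```python
import importlib
from typing import Any


def load_plugin_attr(module_name: str, attr: str, default: Any) -> Any:
    """Return `attr` from the named module, or `default` if either is missing.

    Hypothetical helper for illustration; not Querybook's real signature.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return default
    return getattr(module, attr, default)


# Hypothetical built-in list; the real defaults live in the server code.
BUILTIN_EXECUTORS = ["BuiltinExecutor"]

# Falls back to [] when no `executor_plugin` module is importable.
plugin_executors = load_plugin_attr("executor_plugin", "ALL_PLUGIN_EXECUTORS", [])
ALL_EXECUTORS = BUILTIN_EXECUTORS + list(plugin_executors)
```

Falling back to a default keeps a missing or empty plugin directory from breaking startup, which is presumably why the stub modules under `plugins/` can stay empty.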
| 39 | + |
| 40 | +Key plugin types: `executor_plugin`, `metastore_plugin`, `auth_plugin`, `api_plugin`, `exporter_plugin`, `result_store_plugin`, `notifier_plugin`, `event_logger_plugin`, `stats_logger_plugin`, `job_plugin`, `tasks_plugin`, `dag_exporter_plugin`, `ai_assistant_plugin`, `vector_store_plugin`, `webpage_plugin`, `monkey_patch_plugin`, `query_validation_plugin`, `query_transpilation_plugin`, `engine_status_checker_plugin`, `table_uploader_plugin`. |
| 41 | + |
| 42 | +## Configuration |
| 43 | + |
| 44 | +Priority: **env vars > `querybook_config.yaml` > `querybook_default_config.yaml`**. |
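As a rough illustration of that priority order, the sketch below resolves a single key from three layers. It assumes the two YAML files have already been parsed into plain dicts. `resolve_setting` and the example key are hypothetical, and the real `QuerybookSettings` code may also coerce types or validate values.

```python
import os
from typing import Any, Mapping, Optional


def resolve_setting(
    key: str,
    user_config: Mapping[str, Any],
    default_config: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Look up `key` with priority: env vars > user YAML > default YAML."""
    env = os.environ if environ is None else environ
    if key in env:
        return env[key]  # environment values are always strings
    if key in user_config:
        return user_config[key]
    return default_config.get(key)  # None when the key is unknown everywhere


# Hypothetical key and values, standing in for parsed YAML contents.
defaults = {"EXAMPLE_SETTING": "from-default-yaml"}
overrides = {"EXAMPLE_SETTING": "from-user-yaml"}
value = resolve_setting("EXAMPLE_SETTING", overrides, defaults, environ={})
```

With an empty environment, `value` comes from the user YAML layer. Passing `environ={"EXAMPLE_SETTING": "from-env"}` would take precedence over both files.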
| 45 | + |
| 46 | +Key settings live in `querybook/server/env.py` (`QuerybookSettings`). |
| 47 | + |
| 48 | +## Running Locally |
| 49 | + |
| 50 | +Start the full stack (web server, worker, scheduler, and all dependencies) with Docker Compose: |
| 51 | + |
| 52 | +```bash |
| 53 | +make |
| 54 | +``` |
| 55 | + |
| 56 | +This brings up everything and serves the app at http://localhost:10001. This is the primary command for local development. |
| 57 | + |
| 58 | +To restart individual services without bouncing the full stack: |
| 59 | + |
| 60 | +```bash |
| 61 | +make web # web server only |
| 62 | +make worker # celery worker |
| 63 | +make scheduler # celery beat |
| 64 | +``` |
| 65 | + |
| 66 | +## Making Commits |
| 67 | + |
| 68 | +When preparing a PR, run the relevant checks. CI runs all of the following via GitHub Actions (`.github/workflows/`), but a maintainer must trigger the CI run manually. |
| 69 | + |
| 70 | +Always run tests via `make test`, which builds a `querybook-test` Docker image and runs checks inside it. This ensures an isolated, reproducible environment. Do not run test commands (pytest, yarn, webpack) directly on the host. |
| 71 | + |
| 72 | +`make test` runs both backend and frontend checks: |
| 73 | +- **Backend** (anything under `querybook/server/`): pytest |
| 74 | +- **Frontend** (anything under `querybook/webapp/`): TypeScript type checking, Jest unit tests, ESLint, and production build verification |
| 75 | + |
| 76 | +**Formatting (all changes) — common CI failure:** |
| 77 | + |
| 78 | +`make test` does **not** run Prettier. CI runs Prettier separately via `pre-commit`, so formatting issues are a frequent cause of CI failures. After running `make test`, also run Prettier on changed files before pushing: |
| 79 | + |
| 80 | +```bash |
| 81 | +npx prettier --write <files> |
| 82 | +``` |
| 83 | + |
| 84 | +For a full formatting and lint pass (Black for Python, Prettier for JS/TS, flake8 for Python linting): |
| 85 | + |
| 86 | +```bash |
| 87 | +pre-commit run --all-files |
| 88 | +``` |
| 89 | + |
| 90 | +## Maintaining This File |
| 91 | + |
| 92 | +**Include:** |
| 93 | +- Repo purpose, tech stack, and high-level architecture |
| 94 | +- Directory layout (key paths only) |
| 95 | +- How to run, test, and lint locally |
| 96 | +- Commit and PR workflow expectations |
| 97 | +- Plugin system overview and extension points |
| 98 | + |
| 99 | +**Do not include:** |
| 100 | +- Detailed API docs or function-level documentation |
| 101 | +- Inline code examples longer than 5 lines |
| 102 | +- Deployment runbooks or operational procedures (keep in README or docs/) |
| 103 | +- Credentials, secrets, or internal URLs |
| 104 | +- Information that changes frequently (version numbers, dependency lists) |
| 105 | +- Content already covered in README.md |
| 106 | +- Content that can be easily derived by AI agents (e.g. reading file trees, package.json) |
| 107 | +- References to internal/proprietary repos — this is an open-source project |