- Who this is for: AI agents and developers working inside this repo.
- What you get: The minimum set of facts, files, and commands to navigate, modify, and run KFP locally.
- Last updated: 2026-03-13
- Scope: KFP master branch (v2 engine), backend (Go), SDK (Python), frontend (React 17)
- If you change commands, file paths, Make targets, environment variables, or workflows in this repo, update this guide in the relevant sections (Local development, Local testing, Local execution, Regenerate protobufs, Frontend development, CI/CD).
- When you add or change generated files, update the "🚫 NEVER EDIT DIRECTLY (Generated files)" section with sources and regeneration commands.
- When you change CI matrices (Kubernetes versions, pipeline stores, proxy/cache toggles, Argo versions) or add/remove workflows, update the CI/CD section.
- If you come across new common errors or fixes, extend "Common error patterns and quick fixes".
- Always bump the "Last updated" date above when you make substantive changes.
- Always reuse existing functions, helpers, and utilities before writing new code. Search the codebase for existing implementations that accomplish the same goal.
- Do not duplicate logic that already exists elsewhere in the repo. If a function, method, or pattern is already implemented, import and call it rather than reimplementing it.
- When adding new functionality, check related packages and modules for shared code that can be leveraged.
- If existing code needs slight modifications to be reusable, prefer refactoring the existing code to be more general over duplicating it with changes.
- Use descriptive variable and function names. Avoid abbreviations or single-letter names — prefer full, meaningful names that clearly convey purpose (e.g., `executionID` over `execID`, `fingerPrint` over `fp`).
- Every new non-trivial function, method, or exported API must have accompanying unit tests before merging. Trivial helpers and glue code may be excluded when testing adds no meaningful value.
- All existing tests must pass locally before pushing changes. Run the relevant test suites listed in the Local testing and Quick reference sections (Go backend, Python SDK, `kfp-kubernetes`, and frontend).
- When modifying existing functions, verify that existing tests still pass and add new test cases if the behavior changes.
- Do not submit changes that break existing tests. If a test failure is pre-existing and unrelated to your changes, note it explicitly in the PR description.
- Always sign off on commits with `git commit -s` (adds a `Signed-off-by:` trailer).
- Never include AI agents (e.g. Claude Code, Copilot, or similar tools) as co-authors on commits. The human author is responsible for the work.
- See `CONTRIBUTING.md` at the repo root for DCO sign-off requirements and general PR conventions.
- Start by inspecting the architecture diagram at `images/kfp-cluster-wide-architecture.drawio.xml` (rendered version: `images/kfp-cluster-wide-architecture.png`).
- SDK:
  - Compiles Python DSL to the pipeline spec (IR YAML). See `sdk/python/kfp/compiler/pipeline_spec_builder.py`.
  - The pipeline spec schema is defined via Protobufs under `api/`.
  - Can execute pipelines locally via Subprocess or Docker runner modes.
- API Server:
  - On run creation, compiles the pipeline spec to Argo Workflows `Workflow` objects.
  - Uploads and runs pipelines remotely on a Kubernetes cluster.
- Driver:
  - Resolves input parameters.
  - Computes the pod spec patch based on component resource requests/limits.
  - All other Kubernetes configuration originates from the platform spec implemented by `kubernetes_platform`.
- Launcher:
  - Not used by Subprocess/Docker runners.
  - Downloads input artifacts, uploads outputs, invokes the Python executor, and handles executor results.
- Python executor:
  - Entrypoint: `sdk/python/kfp/dsl/executor_main.py`.
  - Never involved during the pipeline compilation stage.
  - During task runtime, `kfp` is installed with `--no-deps`, and `_KFP_RUNTIME=true` disables most SDK imports.
  - API Server mode: the Go launcher (copied via init container) executes the executor inside the user container defined by the component `base_image` (there is a default).
  - Subprocess/Docker runners: the launcher is skipped; the executor runs directly.
- All Python packages are installed under the `kfp` namespace.
- KFP Python packages:
  - `kfp`: Primary SDK (DSL, client, local execution).
  - `kfp-pipeline-spec`: Protobuf-defined API contract used by SDK and backend.
  - `kfp-kubernetes`: Kubernetes extension layer for `kfp`, located at `kubernetes_platform/python`, for Kubernetes-specific settings and the platform spec.
  - The `kfp-kubernetes` package imports generated Python code from `kfp-pipeline-spec` and renames imports via `kubernetes_platform/python/generate_proto.py` to resolve inconsistencies.
- Always use a `.venv` virtual environment:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  python -m pip install -U pip setuptools wheel
  make -C api python-dev
  make -C kubernetes_platform python-dev
  pip install -e api/v2alpha1/python --config-settings editable_mode=strict
  pip install -e sdk/python --config-settings editable_mode=strict
  pip install -e kubernetes_platform/python --config-settings editable_mode=strict
  ```

- Ginkgo CLI for running Go-based test suites.
  Install locally into `./bin`:

  ```bash
  make ginkgo
  export PATH="$PWD/bin:$PATH"  # ensure the ginkgo binary is on PATH
  ```

  Or install directly with `go install` into a project-local `./bin`:

  ```bash
  GOBIN=$PWD/bin go install github.com/onsi/ginkgo/v2/ginkgo@latest
  export PATH="$PWD/bin:$PATH"
  ```

KFP provides Make targets for setting up local Kind clusters for development and testing:
For deploying the latest master branch in standalone mode (single-user, no authentication):
```bash
make -C backend kind-cluster-agnostic
```

This target:

- Creates a Kind cluster named `dev-pipelines-api`
- Deploys KFP in standalone mode using `manifests/kustomize/env/platform-agnostic`
- Sets up MySQL database and metadata services
- Switches kubectl context to the `kubeflow` namespace
For local API server development with additional debugging capabilities:
```bash
make -C backend dev-kind-cluster
```

This target:

- Creates a Kind cluster with webhook proxy support
- Installs cert-manager for certificate management
- Deploys KFP using `manifests/kustomize/env/dev-kind`
- Includes webhook proxy for advanced debugging scenarios
KFP supports two main deployment modes:
Standalone Mode:
- Single-user deployment without authentication
- Simpler setup, ideal for development and testing
- Uses manifests from `manifests/kustomize/env/platform-agnostic` or `manifests/kustomize/env/cert-manager/platform-agnostic-k8s-native`
- All users have full access to all pipelines and experiments
Multi-user Mode:
- Multi-tenant deployment with authentication and authorization
- Requires integration with identity providers (e.g., Dex, OIDC)
- Uses manifests from `manifests/kustomize/env/cert-manager/platform-agnostic-multi-user` or `manifests/kustomize/env/cert-manager/platform-agnostic-multi-user-k8s-native`
- Includes user isolation, namespace-based access control, and Istio integration
- Suitable for production environments with multiple users/teams
See `manifests/kustomize/README.md` for full installation options and environment-specific instructions.
- Python (SDK):

  ```bash
  pip install -r sdk/python/requirements-dev.txt
  pytest -v sdk/python/kfp
  ```

- Python (`kfp-kubernetes`):

  ```bash
  pytest -v kubernetes_platform/python/test
  ```

- Go (backend) unit tests only, excluding integration/API/Compiler/E2E tests:
  ```bash
  go test -v $(go list ./backend/... | \
    grep -v backend/test/v2/api | \
    grep -v backend/test/integration | \
    grep -v backend/test/v2/integration | \
    grep -v backend/test/initialization | \
    grep -v backend/test/v2/initialization | \
    grep -v backend/test/compiler | \
    grep -v backend/test/end2end)
  ```

Notes:
- API Server tests under `backend/test/v2/api` are integration tests run with Ginkgo; they require a running cluster and are not part of unit tests.
- Compiler tests live under `backend/test/compiler` and E2E tests under `backend/test/end2end`; both are Ginkgo-based and excluded from unit presubmits.
- See `test/README.md` for integration/E2E test infrastructure details (Kind clusters, GitHub Actions setup).
- Compiler tests:
  ```bash
  # Run compiler tests
  ginkgo -v ./backend/test/compiler
  # Update compiled workflow goldens when intended
  ginkgo -v ./backend/test/compiler -- -updateCompiledFiles=true
  # Auto-create missing goldens (default true); disable with:
  ginkgo -v ./backend/test/compiler -- -createGoldenFiles=false
  ```

- v2 API integration tests (label-filterable):
  ```bash
  # All API tests
  ginkgo -v ./backend/test/v2/api
  # Example: run only Smoke-labeled tests with ginkgo
  ginkgo -v --label-filter="Smoke" ./backend/test/v2/api
  ```

- End-to-end tests:
  ```bash
  ginkgo -v ./backend/test/end2end -- -namespace=kubeflow -isDebugMode=true
  ```

Test data is centralized under:

- `test_data/pipeline_files/valid/` (inputs), with a `valid/critical/` subset for smoke lanes
- `test_data/compiled-workflows/` (expected compiled Argo Workflows)
- Subprocess Runner (no Docker required):

  ```python
  from kfp import local

  local.init(runner=local.SubprocessRunner())
  # Run components directly
  task = my_component(param="value")
  print(task.output)
  ```

- Docker Runner (requires Docker):
  ```python
  from kfp import local

  local.init(runner=local.DockerRunner())
  # Runs components in containers
  task = my_component(param="value")
  ```

- Pipeline execution:
  ```python
  # Pipelines can be executed like regular functions
  run = my_pipeline(input_param="test")
  # If the pipeline has a single output:
  print(run.output)
  # Or, for named outputs:
  print(run.outputs['<output_name>'])
  ```

Note: Local execution outputs are stored in `./local_outputs` by default.
Notes:
- SubprocessRunner supports only Lightweight Python Components (executes the KFP Python executor directly).
- Use DockerRunner for Container Components or when task images require containerized execution.
- Pipeline spec Protobufs live under `api/` (see `api/README.md` for proto code generation details).
- Backend API generation tools and prerequisites are documented in `backend/api/README.md`.
- Run both Python and Go generations:

  ```bash
  make -C api python && make -C api golang
  ```

- Note for Linux with SELinux: protoc-related steps may fail under enforcing mode.
  - Temporarily disable before generation: `sudo setenforce 0`
  - Re-enable after: `sudo setenforce 1`
- `api/v2alpha1/python/kfp/pipeline_spec/pipeline_spec_pb2.py` is NOT committed. Any workflow or script installing `kfp/api` from source must generate this file beforehand.
The following files are generated; edit their sources and regenerate:
- `api/v2alpha1/python/kfp/pipeline_spec/pipeline_spec_pb2.py`
  - Source: `api/v2alpha1/pipeline_spec.proto`
  - Generate: `make -C api python` (or `make -C api python-dev` for editable local dev)
- `kubernetes_platform/python/kfp/kubernetes/kubernetes_executor_config_pb2.py`
  - Source: `kubernetes_platform/proto/kubernetes_executor_config.proto`
  - Generate: `make -C kubernetes_platform python` (or `make -C kubernetes_platform python-dev`)
- Frontend API clients under `frontend/src/apis` and `frontend/src/apisv2beta1`
  - Sources: Swagger specs under `backend/api/**/swagger/*.json`
  - Generate: `cd frontend && npm run apis` / `npm run apis:v2beta1` / `npm run apis:all` (uses pinned Docker image `openapitools/openapi-generator-cli:v7.19.0`)
- Frontend MLMD proto outputs under `frontend/src/third_party/mlmd/generated`
  - Sources: `third_party/ml-metadata/*.proto`
  - Generate: `cd frontend && npm run build:protos`
- Architecture diagram: `images/kfp-cluster-wide-architecture.png`
- SDK compiler: `sdk/python/kfp/compiler/pipeline_spec_builder.py`
- DSL core: `sdk/python/kfp/dsl/` (e.g., `component_factory.py`, `pipeline_task.py`, `pipeline_context.py`)
- Executor entrypoint: `sdk/python/kfp/dsl/executor_main.py`
- Platform integration (Python): `kubernetes_platform/python/kfp/`
- Platform spec proto: `kubernetes_platform/proto/` (see `kubernetes_platform/README.md` for proto generation and library setup)
- API definitions (Protobufs): `api/`
- Backend (API server, driver, launcher, etc.): `backend/` (see `backend/README.md` for build, test, and local development setup)
- Backend test suites: `backend/test/compiler`, `backend/test/v2/api`, `backend/test/end2end`
- Frontend: `frontend/` (React TypeScript; see `frontend/README.md` for quick start and `frontend/CONTRIBUTING.md` for detailed guidelines)
- Manifests (Kustomize bases/overlays for deployments): `manifests/`
- CI manifests and overlays used by workflows: `.github/resources/manifests/{kubernetes-native,multiuser,standalone}`
- Test data (inputs/goldens): `test_data/pipeline_files/valid/`, `test_data/compiled-workflows/`
- SDK reference docs are auto-generated with Sphinx using autodoc from Python docstrings. Keep SDK docstrings user-facing and accurate, as they appear in published documentation.
The KFP frontend is a React TypeScript application that provides the web UI for Kubeflow Pipelines.
- Node.js version specified in `frontend/.nvmrc` (currently v22.19.0)
- Docker (required for frontend API client generation via OpenAPI Generator container)
- Use nvm or fnm for Node version management:

  ```bash
  # With fnm (faster)
  fnm install 22.19.0 && fnm use 22.19.0
  # With nvm
  nvm install 22.19.0 && nvm use 22.19.0
  ```
```bash
cd frontend
npm ci  # Install exact dependencies from package-lock.json
```

Quick start for UI development without backend dependencies:

```bash
npm run mock:api  # Start mock backend server on port 3001
npm start         # Start Vite dev server on port 3000 (hot reload)
```

For full integration testing against a real KFP deployment:
- Single-user mode:

  ```bash
  # Deploy KFP standalone (see Local cluster deployment section)
  make -C backend kind-cluster-agnostic
  # Scale down cluster UI
  kubectl -n kubeflow scale --replicas=0 deployment/ml-pipeline-ui
  # Start local development
  npm run start:proxy-and-server  # Proxy to cluster + hot reload
  ```

- Multi-user mode:

  ```bash
  export VITE_NAMESPACE=kubeflow-user-example-com
  npm run build
  # Install mod-header Chrome extension for auth headers
  npm run start:proxy-and-server
  ```
- React 17 with TypeScript
- Material-UI v3 for components
- React Router v5 for navigation
- Dagre for graph layout visualization
- D3 for data visualization
- Vitest + Testing Library for UI testing
- Jest for frontend server tests (UI tests migrated off Jest/Enzyme)
- Prettier + ESLint for code formatting/linting
- Storybook for component development
- Tailwind CSS for utility-first styling
- `npm start` - Start Vite dev server with hot reload (port 3000)
- `npm run start:proxy-and-server` - Full development with cluster proxy
- `npm run mock:api` - Start mock backend API server (port 3001)
- `npm run build` - Production build
- `npm run test` - Run Vitest UI tests (same as `test:ui`, with `LC_ALL` set)
- `npm run test:ui` - Run Vitest UI tests
- `npm run test:ui:coverage` - Run Vitest UI tests with coverage
- `npm run test:ui:coverage:loop` - Run Vitest UI coverage with a capped worker count (stability loop)
- `npm run test -u` - Update Vitest snapshots
- `npm run lint` - Run ESLint
- `npm run typecheck` - Run TypeScript typecheck (`tsc --noEmit`)
- `npm run check:react-peers` - Enforce lockfile React peer compatibility for the current target (React 17 today)
- `npm run check:react-peers:18` - Preview lockfile React peer compatibility against React 18
- `npm run check:react-peers:19` - Preview lockfile React peer compatibility against React 19
- `npm run format` - Format code with Prettier
- `npm run storybook` - Start Storybook on port 6006
The frontend includes several generated code components:
- API clients: Generated from backend Swagger specs

  ```bash
  npm run apis:all      # Generate all frontend + server API clients
  npm run apis          # Generate v1 API clients
  npm run apis:v2beta1  # Generate v2beta1 API clients
  ```

  Note: These commands use Docker image `openapitools/openapi-generator-cli:v7.19.0`. Ensure Docker is running.

- Protocol Buffers: Generated from proto definitions

  ```bash
  npm run build:protos                             # MLMD protos
  npm run build:pipeline-spec                      # Pipeline spec protos
  npm run build:platform-spec:kubernetes-platform  # K8s platform spec
  ```
- UI tests: `npm run test:ui` or `npm test` (Vitest + Testing Library)
- Server tests: `npm run test:server:coverage` (Jest)
- Coverage: `npm run test:ui:coverage` (Vitest) + `npm run test:coverage` (Vitest UI + Jest server)
- Stability loop: `npm run test:ui:coverage:loop` (Vitest coverage with capped workers)
- CI pipeline: `npm run test:ci` (format check + lint + typecheck + lockfile React peer check + Vitest UI coverage + Jest coverage)
- Snapshot tests: Auto-update with `npm test -u` or `npm run test:ui -- -u` (Vitest)
- Frontend integration tests: See `test/frontend-integration-test/README.md` for the containerized local flow. Supported debug env vars include `DEBUG=1`, `HEADLESS=false`, and `WDIO_SPECS=./tensorboard-example.spec.js`; headful runs expose Selenium's noVNC desktop on port `7900`.
- Workflows: `.github/workflows/` (build, test, lint, release)
- Composite actions: `.github/actions/` (e.g., `kfp-k8s`, `create-cluster`, `deploy`, `test-and-report`)
- Typical checks: Go unit tests (backend), Python SDK tests, frontend tests/lint, image builds.
- Frontend workflow (`frontend.yml`) verifies generated API clients are up to date by running `npm run apis:all` and failing on diff.
- Kubernetes versions: CI runs a matrix across a low and high supported version, commonly `v1.29.2` and `v1.31.0`.
  - Examples: `e2e-test.yml`, `sdk-execution.yml`, `upgrade-test.yml`, `kfp-kubernetes-execution-tests.yml`, `kfp-webhooks.yml`, `api-server-tests.yml`, `compiler-tests.yml`, `legacy-v2-api-integration-tests.yml`, `integration-tests-v1.yml`, and frontend integration in `e2e-test-frontend.yml`.
- Pipeline store variants (v2 engine): tests run with `database` and `kubernetes` stores, and a dedicated job compiles pipelines to Kubernetes-native manifests.
  - Example: `e2e-test.yml` jobs "API integration tests v2 - K8s with ${pipeline_store}" and "compile pipelines with Kubernetes".
- Argo Workflows version matrix for compatibility (where relevant): e.g., `e2e-test.yml` includes an Argo job (e.g., `v3.5.14`).
- Proxy / cache toggles: dedicated jobs run with HTTP proxy enabled and with execution cache disabled to validate those modes.
- Artifacts: failing logs and test outputs are uploaded as workflow artifacts for debugging.
- Kind-based clusters are provisioned via the `kfp-cluster` composite action, parameterized by `k8s_version`, `pipeline_store`, `proxy`, `cache_enabled`, and optional `argo_version`.
- The `create-cluster` and `deploy` actions are used by newer suites; `kfp-k8s` installs SDK components from source inside jobs that execute Python-based tests.
- The `protobuf` composite action prepares `protoc` and related dependencies when compiling Python protobufs.
- The `create-cluster` action caches Kind node images by Kubernetes version to reduce Docker Hub pulls.
- Python workflows use `actions/cache@v5` for pip cache to reduce repeated dependency installs.
- Prettier config in `.prettierrc.yaml`:
  - Single quotes, trailing commas, 100 char line width
  - Format: `npm run format`
  - Check: `npm run format:check`
- ESLint extends `react-app` with custom rules in `.eslintrc.yaml`
- Auto-format on save: Configure your IDE with the Prettier extension
Notes:
- Legacy `kfp-samples.yml` and `periodic.yml` workflows were removed.
To verify all GitHub workflow path references are valid:
- Iterate through all workflow files in `.github/workflows/` (both `.yml` and `.yaml` files)
- Parse each YAML file and extract path references from:
  - `working-directory` fields
  - `dockerfile` and `context` fields in Docker build steps
  - Script or command paths (look for `./` prefixes)
  - Any string values that appear to be file/directory paths
  - Action references (e.g., `./.github/actions/...`)
- Clean extracted paths by removing `./` prefixes and variable expansions
- Verify each extracted path exists in the project filesystem
- Report missing paths and which workflows reference them
This verification ensures workflow integrity and prevents CI failures due to missing files or incorrect path references.
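The steps above can be sketched with a small stdlib-only script (the regex-based extraction is a simplification of the "parse each YAML file" step; a real pass would walk the parsed YAML values):

```python
import re
from pathlib import Path


def check_workflow_paths(repo_root: str) -> dict:
    """Report repo-relative path references in workflow files that do not exist."""
    root = Path(repo_root)
    # Matches ./-prefixed references and .github/actions/... action references.
    path_pattern = re.compile(r"(?:\./|\.github/actions/)[\w./-]+")
    missing = {}
    workflows = sorted(root.glob(".github/workflows/*.yml")) + sorted(
        root.glob(".github/workflows/*.yaml")
    )
    for workflow in workflows:
        for match in path_pattern.findall(workflow.read_text()):
            # Clean the extracted path: strip the ./ prefix, skip variable expansions.
            candidate = match[2:] if match.startswith("./") else match
            if not candidate or "$" in candidate:
                continue
            if not (root / candidate).exists():
                missing.setdefault(str(workflow.relative_to(root)), []).append(candidate)
    return missing
```

Running it from the repo root and failing CI when the returned dict is non-empty gives the verification described above.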
KFP frontend supports feature flags for development:
- Configure in `src/features.ts`
- Access via `http://localhost:3000/#/frontend_features`
- Manage locally: `localStorage.setItem('flags', "")`
- Add new API: Update swagger specs, run `npm run apis:all`
- Update proto definitions: Modify protos, run respective build commands
- Add new component: Create in `atoms/` or `components/`, add tests and stories
- Debug server: Use `npm run start:proxy-and-server-inspect`
- Bundle analysis: `npm run analyze-bundle`
- Port conflicts: Frontend uses 3000 (React), 3001 (Node server), 3002 (API proxy)
- Node version issues: Ensure you're using the version in `.nvmrc`
- API generation failures: Ensure Docker is running and the `docker` CLI is available in PATH
- Proto generation: Requires `protoc` and `protoc-gen-grpc-web` in PATH
- Mock backend: Limited API support; use a real cluster for full testing
- Go lint (CI uses `golangci-lint`):

  ```bash
  golangci-lint run
  ```

- Python SDK import order and unused import cleanups:

  ```bash
  pip install -r sdk/python/requirements-dev.txt
  pycln --check sdk/python
  isort --check --profile google sdk/python
  ```

- Python SDK formatting (YAPF + string fixer):

  ```bash
  pip install yapf pre_commit_hooks
  python3 -m pre_commit_hooks.string_fixer $(find sdk/python/kfp/**/*.py -type f)
  yapf --recursive --diff sdk/python/
  ```

- Python SDK docstring formatting:

  ```bash
  pip install docformatter
  docformatter --check --recursive sdk/python/ --exclude "compiler_test.py"
  ```

- Modify pipeline spec schema:
  - Edit Protobufs under `api/`
  - Regenerate: `make -C api python && make -C api golang`
  - Update SDK/backend usages as needed
- Adjust Kubernetes behavior for tasks:
  - Resource requests/limits: set on component specs; the Driver converts these into pod spec patches.
  - All other Kubernetes config: handled via the `kubernetes_platform` platform spec.
- Compile pipeline: `kfp dsl compile --py pipeline.py --output pipeline.yaml`
- Generate protos: `make -C api python && make -C api golang`
- Deploy local cluster (standalone): `make -C backend kind-cluster-agnostic`
- Deploy local cluster (development) and run the API server in the IDE: `make -C backend dev-kind-cluster`
- Run SDK tests: `pytest -v sdk/python/kfp`
- Run backend unit tests: `go test -v $(go list ./backend/... | grep -v backend/test/)`
- Run compiler tests: `ginkgo -v ./backend/test/compiler`
- Run API tests: `ginkgo -v --label-filter="Smoke" ./backend/test/v2/api`
- Run E2E tests: `ginkgo -v ./backend/test/end2end -- -namespace=kubeflow`
- Check formatting: `yapf --recursive --diff sdk/python/ && pycln --check sdk/python && isort --check --profile google sdk/python`
- Frontend dev server: `cd frontend && npm start`
- Frontend with cluster: `cd frontend && npm run start:proxy-and-server`
- Frontend tests: `cd frontend && npm run test:ui` (Vitest) or `npm test` (same as `test:ui`)
- Frontend React peer gate: `cd frontend && npm run check:react-peers` (or `check:react-peers:18` / `check:react-peers:19`)
- Frontend formatting: `cd frontend && npm run format`
- Generate frontend APIs: `cd frontend && npm run apis:all`
- `_KFP_RUNTIME=true`: Disables SDK imports during task execution
- `VITE_NAMESPACE=...`: Sets the target namespace for the frontend in multi-user mode
- `LOCAL_API_SERVER=true`: Enables local API server testing mode when running integration tests on a Kind cluster
- `_KFP_RUNTIME=true` during executor runtime disables much of the SDK; avoid importing SDK-only modules from task code.
- `kfp` is installed into task containers with `--no-deps`; ensure runtime dependencies are present in `base_image`.
- SELinux enforcing can break proto generation; toggle with `setenforce` as noted above.
- Do not assume `pipeline_spec_pb2.py` exists in the repo; it must be generated.
- Frontend API generation requires Docker (`openapitools/openapi-generator-cli:v7.19.0`).
- Frontend proto generation requires `protoc` and `protoc-gen-grpc-web` binaries.
- Node version must match `.nvmrc`; use nvm/fnm to manage versions.
- Frontend port conflicts: 3000 (Vite), 3001 (Node server), 3002 (API proxy), 6006 (Storybook).
- Protobuf generation fails with "protoc: command not found": use the Make targets that run this in a container.
- Protobuf generation fails under SELinux enforcing: temporarily disable with `sudo setenforce 0`; re-enable after.
- API client generation fails with Docker errors (for example, permission denied on the Docker socket): ensure Docker is running and your user can access the Docker daemon.
- Frontend fails to start due to Node version mismatch: `nvm use $(cat frontend/.nvmrc)` or `fnm use`.
- Runtime component imports SDK-only modules: `_KFP_RUNTIME=true` disables many SDK imports; avoid importing SDK-only modules in task code.