Merged
3 changes: 2 additions & 1 deletion .gitignore
@@ -55,7 +55,8 @@ istio-*
venv/
.python-version

# AI agent tool configs (keep local, don't track)
.cursor/
.claude/

**/go.work
54 changes: 50 additions & 4 deletions AGENTS.md
@@ -19,13 +19,13 @@ AI agents should:
- Include in the commit message: `Assisted-by: [Agent Name]`, or `Co-authored-by: [Agent Name]`.
- Avoid posting automated responses or comments on GitHub pull requests or issues.


Agents must NOT:

- Bypass tests or linters
- Introduce dependencies without updating `go.mod` / `go.sum` (Go), `pyproject.toml` (Python), or `package.json` (TypeScript)
- Generate or commit large autogenerated files (use the `gen/*` Make targets instead)
- Modify OpenAPI specs (`api/openapi/`) without regenerating dependent code
- Modify CRD schemas or API versions without explicit instruction

### Context Awareness

@@ -35,6 +35,10 @@ Before writing code, agents should:
- Match import patterns and error-handling conventions from neighboring files
- Preserve existing logging style (uses `golang/glog`)
- Understand the module boundary — this is a Go workspace with multiple `go.mod` files
- Review OpenAPI specs in `api/openapi/` before changing API structures
- Call out any breaking changes they introduce and follow the project's deprecation policy

For additional context see the [Model Registry documentation](https://www.kubeflow.org/docs/components/model-registry/overview/).

## Repository Map

@@ -48,7 +52,9 @@ catalog/ # Catalog service — federated model discovery
├── Makefile # Catalog-specific build/test targets
clients/ # Client libraries and UI
├── python/ # Python client for model-registry API (Poetry)
└── ui/ # React UI + Go BFF (separate workflow, see UI Development below)
    ├── bff/ # Backend-for-Frontend (Go)
    └── frontend/ # React/TypeScript frontend
cmd/ # CLI entry points
├── controller/ # Kubernetes controller for ModelRegistry CRDs
├── csi/ # Container Storage Interface (storage-initializer)
@@ -78,6 +84,7 @@ jobs/ # Background jobs
manifests/kustomize/ # Kubernetes deployment manifests (Kustomize)
scripts/ # Build and utility scripts
devenv/ # Local development environment (Tilt-based)
docs/ # Documentation (see docs/dev_kind_environment.md for Kind setup)
test/ # Integration/E2E test scripts
```

@@ -233,6 +240,24 @@ make compose/down # Stop services
make compose/clean # Remove all volumes and networks
```

### UI Development

```bash
# From clients/ui/
make dev-install-dependencies # Install all UI dependencies
make dev-start # Start BFF + frontend for standalone mode

# Or individually:
cd clients/ui/bff && go run ./cmd --port=4000 --dev-mode --deployment-mode=standalone
cd clients/ui/frontend && DEPLOYMENT_MODE=standalone STYLE_THEME=patternfly npm run start:dev

# UI tests
cd clients/ui/frontend && npm run test:unit
cd clients/ui/frontend && npm run test:type-check
```

For Kind-based UI development with Tilt, see [docs/dev_kind_environment.md](./docs/dev_kind_environment.md).

## CI Checks (what runs on PRs)

The following checks run automatically on pull requests:
@@ -300,10 +325,23 @@ The following directories contain auto-generated code. Modify the sources and re
- Use the `stretchr/testify` package for test assertions (already a project dependency)
- Use Testcontainers for integration tests that need a real database

### TypeScript/React conventions

- Follow existing component patterns in `clients/ui/frontend/src/`
- Use PatternFly components for UI elements
- Prefer functional components with hooks
- Write unit tests for new components

### Python conventions

- Follow existing patterns in `clients/python/` and `catalog/clients/python/`
- Use Poetry for dependency management
- Write tests with pytest

### Testing requirements

- Bug fixes MUST include a test that reproduces the bug
- Unit tests live alongside source files as `*_test.go` (Go), `*.test.ts`/`*.test.tsx` (TypeScript), `*_test.py`/`test_*.py` (Python)
- Integration tests use Testcontainers (MySQL and PostgreSQL) — see `internal/db/service/*_test.go` for patterns
- Controller tests use `envtest` (kubebuilder test framework) — see `cmd/controller/`

@@ -331,3 +369,11 @@ When modifying the REST API:
3. Validate: `make openapi/validate`
4. Regenerate server and client code: `make gen/openapi gen/openapi-server`
5. Update any affected handler logic in `internal/`

## Additional Resources

- [CONTRIBUTING.md](./CONTRIBUTING.md) — DCO, code of conduct, ARM/Mac setup, Kind deployment
- [clients/ui/CONTRIBUTING.md](./clients/ui/CONTRIBUTING.md) — UI-specific contribution guide
- [docs/dev_kind_environment.md](./docs/dev_kind_environment.md) — Kind local dev environment setup and troubleshooting
- [devenv/README.md](./devenv/README.md) — Tilt-based dev environment
- [Model Registry documentation](https://www.kubeflow.org/docs/components/model-registry/overview/)
184 changes: 184 additions & 0 deletions docs/dev_kind_environment.md
@@ -0,0 +1,184 @@
# Kind Dev Environment

Local Kubernetes development environment for the model-registry UI using Kind and Tilt.

## Prerequisites

- A Docker-compatible runtime (e.g. Docker Desktop, Colima, Podman with `podman-docker`)
- [Kind](https://kind.sigs.k8s.io/) installed
- kubectl installed
- Go >= 1.25.7
- Node.js >= 22.0.0
- Tilt v0.33.22+ (auto-downloaded by `make tilt-up` if not present)
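
To check the toolchain against the minimum versions above, a small version comparison is enough. This is a minimal sketch and not part of the repository: `version_ge` and `check_prereqs` are hypothetical helper names, the comparison only handles purely numeric dotted versions, and it assumes your container runtime exposes a `docker` command on `PATH` (Podman does so through `podman-docker`). Tilt is not checked because `make tilt-up` downloads it.

```bash
# Hypothetical helpers (bash only): not shipped in this repo.

# Succeeds if dotted version $1 >= $2. Numeric components only, e.g. 1.25.7.
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)
  local i h w
  for ((i = 0; i < ${#have[@]} || i < ${#want[@]}; i++)); do
    # Missing components count as 0, so "1.26" is treated as "1.26.0".
    h=${have[i]:-0}
    w=${want[i]:-0}
    # 10# forces base 10 so a component like "08" is not parsed as octal.
    if ((10#$h > 10#$w)); then return 0; fi
    if ((10#$h < 10#$w)); then return 1; fi
  done
  return 0 # all components equal
}

# Checks that the required CLIs exist and that Go and Node.js meet the minimums above.
check_prereqs() {
  local cmd
  for cmd in docker kind kubectl go node; do
    command -v "$cmd" >/dev/null 2>&1 || { echo "missing: $cmd" >&2; return 1; }
  done
  # `go env GOVERSION` prints e.g. "go1.25.7"; `node --version` prints e.g. "v22.1.0".
  version_ge "$(go env GOVERSION | sed 's/^go//; s/ .*//')" 1.25.7 ||
    { echo "Go >= 1.25.7 required" >&2; return 1; }
  version_ge "$(node --version | sed 's/^v//')" 22.0.0 ||
    { echo "Node.js >= 22.0.0 required" >&2; return 1; }
}
```

After sourcing the helpers, run `check_prereqs && echo "prerequisites OK"`.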

## Architecture

The dev environment runs three components concurrently:

| Component | Directory | Port | Purpose |
|-----------|-----------|------|---------|
| Infrastructure | `devenv/` | 10350 (Tilt) | Kind cluster + Tilt for deploying model-registry resources |
| BFF | `clients/ui/bff/` | 4000 | Go backend-for-frontend server |
| Frontend | `clients/ui/frontend/` | 9000 | React dev server (proxies to BFF) |

## Quick Start

### 1. Start Infrastructure

```bash
# Start your Docker runtime (example with Colima on macOS):
colima start

# Create the Kind cluster (if it doesn't exist):
kind get clusters | grep -q '^model-registry$' || kind create cluster --name model-registry
kubectl config use-context kind-model-registry

# Start Tilt (auto-downloads if not installed):
cd devenv && make tilt-up
```

### 2. Start BFF

```bash
cd clients/ui/bff
go run ./cmd --port=4000 --dev-mode \
  --dev-mode-model-registry-port=8080 \
  --dev-mode-catalog-port=8082 \
  --deployment-mode=standalone
```

Add `--mock-k8s-client` if you don't need a real cluster for basic UI testing.

### 3. Start Frontend

```bash
cd clients/ui/frontend
DEPLOYMENT_MODE=standalone STYLE_THEME=patternfly npm run start:dev
```

### 4. RBAC Setup (real K8s only)

When using the real K8s client, the BFF checks a namespace's access to the model registry with a SubjectAccessReview that sets only the `User` field (no groups), so group-based RoleBindings don't take effect. Create a ClusterRoleBinding that binds each namespace's `default` ServiceAccount directly:

```bash
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: model-registry-all-sa-service-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: model-registry-ui-services-reader
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
- kind: ServiceAccount
  name: default
  namespace: kubeflow
- kind: ServiceAccount
  name: default
  namespace: minio
- kind: ServiceAccount
  name: default
  namespace: kube-system
- kind: ServiceAccount
  name: default
  namespace: kube-public
- kind: ServiceAccount
  name: default
  namespace: kube-node-lease
- kind: ServiceAccount
  name: default
  namespace: local-path-storage
EOF
```

Without this, the UI shows "The selected namespace does not have access to this model registry" for every namespace.
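
The binding above hard-codes a fixed set of namespaces. If you create more, one option is to generate the subject list from the cluster instead of editing the YAML by hand. This is a convenience sketch, not a script from this repo: `gen_sa_binding` is a hypothetical helper that prints the same ClusterRoleBinding for whatever namespace names it receives as arguments.

```bash
# Hypothetical helper (bash): prints a ClusterRoleBinding that grants the
# model-registry-ui-services-reader ClusterRole to the `default`
# ServiceAccount of every namespace passed as an argument.
gen_sa_binding() {
  cat <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: model-registry-all-sa-service-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: model-registry-ui-services-reader
subjects:
EOF
  local ns
  for ns in "$@"; do
    # One subject entry per namespace, in the same layout as the manifest above.
    printf '%s\n' "- kind: ServiceAccount" "  name: default" "  namespace: ${ns}"
  done
}
```

For example, to cover every namespace that currently exists: `gen_sa_binding $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}') | kubectl apply -f -`. The binding only covers namespaces that existed when it was generated, so re-run it after creating new ones.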

## Optional Components

### Model Catalog (with performance data)

```bash
./scripts/deploy_catalog_demo_on_kind.sh
kubectl port-forward -n model-catalog svc/model-catalog-server 8082:8080
```

The BFF connects to this on `--dev-mode-catalog-port=8082`. See [deploy_catalog_demo_on_kind.sh](../scripts/deploy_catalog_demo_on_kind.sh) for details.

### MinIO (S3 storage for transfer jobs)

```bash
./scripts/deploy_minio_on_kind.sh
```

| Detail | Value |
|--------|-------|
| Internal endpoint | `http://minio.minio.svc.cluster.local:9000` |
| Bucket | `default` |
| Credentials | `minioadmin` / `minioadmin` |
| Console NodePort | `30091` |
| K8s Secret | `minio-secret` (namespace: `minio`) |

Upload test data:

```bash
kubectl run minio-upload --rm -i --restart=Never -n minio \
  --image=minio/mc --command -- sh -c '
    mc --config-dir /tmp alias set local http://minio:9000 minioadmin minioadmin
    echo "sample model content" | mc --config-dir /tmp pipe local/default/models/sample-model/model.txt
  '
```

### OCI Model Transfer Jobs

Test S3-to-OCI model transfer jobs end-to-end. Requires MinIO (above) and a destination OCI registry (e.g. quay.io).

No local ARM64 image build is needed — upstream, midstream, and downstream each have their own async-upload images.

**Form values when creating a transfer job via UI:**

| Field | Value | Notes |
|-------|-------|-------|
| Source type | `s3` | |
| S3 endpoint | `http://minio.minio:9000` | Internal cluster DNS |
| S3 bucket | `default` | |
| S3 key | `models/sample-model/` | Directory prefix, **not** full file path |
| S3 access key | `minioadmin` | |
| S3 secret key | `minioadmin` | |
| Destination type | `oci` | |
| Destination URI | `quay.io/yourorg/yourrepo:tag` | OCI ref format, **no** `https://` |
| Destination registry | `quay.io` | |
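
Two of the notes in this table, a directory prefix for the S3 key and an OCI reference rather than a URL for the destination, match the transfer-job failures listed under Troubleshooting. A minimal pre-flight check for both might look like the following. `check_transfer_fields` is a hypothetical helper that the UI and BFF do not use, and it assumes a trailing `/` is a good enough signal that the key is a prefix; the URI test is only a rough shape check.

```bash
# Hypothetical helper (bash): sanity-checks two transfer-job form fields.
#   $1  S3 key           -> should be a directory prefix, i.e. end with "/"
#   $2  destination URI  -> should be an OCI reference (registry/org/repo:tag), not a URL
check_transfer_fields() {
  local s3_key="$1" dest_uri="$2"
  case "$s3_key" in
    */) ;; # looks like a prefix
    *) echo "S3 key '$s3_key' should be a directory prefix ending in '/'" >&2; return 1 ;;
  esac
  case "$dest_uri" in
    *://*) echo "destination '$dest_uri' should not include a scheme such as https://" >&2; return 1 ;;
  esac
  # Rough shape check only: something/something with a ":" after the first "/".
  case "$dest_uri" in
    */*:*) return 0 ;;
    *) echo "destination '$dest_uri' should look like registry/org/repo:tag" >&2; return 1 ;;
  esac
}
```

For example, `check_transfer_fields models/sample-model/ quay.io/yourorg/yourrepo:tag` succeeds, while passing `models/sample-model/model.txt` or `https://quay.io/yourorg/yourrepo:tag` fails with a message.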

## Teardown

```bash
# Stop everything and delete cluster:
./scripts/dev_teardown.sh

# Or keep the cluster for quick restart:
./scripts/dev_teardown.sh --keep-cluster
```

Override default ports with environment variables:

```bash
FRONTEND_PORT=9001 BFF_PORT=4001 ./scripts/dev_teardown.sh
```

## Troubleshooting

| Problem | Fix |
|---------|-----|
| Tilt refuses to start (production context) | `kubectl config use-context kind-model-registry` |
| Port conflict (4000, 9000, or 10350) | `lsof -ti:PORT \| xargs kill -9` |
| Frontend proxy `ECONNREFUSED` | BFF not ready yet; wait for it to start |
| `ImagePullBackOff` on async-upload job | Verify the correct image is configured for your environment (upstream/midstream/downstream each have their own) |
| "namespace does not have access to this model registry" | Apply the RBAC ClusterRoleBinding (see above). The BFF's SAR uses `User` only, not groups. |
| MinIO nodePort 30091 already allocated | MinIO is already deployed in the `minio` namespace. Always apply its manifests with `-n minio`; otherwise a duplicate is created in `default` and requests the same NodePort. |
| MinIO bucket missing after pod restart | MinIO has no PV. Re-run: `./scripts/deploy_minio_on_kind.sh` |
| `envtest` port lock error in Go tests | `rm -f ~/Library/Caches/kubebuilder-envtest/port-*` |
| Transfer job S3 download fails with EBUSY | Use directory prefix as source key, not full file path |
| Transfer job OCI push "invalid reference" | Use OCI ref format `quay.io/org/repo:tag`, not web URL |
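
For the `ECONNREFUSED` row above, one way to avoid the race is to start the frontend only after the BFF port accepts connections. The helper below is a hypothetical sketch, not part of the repo scripts; it relies on bash's `/dev/tcp` redirection, so it must run under bash rather than `sh`, and it is meant for local ports where a refused connection returns immediately.

```bash
# Hypothetical helper (bash only): poll until host:port accepts a TCP
# connection, or give up after `timeout` seconds (default 60).
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" waited=0
  # The subshell opens (and on exit closes) fd 3 to host:port; it fails if
  # nothing is listening yet.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if ((waited >= timeout)); then
      echo "timed out waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: `wait_for_port localhost 4000 120 && (cd clients/ui/frontend && DEPLOYMENT_MODE=standalone STYLE_THEME=patternfly npm run start:dev)`.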
37 changes: 37 additions & 0 deletions scripts/deploy_catalog_demo_on_kind.sh
@@ -0,0 +1,37 @@
#!/usr/bin/env bash
# Run a kind cluster with Model Catalog demo overlay (full performance data).
# Prerequisites: Docker (or Colima) running, kind and kubectl installed.

set -e

CATALOG_NAMESPACE="${CATALOG_NAMESPACE:-model-catalog}"
CLUSTER_NAME="${CLUSTER_NAME:-model-registry}"
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"

cd "$REPO_ROOT"

echo "=== Creating kind cluster (if needed) ==="
if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
  echo "Cluster ${CLUSTER_NAME} already exists."
  kubectl config use-context "kind-${CLUSTER_NAME}"
else
  kind create cluster --name "$CLUSTER_NAME"
fi

echo "=== Creating namespace ${CATALOG_NAMESPACE} ==="
kubectl create namespace "$CATALOG_NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -

echo "=== Deploying Model Catalog with demo overlay (full performance data) ==="
kubectl apply -k manifests/kustomize/options/catalog/overlays/demo -n "$CATALOG_NAMESPACE"

echo "=== Waiting for Model Catalog Postgres to be ready ==="
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=postgres,app.kubernetes.io/part-of=model-catalog -n "$CATALOG_NAMESPACE" --timeout=120s 2>/dev/null || true
echo "=== Waiting for Model Catalog server (with perf data) to be available ==="
kubectl wait --for=condition=available deployment/model-catalog-server -n "$CATALOG_NAMESPACE" --timeout=5m

echo "=== Done ==="
kubectl get pods -n "$CATALOG_NAMESPACE"
echo ""
echo "To access the Model Catalog API (with performance metrics):"
echo "  kubectl port-forward -n $CATALOG_NAMESPACE svc/model-catalog-server 8082:8080"
echo "Then open http://localhost:8082 (or use the API with performance-metrics from /perf-data)."
51 changes: 51 additions & 0 deletions scripts/dev_teardown.sh
@@ -0,0 +1,51 @@
#!/usr/bin/env bash
# Tear down the model-registry dev environment.
# Stops Frontend, BFF, and Tilt. Optionally deletes the Kind cluster.
#
# Usage:
# ./scripts/dev_teardown.sh # Stop everything, delete cluster
# ./scripts/dev_teardown.sh --keep-cluster # Stop processes, keep cluster
#
# Environment variables (override default ports):
# FRONTEND_PORT (default: 9000)
# BFF_PORT (default: 4000)

set -e

FRONTEND_PORT="${FRONTEND_PORT:-9000}"
BFF_PORT="${BFF_PORT:-4000}"
CLUSTER_NAME="${CLUSTER_NAME:-model-registry}"
KEEP_CLUSTER=false
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"

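for arg in "$@"; do
  case "$arg" in
    --keep-cluster) KEEP_CLUSTER=true ;;
    # Fail on unknown flags so a typo like --keep-clusters doesn't delete the cluster.
    *) echo "Unknown argument: $arg" >&2; exit 1 ;;
  esac
done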

cd "$REPO_ROOT"

echo "=== Stopping Frontend (port ${FRONTEND_PORT}) ==="
lsof -ti:"${FRONTEND_PORT}" | xargs kill -9 2>/dev/null || true

echo "=== Stopping BFF (port ${BFF_PORT}) ==="
lsof -ti:"${BFF_PORT}" | xargs kill -9 2>/dev/null || true

echo "=== Stopping Tilt ==="
cd devenv && make tilt-down 2>/dev/null || true
cd "$REPO_ROOT"

if [ "$KEEP_CLUSTER" = false ]; then
  echo "=== Deleting Kind cluster '${CLUSTER_NAME}' ==="
  kind delete cluster --name "$CLUSTER_NAME"

  # Stop Colima only if it is installed; don't abort (set -e) if it is already stopped.
  if command -v colima >/dev/null 2>&1; then
    echo "=== Stopping Colima ==="
    colima stop 2>/dev/null || true
  fi
else
  echo "=== Keeping Kind cluster running ==="
fi

echo "=== Done ==="