Merged
Changes from 4 commits
3 changes: 2 additions & 1 deletion .gitignore
@@ -55,7 +55,8 @@ istio-*
venv/
.python-version

# Claude code
# AI agent tool configs (keep local, don't track)
.cursor/
.claude/

**/go.work
54 changes: 50 additions & 4 deletions AGENTS.md
@@ -19,13 +19,13 @@ AI agents should:
- Include in the commit message: `Assisted-by: [Agent Name]`, or `Co-authored-by: [Agent Name]`.
- Avoid automated responses/comments to the Pull Requests or Issues on GitHub.


Agents must NOT:

- Bypass tests or linters
- Introduce dependencies without updating `go.mod` / `go.sum`
- Introduce dependencies without updating `go.mod` / `go.sum` (Go), `pyproject.toml` (Python), or `package.json` (TypeScript)
- Generate or commit large autogenerated files (use the `gen/*` Make targets instead)
- Modify OpenAPI specs (`api/openapi/`) without regenerating dependent code
- Modify CRD schemas or API versions without explicit instruction

### Context Awareness

@@ -35,6 +35,10 @@ Before writing code, agents should:
- Match import patterns and error-handling conventions from neighboring files
- Preserve existing logging style (uses `golang/glog`)
- Understand the module boundary — this is a Go workspace with multiple `go.mod` files
- Review OpenAPI specs in `api/openapi/` before changing API structures
- Call out any breaking changes introduced and follow the deprecation policy

For additional context see the [Model Registry documentation](https://www.kubeflow.org/docs/components/model-registry/overview/).

## Repository Map

@@ -48,7 +52,9 @@ catalog/ # Catalog service — federated model discovery
├── Makefile # Catalog-specific build/test targets
clients/ # Client libraries and UI
├── python/ # Python client for model-registry API (Poetry)
└── ui/ # React UI + Go BFF (separate workflow, not covered here)
└── ui/ # React UI + Go BFF (separate workflow, see UI Development below)
├── bff/ # Backend-for-Frontend (Go)
└── frontend/ # React/TypeScript frontend
cmd/ # CLI entry points
├── controller/ # Kubernetes controller for ModelRegistry CRDs
├── csi/ # Container Storage Interface (storage-initializer)
@@ -78,6 +84,7 @@ jobs/ # Background jobs
manifests/kustomize/ # Kubernetes deployment manifests (Kustomize)
scripts/ # Build and utility scripts
devenv/ # Local development environment (Tilt-based)
docs/ # Documentation (see docs/dev_kind_environment.md for Kind setup)
test/ # Integration/E2E test scripts
```

@@ -233,6 +240,24 @@ make compose/down # Stop services
make compose/clean # Remove all volumes and networks
```

### UI Development

```bash
# From clients/ui/
make dev-install-dependencies # Install all UI dependencies
make dev-start # Start BFF + frontend for standalone mode

# Or individually:
cd clients/ui/bff && go run ./cmd --port=4000 --dev-mode --deployment-mode=standalone
cd clients/ui/frontend && DEPLOYMENT_MODE=standalone STYLE_THEME=patternfly npm run start:dev

# UI tests
cd clients/ui/frontend && npm run test:unit
cd clients/ui/frontend && npm run test:type-check
```

For Kind-based UI development with Tilt, see [docs/dev_kind_environment.md](./docs/dev_kind_environment.md).

## CI Checks (what runs on PRs)

The following checks run automatically on pull requests:
@@ -300,10 +325,23 @@ The following directories contain auto-generated code. Modify the sources and re
- Use the `stretchr/testify` package for test assertions (already a project dependency)
- Use Testcontainers for integration tests that need a real database

### TypeScript/React conventions

- Follow existing component patterns in `clients/ui/frontend/src/`
- Use PatternFly components for UI elements
- Prefer functional components with hooks
- Write unit tests for new components

### Python conventions

- Follow existing patterns in `clients/python/` and `catalog/clients/python/`
- Use Poetry for dependency management
- Write tests with pytest

### Testing requirements

- Bug fixes MUST include a test that reproduces the bug
- Unit tests live alongside source files as `*_test.go`
- Unit tests live alongside source files as `*_test.go` (Go), `*.test.ts`/`*.test.tsx` (TypeScript), `*_test.py`/`test_*.py` (Python)
- Integration tests use Testcontainers (MySQL and PostgreSQL) — see `internal/db/service/*_test.go` for patterns
- Controller tests use `envtest` (kubebuilder test framework) — see `cmd/controller/`

@@ -331,3 +369,11 @@ When modifying the REST API:
3. Validate: `make openapi/validate`
4. Regenerate server and client code: `make gen/openapi gen/openapi-server`
5. Update any affected handler logic in `internal/`

## Additional Resources

- [CONTRIBUTING.md](./CONTRIBUTING.md) — DCO, code of conduct, ARM/Mac setup, Kind deployment
- [clients/ui/CONTRIBUTING.md](./clients/ui/CONTRIBUTING.md) — UI-specific contribution guide
- [docs/dev_kind_environment.md](./docs/dev_kind_environment.md) — Kind local dev environment setup and troubleshooting
- [devenv/README.md](./devenv/README.md) — Tilt-based dev environment
- [Model Registry documentation](https://www.kubeflow.org/docs/components/model-registry/overview/)
184 changes: 184 additions & 0 deletions docs/dev_kind_environment.md
@@ -0,0 +1,184 @@
# Kind Dev Environment

Local Kubernetes development environment for the model-registry UI using Kind and Tilt.

## Prerequisites

- A Docker-compatible runtime (e.g. Docker Desktop, Colima, Podman with `podman-docker`)
- [Kind](https://kind.sigs.k8s.io/) installed
- kubectl installed
- Go >= 1.25.7
- Node.js >= 22.0.0
- Tilt v0.33.22+ (auto-downloaded by `make tilt-up` if not present)
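
The version minimums above can be checked before starting anything. The following is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions do). `version_ge` is a hypothetical helper name for illustration and is not part of this repository.

```bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Assumes `sort -V` (version sort) is available on this machine.
version_ge() {
  # The smaller of the two versions sorts first; if that is the required
  # minimum ($2), then the installed version ($1) is new enough.
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (not executed here):
#   version_ge "$(go env GOVERSION | sed 's/^go//')" 1.25.7 || echo "Go is too old"
#   version_ge "$(node --version | sed 's/^v//')" 22.0.0 || echo "Node.js is too old"
```

A plain string comparison would rank `1.9.0` above `1.25.7`, which is why the sketch relies on version sort.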

## Architecture

The dev environment runs three concurrent processes:

| Component | Directory | Port | Purpose |
|-----------|-----------|------|---------|
| Infrastructure | `devenv/` | 10350 (Tilt) | Kind cluster + Tilt for deploying model-registry resources |
| BFF | `clients/ui/bff/` | 4000 | Go backend-for-frontend server |
| Frontend | `clients/ui/frontend/` | 9000 | React dev server (proxies to BFF) |

## Quick Start

### 1. Start Infrastructure

```bash
# Start your Docker runtime (example with Colima on macOS):
colima start

# Create the Kind cluster (if it doesn't exist):
kind get clusters | grep -q '^model-registry$' || kind create cluster --name model-registry
kubectl config use-context kind-model-registry

# Start Tilt (auto-downloads if not installed):
cd devenv && make tilt-up
```

### 2. Start BFF

```bash
cd clients/ui/bff
go run ./cmd --port=4000 --dev-mode \
--dev-mode-model-registry-port=8080 \
--dev-mode-catalog-port=8082 \
--deployment-mode=standalone
```

Add `--mock-k8s-client` if you don't need a real cluster for basic UI testing.

### 3. Start Frontend

```bash
cd clients/ui/frontend
DEPLOYMENT_MODE=standalone STYLE_THEME=patternfly npm run start:dev
```

### 4. RBAC Setup (real K8s only)

When using the real K8s client, the BFF's namespace registry access check uses SubjectAccessReview with only the `User` field (no groups), so group-based RoleBindings don't take effect. Create a ClusterRoleBinding that directly binds each namespace's default ServiceAccount:

```bash
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: model-registry-all-sa-service-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: model-registry-ui-services-reader
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
- kind: ServiceAccount
  name: default
  namespace: kubeflow
- kind: ServiceAccount
  name: default
  namespace: minio
- kind: ServiceAccount
  name: default
  namespace: kube-system
- kind: ServiceAccount
  name: default
  namespace: kube-public
- kind: ServiceAccount
  name: default
  namespace: kube-node-lease
- kind: ServiceAccount
  name: default
  namespace: local-path-storage
EOF
```

Without this, the UI shows "The selected namespace does not have access to this model registry" for every namespace.
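
Because the binding needs one subject per namespace, the list goes stale whenever a namespace is added. As a rough sketch, the subject entries can be generated instead of written by hand. `crb_subjects` is a hypothetical helper name, and the output assumes the same `default` ServiceAccount convention used in the manifest above.

```bash
# Hypothetical helper: print one ClusterRoleBinding subject entry for the
# "default" ServiceAccount of every namespace given as an argument.
crb_subjects() {
  for ns in "$@"; do
    printf '%s\n' \
      "- kind: ServiceAccount" \
      "  name: default" \
      "  namespace: ${ns}"
  done
}

# Example usage (not executed here): list all namespaces in the current
# cluster and print the matching subjects.
#   crb_subjects $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}')
```

The printed entries can be pasted under `subjects:` in the manifest before re-applying it.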

## Optional Components

### Model Catalog (with performance data)

```bash
./scripts/deploy_catalog_demo_on_kind.sh
kubectl port-forward -n model-catalog svc/model-catalog-server 8082:8080
```

The BFF connects to this on `--dev-mode-catalog-port=8082`. See [deploy_catalog_demo_on_kind.sh](../scripts/deploy_catalog_demo_on_kind.sh) for details.

### MinIO (S3 storage for transfer jobs)

```bash
./scripts/deploy_minio_on_kind.sh
```

| Detail | Value |
|--------|-------|
| Internal endpoint | `http://minio.minio.svc.cluster.local:9000` |
| Bucket | `default` |
| Credentials | `minioadmin` / `minioadmin` |
| Console NodePort | `30091` |
| K8s Secret | `minio-secret` (namespace: `minio`) |

Upload test data:

```bash
kubectl run minio-upload --rm -i --restart=Never -n minio \
--image=minio/mc --command -- sh -c '
mc --config-dir /tmp alias set local http://minio:9000 minioadmin minioadmin
echo "sample model content" | mc --config-dir /tmp pipe local/default/models/sample-model/model.txt
'
```

### OCI Model Transfer Jobs

Test S3-to-OCI model transfer jobs end-to-end. Requires MinIO (above) and a destination OCI registry (e.g. quay.io).

No local ARM64 image build is needed — upstream, midstream, and downstream each have their own async-upload images.

**Form values when creating a transfer job via UI:**

| Field | Value | Notes |
|-------|-------|-------|
| Source type | `s3` | |
| S3 endpoint | `http://minio.minio:9000` | Internal cluster DNS |
| S3 bucket | `default` | |
| S3 key | `models/sample-model/` | Directory prefix, **not** full file path |
| S3 access key | `minioadmin` | |
| S3 secret key | `minioadmin` | |
| Destination type | `oci` | |
| Destination URI | `quay.io/yourorg/yourrepo:tag` | OCI ref format, **no** `https://` |
| Destination registry | `quay.io` | |

## Teardown

```bash
# Stop everything and delete cluster:
./scripts/dev_teardown.sh

# Or keep the cluster for quick restart:
./scripts/dev_teardown.sh --keep-cluster
```

Override default ports with environment variables:

```bash
FRONTEND_PORT=9001 BFF_PORT=4001 ./scripts/dev_teardown.sh
```

## Troubleshooting

| Problem | Fix |
|---------|-----|
| Tilt refuses to start (production context) | `kubectl config use-context kind-model-registry` |
| Port conflict (4000, 9000, or 10350) | `lsof -ti:PORT \| xargs kill -9` |
| Frontend proxy `ECONNREFUSED` | BFF not ready yet; wait for it to start |
| `ImagePullBackOff` on async-upload job | Verify the correct image is configured for your environment (upstream/midstream/downstream each have their own) |
| "namespace does not have access to this model registry" | Apply the RBAC ClusterRoleBinding (see above). The BFF's SAR uses `User` only, not groups. |
| MinIO nodePort 30091 already allocated | MinIO already exists in the `minio` namespace. Always pass `-n minio` when applying its manifests; without it, a duplicate is created in `default`. |
| MinIO bucket missing after pod restart | MinIO has no PV. Re-run: `./scripts/deploy_minio_on_kind.sh` |
| `envtest` port lock error in Go tests | Remove the stale lock files: `rm -f ~/Library/Caches/kubebuilder-envtest/port-*` (macOS cache path) |
| Transfer job S3 download fails with EBUSY | Use directory prefix as source key, not full file path |
| Transfer job OCI push "invalid reference" | Use OCI ref format `quay.io/org/repo:tag`, not web URL |
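
For the `ECONNREFUSED` case, waiting for the BFF can be scripted rather than done by hand. The following is a minimal sketch of a retry loop. `retry` is a hypothetical helper, and the `nc -z` probe in the usage comment is an assumption about what is installed locally, so substitute any command that succeeds once port 4000 accepts connections.

```bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 as soon as the command succeeds, 1 otherwise.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Only sleep when another attempt will follow.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example usage (not executed here): wait up to 30 seconds for the BFF port,
# then start the frontend.
#   retry 30 1 nc -z localhost 4000 && npm run start:dev
```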
37 changes: 37 additions & 0 deletions scripts/deploy_catalog_demo_on_kind.sh
@@ -0,0 +1,37 @@
#!/usr/bin/env bash
# Run a kind cluster with Model Catalog demo overlay (full performance data).
# Prerequisites: Docker (or Colima) running, kind and kubectl installed.

set -e

CATALOG_NAMESPACE="${CATALOG_NAMESPACE:-model-catalog}"
CLUSTER_NAME="${CLUSTER_NAME:-model-registry}"
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"

cd "$REPO_ROOT"

echo "=== Creating kind cluster (if needed) ==="
if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
  echo "Cluster ${CLUSTER_NAME} already exists."
  kubectl config use-context "kind-${CLUSTER_NAME}"
else
  kind create cluster --name "$CLUSTER_NAME"
fi

echo "=== Creating namespace ${CATALOG_NAMESPACE} ==="
kubectl create namespace "$CATALOG_NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -

echo "=== Deploying Model Catalog with demo overlay (full performance data) ==="
kubectl apply -k manifests/kustomize/options/catalog/overlays/demo -n "$CATALOG_NAMESPACE"

echo "=== Waiting for Model Catalog Postgres to be ready ==="
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=postgres,app.kubernetes.io/part-of=model-catalog -n "$CATALOG_NAMESPACE" --timeout=120s 2>/dev/null || true
echo "=== Waiting for Model Catalog server (with perf data) to be available ==="
kubectl wait --for=condition=available deployment/model-catalog-server -n "$CATALOG_NAMESPACE" --timeout=5m

echo "=== Done ==="
kubectl get pods -n "$CATALOG_NAMESPACE"
echo ""
echo "To access the Model Catalog API (with performance metrics):"
echo " kubectl port-forward -n $CATALOG_NAMESPACE svc/model-catalog-server 8080:8080"
echo "Then open http://localhost:8080 (or use the API with performance-metrics from /perf-data)."
49 changes: 49 additions & 0 deletions scripts/dev_teardown.sh
@@ -0,0 +1,49 @@
#!/usr/bin/env bash
# Tear down the model-registry dev environment.
# Stops Frontend, BFF, and Tilt. Optionally deletes the Kind cluster.
#
# Usage:
# ./scripts/dev_teardown.sh # Stop everything, delete cluster
# ./scripts/dev_teardown.sh --keep-cluster # Stop processes, keep cluster
#
# Environment variables (override default ports):
# FRONTEND_PORT (default: 9000)
# BFF_PORT (default: 4000)

set -e

FRONTEND_PORT="${FRONTEND_PORT:-9000}"
BFF_PORT="${BFF_PORT:-4000}"
CLUSTER_NAME="${CLUSTER_NAME:-model-registry}"
KEEP_CLUSTER=false
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"

for arg in "$@"; do
  case "$arg" in
    --keep-cluster) KEEP_CLUSTER=true ;;
  esac
done

cd "$REPO_ROOT"

echo "=== Stopping Frontend (port ${FRONTEND_PORT}) ==="
lsof -ti:"${FRONTEND_PORT}" | xargs kill -9 2>/dev/null || true

echo "=== Stopping BFF (port ${BFF_PORT}) ==="
lsof -ti:"${BFF_PORT}" | xargs kill -9 2>/dev/null || true

echo "=== Stopping Tilt ==="
cd devenv && make tilt-down 2>/dev/null || true
cd "$REPO_ROOT"

if [ "$KEEP_CLUSTER" = false ]; then
  echo "=== Deleting Kind cluster '${CLUSTER_NAME}' ==="
  kind delete cluster --name "$CLUSTER_NAME"

  echo "=== Stopping Colima ==="
  colima stop 2>/dev/null || true
else
  echo "=== Keeping Kind cluster and Colima running ==="
fi

echo "=== Done ==="