Skip to content
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 5 additions & 8 deletions .github/workflows/async-upload-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,7 @@ on:
- ".github/workflows/**"

env:
IMG_REGISTRY: ghcr.io
IMG_ORG: kubeflow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ease maintenance in m/s i would really prefer to have these at the top, this way we just adjust OCI reference in a single place when porting. Can you restore these Envs here, please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there are some issues with how the ENV vars are being passed around to downstream make commands, etc. So I think a more comprehensive refactor of all the make files will be needed. This was really done to get all the tests to actually use the variables it should be using. For example, even providing an IMG to the make file for the mr-server image will not always take the correct IMG_VERSION. There are some hard-coded instances in some of these makefiles.

Let me take a look and see if I can do a minimal refactor with the env vars here restored

IMG_NAME: model-registry/job/async-upload
PUSH_IMAGE: false
BRANCH: ${{ github.base_ref }}

jobs:
py-test:
Expand Down Expand Up @@ -49,6 +45,9 @@ jobs:
set -x
sudo apparmor_parser -R /etc/apparmor.d/usr.sbin.mysqld
- name: Run E2E tests
env:
IMG_VERSION: e2e
IMG: ghcr.io/kubeflow/model-registry/server:e2e
run: |
make test-e2e
job-test:
Expand All @@ -69,8 +68,6 @@ jobs:
run: |
set -x
sudo apparmor_parser -R /etc/apparmor.d/usr.sbin.mysqld
- name: Start KinD cluster and load images
- name: Execute Sample Job E2E test
run: |
make deploy-latest-mr # this also starts the KinD cluster
make dev-load-image
# TODO: to be continued here
make test-integration
1 change: 1 addition & 0 deletions jobs/async-upload/.dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
.port-forwards.pid
37 changes: 29 additions & 8 deletions jobs/async-upload/Makefile
Original file line number Diff line number Diff line change
@@ -1,15 +1,22 @@
IMG_VERSION ?= latest
IMG_REGISTRY ?= ghcr.io
IMG_ORG ?= kubeflow
IMG_NAME ?= model-registry/job/async-upload
IMG ?= $(IMG_REGISTRY)/$(IMG_ORG)/$(IMG_NAME):$(IMG_VERSION)
JOB_IMG_VERSION ?= latest
JOB_IMG_REGISTRY ?= ghcr.io
JOB_IMG_ORG ?= kubeflow
JOB_IMG_NAME ?= model-registry/job/async-upload
JOB_IMG ?= $(JOB_IMG_REGISTRY)/$(JOB_IMG_ORG)/$(JOB_IMG_NAME):$(JOB_IMG_VERSION)
BUILD_IMAGE ?= true # whether to build the MR server image
CLUSTER_NAME ?= mr-e2e

# MR Server Params
IMG_VERSION ?= latest
IMG ?= ghcr.io/kubeflow/model-registry/server:$(IMG_VERSION)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really tempted to

Suggested change
IMG ?= ghcr.io/kubeflow/model-registry/server:$(IMG_VERSION)
IMG ?= $(JOB_IMG_REGISTRY)/$(JOB_IMG_ORG)/model-registry/server:$(IMG_VERSION)

but I don't want to diverge on the scope of the PR too much :)


.PHONY: deploy-latest-mr
deploy-latest-mr:
cd ../../ && \
$(if $(filter true,$(BUILD_IMAGE)),\
IMG_VERSION=${IMG_VERSION} make image/build ARGS="--load$(if ${DEV_BUILD}, --target dev-build)" && \
IMG_VERSION=${IMG_VERSION} IMG=${IMG} make image/build ARGS="--load$(if ${DEV_BUILD}, --target dev-build)" && \
,\
docker pull $(IMG) && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit suspicious; we're entering this code path when BUILD_IMAGE is true, which is also the default with BUILD_IMAGE ?= true definition; we don't have to docker pull which may pull from remote container registry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second block (the docker pull ...) should be executed when BUILD_IMAGE=false (technically, when BUILD_IMAGE is anything but true).

I added this because during the later step to load the image into the kind registry, it would fail since the image was not present locally. I suppose this block should not be needed anymore since I've changed the ci/cd to always build the image anyways. I will remove them

) \
LOCAL=1 ./scripts/deploy_on_kind.sh
kubectl port-forward -n kubeflow services/model-registry-service 8080:8080 & echo $$! >> .port-forwards.pid
Expand All @@ -28,8 +35,8 @@ deploy-local-registry:

.PHONY: dev-load-image
dev-load-image:
docker buildx build --load -t $(IMG) .
kind load docker-image $(IMG) -n mr-e2e
docker buildx build --load -t $(JOB_IMG) .
kind load docker-image $(JOB_IMG) -n $(CLUSTER_NAME)

.PHONY: test
test:
Expand Down Expand Up @@ -60,6 +67,20 @@ test-e2e-cleanup:
rm -f .port-forwards.pid; \
fi

.PHONY: test-integration
test-integration: deploy-latest-mr deploy-local-registry deploy-test-minio dev-load-image
@echo "Starting test-integration"
-$(MAKE) test-integration-run; STATUS=$$?
$(MAKE) test-e2e-cleanup
@exit $$STATUS

.PHONY: test-integration-run
test-integration-run:
@echo "Ensuring all extras are installed..."
poetry install --all-extras --with integration
@echo "Running integration tests..."
CONTAINER_IMAGE_URI=$(JOB_IMG) poetry run pytest --integration tests/integration/ -vs

.PHONY: install
install:
poetry install
385 changes: 379 additions & 6 deletions jobs/async-upload/poetry.lock

Large diffs are not rendered by default.

7 changes: 6 additions & 1 deletion jobs/async-upload/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,10 @@ model-registry = { version = ">=0.2.19,<0.3.0", extras = ["boto3", "olot"] }
pytest = ">=8.3.5,<9.0.0"
aiohttp = "^3.12.14"

[tool.poetry.group.integration.dependencies]
kubernetes = "^33.0.0"
requests = "^2.31.0"

[build-system]
requires = ["poetry-core>=2.0.0,<3.0.0"]
build-backend = "poetry.core.masonry.api"
Expand All @@ -23,7 +27,8 @@ build-backend = "poetry.core.masonry.api"
testpaths = ["tests"]
python_files = ["test_*.py"]
markers = [
"e2e: marks tests as end-to-end tests"
"e2e: marks tests as end-to-end tests",
"integration: marks tests as integration tests"
]

[tool.poetry.scripts]
Expand Down
3 changes: 3 additions & 0 deletions jobs/async-upload/samples/sample_job_s3_to_oci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ apiVersion: v1
kind: Secret
metadata:
name: my-s3-credentials
namespace: default
stringData:
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
Expand All @@ -14,6 +15,7 @@ apiVersion: v1
kind: Secret
metadata:
name: my-oci-credentials
namespace: default
type: kubernetes.io/dockerconfigjson
stringData:
.dockerconfigjson: '{"auths": {"distribution-registry-test-service.default.svc.cluster.local:5001": {"auth": "","email": "[email protected]"}}}'
Expand All @@ -24,6 +26,7 @@ apiVersion: batch/v1
kind: Job
metadata:
name: my-async-upload-job
namespace: default
labels:
app.kubernetes.io/name: model-registry-async-job
app.kubernetes.io/component: async-job
Expand Down
25 changes: 23 additions & 2 deletions jobs/async-upload/tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,29 @@ def pytest_collection_modifyitems(config, items):
skip_e2e = pytest.mark.skip(
reason="this is an end-to-end test, requires explicit opt-in --e2e option to run."
)
skip_integration = pytest.mark.skip(
reason="this is an integration test, requires explicit opt-in --integration option to run."
)
skip_not_e2e = pytest.mark.skip(
reason="skipping non-e2e tests; opt-out of --e2e -like options to run."
)
skip_not_integration = pytest.mark.skip(
reason="skipping non-integration tests; opt-out of --integration -like options to run."
)

e2e_option = config.getoption("--e2e")
integration_option = config.getoption("--integration")

if "e2e" in item.keywords:
if not config.getoption("--e2e"):
if not e2e_option:
item.add_marker(skip_e2e)
elif config.getoption("--e2e"):
elif "integration" in item.keywords:
if not integration_option:
item.add_marker(skip_integration)
elif e2e_option:
item.add_marker(skip_not_e2e)
elif integration_option:
item.add_marker(skip_not_integration)


def pytest_addoption(parser):
Expand All @@ -25,3 +40,9 @@ def pytest_addoption(parser):
default=False,
help="opt-in to run tests marked with e2e",
)
parser.addoption(
"--integration",
action="store_true",
default=False,
help="opt-in to run tests marked with integration",
)
83 changes: 83 additions & 0 deletions jobs/async-upload/tests/integration/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
# Integration Tests for Async-Upload Job

This directory contains integration tests for the async-upload job functionality, specifically testing the complete workflow from model creation through job execution and validation.

## Installation

To run the integration tests, you need to install the integration test dependencies:

```bash
# Install all dependencies including integration test dependencies
poetry install --with integration

# Or install just the integration group
poetry install --only integration

# No external CLI tools required - everything is pure Python!
```

## Dependencies Added

The integration tests require the following additional dependencies:

### Main Dependencies (added to `[tool.poetry.dependencies]`)

- **`requests`**: For HTTP calls (downloading models, uploading to MinIO)
- **`pyyaml`**: For YAML processing (kustomization files)

### Integration Test Dependencies (added to `[tool.poetry.group.integration.dependencies]`)

- **`kubernetes`**: Official Python client for Kubernetes API operations

### Pure Python Approach

The integration tests use a pure Python approach without external dependencies:

- **No subprocess calls**: All operations use Python libraries
- **No kustomize CLI**: YAML patching is done using pure Python dict operations
- **No shell commands**: Everything is handled through Python APIs

## Running the Tests

```bash
# Run integration tests only
poetry run pytest --integration tests/integration/ -v

# Run with environment variables
MR_HOST_URL=http://my-registry:8080 poetry run pytest --integration tests/integration/ -v

# Run all tests including integration
poetry run pytest --integration tests/ -v
```

## Test Requirements

The integration tests require:

1. **Model Registry service** running (default: `http://localhost:8080`)
2. **MinIO service** running (default: `http://localhost:9000`)
3. **Kubernetes cluster** with kubectl configured
4. **OCI registry** for job artifact storage

## Environment Variables

- `MR_HOST_URL`: Model Registry URL (default: `http://localhost:8080`)
- `CONTAINER_IMAGE_URI`: Container image for the async-upload job (default: `ghcr.io/kubeflow/model-registry/job/async-upload:latest`)

## What the Tests Do

The integration tests validate the complete async-upload job workflow:

1. **Model Registry Setup**: Creates RegisteredModel, ModelVersion, and placeholder ModelArtifact
2. **File Operations**: Downloads ONNX model and uploads to MinIO using pure Python
3. **Kubernetes Job**: Creates and applies job using pure Python YAML patching (no kustomize CLI)
4. **Validation**: Verifies job completion and artifact state updates using kubernetes client

## Debugging Failed Tests

If tests fail, check:

1. **Services are running**: Model Registry, MinIO, Kubernetes cluster
2. **Connectivity**: Can reach all required services
3. **Permissions**: Kubernetes permissions for job creation
4. **Logs**: Integration test captures and displays pod logs on failure
1 change: 1 addition & 0 deletions jobs/async-upload/tests/integration/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
"""Integration tests for async-upload job functionality."""
Loading
Loading