Skip to content

Commit 6c62f81

Browse files
xdu31mchmarny
andauthored
feat: implement Job-based validation framework with test wrapper infrastructure (#76)
Co-authored-by: Mark Chmarny <mchmarny@users.noreply.github.com>
1 parent 8a824e5 commit 6c62f81

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

51 files changed

+14220
-83
lines changed

.github/actions/e2e/action.yml

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,15 @@ runs:
7272
# Verify image is available
7373
curl -sf http://localhost:5001/v2/eidos/tags/list
7474
75+
- name: Build and push validator image to local registry
76+
shell: bash
77+
run: |
78+
# Build validator image with Go toolchain for running validation tests
79+
docker build -f Dockerfile.validator -t localhost:5001/eidos-validator:local .
80+
docker push localhost:5001/eidos-validator:local
81+
# Verify image is available
82+
curl -sf http://localhost:5001/v2/eidos-validator/tags/list
83+
7584
- name: Set up fake GPU environment
7685
shell: bash
7786
run: |
@@ -99,6 +108,7 @@ runs:
99108
shell: bash
100109
env:
101110
EIDOS_IMAGE: localhost:5001/eidos:local
111+
EIDOS_VALIDATOR_IMAGE: localhost:5001/eidos-validator:local
102112
FAKE_GPU_ENABLED: "true"
103113
run: ./tests/e2e/run.sh
104114

.github/workflows/on-push.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -63,7 +63,7 @@ jobs:
6363
golangci_lint_version: ${{ steps.versions.outputs.golangci_lint }}
6464
addlicense_version: ${{ steps.versions.outputs.addlicense }}
6565
coverage_report: 'true'
66-
coverage_threshold: '70'
66+
coverage_threshold: '66'
6767

6868
integration:
6969
name: Integration Tests

.github/workflows/on-tag.yaml

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -161,6 +161,13 @@ jobs:
161161
tag: ${{ github.ref_name }}
162162
crane_version: ${{ steps.versions.outputs.crane }}
163163

164+
- name: Attest eidos-validator image
165+
uses: ./.github/actions/attest-image-from-tag
166+
with:
167+
image_name: ghcr.io/nvidia/eidos-validator
168+
tag: ${{ github.ref_name }}
169+
crane_version: ${{ steps.versions.outputs.crane }}
170+
164171
# =============================================================================
165172
# Deploy Job (runs after attestation succeeds)
166173
# =============================================================================

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,7 @@ sbom.json
5151
# =============================================================================
5252
# Generated recipe/bundle output
5353
/recipe.yaml
54+
/bundle/
5455
/bundles/
5556
/bundles.zip
5657

.goreleaser.yaml

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -119,6 +119,24 @@ kos:
119119
preserve_import_paths: false
120120
bare: true
121121

122+
dockers:
123+
- id: eidos-validator
124+
dockerfile: Dockerfile.validator
125+
use: buildx
126+
image_templates:
127+
- "ghcr.io/nvidia/eidos-validator:latest"
128+
- "ghcr.io/nvidia/eidos-validator:{{.Tag}}"
129+
- "ghcr.io/nvidia/eidos-validator:v{{.Major}}"
130+
- "ghcr.io/nvidia/eidos-validator:v{{.Major}}.{{.Minor}}"
131+
build_flag_templates:
132+
- "--platform=linux/amd64"
133+
- "--platform=linux/arm64"
134+
- "--label=org.opencontainers.image.created={{.Date}}"
135+
- "--label=org.opencontainers.image.title={{.ProjectName}}-validator"
136+
- "--label=org.opencontainers.image.revision={{.FullCommit}}"
137+
- "--label=org.opencontainers.image.version={{.Version}}"
138+
- "--label=org.opencontainers.image.source={{.GitURL}}"
139+
122140
archives:
123141
- id: eidos
124142
ids:

CONTRIBUTING.md

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,32 @@ Before contributing:
5656
- Ensure all tests pass and code meets quality standards
5757
- Write tests for new functionality
5858

59+
#### Adding Validation Constraints
60+
61+
Eidos uses a validator framework to check cluster state against requirements. To add new validation constraints:
62+
63+
**Quick Start:**
64+
```bash
65+
# Generate all necessary files
66+
eidos generate-validator \
67+
--constraint Deployment.my-app.version \
68+
--phase deployment \
69+
--description "Validates my-app version"
70+
```
71+
72+
This creates three files with TODOs guiding implementation:
73+
- Helper functions with validation logic
74+
- Unit tests with table-driven test cases
75+
- Integration test with automatic registration
76+
77+
**Next Steps:**
78+
1. Implement the TODOs in generated files
79+
2. Add comprehensive test cases
80+
3. Run `make test` - registration validation ensures completeness
81+
4. Submit PR - CI enforces all requirements
82+
83+
**See [pkg/validator/checks/README.md](pkg/validator/checks/README.md) for complete guide with examples, architecture overview, and troubleshooting.**
84+
5985
## Design Principles
6086

6187
These principles guide all design decisions in Eidos. When faced with trade-offs, these principles take precedence.

DEVELOPMENT.md

Lines changed: 39 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@ This guide covers project setup, architecture, development workflows, and toolin
1313
- [KWOK Simulated Cluster Testing](#kwok-simulated-cluster-testing)
1414
- [Make Targets Reference](#make-targets-reference)
1515
- [Debugging](#debugging)
16+
- [Validator Development](#validator-development)
1617

1718
## Quick Start
1819

@@ -316,6 +317,13 @@ make e2e
316317
# With local Kubernetes cluster (requires make dev-env first)
317318
make e2e-tilt
318319

320+
# Run E2E tests exactly like CI (automated setup + teardown)
321+
./scripts/run-e2e-local.sh
322+
323+
# Run with options
324+
./scripts/run-e2e-local.sh --skip-cleanup # Keep cluster after tests
325+
./scripts/run-e2e-local.sh --collect-artifacts # Collect artifacts even on success
326+
319327
# KWOK simulated cluster tests (no GPU hardware required)
320328
make kwok-test-all # All recipes
321329
make kwok-e2e RECIPE=eks-training # Single recipe
@@ -453,13 +461,27 @@ make dev-reset # Full reset (tear down and recreate)
453461
### Running E2E Tests with Tilt
454462

455463
```bash
464+
# Option 1: Automated (exactly like CI)
465+
./scripts/run-e2e-local.sh
466+
467+
# Option 2: Manual setup (for development/debugging)
456468
# Start the dev environment
457469
make dev-env
458470

459471
# In another terminal, run E2E tests against the Tilt cluster
460472
make e2e-tilt
461473
```
462474

475+
The automated script (`run-e2e-local.sh`) replicates the exact CI workflow:
476+
- Creates Kind cluster with local registry
477+
- Starts Tilt in CI mode
478+
- Builds and pushes both images (`eidos:local`, `eidos-validator:local`)
479+
- Injects fake nvidia-smi into worker nodes
480+
- Sets up port forwarding to eidosd
481+
- Runs E2E tests with proper environment variables
482+
- Collects debug artifacts on failure
483+
- Cleans up cluster and resources
484+
463485
### Testing the API Server Locally (without Kubernetes)
464486

465487
For quick iteration without Kubernetes:
@@ -551,8 +573,9 @@ See [kwok/README.md](kwok/README.md) for adding recipes, profiles, and troublesh
551573
| Target | Description |
552574
|--------|-------------|
553575
| `make build` | Build binaries for current OS/arch |
554-
| `make image` | Build and push container image |
555-
| `make release` | Full release with goreleaser |
576+
| `make image` | Build and push eidos container image (Ko) |
577+
| `make image-validator` | Build and push validator image with Go toolchain (Docker) |
578+
| `make release` | Full release with goreleaser (includes all images) |
556579
| `make bump-major` | Bump major version (1.2.3 → 2.0.0) |
557580
| `make bump-minor` | Bump minor version (1.2.3 → 1.3.0) |
558581
| `make bump-patch` | Bump patch version (1.2.3 → 1.2.4) |
@@ -649,6 +672,20 @@ tilt logs -f tilt/Tiltfile
649672
make dev-reset
650673
```
651674

675+
## Validator Development
676+
677+
For detailed information on adding validation checks and constraint validators, see:
678+
679+
**[pkg/validator/checks/README.md](../pkg/validator/checks/README.md)**
680+
681+
This comprehensive guide covers:
682+
- Architecture overview (Job-based validation, test registration framework)
683+
- Quick start with code generator: `eidos generate-validator`
684+
- How-to guides for adding checks and constraint validators
685+
- Testing patterns (unit tests vs integration tests)
686+
- Enforcement mechanisms (automated registration validation)
687+
- Troubleshooting common issues
688+
652689
## Additional Resources
653690

654691
### Project Documentation

Dockerfile.validator

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License.
14+
15+
# Validator image includes Go toolchain for running validation tests in-cluster
16+
FROM golang:1.25-bookworm
17+
18+
# Install basic dependencies
19+
RUN apt-get update && apt-get install -y \
20+
ca-certificates \
21+
git \
22+
&& rm -rf /var/lib/apt/lists/*
23+
24+
# Set working directory
25+
WORKDIR /workspace
26+
27+
# Copy go mod files first for better caching
28+
COPY go.mod go.sum ./
29+
RUN go mod download
30+
31+
# Copy source code
32+
COPY . .
33+
34+
# Pre-compile test binaries to speed up Job execution
35+
# This is optional but improves startup time
36+
RUN go test -c -o /tmp/readiness.test ./pkg/validator/checks/readiness && \
37+
go test -c -o /tmp/deployment.test ./pkg/validator/checks/deployment && \
38+
go test -c -o /tmp/performance.test ./pkg/validator/checks/performance && \
39+
go test -c -o /tmp/conformance.test ./pkg/validator/checks/conformance || true
40+
41+
# Default command runs bash for debugging
42+
CMD ["/bin/bash"]

Makefile

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ GO_VERSION := $(shell go env GOVERSION 2>/dev/null | sed 's/go//')
1212
GOLINT_VERSION = $(shell golangci-lint --version 2>/dev/null | awk '{print $$4}' | sed 's/golangci-lint version //' || echo "not installed")
1313
KO_VERSION = $(shell ko version 2>/dev/null || echo "not installed")
1414
GORELEASER_VERSION = $(shell goreleaser --version 2>/dev/null | sed -n 's/^GitVersion:[[:space:]]*//p' || echo "not installed")
15-
COVERAGE_THRESHOLD ?= 70
15+
COVERAGE_THRESHOLD ?= 66
1616

1717
# Tilt/ctlptl configuration
1818
CTLPTL_CONFIG_FILE = .ctlptl.yaml
@@ -122,10 +122,10 @@ license: ## Add/verify license headers in source files
122122
@addlicense -f .github/headers/LICENSE $(LICENSE_IGNORES) .
123123

124124
.PHONY: test
125-
test: ## Runs unit tests with race detector and coverage
125+
test: ## Runs unit tests with race detector and coverage (use -short to skip integration tests)
126126
@set -e; \
127127
echo "Running tests with race detector..."; \
128-
go test -count=1 -race -covermode=atomic -coverprofile=coverage.out ./... || exit 1; \
128+
go test -short -count=1 -race -covermode=atomic -coverprofile=coverage.out ./... || exit 1; \
129129
echo "Test coverage:"; \
130130
go tool cover -func=coverage.out | tail -1
131131

@@ -192,6 +192,16 @@ image: ## Builds and pushes container image (IMAGE_REGISTRY, IMAGE_TAG)
192192
echo "Building and pushing image to $(IMAGE_REGISTRY)/eidos:$(IMAGE_TAG)"; \
193193
KO_DOCKER_REPO=$(IMAGE_REGISTRY) ko build --bare --sbom=none --tags=$(IMAGE_TAG) ./cmd/eidos
194194

195+
.PHONY: image-validator
196+
image-validator: ## Builds validator image with Go toolchain (IMAGE_REGISTRY, IMAGE_TAG)
197+
@set -e; \
198+
echo "Building validator image to $(IMAGE_REGISTRY)/eidos-validator:$(IMAGE_TAG)"; \
199+
docker build -f Dockerfile.validator -t $(IMAGE_REGISTRY)/eidos-validator:$(IMAGE_TAG) .; \
200+
if [ -n "$(IMAGE_REGISTRY)" ] && [ "$(IMAGE_REGISTRY)" != "localhost:5005" ]; then \
201+
echo "Pushing validator image to $(IMAGE_REGISTRY)/eidos-validator:$(IMAGE_TAG)"; \
202+
docker push $(IMAGE_REGISTRY)/eidos-validator:$(IMAGE_TAG); \
203+
fi
204+
195205
.PHONY: release
196206
release: ## Runs the full release process with goreleaser
197207
@set -e; \

cmd/eidos/main.go

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,14 @@
1414

1515
package main
1616

17-
import "github.com/NVIDIA/eidos/pkg/cli"
17+
import (
18+
"github.com/NVIDIA/eidos/pkg/cli"
19+
20+
// Import check packages for side-effect registration.
21+
// Each package's init() function registers its validators.
22+
_ "github.com/NVIDIA/eidos/pkg/validator/checks/deployment"
23+
_ "github.com/NVIDIA/eidos/pkg/validator/checks/readiness"
24+
)
1825

1926
func main() {
2027
cli.Execute()

0 commit comments

Comments
 (0)