## Summary
Add integration tests using Kind (Kubernetes in Docker) to validate K8s scheduling correctness — topology constraints, affinity rules, taints, RBAC, resource quotas.
## Motivation
The current `InMemoryK8sService` fake is good for unit-testing `K8sTaskProvider` logic (manifest construction, state mapping, log fetching), but it implements a simplified scheduler that doesn't handle:
- `podAffinity` / `podAntiAffinity` with `topologyKey`
- Real resource quota enforcement
- RBAC validation
- Topology spread constraints
This means configuration errors like setting `topologyKey: "coreweave.cloud/spiine"` (typo) instead of `"coreweave.cloud/spine"` are not caught by tests. Rather than reimplementing K8s scheduling in our fake, we should use Kind for integration tests that validate scheduling correctness.
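To make the failure mode concrete, here is a hedged sketch of the kind of `podAffinity` fragment a provider emits for colocation. The function name and label key are illustrative assumptions, not our actual API; the point is that the fake treats `topologyKey` as an opaque string, so a typo passes unit tests but leaves pods Pending on a real scheduler.

```python
# Illustrative sketch (not our real manifest-building code): a podAffinity
# term that colocates all pods of one job on the same topology domain.
def colocation_affinity(job_id: str, topology_key: str) -> dict:
    """Build a podAffinity term keyed on a shared job label."""
    return {
        "podAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    # Selects the other pods of the same job (label key
                    # "job-id" is a hypothetical example).
                    "labelSelector": {"matchLabels": {"job-id": job_id}},
                    # A typo here ("spiine") is invisible to the fake: it is
                    # just a node label key that no real node carries, so only
                    # the real scheduler reports the pods as Unschedulable.
                    "topologyKey": topology_key,
                }
            ]
        }
    }


affinity = colocation_affinity("job-123", "coreweave.cloud/spine")
```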
## Proposed approach
Use both `InMemoryK8sService` and Kind at different layers:

| Layer | Tool | Speed | What it validates |
|---|---|---|---|
| Unit tests | `InMemoryK8sService` | Instant | Our code: manifest building, state transitions, log fetching |
| Integration tests | Kind cluster | ~10-30s startup | The config: scheduling, topology, affinity, RBAC |
Implementation:

- Add a `conftest.py` fixture that:
  - Spins up a Kind cluster with configurable node pools (labels, taints, resources)
  - Yields a `CloudK8sService` pointed at the Kind cluster
  - Tears down the cluster after tests
- Mark tests with `@pytest.mark.kind` (requires Docker; skip in CI without Docker)
- Write tests for:
  - Multi-task job with correct colocation topology key → all pods scheduled
  - Typo in topology key → pods stay Pending/Unschedulable
  - GPU pod on CPU-only nodepool → Unschedulable
  - Resource exhaustion → Pending
  - Taint without toleration → Unschedulable
  - RBAC: service account without pod creation permission → rejected
- Stop extending `InMemoryK8sService._schedule_pod()` with K8s scheduler semantics — let Kind handle scheduling correctness.
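The fixture described above could look roughly like this. This is a sketch under stated assumptions: it assumes the `kind` CLI is on `PATH`, that `CloudK8sService` can be pointed at a kubeconfig context (its constructor here is a placeholder), and that the node-pool spec shape (`count`, `labels`, `taints`) is whatever we settle on. The config file is written as JSON, which Kind accepts since JSON is valid YAML.

```python
import json
import subprocess
import tempfile

import pytest


def build_kind_config(node_pools: list[dict]) -> dict:
    """Translate node-pool specs into a Kind v1alpha4 cluster config."""
    nodes: list[dict] = [{"role": "control-plane"}]
    for pool in node_pools:
        for _ in range(pool.get("count", 1)):
            node = {"role": "worker", "labels": pool.get("labels", {})}
            if pool.get("taints"):
                # Kind applies taints via a kubeadm JoinConfiguration patch
                # (a YAML string; JSON is valid YAML, so json.dumps works).
                node["kubeadmConfigPatches"] = [json.dumps({
                    "kind": "JoinConfiguration",
                    "nodeRegistration": {"taints": pool["taints"]},
                })]
            nodes.append(node)
    return {
        "kind": "Cluster",
        "apiVersion": "kind.x-k8s.io/v1alpha4",
        "nodes": nodes,
    }


@pytest.fixture(scope="session")
def kind_k8s_service(request):
    """Spin up a Kind cluster, yield a service pointed at it, tear down."""
    node_pools = getattr(request, "param", [{"labels": {"pool": "cpu"}}])
    with tempfile.NamedTemporaryFile("w", suffix=".yaml") as cfg:
        json.dump(build_kind_config(node_pools), cfg)
        cfg.flush()
        subprocess.run(
            ["kind", "create", "cluster",
             "--name", "sched-test", "--config", cfg.name],
            check=True,
        )
    try:
        # Placeholder constructor: however CloudK8sService actually selects
        # a cluster, point it at the "kind-sched-test" kubeconfig context.
        yield CloudK8sService(context="kind-sched-test")  # noqa: F821
    finally:
        subprocess.run(
            ["kind", "delete", "cluster", "--name", "sched-test"],
            check=True,
        )
```

A test would then be marker-gated, e.g. `@pytest.mark.kind` on a test taking `kind_k8s_service`, with the marker registered in `pytest.ini` and skipped when Docker is unavailable.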
## Context
This came out of the provider refactoring in #3900. The fake K8s service handles nodeSelector, tolerations, and resource capacity but not affinity rules. The right answer is to use the real scheduler (Kind) rather than reimplement it.
🤖 Generated with Claude Code