Scope: This document describes the architecture of the lint command diagnostic system.
The lint command (kubectl odh lint) validates OpenShift AI cluster configuration and assesses upgrade readiness. This architecture is specific to the lint command and not a generic diagnostic framework.
For general CLI design, see ../design.md. For development practices, see ../development.md.
The lint command diagnostic system is built around a check framework that enables extensible, version-aware validation of OpenShift AI clusters.
All lint checks implement the Check interface:
type Check interface {
ID() string
Name() string
Description() string
Group() CheckGroup
CheckKind() string
CheckType() string
CanApply(ctx context.Context, target Target) (bool, error)
Validate(ctx context.Context, target Target) (*result.DiagnosticResult, error)
}
Key methods:
- ID() - Unique identifier for the lint check
- Group() - Returns CheckGroup type: GroupComponent, GroupDependency, GroupPlatform, GroupService, or GroupWorkload
- CheckKind() - Returns the kind of resource being checked (e.g., "kserve", "codeflare"). Used by validation builders to construct diagnostic results
- CheckType() - Returns the type of check (e.g., "removal", "deprecation"). Used by validation builders to construct diagnostic results
- CanApply() - Determines whether the lint check is applicable based on version context
- Validate() - Executes the lint check and returns (*result.DiagnosticResult, error)
The Target struct provides lint checks with cluster context:
type Target struct {
// Client provides read-only access to Kubernetes API for querying resources.
// Uses the Reader interface to enforce that lint checks cannot perform write operations.
Client client.Reader
// CurrentVersion contains the current/source cluster version as parsed semver
// For lint mode: same as TargetVersion
// For upgrade mode: the version being upgraded FROM
// Nil if no current version available
CurrentVersion *semver.Version
// TargetVersion contains the target version as parsed semver
// For lint mode: the detected cluster version
// For upgrade mode: the version being upgraded TO
// Nil if no target version available
TargetVersion *semver.Version
// Resource is the specific resource being validated (optional)
// Only set for workload checks that operate on discovered CRs
// Nil for component and service checks
Resource *unstructured.Unstructured
// IO provides access to input/output streams for logging (optional)
// Used by checks to log warnings when verbose mode is enabled
IO iostreams.Interface
}
Lint checks compare CurrentVersion with TargetVersion to determine execution mode:
- Lint mode: TargetVersion == CurrentVersion (validate current state)
- Upgrade mode: TargetVersion != CurrentVersion (assess upgrade readiness)
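The mode comparison can be sketched as follows. This is a minimal illustration, not the project's code: the Version struct stands in for *semver.Version, and DetectMode, AppliesTo2xTo3xUpgrade, and the mode constants are hypothetical names. Treating a nil version as lint mode is also an assumption.

```go
package main

import "fmt"

// Version is a simplified stand-in for *semver.Version (assumption for this sketch).
type Version struct{ Major, Minor, Patch int }

// Mode names are hypothetical labels used only in this example.
const (
	ModeLint    = "lint"
	ModeUpgrade = "upgrade"
)

// DetectMode returns ModeLint when both versions are equal and ModeUpgrade otherwise.
// Assumption: a missing version on either side is treated as lint mode.
func DetectMode(current, target *Version) string {
	if current == nil || target == nil || *current == *target {
		return ModeLint
	}
	return ModeUpgrade
}

// AppliesTo2xTo3xUpgrade sketches a CanApply-style gate for a check that only
// matters when moving from a 2.x release to 3.x or later.
func AppliesTo2xTo3xUpgrade(current, target *Version) bool {
	if current == nil || target == nil {
		return false
	}
	return current.Major == 2 && target.Major >= 3
}

func main() {
	cur := &Version{Major: 2, Minor: 17, Patch: 0}
	tgt := &Version{Major: 3, Minor: 0, Patch: 0}
	fmt.Println(DetectMode(cur, cur), DetectMode(cur, tgt), AppliesTo2xTo3xUpgrade(cur, tgt))
}
```

A real check would perform the same comparison inside CanApply() using the semver values carried by Target.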
Lint checks are explicitly registered in the NewCommand() constructor. This approach avoids global state and enables full test isolation:
// pkg/lint/command.go - Explicit check registration in NewCommand()
func NewCommand(
streams genericiooptions.IOStreams,
configFlags *genericclioptions.ConfigFlags,
options ...CommandOption,
) *Command {
registry := check.NewRegistry()
// Explicitly register all checks (no global state, full test isolation)
// Platform (2)
registry.MustRegister(dscinitialization.NewDSCInitializationReadinessCheck())
registry.MustRegister(datasciencecluster.NewDataScienceClusterReadinessCheck())
// Components (13)
registry.MustRegister(codeflare.NewRemovalCheck())
registry.MustRegister(dashboard.NewAcceleratorProfileMigrationCheck())
registry.MustRegister(dashboard.NewHardwareProfileMigrationCheck())
registry.MustRegister(datasciencepipelines.NewInstructLabRemovalCheck())
registry.MustRegister(datasciencepipelines.NewRenamingCheck())
registry.MustRegister(kserve.NewServerlessRemovalCheck())
registry.MustRegister(kserve.NewInferenceServiceConfigCheck())
registry.MustRegister(kserve.NewServiceMeshOperatorCheck())
registry.MustRegister(kserve.NewServiceMeshRemovalCheck())
registry.MustRegister(kueue.NewManagementStateCheck())
registry.MustRegister(kueue.NewOperatorInstalledCheck())
registry.MustRegister(modelmesh.NewRemovalCheck())
registry.MustRegister(trainingoperator.NewDeprecationCheck())
// Dependencies (3)
registry.MustRegister(certmanager.NewCheck())
registry.MustRegister(openshift.NewCheck())
registry.MustRegister(servicemesh.NewCheck())
// Workloads (8)
registry.MustRegister(guardrails.NewImpactedWorkloadsCheck())
registry.MustRegister(guardrails.NewOtelMigrationCheck())
registry.MustRegister(kserveworkloads.NewAcceleratorMigrationCheck())
registry.MustRegister(kserveworkloads.NewImpactedWorkloadsCheck())
registry.MustRegister(notebook.NewAcceleratorMigrationCheck())
registry.MustRegister(notebook.NewImpactedWorkloadsCheck())
registry.MustRegister(ray.NewImpactedWorkloadsCheck())
registry.MustRegister(trainingoperatorworkloads.NewImpactedWorkloadsCheck())
return &Command{
SharedOptions: shared,
registry: registry,
}
}
Benefits:
- No global state - each command instance has its own registry
- Full test isolation - tests can register only the checks they need
- Explicit dependencies - all registered checks are visible in one place
- Easier debugging - registration order is deterministic
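To illustrate why a per-command registry gives deterministic order and test isolation, here is a minimal sketch. It is not the project's check.Registry: the Register and List methods, the duplicate-ID rule, and the check IDs are assumptions made for the example.

```go
package main

import "fmt"

// Check is reduced to the one method this sketch needs.
type Check interface{ ID() string }

// Registry keeps checks in registration order. It is a hypothetical
// simplification, not the project's check.Registry.
type Registry struct {
	checks []Check
	seen   map[string]bool
}

// NewRegistry returns an empty registry owned by the caller (no global state).
func NewRegistry() *Registry { return &Registry{seen: map[string]bool{}} }

// Register rejects duplicate IDs so each check appears exactly once.
func (r *Registry) Register(c Check) error {
	if r.seen[c.ID()] {
		return fmt.Errorf("check %q already registered", c.ID())
	}
	r.seen[c.ID()] = true
	r.checks = append(r.checks, c)
	return nil
}

// MustRegister panics on error, which suits constructor-time wiring.
func (r *Registry) MustRegister(c Check) {
	if err := r.Register(c); err != nil {
		panic(err)
	}
}

// List returns a copy of the checks in registration order.
func (r *Registry) List() []Check {
	return append([]Check(nil), r.checks...)
}

// fakeCheck is a test double whose ID is its own string value.
type fakeCheck string

func (f fakeCheck) ID() string { return string(f) }

func main() {
	r := NewRegistry()
	r.MustRegister(fakeCheck("components.kserve.removal")) // hypothetical ID
	r.MustRegister(fakeCheck("workloads.ray.impacted"))    // hypothetical ID
	for _, c := range r.List() {
		fmt.Println(c.ID())
	}
}
```

Because each test can build its own Registry, it can register only the checks it needs without affecting other tests.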
DiagnosticResults follow Kubernetes Custom Resource conventions, with metadata, spec, and status sections.
Unlike a Kubernetes object, the metadata fields are flattened onto DiagnosticResult itself rather than nested in a Metadata struct:
type DiagnosticResult struct {
// Flattened metadata fields (not nested in a Metadata struct)
Group string // "component", "dependency", "platform", "service", "workload"
Kind string // Target: "kserve", "dashboard", etc.
Name string // Check type identifier (e.g., "removal", "deprecation")
Annotations map[string]string // Version metadata with domain-qualified keys
Spec DiagnosticSpec // Description of what the check validates
Status DiagnosticStatus // Condition-based validation results
// ImpactedObjects contains references to resources impacted by this diagnostic
ImpactedObjects []metav1.PartialObjectMetadata
}
type DiagnosticSpec struct {
Description string // What the lint check validates
}
type DiagnosticStatus struct {
Conditions []Condition // Individual validation requirements
}
type Condition struct {
metav1.Condition // Embedded Kubernetes condition (Type, Status, Reason, Message, LastTransitionTime)
Impact Impact // "blocking", "advisory", "" (none)
}
Condition Status follows Kubernetes metav1.Condition semantics:
- True: Requirement is MET (check passing)
- False: Requirement is NOT MET (check failing)
- Unknown: Unable to determine if requirement is met
Each condition has an Impact field indicating the upgrade impact:
- blocking: Upgrade cannot proceed (critical issue)
- advisory: Upgrade can proceed with warning (non-critical issue)
- none (empty string): No impact (success state)
Impact is auto-derived from Status unless explicitly overridden:
- Status=True → Impact=None
- Status=False → Impact=Advisory
- Status=Unknown → Impact=Advisory
Checks that truly block upgrades must explicitly opt in via WithImpact(result.ImpactBlocking).
Validation ensures valid Status/Impact combinations:
- Status=True MUST have Impact=None
- Status=False or Unknown MUST have Impact=Blocking or Advisory
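The derivation and validation rules above can be sketched as two small functions. Only ImpactBlocking appears by name in this document; the other constants and the DeriveImpact and ValidateCombination function names are assumptions for illustration.

```go
package main

import "fmt"

type Status string
type Impact string

const (
	StatusTrue    Status = "True"
	StatusFalse   Status = "False"
	StatusUnknown Status = "Unknown"

	ImpactNone     Impact = ""
	ImpactAdvisory Impact = "advisory"
	ImpactBlocking Impact = "blocking"
)

// DeriveImpact applies the default mapping: a met requirement has no impact,
// anything else is advisory unless a check explicitly overrides it.
func DeriveImpact(s Status) Impact {
	if s == StatusTrue {
		return ImpactNone
	}
	return ImpactAdvisory
}

// ValidateCombination rejects Status/Impact pairs the rules do not allow.
func ValidateCombination(s Status, i Impact) error {
	switch {
	case s == StatusTrue && i != ImpactNone:
		return fmt.Errorf("status %s requires no impact, got %q", s, i)
	case s != StatusTrue && i != ImpactAdvisory && i != ImpactBlocking:
		return fmt.Errorf("status %s requires advisory or blocking impact, got %q", s, i)
	}
	return nil
}

func main() {
	fmt.Printf("%q\n", DeriveImpact(StatusFalse))
	fmt.Println(ValidateCombination(StatusTrue, ImpactBlocking))
}
```

Defaulting to advisory keeps a failing check from halting an upgrade by accident; blocking remains an explicit decision made by the check author.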
Version information is stored in the flattened Annotations map using domain-qualified keys:
- check.opendatahub.io/source-version - Current cluster version
- check.opendatahub.io/target-version - Target version for upgrade assessment
Lint checks with multiple conditions render as multiple table rows (one per condition):
GROUP       KIND    NAME          CONDITION           STATUS  REASON
components  kserve  config-check  ConfigValid         True    ConfigCorrect
components  kserve  config-check  ResourcesAvailable  False   InsufficientMemory
components  kserve  config-check  PermissionsValid    True    PermissionsCorrect
This provides at-a-glance visibility of all validation requirements.
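One way to produce one row per condition is to flatten each result before handing the rows to a tabular writer. The sketch below uses the standard library's text/tabwriter. The Result and Condition types are trimmed stand-ins, and Rows and WriteTable are hypothetical helper names, not the project's printer API.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
	"text/tabwriter"
)

// Condition and Result are trimmed stand-ins for the diagnostic types.
type Condition struct{ Type, Status, Reason string }

type Result struct {
	Group, Kind, Name string
	Conditions        []Condition
}

// Rows flattens results into one row per condition, repeating the
// identifying columns of the owning result.
func Rows(results []Result) [][]string {
	var rows [][]string
	for _, r := range results {
		for _, c := range r.Conditions {
			rows = append(rows, []string{r.Group, r.Kind, r.Name, c.Type, c.Status, c.Reason})
		}
	}
	return rows
}

// WriteTable renders the header and rows as aligned columns.
func WriteTable(w io.Writer, results []Result) error {
	tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
	fmt.Fprintln(tw, "GROUP\tKIND\tNAME\tCONDITION\tSTATUS\tREASON")
	for _, row := range Rows(results) {
		fmt.Fprintln(tw, strings.Join(row, "\t"))
	}
	return tw.Flush()
}

func main() {
	results := []Result{{
		Group: "components", Kind: "kserve", Name: "config-check",
		Conditions: []Condition{
			{"ConfigValid", "True", "ConfigCorrect"},
			{"ResourcesAvailable", "False", "InsufficientMemory"},
			{"PermissionsValid", "True", "PermissionsCorrect"},
		},
	}}
	if err := WriteTable(os.Stdout, results); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
	}
}
```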
The lint command automatically detects the cluster's OpenShift AI version using a priority-based strategy, trying these sources in order:
- DataScienceCluster status - Primary source
- DSCInitialization status - Fallback if DSC not found
- OLM ClusterServiceVersion - Last resort for operator version
type ClusterVersion struct {
Version string // Semantic version (e.g., "2.17.0")
Source VersionSource // Where version was detected from
Confidence Confidence // Detection confidence level
}
type VersionSource string
const (
SourceDataScienceCluster VersionSource = "DataScienceCluster"
SourceDSCInitialization VersionSource = "DSCInitialization"
SourceOLM VersionSource = "OLM"
)
Detected versions map to operator repository branches:
- 2.x versions → stable-2.x branch
- 3.x versions → main branch
This enables version-specific validation logic in lint checks.
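The priority-based fallback and the branch mapping can be sketched as below. The Detector type, DetectVersion, BranchFor, and ErrNotFound are hypothetical names. Real detectors would query the cluster, and moving on to the next source after any error is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ClusterVersion is trimmed to the fields this sketch uses.
type ClusterVersion struct {
	Version string
	Source  string
}

// Detector is a hypothetical pairing of a source name with a lookup function.
type Detector struct {
	Source string
	Detect func() (string, error)
}

// ErrNotFound is returned when no source yields a version.
var ErrNotFound = errors.New("version not found")

// DetectVersion tries detectors in priority order and returns the first hit.
// Assumption: a failing source is skipped rather than aborting detection.
func DetectVersion(detectors []Detector) (ClusterVersion, error) {
	for _, d := range detectors {
		v, err := d.Detect()
		if err == nil && v != "" {
			return ClusterVersion{Version: v, Source: d.Source}, nil
		}
	}
	return ClusterVersion{}, ErrNotFound
}

// BranchFor maps a detected version to the operator branch named in the text.
func BranchFor(version string) (string, error) {
	switch {
	case strings.HasPrefix(version, "2."):
		return "stable-2.x", nil
	case strings.HasPrefix(version, "3."):
		return "main", nil
	}
	return "", fmt.Errorf("no branch mapping for version %q", version)
}

func main() {
	detectors := []Detector{
		{"DataScienceCluster", func() (string, error) { return "", ErrNotFound }},
		{"DSCInitialization", func() (string, error) { return "2.17.0", nil }},
		{"OLM", func() (string, error) { return "2.17.0", nil }},
	}
	cv, err := DetectVersion(detectors)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	branch, _ := BranchFor(cv.Version)
	fmt.Println(cv.Version, cv.Source, branch)
}
```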
The lint command dynamically discovers components, services, and workloads without hardcoded resource lists.
Uses Kubernetes API discovery to find resources:
- Components: components.platform.opendatahub.io API group
- Services: services.platform.opendatahub.io API group
// Discover components dynamically
resources, err := client.Discovery().ServerResourcesForGroupVersion("components.platform.opendatahub.io/v1")
Discovers workload CRDs via label selector:
// Find all workload CRDs
crdList := &apiextv1.CustomResourceDefinitionList{}
err := client.List(ctx, crdList, &client.ListOptions{
LabelSelector: labels.SelectorFromSet(labels.Set{
"platform.opendatahub.io/part-of": "true",
}),
})
Workload types include:
- Development: Notebook
- Model Serving: InferenceService, LLMInferenceService
- Distributed Computing: RayCluster, RayJob, RayService
- Training: PyTorchJob, TFJob, MPIJob, XGBoostJob
- Pipelines: DataSciencePipelinesApplication, Workflow
- AI Governance: TrustyAIService, GuardrailsOrchestrator
- Model Registry: ModelRegistry
- Feature Store: FeatureStore
Benefits:
- Automatically supports new components/services added by operator
- No code changes required when new workload types are introduced
- Scales with platform evolution
The lint command follows a consistent lifecycle pattern with four phases.
type Command interface {
Complete() error
Validate() error
Run(ctx context.Context) error
AddFlags(fs *pflag.FlagSet)
}
- AddFlags: Register lint command-specific flags
func (c *Command) AddFlags(fs *pflag.FlagSet) {
	fs.StringVar(&c.targetVersion, "target-version", "", "Target version")
}
- Complete: Initialize runtime state (client, namespace, parsing)
func (c *Command) Complete() error {
	client, err := utilclient.NewClient(c.shared.ConfigFlags)
	if err != nil {
		return fmt.Errorf("creating client: %w", err)
	}
	c.client = client
	// Parse flags, populate fields
	return nil
}
- Validate: Verify all required options are set correctly
func (c *Command) Validate() error {
	if !isValidFormat(c.shared.OutputFormat) {
		return fmt.Errorf("invalid output format: %s", c.shared.OutputFormat)
	}
	return nil
}
- Run: Execute lint check logic
func (c *Command) Run(ctx context.Context) error {
	results := c.executeChecks(ctx)
	return c.renderOutput(results)
}
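The phases are meant to run in a fixed order after flags are registered, with a failure in one phase preventing the next. A minimal driver might look like the sketch below. Execute, Trace, and the recorder type are hypothetical, and AddFlags is left out because it depends on pflag.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Lifecycle mirrors the Complete/Validate/Run methods of the Command interface.
type Lifecycle interface {
	Complete() error
	Validate() error
	Run(ctx context.Context) error
}

// Execute runs the phases in order and stops at the first failure.
func Execute(ctx context.Context, cmd Lifecycle) error {
	if err := cmd.Complete(); err != nil {
		return fmt.Errorf("complete: %w", err)
	}
	if err := cmd.Validate(); err != nil {
		return fmt.Errorf("validate: %w", err)
	}
	if err := cmd.Run(ctx); err != nil {
		return fmt.Errorf("run: %w", err)
	}
	return nil
}

// recorder is a hypothetical test double that records which phases ran.
type recorder struct {
	calls       []string
	validateErr error
}

func (r *recorder) Complete() error { r.calls = append(r.calls, "complete"); return nil }
func (r *recorder) Validate() error { r.calls = append(r.calls, "validate"); return r.validateErr }
func (r *recorder) Run(context.Context) error {
	r.calls = append(r.calls, "run")
	return nil
}

var errBadFormat = errors.New("invalid output format")

// Trace runs a recorder through Execute and reports the phases that ran.
func Trace(validateErr error) ([]string, error) {
	r := &recorder{validateErr: validateErr}
	err := Execute(context.Background(), r)
	return r.calls, err
}

func main() {
	calls, err := Trace(errBadFormat)
	fmt.Println(calls, err)
}
```

Keeping validation separate from execution means an invalid flag combination is reported before any check touches the cluster.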
The lint command uses a Command struct (not Options) with constructor NewCommand():
type Command struct {
shared *SharedOptions
targetVersion string
}
func NewCommand(opts CommandOptions) *Command {
return &Command{
shared: opts.Shared,
targetVersion: opts.TargetVersion,
}
}
The lint command supports three output formats with consistent structure.
- Table (default): Human-readable, one row per condition
- JSON: Kubernetes List pattern for scripting
- YAML: Kubernetes List pattern for configuration
Results are returned in a list with flattened result fields:
{
"clusterVersion": "2.17.0",
"targetVersion": "3.0.0",
"results": [
{
"group": "component",
"kind": "kserve",
"name": "removal",
"annotations": {
"check.opendatahub.io/target-version": "3.0.0"
},
"spec": { "description": "Validates serverless removal..." },
"status": {
"conditions": [
{
"type": "Compatible",
"status": "True",
"reason": "ServerlessRemoved",
"message": "Serverless components are removed",
"lastTransitionTime": "2024-01-15T10:30:00Z",
"impact": ""
}
]
}
}
]
}
Key characteristics:
- Results in execution order (sequential, not grouped by category)
- Category information preserved in flattened group field
- Deterministic ordering through sequential execution
- Compatible with jq/yq for post-processing
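The JSON shape above maps naturally onto struct tags. The following sketch uses encoding/json with trimmed types. Field names follow the sample output, lastTransitionTime is omitted for brevity, and the Output, Render, and Decode names are assumptions rather than the project's printer API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
	Impact  string `json:"impact"`
}

type Spec struct {
	Description string `json:"description"`
}

type Status struct {
	Conditions []Condition `json:"conditions"`
}

// Result carries the flattened group/kind/name fields next to spec and status.
type Result struct {
	Group       string            `json:"group"`
	Kind        string            `json:"kind"`
	Name        string            `json:"name"`
	Annotations map[string]string `json:"annotations,omitempty"`
	Spec        Spec              `json:"spec"`
	Status      Status            `json:"status"`
}

// Output is the top-level document; Results is a slice so order is preserved.
type Output struct {
	ClusterVersion string   `json:"clusterVersion"`
	TargetVersion  string   `json:"targetVersion"`
	Results        []Result `json:"results"`
}

// Render encodes the output as indented JSON.
func Render(out Output) (string, error) {
	b, err := json.MarshalIndent(out, "", "  ")
	if err != nil {
		return "", fmt.Errorf("encoding results: %w", err)
	}
	return string(b), nil
}

// Decode parses JSON produced by Render back into an Output.
func Decode(s string) (Output, error) {
	var out Output
	err := json.Unmarshal([]byte(s), &out)
	return out, err
}

func main() {
	out := Output{
		ClusterVersion: "2.17.0",
		TargetVersion:  "3.0.0",
		Results: []Result{{
			Group: "component", Kind: "kserve", Name: "removal",
			Status: Status{Conditions: []Condition{{Type: "Compatible", Status: "True", Reason: "ServerlessRemoved"}}},
		}},
	}
	s, err := Render(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```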
Critical Requirement: Parallel check execution is PROHIBITED. All lint checks MUST execute sequentially to ensure deterministic ordering.
Rationale:
- Diff-based workflows: Deterministic output enables meaningful diffs between lint runs
- Test assertions: Tests can reliably assert on result order
- Reproducible diagnostics: Same cluster state always produces same output order
- Debugging: Sequential execution makes it easier to trace check execution flow
Prohibited:
// ❌ WRONG: Parallel execution
var wg sync.WaitGroup
for _, check := range checks {
wg.Add(1)
go func(c Check) {
defer wg.Done()
res, _ := c.Validate(ctx, target) // goroutines finish in nondeterministic order
results <- res
}(check)
}
wg.Wait()
Required:
// ✓ CORRECT: Sequential execution
for _, check := range checks {
result, err := check.Validate(ctx, target)
if err != nil {
return fmt.Errorf("executing check %s: %w", check.ID(), err)
}
results = append(results, result)
}
The lint command operates fully offline by bundling expected configurations for known OpenShift AI versions.
Requirements:
- NO network access to fetch operator manifests or configurations
- ALL version configurations bundled in the binary at compile time
- Validation against local cluster data only - no external API calls
Bundled Configuration:
pkg/lint/config/
├── v2.17/
│ ├── components.yaml # Expected component configurations
│ ├── services.yaml # Expected service configurations
│ └── workloads.yaml # Expected workload types
├── v3.0/
│ ├── components.yaml
│ ├── services.yaml
│ └── workloads.yaml
└── ...
Rationale:
- Air-gapped environments: Works in disconnected clusters
- Reproducibility: No dependency on external network state
- Performance: No network latency
- Reliability: No external service dependencies
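A version-keyed lookup over bundled files can be sketched with the io/fs interfaces. In a real binary the files would typically come from an embed.FS populated by a //go:embed directive. Here fstest.MapFS stands in so the example is self-contained, and the file contents are placeholders. LoadConfig and the path layout relative to the filesystem root are assumptions.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// bundled stands in for an embed.FS populated at compile time.
// The YAML bodies are placeholders, not real configuration.
var bundled fs.FS = fstest.MapFS{
	"config/v2.17/components.yaml": {Data: []byte("# placeholder\n")},
	"config/v3.0/components.yaml":  {Data: []byte("# placeholder\n")},
}

// LoadConfig reads a bundled file for a major.minor version.
// It touches only the in-binary filesystem, never the network.
func LoadConfig(fsys fs.FS, majorMinor, name string) ([]byte, error) {
	path := "config/v" + majorMinor + "/" + name
	data, err := fs.ReadFile(fsys, path)
	if err != nil {
		return nil, fmt.Errorf("no bundled %s for version %s: %w", name, majorMinor, err)
	}
	return data, nil
}

func main() {
	if _, err := LoadConfig(bundled, "2.17", "components.yaml"); err != nil {
		fmt.Println("error:", err)
		return
	}
	_, err := LoadConfig(bundled, "9.9", "components.yaml")
	fmt.Println("unknown version rejected:", err != nil)
}
```

Accepting an fs.FS rather than a concrete embed.FS lets tests substitute an in-memory filesystem.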
Lint checks operate exclusively on high-level custom resources representing user-facing abstractions.
Permitted targets:
- Component CRs (DataScienceCluster, DSCInitialization)
- Workload CRs (Notebook, InferenceService, RayCluster, etc.)
- Service CRs (platform services)
- CRDs, ClusterServiceVersions
Prohibited targets:
- Low-level Kubernetes primitives (Pod, Deployment, StatefulSet, Service, ConfigMap, Secret)
Rationale: OpenShift AI users interact with high-level CRs, not low-level primitives. Lint checks targeting low-level resources produce noise and don't align with user-facing abstractions.
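One way to enforce this rule in tests or at registration time is a small guard keyed on API group and kind. The sketch below is hypothetical (GroupKind and EnsureHighLevel are not project names). Keying on the group as well as the kind avoids rejecting a custom resource that merely shares a kind name with a core primitive.

```go
package main

import "fmt"

// GroupKind identifies a resource type by API group and kind.
type GroupKind struct{ Group, Kind string }

// lowLevel lists the Kubernetes primitives named as prohibited targets.
// The core API group is the empty string.
var lowLevel = map[GroupKind]bool{
	{Group: "", Kind: "Pod"}:             true,
	{Group: "", Kind: "Service"}:         true,
	{Group: "", Kind: "ConfigMap"}:       true,
	{Group: "", Kind: "Secret"}:          true,
	{Group: "apps", Kind: "Deployment"}:  true,
	{Group: "apps", Kind: "StatefulSet"}: true,
}

// EnsureHighLevel returns an error when a check targets a low-level primitive.
func EnsureHighLevel(gk GroupKind) error {
	if lowLevel[gk] {
		return fmt.Errorf("%s is a low-level primitive and not a valid lint target", gk.Kind)
	}
	return nil
}

func main() {
	fmt.Println(EnsureHighLevel(GroupKind{Group: "apps", Kind: "Deployment"}))
	fmt.Println(EnsureHighLevel(GroupKind{Group: "serving.kserve.io", Kind: "InferenceService"}))
}
```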
The lint command operates cluster-wide and scans all namespaces. Namespace filtering is prohibited.
Requirements:
- Component checks examine cluster-scoped resources
- Service checks examine cluster-scoped or all-namespace resources
- Workload checks discover and validate across ALL namespaces
- No --namespace or -n flags on lint command
Rationale: OpenShift AI is a cluster-wide platform. Comprehensive diagnostics require visibility into all namespaces to detect misconfigurations and cross-namespace dependencies.
pkg/
├── lint/
│ ├── command.go # Lint command implementation with explicit check registration
│ ├── check/ # Check interface, Target, result types
│ │ └── result/ # DiagnosticResult and related types
│ └── checks/
│ ├── components/ # Component checks (one package per check)
│ ├── dependencies/ # Dependency checks (cert-manager, kueue, etc.)
│ ├── services/ # Service checks
│ ├── workloads/ # Workload checks
│ └── shared/ # Shared utilities
│ ├── base/ # BaseCheck struct for composition
│ ├── components/ # Component utility functions (management state)
│ ├── migration/ # Migration check helper
│ ├── operators/ # OLM operator utilities
│ ├── results/ # Result helper functions
│ └── validate/ # Fluent builders (Component, DSCI, Operator, Workloads)
├── printer/ # Output formatting
├── resources/ # Centralized GVK/GVR definitions
└── util/
├── jq/ # JQ query utilities
├── version/ # Version detection utilities
├── kube/discovery/ # Resource discovery
└── iostreams/ # IOStreams wrapper
Each lint check resides in its own package:
pkg/lint/checks/components/
├── dashboard/
│ ├── dashboard.go
│ └── dashboard_test.go
├── kserve/
│ ├── kserve.go
│ └── kserve_test.go
└── modelmesh/
├── modelmesh.go
└── modelmesh_test.go
Benefits:
- Clear boundaries and dependencies
- Independent testing
- Easy to add/remove lint checks
- Prevents naming conflicts