Fix 14 ruff lint errors failing CI on main by anneschuth · Pull Request #4 · RijksICTGilde/RIG-Cluster

anneschuth · 2026-01-29T07:44:32Z

Summary

Fix 14 auto-fixable ruff lint errors that are failing the "Lint & Type Check" job in CI on main
F541: Remove f prefix from f-string without placeholders in router.py:230
RUF051: Replace if key in dict: del dict[key] with .pop(key, None) in test_nice_url_auto_issuer.py (10 occurrences)
F811: Remove redundant local patch imports in test_rate_limiter.py (3 occurrences)

Test plan

ruff check opi/ tests/ passes locally with 0 errors
CI "Lint & Type Check" job passes
All integration tests pass

🤖 Generated with Claude Code

- F541: Remove f-string prefix without placeholders in router.py - RUF051: Replace if-key-in-dict-del with .pop() in test_nice_url_auto_issuer.py - F811: Remove redundant local patch imports in test_rate_limiter.py

The metrics_router module uses get_metrics_connector() not PrometheusConnector directly. Update all test patches to mock get_metrics_connector instead.

Today, clients have to regex over the message field to tell an ImagePullBackOff from a probe timeout from an OOMKill. Server-side, the categorization is straightforward — and once expressed as a typed enum it lets CLIs and dashboards filter, group, and colorize without string matching. Two new fields on StatusError: - category: ErrorCategory — for automation. One of ImagePull, CrashLoop, OutOfMemory, HealthCheck, SyncFailed, ComparisonError, Unknown. PascalCase to match the rest of the V2 enums. - explanation: str | None — for humans. Static, curated remediation guidance per category. Null for Unknown. Categories are intentionally broader than literal K8s reasons (e.g. ImagePull covers ImagePullBackOff, ErrImagePull, manifest-unknown pulls) so app-level categories can be added later without binding the contract to specific K8s state names. Categorization logic lives in opi/services/deployment_diagnostics.py next to gather_deployment_errors, keeping the data layer co-located. - Add ErrorCategory enum - Add category + explanation fields to StatusError - Add categorize_error() helper with explanation lookup table - Wire categorizer into _fetch_one_deployment_status - 14 new categorizer tests + assertions in V2 endpoint test - Update feature doc with new fields + example

…#57) * Complete issue #51 — deployment read endpoints (status, errors, logs) Initial implementation in 430eeec shipped components/images/URLs but omitted the status fields the issue asked for and any debug data for stuck deployments. This commit adds: - ArgoCD-sourced reconciliation status on the response: sync_status, health_status, revision (full SHA), last_synced_at. Returned as a nested DeploymentStatus sub-object so source-of-truth is encoded in the schema. status is null when the cluster has no Application for the deployment yet. - When health_status != Healthy, the response also includes: - errors[]: aggregated from Argo (resources, resource tree, conditions, syncResult) and namespace events (kubectl) - logs{}: per-component tail (default 50 lines, capped at 500 via hidden log_lines query param) - Drop the image_pull_policy field that was added without being asked for. - Backend-neutral naming (status, DeploymentStatus, "Deployment status backend is unreachable") so the source of truth can change without breaking callers. 503 is returned when the status backend is unreachable or any per-deployment fetch raises — partial state would be misleading. Implementation: - New module opi/services/deployment_diagnostics.py with two pure-data helpers (gather_deployment_errors, gather_component_logs). No FastAPI/Pydantic deps. - opi/api/v2/router.py uses the helpers; healthy deployments still pay only the single Argo status call. - opi/web/router.py was duplicating the entire error-gather logic; refactored to call the shared helper, deleting ~92 lines. Dutch presentation (interpret_argocd_errors, age strings) stays UI-side. Tests: 17 new for deployment_diagnostics, 4 new V2 endpoint tests (unhealthy populates diagnostics, log_lines cap, etc.). * PR feedback #1: drop status.logs and log_lines query param Logs aren't status. The existing GET /api/logs/{project_name} (HTTP) and WS /api/logs/stream/{project_name} endpoints already serve general log access — embedding up to 500 lines × N components in every read of an unhealthy deployment is wasted bytes for polling clients (CLI tab completion, dashboards). Errors stay (cluster events are cheap and distinct from logs); logs go to their own endpoint where they belong. - Drop logs field from DeploymentStatus - Drop log_lines query param from both endpoints - Drop gather_component_logs helper and tests - Drop kubectl log mocks from V2 tests - Update feature doc to point at the existing /api/logs endpoints * PR feedback #2: type sync_status/health_status as enums Replace free-string sync_status and health_status with StrEnum types so OpenAPI emits enum: [...] (consumers can validate, generate typed clients, and know what to expect). Also add StatusReason enum for use with the lenient-list change in the next commit. PascalCase values mirror what ArgoCD already returns (Synced, Healthy, etc.), keeping the wire format identical. Safe extractors (_safe_sync_status, _safe_health_status) map any unknown value Argo might emit in the future to "Unknown" rather than raising — so a new Argo state doesn't break our response. - Add SyncStatus, HealthStatus, StatusReason in opi/api/v2/models.py - DeploymentStatus.sync_status/health_status now use the enum types - Add safe extractors in opi/api/v2/router.py - StatusReason is unused this commit; lenient-list commit consumes it * PR feedback #2 + #3: lenient list endpoint + status_reason The single-deployment endpoint stays strict (503 on any fetch failure — there's one resource, partial truth misleads). But the list endpoint gets lenient: when one deployment's fetch raises, that deployment is returned with status=null, status_reason=Unavailable, and the others come back normally. Whole-backend-down still 503s. This keeps a CLI's `deployment list` working through partial outages — without that, one broken deployment in a project would 503 the whole list and make it impossible to render the others. The status_reason field also disambiguates the existing "status: null" overload: it now means either Pending (cluster doesn't have an Application for this deployment yet) or Unavailable (fetch raised in lenient list mode). status_reason is null when status is set. - Add status_reason: StatusReason | None on DeploymentDetail - Split _fetch_one_deployment_status into the strict per-deployment helper (used by single endpoint) and a lenient batch wrapper (used by list endpoint) - _fetch_one_deployment_status returns (status, reason) so callers can attribute null status to the right cause - Add test_partial_failure_marks_one_unavailable_returns_others - Update test_app_not_yet_known to assert status_reason=Pending - Update feature doc to describe the two-tier failure model * PR feedback #4: typed ErrorCategory + human explanation on StatusError Today, clients have to regex over the message field to tell an ImagePullBackOff from a probe timeout from an OOMKill. Server-side, the categorization is straightforward — and once expressed as a typed enum it lets CLIs and dashboards filter, group, and colorize without string matching. Two new fields on StatusError: - category: ErrorCategory — for automation. One of ImagePull, CrashLoop, OutOfMemory, HealthCheck, SyncFailed, ComparisonError, Unknown. PascalCase to match the rest of the V2 enums. - explanation: str | None — for humans. Static, curated remediation guidance per category. Null for Unknown. Categories are intentionally broader than literal K8s reasons (e.g. ImagePull covers ImagePullBackOff, ErrImagePull, manifest-unknown pulls) so app-level categories can be added later without binding the contract to specific K8s state names. Categorization logic lives in opi/services/deployment_diagnostics.py next to gather_deployment_errors, keeping the data layer co-located. - Add ErrorCategory enum - Add category + explanation fields to StatusError - Add categorize_error() helper with explanation lookup table - Wire categorizer into _fetch_one_deployment_status - 14 new categorizer tests + assertions in V2 endpoint test - Update feature doc with new fields + example * PR feedback #5: clarify last_synced_at is the last attempt, not last success Today's value comes from operationState.finishedAt with a fallback to status.reconciledAt — both of which fire on every sync, regardless of outcome. For a Degraded deployment that means last_synced_at can be the timestamp of a failed sync attempt, which is misleading if read as "last successful deploy." Doc-only fix here. The proper field split (last_attempt_at + last_success_at) is a follow-up; this commit just makes the existing field's semantics honest. * PR feedback: collapse status into a single enum The previous design had `status: DeploymentStatus | null` with a separate `status_reason: StatusReason | null` (Pending/Unavailable). Two fields, two-level nullability, and a sub-object containing typed sync_status + health_status enums. Consumers had to switch on a tree. Replace with one top-level `status` enum that subsumes everything: Healthy, Degraded, Progressing, OutOfSync, Suspended, Missing, Pending, Unavailable, Unknown Argo's two orthogonal dimensions (sync, health) are collapsed using a worst-of-both priority: Degraded/Suspended/Missing > OutOfSync > Progressing > Healthy > Unknown. Pending and Unavailable replace the old nullable+reason combo for "we have no data" cases. What's lost: callers can't distinguish "OutOfSync + Degraded" from plain Degraded anymore. For the user-facing question "what state is my deployment in?", Degraded is what matters. If anyone needs both dimensions later, a debug field can be added — YAGNI for now. DeploymentDetail flattens to: status (enum, always set) sync_revision (was status.revision) last_synced_at errors[] - Drop SyncStatus, HealthStatus, StatusReason enums; keep DeploymentStatus as the single StrEnum and ErrorCategory unchanged - Drop nested DeploymentStatus model (became the enum) - Add internal _LiveStatus NamedTuple for orchestrator/builder hand-off - _collapse_argo_status() does the priority mapping - Tests assert on flat shape: data["status"] == "Healthy" etc. - Doc explains the collapse rules and the value table

Fix 14 auto-fixable ruff lint errors failing CI on main

4a46b5e

- F541: Remove f-string prefix without placeholders in router.py - RUF051: Replace if-key-in-dict-del with .pop() in test_nice_url_auto_issuer.py - F811: Remove redundant local patch imports in test_rate_limiter.py

anneschuth force-pushed the fix/ci-lint-errors branch from 6074dd8 to 4a46b5e Compare January 29, 2026 07:56

Fix integration tests patching non-existent PrometheusConnector

694eaf2

The metrics_router module uses get_metrics_connector() not PrometheusConnector directly. Update all test patches to mock get_metrics_connector instead.

anneschuth force-pushed the fix/ci-lint-errors branch from d463157 to 694eaf2 Compare January 29, 2026 08:20

uittenbroekrobbert approved these changes Jan 29, 2026

View reviewed changes

uittenbroekrobbert merged commit 6778090 into main Jan 29, 2026
9 checks passed

anneschuth deleted the fix/ci-lint-errors branch January 29, 2026 09:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix 14 ruff lint errors failing CI on main#4

Fix 14 ruff lint errors failing CI on main#4
uittenbroekrobbert merged 2 commits into
mainfrom
fix/ci-lint-errors

anneschuth commented Jan 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

anneschuth commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

anneschuth commented Jan 29, 2026 •

edited

Loading