fix(clone): stop overbodig re-clonen + heldere clone/kubectl-logging#128
Merged
Conversation
… 'failed clone' De pg_dump-clone herstelt een schema altijd onder de BRON-naam (pg_dump kan niet hernoemen), dus bij een generational clone komt het schema binnen als source_schema en hernoemen we het naar target_schema. Dat is het normale pad, geen fout. Toch logde de code ERROR 'Target schema not created' + 'Schemas that DO exist' op elke geslaagde clone, en de generatie-failover noemde een bestaande (vaak geslaagde) generatie 'leftover from failed clone'. Samen lieten ze het lijken alsof clones faalden terwijl ze slaagden. - postgres.py: hernoemen is nu het verwachte pad (INFO); ERROR alleen nog bij een dump die echt geen schema opleverde (lege bron). - database_manager.py: 'already exists (prior generation)' op INFO i.p.v. WARNING 'leftover from failed clone'. Gedrag ongewijzigd: de rename en de failover gebeuren nog steeds; alleen de log-niveaus en bewoording kloppen nu. 60 clone-tests groen.
…p re-cloning) process_project mutates project_data in memory -- set_clone_status sets clone-from.status.completed=True and the clone records the database generation -- but committed the project file WITHOUT saving that state first. Every other write flow calls save_project_data() before commit_and_push(); process_project did not. So the updates lived only in memory and were lost: each reconcile re-read completed=false / generation=None, re-ran the clone, and the generational failover created a fresh full database copy every pass (regel-k4c PRs accumulated v1..vN within a single startup reconcile). Not a concurrency/rebase problem -- a missing persist. read_project_file caches the parsed dict, so save_project_data() writes exactly the mutated object. - project_manager.py: save_project_data() before the project-file commit in process_project. - test: AST regression guard that save_project_data() precedes commit_and_push() in process_project (fails on the pre-fix code).
…ing kubectl command" The connector logged bare "Running kubectl command" / "kubectl command succeeded" DEBUG lines with no command, resource, namespace or result -- useless for debugging the reconcile firehose. Now logs the real argv with secret-bearing flag values masked (--token/--password/--from-literal/--from-file/--docker-password). The "-n <namespace>" in the command identifies the project (rig-prd-<project>), so the lines now answer "what project, what command, what result". Failure and timeout lines include the command too.
| shell_cmd = f"{cmd_str} <<'EOF'\n{stdin_input}\nEOF" | ||
|
|
||
| logger.debug("Running kubectl shell command with stdin") | ||
| logger.debug(f"Running (stdin): {safe_cmd}") |
| stdout, stderr = await asyncio.wait_for(process.communicate(), timeout=timeout) | ||
| except TimeoutError: | ||
| logger.error(f"kubectl command timed out after {timeout}s") | ||
| logger.error(f"Timed out after {timeout}s: {safe_cmd}") |
| cmd.extend(args) | ||
|
|
||
| logger.debug("Running kubectl command") | ||
| logger.debug(f"Running: {safe_cmd}") |
| stdout, stderr = await asyncio.wait_for(process.communicate(), timeout=timeout) | ||
| except TimeoutError: | ||
| logger.error(f"kubectl command timed out after {timeout}s") | ||
| logger.error(f"Timed out after {timeout}s: {safe_cmd}") |
|
|
||
| if process.returncode != 0: | ||
| logger.warning(f"kubectl command failed with code {process.returncode}: {stderr_str}") | ||
| logger.warning(f"Failed (code {process.returncode}): {safe_cmd} :: {stderr_str}") |
| raise KubectlConnectionError(error_msg) | ||
| else: | ||
| logger.debug("kubectl command succeeded") | ||
| logger.debug(f"Succeeded: {safe_cmd}") |
Follow-up on 3a8e0d8. Replace the redact-the-args approach with a values-free summary: _summarize_kubectl_command logs the subcommand, the resource kind and the target namespace (rig-prd-<project>) and nothing else — no flag values, resource names or stdin. A secret can never reach the log regardless of the command, and the lines read as "operation X on project Y" (e.g. "kubectl get pods (project rig-prd-regel-k4c)"). Resolves the 6 CodeQL py/clear-text-logging-sensitive-data alerts: no argv value reaches the log sink, so there is no taint to flag (the previous allowlist redaction masked values but CodeQL could not recognise it as a sanitizer, and in practice the OPI never put secrets on the kubectl argv anyway — secrets go via SOPS manifests applied with -f/stdin). Test asserts the summary shows verb/kind/project and never the secret value or resource name.
…loyment/component) executeDangerAction POSTed via fetch() without an X-CSRF-Token header, so CSRFMiddleware rejected it with 403 and the UI showed "Onbekende fout" — project, deployment AND component delete were all broken. Same class as the #71 CSRF rollout gaps; the refresh button right above it already sends the token via hx-headers. Add the header to the fetch. Also add a sweep regression guard: every template fetch() POST/PUT/PATCH/DELETE must carry an X-CSRF-Token header (fails on the pre-fix template, catches the whole recurring class).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Drie samenhangende fixes uit de regel-k4c re-clone-analyse van 11 juni.
1. Stop overbodig re-clonen — de echte fix (
91e11a19)process_projectmuteerdeclone-from.status.completed=Trueén de database-generatie in memory, maar committe het projectbestand zonder die staat eerst op te slaan —save_project_data()ontbrak, anders dan in elke andere write-flow. Gevolg: elke reconcile las opnieuwcompleted=false/generation=None, re-clonede, en de generational failover maakte elke keer een verse volledige kopie (regel-k4c PRs stapeldenv1..vNbinnen één startup-reconcile).Geen concurrency/rebase-probleem — een ontbrekende persist.
read_project_filecachet de geparste dict, dussave_project_data()schrijft exact het gemuteerde object. AST-regressietest toegevoegd (faalt op de oude code).2. Heldere clone-logging (
89089d3a)pg_dumpherstelt een schema altijd onder de bron-naam (kan niet hernoemen); hernoemen naar de target-naam is dus het normale pad. Toch logde de codeERROR: Target schema not createdop élke geslaagde clone, enWARNING: leftover from failed clonevoor bestaande (vaak geslaagde) generaties. Nu: hernoemen = INFO; ERROR alleen bij een dump die echt geen schema opleverde. Gedrag ongewijzigd.3. kubectl-logging (
3a8e0d8f)Running kubectl commandtoont nu het echte commando + namespace (-n rig-prd-<project>), met maskering van secret-dragende flags (--token,--password,--from-literal, ...).Verificatie
uv run pytest tests/→ 3853 passed, 0 failed. ruff + pyright schoon. Pre-push hook groen.Achtergrond (geverifieerd tegen productie-DB)
regel_k4c_regelrecht(de bron) heeft géén_vN-kopieën: read-only, nooit overschreven._vN-previews ruimen zichzelf op bij PR-verwijdering (delete dropt base + alle generaties).Deploy-noot
Bouw image vanaf deze branch → deploy → dan herstarten. De eerste herstart re-clonet elke preview nog één keer (de
completed-vlag wordt voor het eerst weggeschreven); vanaf de tweede herstart is het stil.Vervolg-onderzoek staat los op het ZAD-board: #127 (queue-coalescing).